thebigk's avatar
Level 13

How to speed up this script to verify 90K URLs for their HTTP status code

I have a list of 90K URLs whose HTTP status codes I need to verify. I've written a script to do that, but it's slow. I'd appreciate it if someone could help me speed it up:

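public function handle()
    {
        // Make get_headers() send HEAD requests; setting this once is enough
        stream_context_get_default([
            'http' => ['method' => 'HEAD']
        ]);

        // chunkById() instead of chunk(): the loop updates the column the query
        // filters on, so offset-based chunking would skip rows
        DB::table('internal_links')->whereNull('status')->chunkById(5, function($links) {
           foreach($links as $link) {
                // First header line is the status line, e.g. "HTTP/1.1 200 OK"
                $status_code = @get_headers($link->href)[0];
                if(!$status_code) {
                    // No response at all: recorded as 404
                    $status_code = 404;
                } else {
                    $status_code = substr($status_code, 9,3);
                }
                DB::table('internal_links')->where('href', $link->href)->update(['status' => $status_code]);
           }
        });
        return Command::SUCCESS;
    }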
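
A side note on the `substr($status_code, 9, 3)` step: it relies on the status line having exactly the shape `HTTP/1.x NNN ...`. A minimal sketch of a more tolerant parser follows; `parseStatusCode` is a hypothetical helper name, not something from the script above, and it returns null instead of guessing when the line does not look like a status line.

```php
<?php

/**
 * Extract the numeric status code from an HTTP status line
 * such as "HTTP/1.1 200 OK" or "HTTP/2 404".
 *
 * Hypothetical helper for illustration only.
 * Returns null when the input is not a recognisable status line.
 */
function parseStatusCode(string $statusLine): ?int
{
    // Protocol version, whitespace, then exactly three digits
    if (preg_match('#^HTTP/\d+(?:\.\d+)?\s+(\d{3})\b#', $statusLine, $matches) === 1) {
        return (int) $matches[1];
    }

    return null;
}
```

In the loop above it could stand in for the `substr()` call, with the null case mapped to whatever fallback value you prefer.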

I'd like to attempt this using the built-in Http client, but I don't know how to make only a HEAD request, or how to make concurrent requests for URLs fetched from the database.

I look forward to your suggestions. Thank you!

tisuchi's avatar

@thebigk does it work for you?


public function handle()
{
    $client = new Client();
    DB::table('internal_links')->whereNull('status')->orderBy('id')->chunk(5, function($links, $counter) use ($client) {
        $urls = [];
        foreach($links as $link) {
            $urls[] = $link->href;
        }
        try {
            $responses = $client->request('HEAD', $urls, [
                'concurrency' => 10,
            ]);
        } catch (\Exception $e) {
            $this->error($e->getMessage());
            return Command::FAILURE;
        }

        foreach($responses as $response) {
            $status_code = $response->getStatusCode();
            DB::table('internal_links')->where('href', $response->getEffectiveUrl())->update(['status' => $status_code]);
        }
    });
    return Command::SUCCESS;
}

thebigk's avatar
Level 13

@tisuchi - thank you. It threw the following -

URI must be a string or UriInterface
tisuchi's avatar

@thebigk Yes, it should be.

Maybe you can check exactly what you are passing with dd().


// Some code. 

 try {
        dd($urls);

            $responses = $client->request('HEAD', $urls, [
                'concurrency' => 10,
            ]);
        }
thebigk's avatar
Level 13

@tisuchi It's a valid array of the URLs (hrefs) - viz -

array:5 [ // app/Console/Commands/FastCrawler.php:40
  0 => "https://makemoneyonline.net.ph/"
  1 => "https://www.CrazyEngineers.com"
  2 => "https://superblog.crazyengineers.com"
  3 => "https://ranjanbox.blogspot.com"
  4 => "https://crazyengineers.com/forum/showthread.php?t=24"
]
thebigk's avatar
Level 13

@tisuchi Update - it looks like we can't pass an array of URLs as the second parameter to the request method of the Guzzle Client. It works fine with a single URL passed as a string.
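
For context, the `concurrency` option used in the snippet above is not a `request()` option; in Guzzle it belongs to `GuzzleHttp\Pool`, which takes an iterable of requests. A minimal sketch of that approach follows. The function name `headStatuses` and the id-keyed `$urls` array are assumptions for illustration, not part of the original script.

```php
<?php

use GuzzleHttp\Client;
use GuzzleHttp\Exception\RequestException;
use GuzzleHttp\Pool;
use GuzzleHttp\Psr7\Request;
use Psr\Http\Message\ResponseInterface;

/**
 * Send HEAD requests concurrently and collect one status code per key.
 * Hypothetical helper; $urls is assumed to be keyed by row id.
 *
 * @param array<int|string, string> $urls
 * @return array<int|string, int|null> null when no response was received
 */
function headStatuses(Client $client, array $urls, int $concurrency = 10): array
{
    $statuses = [];

    // Lazily yield one HEAD request per URL, preserving the array keys
    $requests = function () use ($urls) {
        foreach ($urls as $id => $url) {
            yield $id => new Request('HEAD', $url);
        }
    };

    $pool = new Pool($client, $requests(), [
        'concurrency' => $concurrency,
        'fulfilled' => function (ResponseInterface $response, $id) use (&$statuses) {
            $statuses[$id] = $response->getStatusCode();
        },
        'rejected' => function ($reason, $id) use (&$statuses) {
            // With http_errors enabled, 4xx/5xx arrive here but still carry a response
            $statuses[$id] = ($reason instanceof RequestException && $reason->hasResponse())
                ? $reason->getResponse()->getStatusCode()
                : null;
        },
    ]);

    // Block until every request in the pool has finished
    $pool->promise()->wait();

    return $statuses;
}
```

Calling it once per database chunk, for example `headStatuses($client, $links->pluck('href', 'id')->all())`, keeps memory bounded while still running up to `$concurrency` requests at a time.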

tisuchi's avatar

@thebigk hm...

I changed my mind a bit. What if you follow this approach?


public function handle()
{
    // Initialize the Guzzle HTTP client; with http_errors disabled,
    // 4xx/5xx responses are returned instead of thrown as exceptions
    $client = new \GuzzleHttp\Client(['http_errors' => false]);

    // Set up the promises array
    $promises = [];

    // Get the URLs from the database
    $links = DB::table('internal_links')->whereNull('status')->orderBy('id')->get();

    // Loop through the URLs and start an asynchronous HEAD request for each one, keyed by row id
    foreach ($links as $link) {
        $promises[$link->id] = $client->requestAsync('HEAD', $link->href);
    }

    // Wait for all requests to settle, whether they succeeded or failed
    $results = \GuzzleHttp\Promise\Utils::settle($promises)->wait();

    // Loop through the results and update the status codes in the database
    foreach ($results as $id => $result) {
        // Requests that never got a response are recorded as 404, as in the original script
        $statusCode = $result['state'] === 'fulfilled' ? $result['value']->getStatusCode() : 404;
        DB::table('internal_links')->where('id', $id)->update(['status' => $statusCode]);
    }

    return Command::SUCCESS;
}

FYI, GuzzleHttp\Promise\Utils::settle waits for all of the requests to complete, successful or not, before the script continues. The script then loops through the results and updates the status code in the database for each URL.

⚠️ You will need to install the Guzzle HTTP client package by running composer require guzzlehttp/guzzle before you can use it in your Laravel project.

thebigk's avatar
Level 13

@tisuchi I'm attempting this; but don't you think we should chunk the DB? 90K URLs at once could go heavy on the system.

PS: Yes, I have Guzzle installed.

tisuchi's avatar

@thebigk You can also use Laravel's HTTP client. It depends on your preference.

For example-


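use Illuminate\Http\Client\Pool;
use Illuminate\Http\Client\Response;
use Illuminate\Support\Facades\Http;

public function handle()
{
    // chunkById() avoids skipping rows while the filtered column is being updated
    DB::table('internal_links')->whereNull('status')->chunkById(5, function($links) {
        // Send the requests for this chunk concurrently, keyed by row id
        $responses = Http::pool(function (Pool $pool) use ($links) {
            foreach($links as $link) {
                $pool->as((string) $link->id)->get($link->href);
            }
        });

        foreach($responses as $id => $response) {
            // A failed connection may come back as an exception instead of a Response
            if (! $response instanceof Response) {
                continue;
            }
            DB::table('internal_links')->where('id', $id)->update(['status' => $response->status()]);
        }
    });
    return Command::SUCCESS;
}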

tisuchi's avatar

@thebigk

90K URLs at once could go heavy on the system.

Totally agree with you. 90K is a large amount of data, so it's better to process it in chunks.

thebigk's avatar
Level 13

@tisuchi - How do I send a HEAD request when using Laravel's Http client? I think it'd speed the whole thing up.

tisuchi's avatar

@thebigk You can simply use the head() method.

For example-

use Illuminate\Support\Facades\Http;

$response = Http::head($url);

$statusCode = $response->status();
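
To combine head() with concurrent requests, Laravel's Http::pool can be used. Here is a rough sketch rather than a tested solution: the helper name `headStatusesForLinks`, the 10-second timeout, and the chunk size in the usage note are all assumptions. Depending on the Laravel version, a failed connection may be returned as an exception object instead of a Response, so the sketch checks the type.

```php
<?php

use Illuminate\Http\Client\Pool;
use Illuminate\Http\Client\Response;
use Illuminate\Support\Facades\Http;

/**
 * Send concurrent HEAD requests for a batch of link rows.
 * Hypothetical helper; each row is assumed to expose ->id and ->href.
 *
 * @param iterable<object> $links
 * @return array<int|string, int|null> null when no response was received
 */
function headStatusesForLinks(iterable $links): array
{
    $responses = Http::pool(function (Pool $pool) use ($links) {
        foreach ($links as $link) {
            // as() names the request so the result can be matched back to the row
            $pool->as((string) $link->id)->timeout(10)->head($link->href);
        }
    });

    $statuses = [];
    foreach ($responses as $id => $response) {
        $statuses[$id] = $response instanceof Response ? $response->status() : null;
    }

    return $statuses;
}
```

Inside the command it could be called once per chunk, for example:

```php
DB::table('internal_links')->whereNull('status')->chunkById(50, function ($links) {
    foreach (headStatusesForLinks($links) as $id => $status) {
        // Falls back to 404 for unreachable URLs, mirroring the original script
        DB::table('internal_links')->where('id', $id)->update(['status' => $status ?? 404]);
    }
});
```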
