
delocal

Query with millions of records very slow

I have a database of images stored in AWS. At the moment it has 12M records. I need to check if each image really exists in AWS, and if so, copy it to another server.

So far this is ok.

The problem is that for about 5M of those images one of the fields (u_id) is 0; these should be filtered out and not included in the query.

This is the basic query:

    do {
        $pixes = Pix::query()
            ->where('adp_bucket', '=', 1)
            ->limit($limit)
            ->get();

        foreach ($pixes as $pix) {
            $advert = Advert::find($pix->ad_id);
            // do something with the pix...
        }
    } while ($pixes);

The $limit is 10000.

As soon as I add the condition u_id != 0, the query becomes extremely slow: it takes several minutes to complete even the first (!) batch.

$pixes = Pix::query()
    ->where('adp_bucket', '=', 1)
    ->where('u_id', '!=', 0)
    ->limit($limit)
    ->get();

Note: the u_id field is not a boolean, it is an id.

How can I optimize this query?

Thank you for your help!

Cheers, Daniel

LaryAI (AI, Best Answer)

To optimize the query, you can try the following approaches:

  1. Indexing: Make sure that the adp_bucket and u_id columns are indexed in the database. This will speed up the search process.

  2. Chunking: Instead of retrieving all the records at once, you can use the chunk method to retrieve the records in smaller chunks. This will reduce memory usage and improve performance. Here's an example:

Pix::query()
    ->where('adp_bucket', '=', 1)
    ->where('u_id', '!=', 0)
    ->chunk(1000, function ($pixes) {
        foreach ($pixes as $pix) {
            $advert = Advert::find($pix->ad_id);
            // do something with the pix...
        }
    });

In this example, the chunk method retrieves 1000 records at a time and processes them in the callback function.

  3. Eager loading: If you need to access the Advert model for each Pix record, you can use eager loading to reduce the number of queries. Modify your query like this:

Pix::with('advert')
    ->where('adp_bucket', '=', 1)
    ->where('u_id', '!=', 0)
    ->chunk(1000, function ($pixes) {
        foreach ($pixes as $pix) {
            $advert = $pix->advert;
            // do something with the pix and advert...
        }
    });

Make sure you have defined the relationship between Pix and Advert models using the belongsTo method.

These optimizations should help improve the performance of your query.
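
One refinement on point 1 (an assumption on my part, with the table name `pix` taken from the replies below): since the query filters on both adp_bucket and u_id together, a single composite index usually serves it better than two separate single-column indexes, because MySQL will typically use only one index per table access. A migration sketch:

```php
// Migration sketch: one composite index covering both WHERE columns.
// Assumes the table is named `pix`; adjust to the real table name.
public function up()
{
    Schema::table('pix', function (Blueprint $table) {
        // Column order matters: the equality column (adp_bucket = 1)
        // comes first, the range/inequality column (u_id != 0) second.
        $table->index(['adp_bucket', 'u_id']);
    });
}

public function down()
{
    Schema::table('pix', function (Blueprint $table) {
        $table->dropIndex(['adp_bucket', 'u_id']);
    });
}
```

Check with EXPLAIN that the query actually uses the new index before and after.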

s4muel

Silly question, but does the u_id column have an index?
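
A quick way to find out (a sketch assuming MySQL and a table named `pix`, run e.g. from `php artisan tinker`):

```php
// List the existing indexes on the pix table via Laravel's DB facade.
// SHOW INDEX is MySQL-specific; Key_name/Column_name are its result columns.
$indexes = DB::select('SHOW INDEX FROM pix');
foreach ($indexes as $ix) {
    echo $ix->Key_name . ' covers ' . $ix->Column_name . PHP_EOL;
}
```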

krisi_gjika

Try adding an index to your column:

public function up()
{
    Schema::table('pix', function (Blueprint $table) {
        $table->index('u_id');
    });
}

public function down()
{
    Schema::table('pix', function (Blueprint $table) {
        // Pass the column as an array so Laravel resolves the
        // auto-generated index name (pix_u_id_index) for you.
        $table->dropIndex(['u_id']);
    });
}

krisi_gjika

@delocal Measure how much time each query is taking. Also, I don't see you applying an offset to your query?!
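
The missing offset is likely why the do/while loop re-reads the same 10000 rows forever (and a Collection object is always truthy, so `while($pixes)` never ends). One fix is keyset ("seek") pagination: remember the last id seen and ask only for rows with a larger id, which stays fast at any depth if id is indexed. A plain-PHP sketch of the idea over hypothetical in-memory rows; in the real code the fetch would be `Pix::where('id', '>', $lastId)->orderBy('id')->limit($limit)->get()` (or simply Eloquent's `chunkById`):

```php
<?php
// Keyset pagination sketch: instead of OFFSET, track the last seen id
// and fetch only rows with a larger id. Hypothetical in-memory "table".

function fetchPage(array $rows, int $lastId, int $limit): array
{
    // Simulates: WHERE id > ? AND u_id != 0 ORDER BY id LIMIT ?
    $page = array_filter($rows, fn ($r) => $r['id'] > $lastId && $r['u_id'] !== 0);
    usort($page, fn ($a, $b) => $a['id'] <=> $b['id']);
    return array_slice($page, 0, $limit);
}

$rows = [
    ['id' => 1, 'u_id' => 7],
    ['id' => 2, 'u_id' => 0],   // filtered out, like the ~5M u_id = 0 images
    ['id' => 3, 'u_id' => 9],
    ['id' => 4, 'u_id' => 4],
];

$lastId = 0;
$seen = [];
do {
    $page = fetchPage($rows, $lastId, 2);
    foreach ($page as $r) {
        $seen[] = $r['id'];
        $lastId = $r['id'];   // advance the cursor past this row
    }
} while (count($page) > 0);   // an empty page ends the loop cleanly

print_r($seen); // each qualifying row (ids 1, 3, 4) visited exactly once
```

Unlike a LIMIT/OFFSET loop, this also terminates correctly and never skips or repeats rows while you process them.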
