Feb 26, 2022

Level 4

Bad performance with doesntHave query

For simplicity, I'm gonna use different table names than my current project.

I have tables t1, t2 with many to many relationships and pivot table t1_t2 which have t1_id and t2_id columns, and at the beginning, I start using doesntHave query like this.

T1::doesntHave('t2')->get();

After a while, the tables become bigger and bigger with millions of records and the query took more than a minute, then I used joins instead to get the same results with better performance, like this.

T1::select(['t1.id','t1.column_name'])
		->leftJoin('t1_t2', 't1.id', '=', 't1_t2.t1_id')
        ->whereNull('t1_t2.t1_id')
        ->limit(10)
        ->get();

Now, t1 has 5M records and t2 has 100K and t1_t2 has 11M and the join query takes from 7 to 10 seconds and we expect more records.

Are there any other ways I can do to get records from t1 which doesn't have a relationship with t2 with better performance?

I think I missing something cause for sure MySQL performance is better than that.

The only thing I get in my mind, for now, is making a new column "has_relashion_with_t2" with boolean values and default value 0 in t1 table and automatically converting the value to 1 when attaching t1 to t2 to avoid join query, but I don't like this method which will lead to redundancy in data and increase bugs.

Any help?

MohamedTammam

4 years ago

Level 51

Your solution will lead to redundancy of course, but if you prefer the speed over that, it's okay to go with that solution.

For your current situation, make sure that you are indexing the columns that you are using.

moelkomy

4 years ago

Level 4

@MohamedTammam I put this into my consideration, but I'm waiting for a better solution.

tjanuki

4 years ago

Level 42

Hi @moelkomy , have you consider adding an index on the t1_t2 table?

When I need to improve SQL performance, I use the explain keyword to check the SQL results and then add some indexes on the table.

In this case, adding new index on t1_t2 table might improve the performance.

    Schema::table('t1_t2', function (Blueprint $table) {
        $table->index('t1_id');
    });

I did some googling and found an article that might help. https://deliciousbrains.com/optimizing-laravel-database-indexing-performance/

moelkomy

4 years ago

Level 4

@tjanuki I already have foreign keys in my pivot table like this.

Schema::create('t1_t2', function (Blueprint $table) {
       $table->foreignId('t1')->unsigned()->constrained()->onDelete('cascade');
       $table->foreignId('t2')->unsigned()->constrained()->onDelete('cascade');
});

And I don't think it's an indexing problem, when I use whereNotNull instead of whereNull it takes about 0.001 seconds, so the next query is so much faster.

T1::select(['t1.id,'t1.column_name''])
		->leftJoin('t1_t2', 't1.id', '=', 't1_t2.t1_id')
        ->whereNotNull('t1_t2.t1_id')
        ->limit(10)
        ->get();

What I notice when I used explain to the two joins is that the one with whereNull doesn't use t1 primary key whereas the one with whereNotNull actually uses t1 primary key.

Perhaps that happens because null values didn't index in t1 ids.

I mean if I replace - I tested this -

->whereNull('t1_t2.t1_id')

With something like this

->where('t1.column_name', 'text')

The query will be fast as well.

theProfit

4 years ago

Level 7

@moelkomy and if you do it like

Where(“name”,null)

I have had this problem also a time ago whereNotNull is just a function if you write it youre self and it is faster then checkout whereNotNull what is it doing? I am not on a laptop but it is worth investing.

Tray2

4 years ago

Level 73

@moelkomy The t1_t2.t1_id should never be null so there is no reason to do that check. If it by any chance is null you have a major flaw in your database.

1 like

theProfit

4 years ago

Level 7

@Tray2 mist that part you are sharp and awaker 😁💻

moelkomy

4 years ago

Level 4

@tray2 t1_t2 table is only a pivot table that has t1_id and t2_id columns, no primary key or timestamp in it.

Another example will be a hospital database with doctors and patients tables with doctor_patient as a pivot table. doctor_patient have doctor_id, patient_id and appoinment_date.

Assume a patient who wants to book an appointment with a doctor tomorrow at a specific time, so we will search for doctors who don't have patients tomorrow at this time. and you will find us using doesntHave method or join method - with a complex syntax of course - as I mentioned in the post.

Do I miss anything or is there's a better structure than that?

moelkomy

4 years ago

Level 4

@theProfit I tried to use where instead of whereNull but it doesn't return any records at all.

tjanuki

4 years ago

Level 42

@moelkomy I've read your response and your original post several times and fully understand your situation. I agree with @roksprogar 's approach, but it seems to be the same performance as your original post.

In this situation, I would try an approach like yours, adding a column, or add default relationship in the t2 table and removing it when another relation is added, but not what you're looking for as it requires additional logic and and might cause some bugs.

I wish someone knew of another way.

Tray2

4 years ago

Level 73

@moelkomy no fields in a pivot should ever be null so you don't have to check for it. Each field in the pivot is a foreign key that references a field in the source tables.

If you need to see which doctor that doesn't have a patient you use either a left or a right join.

roksprogar

4 years ago

Level 2

DB::table('t1')->whereNotIn('id', function($q){
    $q->select('t1_t2.t1_id')->from('t1_t2');
})->get();

moelkomy

4 years ago

Level 4

@roksprogar This query takes the same time as my join example, thanks for it I will use it if I didn't find any other solution.

Please read my replay to @tjanuki I added more information about my example.

Please or to participate in this conversation.