poma's avatar
Level 1

Poor whereHas performance

I want to apply a where condition to relation. Here's what I do:

Replay::whereHas('players', function ($query) {
    $query->where('battletag_name', 'test');
})->limit(100);

It generates the following query:

select * from `replays` 
where exists (
    select * from `players` 
    where `replays`.`id` = `players`.`replay_id` 
      and `battletag_name` = 'test') 
order by `id` asc 
limit 100;

Which executes in 70 seconds. If I manually rewrite query like this:

select * from `replays` 
where id in (
    select replay_id from `players` 
    where `battletag_name` = 'test') 
order by `id` asc 
limit 100;

It executes in 0.4 seconds. Why where exists is the default behavior if it's so slow? Is there a way to generate the correct where in query with query builder or do I need to inject raw SQL? Maybe I'm doing something wrong altogether?

replays table has 4M rows, players has 40M rows, all relevant columns are indexed, dataset doesn't fit into MySQL server memory.

0 likes
13 replies
poma's avatar
Level 1

found that the correct query can be generated as:

Replay::whereIn('id', function ($query) {
    $query->select('replay_id')->from('players')->where('battletag_name', 'test');
})->limit(100);

Still have a question why exists performs so poorly and why it is the default behavior

By the way ability to use a Closure in whereIn is not mentioned in docs

1 like
tisuchi's avatar

Honestly, I can't see any issue here-

Replay::whereHas('players', function ($query) {
    $query->where('battletag_name', 'test');
})->limit(100)->get();

Even its load without any issue in my side.

BTW, how do you echo in your view?

poma's avatar
Level 1

The issue is that this query takes 70 seconds to execute on my dataset

tisuchi's avatar

Can you show your view page?

How many data you have?

tisuchi's avatar

I missed that man...

40M!...

I don't know how can help you. However, I can suggest you to watch this series. It's all about optimization.

https://serversforhackers.com/laravel-perf

This series can be helpful for you. Specially check "Database Chunking" video.

2 likes
gregrobson's avatar

WHERE IN or EXISTSwill not perform well on large tables. You would be better served by using the query builder to generate a join:

SELECT
 *
-- Filter the * down to just the columns you require.
FROM `replays`
INNER JOIN `players` ON `replays`.`id` = `players`.`replay_id`
WHERE `players`.`battletag_name` = 'test'
ORDER BY `replays`.`id` ASC
LIMIT 100;

Joins are (typically) optimised better by database engines. On larger tables you'll be better off building queries yourself and returning collections.

You also mention that your database does not fit in memory, how much memory does the database engine have available? How big is the active data set (tables and indexes combined)?

poma's avatar
Level 1

I did a test with join and execution takes 400ms, exactly like with whereIn.

My data set takes 25Gb in database and instance memory is only 1.5Gb RAM. This is an open source project running on donations so a price for appropriate production grade DB server greatly exceeds total budget.

1 like
gregrobson's avatar

I see, perhaps the DB engine has weighed up the row count of the table and has might have optimised the query to a JOIN in the background.

The minimal RAM will definitely be a factor in query time, but as you say it's running on donations. I hope you can raise more donations and get a bit more RAM for your database.

seomike's avatar

I'm on Amazon RDS xxlarge with plenty of RAM. The issue is how Eloquent does its query for whereHas(). it does a full table scan on the model that calls it. Our table was 21.5 million rows, I could go pop popcorn, eat it and come back and the query would still be running...

$channel = Channel::find(1); // 21.5 million rows in table
$videos = $channel->whereHas('videos', function($query) {
        $query->where('status', 1); // << should only be 1300 videos back. :/
})->get();

If we ran EXPLAIN on the query that hangs

EXPLAIN select * from `channel` 
where exists (select * from `video` inner join `channel_video` on `video`.`id` = `channel_video`.`video_id` where `channel`.`id` = `channel_video`.`channel_id` and `status` = 1) 

We saw where it is doing a full table lookup on Channel.

So on the Videos model we just added an Anonymous scope builder.

class Channels extends Model
{

      public function videos()
      {
    return $this->belongsToMany(Videos::class, 'channel_video', 'channel_id', 'video_id');
       }

}
class Videos extends Model 
{
    protected static function boot()
    {
        parent::boot();

        static::addGlobalScope('live', function (Builder $builder) {
            $builder->where('status', 1); // Speed up the belongsToMany!!!
        });
    }
}
$channel = Channel::find(1); // 21.5 million rows in table
$videos = $channel->videos()->get(); // fast belongsToMany :D
1 like
brunobarros's avatar

I had the same issue. The fix was to add indexes to the columns used on where clauses.

atiadjaber's avatar

i had same problem and fix it using whereRaw like this:

$query->whereRaw('replays.id in (select replay_id from players where battletag_name LIKE ?)', ['%' . $request->name . '%'])

Please or to participate in this conversation.