I just ran across what looks like a subtle bug in how Eloquent transforms a query into its SQL equivalent. My application uses Laravel 8 (we haven't upgraded for reasons outside my control), so perhaps this is fixed in a newer version, but I'm stuck with what I have.
This app tracks inventory (products) and each product has a list of "flags" associated with it in a one-to-many relationship. We get incoming data feeds from multiple sources every day that require updating our own product records. A daily cron job is supposed to check products for any such updates in certain fields, by making a list of products that have any of the relevant flags and then updating them using the new feed data. This is the code for the query that retrieves the products for checking:
$products = Product::whereHas('flags', function ($query) use ($messages) {
foreach ($messages as $message) {
$query->orWhere(function ($query) use ($message) {
$query->where('message', 'LIKE', $message);
$query->where('is_valid', 1);
});
}
})->where('parent_id', '!=', 0)->get();
This query looks like it should retrieve "all products that have a non-empty parent id and there exists a flag that matches the product id AND the flag is valid and its message matches any of the listed ones." The whereHas() condition limits it to flags that match any of the products found by the outer query, which is "all products that have a parent id". This will return a strictly limited subset of products. But some server load issues led me to take a close look at it, and this is the query that MariaDB is actually executing:
select * from `products` where exists
(select * from `flags` where `products`.`id` = `flags`.`product_id` or
((`message` LIKE 'FLAG: [SELF] The attribute \'%\' has changed.' and `is_valid` = 1)
or (`message` LIKE 'FLAG: [SELF] The attribute \'%\' is new.' and `is_valid` = 1)
or (`message` LIKE 'FLAG: The attribute \'%\' has changed.' and `is_valid` = 1) or
(`message` LIKE 'FLAG: This product has been reassigned from master%' and `is_valid` = 1) or
(`message` LIKE 'FLAG: The attribute \'image_large\' is new%' and `is_valid` = 1) or
(`message` LIKE 'FLAG: The attribute \'image_small\' is new%' and `is_valid` = 1)))
and `parent_id` != 0 and `products`.`deleted_at` is null
In English, I read this as "all products that have a non-empty parent id and there exists a flag such that its product id matches the product id OR the flag is valid and its message matches any of the listed ones." That subquery will be true for every product, which means it's checking the whole flags table for every matching product, and the only effective limit on the result set is the parent_id condition. Turns out that's a real cpu-killer when you've got more than half a million such records in the Products table, and several million valid flags in the Flags table.
Why are the two conditions on the 'flags' subquery - (product id matches) and (the "matches message" criteria) - joined by an OR instead of an AND, as I would expect from the Eloquent query code? Can I quickly rewrite this query to do what I want it to, or do I have to resort to something like getting the list of (parent id is not empty) products and then checking those for any of the target flags with a foreach loop?