Filter them in the database instead, it is much faster, and that is what the database is for.
Laravel slow collection filtering
I've discovered that filtering large Laravel collections is inherently slow. You would think that because the operation occurs in memory that it would be faster than querying the database, but that is not the case. I have around 60,000 items in a collection, loaded from the database before a foreach loop. Each iteration of the foreach loop needs to filter by a "product id". I've found discussions on how you can use keyBy() and get() instead of where() to speed up performance
Reference:
https://marcus-obst.de/blog/slow-where-query-in-laravel-collections
https://laracasts.com/discuss/channels/laravel/working-on-collections-is-so-slow
However, by 60,000 items has duplicate keys for the filter condition I want. According to the Laravel documentation, "If multiple items have the same key, only the last one will appear in the new collection".
So I'm lost as to how to efficiently return a smaller collection of filtered results from a bigger collection in an efficient way. Right now it takes about 2 seconds (And there are 20,000 different keys to iterate through, so you can how this becomes a very slow operation)
Pseudo code:
Standard way
$productIds = $this->repository->getProductIds(); // 20,000 results
$productInventory = $this->repository->getInventoryForProducts($productIds->toArray()); // 60,000 results
$productIds->each(function (int $productId) use ($productInventory) {
$productInventoryForProduct = $productInventory->where('product_id', $productId); // ~2 seconds
}
Using keyId()
$productIds = $this->repository->getProductIds(); // 20,000 results
$productInventory = $this->repository->getInventoryForProducts($productIds->toArray())->keyBy('product_id'); // 60,000 results
$products->each(function (int $productId) use ($productInventory) {
$productInventoryForProduct = $productInventory->get($productId); // fast... but only returns one record
// Processing of $productInventoryForProduct
}
As @snapey said, it is not a good idea to load 20,000 records in memory to process it. But locally, it didn't take two seconds iterating over all the collection using Collection@groupBy:
<?php
use App\Models\Product;
use App\Models\ProductInventory;
use Illuminate\Support\Facades\Artisan;
Artisan::command('test', function () {
$this->info(now()->toDateTimeString());
$productIds = Product::query()->pluck('id');
$productInventory = ProductInventory::query()
->whereIn('product_id', $productIds)
->get()
->groupBy('product_id');
foreach ($productIds as $productId) {
$productInventoryForProduct = $productInventory[$productId] ?? collect([]);
$this->info(
vsprintf('[%s] %05d: [%s]', [
now()->toDateTimeString(),
$productId,
$productInventoryForProduct->pluck('id')->implode(','),
]),
);
}
$this->info(now()->toDateTimeString());
});
Output:
$ php artisan test
2022-12-03 18:52:46
[2022-12-03 18:52:46] 00001: [1,2,3]
[2022-12-03 18:52:46] 00002: [4,5,6]
[2022-12-03 18:52:46] 00003: [7,8,9]
[2022-12-03 18:52:46] 00004: [10,11,12]
[2022-12-03 18:52:46] 00005: [13,14,15]
[2022-12-03 18:52:46] 00006: [16,17,18]
[2022-12-03 18:52:46] 00007: [19,20,21]
[2022-12-03 18:52:46] 00008: [22,23,24]
[2022-12-03 18:52:46] 00009: [25,26,27]
[2022-12-03 18:52:46] 00010: [28,29,30]
... truncated for brevity
[2022-12-03 18:52:47] 19991: [59971,59972,59973]
[2022-12-03 18:52:47] 19992: [59974,59975,59976]
[2022-12-03 18:52:47] 19993: [59977,59978,59979]
[2022-12-03 18:52:47] 19994: [59980,59981,59982]
[2022-12-03 18:52:47] 19995: [59983,59984,59985]
[2022-12-03 18:52:47] 19996: [59986,59987,59988]
[2022-12-03 18:52:47] 19997: [59989,59990,59991]
[2022-12-03 18:52:47] 19998: [59992,59993,59994]
[2022-12-03 18:52:47] 19999: [59995,59996,59997]
[2022-12-03 18:52:47] 20000: [59998,59999,60000]
2022-12-03 18:52:47
For reference, I created a dummy project with a products and a product_inventories tables and seeded them with:
<?php
namespace Database\Seeders;
use App\Models\Product;
use App\Models\ProductInventory;
use Illuminate\Database\Seeder;
class DatabaseSeeder extends Seeder
{
public function run()
{
Product::factory(20_000)
->has(ProductInventory::factory()->count(3), 'inventory')
->create();
}
}
Please or to participate in this conversation.