Be part of JetBrains PHPverse 2026 on June 9 – a free online event bringing PHP devs worldwide together.

nufje's avatar
Level 3

Bulk insert Many to Many

Hello,

FIrst post :-) I have an imported_products table, a categories table and a category_imported_product (with category and product_id) table.

I have a collection with 40.000 records with the following simplified structure

[category_id_1] -> multiple products

[category_id_2] -> multiple products

[category_id_3] -> multiple products etc.

I want to insert the collection into my imported products table. I have following code, but it is so slow. I think there is a much better way, but i can't find out. I use firstOrCreate, because of duplicate records in the collection.

foreach ($AllProductsFromDb as $categoryId => $importedProducts) {
            foreach ($importedProducts as $importedProduct) {
                $insertData = $importedProduct->toArray();

                DB::transaction(function () use($insertData, $categoryId) {
                    $importedProduct = ImportedProduct::firstOrCreate(
                        ['product_id' => $insertData['product_id']],
                        $insertData
                    );

                    $category = Category::find($categoryId)->first();

                    $importedProduct->categories()->save($category);
                });
            }
        }

My first solution was a bulk insert, this is fast, but then i cannot insert the many to many relation for the category. Hopefully everything is clear...

Is there a better way? Thanks in advance :-)

0 likes
13 replies
bugsysha's avatar

Where did you define the constraints regarding uniqueness of the data? PHP or database?

nufje's avatar
Level 3

I have an unique index on the database

bugsysha's avatar
bugsysha
Best Answer
Level 61

Then you can cut down number of queries in half by replacing firstOrCreate with create. Also no need to find category when you can achieve everything with $categoryId. Use some of the following to create pivot records

$importedProduct->categories()->sync([$categoryId]);

$importedProduct->categories()->syncWithoutDetaching([$categoryId]);

$importedProduct->categories()->toggle([$categoryId]);

Give feedback once you try it. Maybe there is more. What is your current runtime for this?

nufje's avatar
Level 3

Thank you for all your suggestions. When i use the create method instead of firstCreate. I get following error:

Integrity constraint violation: 1062 Duplicate entry '9200000075410071' for key 'imported_products_product_id_unique'

because of the unique index :-) Did i something wrong with the index?

bugsysha's avatar

Point was that if you have your uniqueness covered by the database and everything wrapped in a transaction then you should not care about it failing since it can not screw up data integrity.

Of course, you need to wrap it with try-catch.

nufje's avatar
Level 3

Ok clear. I did some measurements:

i have changed the code to:

foreach ($AllProductsFromDb as $categoryId => $importedProducts) {
      foreach ($importedProducts as $importedProduct) {
                $insertData = $importedProduct->toArray();
                $insertData['feed_id'] = $feed->id;
                DB::transaction(function () use($insertData, $categoryId) {
                    try {
                        $importedProduct = ImportedProduct::create($insertData);
                        $importedProduct->categories()->syncWithoutDetaching([$categoryId]);
                    } catch(Exception $e) {

                    }
                });
            }
        }

And this code has a runtime of 30 min. With FirstOrCreate the runtime is also 30 min.

bugsysha's avatar

Try following

Reduce it to 10.000 records, then time it. Also, reduce it to 1.000 records and then time it.

Maybe it is just due to the speed of the database. If the time it takes to finish is increasing linearly then there is not much to do. But if it is quadratic/exponential then it can be fixed.

nufje's avatar
Level 3

Thank you for your suggestion. Times are linearly :-( So i have to accept the long times.

Thank you for helping!

bugsysha's avatar

No problem. Sorry that you are not satisfied.

Is that on your computer or on a server? 30 minutes for 40k seems very long to me.

You can enable a query log and check how many queries have been executed and which queries. Maybe something is hidden.

nufje's avatar
Level 3

Yes for me it seems also very long. It is on my local machine (no SSD drive, but fast processor and a lot of memory), i have nginx and mysql via Lando - default config (https://lando.dev/). I was thinking of another solution to rearrange the data before loading into the database. I was thinking of following solution:

Delete the many to many relation, and add a category_ids field (with comma separated values for categories) to imported_products

then map and merge data

[category_id_1][product_A]
[category_id_1][product_B]
[category_id_1][product_C]
[category_id_2][product_A]
[category_id_2][product_C]
[category_id_2][product_D]
[category_id_3][product_B]
[category_id_3][product_E]
etc

to

[PRODUCT_A] -> with categorie_ids 1,2
[PRODUCT_B] -> with categorie_ids 1,3
[PRODUCT_C] -> with categorie_ids 1,2
[PRODUCT_D] -> with categorie_ids 2
[PRODUCT_E] -> with categorie_ids 3
etc

But i'am not very good with arrays/collections and don't know how to do this :-) Maybe you know how to do it....

When i have this data then i can do a bulk insert and that is fast :-)

In the meantime, i will check what the execution time is on my server.

bugsysha's avatar

no SSD drive

It has to be due to HDD.

Delete the many to many relation

Don't. You can have many complications later on with data manipulation or places where bugs can sneak into.

But if you decide not to listen and go down that route, here you go

$a = [
    'category_id_1' => [
            'product_A',
            'product_B',
            'product_C',
        ],
    'category_id_2' => [
        'product_A',
        'product_C',
        'product_D',
    ],
    'category_id_3' => [
        'product_B',
        'product_E',
    ],
];

foreach($a as $index => $item) {
  foreach ($item as $value) {
    $r[$value][] = (int) str_replace('category_id_', '', $index);
  }
}

var_export($r);
nufje's avatar
Level 3

It was indeed the harddrive. On my server the running time is 9 minutes :-)

Thank you for all your time and i learned a few new things too :-)

bugsysha's avatar

You are very welcome. Your initial idea was not far off. Just keep analysing the issue and you will get there.

Please or to participate in this conversation.