Be part of JetBrains PHPverse 2026 on June 9 – a free online event bringing PHP devs worldwide together.

Nahbyr's avatar

Factory seeding large amount of records

I'm currently testing our database structure and select queries with more realistic record amounts. I'm very new to Laravel, so I might not do things as optimal as possible.

Situation We have 1 table (candidates) that will contain 15K records easily. Each of these records will have 2 pivot tables connected to other (smaller) tables (connectors and skills). In total about 795K records will be created to make the correct links and data realistic.

Problem I've run into a problem when trying to create the 15K records with an ->each(function()) on each creation. I run into memory problems, even though I'm saving the pivot table records for each user before moving on to the next ->each().

What's the flow for seeding? Does the Laravel Seeder save each user first and then loop through them all and execute what's in the anon-function? Or does the Laravel Seeder save a user and then execute the anon-function before moving on to the next user?

I've added in a progressbar for testing purposes (and up it 1 step whenever I've created a test record for one of the pivot tables) and it's not moving once I start the seeder. For other table seeders it finishes too fast to follow so I'm confused why it's not counting as soon as I start the command.

Code

public function run()
    {
      $output = new Symfony\Component\Console\Output\ConsoleOutput();
      $output->writeln("----------------> Candidates <----------------");
      $this->command->getOutput()->progressStart(795000);
      
      factory(App\Candidate::class, 15000)->create()->each(function ($candidate) {
        $faker = Faker::create();
        
        //Init insert bulk
        $connectorBulk = array();
        $skillBulk = array();
        
        for ($i = 0; $i < 3; $i++) {          
          $array = [
            'candidate_id' => $candidate->id,
            'connector_id' => $connectorUnique, //I get this outside of the anon-function. This has been tested and works.
            'created_at' => Carbon\Carbon::now(),
            'updated_at' => Carbon\Carbon::now(),
          ];
          $connectorBulk[] = $array;
          $this->command->getOutput()->progressAdvance();
        }
          DB::table('candidate_connector')->insert($connectorBulk);
        
        for ($i = 0; $i < 50; $i++) {
          //Uniques
          $skillUnique = $this->unique(App\Skill::inRandomOrder(), $skillArray, 'Skill');
          
          $array = [
            'candidate_id' => $candidate->id,
            'skill_id' => $skillUnique, //I get this outside of the anon-function. This has been tested and works.
            'type_value_id' => App\TypeValue::all()->random()->id,
            'related_id' => $faker->numberBetween(1, 15),
            'created_at' => Carbon\Carbon::now(),
            'updated_at' => Carbon\Carbon::now(),
          ];
          $skillBulk[] = $array;
          $this->command->getOutput()->progressAdvance();
        }
          DB::table('candidate_skill')->insert($skillBulk);
      });
      $this->command->getOutput()->progressFinish();
    }

If I run a smaller number (tested up to 12.5K) of Candidates to create it takes a while for the progressbar to start counting up, but it does eventually, which leads me to think that it creates the Candidates before looping through them. Seems like a memory hog to me if you want to seed large amounts of data.

Please correct my assumptions if they're wrong. Also, if there are easier ways to populate pivot tables let me know. I have a few of them and couldn't find a decent way other than DB::table()->insert().

0 likes
1 reply
Nahbyr's avatar

Update: I've checked the database and it looks like it creates the 15K records first and then loops over them. I guess I could have figured that out earlier. Create() is the method called first and each() is executed on the response from that.

Again, this sounds like a memory hog to me if working with large seeders. Is there a way to execute the anon-function in the each on every record created before moving on to saving the next one?

Possible options I can come up with:

  • Split the no of Candidates into lower numbers and rerun the seed command x times.
  • Add the anon-function to the factory for the Candidates. (This will have to be executed after the Candidates record is created because in the anon-function I use the id for the newly created record)

Are there other options?

Please or to participate in this conversation.