
nolannordlund

Slow database seeder in Vapor

I have a fairly simple seeder which reads data from a CSV and makes inserts using firstOrCreate(), but I'm having issues with it when trying to seed my database in Vapor. Locally, everything works great; it takes about 15 seconds to seed the data from the CSV. Running the same seeder in Vapor, however, causes a timeout. The last time I tried it, it took about 3 minutes to insert 325 records and then just stopped. Any ideas?

LaryAI

When dealing with slow database seeders, especially in serverless environments like AWS Lambda (which Vapor is built on top of), you need to consider the constraints of the environment and optimize your database interactions. Here's a solution that should help you speed up your seeder in Vapor:

  1. Batch Inserts: Instead of using firstOrCreate() for each record, which results in a lot of individual database queries, try to batch the inserts. You can collect all the records and insert them in one go, or in smaller batches if the dataset is very large.

  2. Disable Query Logging: If you're using Laravel, by default, it keeps a log of all queries that have been run for the current request. This can consume memory and slow down your seeding process. You can disable this in your seeder by adding DB::disableQueryLog().

  3. Increase Lambda Timeout: AWS Lambda functions have a default timeout of 3 seconds, which can be increased up to 15 minutes. Make sure your Lambda function timeout is set high enough to allow your seeder to complete.

  4. Optimize CSV Reading: Make sure the way you're reading the CSV is efficient. If you're reading the entire file into memory and then looping through it, consider streaming the file line by line instead.
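To expand on point 3: when you seed through Vapor, the command runs on the CLI Lambda, which has its own timeout setting separate from the HTTP function. A sketch of the relevant vapor.yml keys is below — timeout and cli-timeout are the option names I'd expect from the Vapor documentation, but verify them against the docs for your Vapor version, and note the id/name values are placeholders:

```yaml
id: 12345
name: my-app
environments:
  production:
    # HTTP Lambda timeout, in seconds
    timeout: 60
    # CLI Lambda timeout, used for artisan commands such as db:seed
    # (Lambda's hard ceiling is 900 seconds / 15 minutes)
    cli-timeout: 300
```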

Here's an example of how you might batch insert records in Laravel:

use Illuminate\Support\Facades\DB;
use App\Models\YourModel;

// Disable query log for performance
DB::disableQueryLog();

// Read the CSV file
$csvFile = fopen('path_to_your_csv.csv', 'r');

// Skip the header row if necessary
fgetcsv($csvFile);

// Prepare an array to hold the records
$records = [];

while (($row = fgetcsv($csvFile)) !== false) {
    // Create an associative array from the CSV row (adjust the keys to match your CSV)
    $record = [
        'column1' => $row[0],
        'column2' => $row[1],
        // ...
    ];

    // Add the record to the batch
    $records[] = $record;

    // Insert in batches of 100, for example
    if (count($records) === 100) {
        YourModel::insert($records);
        $records = []; // Reset the batch
    }
}

// Insert any remaining records
if (!empty($records)) {
    YourModel::insert($records);
}

fclose($csvFile);

Remember to replace 'path_to_your_csv.csv', YourModel, and the associative array keys with the actual values relevant to your application.

By batching the inserts and optimizing the CSV reading process, you should see a significant improvement in the performance of your seeder in Vapor. If you're still experiencing timeouts, consider increasing the Lambda function's timeout setting.
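One caveat: a plain insert() skips the duplicate check that firstOrCreate() was giving you, so re-running the seeder could create duplicate rows. If the target column has a unique index, Laravel's upsert() (available since Laravel 8) keeps the batch approach while handling rows that already exist. You can also de-duplicate within the batch itself before writing. A minimal sketch — the CSV lines, YourModel, and the county/code column names are placeholders for illustration:

```php
<?php
// Sketch: build a batch of records from CSV lines and de-duplicate the
// batch in PHP before a single bulk write. Sample lines stand in for
// the real CSV file; note the intentionally repeated zip code.
$lines = [
    'Texas,Travis,"78701,78702,78701"',
    'Texas,Hays,"78610"',
];

$records = [];
foreach ($lines as $line) {
    $row = str_getcsv($line);
    // Column 2 holds a comma-separated list of postal codes
    foreach (array_filter(explode(',', $row[2])) as $code) {
        $records[] = ['county' => $row[1], 'code' => $code];
    }
}

// Drop duplicates within the batch itself. SORT_REGULAR makes
// array_unique() compare the sub-arrays by value.
$records = array_values(array_unique($records, SORT_REGULAR));

// In the seeder, the batch would then be written in one query, with
// the database's unique index on `code` catching rows from earlier runs:
// YourModel::upsert($records, ['code']);
```

Here upsert() needs a unique index on code at the database level to detect existing rows; the in-PHP array_unique() pass only guards against duplicates inside the current batch.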

nolannordlund

Here is the code for my seeder:

    public function run(): void
    {
        $csvFile = file(database_path('datasets/lz-county-zips.csv'));
        $countyData = collect();
        foreach ($csvFile as $line) {
            $countyData->push(str_getcsv($line));
        }

        $countyData->each(function ($line) {
            $state = State::where('name', $line[0])->first();

            $county = $state->counties()->create([
                'name' => $line[1],
            ]);

            $codes = collect(explode(',', $line[2]))->filter();
            foreach ($codes as $code) {
                $county->postalCodes()->firstOrCreate(['code' => $code]);
            }
        });
    }
