Using Jobs and Queues to Export CSVs

I've wracked my brain about as much as possible on this one, and am still struggling! I have 2 tables of data that need to be exported with my applicaton: one is ~26k records, the other is ~60k records. I was first trying to handling this on the frontend using typical request/responses, even trying to use a streamed response as described in this artcle: https://medium.com/@barryvdh/streaming-large-csv-files-with-laravel-chunked-queries-4158e484a5a2

However, these approaches always lead to out of memory errors. I finally decided to try using a Job and Queue worker to process the export. While this is "working," it takes forever. The larger table takes about 6 minutes to finish. What other packages or services can help reduce the amount of time it takes to perform the export? Here is my code:

class ExportServiceHistory
{
    public function fire($job, $data)
    {

        // Generate our file name
    $filename="claims_export_" . date("YmdHis") . time() . ".csv";

    // Set the file path to the temporary location
        $handler = temp_path('export') . $filename;
        
        // Open file handler for writing output
        $file = fopen($handler, 'w');

        /*
         * Add headers
         */
         $headers = [
            'type',
            'plan',
            'component_name',
            'claim_id',
            'service_date',
            'odometer',
            'service_name',
            'vin',
            'year',
            'make',
            'model',
            'claim_date',
            'status',
            'payout',
            'distributorship'
        ];
        fputcsv($file, $headers);

        ServiceHistory::chunk(200, function ($records) use ($headers, $file) {
            foreach ($records as $record) {
                $row = [];
                foreach ($headers as $column) {   
                    $value = $record->{$column};
                    $row[] = $value;
                }
                fputcsv($file, $row);
            }
        });

        fclose($file);

        \Storage::disk('gcs')->put($handler, $file);

        $job->delete();
    }

    public function failed($data)
    {
        // Called when the job is failing...
    }
}

Also, as a note, the application is using Laravel 5.5 and cannot be upgraded at this time.

Any feedback is greatly appreciated, thanks!

Snapey

6 years ago

Best Answer

Level 122

Why not just get just the fields you want and write the data straight to the csv?

ServiceHistory::select($headers)
    ->get()
    ->each(function($r) {
        fputcsv($file,$r->toArray());
    });

You should not see the slowdown you are seeing unless this model (ServiceHistory) is doing loads of work pulling in data from other tables in the background?

BGWeb

6 years ago

Level 7

Hey @snapey, thanks for the reply! With this many records, I'm almost certain I need to use chunk(). I will try the toArray() method to see if this helps.

I agree that the performance should not be suffering like it is. I have tried also using raw Db queries, which does not help either. The ServiceHistory model is very basic, and references a MySQL view which composes the necessary data.

I also tried using PHPLeage's CsvWriter and the SplTempFileObject class, which does not seem to be helping either.

As an example, I have limited the chunk to 100 records and it is still painstakingly slow:

ServiceHistory::limit(100)->chunk(10, function ($records) use ($headers, $csv) {
            foreach ($records as $record) {
                $row = [];
                foreach ($headers as $column) {   
                    $value = $record->{$column};
                    $row[] = $value;
                }
                $csv->insertOne($row);
            }
        });

I'll follow with an update for your suggestion soon!

Snapey

6 years ago

Level 122

still iterating needlessly over every column, creating redundant $value object each time ?

BGWeb

6 years ago

Level 7

@snapey Thanks much! I believe I have something working now :) I did end up calling toArray() rather than the nested loop, as you suggested. The queue and job are working well also, and it takes ~2 min to process 60k records.

/*
  * Prepare CSV
  */
        ServiceHistory::select($headers)
            ->orderBy('claim_date', 'desc')
            ->chunk(1000, function ($records) use ($file) {
                foreach ($records as $record) {
                    fputcsv($file, $record->toArray());
                }
            });

I also found that bumping the chunk size up helps, because it reduces the number of queries required. I suppose the balance is using a chunk size that is not too big, or too small.

Thanks again for your input!

Please or to participate in this conversation.