Be part of JetBrains PHPverse 2026 on June 9 – a free online event bringing PHP devs worldwide together.

anibabbar's avatar

Laravel Jobs with 2.5 Million data taking 3 Days to complete. Need help

Hi everyone,

I have 90k retailers in retailers table.

I have 2.5 million retailer orders in retailer_order table.

While preparing some complex reports we were getting timeout errors. those reports involved another table with again over 2 million records.

Then, I thought of preparing a table to store values matching my report.

Then, I created below job.

Now my job is working but its slow and creating 20 records per minute thus taking close to 3 days to complete the job.

Can i improve the timing or follow any better approach. I am sure there would be better approaches to handle this situation.

 public function handle()
    {
        // Determine the year range based on the month
       
        if ($this->month <= 3) {
            $year = ($this->year - 1) . '-' . $this->year;
        } else {
            $year = $this->year . '-' . ($this->year + 1);
        }

        // Calculate the sum of order values for the retailer for the specified month and year
        $totalOrderValue = reatailerorder::where('retailer_id', $this->retailerId)
            ->whereYear('created_at', $this->year)
            ->whereMonth('created_at', $this->month)
            ->sum('orderValue');

        // Convert month number to month name
        $monthName = date('F', mktime(0, 0, 0, $this->month, 10));
     
        $monthNameinCol = strtolower($monthName);
       
        pobSale::updateOrCreate(
            [
                'retailer_id' => $this->retailerId,
                'year' => $year,
            ],
            [$monthNameinCol => $totalOrderValue]
        );
    }
0 likes
9 replies
jlrdw's avatar

Have you done a toSql() to see the actual query.

Try a regular query using straight PDO outside of laravel and compare the difference.

Is that all the code or part of a larger foreach loop. I have reports take 2 to 4 minutes, but not that long on something like a SUM.

Tray2's avatar

A wild guess here, you don't have any indexes on any of those columns. I would probably use virtual columns for month and year, that way I wouldn't need to convert created_at to month and year for every record, and of course index those virtual columns.

However, the reports and the millions of records will most likely just work fine with the right indexes, so there is really no need to come up with a temporary intermittent storage of the data.

anibabbar's avatar

@Tray2 Sharing Indexes below

+-----------------+------------+-----------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table           | Non_unique | Key_name        | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+-----------------+------------+-----------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| reatailerorders |          0 | PRIMARY         |            1 | id          | A         |     2321485 |     NULL | NULL   |      | BTREE      |         |               |
| reatailerorders |          1 | idx_created_at  |            1 | created_at  | A         |     2321485 |     NULL | NULL   | YES  | BTREE      |         |               |
| reatailerorders |          1 | idx_retailer_id |            1 | retailer_id | A         |      193457 |     NULL | NULL   |      | BTREE      |         |               |
+-----------------+------------+-----------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+

I tried this query, its taking almost 3 minutes

SELECT r.id,
       r.name,
       SUM(ro.orderValue) AS OrderValue,
       YEAR(ro.created_at) AS `Year`,
       SUM(CASE WHEN MONTH(ro.created_at) = 1 THEN ro.orderValue ELSE 0 END) AS `Jan`,
       SUM(CASE WHEN MONTH(ro.created_at) = 2 THEN ro.orderValue ELSE 0 END) AS `Feb`,
       SUM(CASE WHEN MONTH(ro.created_at) = 3 THEN ro.orderValue ELSE 0 END) AS `Mar`,
       SUM(CASE WHEN MONTH(ro.created_at) = 4 THEN ro.orderValue ELSE 0 END) AS `Apr`,
       SUM(CASE WHEN MONTH(ro.created_at) = 5 THEN ro.orderValue ELSE 0 END) AS `May`,
       SUM(CASE WHEN MONTH(ro.created_at) = 6 THEN ro.orderValue ELSE 0 END) AS `Jun`,
       SUM(CASE WHEN MONTH(ro.created_at) = 7 THEN ro.orderValue ELSE 0 END) AS `Jul`,
       SUM(CASE WHEN MONTH(ro.created_at) = 8 THEN ro.orderValue ELSE 0 END) AS `Aug`,
       SUM(CASE WHEN MONTH(ro.created_at) = 9 THEN ro.orderValue ELSE 0 END) AS `Sep`,
       SUM(CASE WHEN MONTH(ro.created_at) = 10 THEN ro.orderValue ELSE 0 END) AS `Oct`,
       SUM(CASE WHEN MONTH(ro.created_at) = 11 THEN ro.orderValue ELSE 0 END) AS `Nov`,
       SUM(CASE WHEN MONTH(ro.created_at) = 12 THEN ro.orderValue ELSE 0 END) AS `Dec`
FROM reatailerorders ro
JOIN retailers r ON ro.retailer_id = r.id
WHERE ro.created_at BETWEEN '2024-01-01' AND '2024-12-31'
GROUP BY r.id, r.name, `Year`
ORDER BY r.id, `Year`;
rodrigo.pedra's avatar

@anibabbar 3 minutes to aggregate 90K retailer IDs, over 2.5 million records, is not actually bad.

See the composite index tip I sent in my other response. That would help bring it down.

Tray2's avatar

@anibabbar That is a very poorly written query, you should make it a row per year and month instead of a column for each month. Something like

SELECT id, name, TO_CHAR(created_at, 'YYYY-MM') month, SUM(order_value) 
FROM orders 
GROUP by id, name, TO_CHAR(created_at, 'YYYY-MM');
rodrigo.pedra's avatar

Questions:

  1. How are you dispatching your jobs?
  2. How many jobs are there?

Also, when using ->whereMont() and ->whereYear() you end up not using the created_at index at all.

Try using a ->whereBetween() for the created_at column, to use that index:

$reference = CarbonImmutable::create($this->year, $this->month, 1);

// Calculate the sum of order values for the retailer for the specified month and year
$totalOrderValue = retailerorder::query()
    ->where('retailer_id', $this->retailerId)
    ->whereBetween('created_at', [$reference->startOfMonth(), $reference->endOfMonth()])
    ->sum('order_value');

Furthermore, I'd add a composite index to the retailer_orders table:

Schema::table('retailer_orders', function (Blueprint $table) {
    $table->index(['retailer_id', 'created_at'], 'orders_report_index');
});

So, both columns are used at once with the where conditions on your report, as MySQL only uses one index at a time.

rodrigo.pedra's avatar

With 90k retailers, 12 months, and considering your job's code, you might be dispatching around 1 million jobs.

One job for each report cell... That is a lot!

The raw SQL using GROUP BY, which currently takes 3 minutes, seems to be a better use of your DB.

Consider adding the composite index I suggested above to see if the SQL query runs a bit faster.

rodrigo.pedra's avatar

Also, for the aggregate query, you don't need to join to get the retailer's name, as in your job you are only saving the retailer's ID.

Removing that join might make things a bit faster, as the DB has less data to fetch.

You can fetch the retailer's name later when presenting the report.

rodrigo.pedra's avatar

On my machine, it processes around 6,700 records per minute.

Considering the 1,080,000 jobs, that would take about 2 hours and 40 minutes.

On the other hand, the aggregate query takes 18 seconds on average. (Without the join and with the composite index)

It seems you are better off using a INSERT..SELECT than dispatching a job for each report cell.

Please or to participate in this conversation.