Be part of JetBrains PHPverse 2026 on June 9 – a free online event bringing PHP devs worldwide together.

kalvinmizzi's avatar

MySQL Window functions for cumulative totals, improving performance by relying on MySQL over PHP, and using sum for groups within window functions

I wanted to share a before and after post of the problem I originally presented in: https://laracasts.com/discuss/channels/eloquent/laravel-slow-collection-filtering

In this post I will share how I did the following to take my script from running out of memory and taking 3.5 minutes to execute (after initial optimization) to taking 50 seconds and avoiding any memory issues at all.

I was able to transition more responsibility to MySQL instead of PHP, which I thought was impossible for my use case.

It took using some advanced MySQL including:

  • CASE functions
  • Window functions while also summing within groups
  • Correlated subqueries
  • Use of temporary tables

I will let the code speak for itself but I'll give a brief synopsis of the business logic.

We want to manage cached inventory snapshot data to be able to answer quickly what was my inventory quantities and valuation at any time in the past.

We invalidate the snapshots anytime underlying inventory movements get created/updated/deleted for a given product/date.

The TakeInventorySnapshot's job is to validate the cached inventory snapshots and it does so in the most efficient way by taking the latest valid snapshot as a starting point, then only querying inventory movements after the date of that last valid snapshot. A cumulative inventory level is calculated. All of this is done in one fell swoop in MySQL without having to loop through and build arrays in PHP.

Finally once we have the newly validated data, we bulk upsert into the database (saveBulk).

Here is the code:

<?php

namespace App\Actions;

use App\Repositories\InventoryMovementRepository;
use App\Repositories\InventorySnapshotRepository;

class TakeInventorySnapshot
{
    /**
     * @var array
     */
    protected array $productIds;

    protected InventoryMovementRepository $inventoryMovementRepository;
    protected InventorySnapshotRepository $inventorySnapshotRepository;

    /**
     *
     * @param array $productIds
     */
    public function __construct(array $productIds = [])
    {
        $this->inventoryMovementRepository = app(InventoryMovementRepository::class);
        $this->inventorySnapshotRepository = app(InventorySnapshotRepository::class);
        $this->productIds = $productIds;
    }

    public function handle(): void
    {
        /*
         * Seeding inventory snapshots for each missing product date with an inventory movement
         * We only seed dates that have inventory movements for a product.  So if a product has inventory movements
         * 3 years apart, we don't seed 365 * 3 worth of data, just 2 rows
         */
        $this->inventorySnapshotRepository->seed($this->productIds);

        // Get product ids that have at least 1 invalid snapshot, chunked 10,000 product ids at a time
        foreach ($this->inventorySnapshotRepository->getProductIdsWithInvalidCache($this->productIds)->chunk(10000) as $chunk)
        {
            /*
             * Delete any valid snapshots that exist after the oldest invalid snapshot for a product.  This case should
             * not exist, but we do this as a safety measure.
             */
            $this->inventorySnapshotRepository->sanitizeValidSnapshots($chunk);

            /*
             * Builds validated inventory snapshot data from a combination of previous validated snapshots and new
             * data built from inventory movements
             */
            $data = $this->inventorySnapshotRepository->getNewlyValidatedData($chunk);

            if (empty($data))
            {
                return;
            }

            $this->inventorySnapshotRepository->saveBulk($data);
            unset($data);
        }
    }
}

Relevant methods from InventorySnapshotRepository:

public function seed(array $productIds = []): void
    {
        $this->saveBulk($this->getMissingProductDays($productIds));
    }
public function sanitizeValidSnapshots(array|Arrayable $productIds = []) : void
    {
        InventorySnapshot::query()
            ->whereIn('product_id', $productIds)
            ->whereIn('id', function($query) {
                $query->select('id')
                    ->from('inventory_snapshots as is1')
                    ->whereIn('product_id', function($query) {
                        $query->select('product_id')
                            ->from('inventory_snapshots')
                            ->where('is_cache_valid', 0)
                            ->groupBy('product_id');
                    })
                    ->where('date', '>', function($query) {
                        $query->select(DB::raw('MIN(date)'))
                            ->from('inventory_snapshots')
                            ->where('is_cache_valid', 0)
                            ->whereColumn('product_id', 'is1.product_id');
                    })
                    ->where('is_cache_valid', 1);
            })
            ->update(['is_cache_valid' => 0]);
    }

/**
     *
     * Builds validated inventory snapshot data from a combination of previous validated snapshots and new
     * data built from inventory movements
     *
     * @param array|Arrayable $productIds
     * @return array
     */
    public function getNewlyValidatedData(array|Arrayable $productIds): array
    {
        $inventoryMovements = InventoryMovement::query()
            ->from('inventory_movements as im')
            ->select([
                'im.product_id',
            ])
            /*
             * getSqlUtcStartOfLocalDate allows us to store a date in UTC timezone in the database that represents the
             * start time of a User's day
             */
            ->selectRaw(Helpers::getSqlUtcStartOfLocalDate('im.inventory_movement_date') . ' as date')
            /*
             * We use SUM(SUM(... since the outer SUM is a MySQL window function, and the inner SUM is to SUM within
             * the group.  We also use CASE WHEN THEN ELSE syntax since there are multiple inventory movement
             * statuses that correspond to a single inventory snapshot record
             */
            ->selectRaw('SUM(SUM(CASE WHEN inventory_status = "' . InventoryMovement::INVENTORY_STATUS_ACTIVE . '" THEN quantity ELSE 0 END)) OVER (PARTITION BY product_id ORDER BY ' . Helpers::getSqlUtcStartOfLocalDate('im.inventory_movement_date') . ') AS inventory_available')
            ->selectRaw('SUM(SUM(CASE WHEN inventory_status = "' . InventoryMovement::INVENTORY_STATUS_RESERVED . '" THEN quantity ELSE 0 END)) OVER (PARTITION BY product_id ORDER BY ' . Helpers::getSqlUtcStartOfLocalDate('im.inventory_movement_date') . ') AS inventory_reserved')
            ->selectRaw('SUM(SUM(CASE WHEN inventory_status = "' . InventoryMovement::INVENTORY_STATUS_IN_TRANSIT . '" THEN quantity ELSE 0 END)) OVER (PARTITION BY product_id ORDER BY ' . Helpers::getSqlUtcStartOfLocalDate('im.inventory_movement_date') . ') AS inventory_in_transit')
            // Only FIFO layers will result in inventory value
            ->selectRaw('SUM(IFNULL(SUM(quantity * (fifo_layers.total_cost / fifo_layers.original_quantity)), 0)) OVER (PARTITION BY product_id ORDER BY ' . Helpers::getSqlUtcStartOfLocalDate('im.inventory_movement_date') . ') AS inventory_stock_value')
            ->selectRaw('1 as is_cache_valid')
            ->leftJoin('fifo_layers', function(JoinClause $join) {
                $join->on('im.layer_id', 'fifo_layers.id')
                    ->where('layer_type', FifoLayer::class);
            })
            ->where(function (Builder $query) {
                /*
                 * For improved performance, we only get inventory movements for dates after the last validated snapshot
                 * for a product.  We later add the validated snapshot values to each cumulative total
                 */
                $query->orWhere(DB::raw(Helpers::getSqlUtcStartOfLocalDate('im.inventory_movement_date')), '>', function ($query) {
                    $query
                        ->from('inventory_snapshots as is1')
                        ->select('date')
                        ->where('is_cache_valid', 1)
                        ->where('date', function($query) {
                            $query->selectRaw('MAX(date)')
                                ->from('inventory_snapshots')
                                ->where('is_cache_valid', 1)
                                ->whereColumn('product_id', 'is1.product_id')
                                ->groupBy('product_id');
                        })
                        ->whereColumn('is1.product_id', 'im.product_id');
                });
                // If no validated snapshot exists, we get all inventory movements for a product
                $query->orWhereNotExists(function ($query) {
                    $query
                        ->from('inventory_snapshots as is1')
                        ->where('is_cache_valid', 1)
                        ->whereColumn('is1.product_id', 'im.product_id');
                });
            })
            ->groupBy([
                DB::raw(Helpers::getSqlUtcStartOfLocalDate('im.inventory_movement_date')),
                'im.product_id',
            ])
            ->orderBy('inventory_movement_date');

        if (!empty($productIds))
        {
            $inventoryMovements->whereIn('im.product_id', $productIds);
        }

        /*
        |--------------------------------------------------------------------------
        | These temporary tables add the latest validated snapshot data to each cumulative
        | total
        |--------------------------------------------------------------------------
        */

        $createTemporaryTableQuery = '
        CREATE TEMPORARY TABLE temp_snapshot_data AS (' .
            str_replace("\", "\\", $inventoryMovements->toSqlWithBindings()) . '
        )';

        DB::statement($createTemporaryTableQuery);

        $addIndexesQuery = <<<SQL
        ALTER TABLE temp_snapshot_data ADD PRIMARY KEY (`product_id`, `date`)
        SQL;

        DB::statement($addIndexesQuery);

        $latestValidatedSnapshotsQuery = InventorySnapshot::query()
            ->from('inventory_snapshots as is1')
            ->select([
                'product_id',
                'is_cache_valid',
                'date',
                'inventory_available',
                'inventory_reserved',
                'inventory_in_transit',
                'inventory_stock_value'
            ])
            ->where('is_cache_valid', 1)
            ->where('date', function($query) {
                $query->selectRaw('MAX(date)')
                    ->from('inventory_snapshots')
                    ->where('is_cache_valid', 1)
                    ->whereColumn('product_id', 'is1.product_id')
                    ->groupBy('product_id');
            });

        if (!empty($productIds))
        {
            $latestValidatedSnapshotsQuery->whereIn('product_id', $productIds);
        }

        $createLatestValidatedSnapshotsTemporaryTable = '
        CREATE TEMPORARY TABLE temp_latest_validated_snapshots AS (' .
            $latestValidatedSnapshotsQuery->toSqlWithBindings() . '
        )
        ';

        DB::statement($createLatestValidatedSnapshotsTemporaryTable);

        $addIndexesQuery = <<<SQL
        ALTER TABLE temp_latest_validated_snapshots ADD PRIMARY KEY (`product_id`, `date`)
        SQL;

        DB::statement($addIndexesQuery);

        $updateSnapshotsQuery = <<<SQL
        UPDATE temp_snapshot_data AS tsd
        LEFT JOIN temp_latest_validated_snapshots AS tlvs
            ON tlvs.product_id = tsd.product_id
        SET tsd.inventory_available = tsd.inventory_available + IFNULL(tlvs.inventory_available, 0),
            tsd.inventory_reserved = tsd.inventory_reserved + IFNULL(tlvs.inventory_reserved, 0),
            tsd.inventory_in_transit = tsd.inventory_in_transit + IFNULL(tlvs.inventory_in_transit, 0),
            tsd.inventory_stock_value = tsd.inventory_stock_value + IFNULL(tlvs.inventory_stock_value, 0)
        SQL;

        DB::statement($updateSnapshotsQuery);

        $pdo = DB::getPdo();
        $statement = $pdo->prepare('SELECT * FROM temp_snapshot_data');
        $statement->setFetchMode(PDO::FETCH_ASSOC);
        $statement->execute();

        $data = $statement->fetchAll();

        // We must drop temporary tables since they persist in between chunks of products
        DB::statement('DROP TABLE temp_snapshot_data');
        DB::statement('DROP TABLE temp_latest_validated_snapshots');

        return $data;
    }

Thank you who those who helped in the original post. Hopefully this is helpful to anyone who has a similar challenge or especially is struggling with certain nuances I solved such as summing within a group when using window functions.

0 likes
0 replies

Please or to participate in this conversation.