motinska94's avatar

What is the best way to handle LARGE amount of data?

I'm holding around 200k records of sales information. (All dummy data, deployed project can have unlimited amount -can be millions after a few years- of records.)

I need to access bar charts of these records, grouped by this week - last week (7 days x 2) this month - last month (28 to 31 days x 2) this year - last year (12 months x 2) sale price_totals.

I'm trying to do this and it's taking way too long (like 15 to 20 seconds long)

How would you approach a task like this? I'm also showing monthly and weekly sale price_totals as float numbers in dashboard homepage and that also takes around 10 seconds to load with 200k data.

Sales table : table

If it's not too much to ask, if you could explain it in simpler terms or with some keywords that I could search tutorials with, that would be super helpful. I'm still a beginner in laravel.

Clockwork load time

clockwork

Clockwork queries

queries

Example table I need (this times 3, for weekly and yearly comparisons)

sample table

My Model


    public function index()
    {
        $this_week = $this->getWeekly();
        $last_week = $this->getWeekly( 1);

        $this_month = $this->getMonthly();
        $last_month = $this->getMonthly(1);

        $this_year = $this->getYearly();
        $last_year = $this->getYearly(1);

        return view('employee.stats.index', compact('this_week', 'last_week', 'this_month', 'last_month', 'this_year', 'last_year'));

    }

    public function getMonthly($minus = 0)
    {
        $data = Sales::query()
            ->whereYear('created_at', '=', date('Y'))
            ->where('status', '>', '1')
            ->orderBy('created_at')
            ->get(["price_total", 'created_at'])
            ->groupBy(fn($item) => intval($item->created_at->format('m')));


        $get_this = $data[date('m') - $minus]->groupBy(fn($item) => $item->created_at->format('d'));

        $this_arr = [];
        foreach ($get_this as $day){
            $this_arr[intval($day->first()->created_at->format('d'))] = $day->sum('price_total');
        }

        return json_encode($this_arr);

    }
    public function getWeekly($minus = 0)
    {
        $data = Sales::query()
            ->whereYear('created_at', '=', date('Y'))
            ->where('status', '>', '1')
            ->orderBy('created_at')
            ->get(["price_total", 'created_at'])
            ->groupBy(fn($item) => intval($item->created_at->format('W')));


        $get_this = $data[date('W') - $minus]->groupBy(fn($item) => $item->created_at->format('D'));


        $this_arr = [];
        foreach ($get_this as $day){
            $this_arr[$day->first()->created_at->format('D')] = $day->sum('price_total');
        }

        return json_encode($this_arr);

    }


    public function getYearly($minus = 0)
    {
        $data = Sales::query()
            ->whereYear('created_at', '=', date('Y') - $minus)
            ->where('status', '>', '1')
            ->orderBy('created_at')
            ->get(["price_total", 'created_at'])
            ->groupBy(fn($item) => $item->created_at->format('m'));


        $this_arr = [];
        foreach ($data as $day){
            $this_arr[intval($day->first()->created_at->format('m'))] = $day->sum('price_total');
        }

        return json_encode($this_arr);

    }

Suggestions I saw :

Caching : I can't cache this because new sale records are being inserted into sales table constantly and I need to show an up-to-date representation of sales.

Chunking : I don't think that works with my problem, because as I stated above I'm also showing monthly and weekly sale price_totals as float numbers in dashboard homepage and that also takes around 10 seconds to load with 200k data.

Thank you all so much for your comments and suggestions in advance. I've been dealing with this slowness issue for the last 4 days, I got 2 brand new grays in my beard and a migraine.

0 likes
7 replies
Snapey's avatar

you need to do the grouping in the database not in memory

1 like
motinska94's avatar

@Snapey Thanks! Do you have any idea how can I transform ->groupBy(fn($item) => $item->created_at->format('m')) and ->groupBy(fn($item) => intval($item->created_at->format('W')) into a SQL query?

Tray2's avatar

@motinska94 I think you should store historical data in another table, So everything older than the current month is precalculated.

Then you can use a union to get all the data

SELECT YEAR(created_at) AS y MONTH(created_at) AS m,  SUM(sales) AS total_sales
FROM sales
WHERE YEAR(created_at) = 2022
AND MONTH(created_at) = 11
GROUP BY 1, 2
UNION ALL
SELECT y, m total_sales
FROM sales_log
WHERE y = 2021
AND m = 11
1 like
motinska94's avatar

@Tray2 Thanks so much! It actually makes more sense this way than storing it json formatted 😅 Sorry for the late response.  

Just to make sure I don't do something stupid, when exactly should I call this query? Every time I'm inserting something into my sales table?  

Also again, to avoid doing something stupid, am I only gonna add year-month and sales_total columns to the sales_log table or do I need something else that you can see off the bat?

Tray2's avatar
Tray2
Best Answer
Level 74

@motinska94 That depends on the applications needs so it's hard to give a straight answer, but I suggest keeping the older data somewhere just in case, so you don't just outright delete it.

I would make it a database view and then query that view.

Maybe call it sale_stats_views , and then you can use a model for it SaleStatsView .

Check this post on how to use a view insread of a complex query https://tray2.se/posts/use-a-view-instead-of-a-complex-eloquent-query-in-your-laravel-application

1 like
motinska94's avatar

@Tray2 Thanks again! Blog post looks so clean. I'll dig into that right now.

1 like
Rooligan's avatar

The YEAR and MONTH functions in the where are terribly slow because it needs to perform that function on every row. Better to use WHERE created_at BETWEEN '2022-11-01 00:00:00' AND '2022-12-01 00:00:00'. And add an index to the created_at column. For more tips take a look at the Eloquent perfomance patterns serie. https://laracasts.com/series/eloquent-performance-patterns

Please or to participate in this conversation.