paulsantos's avatar

Pagination with a large table

I have a MySQL table with around 50,000 rows and using query builder's magic method paginate(). It works but kind of slow. So, I began searching on how can I optimize the code to make it load faster.

I found out that behind the scene, paginate() counts the total number of rows in my table first before it actually pulls the right data using skip() and take() functions. In my case, it takes around 2.798 seconds to perform a simple SELECT COUNT(*) query.

Right now, I wrote my own LengthAwarePagination implementation that caches the total number of rows. Surely, my first few pages load faster now but once the page (or the offset value) becomes higher it gets more sluggish. Some says that I should keep track of the last row of the current page then pass it along to the next page. That sounds convincing if you're using simple pagination with just a previous and next buttons, but how can you do this if you want to show a number of pages? Any help guys?

0 likes
11 replies
click's avatar

50.000 rows should not be a problem for mysql. I think you miss some indexes on your table.

How does your query look like? Is it sorting or filtering on any columns?

See this ticket on SO on the subject that your query is getting slower after each page: https://stackoverflow.com/questions/4481388/why-does-mysql-higher-limit-offset-slow-the-query-down

There is one comment that does what you say, keep track of the last id. And include that in your query with where id > last id.

paulsantos's avatar

My table includes a description (text), country (varchar), and salary_avg (decimal) fields. I filter the result by country and salary_avg min and max value.

click's avatar

Do you have indexes on those fields?

Please dump you full query. What do you with min() max()? It assumes that you are grouping some data?

paulsantos's avatar

Yes, I already added indexes for country and salary_avg fields. This is just a simple query.

select `id`, `position`, `company`, `source`, `industry`, `work_location`, `salary_avg` from `job_descriptions` where `country` = ? and `salary_avg` > ? and `salary_avg` < ? and `job_descriptions`.`deleted_at` is null
newbie360's avatar

i didn't use Soft Deleting, so i'm not sure, can u try add the deleted_at index

i think the better way is move all deleted records to another table

click's avatar

Soft delete does not automatically set an index as far as I can see at this moment. It does:

public function softDeletes($column = 'deleted_at', $precision = 0)
{
   return $this->timestamp($column, $precision)->nullable();
}

So it could be worth a shot to add an index on it. It feels strange that this isn't done by default.

It is hard to see from here what you could do. Just to clearfiy, we are talking about the actual query that returns the records right? Not the count() query that you cached. What do you define as slow at this moment?

You could try using the EXPLAIN method on a full query (including the limit that the pagination would add).

Something like:

EXPLAIN select `id`, `position`, `company`, `source`, `industry`, `work_location`, `salary_avg` from `job_descriptions` where `country` = ? and `salary_avg` > ? and `salary_avg` < ? and `job_descriptions`.`deleted_at` is null LIMIT 20000, 100

@newbie360 deleted records should not be moved to another table. That would create a architectural nightmare for different reasons. Deleted records should be deleted or marked as deleted (in the same table).

Snapey's avatar

Are you trying to load the whole dataset and then paginate the collection?

I can't see any offset and limit on your sql query?

paulsantos's avatar

Guys, there is something weird going in with my table. When I tried this:

SELECT COUNT(*) FROM job_descriptions;

It took me around 8 seconds to produce the result, but when I do this:

SELECT COUNT(country) FROM job_descriptions;

The response took less than a second to process, any insights? Btw, the country field is a varchar(255) nullable.

Cronix's avatar

Because when using *, the db has to manually figure out the columns, just like "select * from blah" isn't nearly as fast as being specific, like "select col1, col2, col3 from blah". It's always better to be specific.

jlrdw's avatar

Before you do anything I would seriously learn to properly paginate and then proceed.

In other words go through a tutorial where you are doing the math and everything to thoroughly understand pagination.

like it was said above looks like you are trying to load the whole data set and then paginate where as with a search and proper pagination things will go real fast.

Please or to participate in this conversation.