JoshuwayMorris's avatar

MySQL Join and OrderBy statement very slow, 77 seconds, despite indexes (Laravel/Eloquent ORM)

I'm having performance issues with MySQL, despite having indexes when using both join and order by . The query is currently taking 77 seconds to complete.

The purpose of the query is to order the table alphabetically.

Background

I have a MySQL database that contains two InnoDB tables.

  • report_urls; and
  • urls

Each table contains 500,000 records. There is a one-to-one relationship between report_urls and url. Whereas there is a one-to-many relationship between url and report_urls.

The tables have the following structure:

  • report_urls
    • id (Primary Key, char36, UUID, indexed)

    • url_id (char36, UUID, indexed)

  • urls
    • id (Primary Key, char36, UUID, indexed)

    • title (varchar 255, indexed)

I need to join these tables and order the joined results by alphabetically by the title column.

The query

I am using Laravel and the Eloquent ORM, but the SQL statement that is generated is:

select
  *
from
  `report_urls`
  inner join `urls` as `urls_title_sort` on `report_urls`.`url_id` = `urls_title_sort`.`id`
where
  `report_id` = '8f6598f0-34d0-48b0-be8b-41c1b8f61ffd'
order by
  `urls_title_sort`.`title` asc
limit
  11 offset 0

The migrations

The controller

// Retrieve report URLs
$urls = app(Pipeline::class)
    ->send(new Filter(
        ReportURL::where('report_id', $report->id)->with('url'),
        $request,
        [
            'groups' => $groups
        ]
    ))
    ->through([
        ReportUrlSearchFilter::class,
        ReportUrlImpactFilter::class,
        ReportUrlGroupFilter::class,
        ReportUrlRuleFilter::class,
        ReportUrlResolutionFilter::class,
        ReportUrlSortFilter::class,
    ])
    ->thenReturn()
    ->builder;

$urls = $urls->simplePaginate(10);

MySQL Explain

Running the above query with the EXPLAIN command outputs the following. Apologies, I cannot format the table in a better way on Laracasts.

id select_type table partitions type possible_keys key key_len ref rows filtered Extra
1 SIMPLE report_urls NULL ref report_urls_url_id_foreign,report_urls_report_id_created_at_index report_urls_report_id_created_at_index 144 const 235931 100.00 Usingtemporary;Usingfilesort
1 SIMPLE urls_title_sort NULL eq_ref PRIMARY PRIMARY 144 accessibility.report_urls.url_id 1 100.00 NULL

Question

I need to join these tables together and sort them alphabetically by title.

The query currently takes 77 seconds to execute. I was hoping that MySQL could use the title index, but it doesn't seem to use the index.

Can anyone suggest any performance improvements?

0 likes
16 replies
Tray2's avatar
Tray2
Best Answer
Level 74

Yeah, the index isn't going to help you much. The sorting likely isn't the problem, the join is, you are using uuids as your primary keys, and that is usually a bad idea. You really should use integers, it's way faster since they are sequential.

Since you aren't sharing your migrations, I'm guessing you haven't create a proper foreign key constraint either.

So to make it faster.

  1. Use integer as your primary keys.
  2. Use proper foreign key constraints.

If you do those two, the sorting will be much faster, and of course you should paginate.

I suggest giving this a read

https://tray2.se/posts/properly-formed-foreign-keys-are-your-best-friends

1 like
Snapey's avatar

and if you need uuid as an external identifier, then that can be added as an additional column

1 like
JoshuwayMorris's avatar

Thanks @tray2 and @snapey for your replies. I have updated the question to include my migrations.

Thanks @tray2 for linking your helpful article. Unfortunately, both tables already use foreignUuid columns that are constrained and cascade on delete.

I however, did not consider that using UUIDs could be impacting on the performance. I'll make changes to the database, reseed and test again. Hopefully using integer values will increase the performance.

Thank you both.

1 like
JussiMannisto's avatar

UUID is a bit of a red herring here and switching to incremented keys won't help much*. The issue is that you're combining joining and sorting by the target table. The entire urls table has to be sorted into a temporary table for this to work.

This is tricky since you're also matching by report_id and doing pagination. You should consider adding a redundant title column in the report_urls table, if sorting by title is something you really need. I assume the title doesn't change after it's created?

* UUID indexes are B+ trees just like those of incrementing columns, so the lookup speed isn't markedly worse. Insert performance can be a lot worse with unordered UUIDs because it causes expensive re-balancing of the tree. But if you're using Laravel's HasUuids trait, you get timestamp-ordered UUIDs, which mitigates this.

You should always use the native UUID type (128 bits) instead of 36 chars (288 bits!). But integer primary keys should be the default. I always use integer PKs with a secondary ULID column if needed, like @snapey suggested.

Tray2's avatar

@JussiMannisto

UUID is a bit of a red herring here and switching to incremented keys won't help much*. The issue is that you're combining joining and sorting by the target table. The entire urls table has to be sorted into a temporary table for this to work.

Not really true, any and all filters will be applied first, before the sorting, so the sorting will only apply to the resulting rows. There is no problem ordering on a joint table, if it were then 99% of the database driven app would not function.

This is tricky since you're also matching by report_id and doing pagination. You should consider adding a redundant title column in the report_urls table, if sorting by title is something you really need. I assume the title doesn't change after it's created?

This is really bad advice, you should almost never store the same data in two different tables, it will give inconsistency in your data. There are some cases when this is a good idea, but then we are talking last resort.

  • UUID indexes are B+ trees just like those of incrementing columns, so the lookup speed isn't markedly worse. Insert performance can be a lot worse with unordered UUIDs because it causes expensive re-balancing of the tree. But if you're using Laravel's HasUuids trait, you get timestamp-ordered UUIDs, which mitigates this.

Well, UUID is not incrementing, and is harder to index, yes you can index them, but they are not stored in the order you think, since they aren't sequential, ULID was created to mitigate this issue, and is a little better.

However, both UUID and ULID takes up more space than a regular integer primary key, this might cause unwanted read to disk, and that is something you want to avoid if possible.

Like you said, some say that there is no major difference in performance, perhaps not, bu I much rather have 100 000 000 integers stored and indexed in my database, than 100 000 000 of these

6891c550-9ae0-48a4-a239-277be64d3d5f
f06bf9b8-c6aa-48c8-88e5-d0ade7922d9d
70617dae-71ed-4561-837e-b07855a395f2
7459b701-5cbd-447b-a70f-887075259be6
a880fde8-a548-4a19-88d3-dcf72214ada3
a0c8ff84-886b-4fc4-9466-04fb0119afcf
42cb5c90-0bb8-4adc-a5d8-8b1fdc21351d
f16b58b6-5cec-4321-ab3c-9aa3039b8aef
196ae4fe-1bcc-4564-a940-59028487b62a
f68221f1-6b4b-44ba-9f8d-17d873c5bd37
JussiMannisto's avatar

@Tray2

Not really true, any and all filters will be applied first, before the sorting, so the sorting will only apply to the resulting rows. There is no problem ordering on a joint table, if it were then 99% of the database driven app would not function.

His query takes 77 seconds on a mere 500k rows, and in his explain output shows following:

Using temporary;Using filesort

I wrote this SQL snippet as a demo. It has basically the same elements as the query in question. There are two explain queries at the end, and the only difference between them is which title column is used for sorting; the one in table a or b. Compare the Extra columns of the output after running the snippet.

This is really bad advice, you should almost never store the same data in two different tables, it will give inconsistency in your data. There are some cases when this is a good idea, but then we are talking last resort.

There's a place for redundancy if it solves a real life problem. I purposefully asked if the title ever changes after being created. It's only a suggestion, but it's definitely a good option if the alternative is a 77s query.

However, both UUID and ULID takes up more space than a regular integer primary key, this might cause unwanted read to disk, and that is something you want to avoid if possible.

Like you said, some say that there is no major difference in performance, perhaps not, bu I much rather have 100 000 000 integers stored and indexed in my database, than 100 000 000 of these

I'm not disagreeing, int PKs are simply better if UUID isn't required. But switching to ints doesn't solve the problem that @joshuwaymorris has.

Tray2's avatar

@JussiMannisto This is most likely due to the fact that the index isn't working as intended

Using temporary;Using filesort

https://www.percona.com/blog/what-does-using-filesort-mean-in-mysql/

The usual answer is something like “rows are being placed into a temporary table which is too big to fit in memory, so it gets sorted on disk.” Unfortunately, this is not the same thing. First of all, this is Using temporary. Secondly, temporary tables may go to disk if they are too big, but EXPLAIN doesn’t show that. (If I interview you, I might ask you what “too big” means, or I might ask you the other reason temporary tables go to disk!)

The truth is, filesort is badly named. Anytime a sort can’t be performed from an index, it’s a filesort. It has nothing to do with files. Filesort should be called “sort.” It is quicksort at heart.

If the sort is bigger than the sort buffer, it is performed a bit at a time, and then the chunks are merge-sorted to produce the final sorted output. There is a lot more to it than this. I refer you to Sergey Petrunia’s article on How MySQL executes ORDER BY. You can also read about it in our book, but if you read Sergey’s article you won’t need to.

The biggest issue here is that it doesn't use the indexes, and that the result is too big for the sort cache, thus needing to push it into file sort.

However, the url table isn't in the migrations shared, and that is most likely missing an index on the titles column.

JussiMannisto's avatar

@Tray2 There might be an issue with the title index, and that would explain it. But the original post said it's indexed.

FWIW, the reason why the temporary table is needed is mentioned here:

The server creates temporary tables under conditions such as these:

  • Evaluation of statements that contain (...), or for which the ORDER BY or GROUP BY contains columns from tables other than the first table in the join queue.

https://dev.mysql.com/doc/refman/8.4/en/internal-temporary-tables.html

1 like
Tray2's avatar

@Lumethys like everything else, it depends, in this case there were two issues, one was the fact that the OP was using varchars longer than 255 characters, the other is that the OP used UUID as primary and foreign keys in his database model.

Yes, you can sort it lexicographically like you said, but it becomes a bigger index than using integers.

When you use integers the order becomes

  • 1
  • 2
  • 3

But when using uuids it becomes

  • 01930b0b-023e-798e-aa76-973bb815f17c
  • 01930b0b-622d-7011-87a7-01717e3d0d38
  • 01930b0b-8a03-7d56-ab6e-849403b8b755

The index and table size difference will also be huge.

36x3 = 108 characters or 1x3 = 3 integers

That size also affects the performance.

Single table where you use the uiid primary key to find one record is fine, but with a join it is not just one records, there are thousands what needs to be filtered and sorted.

I'm just going to quote the OP

From 77 seconds to 0.0011 seconds by switching UUID to standard IDs, and switching the length of the title column to a standard VARCHAR 255.

JoshuwayMorris's avatar

Hi everyone,

Thank you for the help @tray2 , @snapey and @jussimannisto.

I have updated the migrations to use regular id() columns and foreignId() columns. These are constrained and cascade on delete.

I followed @snapey's suggestion of keeping a UUID column, but these will only used for public-facing records that would be exposed in a URL and not used for internal operations.

The UUID to ID changes have really helped. Previously, the query took 77 seconds to complete, it now takes roughly 12 seconds. Thank you @tray2 for the suggestion.

I understand that I am trying to order 500,000 records alphabetically, but I wondered if anyone had any suggestions for improving the performance any further?

For clarification, the title rarely changes. This is part of an automated crawling tool that runs on a monthly cycle. If the page title changes after a month, I update the URL table with the new page title.

Could you think of any further optimisations? Would a prefix/suffix index help? Perhaps a database view?

Lastly, these database operations will be cached once they have been completed. However, ideally, I don't want the user to have to wait 12 seconds the first time they perform the operation. Particularly as navigating to the next page with pagination also takes 12 seconds.

Thank you for all of your help. For completeness, please find below my new migrations and the new query:

select
  `report_urls`.*
from
  `report_urls`
  inner join `urls` as `urls_title_sort` on `report_urls`.`url_id` = `urls_title_sort`.`id`
where
  `report_id` = 1
order by
  `urls_title_sort`.`title` desc
limit
  11 offset 0
JoshuwayMorris's avatar

After writing this, I've just noticed that the title column on urls is set as a string with length 1024 that I believe is too long for the standard index.

I had changed this before completing a fresh database migration to only be a standard VARCHAR(255). However, since migrating, the column has reverted to 1024 in length. I wonder if this could be causing the slowness.

I'll use an alter table statement to adjust the column to a 255 length VARCHAR and see if that helps.

Tray2's avatar

@JoshuwayMorris The length of the VARCHAR column shouldn't affect the index that much, it only uses the space it needs, so if the string is 100 characters, the database stores 100 characters, not 1024. The longer the string is the more power is needed to sort the records, if there are many similar titles, that might also affect performance.

There are different types of indexes, and it might be a good idea to take a look at the other types.

JoshuwayMorris's avatar

@Tray2 thanks for your help on this.

I did end up switching the column to a VARCHAR 255, and the query went from 12 seconds to 0.0011 seconds, which was a real shock!

I read on a Stack Overflow question that:

With deprecated utf8mb3, you can store up to 255 characters in an index, while with utf8mb4, up to 191, when using COMPACT or REDUNDANT row format. With COMPRESSED or DYNAMIC row format, index key prefixes can be up to 3072 bytes. With them, you can index up to 1024 characters for utf8mb3, and 768 characters for utf8mb4.

I'm using "utf8mb4_unicode_ci" for my collation. I need to read up more on COMPRESSED or DYNAMIC formats, as I'd like to more efficiently index the urls themselves, and not just the title column.

I've ordered a copy of "High Performance MySQL" to try and give me some additional pointers.

From 77 seconds to 0.0011 seconds by switching UUID to standard IDs, and switching the length of the title column to a standard VARCHAR 255.

Thank you for all of your help!

1 like
Snapey's avatar

@JoshuwayMorris there is an argument for making your index string much shorter so that say, only the first 40 characters are indexed. This is very situational of course, and it also depends on the criticality of the index. If the titles vary after 40 characters, does it make any practical difference?

Please or to participate in this conversation.