Hi everyone,
Thank you for the help @tray2 , @snapey and @jussimannisto.
I have updated the migrations to use regular id() columns and foreignId() columns. These are constrained and cascade on delete.
I followed @snapey's suggestion of keeping a UUID column, but these will only used for public-facing records that would be exposed in a URL and not used for internal operations.
The UUID to ID changes have really helped. Previously, the query took 77 seconds to complete, it now takes roughly 12 seconds. Thank you @tray2 for the suggestion.
I understand that I am trying to order 500,000 records alphabetically, but I wondered if anyone had any suggestions for improving the performance any further?
For clarification, the title rarely changes. This is part of an automated crawling tool that runs on a monthly cycle. If the page title changes after a month, I update the URL table with the new page title.
Could you think of any further optimisations? Would a prefix/suffix index help? Perhaps a database view?
Lastly, these database operations will be cached once they have been completed. However, ideally, I don't want the user to have to wait 12 seconds the first time they perform the operation. Particularly as navigating to the next page with pagination also takes 12 seconds.
Thank you for all of your help. For completeness, please find below my new migrations and the new query:
Schema::create('urls', function (Blueprint $table) {
$table->id();
$table->uuid('uuid')->index();
$table->foreignId('team_id')->constrained()->cascadeOnDelete();
$table->string('title', 1024)->nullable()->index();
$table->string('url', 2048)->nullable()->fullText();
$table->timestamp('last_reported_at')->nullable()->index();
$table->timestamp('created_at')->useCurrent();
$table->timestamp('updated_at')->useCurrent()->useCurrentOnUpdate();
});
Schema::create('reports', function (Blueprint $table) {
$table->id();
$table->uuid('uuid')->index();
$table->foreignId('team_id')->constrained()->cascadeOnDelete();
$table->unsignedInteger('url_count')->default(0);
$table->unsignedInteger('violation_count')->default(0);
$table->unsignedInteger('resolved_violation_count')->default(0);
$table->unsignedInteger('violation_minor')->default(0);
$table->unsignedInteger('violation_moderate')->default(0);
$table->unsignedInteger('violation_serious')->default(0);
$table->unsignedInteger('violation_critical')->default(0);
$table->json('meta')->nullable();
$table->enum('status', ['in_progress', 'failed', 'completed'])->default('in_progress')->index();
$table->string('crawler_version')->nullable();
$table->timestamp('created_at')->useCurrent()->index();
$table->timestamp('updated_at')->useCurrent()->useCurrentOnUpdate();
$table->timestamp('finished_at')->nullable();
});
Schema::create('report_urls', function (Blueprint $table) {
$table->id();
$table->foreignId('report_id')->constrained()->cascadeOnDelete();
$table->foreignId('url_id')->constrained()->cascadeOnDelete();
$table->unsignedInteger('violation_count')->default(0);
$table->unsignedInteger('resolved_violation_count')->default(0);
$table->unsignedInteger('violation_minor')->default(0);
$table->unsignedInteger('violation_moderate')->default(0);
$table->unsignedInteger('violation_serious')->default(0);
$table->unsignedInteger('violation_critical')->default(0);
$table->timestamp('created_at')->useCurrent();
$table->timestamp('updated_at')->useCurrent()->useCurrentOnUpdate();
// Indices
$table->index(['report_id', 'violation_count']);
$table->index(['report_id', 'resolved_violation_count']);
$table->index(['report_id', 'violation_minor']);
$table->index(['report_id', 'violation_moderate']);
$table->index(['report_id', 'violation_serious']);
$table->index(['report_id', 'violation_critical']);
$table->index(['report_id', 'created_at']);
$table->index(['report_id', 'url_id']);
$table->index(['url_id', 'report_id']);
});
select
`report_urls`.*
from
`report_urls`
inner join `urls` as `urls_title_sort` on `report_urls`.`url_id` = `urls_title_sort`.`id`
where
`report_id` = 1
order by
`urls_title_sort`.`title` desc
limit
11 offset 0