How to quickly update millions of records

I have two very large tables, each with over 59 million rows. And I need to generate a slug combining information from both, currently I'm doing it like this:

public function gerarSlugEmMassa()
{

// Processes the establishments that do not yet have a generated slug
Estabelecimento::where('slug_gerado', 0) // Only the records that do not have a generated slug
->orderBy('cnpj_base')
->chunk(1000, function ($estabelecimentos) {
foreach ($estabelecimentos as $estabelecimento) {

// Query the 'cnpj_base' in the 'Empresa' table to get the 'razao_social'
$empresa = Empresa::where('cnpj_base', $estabelecimento->cnpj_base)->first();

if ($empresa) {
// Gets the company's corporate name and the establishment's full CNPJ
$razaoSocial = $empresa->razao_social; $cnpjInteiro = $estabelecimento->cnpj_inteiro;

// Generating the slug
$slug = str_slug($cnpjInteiro . '-' . $razaoSocial);

Estabelecimento::where('cnpj_base', $estabelecimento->cnpj_base)
->update([
'slug' => $slug, // Updates the 'slug' field
'slug_gerado' => 1 // Marks the slug as generated
]);
}
}
});

return response()->json(['message' => 'Slugs generated successfully!']);
}

I've done some other tests, but both are slow, it seems like it will take days to finish, has anyone come across this scenario and could give me some insight?

jlrdw

1 year ago

Level 75

Are fields indexed as needed? With millions it won't be super fast, but with indexing it should help.

There are actually companies that run large updates over night at times due to the number of database records.

Could some older records be archived in a backup database?

Make sure to study and implement good database management techniques.

Only suggestions.

1 like

mvnobrega

1 year ago

Level 3

@jlrdw yes, they are all indexed (Y)

Tray2

1 year ago

Best Answer

Level 74

@mvnobrega Why are you doing this using php? This sounds like a one time thing to me, so my suggestion is to do it in the database using SQL. You are still going to need to do it in chunks but it will be lots faster.

Other than that I would suggest not ordering the records, since you are updating a value, you should just check for it not being updated.

2 likes

mvnobrega

1 year ago

Level 3

@Tray2

Good tip, I'll test it directly with SQL. But I used order because I don't have the ID column, and chunck tries to order by the ID column.

mvnobrega

1 year ago

Level 3

@Tray2

I performed the procedure with pure SQL and it really increased the speed of the process. In less than 2 minutes almost 2 million rows were updated, now the whole process should take less than 2 hours :))) Thank you for your help

1 like

Please or to participate in this conversation.