I have two very large tables, each with over 59 million rows. And I need to generate a slug combining information from both, currently I'm doing it like this:
public function gerarSlugEmMassa()
{
// Processes the establishments that do not yet have a generated slug
Estabelecimento::where('slug_gerado', 0) // Only the records that do not have a generated slug
->orderBy('cnpj_base')
->chunk(1000, function ($estabelecimentos) {
foreach ($estabelecimentos as $estabelecimento) {
// Query the 'cnpj_base' in the 'Empresa' table to get the 'razao_social'
$empresa = Empresa::where('cnpj_base', $estabelecimento->cnpj_base)->first();
if ($empresa) {
// Gets the company's corporate name and the establishment's full CNPJ
$razaoSocial = $empresa->razao_social; $cnpjInteiro = $estabelecimento->cnpj_inteiro;
// Generating the slug
$slug = str_slug($cnpjInteiro . '-' . $razaoSocial);
Estabelecimento::where('cnpj_base', $estabelecimento->cnpj_base)
->update([
'slug' => $slug, // Updates the 'slug' field
'slug_gerado' => 1 // Marks the slug as generated
]);
}
}
});
return response()->json(['message' => 'Slugs generated successfully!']);
}
I've done some other tests, but both are slow, it seems like it will take days to finish, has anyone come across this scenario and could give me some insight?
@mvnobrega Why are you doing this using php? This sounds like a one time thing to me, so my suggestion is to do it in the database using SQL. You are still going to need to do it in chunks but it will be lots faster.
Other than that I would suggest not ordering the records, since you are updating a value, you should just check for it not being updated.
I performed the procedure with pure SQL and it really increased the speed of the process. In less than 2 minutes almost 2 million rows were updated, now the whole process should take less than 2 hours :))) Thank you for your help