How can I clean data performing excel imports I am importing a large number of records ('probably 10,000+) but the issue is that one column has this variation. The row name is certifications and from the file, I get these multiple variations that need to be standardized, For example, I when I dd() the group certifications I get
^ array:25 [▼
"RA CERTIFICATION" => Illuminate\Database\Eloquent\Collection {#1707 ▶}
"RA" => Illuminate\Database\Eloquent\Collection {#1689 ▶}
"RFA" => Illuminate\Database\Eloquent\Collection {#7561 ▶}
"RA/ISO CERTIFICATION" => Illuminate\Database\Eloquent\Collection {#7560 ▶}
"" => Illuminate\Database\Eloquent\Collection {#7559 ▶}
"R/A" => Illuminate\Database\Eloquent\Collection {#7558 ▶}
"RFA/ISO" => Illuminate\Database\Eloquent\Collection {#7557 ▶}
"RA CERTIFIED" => Illuminate\Database\Eloquent\Collection {#7556 ▶}
"RA CERTIFICATION AND ISO" => Illuminate\Database\Eloquent\Collection {#7555 ▶}
"RA AND ISO CERTIFIED" => Illuminate\Database\Eloquent\Collection {#7554 ▶}
"ISO RA" => Illuminate\Database\Eloquent\Collection {#7553 ▶}
"ISO" => Illuminate\Database\Eloquent\Collection {#7552 ▶}
"FSMS,RA" => Illuminate\Database\Eloquent\Collection {#7551 ▶}
"ISO AND RA CERTIFICATION" => Illuminate\Database\Eloquent\Collection {#7550 ▶}
"RA CERTIFICARION" => Illuminate\Database\Eloquent\Collection {#7549 ▶}
"N/A" => Illuminate\Database\Eloquent\Collection {#7548 ▶}
"RA/ISO22000" => Illuminate\Database\Eloquent\Collection {#7547 ▶}
"ISO 22000 :2018" => Illuminate\Database\Eloquent\Collection {#7546 ▶}
"RFA CERTIFIED" => Illuminate\Database\Eloquent\Collection {#7545 ▶}
"RA & FT CERTIFICATION" => Illuminate\Database\Eloquent\Collection {#7544 ▶}
"ISO 22000, RA" => Illuminate\Database\Eloquent\Collection {#7543 ▶}
"RAIN FOREST" => Illuminate\Database\Eloquent\Collection {#7542 ▶}
"RAIN FOREST AND FAIR TRADE" => Illuminate\Database\Eloquent\Collection {#7541 ▶}
"RA/ ISO CERTIFICATION" => Illuminate\Database\Eloquent\Collection {#7540 ▶}
"NON RA CERTIFICATION" => Illuminate\Database\Eloquent\Collection {#7539 ▶}
]
as you can see, RA, RFA, and RAIN FOREST all mean the same but still have variations and might include CERTIFICATION or CERTIFIED. How can I clean this data before saving them in my data base while importing an excel file?
@kevinwakhisi as a way - write lookup table with the mapping between imported column names and actual db columns and check this lookup table on import
Really - it's always helpful to see your actual code to understand what you're trying to do and suggest something more detailed
@SilenceBringer what i did was
$catalogues = Auctioncatalogue::get()->groupBy(function ($data) {
return $data->certification;
})->dd();
that gave me the end result as shown above. Is there a certain condition that I could apply to format the data ta. all the RA and RAIN FOREST to RA
@kevinwakhisi
/** The mapping between values in import and column names you need */
$lookupTable = [
'RA CERTIFICATION' => 'RA',
'RFA' => 'RA',
'RA/ISO CERTIFICATION' => 'RA',
// ... all others
];
$catalogues = Auctioncatalogue::get()->groupBy(function ($data) use ($lookupTable) {
return Arr::get($lookupTable, $data->certification, $data->certification);
})->dd();
@SilenceBringer Well this works thanks, How can I reuse it in my model let's say is certified and still maintaining the lookup that you used here
@kevinwakhisi
as you can see, RA, RFA, and RAIN FOREST all mean the same
You are the "only one" who knows this, have a synonyms list somewhere and decide which is the master to enter the db.
Standardize certification values
Create a fixed master list (e.g., RA, ISO 22000, RA + ISO, RFA) and map all variations like RA CERTIFICARION, R/A, RAIN FOREST to the correct standard value.
Please sign in or create an account to participate in this conversation.