memele
2 months ago

How to properly check for duplicate files in storage?

I have a project where I want to crawl other websites for files. In this crawler I want to check whether an identical file is already saved in my storage. I have this code at the moment, but it is not working properly: when a file doesn't exist yet, a new one with an id is not created.

Also, for some reason, not all files are deleted from the /zip directory. unlink() works on some files but not on all of them. Why is that?

$zip = new ZipArchive;
$res = $zip->open('storage/app/public/zip/files.zip');

$dir = "storage/app/public/zip/";

if ($res === TRUE) {
    $zip->extractTo($dir);
    $zip->close();

    foreach (glob($dir . '*') as $file) {

        if (pathinfo($file, PATHINFO_EXTENSION) === 'txt') {

            $fileNameWithoutExt = basename($file, '.txt');

            $files = MyFile::where('url', $filename . '.txt')->get();

            if ($files->count() > 0) {

                $id = $files->count();
                $id = $id + 1;

                foreach ($files as $file) {
                    $originalFile = 'storage/app/public/files/' . $file->url;

                    if (md5_file($dir . basename($file)) === md5_file($originalFile)) {
                        echo "file " . basename($file) . " already exists \n";
                        return;
                    }

                    $filename = $filename . '-' . $id;
                }
            }

            // If file not found in db, create one
            MyFile::create([
                ....
            ]);

            copy($dir . basename($file), 'storage/app/public/files/' . $filename . '.txt');
            echo "FILE CREATED \n";

            // Delete any subdirectories created
            // Delete all files from the /zip folder
            if (is_dir($file)) {
                File::deleteDirectory('./' . $file);
            } else {
                unlink($file);
            }
        }
    }
}
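
On the cleanup step at the end of the snippet, a few things in the code as posted could explain why only some files get deleted. The inner `foreach ($files as $file)` reuses the name `$file`, so by the time `unlink($file)` runs the variable may hold a model instead of a path. The `return` in the "already exists" branch leaves before any cleanup happens. As the braces are written, the delete block sits inside the `.txt` check, so other extensions are never touched. Finally, `glob($dir . '*')` does not match dotfiles. One way around all of these is to empty the directory in a separate pass after processing. The following is a minimal sketch in plain PHP, not a definitive fix; `clearDirectory` is a hypothetical helper name, and it removes everything in the directory, including the archive itself if it lives there.

```php
<?php

/**
 * Recursively delete everything inside $dir but keep $dir itself.
 * clearDirectory is a hypothetical helper name, not a Laravel API.
 */
function clearDirectory(string $dir): void
{
    $items = new RecursiveIteratorIterator(
        // SKIP_DOTS drops "." and ".." but still yields hidden files.
        new RecursiveDirectoryIterator($dir, FilesystemIterator::SKIP_DOTS),
        // CHILD_FIRST visits a directory's contents before the directory.
        RecursiveIteratorIterator::CHILD_FIRST
    );

    foreach ($items as $item) {
        if ($item->isDir() && !$item->isLink()) {
            rmdir($item->getPathname());   // already emptied by earlier iterations
        } else {
            unlink($item->getPathname());  // regular file or symlink
        }
    }
}

// Usage: build a throwaway directory, then empty it.
$tmp = sys_get_temp_dir() . '/zip_cleanup_' . uniqid();
mkdir($tmp . '/nested', 0777, true);
file_put_contents($tmp . '/a.txt', 'a');
file_put_contents($tmp . '/.hidden', 'h');
file_put_contents($tmp . '/nested/b.pdf', 'b');

clearDirectory($tmp);

$remaining = array_diff(scandir($tmp), ['.', '..']);
echo count($remaining) . " entries left\n";
rmdir($tmp);
```

Running the cleanup once, after the loop over the extracted files, keeps it independent of whichever branch each file took.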

Any thoughts on how to improve this? Thanks.
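
One possible direction for the duplicate check itself: compare the extracted file against every stored candidate by content hash, and only pick a new name when none of them matches. Note that in the posted code `$filename` is never assigned (the basename is stored in `$fileNameWithoutExt`), so the query and the `copy()` target may not be what was intended. The sketch below is plain PHP (7.4+) and works on the filesystem only, because the columns of `MyFile` are not shown. `findDuplicate` and `nextFreeName` are hypothetical helper names, and the `name-2.txt` numbering scheme is an assumption.

```php
<?php

/**
 * Return the path of the first candidate whose content equals $newFile,
 * or null when no candidate matches. Hypothetical helper.
 */
function findDuplicate(string $newFile, array $candidates): ?string
{
    $newSize = filesize($newFile);
    $newHash = null;

    foreach ($candidates as $candidate) {
        // Cheap check first: files of different size cannot be identical.
        if (!is_file($candidate) || filesize($candidate) !== $newSize) {
            continue;
        }
        // Hash the new file lazily, at most once.
        $newHash ??= hash_file('sha256', $newFile);
        if (hash_file('sha256', $candidate) === $newHash) {
            return $candidate;
        }
    }

    return null;
}

/**
 * Try "name.ext", then "name-2.ext", "name-3.ext", ... until one is unused.
 * Hypothetical helper; the numbering scheme is an assumption.
 */
function nextFreeName(string $targetDir, string $baseName, string $ext): string
{
    $name = "$baseName.$ext";
    for ($i = 2; file_exists("$targetDir/$name"); $i++) {
        $name = "$baseName-$i.$ext";
    }
    return $name;
}

// Usage with throwaway directories standing in for /files and /zip.
$store = sys_get_temp_dir() . '/dup_store_' . uniqid();
$inbox = sys_get_temp_dir() . '/dup_inbox_' . uniqid();
mkdir($store);
mkdir($inbox);
file_put_contents("$store/report.txt", 'hello');
file_put_contents("$inbox/same.txt", 'hello');
file_put_contents("$inbox/other.txt", 'world');

$stored     = glob("$store/report*.txt");
$dupOfSame  = findDuplicate("$inbox/same.txt", $stored);   // same content as report.txt
$dupOfOther = findDuplicate("$inbox/other.txt", $stored);  // same size, different content
$newName    = nextFreeName($store, 'report', 'txt');

var_dump($dupOfSame !== null, $dupOfOther === null, $newName);

// Remove the demo files again.
array_map('unlink', array_merge(glob("$store/*"), glob("$inbox/*")));
rmdir($store);
rmdir($inbox);
```

Inside the crawler loop, a match would then mean `continue` to the next extracted file rather than `return`, and the loop variables for the extracted path and the database rows would need distinct names so neither overwrites the other.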
