How to properly check for duplicate files in storage?

Posted 7 months ago by memele

I have a project that crawls other websites for files. Before saving a crawled file, I want to check whether an identical file is already in my storage. This is the code I have at the moment, but it is not working properly: when no identical file exists, a new file with an id appended to its name is not being created.

Also, for some reason, not all files are deleted from the /zip directory. unlink() works on some files but not on others. Why is that?

$zip = new ZipArchive;
$res = $zip->open('storage/app/public/zip/');

$dir = "storage/app/public/zip/";

if ($res === TRUE) {
    foreach (glob($dir . '*') as $file) {
        if (pathinfo($file, PATHINFO_EXTENSION) === 'txt') {
            $fileNameWithoutExt = basename($file, '.txt');

            $files = MyFile::where('url', $filename . '.txt')->get();

            if ($files->count() > 0) {
                $id = $files->count();
                $id = $id + 1;

                foreach ($files as $file) {
                    $originalFile = 'storage/app/public/files/' . $file->url;

                    if (md5_file($dir . basename($file)) === md5_file($originalFile)) {
                        echo "file " . basename($file) . " already exists \n";

                    $filename = $filename . '-' . $id;

            // If file not found in db, create one
            copy($dir . basename($file), 'storage/app/public/files/' . $filename . '.txt');
            echo "FILE CREATED \n";

        // Delete any subdirectories created
        // Delete all files from the /zip folder
        if (is_dir($file)) {
            File::deleteDirectory('./' . $file);
        } else {

Any thoughts on how to improve this? Thanks.
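
One possible direction, as a minimal sketch rather than a definitive fix. In the snippet as posted, `$filename` is read but only `$fileNameWithoutExt` is ever assigned, and the inner `foreach ($files as $file)` overwrites the outer `$file`, so the later `copy()`, `is_dir()` and delete calls may receive a model instead of a path. That could explain both symptoms. The sketch below keeps the path variables distinct and works on plain paths only. The function names (`findDuplicate`, `nextFreePath`, `clearDirectory`, `importTextFiles`) are hypothetical, SHA-256 is swapped in for `md5_file()`, and a `glob()` over the files directory is an assumed stand-in for the `MyFile` query.

```php
<?php
declare(strict_types=1);

/**
 * Return the first path in $candidates whose contents match $incoming, or null.
 * Sizes are compared first so hashing is skipped for obvious mismatches.
 */
function findDuplicate(string $incoming, array $candidates): ?string
{
    $size = filesize($incoming);
    $hash = null;

    foreach ($candidates as $candidate) {
        if (!is_file($candidate) || filesize($candidate) !== $size) {
            continue;
        }
        $hash ??= hash_file('sha256', $incoming); // hash the new file only once
        if (hash_file('sha256', $candidate) === $hash) {
            return $candidate;
        }
    }

    return null;
}

/** Return a path in $targetDir that is not taken yet: name.ext, name-2.ext, name-3.ext, ... */
function nextFreePath(string $targetDir, string $name, string $ext): string
{
    $path = sprintf('%s/%s.%s', $targetDir, $name, $ext);
    for ($i = 2; file_exists($path); $i++) {
        $path = sprintf('%s/%s-%d.%s', $targetDir, $name, $i, $ext);
    }

    return $path;
}

/** Recursively delete everything inside $dir, keeping $dir itself. */
function clearDirectory(string $dir): void
{
    $items = new RecursiveIteratorIterator(
        new RecursiveDirectoryIterator($dir, FilesystemIterator::SKIP_DOTS),
        RecursiveIteratorIterator::CHILD_FIRST // children before their parent directory
    );

    foreach ($items as $item) {
        if ($item->isDir() && !$item->isLink()) {
            rmdir($item->getPathname());
        } else {
            unlink($item->getPathname());
        }
    }
}

/** Copy every non-duplicate .txt file from $zipDir into $filesDir, then empty $zipDir. */
function importTextFiles(string $zipDir, string $filesDir): array
{
    $created = [];

    foreach (glob("$zipDir/*.txt") ?: [] as $incomingPath) {
        $name = basename($incomingPath, '.txt');

        // Assumption: same-named files on disk stand in for the MyFile query.
        $candidates = glob("$filesDir/$name*.txt") ?: [];

        if (findDuplicate($incomingPath, $candidates) === null) {
            $target = nextFreePath($filesDir, $name, 'txt');
            copy($incomingPath, $target);
            $created[] = $target;
        }
    }

    clearDirectory($zipDir);

    return $created;
}
```

In a Laravel app, the `$candidates` array could presumably be built from the `url` column of the matching `MyFile` records instead of `glob()`, and a new record would still need to be saved for each copied file. Deleting only after the loop has finished, and never reusing the loop variable, keeps the cleanup from acting on the wrong value.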