diaglyph

Detect when a file is uploaded to a folder in "public"

I was wondering how I could go about detecting new files in a particular folder in public. For example, I have public/uploads/results, which will have a results file uploaded to it at 3am every day. I would like to detect when the file has been uploaded so I can then grab it and process the data to add to the database.
I've been searching around but can't seem to find any examples or suggestions on how I could do this.
I've thought about using the Laravel scheduler and checking for new files from there, but I can't seem to get the scheduling to work on Windows (the Task Scheduler only seems to run it once, even though I set it to run every minute).

diaglyph

It gets transferred by ftp from a 3rd party.

diaglyph

OK, it seems the scheduling is working after all.
Would using the scheduler be the best way to do this?
I just wanted to check and see if there was maybe some other way.

zachleigh

Is there any way you could get the file uploaded through your app instead of via FTP? Then you could simply open an API route to upload to, check auth if you need to, put the file in public, and fire off events.

If this isn't possible, maybe use PHP's FAM functions?
http://php.net/manual/en/ref.fam.php

diaglyph

Unfortunately no. It is how the 3rd party sends out those files. I'll look into the FAM functions thanks. I assume they'll work fine from within Laravel?

ohffs

I don't think the FAM stuff works on Windows, afair. I had to do something similar a while back which had to work cross-platform, and I used Python's watchdog library.

davestewart

Most desktop apps I use (After Effects, Media Encoder) will handle this with some kind of watch folder functionality.

In the kiosk app I'm currently developing, images are rendered from a separate system, into an output folder. I used Node & gulp-watch to monitor this folder for a while, but it was one more process to worry about, and I found it to be pretty fragile, so most of the code was error-checking and workarounds. Over a network it was slow as well, and seemed to get slower the more files it was monitoring. To get around this, we would move files out of the "output" folder and into a (non-watched) folder called "processed".

We were dumping around 8000 files a day into it, in batches of 100, and it would crash about 4 times a day. Apparently there's a module you can use to detect a crash and restart, but I gave up in the end and ran a batch file that calls an endpoint in the Laravel app every 10 seconds. The endpoint does a scandir, then processes and logs all the files it finds:

watch -n 10 "wget -qO- http://localhost/watch/run | python -m json.tool"
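The endpoint's code isn't shown in the thread. A minimal plain-PHP sketch of what its body might do, assuming a hypothetical function processOutputFolder() and a caller-supplied $process callback: scandir the "output" folder, process each regular file, then move it into the "processed" folder so the next poll doesn't see it again.

```php
<?php
// Hypothetical sketch of a polled "watch" endpoint body: scan a folder,
// process each file, then move it out so it is never picked up twice.
function processOutputFolder(string $outputDir, string $processedDir, callable $process): array
{
    if (!is_dir($processedDir) && !mkdir($processedDir, 0777, true)) {
        throw new RuntimeException("Cannot create $processedDir");
    }

    $entries = scandir($outputDir); // sorted alphabetically, includes "." and ".."
    if ($entries === false) {
        throw new RuntimeException("Cannot read $outputDir");
    }

    $handled = [];
    foreach ($entries as $name) {
        $path = $outputDir . DIRECTORY_SEPARATOR . $name;
        // Skip "." / ".." and any sub-folders.
        if (!is_file($path)) {
            continue;
        }
        $process($path);
        // Moving the file means the next poll will not see it again.
        if (!rename($path, $processedDir . DIRECTORY_SEPARATOR . $name)) {
            throw new RuntimeException("Cannot move $name");
        }
        $handled[] = $name;
    }
    return $handled; // e.g. json_encode() this so the poller can log it
}
```

Moving files out of the watched folder keeps each scan proportional to the number of new files rather than to everything ever delivered.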

The nice thing about that is all the code is now PHP, rather than a mix of PHP and Node.

Polling doesn't seem like an elegant solution, but it works for us.

This is a local app at an event, so I can run the command line manually, but I'm not sure how you would do this on a web server.

Connor-S-Parks

Honestly (if you don't need it to be instantaneous), I'd set up something similar to this: https://github.com/bigbitecreative/paddle/blob/master/app/Console/Commands/Releases/ClearOld.php and have it run via the scheduler. Obviously, replace the delete logic with whatever logic you need. You may need a blacklist system here; or, if you know they'll always put it in folder X, I'd just have it work from there and, once a file has been processed, move it elsewhere if you need to keep it.

davestewart

Looks like that might be useful for me as well Connor :)

ohffs

The problem I had with polling was that the files could be quite big, so there was a high chance of seeing a file that was still being copied. Before using the watchdog method, I had the script block on a new file until its size hadn't changed for > $some_seconds. That was a bit error-prone on Windows though, afair.
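The size-stability check described above could look roughly like this in PHP. This is a minimal sketch with a hypothetical function waitUntilStable() and hypothetical parameters ($quietSeconds, $timeoutSeconds, $pollSeconds); it is not the original script, which isn't shown.

```php
<?php
// Hypothetical helper: block until a file's size has not changed for
// $quietSeconds, or give up after roughly $timeoutSeconds.
function waitUntilStable(string $path, int $quietSeconds, int $timeoutSeconds, int $pollSeconds = 1): bool
{
    $start = time();
    $lastSize = -1;
    $lastChange = $start;

    while (time() - $start <= $timeoutSeconds) {
        // PHP caches stat() results, so clear them or filesize() never changes.
        clearstatcache(true, $path);
        $size = @filesize($path);
        if ($size === false) {
            return false; // file vanished or is unreadable
        }
        if ($size !== $lastSize) {
            // Size moved (or first look): restart the quiet period.
            $lastSize = $size;
            $lastChange = time();
        } elseif (time() - $lastChange >= $quietSeconds) {
            return true; // unchanged for long enough
        }
        sleep($pollSeconds);
    }
    return false; // timed out while the file was still changing
}
```

A writer that pauses for longer than the quiet window, or that rewrites a file to the same size, will still slip through, which matches the edge cases mentioned later in the thread.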

Connor-S-Parks

@ohffs you could always just check if the filectime is sufficiently old before doing anything with it? Say, 20 minutes or so?

I don't see the point of a 'not changed for x seconds' check; it seems more logical to do the checks statically. I have no idea why a file would still be getting 'uploaded' when its last inode change time was >= 20 minutes ago ;)
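A static "old enough" filter along these lines might look like the sketch below. The function name filesOlderThan() and its optional $now parameter are hypothetical. It also swaps in filemtime() (last modification time) for the filectime() suggested above, since modification time is the timestamp that keeps advancing while data is still being written.

```php
<?php
// Hypothetical helper: return the names of regular files in $dir whose
// last modification was at least $minAgeSeconds ago.
function filesOlderThan(string $dir, int $minAgeSeconds, ?int $now = null): array
{
    $now = $now ?? time();
    $entries = scandir($dir); // sorted alphabetically
    if ($entries === false) {
        throw new RuntimeException("Cannot read $dir");
    }
    clearstatcache(); // make sure we read fresh timestamps

    $ready = [];
    foreach ($entries as $name) {
        $path = $dir . DIRECTORY_SEPARATOR . $name;
        if (is_file($path) && $now - filemtime($path) >= $minAgeSeconds) {
            $ready[] = $name;
        }
    }
    return $ready;
}

// Usage: only hand over files untouched for 20 minutes.
// $ready = filesOlderThan('/path/to/uploads', 20 * 60);
```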

ohffs

@Connor-S-Parks there were a lot of oddities & edge cases - some files were being copied (very slowly) over serial lines from bits of attached hardware, some were created on the machine, some were ftp, some files were written, then after a short pause would get over-written, onwards and downwards ;-)

Coupled with it having to run OK on ancient & modern-ish Windows, macOS, various Linux filesystems (going back to ext2 & ReiserFS) and (shudder) an old VMS box. Admittedly, on VMS I just gave up and did a 'hope for the best' shell script, mind you... Boy, DEC Alphas are strange... :-/

Windows is (shock) quite odd about how it handles filesystem timestamps too - on FAT, for instance, the last-access time only has a 1-day resolution and mtime only a 2-second resolution :'-/

ohffs

@Connor-S-Parks it has to be seen to be believed - we had to resurrect an old mid-90s Sun SPARCstation not long ago for a project... there was even a NeXTcube on the network not too long ago. Fun times..... ;-)

diaglyph (OP, Best Answer)

Thanks for the hints and suggestions, much appreciated.
I've created my own console command and have it successfully scheduled.
My console command gets the list of files from the folder using Storage::listContents().
I then compare this list with my list in the DB of processed files and discard what has been processed.
If there are any elements left in the array, then I fire my event to process the file or files.
Seems to be working quite well too.
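The comparison step described above, stripped of Laravel, might look like this sketch. The function names newFiles() and checkForNewFiles() are hypothetical; the $processed array stands in for the names already recorded in the database, and the $dispatch callback stands in for firing the event.

```php
<?php
// Hypothetical: which listed files have not been processed yet?
function newFiles(array $listed, array $processed): array
{
    // array_diff() keeps the original keys, so re-index the result.
    return array_values(array_diff($listed, $processed));
}

// Hypothetical: dispatch only when something new is left over.
function checkForNewFiles(array $listed, array $processed, callable $dispatch): int
{
    $new = newFiles($listed, $processed);
    if ($new !== []) {
        $dispatch($new); // in Laravel, e.g. event(new NewFilesEvent($new))
    }
    return count($new);
}
```

With Laravel's Storage facade, Storage::disk(...)->files() returns file paths as plain strings, which can be compared this way directly; listContents() returns richer metadata entries, so you would pull the path out of each one first.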

antic

@diaglyph I'm trying to do something similar to this. Would you care to share your implementation or some example code? Thanks in advance.

diaglyph

@antic

In config/filesystems.php, I set up a 'disk':

    'mylocation' => [
        'driver' => 'local',
        'root'   => public_path('uploads/folder1/folder2'),
    ],

Then I get a list of files from the location:

    $file_list = Storage::disk('mylocation')->listContents();

I do this in a console command ('findfiles'), which I schedule to run daily at midnight, e.g.:

    $schedule->command('findfiles')
             ->withoutOverlapping()
             ->daily()
             ->sendOutputTo($resultslogfile);

The command gets the list of files at that location and passes it to my event listener, which then processes the files as needed, e.g.:

    event(new NewFilesEvent($file_list));

This works nicely for my needs.
