I have a Laravel 10 instance set up with a command script that loops through 2.7M JSON files stored in Amazon S3. This works flawlessly. However, inside the loop, the command dispatches a Laravel job to parse the JSON data in each file. Each file contains 1 to 100 rows of data that we want to insert into our PG database. When testing a single file at a time, it works perfectly: all records insert as expected. But when I run the command over all of the JSON files, some jobs seem to just disappear.
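Stripped down, the command is roughly like this (class names, the disk name, and the S3 prefix are placeholders, not our real ones):

```php
<?php

namespace App\Console\Commands;

use App\Jobs\ParseJsonFile; // placeholder job class name
use Illuminate\Console\Command;
use Illuminate\Support\Facades\Storage;

class ProcessJsonFiles extends Command
{
    protected $signature = 'json:process';
    protected $description = 'Dispatch a parse job for every JSON file in the S3 bucket';

    public function handle(): void
    {
        // Loop over every JSON file in the bucket and dispatch one job per file.
        foreach (Storage::disk('s3')->files('json-data') as $path) {
            ParseJsonFile::dispatch($path);
        }
    }
}
```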
We are using Amazon SQS for the queue system. These 2.7M files contain a total of over 118M rows of data.
The command simply passes the file path to the job (not the file's contents); the job fetches the file from S3 and parses the data out of it.
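The job itself is essentially this sketch (again, the class name and table name are placeholders, and it assumes the decoded JSON maps straight onto table columns):

```php
<?php

namespace App\Jobs;

use Illuminate\Bus\Queueable;
use Illuminate\Contracts\Queue\ShouldQueue;
use Illuminate\Foundation\Bus\Dispatchable;
use Illuminate\Queue\InteractsWithQueue;
use Illuminate\Queue\SerializesModels;
use Illuminate\Support\Facades\DB;
use Illuminate\Support\Facades\Storage;

class ParseJsonFile implements ShouldQueue
{
    use Dispatchable, InteractsWithQueue, Queueable, SerializesModels;

    public function __construct(public string $path)
    {
    }

    public function handle(): void
    {
        // Fetch the file from S3 by path and decode it; each file holds 1 to 100 rows.
        $rows = json_decode(Storage::disk('s3')->get($this->path), true);

        // Insert the parsed rows into the Postgres table.
        DB::table('records')->insert($rows);
    }
}
```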
I have 3 worker systems set up to monitor the queue and take each job sequentially (FIFO). When I ran it all the way through, it took most of a day, but in the end only 2.3M rows of data were inserted. No errors were thrown (that I can find), and since we aren't using the database queue driver, I can't find any record of failed jobs.
I thought maybe it was too many jobs going through too quickly, so I added half a second of sleep at the end of each job to pace it a bit. That didn't help.
How do we log the failed jobs so I can determine what is going wrong?
Is there some limitation to SQS that we aren't aware of?