Queued job silently failing
We’re seeing some unexpected behaviour from queued jobs that take more than a few minutes to process.
To explain our setup quickly: we’ve got Horizon running a couple of queues, and the majority of the jobs our system processes (mainly emails, etc.) work fine. As far as I can tell, there aren’t any configuration issues.
Some of the jobs are more complex: they use Symfony’s Process component to run a mysqldump of a remote system, gzip the output, then use Laravel’s filesystem components to upload it to S3.
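For anyone who wants to reproduce the shape of this outside Laravel, here’s a rough Python analogue of that pipeline. It’s not our actual PHP code: `dump_cmd`, `upload` and `log` are hypothetical placeholders standing in for the mysqldump invocation, the S3 upload and the database log write.

```python
import gzip
import shutil
import subprocess
from pathlib import Path
from typing import Callable, Sequence


def dump_and_compress(dump_cmd: Sequence[str], out_path: Path) -> Path:
    """Stream the dump command's stdout through gzip into out_path."""
    with subprocess.Popen(list(dump_cmd), stdout=subprocess.PIPE) as proc:
        # Copy in 1 MiB chunks so the full dump never has to sit in memory.
        with gzip.open(out_path, "wb") as gz:
            shutil.copyfileobj(proc.stdout, gz, 1024 * 1024)
        returncode = proc.wait()
    if returncode != 0:
        raise RuntimeError(f"dump command exited with status {returncode}")
    return out_path


def run_backup(
    dump_cmd: Sequence[str],
    work_dir: Path,
    upload: Callable[[Path], None],
    log: Callable[[str], None],
) -> None:
    """Dump, gzip, log, then upload (hypothetical stand-ins for our job's steps)."""
    archive = dump_and_compress(dump_cmd, work_dir / "dump.sql.gz")
    # In our job, a database log write follows the gzip step; from the queue
    # this point is never reached.
    log("dump and gzip finished")
    upload(archive)
    log("upload finished")
```

Passing in `upload` and `log` as callables just keeps the sketch testable without S3 or a database.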
Running the raw shell commands from the server works fine, as does creating and running the Symfony Process from within Tinker.
But the exact same code run from a queue behaves very strangely. It performs the dump and the gzip perfectly, but then just stops. It’s supposed to log to the database immediately after that, but it never gets there. It stalls until it hits the configured ‘retry_after’ value and is retried, and once it has used up all of its tries it throws an exception, as you’d expect. Before that, no exceptions are thrown.
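To put numbers on that timeline, here’s a deliberately simplified model of a job whose attempts never report completion. It assumes each stalled attempt only becomes visible again once ‘retry_after’ seconds have passed, and that the failure surfaces on the first pickup after the last allowed attempt; it’s an illustration, not a description of Laravel’s internals.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Outcome:
    attempt_starts: List[int]  # seconds after dispatch when each attempt is reserved
    failed_at: int             # seconds after dispatch when the failure surfaces


def simulate_stalled_job(retry_after: int, tries: int) -> Outcome:
    """Simplified model (assumption): every attempt stalls and is only
    released again once retry_after seconds have elapsed."""
    if retry_after <= 0 or tries <= 0:
        raise ValueError("retry_after and tries must be positive")
    starts = [attempt * retry_after for attempt in range(tries)]
    # Under this model, nothing is raised until the pickup after the last try.
    return Outcome(attempt_starts=starts, failed_at=tries * retry_after)
```

Under those assumptions, `simulate_stalled_job(600, 3)` gives attempts at 0, 600 and 1200 seconds and a failure at 1800 seconds, which is why nothing shows up for so long.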
The databases in question are about 4GB, so not tiny but also not unreasonably large. They’re dumped from an RDS instance to the EC2 instance that runs our code, and the entire process only takes about 5 minutes when run from the shell. We’ve tried retry_after values ranging from 10 minutes to an hour, and it never completes.
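Some rough arithmetic for context, assuming decimal units (1 GB = 1000 MB): 4GB in about 5 minutes works out to roughly 13 MB/s, and even our shortest retry_after gives twice the shell runtime.

```python
def throughput_mb_per_s(size_gb: float, minutes: float) -> float:
    """Average throughput, assuming 1 GB = 1000 MB."""
    return size_gb * 1000 / (minutes * 60)


def headroom(retry_after_minutes: float, runtime_minutes: float) -> float:
    """How many times longer retry_after is than the observed shell runtime."""
    return retry_after_minutes / runtime_minutes


print(round(throughput_mb_per_s(4, 5), 1))  # 13.3
print(headroom(10, 5), headroom(60, 5))     # 2.0 12.0
```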
I thought it might be a max_execution_time problem, despite it working fine from Tinker, but that’s set to 0 (no limit). Similarly, I don’t think it’s a memory issue, as there are no signs of it crashing.
I’m at a total loss here. Does anyone have any ideas?