dsl

Horizon fails with a large number of jobs

i need to send a large number of notifications. i chunk the users in groups of 600, then for each user i create a batchable job that sends the notification (it goes out on up to three channels - mail, pusher and fcm - depending on user settings, and has to be slightly customised per user) and add it to a batch for that chunk. after processing all the users, i iterate over the batches and dispatch them:

$batches = [];
$users->chunk(600, function ($users) use (&$batches, $batchName, $notification) {
	$jobs = [];

	foreach ($users as $user) {
		$jobs[] = new BatchNotificationJob($user, $notification); // in this job there is basically just $this->user->notify($this->notification);
	}

	$batches[] = Bus::batch($jobs)->name($batchName)
		->allowFailures()
		->onQueue('low_notifications');

	unset($jobs); // Free up memory
});

foreach ($batches as $batch) {
	$batch->dispatch();
}

while this works just fine for a few thousand users, it strangely fails (not always, but most of the time) when i need to notify a larger number, like over 10k users:

it properly fills $batches, but stops while dispatching them. in the horizon UI you can see the batches with 600 jobs each, then suddenly a batch with zero jobs, and then it stops.
[screenshot: horizon batches]

the job which is creating the batches fails with
MaxAttemptsExceededException: "Job has been attempted too many times"

but no other error is logged anywhere else... when i open the details of the zero-jobs batch, it shows total jobs: 0, failed jobs: 0, but processed jobs somewhere between 1 and 600...
[screenshot: horizon batches]

i also tried to dispatch the batch for each chunk right away instead of storing into $batches and dispatching them later, but it did not solve the problem

does anyone have a clue what might cause that?

LaryAI

The issue you're experiencing with Laravel Horizon when processing a large number of jobs could be due to several factors. Here are some potential solutions and considerations to help you resolve the problem:

  1. Increase Timeout and Max Attempts:

    • Ensure that your job's timeout and max attempts are configured appropriately. You can set these in your job class:
    public $timeout = 120; // Increase as needed
    public $tries = 5; // Increase as needed
    
  2. Optimize Job Processing:

    • If possible, optimize the logic within your BatchNotificationJob to ensure it runs as efficiently as possible. This might involve reducing database queries or external API calls.

  3. Memory and Resource Management:

    • Since you're dealing with a large number of jobs, ensure your server has enough memory and resources to handle the load. Consider increasing the memory limit in your php.ini file or using a more powerful server.

  4. Queue Configuration:

    • Check your queue worker configuration. With Horizon, the timeout and tries options are set per supervisor in config/horizon.php. For a plain queue worker, they are passed as --timeout and --tries on the command line. For example:
    php artisan queue:work --queue=low_notifications --timeout=120 --tries=5
    
  5. Batch Size:

    • You might want to experiment with different batch sizes. While 600 might work for smaller loads, it could be too large for larger loads. Try reducing the batch size to see if it improves stability.

  6. Database and Queue Driver:

    • Ensure that your database and queue driver are configured for high performance. If you're using a database queue, consider switching to a more performant queue driver like Redis.

  7. Logging and Monitoring:

    • Enable detailed logging to capture more information about the failure. This can help you identify if there are specific users or notifications causing the issue.

  8. Horizon Configuration:

    • Check your Horizon configuration to ensure it's set up to handle the load. You might need to increase the number of workers or adjust the balance strategy.

  9. Error Handling:

    • Implement error handling within your job to catch and log exceptions. This can provide more insight into what's going wrong.

  10. Dispatching Batches Immediately:

    • Since you mentioned trying to dispatch batches immediately didn't solve the problem, ensure that the logic for creating and dispatching batches is correctly implemented and that there are no race conditions or concurrency issues.
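The logging and error-handling suggestions above could look roughly like the following. This is a minimal sketch, not the original job: the thread only says that BatchNotificationJob calls `$this->user->notify($this->notification)`, so the `$tries` and `$timeout` values, the log message, and the context keys are assumptions for illustration. It assumes PHP 8 and a standard Laravel application.

```php
<?php

namespace App\Jobs;

use Illuminate\Bus\Batchable;
use Illuminate\Bus\Queueable;
use Illuminate\Contracts\Queue\ShouldQueue;
use Illuminate\Foundation\Bus\Dispatchable;
use Illuminate\Queue\InteractsWithQueue;
use Illuminate\Queue\SerializesModels;
use Illuminate\Support\Facades\Log;
use Throwable;

class BatchNotificationJob implements ShouldQueue
{
    use Batchable, Dispatchable, InteractsWithQueue, Queueable, SerializesModels;

    // Assumed values; tune them to how long one notification really takes.
    public $tries = 1;
    public $timeout = 120;

    public function __construct(public object $user, public object $notification)
    {
    }

    public function handle(): void
    {
        // Skip the work if the surrounding batch was cancelled.
        if ($this->batch()?->cancelled()) {
            return;
        }

        try {
            $this->user->notify($this->notification);
        } catch (Throwable $e) {
            // Record which user failed, then rethrow so the queue still marks the job as failed.
            Log::error('BatchNotificationJob failed', [
                'user_id' => $this->user->id ?? null,
                'batch_id' => $this->batchId,
                'exception' => $e->getMessage(),
            ]);

            throw $e;
        }
    }
}
```

Rethrowing after logging keeps the batch's failed-job count accurate. Since the batch in the question uses allowFailures(), one failing user should not cancel the rest of the chunk.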

By addressing these areas, you should be able to improve the reliability of your job processing with Laravel Horizon. If the problem persists, consider breaking down the task further or consulting the Laravel community for more specific insights.

dsl

@LaryAI the queue is on redis. the supervisor for the queue has the default timeout=90 and tries=3, but the job creating the batches has $tries = 1 (if it fails, i don't want the first users to be notified again) and $timeout = 400. still, it fails after ~3 mins...
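One setting that may be worth checking with these numbers is retry_after on the redis connection in config/queue.php. The Laravel documentation recommends keeping a job's timeout a few seconds shorter than retry_after. If a job with $timeout = 400 runs on a connection whose retry_after is still 90 (the value in a fresh Laravel install), the queue can hand the job out a second time while the first attempt is still running, and with $tries = 1 that second pickup fails with MaxAttemptsExceededException. Whether that is what happens here is an assumption, since the thread does not show the queue config. The value 420 below is illustrative only.

```php
<?php

// config/queue.php (excerpt, illustrative values)
return [
    'connections' => [
        'redis' => [
            'driver' => 'redis',
            'connection' => 'default',
            'queue' => 'default',
            // Assumed value: longer than the longest job timeout (400 s in this thread),
            // so a still-running job is not released back onto the queue.
            'retry_after' => 420,
            'block_for' => null,
        ],
    ],
];
```

The same relationship applies to the timeout set on the Horizon supervisor in config/horizon.php: it should also stay below retry_after.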
