
regulartoaster

Problem running a 30-minute scraping script in Laravel

I have a site that scrapes quite a bit of data about courses, teachers, etc. from my university and automatically generates schedules for students. There's a lot of computation on the server.

Every month, I need to scrape all the data again to see what has changed. The script takes about 30 minutes to finish. On my local Apache server I can set the execution time to unlimited, but on my production server (DigitalOcean, Laravel Forge) that's not a good option: I get timeout errors from nginx that I haven't been able to resolve.

I know that making the browser wait 30 minutes is never a good solution. Can I use Laravel jobs to run this script without timeout problems? Is there some way to run this script in the background with PHP CLI or something?

My current solution is to scrape everything from my local server and then export the DB to my production server. (ugly, I know)

Thank you!

Duffleman

Even if you use the CLI or a queued job, you may still run into a maximum runtime limit. Can you configure that limit on your production server?

If it isn't configurable, it may be worth breaking the work up into smaller jobs. I find that if a job takes longer than a few minutes, it's doing too much in one go.
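A minimal, framework-free sketch of the "smaller jobs" idea, assuming the scrape can be described as a flat list of URLs. The URLs, the batch size of 50, and the `buildBatches` helper are hypothetical; in a Laravel app, each batch would become one queued job short enough to finish well inside a worker's timeout.

```php
<?php
// Hypothetical helper: split one long scrape into fixed-size batches so that
// each batch can be dispatched as its own short-running job.
function buildBatches(array $urls, int $batchSize): array
{
    if ($batchSize < 1) {
        throw new InvalidArgumentException('Batch size must be at least 1.');
    }
    // array_chunk keeps the original order; the last batch may be smaller.
    return array_chunk($urls, $batchSize);
}

// Hypothetical course pages; the real list would come from the university site.
$urls = [];
for ($i = 1; $i <= 120; $i++) {
    $urls[] = "https://example.edu/courses/{$i}";
}

$batches = buildBatches($urls, 50);
foreach ($batches as $index => $batch) {
    // In Laravel, this is where one queued job per batch would be dispatched.
    printf("Batch %d: %d URLs\n", $index + 1, count($batch));
}
```

With 120 URLs and a batch size of 50, this produces three batches of 50, 50 and 20 URLs, so no single unit of work has to cover the whole 30-minute scrape.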

ohffs

It's not very Laravel-ish, but Scrapy is very good for this kind of thing.

regulartoaster

@Duffleman I've tried increasing the limit on the production server according to this guide: https://easyengine.io/tutorials/php/increase-script-execution-time/

but I still keep getting the gateway timeout error. I'm looking into splitting it up into smaller jobs; it just requires a lot of refactoring. Thanks for the idea. I'll start doing that and see how it turns out.

@ohffs I wish I had used Scrapy. I used PHP Simple HTML DOM Parser with cURL. The project has gotten pretty big, so it would be hard to redo it all in Scrapy, but I'm considering it at this point. Thanks.

bathan

You should run the heavy lifting with the PHP CLI, directly from the command line. As the documentation at http://php.net/manual/en/info.configuration.php#ini.max-execution-time says:

max_execution_time integer
This sets the maximum time in seconds a script is allowed to run before it is terminated by the parser. This helps prevent poorly written scripts from tying up the server. The default setting is 30. When running PHP from the command line the default setting is 0.

When running from the console, there is no maximum execution time by default. Hope this helps!
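A small sketch for checking which limit actually applies, assuming the script is saved under a hypothetical name such as `check-limit.php`. It reads `PHP_SAPI` and `ini_get('max_execution_time')`, and refuses to continue when it is not running under the CLI. Note that raising PHP's limit does not change nginx's own FastCGI read timeout, which can still end a long web request with a gateway timeout.

```php
<?php
// Report how long this PHP process is allowed to run.
// Under the CLI SAPI, max_execution_time defaults to 0 (no limit);
// under a web SAPI (e.g. PHP-FPM behind nginx) the php.ini default is 30 seconds.
function describeTimeLimit(): string
{
    $sapi = PHP_SAPI;                              // e.g. "cli" or "fpm-fcgi"
    $limit = (int) ini_get('max_execution_time');  // 0 means unlimited
    $limitText = $limit === 0 ? 'unlimited' : "{$limit} seconds";
    return "SAPI: {$sapi}, max_execution_time: {$limitText}";
}

echo describeTimeLimit(), PHP_EOL;

if (PHP_SAPI !== 'cli') {
    // A long scrape should not be started from a web request: nginx's
    // FastCGI timeout applies regardless of PHP's own limit.
    exit("Run this script from the command line instead.\n");
}

// ... the long-running scrape would start here (hypothetical) ...
```

On a default CLI setup, `php check-limit.php` should print `SAPI: cli, max_execution_time: unlimited`.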

willvincent

I'd probably implement this as a bunch of jobs that get added to a queue, and maybe use Guzzle for the actual requests and such. Have an Artisan command that queues up all the jobs, run it once a day with cron, then let your queue workers do all the processing in the background.

You know, if you want it to be Laravel-ish. ;)
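A framework-free sketch of the producer/worker shape described above, using PHP's built-in `SplQueue`. This is not Laravel's queue API: in Laravel, the producer would be an Artisan command dispatching job classes, and `php artisan queue:work` would be the worker. The function names and URLs here are hypothetical.

```php
<?php
// Producer: turn the work list into small job payloads on a queue.
function enqueueScrapeJobs(SplQueue $queue, array $urls): void
{
    foreach ($urls as $url) {
        $queue->enqueue(['type' => 'scrape', 'url' => $url]);
    }
}

// Worker: pop and handle one job at a time until the queue is empty.
function runWorker(SplQueue $queue, callable $handler): int
{
    $processed = 0;
    while (!$queue->isEmpty()) {
        $job = $queue->dequeue();
        $handler($job);  // each job is short, so no single step runs for 30 minutes
        $processed++;
    }
    return $processed;
}

$queue = new SplQueue();
enqueueScrapeJobs($queue, [
    'https://example.edu/courses',   // hypothetical URLs
    'https://example.edu/teachers',
]);

$seen = [];
$count = runWorker($queue, function (array $job) use (&$seen): void {
    // A real handler would fetch $job['url'] (e.g. with Guzzle) and store the result.
    $seen[] = $job['url'];
});

echo "Processed {$count} jobs", PHP_EOL;
```

The point of the design is that the command only enqueues work and returns quickly, while the long total runtime is spread across many short jobs handled by background workers.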

geocine

I hope you don't mind if I join the discussion, as I'm doing some scraping right now, plus some other background jobs through the PHP CLI. Now I want to create a web interface for these scripts. Could you share your setups? How would you accomplish this?
