Sep 11, 2017
Max "crawl speed" / Rate Limiting
Hi,
I have a "crawler" that scans a few different websites. The URLs are stored in a database table, and every hour I dispatch a job for each URL.
What would be a good approach to rate limiting, for example at most x requests per domain per minute? I tried the delay method, but since I don't know the crawler's response time in advance, the jobs accumulate and end up being handled close together.
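To make "max x per domain per minute" concrete, here is a rough, language-neutral sketch in Python of the kind of thing I have in mind. It is not tied to any queue library. `DomainRateLimiter`, `reserve`, and `max_per_minute` are names I made up for illustration, and I'm assuming the check would run when a job starts rather than when it is dispatched, so that a backlog can't bunch requests together.

```python
from collections import defaultdict
from urllib.parse import urlparse


class DomainRateLimiter:
    """Hypothetical helper: hands out evenly spaced start slots per domain."""

    def __init__(self, max_per_minute):
        # One slot every 60 / x seconds for each domain.
        self.interval = 60.0 / max_per_minute
        # domain -> earliest time the next request may start
        self.next_free = defaultdict(float)

    def reserve(self, url, now):
        """Return the time at which a request to `url` may start.

        `now` is the current time in seconds (passed in so the logic
        is easy to test). If the result is later than `now`, the job
        should wait or be put back on the queue for the difference.
        """
        domain = urlparse(url).netloc
        start = max(now, self.next_free[domain])
        self.next_free[domain] = start + self.interval
        return start


limiter = DomainRateLimiter(max_per_minute=6)
first = limiter.reserve("https://example.com/a", now=0.0)
second = limiter.reserve("https://example.com/b", now=0.0)
other = limiter.reserve("https://example.org/a", now=0.0)
print(first, second, other)
```

With several workers the `next_free` map would presumably have to live in shared storage (a cache or the database) instead of in memory, which is the part I'm least sure about.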
I'd appreciate any help!
Thanks