Have you tried a Honeypot?
There are also services that help with this, and some GitHub packages worth checking.
Also check Cloudflare; they have a service for this.
My NGINX server is experiencing high CPU load due to bots. They appear in my access log like this:
::ffff:34.195.212.30 - - [13/Jul/2024:17:29:09 -0300] "GET /sonhar-com-regar-flores/ HTTP/1.1" 503 428 "-" "ias-va/3.3 (former https://www.admantx.com + https://integralads.com/about-ias/)"
::ffff:65.109.99.207 - - [13/Jul/2024:17:29:12 -0300] "GET /letra/pessoa/page/3/ HTTP/1.1" 200 40501 "-" "Mozilla/5.0 (compatible; BLEXBot/1.0; +http://webmeup-crawler.com/)"
::ffff:3.81.17.186 - - [13/Jul/2024:17:29:13 -0300] "GET /sonhar-com-pai-dirigindo/ HTTP/1.1" 200 43444 "-" "Mozilla/5.0 (compatible; proximic; +https://www.comscore.com/Web-Crawler)"
::ffff:66.249.68.39 - - [13/Jul/2024:17:29:13 -0300] "GET /sonhar-com-escorpiao-e-lagosta/ HTTP/1.1" 200 43942 "-" "Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/125.0.6422.175 Mobile Safari/537.36 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
::ffff:185.191.171.14 - - [13/Jul/2024:17:29:13 -0300] "GET /biblia/kja/jo/39/22 HTTP/1.1" 200 8272 "-" "Mozilla/5.0 (compatible; SemrushBot/7~bl; +http://www.semrush.com/bot.html)"
And I'm trying to block the bot like this:
server {
    #[...]
    if ($http_user_agent ~* (SemrushBot|BLEXBot) ) {
        return 403;
    }
}
But it just doesn't work and I can't understand why. I've tried everything and the only thing that works is putting deny followed by the IP address. However, some bots have many IP variations, and it seems impractical to keep updating new IPs all the time.
Can anyone tell me why it's not working?
Note: I always restart all services after any change, and it still doesn't work.
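The snippet alone does not show why the rule is being skipped, but one thing worth checking is whether the requests are actually served by the server block you edited. As a rough sketch of an alternative layout, the match can be moved into a map in the http context so that every server block can reuse it. The variable name $blocked_bot is made up for this example, and the rest of your configuration is assumed, not known:

```nginx
# Goes in the http {} context: map cannot be declared inside server {}.
# $blocked_bot is a hypothetical variable name chosen for this sketch.
map $http_user_agent $blocked_bot {
    default                     0;
    # Case-insensitive regex against the User-Agent header.
    "~*(SemrushBot|BLEXBot)"    1;
}

server {
    #[...]
    # An "if" is false when the variable is empty or "0".
    if ($blocked_bot) {
        return 403;
    }
}
```

To test it, send a request with a spoofed agent, for example curl -I -A "SemrushBot" against your own host, and check for a 403. Running nginx -T prints the configuration nginx actually loaded, which helps confirm the rule is in the file and server block you expect.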
robots.txt is your first defence (if you don't want your site crawled).
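A minimal sketch of that, assuming the crawlers identify themselves with the SemrushBot and BLEXBot tokens seen in the log above and actually honor robots.txt. Compliance is voluntary, so this does nothing against bots that ignore the file:

```text
User-agent: SemrushBot
Disallow: /

User-agent: BLEXBot
Disallow: /
```

The file has to be served from the site root as /robots.txt, and crawlers may take a while to re-fetch it, so the load will not drop immediately.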