There's only one Livewire (LW) endpoint that I use on every page of my website, and in Google Search Console over 70% of all crawl requests (out of more than 1M daily crawl requests) hit this single endpoint.
If I block it in robots.txt, the live URL test shows an error that looks like a black square covering all the content. I reproduced the same error in Chrome by simulating a 403 response on this endpoint, so I unblocked it in robots.txt.
What solutions do I have so I can lower these crawl requests?
@JussiMannisto For this kind of website and industry, HTML should make up the bulk of crawl requests (content is updated several times daily, which also means new pages), yet JSON accounts for over 70% of them. I want to reduce the number of crawl requests on the LW endpoint.
As mentioned, I would block it entirely in robots.txt (my most important content on the page is rendered server side), but I don't know how to eliminate that error.
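For reference, blocking the endpoint would just be a one-line Disallow rule, assuming the default `/livewire/` path prefix mentioned later in this thread:

```txt
User-agent: *
Disallow: /livewire/
```

Note that Googlebot would still index the pages themselves; it just couldn't fetch this endpoint while rendering them, which is what triggers the error described above.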
@AnnaNImo If you have dynamic pages that display data retrieved from APIs, then you cannot block those APIs in robots.txt. How would the crawler render the page if you did?
@JussiMannisto Only some sections of the pages use data from APIs; the main content is rendered server side. So if I block /livewire/ in robots.txt, the main content can still be seen, just not those sections. Plus there's the error I mentioned.
@AnnaNImo I don't use Livewire, but I wouldn't assume that it quietly ignores any errors and tries to render the rest of the page if you block the backend.
It's still not clear to me why you want to stop these requests. Do they cause issues?
If you don't want crawlers calling a Livewire endpoint, either don't use Livewire, or block the whole page from crawling. It makes no sense to me to serve a broken version of the page to crawlers. I don't know how bots will behave when your page tries to send a request to a disallowed URL, but it might get reported as an error, which can't be good for SEO.
Why would crawlers request the Livewire endpoint directly? Surely they are crawling the actual Livewire pages, and that is what's causing lots of traffic to the endpoint.
You are getting a lot of crawl requests; they have to be served somehow.
If you want to limit crawls, then you should block access to the Livewire pages themselves, not the route that THEY call.
@Snapey I don't have full pages of LW content, just some sections, and I'm OK with not showing them to search engines (the main content is rendered server side and stays accessible to them if /livewire/ is blocked). But if I block it, search engines see an error: a black rectangle covering the screen.
I’d return a lightweight cached HTML snapshot for bots on that endpoint. It cuts crawl load, avoids Livewire calls, and won’t trigger the visual testing error.
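To sketch the idea language-agnostically (the site itself is Laravel, so this is illustrative Python, not drop-in code): detect known crawlers by User-Agent and hand them a pre-rendered snapshot instead of the Livewire-driven page. The names `BOT_SIGNATURES`, `SNAPSHOT`, and `respond` are all hypothetical.

```python
# Hypothetical sketch: serve a pre-rendered HTML snapshot to known
# crawlers so they never trigger the Livewire endpoint.

# Substrings commonly found in crawler User-Agent headers (illustrative,
# not exhaustive).
BOT_SIGNATURES = ("googlebot", "bingbot", "duckduckbot")

# A static snapshot of the server-rendered page, refreshed on deploy or
# on a schedule; it omits the Livewire-driven sections.
SNAPSHOT = "<html><body><h1>Main content</h1></body></html>"

def is_bot(user_agent: str) -> bool:
    """Case-insensitive check for a known crawler signature."""
    ua = user_agent.lower()
    return any(sig in ua for sig in BOT_SIGNATURES)

def respond(user_agent: str, render_live) -> str:
    """Return the cached snapshot for bots, the live page for everyone else."""
    if is_bot(user_agent):
        return SNAPSHOT
    return render_live()
```

User-Agent sniffing is fragile (and serving bots different markup must stay close enough to the real page to avoid cloaking concerns), so treat this as a starting point, not a guarantee.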