There's only one Livewire (LW) endpoint that I use on every page of my website, and in Google Search Console over 70% of all crawl requests (out of more than 1M daily crawl requests) hit this single endpoint.
If I block it in robots.txt, the live URL test shows an error that looks like a black square covering all the content. I reproduced the same error in Chrome by simulating a 403 response on this endpoint, so I unblocked it in robots.txt.
What solutions do I have so I can lower these crawl requests?
@JussiMannisto For this kind of website and industry, HTML should make up the bulk of crawl requests (content is updated several times daily, which also means new pages), yet JSON accounts for over 70% of them. I want to reduce the number of crawl requests on the LW endpoint.
As mentioned, I would block it entirely in robots.txt (my most important content on the page is rendered server side), but I don't know how to eliminate that error.
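For reference, blocking the endpoint would just be a one-line Disallow rule, assuming the default `/livewire/` path prefix mentioned later in this thread:

```txt
User-agent: *
Disallow: /livewire/
```

Note that Googlebot would still index the pages themselves; it just couldn't fetch this endpoint while rendering them, which is what triggers the error described above.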
@AnnaNImo If you have dynamic pages that display data retrieved from APIs, then you cannot block those APIs in robots.txt. How would the crawler render the page if you did?
@JussiMannisto Only some sections of the pages use data from APIs; the main content is rendered server side. So if I block /livewire/ in robots.txt, the main content can still be seen, just not those sections. Plus there's the error I mentioned.
@AnnaNImo I don't use Livewire, but I wouldn't assume that it quietly ignores any errors and tries to render the rest of the page if you block the backend.
It's still not clear to me why you want to stop these requests. Do they cause issues?
If you don't want crawlers calling a Livewire endpoint, either don't use Livewire, or block the whole page from crawling. It makes no sense to me to serve a broken version of the page to crawlers. I don't know how bots will behave when your page tries to send a request to a disallowed URL, but it might get reported as an error, which can't be good for SEO.
Why would crawlers request the Livewire endpoint directly? Surely they are crawling the actual Livewire pages, and that is what's causing lots of traffic to the endpoint.
You are getting a lot of crawl requests; they have to be served somehow.
If you want to limit crawls, then you should block access to the Livewire pages themselves, not the route that THEY call.
@Snapey I don't have full pages of LW content, just some sections, and I'm OK with not showing them to search engines (the main content is rendered server side and stays accessible to them if /livewire/ is blocked). But if I block it, search engines see an error: a black rectangle covering the screen.
I’d return a lightweight cached HTML snapshot for bots on that endpoint. It cuts crawl load, avoids Livewire calls, and won’t trigger the visual testing error.
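To sketch the idea language-agnostically (the site itself is Laravel, so this is illustrative Python, not drop-in code): detect known crawlers by User-Agent and hand them a pre-rendered snapshot instead of the Livewire-driven page. The names `BOT_SIGNATURES`, `SNAPSHOT`, and `respond` are all hypothetical.

```python
# Hypothetical sketch: serve a pre-rendered HTML snapshot to known
# crawlers so they never trigger the Livewire endpoint.

# Substrings commonly found in crawler User-Agent headers (illustrative,
# not exhaustive).
BOT_SIGNATURES = ("googlebot", "bingbot", "duckduckbot")

# A static snapshot of the server-rendered page, refreshed on deploy or
# on a schedule; it omits the Livewire-driven sections.
SNAPSHOT = "<html><body><h1>Main content</h1></body></html>"

def is_bot(user_agent: str) -> bool:
    """Case-insensitive check for a known crawler signature."""
    ua = user_agent.lower()
    return any(sig in ua for sig in BOT_SIGNATURES)

def respond(user_agent: str, render_live) -> str:
    """Return the cached snapshot for bots, the live page for everyone else."""
    if is_bot(user_agent):
        return SNAPSHOT
    return render_live()
```

User-Agent sniffing is fragile (and serving bots different markup must stay close enough to the real page to avoid cloaking concerns), so treat this as a starting point, not a guarantee.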