Bookmarking this in case I have to use it; I have a blog-related use case, and I don’t want LLM scrapers to kill my blog.
Anubis is a man-in-the-middle HTTP proxy that requires clients to solve (or to have already solved) a proof-of-work challenge before they can access the site. This is a very simple way to block the most common AI scrapers, because they are not able to execute JavaScript to solve the challenge. The scrapers that can execute JavaScript usually don’t support the modern JavaScript features that Anubis requires. If a scraper is dedicated enough to solve the challenge anyway, Anubis lets it through, because at that point it is functionally a browser.
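The proof-of-work idea is hashcash-style: the client brute-forces a nonce until a hash of the challenge plus the nonce meets a difficulty target. A minimal sketch in Python, with illustrative names and parameters (not Anubis’s actual wire format):

```python
import hashlib

def solve_challenge(challenge: str, difficulty: int) -> int:
    """Brute-force a nonce so that SHA-256(challenge + nonce) starts with
    `difficulty` zero hex digits. Cheap to verify, costly to compute at scale."""
    target = "0" * difficulty
    nonce = 0
    while True:
        digest = hashlib.sha256(f"{challenge}{nonce}".encode()).hexdigest()
        if digest.startswith(target):
            return nonce
        nonce += 1

# The server only needs one hash to verify the client's work:
nonce = solve_challenge("example-challenge", 2)
assert hashlib.sha256(f"example-challenge{nonce}".encode()).hexdigest().startswith("00")
```

Each extra hex digit of difficulty multiplies the expected work by 16, which is negligible for one visitor but ruinous for a scraper fetching millions of pages.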
The most hilarious part about how Anubis is implemented is that it triggers challenges for every request with a User-Agent containing “Mozilla”. Nearly all AI scrapers (and browsers) use a User-Agent string that includes “Mozilla” in it. This means that Anubis is able to block nearly all AI scrapers without any configuration.
Tags: throttling robots scraping ops llms bots hashcash tarpits