Misusing the BIG-Bench canary string
Interesting: this blog post discusses using the BIG-Bench canary string, which is intended to keep data such as accuracy test cases out of LLM training corpora, as a general-purpose “don’t scrape me” flag on personal blogs. This seems like a more practical way to opt out of AI training, and one more likely to be respected, seeing as the scrapers don’t seem to reliably honour any of the other opt-out mechanisms.
(tags: blogging canaries opt-out scraping web ai llm openai chatgpt claude bing)