

The Unbelievable Scale of AI’s Pirated-Books Problem


    The Atlantic went digging in LibGen, the insanely huge collection of 7.5 million pirated books used to train Meta’s Llama LLM:

    One of the biggest questions of the digital age is how to manage the flow of knowledge and creative work in a way that benefits society the most. LibGen and other such pirated libraries make information more accessible, allowing people to read original work without paying for it. Yet generative-AI companies such as Meta have gone a step further: Their goal is to absorb the work into profitable technology products that compete with the originals. Will these be better for society than the human dialogue they are already starting to replace?

    Also, I found this quote from a Meta Director of Engineering in the legal discovery output interesting: “The problem is that people don’t realize that if we license one single book, we won’t be able to lean into fair use strategy”. Huh.

    Tags: books knowledge papers meta llama llms law piracy ip libgen genai fair-use