This Is How Meta AI Staffers Deemed More Than 7 Million Books to Have No “Economic Value”
This is jaw-dropping legal logic:
[Meta's] defense hinges on the argument that the individual books themselves are, essentially, worthless — one expert witness for Meta writes that the influence of a single book in LLM pretraining “adjusted its performance by less than 0.06% on industry standard benchmarks, a meaningless change no different from noise.”
Furthermore, Meta says that while the company “has invested hundreds of millions of dollars in LLM development,” it sees no market in paying authors to license their books because “for there to be a market, there must be something of value to exchange, but none of Plaintiffs’ works has economic value, individually, as training data.” (An argument essential to fair use, but one that also sounds like a scaled-up version of a scenario in which the New York Philharmonic board argues against paying individual members of the orchestra because the organization spent a lot of money on the upkeep of David Geffen Hall, and also, a solo bassoon cannot play every part in “The Rite of Spring.”)
As Paul Mainwood notes, this is the Sorites paradox (https://plato.stanford.edu/entries/sorites-paradox/):
- 1 grain of wheat does not make a heap.
- If 1 grain doesn’t make a heap, then 2 grains don’t.
- If 2 grains don’t make a heap, then 3 grains don’t.
- ...
- If 999,999 grains don’t make a heap, then 1 million grains don’t.
Therefore, 1 million grains don’t make a heap.
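The inductive chain above can be run literally. Here's a minimal sketch (the function name and structure are mine, purely for illustration): start from the base premise that one grain is no heap, then apply the inductive step one grain at a time. Because the step never lets heap-ness flip, the conclusion is forced no matter how many grains you add.

```python
def sorites(grains: int) -> bool:
    """Apply the paradox's premises literally, one grain at a time."""
    heap = False  # base premise: 1 grain does not make a heap
    for _ in range(2, grains + 1):
        # inductive step: if n grains don't make a heap,
        # then n+1 grains don't either — heap-ness never changes
        heap = heap or False
    return heap

print(sorites(1_000_000))  # -> False: a million grains is "not a heap"
```

The flaw, of course, is in the inductive step, not the arithmetic: "negligible at each increment" does not compose into "negligible in aggregate" — which is exactly the move being made about 7 million books.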
Tags: ml copyright ip books training llms meta llama pretraining paradoxes sorites-paradox