Anthropic, an artificial intelligence company, has agreed to pay $1.5 billion to settle a class action lawsuit filed by book authors who allege that the company used pirated copies of their works to train its chatbot.
If a judge approves the landmark settlement on Monday, it could mark a turning point in the legal battles between AI companies and the writers, visual artists, and other creative professionals who accuse them of copyright infringement.
The company plans to pay authors approximately $3,000 for each of the estimated 500,000 books covered by the settlement.
“This could be the largest copyright recovery we’ve seen,” said Justin Nelson, an attorney for the authors. “This marks a first in the era of AI.”
Authors Andrea Bartz, Charles Graeber, and Kirk Wallace Johnson, who sued last year, now represent a wider group of writers and publishers whose works were used to train the AI chatbot Claude.
In June, a federal judge issued a mixed ruling, finding that training AI chatbots on copyrighted books was not illegal but that Anthropic had wrongfully acquired millions of books from pirate websites.
Experts say that if Anthropic had not settled, it would likely have lost the case at a trial scheduled for December.
“We’re eager to see how this unfolds in the future,” commented William Long, a legal analyst at Wolters Kluwer.
U.S. District Judge William Alsup in San Francisco is scheduled to hear the terms of the settlement on Monday.
Why are books important to AI?
Books matter because they are a critical source of data, essentially billions of words, needed to build the large language models that power chatbots like Anthropic’s Claude and OpenAI’s ChatGPT.
Judge Alsup’s ruling revealed that Anthropic had downloaded more than 7 million digitized books, many of which are believed to be pirated. The initial download included nearly 200,000 titles from an online library named Books3, assembled by AI researchers outside OpenAI to match the vast collections used to train ChatGPT.
Bartz, a lead plaintiff in the case, wrote the debut thriller The Lost Night, which was among the titles found in the Books3 dataset.
The ruling found that Anthropic later took at least 5 million copies from the pirate website Library Genesis and at least 2 million copies from the Pirate Library Mirror.
The Authors Guild told its thousands of members last month that it expected damages of at least $750 per work, and potentially much more. The settlement’s higher award of about $3,000 per work likely reflects a smaller pool of affected titles once duplicates and works without copyright are removed.
On Friday, Authors Guild CEO Mary Rasenberger stated that the settlement represents “a tremendous victory for authors, publishers, and rights holders, sending a strong message to the AI industry about the dangers of using pirated works to train AI at the expense of those who can’t afford it.”
Source: www.theguardian.com
