Famous Authors Sue Microsoft Over AI Training Using Their Books

A group of authors has accused Microsoft of using nearly 200,000 pirated books to train an artificial intelligence model. The accusation is the latest in a series of legal battles over copyright between creative professionals and tech companies.

Kai Bird, Jia Tolentino, Daniel Okrent, and others allege that Microsoft used pirated digital versions of their books to train its Megatron AI to respond to user prompts. Their lawsuit, filed in federal court in New York on Tuesday, is one of several high-stakes cases brought by authors, news outlets, and other copyright holders against tech firms over the alleged misuse of their work in AI training.

The authors are seeking a court order blocking the alleged infringement, along with statutory damages of up to $150,000 for each work that Microsoft is accused of misusing.

Generative AI products like Megatron can produce text, music, images, and videos based on user input. To develop these models, software engineers gather expansive databases of media and train AI to produce similar outputs.

The authors claim that Microsoft used a collection of nearly 200,000 pirated books to train Megatron, which generates text responses to prompts. According to the complaint, Microsoft used the pirated dataset to build a computer model that not only draws on the work of numerous creators and authors but is also designed to produce a range of expression replicating the syntax, voice, and themes of the copyrighted works.

A Microsoft representative has yet to respond to inquiries about the lawsuit, while the authors’ attorney declined to comment.

The lawsuit against Microsoft was filed shortly after a federal judge in California ruled that Anthropic's use of copyrighted material to train its AI could be considered fair use, while acknowledging that the company might still be liable for its use of pirated copies of books. It was the first US court decision on the legality of using copyrighted material without authorization for AI training. On the same day the complaint against Microsoft was filed, a California judge ruled in favor of Meta in a similar copyright dispute, attributing the decision more to the weakness of the plaintiffs' arguments than to the strength of the tech company's defense.

The conflict over copyright and AI emerged soon after the launch of ChatGPT and spans many forms of media. The New York Times has sued OpenAI for copyright infringement over the use of its article archive. Similarly, Dow Jones, the parent company of the Wall Street Journal and the New York Post, has filed a lawsuit against Perplexity AI. Major record labels are pursuing legal action against companies that make AI music generators. Getty Images has sued Stability AI over the startup's text-to-image product. Just last week, Disney and NBC Universal sued Midjourney, a company that operates a popular AI image generator, alleging that it misuses iconic film and television characters.

Tech companies argue that they make fair use of copyrighted material to create new, transformative content, and that being compelled to compensate copyright holders could hinder the burgeoning AI industry. OpenAI CEO Sam Altman has stated that developing ChatGPT would have been “impossible” without incorporating copyrighted works.

Source: www.theguardian.com
