New lawsuit accuses Bloomberg, Microsoft, and Meta of training AI with pirated books


The unresolved question of AI and copyright continues to simmer and has drawn another class action lawsuit. This time, Bloomberg and EleutherAI are among those named.

Mike Huckabee, former governor of Arkansas, and bestselling author Lysa TerKeurst are among the authors who have filed suit against Meta, Microsoft, and Bloomberg. They accuse the companies of using their work to train AI without their consent and illegally extracting “an enormous amount of value.”

Books3 dataset alleged to contain pirated books

The new lawsuit centers on the “Books3” dataset. The plaintiffs claim that it contains hundreds of thousands of illegally copied books and that the named companies used it to train their large language models.

Microsoft and Meta have not yet commented on the new lawsuit. A Bloomberg spokesperson says that Books3 was not used to train the commercial version of BloombergGPT, only the research model.



Also named in the suit is EleutherAI, an AI research organization that included the Books3 dataset in its large AI training dataset, The Pile. According to the complaint, Books3 contains approximately 183,000 books published over the past 20 years and makes up 12 percent of The Pile.

“While using books as part of datasets is not inherently problematic, using pirated (or stolen) books does not fairly compensate authors and publishers for their work,” the plaintiffs claim.

In their lawsuit, the authors seek unspecified damages and an injunction to stop the misuse of their works. The authors’ lawyer accuses the companies of developing large language models “by all means necessary—including theft of our authors’ books.”

One of many author lawsuits

Earlier, the Authors Guild announced that 17 prominent authors, including John Grisham, George R.R. Martin, and Jodi Picoult, have sued OpenAI for copyright infringement. A group of authors led by Pulitzer Prize winner Michael Chabon has filed suit against Meta and OpenAI in a federal court in San Francisco with nearly identical allegations.

The authors accuse OpenAI of using copyrighted books without permission to train AI, specifically as part of the Books dataset.

