Authors are suing OpenAI because their copyrighted works were used as training material for GPT models without their consent. The company denies the allegations on all counts, but still appears to be seeking fundamental legal clarification.
In early July, news broke that comedian Sarah Silverman and authors Chris Golden and Richard Kadrey had filed a lawsuit against OpenAI, alleging that their works had become part of the training material for OpenAI’s AI models. The allegations are:
- direct copyright infringement
- vicarious copyright infringement
- removal of copyright management information (DMCA)
- unfair competition
- unjust enrichment
- negligence
OpenAI neither disputes nor confirms that the named authors’ books were used for AI training. Even so, it is moving to dismiss allegations two through six, but not the first allegation. More on that later.
In its motion to dismiss, OpenAI invokes fair use, arguing that copyright should not stand in the way of technological innovation. It also points to how large language models work: they generate substantially new content rather than directly reproducing specific copyrighted passages from their training data, and they learn from vast amounts of text rather than from any single, specific work.
OpenAI cites several cases in which courts found that using copyrighted material in innovative and transformative ways did not infringe copyright. The claim that copyright management information, such as the author’s name, was removed is simply false and unsubstantiated, according to the company.
OpenAI seeks clarity from court ruling
As copyright specialist Andres Guadamuz points out on Twitter, despite this line of argument, OpenAI is explicitly not asking the court to dismiss the first claim, direct copyright infringement.
Guadamuz calls the move “surprising,” but suggests it is tactical: OpenAI may be hoping for a ruling that AI training falls under fair use.
If the direct copyright infringement claim does go to trial, “that would be big,” Guadamuz says. He believes OpenAI has a good chance of getting the other claims dismissed, as requested.
That would make direct copyright infringement the focus of the trial. Guadamuz also says that OpenAI might think it has a good chance of winning this case, and that “many copyright lawyers I’ve talked to in the last couple of months seem to agree.”
A ruling in this case could bring some clarity to the broader copyright debate over using text and image data for AI training, a debate that goes far beyond this case and involves other major AI companies such as Meta and Google.