Authors sue OpenAI for copyright infringement, claim ChatGPT unlawfully 'ingested' their books

July 5, 2023 at 2:47 PM EDT

Authors Paul Tremblay and Mona Awad allege OpenAI "trained" its ChatGPT tool using their copyrighted books without permission or compensation.

Authors Paul Tremblay and Mona Awad filed a class-action complaint in California federal court alleging OpenAI broke copyright law by training its software to "ingest" their books without permission.

ChatGPT, a large language model, is "trained" by copying massive amounts of text and extracting expressive information from it to form a compilation of input material known as the "training dataset," according to the complaint filed in U.S. District Court in San Francisco.

The lawsuit says neither Tremblay nor Awad, both writers who live in Massachusetts, consented to the use of their copyrighted books as training material for ChatGPT. Nonetheless, "their copyrighted materials were ingested and used to train ChatGPT."

Tremblay owns registered copyrights in several books, including "The Cabin at the End of the World." Awad owns registered copyrights in several books, including "13 Ways of Looking at a Fat Girl" and "Bunny."

"Indeed, when ChatGPT is prompted, ChatGPT generates summaries of Plaintiffs’ copyrighted works — something only possible if ChatGPT was trained on Plaintiffs’ copyrighted works," the 17-page complaint says. "Defendants, by and through the use of ChatGPT, benefit commercially and profit richly from the use of Plaintiffs’ and Class members’ copyrighted materials."

The complaint cites a June 2018 paper in which OpenAI revealed it trained its GPT-1 tool on BookCorpus, a collection of "over 7,000 unique unpublished books from a variety of genres, including Adventure, Fantasy, and Romance."

"OpenAI confirmed why a dataset of books was so valuable: ‘Crucially, it contains long stretches of contiguous text, which allows the generative model to learn to condition on long-range information.’ Hundreds of large language models have been trained on BookCorpus, including those made by OpenAI, Google, Amazon, and others," the complaint notes.

Andres Guadamuz, a reader in intellectual property law at the University of Sussex, told The Guardian the complaint is the first filed against OpenAI over copyright law.

Joseph Saveri and Matthew Butterick, attorneys representing the authors, told the newspaper that books are ideal for training large language models because they contain "high-quality, well-edited, long-form prose," essentially forming "the gold standard of idea storage for our species."

"Defendants breached their duties by negligently, carelessly, and recklessly collecting, maintaining and controlling Plaintiffs’ and Class members’ Infringed Works and engineering, designing, maintaining and controlling systems — including ChatGPT — which are trained on Plaintiffs’ and Class members’ Infringed Works without their authorization," the complaint says.

The lawsuit seeks an award of statutory and other damages.

Fox News Digital reached out to OpenAI for comment Wednesday but did not immediately hear back.