Proposed class action lawsuits filed on Friday in San Francisco federal court by Sarah Silverman, Richard Kadrey, and Christopher Golden accuse OpenAI, the maker of ChatGPT, and Meta, the parent company of Facebook, of using copyrighted content to train their chatbots.
The lawsuits claim, among other things, that OpenAI’s ChatGPT and Meta’s LLaMA were trained on illegally acquired datasets containing the authors’ works. According to the lawsuits, the datasets were obtained from “shadow library” websites such as Bibliotik, Library Genesis, Z-Library, and others, and the books are “available in bulk via torrent systems.”
Golden and Kadrey both declined to comment on the case, and Silverman’s legal team had not responded as of publication.
In the OpenAI lawsuit, the trio offers exhibits showing that ChatGPT will summarize their books upon request, which they say infringes their copyrights. The first book shown being summarized by ChatGPT in the exhibits is Silverman’s Bedwetter. Golden’s book Ararat and Kadrey’s book Sandman Slim are also used as examples. The complaint claims that the chatbot never bothered to “reproduce any of the copyright management information Plaintiffs included with their published works.”
The separate lawsuit against Meta alleges that the authors’ works appeared in the datasets used to train the LLaMA models, a group of four open-source AI models the company unveiled in February.
The complaint lays out, step by step, why the plaintiffs believe the datasets have illicit origins. For example, in a Meta paper describing LLaMA, the company cites ThePile, a training dataset assembled by EleutherAI, as one of the sources for its training data. ThePile was characterized in an EleutherAI article as being constructed from “a copy of the contents of the Bibliotik private tracker.” Bibliotik and the other “shadow libraries” identified, according to the complaint, are “flagrantly illegal.”
In both lawsuits, the authors state that they “did not consent to the use of their copyrighted books as training material” for the companies’ AI models. Each complaint contains six counts, covering various forms of copyright violation as well as negligence, unjust enrichment, and unfair competition. The authors are seeking statutory damages and restitution of profits.
The authors’ lawyers, Joseph Saveri and Matthew Butterick, are also representing authors Mona Awad and Paul Tremblay in a related case over OpenAI’s chatbot. Saveri has additionally filed lawsuits against AI companies on behalf of programmers and artists. Getty Images has filed an AI lawsuit as well, alleging that Stability AI, which developed the AI image generation tool Stable Diffusion, trained its model on “millions of images protected by copyright.”
Lawsuits like these are more than a headache for OpenAI and other AI companies: they are testing the very limits of copyright. As we have said on The Vergecast each time someone starts talking about copyright law, we will continue to see cases involving this subject for years to come.
Meta and OpenAI, a private company backed by Microsoft Corp., did not immediately respond to requests for comment on Sunday.