Artists are suing OpenAI, accusing it of violating copyright law by training its content-generating tools on their work without permission.
The lawsuits, both filed in the U.S. District Court for the Northern District of California in San Francisco, say ChatGPT generates accurate summaries of the plaintiffs' books, and highlight this as evidence that the software was trained on their work.
"OpenAI made copies of Plaintiffs' books during the training process of the OpenAI Language Models without Plaintiffs' permission. Specifically, OpenAI copied at least Plaintiff Tremblay's book The Cabin at the End of the World; and Plaintiff Awad's books 13 Ways of Looking at a Fat Girl and Bunny," according to court documents [PDF] in the first suit.
In the second suit, Silverman et al [PDF] make similar claims.
As extractive as the AI companies are (they drink everyone's milkshake and will reap the commercial benefits of promising and powerful technology), there's a "wait a minute!" problem here. Whatever the courts decide will surely be applied to humans, too. Artists already have to be extremely careful about admitting their influences. If copyright is held to cover training, ingesting, learning, and summarizing (which come into it precisely because the copyrighted work isn't substantially present in the model or generated by it), artists will be the last to benefit, and the first to learn that their own work now belongs to people they've never even heard of.
That these models were reportedly trained on material from pirate torrent trackers is really something, isn't it? No scruples at all. The problem, maybe, is how to stop them without handing the win to the corporate incumbents who stand to benefit most from an expanded copyright regime.