
Core Summary
Adobe has been hit with a proposed class-action lawsuit filed by Oregon author Elizabeth Lyon, who alleges the company used pirated copies of numerous books, including her own, to train its SlimLM AI model. The lawsuit claims Adobe trained the model on the SlimPajama dataset, which allegedly contains the controversial Books3 collection of 191,000 books that has been at the center of several AI training disputes. This case joins a growing wave of litigation against tech companies over the unauthorized use of copyrighted materials in AI training datasets.
Key Points
- Elizabeth Lyon, an author of guidebooks on writing non-fiction, filed the proposed class-action lawsuit against Adobe.
- The lawsuit alleges Adobe’s SlimLM model was trained using the SlimPajama-627B dataset, which allegedly contains pirated books from the Books3 collection.
- Books3, a collection of 191,000 books, has been cited in multiple lawsuits against tech companies over its use in AI training.
- Similar lawsuits have been filed against Apple and Salesforce for allegedly using copyrighted materials via the RedPajama dataset.
- In September, AI company Anthropic agreed to pay $1.5 billion to settle a similar lawsuit brought by authors, a deal that could serve as a benchmark for future claims.
- Adobe describes SlimLM as a small language model optimized for document assistance on mobile devices.
Positive Perspective
The increasing litigation around AI training data could lead to more transparent and ethical practices in the AI industry. These lawsuits may establish clearer guidelines for proper licensing and compensation for creators whose works are used in AI training. The Anthropic settlement of $1.5 billion demonstrates that content creators may eventually receive fair compensation for their intellectual property used in AI development, potentially creating new revenue streams for authors and publishers in the digital age.
Negative Perspective
The proliferation of lawsuits over AI training practices could significantly slow AI innovation and development. Legal costs and potential settlements may force smaller AI companies out of the market, concentrating AI power among the largest tech corporations that can afford licensing deals. Additionally, if companies are required to obtain explicit permission for all training data, building comprehensive AI systems could become practically impossible, hampering progress in generative AI.
Noteworthy Angle
This case highlights a fundamental tension in AI development that has yet to be resolved: AI systems require massive datasets to achieve their capabilities, but the legal infrastructure governing intellectual property was not designed for the era of machine learning. The industry is operating in a gray area where technological capabilities have outpaced legal frameworks, creating a situation where nearly every major AI company could potentially face similar litigation. The outcome of cases like Lyon’s against Adobe may fundamentally reshape how AI companies approach training data acquisition and attribution.
