How Copyrighted Data Is Essential for AI Training and Why Avoiding It Is ‘Impossible’
OpenAI on Copyrighted Data and the Future of AI
OpenAI’s Bold Assertion
- OpenAI claimed it would be “impossible” to develop leading AI systems without using vast amounts of copyrighted data
- According to OpenAI, tools like ChatGPT need such broad training data that limiting it to public-domain material would not produce systems that meet today’s needs
- Because copyright now covers virtually every sort of human expression, including much of the content published online, excluding protected works would leave little usable training data
Reaction and Potential Lawsuits
OpenAI faces multiple lawsuits over its use of copyrighted material, including one filed by The New York Times alleging copyright infringement.
Legal experts expect vigorous courtroom battles over alleged infringement by systems designed to absorb enormous volumes of protected text, media, and other creative work.
OpenAI’s Actions and Defenses
OpenAI is relying on a broad interpretation of the fair use doctrine to justify training on vast amounts of copyrighted data.
In effect, the company is betting that courts will reject a maximalist reading of copyright and permit the large-scale copying on which its AI development depends.
Continued Implications
As advanced AI becomes increasingly adept at emulating human expression, the company’s unwillingness to change how it collects data and trains its models may lead to further legal challenges.