Robert Važan

No good guys in the AI copyright game

Can AI developers scrape content for free to train their (often commercial) AI models? This question is not really about justice. It's about money and power. Whether it's AI companies, publishers, or authors, all of them are just stealing from others in one way or another.

AI developers come in many sizes and forms. The biggest and best-funded ones demand free access to content, claiming fair use, but then keep their models private, give the public only restricted access to what is essentially an amalgamation of cultural heritage, and fill their terms of service with legal language prohibiting the use of their models to train other models. In the rare cases when they do release a model's weights, they typically attach a restrictive license that forbids commercial use, demanding payment for commercial use of their content while denying that they owe anything for the content of others. When they do pay publishers for content, it's often part of an exclusive deal that locks out other AI developers. Their anti-competitive tactics go as far as lobbying governments to effectively outlaw opensource AIs, the only fair AIs and at the same time the greatest long-term competitor of commercial AI companies.

AI companies, the new content thieves, are waging a war with publishers, the old content thieves. In the academic publishing racket, publishers use network effects to appropriate the results of publicly funded research. Social networks use the same network effects in their trillion-dollar heist of what should have been public domain works, effectively privatising huge chunks of our generation's cultural heritage. And now they are selling their users' content to AI companies as if it were their own, as in the recent deal between Reddit and OpenAI. Users are farmed like sheep and regularly sheared. Even where creatives get paid, it's usually peanuts compared to what publishers and platforms earn.

Then there are the creatives, or authors, themselves, who are usually portrayed as innocent and deserving, but let's not be deceived so easily. Authors rely on enforcement of copyright, which is entirely paid for by the public while authors just profit from it. Enforcement of copyright is bloody expensive. It creates legal risks for content users, especially those using random free content from the Internet. Some content users face legal risks so high that they have to check every piece of content for copyright violations. Expensive lawyers get involved. It's often cheaper to forgo the use of external content entirely. Copyright, along with other intellectual property, hinders innovation. Where content is paid for partially or wholly from public funds, attaching a restrictive license to it just reduces its reach and thus wastes public money to extract slightly higher private profits. Authors themselves pay none of these costs. They don't even pay copyright registration fees. They are free riders in a copyright system that burdens everyone else and is a net loss for the economy.

I wouldn't trust anyone in the AI copyright game. They are all wolves fighting over the same sheep. No matter who wins, it will be the public that loses, as political representatives get bought by AI companies or publishers, or pressured by influential creatives. If you ask me, what the public needs is the abolition of copyright, or its narrowing to only big, expensive works that are registered by their authors or publishers for a fee high enough to cover enforcement costs. Opensource AIs should then be trained on all content to give everyone access to what is effectively a compact, dense representation of our cultural heritage.