EU AI Act analysis
The EU Artificial Intelligence Act has just been approved (press release, full text, votes, Wikipedia). This was the last vote in which things could have been changed. Now it's only a matter of technical checks (linguistic and legal) and a few years of ramping up its effects. It can potentially kill my business, so I did a more thorough reading of the law instead of relying on the limited and somewhat misleading press release and news articles. The law is 450 pages long, so you will perhaps appreciate my summary.
TLDR
The EU AI Act is a great victory for copyright holders and a huge loss for opensource. It introduces mandatory censorship and mandatory watermarks. It exposes AI developers and users to legal risks and legal uncertainty. Regulation of general-purpose model safety focuses on capabilities instead of intent, repeating the mistake of the infamous cookie law. Exemptions covering personal use, opensource, research, and development have unexpected limitations. There are lobbyists' fingerprints all over the law. Overall, it's bad news for artificial intelligence.
Great victory for copyright holders
The EU AI Act reaffirms the opt-out from data mining that was granted to copyright holders in Article 4 of the EU Copyright Directive (Wikipedia, full text). There are two problems with this opt-out. Firstly, the opt-out is an unreasonable extension of copyright, because data mining (including AI training) extracts information from the underlying representation, and pure information traditionally isn't copyrightable. Secondly, the opt-out is generally not exercised by individual content creators but rather by social networks, which use it as another mechanism to appropriate user-contributed content and with it our generation's cultural heritage.
The EU AI Act further aggravates the problem by requiring developers of general-purpose models to publish a copyright protection policy and to document source datasets, a measure specifically intended to enable copyright audits. Model developers have been keeping datasets secret for many reasons, one of which is to protect themselves from predatory lawsuits.
There are no workarounds. You are subject to the regulation even if you train in the US and then deploy in Europe, or if you host the model in the US and expose it as a remote service to users in Europe. Opensource models are not exempt either. The EU AI Act plugs all holes. You can still train and deploy outside Europe as much as you want, but reaching European users requires compliance.
Copyright holders, especially social media, will now start demanding fees to access content created by their users. While widely shared information can always be obtained elsewhere, information circulated in small communities on social networks will be inaccessible for training. High model quality also requires diversity of information sources, which will suffer when large parts of the Internet become inaccessible for training. Model developers will be under great pressure to just pay, and social networks will intensify efforts to capture as much content as possible.
Huge loss for opensource
Opensource has some exemptions from the requirements of the EU AI Act, but the main clause enabling this is a mess (Article 2, clause 12). It seems to have been accidentally negated, now claiming that opensource models are subject to the regulation. Its references to other sections of the regulation also appear to be broken, making it unclear which rules actually apply to opensource models. Furthermore, mentions of opensource elsewhere in the regulation seem to contradict it. Lawyers will eventually hash out the actual requirements placed on opensource models, but in the meantime opensource suffers from legal uncertainty and looming legal risks.
The definition of opensource in the EU AI Act is limited to free (as in free beer) and open AIs. If you offer support or other related services, your AI model will not be considered opensource, even if you publish the weights under a permissive license and other people use your model for free. A single paying customer is enough to lose all the privileges of opensource models.
Opensource developers already suffer legal difficulties when sharing training datasets, which amounts to redistribution of copyrighted material when done openly among unaffiliated developers. The EU AI Act does not address this difficulty. It instead reaffirms the data mining opt-out, which will make large parts of the Internet inaccessible to non-commercial opensource developers, and it burdens opensource AI developers with reporting requirements that serve only the interests of copyright holders.
Opensource development requires open sharing of code and models among contributors. As I understand it, this constitutes placing the model on the market if the model is available to developers in the EU. Certain kinds of opensource collaboration, for example taking turns in training the same model or parallel distributed training, are thus subject to the regulation. Courts might still dismiss this as a technicality that is not in the spirit of the law, but it is nevertheless a source of legal uncertainty for opensource developers. Private enterprises can meanwhile develop their AIs without restriction, even if that means transferring artifacts to employees in the EU.
Opensource is not exempt from the heavy regulation that applies to high-risk AI applications, which includes dozens of pages of rules in the law itself plus new standards and certification procedures. As no opensource developer will ever comply with all the rules, this effectively outlaws opensource in areas defined as high-risk. While regulating truly high-risk applications like self-driving cars is generally reasonable, I worry about gray areas, for example self-service medical models, which might fall under the regulation depending on the precise wording of the relevant laws. There is some leeway for opensource components in high-risk applications, but whole systems apparently have to be commercial.
The regulation spells doom for traditional opensource software as well. As AI components are increasingly integrated into all software to support essential features, most software will eventually fall within the scope of the EU AI Act. The regulation is not here to kill opensource AI. It's here to kill opensource as such.
Mandatory censorship of large models
General-purpose models may be classified as having "systemic risk", i.e. disaster-level risk. While general-purpose robots are obviously dangerous, because they can wield knives and guns, I have a hard time imagining how a disembodied AI could possibly become a serious threat.
The wording of the regulation is somewhat vague, but for now the "systemic risk" label is assigned only to large models, 100B+ parameters in the case of LLMs. Worrying about models being too smart is already a warning sign. Most risks associated with deployment of AI stem from the AI's imperfect reasoning. When an LLM recommends the wrong treatment for a disease, it does not do so out of malice but rather because of limited knowledge or intelligence. Larger models make fewer mistakes, and they are therefore actually safer to use.
What's more worrying is that "providers of general-purpose AI models with systemic risk shall ... mitigate possible systemic risk". In practice, that translates to mandatory censorship of all large models. The same requirements apply to fine-tunes of large models, so censorship cannot be legally removed. It is not clear whether base models will have to be censored too, for example by filtering training datasets, or whether they are exempted due to their nature.
Systemic risks include such trivialities as "dissemination of illegal, false, or discriminatory content" (never mind that media, social networks, and religions already do so at scale). However, what content is legal, true, and fair depends on context. And since models usually aren't aware of the full context, they cannot make content policy decisions reliably. For all the model knows, it could just be role-playing some character in a fictional story. Censorship is inappropriate for some models, and it is universally despised by users, because it is overbearing, disrespectful, triggers randomly on innocent prompts, and biases model output in unrealistic ways. When the AI is used for brainstorming, model censorship carves a digital exception to freedom of thought.
At the same time, the EU AI Act prohibits manipulative models. The gotcha here is that censorship as commonly implemented is manipulative by nature, because in addition to outright refusals, it secretly reshapes the model's interpretation of the user's prompt and subliminally alters all output. This squeezes AI developers into a tight compliance space between mandatory censorship and prohibited manipulation.
Mandatory watermarking
Mandatory watermarking (Article 50, clause 2) is a major violation of privacy, because it inconspicuously reveals what tools were used to create a given piece of content. Nothing prohibits AI companies from abusing watermarking to reveal additional information, for example the user's identity, a timestamp, or even the entire prompt.
Watermarking will result in false accusations of fraud, because watermarks are present even if the AI was used only in a supporting role to polish or translate human work. Watermarks are in some cases unreliable. For example, there's a non-trivial probability that an article-sized text is falsely identified as LLM-generated content.
Since when is incompetence a good thing?
When it comes to regulation of general-purpose models, the EU AI Act repeats the mistake of the infamous cookie law (ePrivacy Directive) in targeting capability instead of malice and negligence.
There are three kinds of threats that can be seen in AIs, humans, and even traditional software: malice, negligence, and incompetence. In AIs, malice translates to deliberately harmful applications. Incompetence usually stems from insufficient size or training of the model. Negligence comes in two forms: incorrect objectives and lax containment. Incorrect training or application objectives cause the model to do something other than what was intended. Containment refers to the model's access to the Internet, APIs, and the physical world. Lax containment magnifies the impact of the AI's mistakes, akin to how testing brakes or guns in public streets magnifies the danger inherent in these tasks.
By focusing regulation on the largest and most capable models, the EU AI Act rewards incompetence (poor model performance) and thus makes AIs less safe. It's like criminalizing people who are too smart or regulating software that is too useful. The law should have instead focused on malice (application intent) and negligence (quality of objectives and containment).
Enabling trolls
There are no exceptions from the GDPR (Wikipedia, full text). Imagine you train your model on web crawl data that accidentally includes some personal data, which gets baked into the model, and someone requests removal of their personal data per Article 17 of the GDPR. How do you comply? There are no tools that can edit knowledge out of an already trained model. Is removal from next year's version of the model considered "without undue delay"? Are users forced to upgrade? What if you don't plan to publish another version?
There are no blanket exceptions for minor, random, and accidental breaches of this or other laws. Training datasets are huge. There's no way to ensure they comply with existing laws perfectly. Models sometimes run unattended and create content or take actions on behalf of the user without manual review. Outputs of AI models are, however, unreliable and even include a component of randomness. There's no way to ensure that an unattended model never produces illegal output. While some laws explicitly exempt accidental violations, I think this is not universal, and there are hard-edged laws out there that will be a permanent threat to developers and users of AI models.
Exemptions
If you are looking for a way to avoid compliance with the EU AI Act, there are a few narrow exemptions to consider:
- Personal non-professional deployment is completely exempt. Whoever supplies you with the AI system (development and distribution) is not exempt from the law though.
- Opensource is intended to be exempt from some requirements according to non-binding recitals, but the actual binding law text is a mess, so the exact rules are not known at the moment. This exemption definitely does not apply to banned and high-risk AI, nor to general-purpose models with "systemic risk".
- Research, development, and lab tests are exempt. Real-world tests are not. Note that opensource development implicitly involves continuous publishing and distribution, so it's not clear whether development by an opensource team is exempt or not.
- Scientific research is exempt as long as the model does not have another purpose. You cannot claim "it's research" and then reuse the model commercially or as a consumer-oriented product. The regulation still applies to such a repurposed model.
- Specialized AI systems have sort of an exemption, because the rules for general-purpose models (Chapter V) do not apply to them: no systemic risk considerations, no documentation requirements, no source dataset transparency, no compulsory assistance to regulators, and no authorised representatives, at least as long as the specialized model is not considered high-risk.
- Some AI systems used in high-risk applications are not themselves considered high-risk under some narrow conditions. This applies to various auxiliary, non-essential functions detailed in the law. These systems still require a certain amount of documentation and possibly registration.
Impact on SourceAFIS
I develop the opensource fingerprint recognition engine SourceAFIS and provide custom development on top of the opensource version. The EU AI Act can potentially ruin my business, which was the original reason I reviewed the law.
Fortunately, it looks like I am okay. Only remote biometric identification is regulated. Remote means without the person's active involvement (think face recognition using a camera). Most applications of fingerprint recognition are, however, local, because the person must be present and cooperating to have their fingerprints scanned. Fingerprint capture and recognition can be remote only in the case of latent prints (taken off surfaces the person touched) and in the exotic case of using a high-resolution camera at a distance, both of which are uncommon in commercial applications of fingerprint recognition systems.
Biometric verification (1:1) of claimed identity is even explicitly excluded from the scope of the law. It's not clear whether this also covers claims of group identity (e.g. claiming to be an employee at the entrance to company premises), which technically requires identification (1:N) to implement, but in any case, whether it's verification or identification, non-remote applications are not regulated under the EU AI Act.
Goofs
Some aspects of the EU AI Act are so ridiculously foolish or absurd they make you laugh:
- Military AIs are exempt, including those developed and distributed by private entities. You are reading that right. If you arm an AI with guns, it releases you from all obligations under this regulation. Even AIs with army-sized firepower are completely unconstrained. While this is an inevitable consequence of the limited competencies of the EU, it makes a farce of the rest of the law.
- The limited exemption for opensource is apparently accidentally negated, so now the law actually states that opensource is subject to the regulation without exception.
- Emotion recognition systems are classified as high-risk. In workplaces and education, they are completely prohibited. This effectively outlaws empathy. In the European Union, AIs must be cold and indifferent. I am also wondering how this is going to be enforced when all models inevitably soak up empathy from training data.
- Since models are classified as carrying "systemic risk" based solely on their size, a smart virtual girlfriend is dangerous whereas a compute-limited robotic arm capable of wielding weapons is safe.
- The chosen threshold for general-purpose models with "systemic risk" is 10^25 FLOPs, which is, per my very conservative estimates, less than the amount of computation a single human brain performs over its lifetime (see the back-of-envelope sketch after this list). That means we have 8 billion dangerously smart natural intelligences (NIs) in the world already.
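To put that last point in perspective, here is a back-of-envelope sketch in Python. The brain's processing rate is not something anyone can measure precisely; the 10^16 operations per second figure below is just one of the commonly cited estimates (published figures vary by several orders of magnitude), so treat the result as an illustration of scale rather than a fact.

```python
# Back-of-envelope comparison of the EU AI Act's 10^25 FLOPs "systemic risk"
# threshold with the total computation of one human brain over a lifetime.
# The brain's operations-per-second figure is an assumed estimate, not a measured fact.

THRESHOLD_FLOPS = 1e25            # training compute threshold in the EU AI Act

brain_ops_per_second = 1e16       # assumed estimate; published figures vary widely
lifetime_years = 80
seconds_per_year = 365.25 * 24 * 3600

lifetime_ops = brain_ops_per_second * lifetime_years * seconds_per_year
print(f"Brain lifetime computation: {lifetime_ops:.1e} operations")     # ~2.5e25
print(f"Relative to threshold: {lifetime_ops / THRESHOLD_FLOPS:.1f}x")  # ~2.5x
```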
Future
The EU AI Act still needs to go through linguistic and legal checks, the corrigendum procedure, and final approval from the European Council. It will then gradually come into effect over the course of two years.
Drafting of the EU AI Act took years and it already required substantial revision after ChatGPT was released. More such revisions are likely to be needed in the near future and they will likely be substantial enough to be beyond the scope of the faster, simpler delegated acts.
The 10^25 FLOPs threshold will quickly be overcome given current investments in LLMs. The biggest Llama3 is rumored to be big enough to be the first opensource model over the threshold. The EU AI Act allows for later updates of the threshold, but these have to be justified, and the justifications allowed by the law suggest a downward revision of the threshold rather than an upward one.
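For intuition on how easily the threshold is crossed, here is a sketch using the widely used rule of thumb that training compute is roughly 6 × parameters × training tokens. The parameter and token counts below are illustrative assumptions, not confirmed figures for any particular model.

```python
# Rough training-compute estimate using the common approximation:
#   total training FLOPs ≈ 6 * parameter count * training tokens
# The example model sizes and token counts are illustrative assumptions.

THRESHOLD_FLOPS = 1e25

def training_flops(parameters: float, tokens: float) -> float:
    """Approximate total training compute in FLOPs."""
    return 6 * parameters * tokens

examples = {
    "70B params, 2T tokens": training_flops(70e9, 2e12),      # ~8.4e23
    "400B params, 15T tokens": training_flops(400e9, 15e12),  # ~3.6e25
}

for label, flops in examples.items():
    status = "over" if flops > THRESHOLD_FLOPS else "under"
    print(f"{label}: {flops:.1e} FLOPs ({status} the 1e25 threshold)")
```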
A future revision of the law might theoretically remove the threshold completely and switch to monitoring malicious and negligent applications. I wouldn't be too hopeful though, because the cookie law is broken in the same way and it was never fixed.
The EU AI Act will enable future regulation creep, which will be as easy as adding items to the list of high-risk AIs, adding reasons to classify more general-purpose AIs as having systemic risk, or expanding standards and certification procedures.
The regulation will render a lot of opensource models illegal in the EU. Users will not accept this; they will continue to use the models illegally and share them over P2P networks. That will result in mass criminalization, but with no actual penalties for small-time users.