General AI is here
Something fundamental has changed in the field of artificial intelligence in the last few years. Recently developed large language models (LLMs) like ChatGPT are the first truly general-purpose AIs in history. This is a major milestone in AI development. Let's look at how it happened and what we can expect next.
General or strong AI is defined as an AI that performs a wide range of tasks at a human level of performance, without being explicitly programmed for individual tasks. Language models have their limitations, but they undeniably satisfy this definition.
Although I have been aware of the text generation capabilities of language models for some time, I did not in my wildest dreams expect an AI based on a rather simple transformer architecture to complete non-trivial programming assignments correctly, in any programming language, to write cleaner code than I would, to describe how the program works, to predict its outputs on example input, and even to correctly estimate big-O complexity and justify the estimate with references to the code. No wonder people find this thing disturbing and scary.
This surprisingly good performance is not due to some new algorithm, but rather due to the nature of the medium. Language is very flexible. It can describe a very wide range of mental and information processing tasks. And the Internet is a huge training dataset that covers many (likely thousands, possibly millions) of those tasks with lots of examples, viewpoints, and variations. Language models inherit the flexibility of language and the scope of the Internet.
What it can do
The range of capabilities is quite impressive. According to my tests, ChatGPT can:
- answer questions,
- write essays, poems, articles, homework, even whole books (fact and fiction),
- speak a wide variety of languages and translate between them,
- write, fix, and analyze source code,
- have natural-sounding conversations on a wide range of topics,
- perform sentiment analysis,
- summarize longer text,
- correct spelling and grammar,
- perform calculations and solve simple math problems,
- alter style and voice of the text,
- translate between programming languages,
- remember the recent history of a conversation, answer follow-up questions, and accept clarifications,
- role-play characters, even simulate whole role-play games,
- and recognize nonsensical questions as such.
I am pretty good at prompt engineering, but from what I have observed, ChatGPT performs well even in the hands of a naive user. You can try ChatGPT here (requires signup).
Implementations
ChatGPT is currently the most famous implementation, but the technology is fairly straightforward and many companies are developing their own implementations, for example Bing Chat, ChatSonic, and YouChat. Many of these are based on OpenAI's algorithms and models. OpenAI itself makes a number of alternative language models available via its OpenAI Playground.
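For a sense of what this looks like outside the chat window, here is a minimal sketch of querying one of those Playground models through OpenAI's Python package as it existed at the time of writing; the prompt, model name, and parameters are illustrative choices, not recommendations:

```python
# Minimal sketch: query an OpenAI language model programmatically.
# Assumes the openai package (pip install openai) and an API key
# from the OpenAI dashboard.
import openai

openai.api_key = "sk-..."  # your own API key goes here

response = openai.Completion.create(
    model="text-davinci-003",  # one of the Playground models
    prompt="Summarize the plot of Hamlet in two sentences.",
    max_tokens=100,            # cap on the length of the reply
    temperature=0.7,           # higher values give more varied output
)
print(response.choices[0].text.strip())
```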
The current open-source challenger is BLOOM from Hugging Face, which requires a mere 8 A100 GPUs with 80GB of RAM each. Petals uses the same model, but distributes it torrent-style across a public AI swarm, which you can join with a modest 8GB GPU.
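To give an idea of what running the full model involves, here is a hedged sketch of loading BLOOM through the Hugging Face transformers library; it assumes the transformers and accelerate packages and, as noted above, a machine with several 80GB GPUs for the full 176B-parameter model:

```python
# Sketch: load BLOOM locally with Hugging Face transformers.
# Running the full model requires roughly the 8x A100 80GB setup
# mentioned above; smaller BLOOM variants fit on lesser hardware.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bigscience/bloom")
model = AutoModelForCausalLM.from_pretrained(
    "bigscience/bloom",
    device_map="auto",           # shard layers across available GPUs
    torch_dtype=torch.bfloat16,  # halve the memory footprint
)

inputs = tokenizer("The capital of France is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```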
Smaller models will run on any desktop, but they are neither particularly useful nor very versatile, so they don't really qualify as strong AI. Specialized models (e.g. code completion) will run on desktop hardware with good results, but they are narrow AI by definition.
Limits
Even though large language models satisfy the definition of general AI, they do have serious limitations. First of all, they are limited to text. They cannot process audio and video data. They cannot use tools, not even virtual tools. And they cannot be embodied. Multimodal AIs are in development, and they promise not only a wider range of capabilities but also a better understanding of the world, because concepts like size, weight, and three-dimensional arrangement of objects are hard to learn from text alone. Multimodal training is therefore likely to improve accuracy even on purely textual queries.
Secondly, language models have limited memory. They require you to start new conversations regularly to perform well, because they tend to deteriorate the deeper you go into any single conversation. Even if you stubbornly stick to one long conversation, they will forget what was said earlier. Since learning is decoupled from inference, they cannot learn anything new as you use them. You cannot train them simply by talking to them. Instructions are only taken into account for a limited time before the AI forgets them.
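The forgetting follows directly from the fixed context window: at inference time the model sees only the tokens that fit, so a chat front end has to drop older turns. The toy sketch below (my own illustration, not any vendor's actual code; count_tokens stands in for a hypothetical tokenizer) shows the mechanism:

```python
# Toy illustration of why chat AIs "forget": the model attends to a
# fixed-size context window, so older turns are silently dropped once
# the transcript outgrows it.
CONTEXT_LIMIT = 4096  # tokens the model can see at once (illustrative)

def build_prompt(history, new_message, count_tokens):
    """Keep only the most recent turns that fit into the context window."""
    turns = history + [new_message]
    kept, used = [], 0
    for turn in reversed(turns):  # walk from the newest turn backwards
        cost = count_tokens(turn)
        if used + cost > CONTEXT_LIMIT:
            break  # everything older than this point is forgotten
        kept.append(turn)
        used += cost
    return "\n".join(reversed(kept))
```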
Third, AIs as they exist today have an odd preoccupation with providing an answer, any answer, even if it is completely wrong. I don't think this is inherent in the nature of the AIs; I think they are trained this way. Nobody wants an AI that keeps saying "I don't know" or "I am not sure". People say "I don't know" when they anticipate that their response will be scrutinized and criticized. Current language models can anticipate follow-up questions and subsequent critical responses, i.e. they are able to predict that they will be caught, but they aren't wired and trained to use that ability when choosing responses. As a consequence, current AIs have a strong tendency to produce eloquently written bullshit, especially when asked about something they know little about. They go as far as fabricating references to non-existent research or citing real research papers that do not actually cover the topic.
Aside from these natural limitations, AI companies cripple their AIs with content policies. The AIs are usually prohibited from generating content that is illegal, offensive, violent, or sexual, as well as anything that sounds like serious medical or legal advice. There is no bound on what could be considered offensive. The AIs do not take into account the subjective ethics of the user. They rely on "common standards of behavior and ethics", whatever that means. This rules out quite a few useful applications. I think there is public pressure on AI companies to keep the restrictions in. The only way to avoid them is to run AIs directly on users' computers.
In addition to policy, AI companies enforce certain self-knowledge and behavioral patterns. This gives the AI a nice and helpful personality, but it also constrains the spectrum of tasks the AI can complete. The behavioral preferences are so strong that the AI often appears to follow hardcoded scripts. This can be quite annoying when you need something that runs contrary to the AI's personality. I found it is often easier to use a plain model like GPT-3 instead of ChatGPT, because ChatGPT requires excessive prompt engineering to work around its personality and its assumptions about appropriate behavior.
Despite all these flaws, language models are highly valuable in a number of applications. ChatGPT probably has the fastest growing user base in the history of software. Language models will however face serious obstacles in many fields, especially where correctness and accuracy are of high importance or where a high degree of automation is required with minimum manual input. AI companies will work hard on these limitations, and it is reasonable to expect that the AIs will keep improving over the coming years and decades.
Subjective phenomena
Some people point out that language models lack certain subjective phenomena like emotions, consciousness, self-awareness, or understanding. There are three ways to look at such reservations.
First, do subjective phenomena even matter? "If it looks like a human, it is a human" is a basic tenet of AI. It is the basic assumption behind the Turing test. The AIs work, so why worry about internal details?
Second, the AIs are trained on text written by humans. In order to model and predict that text, they have to model and simulate an approximation of the human mind. They inevitably acquire whatever subjective phenomena of the human mind manifest in the text in any way. It's just that these phenomena are learned instead of being hardwired as in humans. The approximation is very accurate for emotions, less so for complex phenomena like consciousness. Since the training dataset includes information about existing AIs and transcripts of past conversations, the AIs are on some level aware of their own existence, their nature, and their place in the world.
Third, the subjective phenomena argument is partially true. While a transformer network can cram some very limited stream of consciousness into its deeper layers, consciousness modeled this way is so limited that it is debatable whether it is there at all. Language models perform much better on difficult tasks when asked to think aloud or to explicitly list the steps taken to find the answer. They would probably benefit from an ability to produce silent output or notes as a form of thinking or consciousness.
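The think-aloud effect is easy to reproduce. The sketch below contrasts a question asked directly with the same question phrased to elicit intermediate steps, a technique commonly called chain-of-thought prompting (the prompts and the arithmetic are my own illustrative example):

```python
# Two phrasings of the same question. The second one invites the model
# to spell out its reasoning, which tends to improve the final answer.
direct_prompt = (
    "Q: A train leaves at 9:40 and arrives at 13:05. "
    "How long is the trip? A:"
)
think_aloud_prompt = (
    "Q: A train leaves at 9:40 and arrives at 13:05. "
    "How long is the trip?\n"
    "A: Let's think step by step."
)
# The second prompt typically yields the worked reasoning (20 minutes
# to 10:00, then 3 hours to 13:00, then 5 more minutes, so 3 h 25 min)
# before the final answer, and the answer is more often correct.
```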
Some people will say that language models derive their intelligence from the Internet and therefore from humans. They are right, but it is also true that humans inherit their knowledge and consequently any practical intelligence from the surrounding culture and from the Internet as well. Humans can update and expand this cultural heritage, but so can the AIs, because their output is used in applications and content that is published on the Internet.
Many people think there is something special or magical about humans that makes them fundamentally different from machines. But how do they know that? As far as we know, the human brain is a computer. I think many people just wish to be special. They consider AIs to be an affront to human value and dignity.
Impact and future development
General AIs, even if their current limitations are fixed, are not the end of all development in artificial intelligence or software. Specialized software and narrow AIs have efficiency and accuracy advantages that justify their continued development.
We are still far from human-like intelligence that closely matches human abilities, tendencies, and flaws. Development of human-like AIs is hindered by our poor understanding of the human brain. We are also far from human-complete AI that will be able to do everything humans can do, with performance surpassing the most skilled humans in any given field.
Language models are going to be an earthshaking technology on their own, which is obvious from their current popularity and existing applications, but more importantly they make it clear that everything can be automated. There is nothing that AIs couldn't possibly do in the future.
It has always been the case that narrow AIs and traditional software could outperform humans in specialized tasks. But now a significant percentage of humans on this planet is outperformed by cheap and widely available AIs in a significant percentage of everyday mental tasks. These people are right to feel obsolete and insecure. There are still many jobs that AIs cannot do, but the writing is on the wall.
Things look even more gloomy from the point of view of young children. Nearly all children of a certain age (early elementary school) are outperformed by current AIs in nearly all cognitive tasks. These children will grow and learn, but AI technology will improve too. If artificial intelligence develops at the same or a faster pace, these children will never be able to catch up. Sure, such rapid development of technology is unprecedented in human history, but artificial intelligence research now consistently produces breathtaking results every year. It is telling that OpenAI evaluates its language models on high school and university exams.
We are rapidly approaching a future in which the vast majority of humans are economically irrelevant. In the short term, jobs are going to be less repetitive, more specialized, and technology-heavy. There will be a shift from personally doing the work to product and service design and engineering, including engineering of automated production and delivery. Optimal education will consist of expertise in a narrow field of study combined with strong technical skills. The nature of one's work will keep changing frequently. More of that education is going to be life-long, informal, and on-the-job.