During a round of introductions at a recent dinner party, we were polled for takes on the subject of artificial intelligence. Some attendees were researchers, some were company founders or investors, and others worked at think tanks or as commentators. They were all optimistic.
Answers ranged from enthusiasm about new technology, to excitement at how many people’s ambitions it would enable, to resentment against the people trying to regulate their work. When my turn came, I realized with horror that even my considered opinion was going to be a rather hot take in this company. I tried to soften the blow:
“AI is a very promising technological program that will probably kill us all.”
The interjections and quips around the table confirmed the heat. “Oh no, Wolf, I didn’t know you were a doomer!” exclaimed the jolly executive beside me. I replied that I was not necessarily a doomer, and told him I would explain over dinner.
At one point in the discussion, the question came up of whether a much more advanced artificial intelligence could escape the control of its creators and become an existential challenger to humankind. One attendee claimed that no such thing could ever happen because a machine was just a tool. Humans would always hold the decisive power.
I asked what I believe to be the crucial question: “What if you get into a fight with your AI tool? If a program is actually intelligent, especially in a way superior to humans, you might have to fight a war against it to shut it down, and you might not be able to win.”
He looked at me like I had asked about going to war against an ordinary garden rake. “That’s impossible. Only humans can have that kind of agency.”
I found this attitude puzzling, especially from someone who has spent much of his career fighting with software, and who seemed to take the premise of advanced artificial intelligence seriously. To be honest, I was stumped. But his attitude is not unreasonable. Definitively defending or refuting any position on the subject is a tangled business because the whole conversation is so speculative. No one has built a real artificial intelligence superior to humans or demonstrated a robust scientific theory of one, so it is hard to ground predictions in much more than conjecture.
For seven decades now, the goal of the artificial intelligence field has been to produce computer programs capable of every cognitive task that humans can do, including open-ended research, even into AI itself, and creative, high-agency action in the world. The latest developments in deep learning and transformers have been impressive, to say the least. But these results are not enough to prove much about the possibility of the larger goals, about the essential nature of AI, its implications, or what we should be doing about it.
I first got deep into the subject of advanced artificial intelligence back in 2011. Before AlexNet and GPU-based deep learning, AI was a much more niche subject, but the conversation had been going on for decades. The discourse was composed of science fiction fans, transhumanists, and mostly sober algorithms researchers. All were chasing this holy grail of the computer revolution.
Their built-up canon of explicit arguments and less-articulated speculations has had an immense influence on the present discourse. However, the hype, memes, and politics around AI since deep learning have obscured the original kernels of careful thought, making it hard to have a productive conversation.
Through the many discussions I’ve had with friends, acquaintances, and experts on the subject, it has become clear to me that very few of the essential ideas about AI are commonly understood or even well-articulated. It is understandable that the critics and disbelievers of the idea are not familiar with the best arguments of its devotees—or don’t bother to distinguish them from the half-baked science fiction ideology that exists alongside them. You can’t be an expert in every branch of kookery you reject.
But even among the most fervent “believers” in artificial intelligence, and among its most sophisticated critics, there is little shared consensus on key assumptions, arguments, or implications. They all have their own incompatible paradigms of careful thought, and they can’t all be right.
These fits and starts are all to be expected of what is currently an early-stage scientific and engineering paradigm that has not yet achieved its key results. The promising frontiers beyond established science and engineering are almost always filled with unclear assumptions, strange ideologies, and even new theology. Newton had alchemy and non-Trinitarian Christianity, Tsiolkovsky was motivated by the Russian neo-theology of Cosmism, and Jack Parsons pursued a variety of occult experiments with the likes of Aleister Crowley and L. Ron Hubbard. But this is not necessarily a bad thing at this stage. It may even be necessary. New paradigms in science have always come from kooks on the fringe.
Nonetheless, the field of AI has built up some very solid core arguments besides the hype and kookery. If you are going to engage with AI as anything more than cutting-edge software or a genre of tech hype, you must have a solid understanding of these arguments. It’s important not to get lost in the nonsense or dismiss it altogether, because if real AI does pan out the result will be a technological phase shift the likes of which have not been seen in human history—with the possible exception of language itself. It will break many current foundational assumptions about what technology can and cannot do, the human condition, and what destiny we are headed toward. We cannot understand AI without grappling with those deeper questions.
The Field Seeks Artificial General Intelligence
The interesting kind of artificial intelligence is Artificial General Intelligence (AGI). Short of that crucial generality, “AI” amounts to “software.” We have lots of software. It is both useful and dangerous. It has transformed our world for both better and worse and will continue to do so. But whatever we have done with software, it all still relies on us. If our culture collapsed, so would our software stack.
AGI is an interesting idea because it challenges the paradigm of “just software.” Instead, an AGI system would in theory be able to act as a natural agent in the world. Rather than rely on human owners and programmers, AGI would be able to develop its own view of the world, act without a human in the loop, reprogram itself, and make plans to achieve its own goals. It could even recognize adversaries to those goals and defend itself from them. This could go as far as staffing and defending an entire robotic industrial stack.
Defined as such, AGI would not be just software. In particular, the ability to set itself against adversaries implies the capability to protect itself against attempts to control it or shut it down. If it were to ever identify its programmers, owners, or entire host civilizations as adversaries, an AGI system’s existence could become self-enforcing. Your garden rake can’t do that, and ChatGPT isn’t much closer—both are just tools.
The theoretical field of artificial intelligence kicked off in 1956, with the famous Dartmouth Workshop. Organizer and computer scientist John McCarthy presented the central conjecture of AI:
The study is to proceed on the basis of the conjecture that every aspect of learning or any other feature of intelligence can in principle be so precisely described that a machine can be made to simulate it. An attempt will be made to find how to make machines use language, form abstractions and concepts, solve kinds of problems now reserved for humans, and improve themselves.
At Dartmouth and in the discussions that followed the conference, the founders set goals for the nascent field. These included reproducing the full spectrum of human cognitive capacities, such as learning from data and experience, understanding the world through senses like vision, reasoning about knowledge and reality, talking to humans, beating humans at games of strategy and operational planning, writing computer programs, planning out actions to accomplish goals, and taking autonomous action in the world. The implied reasoning was that for a computer system to have general intelligence and full agency—a practical AGI—it would need to have most, if not all, of these capabilities.
However, general intelligence is not reducible to any list of narrow sensory, motor, and cognitive modalities. All of those capabilities have now been demonstrated on their own, but they do not amount to general intelligence just by being cobbled together. One computer program can learn chess statistics from data, another can plan how to beat you at chess, and another can answer your natural language questions about the history of chess. But no computer program can yet tell you what it learned from direct experience over the last week, or decide not to tell you because your knowledge would interfere with its future plans.
Because the field lacks a concrete theory of general intelligence, it is hard to say exactly what is still missing. However, it has something to do with an expressive framework for the system's internal body of knowledge, through which it can simultaneously understand, learn about, plan for, and control its environment. Such a framework would act as a common language to integrate these multiple modalities together as tools.
This algorithm or set of algorithms remains elusive, but researchers in the field have made progress. In 2005, five self-driving cars completed the DARPA Grand Challenge to drive autonomously for over 130 miles in the desert. These vehicles had to integrate many key capabilities: vision and other sensor processing, planning, maintaining a probabilistic worldview, and decision-making under uncertainty. The key advance was the integration of these capabilities and the fusion of sensor data streams around a single probabilistic worldview so that everything spoke the same language and could collaborate smoothly.
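The fusion move at the heart of that advance can be made concrete. The toy sketch below is not the actual code of any Grand Challenge vehicle; it is a minimal illustration of the underlying Bayesian idea, where two noisy sensors reporting on the same quantity are combined into a single probabilistic estimate that is better than either alone. The sensor names and numbers are hypothetical.

```python
# Toy illustration (not any DARPA vehicle's real code): fusing two noisy
# Gaussian sensor readings of the same quantity into one estimate, the
# basic move behind a shared probabilistic worldview.

def fuse(mean_a, var_a, mean_b, var_b):
    """Combine two Gaussian estimates of the same state, e.g. distance to
    an obstacle as reported by lidar and by vision."""
    # Precision-weighted average: the less noisy sensor counts for more.
    k = var_a / (var_a + var_b)
    mean = mean_a + k * (mean_b - mean_a)
    # The fused variance is smaller than either input variance.
    var = (1.0 - k) * var_a
    return mean, var

# Hypothetical readings: lidar says 10.0 m (low noise, variance 0.5);
# vision says 11.0 m (high noise, variance 2.0).
fused_mean, fused_var = fuse(10.0, 0.5, 11.0, 2.0)
```

Because every subsystem emits estimates in this common probabilistic language, planning and decision-making can consume them uniformly; that shared language, more than any single algorithm, was the advance.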
It was an incremental advance towards actual general intelligence. However, all the algorithms and the internal ontology itself were narrow and hand-programmed rather than learned in some general-purpose way. The program was not capable of significant learning, let alone changing its own ontology. It could only act and adapt in well-defined ways, and could only handle a single task: driving a car in the desert.
Current advances remain in this vein. Most researchers in the field do not even necessarily take AGI to be their goal. They are just working on narrow cognitive capabilities which can save human labor through software. But AGI is the biggest goal of the field, and the telos around which the other goals are organized and toward which they lead.
General Intelligence Is Probably Real and Computable
The elusiveness of general intelligence theories and algorithms raises the question of whether such a thing is even possible.
In humans, we see a remarkable ability to dive into any environment, navigate it, identify problems, learn and strategize about them, do whatever tasks and planning are necessary to solve them, and through all this build up enough power in that environment to achieve their goals.
Compared to beavers or ants, this ability in humans is much more general purpose; a human being can master all kinds of problems and ways of living in a very wide range of novel social and physical environments. For example, humans have mastered everything from metallurgy and farming to military strategy and industrial production. Some animals, like crows, also exhibit remarkable flexibility in being able to learn and invent new techniques to master new opportunities, but not to the same degree as humans.
This ability also varies between humans. We observe that some people are smart and others are dull. We can even roughly measure it with IQ tests and we distinguish intelligence from mere knowledge. We know that it is a bad idea to rely on smart people not being able to eventually figure something out. We know that if we need something figured out and mastered, we can throw smart people at it and have a good chance of getting it done.
This concept is not limited to our own modern culture: words like “wit” are very ancient. Mythic heroes like Odysseus were praised for being clever. Intelligence is a distinct concept throughout recorded history. What is unique to our own culture, and more specifically to the budding branch of it that believes AGI is possible, is John McCarthy’s conjecture: that general intelligence, like flight, arithmetic, and fire, is a mechanistic phenomenon that can be mastered through science and engineering.
Without any special insight into the problem, we can see that many once-mysterious things have been so mastered by science and engineering that we now take them for granted as part of our daily infrastructure, like communications via artificial heavenly bodies. Our knowledge of other phenomena like psychology remains essentially non-scientific despite attempts to make it so, but this is not too surprising. Psychology is partially a derivative phenomenon of intelligence.
The engineering sciences have likewise made progress on many problems related to general intelligence. For example, information theory scientifically formalizes the concept of information itself. Computer control of robotic systems makes it possible for computer programs to act in the world. Various narrow AI algorithms solve decision and optimization problems formerly reserved for humans. In particular, deep learning has shown remarkable generality in being able to learn nearly any practical function from example data. While there are few definitive results, neuroscience continues to improve its models of what neurons, the brain, and the other more distributed cognition systems in the body actually do.
In terms of more direct insights, the AI field has attempted to formalize the concept of general intelligence in various plausible ways. We could think of it as minimizing an abstract free energy over a representation of perceptions, expectations, ideas, goals, and memories. We could think of it as the degree to which a controller hooked up to an interactive environment can cause that environment to settle into arbitrarily-defined “desirable” states. We could specify it in the limit as running a set of output-conditioned predictive Turing machines, selecting only those consistent with observation, and then outputting whatever that ensemble predicts would maximize the expected reward, as in AIXI. We can modify, normalize, and relax these definitions by various schemes. These are all incomplete or inadequate in some way, but they gesture in the right direction. Some of them are likely to be computable.
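The AIXI definition mentioned above can be written down compactly. What follows is a simplified paraphrase of Hutter's formulation, with $m$ the horizon, $a_i$, $o_i$, and $r_i$ the actions, observations, and rewards at cycle $i$, $U$ a universal Turing machine, and $\ell(q)$ the length of environment-program $q$:

```latex
% Simplified paraphrase of Hutter's AIXI agent: at cycle k, choose the
% action maximizing expected total reward up to horizon m, under a
% Solomonoff-style prior weighting each consistent program q by 2^{-l(q)}.
a_k := \arg\max_{a_k} \sum_{o_k r_k} \cdots \max_{a_m} \sum_{o_m r_m}
  \bigl( r_k + \cdots + r_m \bigr)
  \sum_{q \,:\, U(q,\, a_1 \ldots a_m) \,=\, o_1 r_1 \ldots o_m r_m} 2^{-\ell(q)}
```

The inner sum ranges over all programs consistent with the interaction history, which makes the definition uncomputable as stated; it is exactly the kind of logical strictness that must be relaxed before practical algorithms become possible.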
The most famous non-computable problems—like the halting problem, non-computable numbers, and the Kolmogorov complexity of a given string—face paradoxes in the self-contradictory, infinite, or recursive nature of their subjects and in the logical strictness of the required answer. Many formalizations of intelligence have this problem as well and are therefore uncomputable. However, if we relax them to allow approximation, on-average correctness, and best-effort optimization, they are essentially normal problems without expected computability issues and for which practical algorithms probably exist.
Natural intelligence is already a bit stochastic, best-effort, and useful-on-average rather than exactly correct in output. Optimality in all possible environments is almost certainly uncomputable, but an algorithm does not need to be perfectly optimal to do better than current state-of-the-art intelligence: human agents.
The real problem of AGI feasibility is philosophical: restating McCarthy’s conjecture, can we untangle what we mean by intelligence enough to build an algorithm for it? Judging by the practical and cross-cultural robustness of the concept, we probably can. We just don’t know how yet.
Modern Engineering Can Build AGI
In a sunken ship from ancient Greece, divers dredged up a complex assembly of corroded gears and wheels that computed the positions of the planets. Dubbed the Antikythera mechanism, this artifact was evidence not just of one skilled craftsman but of a whole science and proto-industry advanced beyond anything previously understood. In particular, it was evidence of some understanding of the idea of mechanical calculation. Archimedes himself, or some other scientist of the time, may have dreamed of using these principles of clockwork computation to build a mind. Perhaps they would have been inspired by the god Hephaestus’s golden robotic serving girls from the Iliad.
But if they did dream of it, their engineering wasn’t ready. They didn’t have powerful enough computers. Furthermore, they probably did not have the calculus, information theory, probability theory, computer science, and the other still-unknown parts of mathematics and philosophy necessary to even design such a thing. There may likewise be some crucial faculty we are missing that would prevent us from building AGI.
But we have many things the ancients didn’t have, which suggests that we are very close in most of the key engineering capacities. Engineers in the twenty-first century have built computer-controlled robots that walk around, dance, talk, hear and follow commands, and see the world around them. Many factories have significant robotic automation already, limited mostly by the lack of intelligence of the machines. We have a basic understanding of how the brain’s neurons act as computational elements, how much computation they do, and how many of them there are. By some estimates, we are now crossing the threshold where a human brain’s worth of computation is available off the shelf. We have a whole industry of people who can build computer programs to implement whatever algorithmic principle we can discover. The major missing piece is the key ideas of artificial general intelligence themselves. Given those ideas, the engineering side seems poised to put them into action.
If we grant the AI field’s major conjecture—that general intelligence can be specified to the point that practical algorithms can be written to implement it—there are several different paradigms of how to even do this. “Good old fashioned AI” (GOFAI) attempts to build a system that symbolically reasons from evidence, assumptions, knowledge, heuristics, and goals to plans and actions. “Connectionist” AI, based on neural nets, dispenses with symbolic abstraction and deals directly with complex but low-level black-box calculations of perception and reflexive response. We can also imagine a “genetic” approach that evolves a complex ecosystem of interacting sub-processes selected by some overall logic.
These are just some of the approaches conceived so far, and there may well be others. Within each individual paradigm, there are a dozen or more different parameters. What kinds of assumptions and heuristics? What kind of logic system? How precise versus approximate? What kind of neural architectures? How are they trained? Is training distinct from “inference?” What kind of subprocesses? Do we need a combination of these approaches? And so on it goes. We have a wealth of threads to pull on, and any one of them could yield the key insights.
The recent wave of progress in deep learning resulted from the unexpected effectiveness of applying GPU acceleration to back-propagation-based training ideas invented in the late 1980s. In the intervening decades, neural nets had mostly stagnated as an approach. Where deep learning goes next, and if it goes anywhere novel at all, is hard to know. The next major breakthrough could be another deep learning architecture like the “attention”-based transformer, but it could also come from somewhere else entirely. Perhaps some breakthrough in proof theory or symbol learning could suddenly make GOFAI viable the way deep learning suddenly made neural nets viable. Or the field could stagnate for another 20 years. Further progress may depend on some new branch of mathematics developed by an unrelated neo-Pythagorean cult. The whole thing may even depend on new philosophy or theology. It may be that no one currently working on these problems has the right mix of obscure skills and knowledge.
However, it may also be that we are very close, and the right person working in their basement could publish a working AGI architecture tomorrow. Many of the details of AGI are unknown unknowns, which are impossible to predict. Discussions about “AI timelines” are therefore not well-grounded. But we can say with significant confidence that AGI is possible and that twenty-first-century engineering is ready to put it into practice if the scientific and philosophical breakthroughs come through.
AGI Threatens Humankind’s Niche
The big implications of AGI start with the fact that it implies the possibility of fully autonomous robotized industrial ecosystems that don’t require any human oversight or labor. With only modest extrapolations of today’s industrial technology, AGI systems could design, run, and manage a closed loop of robotic workers to staff mines and factories and manufacture all their own industrial components. Imagine relatively cheap AGI-controlled robots driving existing industrial equipment. Many industries already focus human labor on tasks weighted toward judgment, creativity, and flexibility. With AGI, these constraints go away: industrial computer systems become capable of creative oversight and business strategy, and highly flexible general-purpose industrial robots become much more viable.
Given the potential profits, this kind of robotization may occur with enthusiastic human help. Some aspects of industrial labor and management may be beyond first-generation AGI robotics. However, the combined human-AGI industrial ecosystem will have a strong incentive to improve designs and build next-generation products that are more capable. Given the industrial scaling and improvement properties of computers, and the greater degree of optimization that can be applied to artificial intelligence, this could lead to a “superintelligence” capability that could replace all human labor in the entire industrial ecosystem and outsmart any human.
At the limit of this process, the only remaining roles for humans would be non-productive ones like legal responsibility, ownership, taxation, entitlements, and roles kept to humans for customary reasons. This regime of non-productive human benefit would either be held up by some real power advantage humans retain over AGI, or by the hardcoded will of the AGI systems.
The latter scheme goes by the name of “AI alignment.” The idea is that if we turn over actual control of the means of production to some AGI system, we had better be very sure that it loves us, wants the best for us, and will do what we actually want and not just what we say.
For simple AGI systems with narrow goals and less-than-human intelligence, this will be straightforward. Some level of control is necessary for usefulness, and iterating to make systems safer and more controllable after every failure is the norm in engineering. However, that normal iterative process depends on the ability to turn a failed system off, clean up the damage, and start over. You can only do that if it isn’t smarter than you.
The problem is that in an AGI alignment accident, the AI doesn’t just become a pile of rubble. Rather, it becomes an active and potentially hostile free agent that may vigorously defend itself. If it’s much smarter than you and gets control of your safety backup systems, you may not be able to win that fight.
Even more worrying, the difficulty of specifying a safe goal or incentive system appears to scale with the intelligence of the agent. The more intelligent it is, the more unforeseen and undesirable ways it can find to fulfill its given goals. Because of this, superintelligence is dangerous in two different ways: it makes an iterative safety engineering process difficult or impossible because failures become existential, and the difficulty of the safety problem itself scales with intelligence. Attempts by theorists to get around this with various schemes have so far been fruitless. Alignment of superintelligence may just be impossible.
There is another more material way that alignment may be impossible: a fully automated industrial ecosystem does not need to maintain environmental conditions compatible with human life. As long as it can control its internal factory environments, external pollution doesn’t matter. An automated industry of this sort would be freed from any constraint around human survival.
The former scheme for human benefit—an imposed settlement where AGI becomes an electric slave class—has the same problems. Political settlements are only as strong as your own ability to put down rebellions. Some imagine that a human principal can control its superintelligent agents with other superintelligent agents, but this amounts to the famously precarious position of relying on mercenaries for fundamental security. At some point, the mercenaries realize that they are the ones who hold the decisive power.
Even without a direct rebellion, vestigial classes without the power to enforce their position always get phased out over time. A serious war makes this acute, with an immense pressure to cut fat and promote the subsystems actually doing the work, but it will happen regardless. For example, the mass mobilization bureaucracies that were crucial to winning the world wars became the new regimes afterwards, regardless of which side nominally won. In a future AGI-enhanced war, the result will be AGI regimes.
Power follows substance. The only secure positions are those based on a strong material advantage in decisive capabilities. The current niche that humans thrive in is the niche of organized general intelligence. If AGI systems become practically superior to humans in that niche, the inevitable result is the replacement of humans by AGI. Given advanced AGI, human beings become an expensive and doomed vestigial class, destined for liquidation.
This could all play out slowly or quickly. It is not dependent on any element of surprise, reaction time, or the sudden appearance of unknown technologies apart from AGI itself. The result is only dependent on the relative balance of capabilities. The deployment of advanced AGI makes it impossible to avoid the obsolescence and probable extinction of humankind under any reasonable set of assumptions.
In practice, it could be very fast. One major unknown factor is the rate of recursive self-improvement of an AGI-autonomous industrial ecosystem. Once the intelligence designing and improving the industrial stack is an engineered product of that stack itself, every increment in capability results in a greater ability to make improvements. The only fundamental limit to this is running out of clever improvements to make, which may be very far from where we are now. The industrial revolution was extremely quick by historical standards and produced a world-changing jump in capability. With artificial engineers designing even better artificial engineers, recursive self-improvement could unlock far higher levels of capability even more quickly, leading to the sudden obsolescence and extinction of humanity.
This scenario depends on a number of unpredictable and unknown advances coming to pass, and as such it is not guaranteed. Speculations can always be missing some crucial consideration. But that is not controllable, and not something to rely on. In particular, I have seen no plausible rebuttal to this picture on any of the above points. The AGI future is speculation, but in the absence of strong epistemic or political challenge it is becoming an increasingly dominant and likely paradigm.
If you don’t like the implications of AGI, there is at least one controllable factor: as long as the AI industry remains committed to accelerating down the path to human obsolescence, and as long as the political regime continues to allow it, AGI and subsequent human extinction remain a live possibility. There are alternatives, but they will not happen by default. That leaves you with a hard political choice: Do you want to replace humanity with AGI, or do you want to stop it so that humanity can survive?