One of the biggest and most interesting strategic problems that we’re faced with over the next few decades is the issue of artificial intelligence (AI), and its potentially existential risks to human civilization.
Humans, and organizations of humans, are currently the most powerful intelligent forces acting on the world. This is because we are the only forces that have the necessary general intelligence to have an impact on this scale. We have power over our environment, our rivals, and the future, because of our unique socially scalable intelligence. For now.
AI presents us with the possibility of powerful, non-human agency. What if machines could think and strategize and wield power? Would those machines participate benevolently in the human social order? What about when industrial production of silicon computing hardware, fine-tuning of the parameters and algorithms of software intelligence, and digitally precise coordination at scale allow those machines to outpace us in total coordinated intelligence? At that point, we would be faced by non-human intelligence powerful enough to be beyond human constraint, potentially able to scatter and outmaneuver us the same way we overcome our own rivals, if it were so inclined.
The question, then, of whether that machine intelligence would be benevolent, indifferent, or hostile, and the extent to which those concepts can be applied, is of supreme importance. If it’s weaker than we are, control is relatively easy; if it starts doing the wrong thing, we can just shut it down or change its programming. But if it’s smarter and stronger than we are, the reverse applies. We need it to be stably benevolent so that it’s still good when we can no longer control it. But when examined with rigor, the problem of building a stably benevolent machine intelligence turns out to be very difficult, once the possibility of superhuman intelligence and self-modification comes into play. There is usually some loophole in the motivation scheme that will lead it to break out of our implicit intentions in hard-to-predict ways. Building a dangerous and uncontrollable general intelligence seems much easier.
Further, the sheer interestingness and potential power of general intelligence mean that if such a thing is possible, it will probably be built, possibly this century. As such, we’re in a sticky situation: we’re probably going to build something of world-changing power that by default won’t be aligned with what we want for the future, and which we won’t be able to control, contain, or resist.
So, we need to figure out how to get this right to avoid an existential disaster. The capacity to get this right, in the philosophical, mathematical, engineering, and political dimensions, is therefore of high strategic priority, at least on a medium-term timescale. The most important immediate step is understanding the problem well, so that as we go about planning and preparing for the next few decades of social and political change, we do so with this need in mind. We need some idea of how to handle superhuman AI before the situation becomes critical.
To better understand the problem posed by AI, I recently talked to Jaan Tallinn, best known as the founder of Skype.
Jaan has spent the past 10 years meeting with key figures, funding important projects, and assembling strategic knowledge relating to AI risk, so he’s one of the best people to talk to for a well-informed and in-depth overview of the topic.
Going into this discussion, I was frustrated by the relatively shallow depth of most discussions of the topic available in non-specialist media. It’s mostly the same few introductory questions and arguments, aimed at a very general audience with low technical capacity, intended to titillate, rather than to inform. To get a useful handle on the topic, so that it’s more than just a science fiction hypothetical, we need more technical depth.
I called up Jaan in the middle of the night, and we recorded a wide-ranging discussion of the background, technical challenges, and forecasts of the issue of AI risk, and what approaches we might take to navigate it. The following transcript of our discussion has been minimally edited for clarity.
You’ve expressed concern in the past about artificial general intelligence (AGI) as a strategic concern, or even a threat to humanity in the 21st century. So, I’d like to briefly go through the argument to get your thoughts on it. As I understand it, the AGI concern is not about deep learning. It’s about a more advanced capability—the ability to learn a causal model of the world from observation, and then plan and reason with that to accomplish objectives in the world. This is something that humans do that current software technologies don’t do, and deep learning doesn’t do.
I think it’s an open question. Current deep learning cannot do this, but the question of whether deep learning will take us all the way is unsolved. There are people whom I respect on both sides of this argument, so I’m truly uncertain. I know people who have strong opinions that deep learning will take us all the way, and I know people who have strong opinions that it will not.
So, the fear with AGI is that, like deep learning, it will be highly scalable with computing power and algorithmic progress, so that once we do have the core capability figured out, it could rapidly get away on us.
It’s even worse. We might sort of “pull an evolution.” Evolution didn’t really know what it was doing; it was just throwing lots and lots of compute at a very simple algorithm, and that invented humans. So it’s totally possible, if not very plausible, that we might pull an evolution in that way, that we never really understand what happened. It seems unlikely, but I don’t really know how unlikely.
That would be something that is on a gradient. You deliberately build an AGI, but it ends up doing things and working for reasons you don’t really understand. You can imagine a spectrum of possibilities between that, and building something that accidentally ends up being AGI.
Evolution is a very useful anchor here. Ultimately, if we had enough compute, we could replay evolution. Of course evolution had, as I say, one trillion sunsets to work with. That’s a lot of compute. But we are also way smarter than evolution, so we don’t need that much.
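As a rough illustration of what "throwing compute at a very simple algorithm" can look like, here is a minimal sketch of an evolutionary search. It is a toy under stated assumptions, not a model of biological evolution or of any AI lab's method: the genome is a bit string, the fitness function (count of 1-bits) is hypothetical, and all names and parameters are invented for this example.

```python
import random


def evolve(target_len=20, pop_size=30, generations=200, mutation_rate=0.05, seed=0):
    """Toy evolutionary search: maximize the number of 1-bits in a genome.

    Returns (generations_used, best_genome). All parameters are illustrative.
    """
    rng = random.Random(seed)
    fitness = sum  # hypothetical fitness: number of 1-bits in the genome

    # Start from a completely random population.
    population = [
        [rng.randint(0, 1) for _ in range(target_len)] for _ in range(pop_size)
    ]

    for gen in range(generations):
        population.sort(key=fitness, reverse=True)
        if fitness(population[0]) == target_len:
            return gen, population[0]

        # Selection: keep the fitter half unchanged.
        survivors = population[: pop_size // 2]

        # Variation: each survivor produces one child with random bit flips.
        children = [
            [bit ^ 1 if rng.random() < mutation_rate else bit for bit in parent]
            for parent in survivors
        ]
        population = survivors + children

    population.sort(key=fitness, reverse=True)
    return generations, population[0]


if __name__ == "__main__":
    used, best = evolve()
    print(used, sum(best), best)
```

Nothing in the loop understands the problem it is solving. It only mutates, scores, and keeps what scored well, and more generations or a larger population simply means spending more compute on the same blind procedure.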
One way I’ve heard it put, I think in a previous interview with you, was that whatever comes out of this AGI thing would become the driving power of the future, in the sense that it would be the thing defining what the future is like.
Yes. It’s very likely that this will be the case.
Whatever nature it has at that point, we will no longer really have power over it, so we had better make sure that, if we’re going into that, we build something that creates the kind of future we would want.
That’s pretty much the definition of aligned AI. We want this entity (or entities) to be aligned with humanity.
The idea is that as we create this AGI, it should stably do something that we want, rather than something else?
There is more nuance to this. It’s possible that we will have the problem manifest before we get to AGI. The Open Philanthropy Project has coined this term Transformative AI. The idea is that to be AGI, technically it should be fully general, whereas it’s clear that in order to take over the world, you probably don’t have to be fully general. So it’s possible that the AI that will eventually present an existential threat to us will not even be an AGI, technically speaking.
It could just be a superweapon level of thing that gives someone an immense amount of power, or creates an immense impact somehow, without being fully general.
There’s this really cool book called Avogadro Corp. I have to say I don’t endorse the ending, but the book itself is really cool. It presents an AI—it’s not even clear how general it is—that is able to intercept people’s emails and rewrite them. All hell breaks loose, and soon enough, the AI has military capability, and people don’t even know where it is, much less how to stop it.
That’s one of the big fears that people don’t really understand well enough when first thinking about this. If the thing can get control of enough data streams, in a widespread enough way, there’s actually a lot that you can do with that. People say: it doesn’t have hands, so how is it going to hurt us? But there’s actually a lot that it can do.
I just read Stuart Russell’s new book, or a draft of his new book, and he makes a really good point, that one way of looking at this whole Facebook scandal is that Facebook’s AI—though that might be too charitable to Facebook’s AI capabilities—is changing people’s behavior to become more radical. Radicalized people are more eager to click on advertising or sponsored links. So think about it. It’s not just that the AI is optimizing the news stream. The AI is optimizing the viewers.
That’s an example of the unintended consequences problem. You think you’re doing one thing, but along the way to accomplishing that, an uncontrolled intelligent system will potentially do a bunch of other stuff that you didn’t intend and don’t want.
One of the biggest problems with AI is that it seems to be extremely enticing to a lot of parties. Pure intellectual interestingness, economic value, military value, various utopian beliefs around the thing, all motivate a lot of work on the topic in a relatively uncontrolled way. So it seems likely to happen by default in a relatively uncontrolled way. It’s not something that happens only if we really want to go there, and we make some concerted effort to pursue it. Rather, it seems like it’s something where there will be many actors who are really trying to pursue it.
The good news though is that the leading AGI developers in the world have realized the problem. In one world model, OpenAI and DeepMind are the leading ones, and they both have full-time safety staff, and are aware that this is a problem that needs to be solved.
We’ve definitely heard them make noises to that effect, that they have safety staff and so on, but I don’t know the internal politics of these companies, or how competent those safety staff are.
I can definitely vouch for them. The safety staff is very competent. It remains to be seen though how well their organizations are able to integrate the safety efforts. I think OpenAI is in a much better position than DeepMind when it comes to that, because OpenAI was largely developed around the safety effort. From the start, people were concerned about safety, whereas DeepMind tacked this capability on later. But the people are good. I talk to them a lot.
These are the things we really want to know. It makes a big difference what that AI safety team actually is. Whether they are a serious operation or just kind of an afterthought.
I think in very stark contrast is Google Brain, which doesn’t have safety researchers, nor does it have a safety culture. It basically has a culture that is effectively against talking about safety, from what I hear. It’s really interesting. When I talked to Jeffrey Dean, who is the head of Google Brain, he basically acknowledged the safety problem, but he is a very hands-off manager, so the culture there never ended up taking safety seriously.
That’s worrying. That’s enough context that we can start digging into the interesting details. One point that comes to mind is the philosophical difficulty of this alignment problem. In a lot of the discourse around it, you find the idea of human values. The idea that there’s some particular set of things that we find morally compelling, that could in principle be systematized and scaled. But this isn’t obviously possible, or maybe we have the wrong view of what that looks like, or how that works. And I know there are a lot of people who take other views on whether aligned AI is even in principle possible, or how that would work, or what that even means. I don’t have strong philosophical views here, but I see that it looks like there’s a lot of philosophical complexity, and a lot of smart people with differing views on the subject.
One likely way out is to punt the problem to a meta-level. Basically, instead of figuring out what we want and then programming that into the AI, we program into the AI that it’s supposed to figure out what we want.
But that would have similar issues, though perhaps fewer tricky ones: for example, just determining whether that would be a good idea, what it would look like, and what it would mean.
It seems that there’s a list of answers that we need. Some of those answers are our homework, like we have to do them. For example, how to make sure AI doesn’t deviate from its initial trajectory, plus general robustness and motivational issues. However, for a bunch of stuff, including much of the metaethics research, it’s very possible that we don’t have to finish them before we get to AGI.
Some of these problems can be punted to that meta-level. The AGI can basically handle a lot of that downstream. But there’s this big question of how much philosophical difficulty is there in the stuff that we would have to do.
One of the best people to speak about it is Wei Dai, a person who has several cryptocurrency units to his name. (Laughs). For the last several years, he has been thinking about philosophy and AI—what is our philosophical “homework,” and what does it mean for AI to do philosophy?
We’ve talked a little bit so far about how current deep learning relates to the AGI research program. But there’s a related question which is how hard AGI is, or what kind of problem it is in terms of the scale of work or the type of work. You could imagine it could be a simple problem of conceptual breakthroughs, on the scale of blockchain or deep learning. Some really smart person just has to figure out that core algorithm, and we just haven’t discovered it yet. Or you could imagine it being an immense engineering effort optimizing and working with deep learning or other algorithms.
There’s no black-and-white distinction there, because when you are applying brute force, it’s possible that the system you are brute-forcing will do the invention of these algorithms, just like evolution did the invention of the algorithms in our brains and other brains.
The idea being that if you get the right containing paradigm and search algorithm that you’re working in, and you throw a lot of effort and computing power at it, you may end up producing the real thing.
Eric Drexler, the inventor of nanotechnology and a good friend, has this framework he calls Comprehensive AI Services. The basic idea there is that it’s perhaps a mistake to look at AI developers and AI separately. It’s a mistake to think about people developing AI and the AI starting to develop itself as distinct regimes. Perhaps it’s more of a holistic system, where you have a combination of people and AI that are making the AI better. Some of the optimization pressure comes from the people doing it, and some of the optimization pressure, we already know, is coming from the AI. Google Brain has this program called “learning to learn” where they create AI that is able to create AIs. So there’s very explicitly a division of labor between the AI researchers and the AI. In that sense, it’s clear that this combination of AI researchers and AI that they are working with, the tools that they are working with, this combination needs to make one or a few conceptual breakthroughs, very likely. But it’s not obvious that the people side will have to do it.
You have this combined system of humans and algorithms working together, where the humans are shouldering some of the burden of the things that the AI system can’t do, and optimizing it on the things where it’s slow.
Look up architecture search. This is an explicit thing.
“Architecture search”—is that an idea of searching over what would currently be deep learning network architectures, or is that something more general?
I think it’s more general, although in practice it’s mostly applied to deep learning architectures. I’m not entirely confident, but the term certainly sounds more general.
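For readers unfamiliar with the term, the sketch below shows the simplest baseline form of architecture search: random search over a small, hand-written space of network shapes. It is only an illustration. The search space, the `proxy_score` function (a stand-in for actually training each candidate and measuring validation accuracy), and every name here are assumptions made for this example, and real systems use far richer spaces and smarter search strategies.

```python
import random

# Hypothetical search space of network shapes (illustrative values only).
SEARCH_SPACE = {
    "num_layers": [1, 2, 4, 8],
    "width": [32, 64, 128, 256],
    "activation": ["relu", "tanh"],
}


def sample_architecture(rng):
    """Pick one option per dimension of the search space."""
    return {name: rng.choice(options) for name, options in SEARCH_SPACE.items()}


def proxy_score(arch):
    """Stand-in for 'train this network and measure validation accuracy'.

    Purely illustrative: rewards capacity up to a point, then penalizes size.
    """
    capacity = arch["num_layers"] * arch["width"]
    bonus = 0.05 if arch["activation"] == "relu" else 0.0
    return min(capacity, 512) / 512 - capacity / 4096 + bonus


def random_search(trials=50, seed=0):
    """Sample `trials` architectures and return the best-scoring one."""
    rng = random.Random(seed)
    candidates = [sample_architecture(rng) for _ in range(trials)]
    return max(candidates, key=proxy_score)


if __name__ == "__main__":
    best = random_search()
    print(best, proxy_score(best))
```

The division of labor is visible even in this toy: a person writes the search space and the scoring rule, and the program chooses the architecture.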
As a concept, it definitely sounds like a good idea to do. But I’m curious whether anyone has figured out how to do any kind of systematic architecture search outside of a very well defined space like neural networks. My next question is about one of the obvious things people come up with to deal with this AI problem, which is the idea of regulating it somehow. You can treat it like nuclear weapons, or environmentally destructive chemicals, or the various other things that governments have stepped in to control in the past. How we might do that, and whether it would work, is an interesting question. And then whether that’s necessary. Are these AGI labs going to be self-regulating?
In my view, there is at least one aspect that can be regulated in AI. Whether it’s a good idea, I’m not sure, but I’m pretty sure it’s a good idea to prepare to be able to regulate it—and that’s computing power; concentration of computing power. Because that’s hard to hide, whereas algorithms are very hard to regulate. It’s hard to monitor what exactly is being run on those CPUs, but just the number of CPUs and the amount of energy that they draw, this is something that can be observed from satellites, etc. If there is a capability, once things are starting to get out of hand, you establish some regulatory regime, and regulate compute as we regulate nuclear materials, for example. As a preparatory approach; I’m not saying that we should regulate it right now.
One of the funny metaphors that I have is that it is possible that the remaining lifetime of humanity is now better measured in clock cycles, rather than wall clock time. There’s some amount of CPU cycles that humanity has left, and the more CPUs that come online, the shorter humanity’s lifetime.
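The arithmetic behind the metaphor can be sketched in a few lines. Assume, purely hypothetically, that some fixed budget of CPU cycles remains; the wall-clock time needed to spend it then shrinks in proportion to the number of CPUs online. The function name and every number below are placeholders for illustration, not estimates.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def wall_clock_years(total_cycles, num_cpus, cycles_per_second_per_cpu):
    """Years needed to spend a fixed cycle budget at a given aggregate speed."""
    seconds = total_cycles / (num_cpus * cycles_per_second_per_cpu)
    return seconds / SECONDS_PER_YEAR


if __name__ == "__main__":
    budget = 1e30      # hypothetical remaining cycle budget
    clock_rate = 3e9   # hypothetical cycles per second per CPU
    for cpus in (1e9, 2e9, 4e9):
        print(cpus, wall_clock_years(budget, cpus, clock_rate))
```

Doubling the number of CPUs halves the wall-clock time for the same budget, which is the sense in which more CPUs coming online shortens the clock.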
And that would be predicated on the idea that the critical path work is now being bottlenecked on computing power?
Kind of, but it’s not strongly predicated. Because there’s this combination of tools and people who are working on it. This combination needs to come up with some conceptual breakthroughs. It’s very possible that if we have stronger tools in terms of compute, it becomes easier to come up with those conceptual breakthroughs for humans, as well. For example, a few years ago I talked to someone at DeepMind who was skeptical of AGI, and he said that it felt like we had 300 years left at least because the tools were so slow. He typically had to wait a week for experimental results to come in. He no longer thinks that we have 300 years.
300 years is a long time. The application of regulating computing power to AI is obvious, but there are all these other things that get done with enormous amounts of computing power, that are also of increasing importance, and seem to be increasingly AI-like and AI-related. What does Facebook do with enormous computing power? It optimizes people’s moods and political beliefs and whatever else. What does Google do? Something similar. Even beyond the issue of AGI, as computing power becomes relatively closely tied to power in general, then for a particular organization or a particular state to amass an enormous amount of computing power is something that could be important to regulate in general, not just for the AI issue. It would have all these other effects.
The way I look at it, I think we are already starting to see the paperclip problem. Facebook’s AI is a problem in at least some similar sense. The interesting question is: how does Facebook’s AI debacle generalize? It’s very possible that it would generalize in a really unfortunate way, where there would be more and more of people’s behavior and physical environment under the control of some AIs that run in some corporate data centers.
That brings us into some really interesting questions of who gets to govern the social fabric and the mass of people, and for what purpose. Do we allow Facebook to amass vastly more real optimization power over how people think and behave than, say, the government has? It brings up questions of political organization.
A couple of years ago, I gave a talk at the Asilomar conference about how it’s time to start dedicating AI resources, AI research, AI optimization power, towards what I call preference discovery. Right now, we have these economic and political incentives, and then AI is adopted to work along the lines of those particular incentives, be they monetary or political. Whereas it would be as interesting or much better to have unmotivated research, to the degree that’s possible, that uses AI, to figure out what are the correct incentives that we should be designing for. One problem that we have in regulation right now is that we know that there is AI being applied in the wrong direction in various places, but we don’t know what the right direction is. It would be valuable to start doing more research there. There is a bunch of social science research already, but there isn’t much AI optimization power applied towards helping that research. I do have a few ideas myself on what to do.
That’s another area where the question of whether the problem is computing power–bound, or conceptually bound, or paradigmatically bound, is interesting. Implicit in the application of current AI technology to the problem is the idea that we would be able to frame the problem so that current AI technology could give us useful solutions.
The situation is that right now, the world doesn’t provide incentives to do AI alignment. Preference discovery is just one piece of AI alignment. Because there is no incentive, there has been very little effort towards that, so it is very likely that there is low-hanging fruit there.
The general idea seems good. We should study the incentives on these big chunks of optimization power. This big chunk of power at Facebook, for example. What are they doing with that? How do they profit with that?
It’s important to tear apart two directions of research: one is the empirical part of the research, where you look at the existing things and how their incentives are distorted, and the other one is the normative part, where you look at things like what we should be doing with that optimization power in the first place.
It seems like there are two parts to that normative part. One is, in general, what we should be doing with that optimization power; two is, given a vision of what we want to do, how do we actually incentivize that?
That’s true. Glen Weyl is, in my view, one of the leading mechanism designers in the world, and he thinks a lot about this. He thinks a lot about how the world has, for historical reasons, picked up a ton of bad incentives, and how we can fix those incentives.
It’s clearly a really important problem. The structure of incentives, and more generally, the structuring of this kind of power in society. One of the things that seems like a big part of this problem is what we might call the strategic capacity to even be able to deal with stuff like this. We need to be able to answer some of these philosophical questions in a relatively official way, we need to be able to answer the mathematical questions, and we need to do a lot of really good engineering. At some point, it may be necessary for the state to take control of the developmental situation, like it did with the Manhattan Project, and drive it in the right direction. Or, if it turns out that we shouldn’t do AGI, we need to make sure we have the state capacity and strategic capacity to do something about that.
Yeah, to the degree that you trust government entities.
There’s this question of who you trust, but whoever you do trust, it’s clear that they need a lot of strategic capacity—which is the ability to understand the situation, and then act to change the landscape. The issue is that this would require a lot more of our human institutions than I would currently be comfortable expecting. This seems like a really hard problem that we’re not necessarily up to. Then that puts the need to increase quality of human institutions, and increase strategic capacity in general, on the critical path of the AI issue. I’m curious whether you have thoughts on the whole human organization side of this AI problem in terms of developing that strategic capacity.
I think it’s really strongly a function of timeline expectations. If the timelines are 10 years as some people say, then it’s pretty much hopeless to try to get governments up to speed. However, if we have 50 or 70 years left, and the people who will actually be making those decisions have not been born yet, then it’s very possible that social structure engineering efforts are very valuable, especially in the long term. Almost nobody thinks long-term, perhaps with the exception of China, so there is very little resistance when it comes to trying to make long-term changes. For example, somebody told me this (but I haven’t looked it up), that when they eradicated smallpox, they basically went country by country in Africa, and if for whatever political reasons there was not a good climate, then they just waited for the regime to change. There was no resistance to that kind of long-term plan, because people had short-term incentives, rather than long-term incentives. If we have enough time, then indeed, we should start making gradual long-term changes that there will be very little resistance to.
Then the question is: what are those changes? To sit and wait, you need to have this good proposal for, when we get a chance, what we’ll do to get more strategic capacity on the problem.
Exactly. I do think that redesigning incentive mechanisms and trying to build communities that are aware of the long-term problem are the first two obvious natural things to do, and they are already happening.
That community aspect, of creating more people who really understand these things, that’s definitely something we’re interested in at Palladium. Getting back to the timeline issue, since that answer was predicated on timelines: you’ve said a bit about the safety efforts at DeepMind and OpenAI. You’ve said a bit about China’s long-term approach. There are multiple efforts in the Anglosphere to create AGI, and there are probably efforts in Russia and China. I don’t have any positive knowledge of those, but those states have made statements to the effect that they want to work on AGI. China has a plan to achieve AI dominance. You’ve talked to a lot of the smartest people in the field: what is the state of the landscape right now? How close are these organizations? Which are the interesting organizations inside and outside the Anglosphere? Can we get some information on the timeline issue?
There are certainly some projects in Asia, but I’m not very familiar with them. There are a couple projects in Japan. One of them I visited, and the other one I might visit in the future, but I’m not very familiar with them. In China, I’ve seen a document that somebody put together about what the projects are, but I haven’t read it yet. When I go there, it’s an interesting situation there, because they have super strong incentives to be competitive in the short-term, both on the government and individual level. They have a very harsh sort of cowboy capitalism going on there. It means that they are very busy making money, let’s put it that way, so the philosophical or ethical considerations aren’t part of their day-to-day thinking. The interesting thing, though, is that they are much more open to talking about long-term concerns than AI researchers in the West. My model is that because the Chinese haven’t been developing AI for so long, they don’t have so many strong defensive positions when it comes to new arguments. If you go to someone at MIT who has been working on AI for 40 years, and you say, “Look, this thing might be dangerous,” he won’t really think of a response; he’ll have some cached response that is dismissive. In China, they don’t have those cached responses, so they’re willing to think along. In Russia, I don’t know, but Russia doesn’t seem to have the people. The numbers are not there.
I think the Russians had some semblance of AI dominance in the ’80s, where they actually were developing a lot of the neural network technologies, and were ahead of the West.
Russians are also an interesting case in that they are super strong technologically and scientifically as a culture, but for the sake of God, they cannot govern themselves. They can’t build a functional government that is actually good for society, and this ultimately blunts their scientific efforts. If you become famous in Russia, you have to emigrate.
So, you see a lot of the short-term, highly competitive thinking in China, and not necessarily the long-term engineering. But with the Western projects, how close are we? Do you take the 10-year idea seriously, or are we talking more like 30 years? Or is it just hard to know? What does it look like?
I think the correct answer is to just have a probability distribution over this. I condition on lack of other catastrophes, because it’s possible that we will never get there, but the main reason then will probably be that we screw up in some other way and stop the progress. But conditioned on no other disasters, my median is in the 2040s somewhere. Kind of Kurzweilian, I guess. But can I give less than 1% in the next 10 years? No, I can’t. I think it’s more than 1% in the next 10 years.
Let’s get back to that cultural issue of the difference between China and the West. One of the hypotheses I’ve heard, on top of what you were saying about how long we’ve been looking at the problem and how much people’s careers are entrenched, is that another difference is that in the West we’ve recently had people thinking about the safety issue. You get this impression of some of the people going around talking about that being relatively annoying, or doing it in an untactful way. I’ve certainly heard that from people in the space, that they cringe when they hear people talking about the safety issue, just because a lot of the people who talk about that aren’t really being very effective about it.
Yeah. A friend who just visited me from the U.S. has a general observation that he doesn’t know what to do with people who endorse the ideas that he thinks are important, but don’t really understand those ideas. We definitely have quite a few people like that in AI safety.
And some of the backlash or coldness towards those types of thinking might be related to that.
That’s definitely possible. For example, evidence for that is that I’ve personally met several very famous skeptics of AI safety, and after talking to them for an hour, we actually didn’t really find much to disagree about. Scott Alexander once wrote this article “AI Researchers on AI Risk,” where he said, basically, “Bottom line, here’s the position of AI skeptics, and here’s the position of AI safety people,” and those positions were just identical. You could have just copy-pasted them.
We’re getting towards the end here, and I’ve got one more topic that I think would be interesting. On the topic of short timelines, or as this thing becomes heated up, people occasionally talk about the possibility of assassinations of key researchers by intelligence agencies or rival states, or terrorist groups, or other actors, or governments intervening with a heavy hand in AI projects. As this thing becomes more obviously a big strategic issue, there’s this open possibility of assassinations or terrorism. I’ve heard various things discussed in this space. I’m curious whether you have thoughts on whether we can expect to see that kind of escalation, or that level of heat in the area, or whether it seems like something that will stay peaceful.
That would be super unfortunate. The best remark about this that I’ve heard is from a researcher at CSER at Cambridge. He said: “Guys, no killing people. That’s cheating.”
Certainly, it’s not something that you would want to see by any means. But more generally, an arms race may develop in this area, as states start to get close. Let’s say the U.S. is getting close or China is getting close, and now there’s suddenly a lot of pressure for a lot of state force to get involved. The thing could easily become a flashpoint for other sorts of measures, like war and assassination.
Again, I cannot rule this out, but I think it would be really unfortunate if it happens because then it would no longer be possible to coordinate properly, which we need if we’re going to survive potentially uncontrollable AI. The reason for optimism is that these topics are still fairly abstract. Casual observers are underestimating their potential impact, and hopefully will continue to do so.
Hopefully, we’ll be able to build some kind of positive understanding and a roadmap before it becomes an issue of immediate strategic contention.
In the final stretch of the Three-Body Problem trilogy, the Chinese science fiction novels, it’s very interesting how it describes how the world will change when there is a perceived external threat. It’s a fictional example, so take that for what it’s worth, but I find it fascinating.
The response to the strategic threat of AI could just as easily be positive?
Yeah. In the book, it’s kind of a mixture of positive coordination and consolidation between nations, which is the main thrust, but also there are some extremist and religious forces and things like that. But there is this definite possibility that if there is a perceived threat towards our entire civilization, there will be a strong incentive to cooperate at some point.
I really enjoyed getting your perspective on all this. Sounds like you’ve really talked to a lot of the best people in the field.
Indeed. That’s the thing that I have done for the last 10 years.