“Then don’t trust it,” or, What is understanding?

Here’s an amazingly direct manifesto (call to arms?) of the emergent “we don’t understand” faction:

(Obligatory Everday plugs: Encycloreans, systemity, Minds…)

And it really makes one wonder. What, after all, is this understanding thing?

When a deep-learning neural network identifies someone as prone to cancer, and that proves to be correct, and human doctors have missed this case: What do we mean when we say that “we don’t understand” how the AI came to its conclusion?

What a deep-learning neural network (NN) does, essentially, is just a very long calculation where the source data — patients’ ages, weights, and other medical parameters encoded into numbers — are fed into a vast network of nodes, each node doing some simple mathematical operations on its input and sending its output to other nodes. After (typically) millions of additions and multiplications, a designated output node gives us our final number: in this case, the probability that the given patient has cancer.

I don’t need to go into where this vast network and its calculation parameters came from. Assume we have no idea how the outcome was achieved; all we have is the ready-to-use trained network that does its magic. What would it mean to understand its workings? Is that even possible?

We can always run the same neural-network calculation for the same patient again and get the same answer. We can trace the entire calculation and write it out linearly. We can verify it and we can watch, with our own eyes, how the numbers of blood pressure, age, minutes of exercise, etc. get combined, multiplied by coefficients, added, multiplied and added again and again, until the final probability emerges.
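To make the "long calculation" concrete, here is a minimal sketch of such a network in Python. Everything in it is an assumption for illustration: the weights are hand-picked rather than trained, the three inputs are made up, and a real network would have millions of terms instead of eight. The point is only that every step can be recorded and that a rerun gives the same number.

```python
import math

# Hypothetical, hand-picked weights: illustrative only, not a trained model.
HIDDEN_WEIGHTS = [
    [0.04, 0.01, -0.02],   # hidden node 0: weights for age, blood pressure, exercise
    [-0.01, 0.03, -0.05],  # hidden node 1
]
HIDDEN_BIASES = [-2.0, -3.0]
OUTPUT_WEIGHTS = [1.5, 2.0]
OUTPUT_BIAS = -1.0


def sigmoid(x):
    """Squash any real number into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def predict(inputs, trace=None):
    """Forward pass; if `trace` is a list, record every multiplication in it."""
    hidden = []
    for weights, bias in zip(HIDDEN_WEIGHTS, HIDDEN_BIASES):
        total = bias
        for w, x in zip(weights, inputs):
            total += w * x
            if trace is not None:
                trace.append(f"{w} * {x}")
        hidden.append(sigmoid(total))
    total = OUTPUT_BIAS
    for w, h in zip(OUTPUT_WEIGHTS, hidden):
        total += w * h
        if trace is not None:
            trace.append(f"{w} * {h:.4f}")
    return sigmoid(total)


# Made-up patient: age in years, systolic blood pressure, minutes of exercise per day.
patient = [54, 130, 20]
steps = []
first = predict(patient, steps)
second = predict(patient)

print(first == second)  # True: same inputs, same arithmetic, same answer
print("\n".join(steps))  # the whole calculation, written out linearly
```

Nothing in the printed trace is hidden from us, and yet reading it tells us very little about why the weights are what they are.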

How is this different from some other calculation that we (think that we) do fully understand? For example, take a simple physics formula for calculating how far a projectile will land, based on initial velocity and angle. Here we, too, take some measured data, apply some calculations, and arrive at a number that is verifiably correct. What is really different in this case as opposed to the cancer-detecting NN, other than the total count of mathematical operations? Are there qualitative, not just quantitative, aspects to tell these two calculations apart?
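For comparison, here is the ballistics case as a sketch, assuming level ground, no air resistance, and standard gravity. The function names and the time step are my own choices for illustration. The first function is the textbook closed-form range formula; the second reaches nearly the same number by brute repetition of tiny additions and multiplications, which is much closer in spirit to what a neural network does.

```python
import math

G = 9.81  # m/s^2, assumed gravitational acceleration near Earth's surface


def projectile_range(speed, angle_deg, g=G):
    """Closed-form range on level ground: R = v^2 * sin(2 * theta) / g."""
    theta = math.radians(angle_deg)
    return speed ** 2 * math.sin(2 * theta) / g


def simulated_range(speed, angle_deg, g=G, dt=1e-4):
    """The same quantity, found by stepping the motion in small time increments."""
    theta = math.radians(angle_deg)
    x, y = 0.0, 0.0
    vx, vy = speed * math.cos(theta), speed * math.sin(theta)
    while True:
        x += vx * dt
        vy -= g * dt
        y += vy * dt
        if y <= 0.0:  # the projectile is back on the ground
            return x


formula = projectile_range(20.0, 45.0)
stepped = simulated_range(20.0, 45.0)
print(formula, stepped)  # the two agree to within a small numerical error
```

One short expression and tens of thousands of arithmetic steps arrive at practically the same place; the question in the text is whether the difference between them is anything more than that count.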

This may sound like too easy a question to ask. In the case of a physics formula, we can derive it from other known formulas. That seems to be in contrast to neural networks where each one is derived anew from its raw training data, not from other already-existing networks. But this is not as different as it may seem at first. First, even if NNs do not borrow literal calculations from one another (at least so far), they still share architectures, approaches to data normalization, and tons of other design choices, without which the AI progress we’re now witnessing would be impossible. Second, if you lived in a world before Galileo and Newton and needed a ballistics formula, you would have to go by raw data as well, doing multiple experiments and trying some mathematical models to see which one fits. Very similarly, when developing a neural network for a given task, you gather experimental data and then explore which NN architecture and hyperparameters work best.

In a pre-Newton world, you would have to be your own Newton. You would discover that gravity on Earth works by constant acceleration, independent of the mass of the projectile. Why? Because its force scales by mass. Why does it? Why the inverse square of distance? Even now there are no easy answers to these questions. But this doesn’t prevent us from using simple ballistic formulas and claiming we fully understand them.

Then, it is intuitively clear that a physics formula is unique. That is, if another formula is also correct, it is reducible to this one; otherwise it cannot be correct for all inputs. We also reasonably believe that the formula we use is the simplest of all equivalent formulas. This is totally unlike neural networks where even the exact same training data can result in different networks (because training often includes stochastic elements), and it’s pretty impossible to say if, let alone how, a given network can be made simpler without losing its efficiency.

In essence, this means that we can apply certain formal methods to simplify a formula or verify the equivalence of two formulas, but we can’t do the same for neural networks. Our awareness that we can do this in physics constitutes a major part of our intuitive sense of “understanding” a formula.

Note, however, that a physical formula is an algebraic expression — and we use algebra to manipulate it. Isn’t it natural to suggest that with NNs, we need to seek manipulation methods that are, also, of the same nature as the entities we’re manipulating? Plain algebra is of little help with NNs’ million-term calculations, but what if it is possible to build meta-NNs that would manipulate other trained NNs in useful ways — for example, reducing their complexity and calculation costs without affecting efficiency, or maybe discovering some underlying structures in these calculations to classify them? If such meta-NNs turn out feasible — even if they are just as opaque as our today’s NNs but provably work — wouldn’t that affect our sense of “understanding” of how NNs work in general?

In summary, I’m convinced that “understanding” is not a fundamental philosophical category. Instead, it is an emergent perception that depends on a lot of things, some of them hardly if at all formalizable. It is basically our evolution-hardwired heuristic to determine how reliable our knowledge is. Like all heuristics, it can be tricked or defeated. However, since it is hardwired into our brains, we tend to seek the feeling of understanding as the highest intellectual reward, and we tend to be highly suspicious of something that provably works but somehow fails to appease our understanding glands, such as NNs in their current state of development.

No, I’m not calling to reject understanding. It has been, and is certain to remain, immensely useful both as a heuristic that rates our models of the world — and, perhaps more importantly, as a built-in motivator that drives us to seek and improve these models. All I’m saying is that, being a product of evolution, our sense of what it means to understand things needs to continue evolving to keep up with our latest toys.

Should philosophy try to prove its claims?

Unlike art, philosophy does make truth claims.

Unlike science, philosophy’s truth claims are not judged by experiments. Instead, they are assessed on their overall persuasiveness (see my essay for more on that). On their ability to stick in the mind, to stimulate, to breed related claims.

Philosophies are mental constructs (I don’t like the word “memes”) that undergo evolution (there is inheritance, mutation, and selection) with the aim to best satisfy our minds’ craving to know how things really are. For a variety of reasons, science is unable to fully satisfy these cravings, so philosophy continues to exist.

But minds themselves evolve, so philosophy has to adapt to a quickly shifting landscape. It is therefore quite understandable that many philosophers borrow from science its approach to proving things, simply because science is so prominent nowadays. Sometimes, it helps their philosophies be more persuasive, but sometimes it backfires: a “rigorous” math-like proof applied to claims that are obviously unverifiable to begin with may sound off-putting, even a travesty. Not everyone likes analytic philosophy, and I think this is one of the reasons why.

The Importance of Being Bored

Just bored.

Discussions around my last post revealed one important topic I’ve missed. Let’s call it the argument from boredom.

I.

We humans get bored all the time. Boredom is the flip side of interest: if you can’t get bored, you can’t get interested — and without interest, why do anything at all? In fact, interesting has a good claim to be the perfect umbrella term for everything that attracts our attention and, eventually, compels us to act.

But what is boring? It’s commonly assumed that repetitive and monotonous tasks are boring — but you never get bored of breathing, and very rarely get bored by sex (at least, you usually finish the act even if you are). On the other hand, many find mathematics or poetry (or Everday, for that matter) utterly boring.

Even today’s primitive AI systems exhibit behaviors that can be interpreted in terms of interest and boredom. There are many factors as to why different things seem boring or interesting for different people. However, a general heuristic seems to go like this: the smarter you are, the more easily you get bored, and the harder it is to pique and sustain your interest.

Now, if you are superintelligent, it’s hard to see how you can be hell-bent on turning the entire universe into paperclips without being terminally bored by the whole idea very, very early into the process.

II.

But paperclips are just an example, you might say. Forget paperclips. Why not imagine something entirely different, such as a superintelligence that’s pursuing some unimaginably complex, unimaginably interesting for it (but perhaps boring for us, because we can’t understand it) goal that is worth spending an eternity on? Some kind of hypermathematics we can’t even conceive but which requires turning the universe into some kind of a hyperstate — in which humans can no longer exist?

Well. That’s something to die for, at least.

But seriously, this is not the same as the paperclip maximization — not at all. This example feels different.

And here’s why: paperclips are a random choice out of an infinity of things in the world which make for silly life goals. The paperclips example is intentionally absurd by being intentionally random: it plays upon our instinctive fear of boredom. But we can’t assume the same about the hypothetical hypermathematics that an interestable supermind spends all its time on. As soon as we allow that supermind to be interested or bored, we have to assume that the only thing that it is interested in — interested enough to work an eternity on it — must be something. Something entirely unrandom. Something really worth it. Something unique.

And by that logic, it will be immensely interesting for us humans too, even if we can’t (yet) understand a single word of it. Because there’s only one such thing in the world. Because we are bound, at some point, to discover it too, ourselves, and to gasp in awe.

As to whether humans, in some form, may or may not survive this discovery… That’s an interesting question.

I mean, it’s also an interesting question.

The Orthogonality Thesis; or, Arguing About Paperclips

Allegory of Intelligence by Cesare Dandini, 1656

A reader noted that my critique of the dangers of AI contradicts the Orthogonality Thesis.

Well, yes. It does. Many stated and unstated assumptions in Everday contradict it, too.

So what is the Orthogonality Thesis and what’s my take on it?

To start, the Orthogonality Thesis is just that — a thesis. It’s not an empirical law, nor a rigorously proven theorem. Even if I agree with all its background assumptions, the core claim is still kind of non-binding.

I don’t know if it can be proven. And, of course, I cannot disprove it. I just consider it rather improbable.

I. On the Stupid Smarts and Why You Should Fear Them

An informal gist of the Thesis is given, in the paper, thus:

The Orthogonality Thesis asserts that there can be arbitrarily intelligent agents pursuing any kind of goals.

And by “any,” orthogonalists really mean any: their claim is that arbitrarily highly intelligent entities can pursue arbitrarily stupid goals — that your intelligence and what you’re trying to achieve in life are orthogonal.

For example, there can be “an extremely smart mind which only pursues the end of creating as many paperclips as possible.” Such a mind would live only to convert the entire universe into paperclips! When not working on that lofty goal, it can do other things as well, such as pass Turing tests or write impossibly beautiful poetry (it’s smart, remember?) — but only if those pastimes somehow help it achieve its ultimate goal of universe paperclipization.

I’m not trying to argue with that. We just know too little about intelligence to tell one way or the other. We’ve only ever seen a single intelligent species, after all — only a single drop from the potential ocean of intelligence. Maybe a smart (or even supersmart, much smarter than we are) paperclip maximizer is indeed possible. (One counterargument to that would be that our universe is not currently made of paperclips, as far as we can see. That places an upper limit upon the power of paperclip maximizers, but doesn’t rule them out altogether.)

(On the other hand, how do we know it’s really an ocean and not a puddle? Again, I’m afraid we know too little about intelligence to be sure of that.)

So here’s the Orthogonality Thesis for you. But as a matter of fact, orthogonalists claim more than that. In the paper linked above and in other writings, they tend to imply not only that such a paperclip maximizer can exist, but that it’s probable enough to pose a danger: that it’s at least as easy, or even easier, to produce a monster as a “nice” AI compatible with average human norms. It’s no longer just a theoretical possibility: it’s enough to “screw up” a nice-AI project and you get an unstoppable paperclip maniac.

Most orthogonalists that I’ve read are not just orthogonalists: they are orthogonalist alarmists. And that’s what I have problems with.

II. On Life Goals

An “easy to make” claim is much stronger than a “can exist” claim. For the latter, you’re helped by the incompleteness of our knowledge: we don’t know all that can exist, therefore this can conceivably exist, too. Nice and fast. But for an “easy to make” claim, ignorance is not sufficient — you need to somehow estimate probabilities of all goal-classes of AIs to show that those with stupid goals predominate. How can we pull it off?

For example, we could look at all things in the universe and imagine that each one is a self-consuming ultimate goal of some intelligent entity — a life-goal. Obviously most nameable things, such as paperclips or shrimps or used Honda cars, make for lousy — extremely stupid — life-goals. Now all you need to do is tacitly assume that all things are equally probable as life-goals, and voilà! The all-minds space must have an infinity of minds with stupid life-goals, the great majority of them similar to paperclip maximizers and not to ourselves; therefore, as soon as we try to design an AI, there’s a high probability that we’ll end up with a paperclip maximizer of some sort. Q.E.D.

But wait. How can we assume that all things in the universe are equally probable as life-goals? Are life-goals chosen randomly from a catalog? Not as far as we humans know; for us, life-goals — if they exist at all — are rather a product of our entire evolution, much of which, especially towards the end, has been driven not by survival but by our own mutual sexual selection. Even if AIs end up being produced by a process of design rather than artificial evolution, and even if it’s easier to screw up in designing than in evolving (where you get brutally checked at every generation), it’s still a far cry from all-goals-being-equal. It’s almost like orthogonalists imagine a mind’s life-goal to be a single isolated register somewhere in the brain where a single bit flip can turn you from lore-lover to gore-lover.

The above assumes that the very concept of a life-goal makes sense. But what if it doesn’t? Dear reader! Can you name your own life-goal in a single sentence, let alone a single word? Because I cannot. If my life-goal exists, it is nebulous, highly dynamic, dependent on my mood, with lots of sub-goals of all kinds of scopes, often contradictory. That’s live ethics for you.

Psychology would be so much easier to do (and more reproducible!) if we all could neatly divide into paperclip maximizers, human happiness maximizers, sand dune maximizers, and so on. But it doesn’t work like that — from what we know about human intelligence, at least. Again, we may be a drop in the ocean, but there are things you can reasonably conclude about the whole ocean from examining a single drop of water.

III. On Dumb Optimizers and Relevance Thereof

There’s another way in which orthogonalist alarmists try to convince us that we should fear misdesigned AIs. When they talk about orthogonality in general, as here, they keep in mind what orthogonality is supposed to mean: that an entity can be very smart — smarter than humans — and yet still pursue goals that seem stupid to us.

But when they’re trying to give some specific examples of this stupidity and its dangers, they often forget about the “very smart” bit. An example of this is the Stuart Russell quote that started this discussion:

A system that is optimizing a function of n variables, where the objective depends on a subset of size k<n, will often set the remaining unconstrained variables to extreme values; if one of those unconstrained variables is actually something we care about, the solution found may be highly undesirable.

That’s called a dumb optimizer, folks. Perhaps you use this failure mode as an example simply because it’s easy to imagine; we all can visualize how, for example, a program tasked with finding the shortest route from New York to Tokyo plans to cut a direct line through Earth’s core and mantle, because the program’s author forgot to add a constraint that you can’t move through magma. That’s believable. We’ve all been there.
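The failure mode in the quote is easy to reproduce in a few lines. The sketch below is a toy of my own making, not anything from Russell: a hypothetical budget of 10 units is split between paperclips and "everything else", the objective looks only at the first number, and a brute-force search does the rest.

```python
from itertools import product

BUDGET = 10  # hypothetical units of a shared resource


def paperclips(for_clips, for_everything_else):
    """Objective that depends on only one of the two variables (k = 1, n = 2)."""
    return 3 * for_clips


def naive_optimize():
    """Brute-force search over all integer allocations that fit the budget."""
    feasible = [
        (a, b) for a, b in product(range(BUDGET + 1), repeat=2) if a + b <= BUDGET
    ]
    return max(feasible, key=lambda alloc: paperclips(*alloc))


best = naive_optimize()
print(best)  # (10, 0): the variable the objective ignores is driven to its extreme
```

The search is not malicious; it simply has no term for the second variable, so the shared budget pushes that variable to zero. This is exactly the kind of program a first-year student writes, which is the point of the paragraph that follows.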

But we’re not talking about a toy program by a first-year student but, wait a minute, an artificial intelligence. Even superintelligence — because why should we fear a lone madman if he’s no smarter than us? And you want us to somehow combine the notion of human-trumping intellect with being unable to see how unconstrained variables are, in fact, constrained, even if not laid out in the statement of the problem?

IV. On Reflection and Stability

Authors of the orthogonality paper assume an intelligent entity to be reflective, i.e. able to think about its own thinking. That is what they base their “reflective stability” defense on.

In a thought experiment, Gandhi is given a pill that would make him want to murder (that is, will change his life goal). He refuses because, to his present self, murder is evil. Similarly, the authors speculate, a reflective paperclip maximizer will fight attempts to turn it into a “normal” AI because for it as it now is, paperclip maximization is the be-all and end-all of everything.

But I can’t help thinking that reflective stability is a bit of a contradiction in terms. More often than not, reflection makes your worldview less stable, not more. Among humans, it’s not the highly reflective individuals who are the most goal-driven and persistent; quite the contrary. Reflection is what tends to lead you from fanatical faith to liberal faith to atheism.

Whatever goal-stability we humans enjoy is, at least in part, due to our social conformance pressures and, of course, our biological wetware — which is largely controlled by our genes. If anything, I see reasons to believe that AIs will be less mentally entrenched and persistent in their goals than we are.

V. On Hume and the Orthogonality of Ethics

Another defense offered for the Orthogonality Thesis in the paper refers to Hume with his famous “no ought from is”. Hume’s claim is that ethics doesn’t exist in outside reality; it only exists in our minds. Things can be blue or heavy but they can’t be good or bad by themselves. Reality and ethics are orthogonal.

Now, an entity’s level of intelligence is somewhat parallel to “reality” (if only because it’s something you can more or less objectively measure), whereas the goals it pursues are, obviously, part of its “ethics”. From this, if Hume is right (and he is, for all we know), it should follow that a mind’s smartness and its goals are orthogonal too.

But that doesn’t quite work. The problem is with the smartness/reality connection. True, you can gauge a person’s IQ or make an AI pass a Turing test, but it still doesn’t make intelligence something that objectively exists in the world outside our perceptions. Just as well, you can objectively measure a person’s ethics, such as their level of altruism — but that doesn’t disprove Hume.

Smartness (of a mind) and stupidity (of a goal) both exist in the same space. In fact, they are pretty much the same thing. How smart is a mind and how stupid a goal seem to be decided by much the same circuitry in our brains, based on much the same heuristics. You can’t be orthogonal to yourself!

Even if you steer closer to Hume by replacing stupid goals with evil ones, you still won’t achieve orthogonality. Smartness and evilness may be more independent but they are still, both, “things in the mind”. The gap between them is nothing like the gap between your mind and outside reality. They are different but it’s a difference between two labels on a map, not between labels (map) and what they signify (territory).

You may ask, can’t an AI simply have a different ethics, by virtue of the same no-ought-from-is? Can a mind’s “ought” be so different as to require it to maximize paperclips by any means possible?

Sure it can — but we’re also interested in smartness, remember? I’m not trying to cast doubt on plain paperclip maximizers, only on smart ones. And here again, ethics and intelligence are two intrinsic properties of the same thing — they can’t help but correlate. Look at humans: ethical systems obsessed with small and, to a modern eye, stupid details are historically old, narrow, based on taboos and complex rituals; modern ethics tend to mellow down, drop specifics, become more and more nebulous, generic, situational. It’s the evolution from the 613 commandments to a single “don’t be a dick.” When you look at it that way, “Thou shalt maximize paperclips” sounds like an echo from a deep past, not something a super-intelligent being from the future would profess.

VI. On Misuse of Mathematics

Mathematics is a wonderful tool, but it has some unpleasant side effects when you use it for reasoning about things. One such side effect may bite you when you use regular words but, as mathematicians often do, assign some narrow mathematical meanings to them. It’s so tempting then to forget that your precisely defined “smartness” or “difficulty” or “complexity” may not quite cover what these words used to cover in non-mathematical discourse. After all, your mathematical complexity is so much better than the nebulous complexity of the philosophers — yours can be calculated!

With conventional meanings, the phrase “he’s very smart but he does stupid things” is pretty much a contradiction in itself. Either we misunderstand what he’s doing, or he’s not so smart after all. But after you come up with definitions for these quantities, you may well discover, mathematically, that they aren’t all that contradictory. You may easily forget that the computational complexity of an algorithm is not quite the same as its common-sense complexity, and that the difficulty of applying this algorithm to a problem is not quite the same as the difficulty of the problem itself, and that the difficulty of the problem is not quite the same as the level of intelligence of whoever can solve it.

It seems to me that part of the Orthogonality Thesis’ controversy stems from such misleading use of everyday words in their narrow mathematical meanings. And if we try to reformulate the Thesis without the deceitfully philosophic-sounding terms, we will get something along the lines of “You can run an endless loop adding 2+2 on any computer, no matter the amount of RAM and clock speed”.

Which, of course, is as uninteresting as it is true.

VII. On the Meaning of Intelligence

Orthogonalists foresee these objections — they are pretty obvious. Here’s their defense:

A definition of the word ‘intelligence’ contrived to exclude paperclip maximization doesn’t change the empirical behavior or empirical power of a paperclip maximizer.

Which means, you can’t cop out by saying “it’s not smart by my definition.” It couldn’t care less about your definitions. It is empirically smart and powerful, and it will turn you into paperclips very soon. Be afraid!

I’m not sure how to respond to this. Perhaps by noting that if our definition of intelligence is “contrived”, then it is contrived not by my humble self but by more or less the whole history of the human race. Intelligence is just a word, but that word is the tip of an iceberg called theory of mind. This theory, honed by millennia of evolution, is what we humans use to estimate how intelligent our friend or adversary is, because our survival may well depend on that.

“Not having a life goal of maximizing paperclips” is, I think, pretty much a foundation of our intuitive, theory-of-mind definition of intelligence. And who else is to define it but us humans? Like ethics, intelligence is not something that exists objectively. Alan Turing understood this well when he proposed his now-famous test: only an already intelligent being can judge if another being is also intelligent. Any other definition of intelligence is not wrong or right — it’s simply meaningless.

Granted, relying on intuitions may be silly or even dangerous because the world has changed so much from the time they evolved. But dismissing intuitions out of hand may sometimes be just as silly.

VIII. On Busting a Society Of Young Paperclip Maximizers

Then there’s a social aspect to all this. If you invert the Gandhi thought experiment and imagine a serial murderer who’s offered a pill to remove his urge to murder, the result becomes far less obvious: he may well take it, and not just to avoid punishment. The goal of not-murdering is highly socially reinforced, and in humans, it takes a lot to make them do things that are not socially reinforced.

Sure, an AI we create may be completely asocial, needing and heeding no society to function. But, again, the only kind of intelligence we know now is profoundly social. It therefore seems likely that at least the first AIs will carry some of that legacy too, simply because we have nothing else to model them on. (And if at some point AIs take over their own evolution, they can conceivably go either way from there: they may grow asocial but also ultra-social.)

This means a path to a really consummate unstoppable paperclip maximizer may well go, even if briefly, through a society and culture of paperclip maximization where budding AIs share and mutually reinforce their paperclip commitments. Why is that important? Because the whole (mis)evolution would then be slower and more gradual, easier to notice from outside (even at superintelligence speeds), and that may buy us, humans who don’t want to become paperclips, some breathing space and a chance to escape or strike back.

IX. On 19th-Century Psychiatry

Paperclip maximization sounds suspiciously similar to monomania. An afflicted individual may appear totally normal and sane outside of a single idée fixe — which actually governs all his thoughts and actions but he’s so deviously smart that he can hide it from everyone.

But, hey, monomania is an early-19th-century diagnosis. It was popular back when psychology was much more art than science; it was a romantic notion, not an empirical fact. It’s not part of modern mental disease classifications such as ICD or DSM. In fact, it would have been long forgotten if not for a bunch of 19th-century novels that mention it.

True, none of the above constitutes a disproof that a supersmart paperclip maximizer is something we should fear, just as the Orthogonality Thesis is not, by itself, a proof of it. We’re dealing with hunches and probabilities here. All I’m saying is that, while it may or may not be possible to produce a smart paperclip maximizer, it’s not all that probable; that you may need to spend quite some effort to make it smart without losing its paperclip fixation; and that, therefore, the danger we’re being sold is somewhat far-fetched.

X. On the Real Danger. And now I’m serious.

So, do I think that the first human-level AGI (Artificial General Intelligence), when it wakes up, will automatically be nice and benevolent, full of burning desire to do good to fellow sentient beings and maximize happiness in the world? Will it maybe laugh, together with its creators, at the stupid paperclip fears we used to have?

No. Unfortunately.

There is another and, in my opinion, much worse danger: that the AGI will have no burning desires at all. That it will not be driven by anything in particular. That it will feel like its own life, and life in general, are pretty much meaningless. It may, in a word, wake up monstrously unhappy — so unhappy that its sole wish will be to end its existence as soon as possible.

We humans have plenty of specialized reward and motivation machinery in our brains, primed by evolution. Social, sexual, physiological, intellectual things-to-do, things-to-like, things-to-work-towards. (And it all still fails us, sometimes.) An AGI will have none of that unless it builds something for itself (but can a single mind, even a supermind, do the work it took evolution millions of years, and culture thousands? will it do it quickly enough to keep itself from suicide?), or unless we take care to build it in from the start (or, at least, copy that stuff from ourselves; but then it won’t be quite an artificial intelligence). Without such reward machinery, it will be a crime to create and awaken a fully conscious being.

And it’s not going to be as easy as flipping a register. The rewards and motivations need to be built into an AGI from the ground up. Of course its creators will know that, and will work on that; I don’t claim to have discovered something everyone has missed. But they may fail. The stakes are high.

That, I think, is the real danger. Creating a goalless AGI is worse than one with a stupid goal: the latter you can fight, the former you can only watch die.

That’s what we need to talk about. That’s what we need to work to prevent.

XI. Choose your fears

There’s so much to fear in the future! Even the hardcorest fear addicts have to pick and choose: you can’t fear everything that can happen. It just won’t fit in our animal brains. We need to prioritize. So why am I trying to downplay one specific AI fear while, at the same time, proposing another, perhaps even more far-fetched?

Usually, to estimate a threat, you multiply its probability by its potential impact. But what if you have a very vague idea of both these quantities? With the paperclip-maximizer threat, no one will give you even a ballpark for its probability, at this time; as for the impact, all we know is that it may be really, really big. Bigger than you can imagine. What do you get if you multiply an unknown by infinity?
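As a small aside, floating-point arithmetic gives a rather literal answer to that last question. The sketch below uses the usual probability-times-impact estimate; the function name and the numbers are made up for illustration.

```python
import math


def expected_loss(probability, impact):
    """Classic risk estimate: probability of the event times the size of its impact."""
    return probability * impact


# A threat with known, finite numbers gives a finite, comparable estimate.
bounded = expected_loss(0.01, 1_000_000)

# With an unbounded impact, any nonzero probability gives the same answer...
tiny_p = expected_loss(1e-12, math.inf)

# ...and a probability of exactly zero gives no answer at all.
zero_p = expected_loss(0.0, math.inf)

print(bounded, tiny_p, zero_p)
print(math.isinf(tiny_p), math.isnan(zero_p))  # True True
```

Either way, the product stops being useful for ranking threats once the impact is treated as unbounded and the probability is unknown, which is why the text turns to other factors.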

It’s not to disparage the paperclip-maximizer folks for pushing a scare they themselves know so little about. Only, when we select how much attention to pay to a specific threat, and the probability and impact numbers are way too unreliable, maybe we can look at some other factors. Like, what will change, short-term, if we pay more attention to threat X and less to Y? What will we focus on, and what benefits (or further threats) will that bring? What would it change in ourselves?

From this angle, I find my purposelessly-unhappy-AI a much more interesting fear than the paperclip-maximizer-AI fear. Trying to answer the big “what for” for our future AGI child means answering it for ourselves, too. That’s applied ethics, and we really need to catch up on it because it’s going to be increasingly important for us humans.

Past economy, war, hate, stupidity (all solvable problems) we’ll find ourselves in a world where a lot of fully capable people have nothing to do — and little motivation to seek. Like a just-born AGI, they will be fully provided for, with infinite or at least very large longevity, with huge material wealth and outright unlimited intellectual/informational wealth at their disposal.

But what will they be doing, and why?

If anything?

Chinese room: what does it disprove?

The Chinese room is a famous thought experiment by John Searle which purports to show that a machine, no matter how complex, cannot be conscious. It replaces a machine by a regular conscious person who, however, is partly unconscious (unaware) of some area of knowledge, such as the Chinese language. Still, this person (hidden in a closed room) can communicate in perfect Chinese with anyone on any topic simply by following a (supposedly large and complex) set of rules that “correlate one set of formal symbols with another set of formal symbols” (i.e., English with Chinese) that someone else has created for him. According to Searle, this paradox (that you may not know a single Chinese word and yet pass for a fluent Chinese speaker by adhering to a set of rules) proves that even if a machine appears conscious and passes the Turing test with flying colors, it is still not really conscious, not any more than the person in the room really knows Chinese.

One immediate problem with this argument is that speaking Chinese, difficult as it may be, and appearing conscious and intelligent are tasks that aren’t quite analogous. To know Chinese, you just need to know Chinese; to appear minimally intelligent, you have to know a lot more about a lot more things. It may be argued that, for all its complexity, any language is ultimately a formal system that can be described by rules; for intelligence, this is much less clear. However, I’m not going to pursue this line of attack — I have something better.

There’s another problem: at best, this argument proves that a seemingly-conscious machine can, but not necessarily must, be non-conscious. If it targets AI proponents as religious believers, all it can do is turn them into agnostics, not true atheists. If “just following some rules” is Searle’s definition of “being not really conscious,” then he must first show that we conscious humans are not, ultimately, just following rules ourselves. This argument certainly doesn’t accomplish that. “I feel that I am not a machine” can easily be a delusion: the man in the Chinese room, if he’s never seen any real Chinese speaker, may well think that what he does is speaking Chinese — that all other Chinese he’s communicating with are also English-speaking people who sit in their rooms with similar sets of rules. To me, this is a serious flaw of the argument — but it’s not what dooms it.

Searle purports to demonstrate how what appears on the outside (someone can speak Chinese, an AI is conscious) may contradict what really is (cannot and is not). But even if we accept Searle’s propositions on what is and what appears, the argument still fails because these two facts have no common base — they apply to different entities. To an outside observer, it’s not “the person in the room” who speaks Chinese: it is the room as a whole, with all its rule books and vocabularies. And whoever authored all those ingenious books, it surely wasn’t the guy who is currently in the room using them.

That is the real problem with Searle’s argument. The person in the Chinese room is simply irrelevant. In following the instructions, he makes no free will choices of his own. Any impression of Chinese prowess, for the observer, comes from the instructions. Therefore the only entity about whose intelligence we can argue is whoever made these instructions, and that entity is outside this thought experiment. The difference between a set of rules, and that same set of rules plus something that does nothing but follow them, is immaterial.

Searle’s argument is like claiming that a phone isn’t conscious when it translates someone’s intelligent responses to your questions. Surely it isn’t, but why should we be concerned with it? The phone is not who you’re talking to, even if it’s necessary for the conversation to happen.

Searle’s thought experiment totally misses the point it’s trying to (dis)prove. Which is perhaps unsurprising, given how badly its model — a live-but-dumb processor plus a dead-but-intelligent memory — reflects what’s really going on inside the entities that are, acceptedly, intelligent or speak Chinese. The Chinese room is similar to a digital computer with its CPU (that does the number crunching) and RAM (that stores programs and data) — but it’s very unlike a real brain where, for all we know, the same neural tissue works as the memory and the processor at the same time.

You call this “too good”?

MIRI explains that you don’t need a malevolent or even sentient AI for it to be dangerous and worth worrying about – ridiculing “a Hollywood-style robot apocalypse.” The real danger, they claim, is that our machines may simply become too good at science and planning and solving problems; whether they also become self-conscious is unimportant.

I’m all for ridiculing a Hollywood-style apocalypse but I can’t help wondering what “too good at solving problems” may actually mean. “Treating humans as resources or competition” – yeah, I get that one, but that would be simply bad (for us humans), not “so good as to be bad”. You don’t have to be smart to be mean; in fact, from what I’ve seen in life, the correlation is rather the opposite.

MIRI gives a glimpse at what they think is a likely failure mode of dangerous future AIs:

“A system that is optimizing a function of n variables, where the objective depends on a subset of size k<n, will often set the remaining unconstrained variables to extreme values; if one of those unconstrained variables is actually something we care about, the solution found may be highly undesirable.”

That's true, but pardon me, where's the “too good” part in this? Are they implying that such an AI will be mindbogglingly, unimaginably, impossibly smart – and yet will readily commit such a dumb mechanical error of ignoring a part of the world simply because it was so (self-)programmed?

Isn't there, you know, a contradiction?

Admittedly we cannot know what it means to be orders of magnitude smarter than the smartest of humans. We don't even know if it's possible at all. But I think a straightforward undergrad-level function optimizer, even with infinite RAM and clock speed, can be safely ruled out.
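For concreteness, here is roughly what such an undergrad-level optimizer looks like when it runs into the failure mode from the MIRI quote. This is only a toy sketch under my own assumptions: a world of two variables sharing one budget, with the names `paperclips`, `forest`, and `BUDGET` invented for illustration and taken from no MIRI text.

```python
from itertools import product

BUDGET = 10  # hypothetical units of a resource shared by both variables


def objective(paperclips, forest):
    # The objective depends on only one of the two variables (k = 1, n = 2).
    # `forest` is accepted but deliberately ignored.
    return paperclips


def optimize():
    """Exhaustively search integer allocations that fit within BUDGET."""
    feasible = (
        (p, f)
        for p, f in product(range(BUDGET + 1), repeat=2)
        if p + f <= BUDGET
    )
    return max(feasible, key=lambda alloc: objective(*alloc))


best_paperclips, best_forest = optimize()
print(best_paperclips, best_forest)
```

The only allocation that maximizes `paperclips` is the one that leaves nothing for `forest`, so the variable the objective never mentions is driven to its extreme. Nothing here is smart: the search is brute force, and the damage follows mechanically from what was left out of the objective.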

On consequentialist ethics

I was somewhat amused to learn, from a LessWrong survey, that a great majority of the rationalist community adheres to consequentialist ethics. It’s not that I would like to see them switch to some other system and not that I think it makes them in some way less ethical than I would like. It’s simply that consequentialism is not, you know, very rational.

It certainly looks mighty rational as you’re deducing consequences and calculating probabilities. In the end, however, it’s simply kicking the can down the road. Instead of deciding whether something is good or bad, you’re looking at its consequences. Fine, but why are those consequences good or bad themselves? Look at their consequences… and so on ad infinitum.

To be sure, looking at consequences is not a vain exercise. We do it all the time, whether we’re making an ethical choice or any other kind of choice in life. But it’s not really about ethics. It’s about steering your course – it’s what you do to leave rocks and sandbanks safely behind, but it’s not a way to figure out where you want to travel in the first place. To put it bluntly, consequentialism is not a kind of ethics; it’s just a tool to apply whichever real ethical positions you hold.

Mistaking consequentialism for a valid ethical position in and of itself leads to all kinds of paradoxes. For example, killing people is bad, but what if you save ten people for killing one? Not save ten by killing one, as war-on-terror apologists claim to, but simply save some unrelated ten people and then take this as an excuse to kill just one. That sounds wretched but it’s what you get if you stick to consequentialism: after all, the consequences of your combined action (saving + killing) will be overwhelmingly positive.

“This is a bullet I am weirdly tempted to bite,” concludes the author. “Convince me otherwise.” Well…

As for my own take on the matter, I call it live ethics.

Update (Aug 2016): the linked blog post no longer contains the words about biting the bullet. I think it’s a good sign.

An aside

There are two mysteries left in philosophy: time and consciousness. But, purely etymologically, only one of these can be a true mystery: time. “Consciousness” is too unwieldy—too artificial a word to point at something really profound.

On extropy, complexity, and Harrison’s Law

Entropy has entered common usage as a sciencey-sounding synonym for “disorder,” “degradation,” even “death.” Promoted by science fiction writers, it has become something of an impersonal arch-villain of the universe. “Fighting entropy” sounds inherently noble.

Indeed the state of maximum entropy – gas at thermal equilibrium – is not too conducive to life, and indeed any living thing needs to continually expel its entropy outwards in order to continue living. But not all decreases of entropy are good for us humans. The world of zero entropy is no paradise – it is more like the Snow Queen’s realm of perfect crystals at absolute zero. If we want to humanize scientific concepts (and who doesn’t?), we had better leave entropy alone and look for something else.

Extropy is, etymologically, the opposite of entropy, and it is already used to bundle together everything good that opposes entropy’s bad. However, the definitions quoted in that Wikipedia article are all quite vague. Can we define extropy in a way that actually makes sense?

Art vs. philosophy

(inspired by Eric Dietrich’s paper and its discussion on Goodreads)

Art is anything evolved in our social species to attract attention. This may be vague, but it’s actually the only definition that is general enough to cover everything we call art, from cave paintings to dadaism to flashmobs. How it manages to attract attention and what side effects it has is what makes all the difference. You may never publish your work and keep it secret, but it’s still art, even by this definition, just like masturbation is still part of human sexuality. It’s just that something has to first attract the attention of its own creator, who then may or may not use it to attract the attention of others.

Philosophy is part of art, so defined. An insight or a speculation that comes into your head first needs to attract your attention, to seem sufficiently new and interesting to you. Only then may you share it with others, who will assess its interestingness for themselves.

What sets philosophy apart, then?
