Here’s an amazingly direct manifesto (call to arms?) of the emergent “we don’t understand” faction:

The Dark Secret at the Heart of AI.

(Obligatory Everday plugs: Encycloreans, systemity, Minds…)

And it really makes one wonder. Ultimately, *what* is this understanding thing?

When a deep-learning neural network identifies someone as prone to cancer, *and* that proves to be correct, *and* human doctors have missed this case: What do we mean when we say that “we don’t understand” how the AI came to its conclusion?

What a deep-learning neural network (NN) does, essentially, is just a very long calculation where the source data — patients’ ages, weights, and other medical parameters encoded into numbers — are fed into a vast network of nodes, each node doing some simple mathematical operations on its input and sending its output to other nodes. After (typically) millions of additions and multiplications, a designated output node gives us our final number: in this case, the probability that the given patient has cancer.
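To make the "very long calculation" concrete, here is a minimal sketch of such a forward pass in plain Python. Everything specific in it is an assumption made up for illustration: the three input features, the two-node hidden layer, the sigmoid activation, and all the weights. A real network differs mainly in scale.

```python
import math


def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def forward(features, layers):
    """Feed the inputs through each layer in turn and return the single output."""
    values = list(features)
    for weights, biases in layers:
        # Each node: multiply inputs by its coefficients, add them up, squash.
        values = [
            sigmoid(sum(w * v for w, v in zip(row, values)) + bias)
            for row, bias in zip(weights, biases)
        ]
    return values[0]


# Hypothetical hand-picked parameters, NOT a trained medical model.
TOY_LAYERS = [
    ([[0.8, -0.5, 0.3], [-0.4, 0.9, 0.1]], [0.1, -0.2]),  # hidden layer, 2 nodes
    ([[1.2, -0.7]], [0.05]),                               # output layer, 1 node
]

# Made-up normalized inputs standing in for age, weight, and exercise.
patient = [0.62, 0.48, 0.15]
probability = forward(patient, TOY_LAYERS)
print(f"output probability: {probability:.3f}")
```

Nothing here goes beyond multiplication, addition, and one fixed squashing function, repeated node after node.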

I don’t need to go into where this vast network and its calculation parameters came *from*. Assume we have no idea *how* the outcome was achieved; all we have is the ready-to-use trained network that does its magic. What would it mean to understand its workings? Is that even possible?

We can always run the same neural-network calculation for the same patient again and get the same answer. We can trace the entire calculation and write it out linearly. We can verify it and we can watch, with our own eyes, how the numbers of blood pressure, age, minutes of exercise, etc. get combined, multiplied by coefficients, added, multiplied and added again and again, until the final probability emerges.
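Writing the calculation out linearly is easy to sketch. The toy example below assumes a tiny network with invented weights and a sigmoid at every node, and records each node's arithmetic as a line of text while computing it. Running it twice gives the same trace and the same answer.

```python
import math

# Hypothetical weights for illustration only: (weight rows, biases) per layer.
LAYERS = [
    ([[0.8, -0.5], [0.3, 0.9]], [0.1, -0.2]),  # hidden layer, 2 nodes
    ([[1.2, -0.7]], [0.05]),                   # output layer, 1 node
]


def trace_forward(features, layers):
    """Run the network and return (output, list of human-readable steps)."""
    steps = []
    values = list(features)
    names = [f"x{i}" for i in range(len(values))]
    for layer_index, (weights, biases) in enumerate(layers):
        new_values, new_names = [], []
        for node_index, (row, bias) in enumerate(zip(weights, biases)):
            total = sum(w * v for w, v in zip(row, values)) + bias
            out = 1.0 / (1.0 + math.exp(-total))
            name = f"n{layer_index}_{node_index}"
            terms = " + ".join(f"{w}*{n}" for w, n in zip(row, names))
            steps.append(f"{name} = sigmoid({terms} + {bias}) = {out:.6f}")
            new_values.append(out)
            new_names.append(name)
        values, names = new_values, new_names
    return values[0], steps


output, steps = trace_forward([0.5, 0.25], LAYERS)
for line in steps:
    print(line)
```

For a real network the trace would run to millions of lines, but each line would look just like these.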

How is this different from some other calculation that we (think that we) *do* fully understand? For example, take a simple physics formula for calculating how far a projectile will land, based on initial velocity and angle. Here we, too, take some measured data, apply some calculations, and arrive at a number that is verifiably correct. What is *really* different in this case as opposed to the cancer-detecting NN, other than the total count of mathematical operations? Are there qualitative, not just quantitative, aspects to tell these two calculations apart?
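For comparison, here is the whole of the physics calculation as a sketch. It assumes the textbook idealization (level ground, no air resistance, constant gravity of 9.81 m/s²), under which the range is v² · sin(2θ) / g. The function name and default value are my own choices.

```python
import math


def projectile_range(speed_m_s, angle_deg, g=9.81):
    """Horizontal distance travelled on level ground, ignoring air resistance."""
    angle = math.radians(angle_deg)
    return speed_m_s ** 2 * math.sin(2 * angle) / g


# Two launch angles that are mirror images around 45 degrees land at the same spot.
print(projectile_range(10.0, 30.0))
print(projectile_range(10.0, 60.0))
```

One multiplication, one sine, one division: the same kind of arithmetic as in the network, only far less of it.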

This may sound like too easy a question to ask. In the case of a physics formula, **we can derive it** from other known formulas. That seems to contrast with neural networks, where each one is derived anew from its raw training data, not from other already-existing networks. But this is not as different as it may seem at first. First, even if NNs do not borrow literal calculations from one another (at least so far), they still share architectures, approaches to data normalization, and tons of other tunable choices, without which the AI progress we’re now witnessing would be impossible. Second, if you lived in a world before Galileo and Newton and needed a ballistics formula, you would have to go by raw data as well, running multiple experiments and trying out mathematical models to see which one fits. Very similarly, when developing a neural network for a given task, you gather experimental data and then explore which NN architecture and hyperparameters work best.

In a pre-Newton world, you would have to be your own Newton. You would discover that gravity on Earth works by constant acceleration, independent of the mass of the projectile. Why? Because its force scales with mass. Why does it? Why the inverse square of distance? Even now there are no easy answers to these questions. But this doesn’t prevent us from using simple ballistic formulas and claiming we fully understand them.

Then, it is intuitively clear that a physics formula is **unique**. That is, if another formula is also correct, it is reducible to this one; otherwise it cannot be correct for all inputs. We also reasonably believe that the formula we use is the **simplest** of all equivalent formulas. This is totally unlike neural networks, where even the exact same training data can result in different networks (because training often includes stochastic elements), and it’s all but impossible to say if, let alone how, a given network can be made simpler without losing accuracy.
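The non-uniqueness is easy to reproduce on a toy scale. The sketch below is an assumption-laden illustration, not how production networks are trained: it fits a three-node hidden layer to the logical OR of two inputs with plain stochastic gradient descent, and the only thing that changes between runs is the random seed, which controls both the starting weights and the order in which examples are drawn.

```python
import math
import random

# The same training data for every run: logical OR of two inputs.
DATA = [((0.0, 0.0), 0.0), ((0.0, 1.0), 1.0), ((1.0, 0.0), 1.0), ((1.0, 1.0), 1.0)]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(params, x):
    """Return (network output, hidden activations) for one input pair."""
    w_hidden, b_hidden, w_out, b_out = params
    hidden = [sigmoid(w[0] * x[0] + w[1] * x[1] + b) for w, b in zip(w_hidden, b_hidden)]
    return sigmoid(sum(w * h for w, h in zip(w_out, hidden)) + b_out), hidden


def train(seed, hidden_units=3, steps=2000, lr=0.5):
    """Train with SGD; the seed drives both initialization and sampling order."""
    rng = random.Random(seed)
    w_hidden = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(hidden_units)]
    b_hidden = [rng.uniform(-1, 1) for _ in range(hidden_units)]
    w_out = [rng.uniform(-1, 1) for _ in range(hidden_units)]
    b_out = rng.uniform(-1, 1)
    for _ in range(steps):
        x, y = rng.choice(DATA)  # the stochastic element
        out, hidden = predict((w_hidden, b_hidden, w_out, b_out), x)
        d_out = 2 * (out - y) * out * (1 - out)  # gradient of squared error
        for j in range(hidden_units):
            # Backpropagate through node j before touching its output weight.
            d_hidden = d_out * w_out[j] * hidden[j] * (1 - hidden[j])
            w_out[j] -= lr * d_out * hidden[j]
            w_hidden[j][0] -= lr * d_hidden * x[0]
            w_hidden[j][1] -= lr * d_hidden * x[1]
            b_hidden[j] -= lr * d_hidden
        b_out -= lr * d_out
    return (w_hidden, b_hidden, w_out, b_out)


net_a = train(seed=1)
net_b = train(seed=2)
print("output weights, seed 1:", net_a[2])
print("output weights, seed 2:", net_b[2])
for x, y in DATA:
    print(x, y, round(predict(net_a, x)[0], 3), round(predict(net_b, x)[0], 3))
```

The two weight sets come out different, and the only way I know to compare the two networks is to run them side by side on inputs. There is no algebra that reduces one to the other.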

In essence, this means that we can apply certain formal methods to simplify a formula or verify the equivalence of two formulas, but we can’t do the same for neural networks. Knowing that we can do this in physics constitutes a major part of our intuitive sense of “understanding” a formula.

Note, however, that a physical formula is an algebraic expression, and we use algebra to manipulate it. Isn’t it natural to suggest that with NNs, too, we need to seek manipulation methods of the same nature as the entities we’re manipulating? Plain algebra is of little help with NNs’ million-term calculations, but what if it is possible to build meta-NNs that would manipulate other trained NNs in useful ways: for example, reducing their complexity and calculation costs without affecting accuracy, or maybe discovering some underlying structures in these calculations to classify them? If such meta-NNs turn out to be feasible (even if they are just as opaque as today’s NNs but provably *work*), wouldn’t that affect our sense of “understanding” of how NNs work in general?

In summary, I’m convinced that “understanding” is not a fundamental philosophical category. Instead, it is an *emergent perception* that depends on a lot of things, some of them hardly formalizable, if at all. It is basically our evolution-hardwired heuristic for determining how reliable our knowledge is. Like all heuristics, it can be tricked or defeated. However, since it is hardwired into our brains, we tend to seek the feeling of understanding as the highest intellectual reward, and we tend to be highly suspicious of something that provably works but somehow fails to appease our understanding glands, such as NNs in their current state of development.

No, I’m not calling for us to reject understanding. It has been, and is certain to remain, immensely useful both as a heuristic that rates our models of the world and, perhaps more importantly, as a built-in motivator that drives us to seek and improve these models. All I’m saying is that, being a product of evolution, our sense of what it means to understand things needs to continue evolving to keep up with our latest toys.