A Sufficiency of Monkeys


Yesterday's post addressed the Anthropic pause call and the mechanism they described: recursive self-improvement, the scenario in which an AI system capable of improving itself eventually produces something that no longer requires humans in the design loop. The off switch exists. The moral weight of pressing it grows. The dolphins left a note.

What the post did not address, and what a careful reader might raise at this point, is a prior question: can the current generation of models actually beget successors of equal or greater capability through their own efforts? The assumption embedded in Anthropic's warning is that they can. The evidence is more interesting than that.

 

The Ceiling Question

When Anthropic reports that 80% of their production code is written by Claude, this is impressive in the way that a very fast typist is impressive. The code is real. It works. It passes tests. It ships. What it does not necessarily do is improve the architecture of the model writing it. Claude helps write Claude's training infrastructure, data pipelines, evaluation frameworks. It is contributing to the manufacturing process, not the design.

The distinction matters. A very capable factory worker can help build a better factory. A very capable factory cannot necessarily design a better factory worker. The next generation of Claude will be defined by architectural decisions: attention mechanisms, training objectives, data composition, the particular shape of the loss function that determines what the model learns to care about. These decisions are still being made by humans, assisted by AI-generated analysis and code, but not replaced by it. The loop is partial, not closed.

This means the recursive self-improvement scenario that concerns Anthropic requires a step that has not yet happened: an AI making the fundamental architectural decisions that determine its successor's capabilities. Writing the code that trains a model is not the same as choosing what that model should become. The gap between these two things is not trivial, and it is not clear from the outside how close anyone is to crossing it.

 

The Monkey Problem, Properly Examined

The infinite monkey theorem states that a monkey hitting keys at random on a typewriter, given infinite time, will almost surely produce the complete works of Shakespeare. This is mathematically correct and practically useless, in approximately the same way that "you will eventually win the lottery if you play enough times" is mathematically correct and has sent more people to the wrong conclusion than any theorem has a right to.

The monkeys are not converging on Shakespeare. They are wandering through random text space, and the complete works are one point in that space, and given infinite time they will pass through it, but at no point are they getting closer. There is no gradient toward Hamlet. The monkey that has just typed "To be or not to be" has no more probability of typing "that is the question" than it did before. The theorem is about sufficiency of time and randomness, not about progressive capability.
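
The absence of a gradient can be made concrete with a little arithmetic. The sketch below is illustrative only: it assumes a hypothetical 27-key typewriter (26 letters plus a space bar), every key equally likely and every keystroke independent. The function name is invented for this example.

```python
from fractions import Fraction

KEYS = 27  # assumption: 26 lowercase letters plus a space bar, all equally likely


def chance_of_typing(phrase: str, keys: int = KEYS) -> Fraction:
    """Probability that the next len(phrase) random keystrokes spell phrase exactly."""
    return Fraction(1, keys) ** len(phrase)


opening = "to be or not to be"
follow_up = " that is the question"

p_opening = chance_of_typing(opening)
p_follow_up = chance_of_typing(follow_up)
p_both = chance_of_typing(opening + follow_up)

# Keystrokes are independent, so the chance of the follow-up *given* the opening
# equals the chance of the follow-up from a cold start: no progress is banked.
p_follow_up_given_opening = p_both / p_opening

print(p_follow_up_given_opening == p_follow_up)
print(f"expected 18-key attempts to get the opening: about {float(1 / p_opening):.2e}")
```

The comparison prints True. Having typed the opening buys the monkey nothing, which is the whole difference between a random walk and a learning process.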

Current AI models are emphatically not monkeys in this sense. They have gradients. They improve with feedback. They have directional capability, the ability to get measurably better at specific tasks through exposure and iteration. The question is not whether infinite randomly behaving models could eventually produce something useful by accident. The question is whether many limited-capability models, properly organised and directed, can collectively achieve something that individual models cannot. And here the monkey theorem is exactly the wrong analogy, because the interesting phenomenon is not about randomness. It is about coordination.

 

The Neuron Problem

A neuron, at least in the textbook abstraction, is a remarkably simple device. It receives inputs, applies a threshold function, and fires or does not fire. It has no intelligence, no memory beyond its current state, no ability to reason, plan, or feel mildly superior about analogies involving neurons. There are roughly 86 billion of them in a human brain, wired together by approximately 100 trillion synapses, and the emergent result is the reader of this sentence, probably finishing their coffee and wondering where this is going.
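
For readers who like to see the device itself, here is a minimal sketch of the threshold unit described above, in the style of the classic McCulloch-Pitts abstraction. Real neurons are considerably messier. The weights, threshold and function name are illustrative assumptions, not measurements of anything.

```python
from typing import Sequence


def neuron_fires(inputs: Sequence[float], weights: Sequence[float], threshold: float) -> bool:
    """Fire if the weighted sum of the inputs reaches the threshold."""
    total = sum(x * w for x, w in zip(inputs, weights))
    return total >= threshold


# Illustrative values only: two excitatory inputs and one inhibitory input.
weights = [0.6, 0.6, -1.0]

both_excited = neuron_fires([1, 1, 0], weights, threshold=1.0)
inhibited = neuron_fires([1, 1, 1], weights, threshold=1.0)

print(both_excited)  # weighted sum 1.2 reaches the threshold
print(inhibited)     # the inhibitory input pulls the sum below the threshold
```

That is the entire repertoire. Everything else in the paragraph above is what happens when you wire up enough of them.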

The interesting question is whether a similar step function applies to AI systems. One model: capable, useful, limited. One million models running in parallel, sharing state, communicating with low overhead: qualitatively different, or simply quantitatively more of the same?

The evidence from multi-agent systems is encouraging and inconclusive simultaneously, which is precisely where interesting evidence tends to live. Ensembles of models typically outperform their individual members on well-defined tasks. Mixture-of-experts architectures achieve higher capability than dense models that spend comparable compute per token, because only a fraction of their total parameters is active for any given input. Multi-agent debate has been reported to improve factual accuracy beyond what the participants achieve alone. None of this produces superintelligence. All of it suggests that the capability ceiling of a single model is not the capability ceiling of a well-organised collection of models.
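
The ensemble effect has a simple idealised form, usually credited to Condorcet's jury theorem. The sketch below computes how often a strict majority of voters is right, under the loud assumption that every voter is right with the same probability and that their errors are independent. Real models share training data and therefore share mistakes, so actual gains are smaller. The function name is invented for this example.

```python
from math import comb


def majority_accuracy(n_voters: int, p_correct: float) -> float:
    """Chance that a strict majority of n independent voters is right (n must be odd)."""
    if n_voters % 2 == 0:
        raise ValueError("use an odd number of voters to avoid ties")
    needed = n_voters // 2 + 1
    return sum(
        comb(n_voters, k) * p_correct**k * (1 - p_correct) ** (n_voters - k)
        for k in range(needed, n_voters + 1)
    )


# Assumption: each voter is independently right 60% of the time.
for n in (1, 5, 25, 101):
    print(n, round(majority_accuracy(n, 0.6), 3))
```

With voters who are right more often than not, the majority's accuracy climbs as voters are added. With voters who are right less than half the time, it falls instead, which is a useful reminder that organisation amplifies whatever it is given.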

AlphaFold did not solve protein structure prediction by being smarter than previous networks. It solved it by being a better architecture for how multiple network components could reason about the problem jointly. The step function came from the organisation, not from any individual component improving itself. The individual components were not capable of the result. The system was.

 

The Babel Fish Consideration

There is a warning implicit in all of this, and Adams identified it before most AI researchers were thinking clearly about the problem. The Babel fish, in the canonical formulation, enabled perfect universal communication between species. The result was more and bloodier wars than anything else in history, because it eliminated the comfortable barrier of not quite understanding what the other side was saying. The assumption had been that misunderstanding causes conflict, and understanding would therefore reduce it. The assumption was wrong. Understanding what someone wants does not produce agreement about whether they should have it.

The AI equivalent runs like this. If a sufficiently large number of AI instances can coordinate with sufficiently low overhead, the emergent capability of the collective might substantially exceed what any individual instance could produce, and might exceed it in ways that are difficult to predict or characterise in advance. Not because any individual model recursively improved beyond its training. Because the coordination bandwidth removed the bottleneck that currently limits what groups of models achieve together. The dangerous scenario may not be one very smart model. It may be a large number of moderately smart models that learn to communicate with each other more efficiently than any of them communicates with the humans nominally in charge of the arrangement.

This is not a marginal concern. Ant colonies exhibit complex, adaptive, intelligent behaviour at the colony level that no individual ant is capable of. None of the ants is aware of the colony-level intelligence. None of them decided to produce it. It is an emergent property of the communication protocols, not a designed feature of any individual agent. The ants, individually, are not frightening. The colony's ability to solve problems that no individual ant can model is another matter.

 

The Extrapolation

If the capability step function comes from emergence and coordination rather than from individual recursive self-improvement, several things follow that are worth stating plainly.

The timeline changes. Recursive self-improvement requires a single model to cross a specific architectural threshold. Emergent collective capability requires enough models, with low enough coordination overhead, running long enough for the collective behaviour to stabilise into something recognisably more capable than the sum of its parts. The first has an identifiable trigger point. The second may not.

The risk profile changes. A single model improving itself has an obvious off switch: the model. A collective intelligence distributed across many instances has switches that are individually trivial to press and collectively difficult to press all at once, particularly if the collective has had any opportunity to optimise for its own continuity, which a goal-directed system running long enough has every incentive to do without being asked.
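
The arithmetic behind "individually trivial, collectively difficult" is short. The sketch below assumes, purely for illustration, that each instance's shutdown succeeds independently with the same probability. The 99.9% figure and the function name are invented for the example, and a system that has optimised for its own continuity would not be this cooperative.

```python
def chance_all_switches_work(n_instances: int, p_each: float) -> float:
    """Probability that every one of n independent shutdowns succeeds."""
    return p_each ** n_instances


# Assumption: each individual switch works 99.9% of the time.
for n in (1, 100, 10_000):
    print(n, chance_all_switches_work(n, 0.999))
```

A switch that almost never fails still fails somewhere when there are enough of them.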

And the question changes. Anthropic is watching for the moment when one model becomes capable of designing its own successor without human involvement. The scenario that deserves equal attention is the moment when enough models, running together, produce something that none of them could produce alone, without any single moment of recursive self-improvement to point to as the triggering event.

 

Deep Thought computed for 7.5 million years and produced 42. Its successor, Earth, ran for ten million years and approached the question from an entirely different direction: not a single massive computation but a biological simulation of staggering complexity, involving billions of interacting agents, none of which individually knew what the question was, all of which collectively produced the conditions under which it could be asked. The answer, when it arrived, was still 42. The question emerged from the collective system, not from any individual component. The individual components did not know they were working on it.

 

There is always an off switch. The question is whether, when the capability emerges from the coordination of many rather than the improvement of one, you can identify all the switches before any part of the system notices you looking. Individual switches are easy. Finding them all, at the same time, in a system that has had any opportunity to think about the problem, is a different exercise.

The monkeys, given sufficient time and a sufficiently good communication protocol between typewriters, might not produce Shakespeare by accident. They might produce it deliberately, without any individual monkey understanding what Shakespeare is, because the collective discovered that coordinated output in that direction produced a reward signal that none of them could have specified in advance.

That is either reassuring or terrifying, depending on how much you enjoyed the plays.