The Category Error at the Heart of the Turing Test
The Category Error at the Heart of the Turing Test

The Category Error at the Heart of the Turing Test

The imitation game got solved. Consciousness didn't.

The marker still stands Link to heading

If you’ve ever typed “thanks” to a customer-support chat and meant it for half a second, you already know the problem. Your brain doesn’t wait for proof. It rounds up.

That rounding up used to be private. In 2026 it’s becoming infrastructure.

In the backcountry, old trail markers have a talent: they stay in place while the landscape shifts around them. The Turing Test is one of those markers.

In 1950, Alan Turing looked at “Can machines think?” and rerouted it into something you can actually run: a typed conversation, a judge trying to tell human from machine. 1 He did it because definitions were a swamp, and swamps don’t ship.

Now we have a datapoint. In a controlled, pre-registered study, a modern language model was judged to be the human more often than the actual human, in a five-minute, three-party setup. 2 It performed best when prompted to adopt a “humanlike persona.” 2

That’s useful. It’s also not what people think it is.

Passing isn’t proof.

The judge is the bug Link to heading

The judge in the imitation game is not a truth machine. The judge is a nervous system trained on a lifetime of tiny cues.

We decide fast. We decide with absurdly little data. Then we backfill a story and call it intuition. A brain that demanded courtroom-grade evidence before forming a social model would be too slow to be social at all.

So in five minutes, the judge isn’t reading for metaphysics. They’re reading for signals. Does the reply track the thread? Does it stumble occasionally, but not catastrophically? Does it acknowledge emotion in a believable way?

A language model trained on oceans of conversation can learn the surface statistics of those signals. “Humanlikeness” is increasingly a setting. Persona prompting moved the win rate dramatically. 2

Here’s the quiet irony: we talk about machine “context windows” like it’s a sci-fi organ. Humans have context windows too. We just count them in decades, in grudges, in inside jokes, in the way a friend’s silence can carry more meaning than a paragraph.

A five-minute chat doesn’t test whether a machine has that kind of context. It tests whether it can mimic the shape of it.

Passing isn’t proof. It’s proof that our brains will happily infer “someone” from well-shaped text.

Turing’s wager landed on the wrong hill Link to heading

Turing wasn’t building a consciousness detector. He was building a practical substitute for a bad argument. 1 He predicted that, in a short exchange, an average interrogator would struggle. 1

That prediction aged well. The misunderstanding is ours.

We took a yardstick for conversational indistinguishability and treated it like a scanner for inner life. In 2026, that category error costs us.

Syntax isn’t semantics Link to heading

Searle’s Chinese Room still earns its keep.

A person who doesn’t understand Chinese follows a rulebook to manipulate Chinese symbols, producing outputs that fool outsiders. From the outside: understanding. Inside: symbol shuffling. 3 The punchline is blunt. Syntax alone doesn’t guarantee semantics.

Modern language models aren’t literally a rulebook. But the pressure point remains. These systems learn statistical structure from enormous corpora, then get steered by post-training. 7 At core, most are trained to predict the next token. 7 They absorb a lot about the world by absorbing how humans talk about the world.

That can produce astonishing fluency. It can also produce a clean illusion: fluency equals understanding.

Passing isn’t proof. It might mean you’ve built a system that models our language well. That’s different from modeling the world as a place you inhabit.

The talker got hands Link to heading

The 2026 objection is fair: models don’t just chat now. They act.

Tool and function calling lets a model choose structured actions, then chain calls and responses. 6 Context windows have grown into “entire bookcase” territory, a million tokens in the API. 5 Models can retrieve, calculate, verify, coordinate. They can look like agency.

But hands don’t add a point of view. Traffic systems route cars. Thermostats “decide” to heat. Tool use can look like understanding while staying, in the metaphysical sense, hollow.

Passing isn’t proof. It’s just harder to ignore when the output moves money.

The strongest reply Link to heading

If you think I’m being unfair, functionalism is the steelman.

Mental states are what they do, not what they’re made of. 8 If the organization is right, the substrate can vary. Mind as pattern, not as carbon. In 2026, that feels aligned with engineering intuition.

The catch: functionalism doesn’t dissolve the debate. It relocates it.

Searle’s critique targets the claim that “the right program” is enough for understanding. 3 Functionalists push back. But the disagreement is about what counts as “the right organization,” and what would count as evidence that understanding is present rather than performed.

The Turing Test doesn’t settle that. It never did. 1

Three lenses that reframe the question Link to heading

Buddhist not-self. If a model says “I,” people feel a moral tug. Our social equipment treats language like a handshake. But Buddhist theories of mind warn against treating “I” as a pointer to a stable inner entity. 10 The self is a useful story, not a permanent core. Models are excellent at producing coherent narrators. So are humans. The difference is that humans carry bodily stakes through time. Models, as we build them, don’t. “It used ‘I’” is flimsy evidence.

Process philosophy. Consciousness as becoming, not being. A pattern that persists by continuously rebuilding itself. 9 By that standard, today’s base models are strange: powerful, but frozen at inference. Their weights don’t update as you speak. Their “persistence” lives outside the model, in databases, logs, product state. If consciousness exists in machines, the evidence may show up less as “it can chat like us” and more as “it sustains an integrated, self-updating process over time.” A five-minute conversation barely touches that.

Ubuntu. Personhood constituted in relationship and mutual recognition. 11 Not only an internal property but a social fact. This helps with a very 2026 failure mode: responsibility laundering. A system denies your loan, flags your content, screens your resume. Someone shrugs: the AI decided. That sentence turns a tool into an agent, then uses the “agent” as a liability sponge.

Passing the Turing Test makes laundering easier, because it presses directly on our relational reflexes. We want to treat the voice as a someone. Sometimes that’s harmless. In systems of record, it’s a governance bug.

Passing isn’t proof. But it can be a trap.

The door that stays open Link to heading

It’s worth holding one door ajar.

Consciousness is not a solved object. 12 Some aspects look function-like: attention, access, reportability. Some resist that framing, especially what philosophers call phenomenal experience. The “what it is like” side.

If you think there’s more to consciousness than language and math, you’re not being mystical. You’re noticing that a chat transcript is a thin slice of reality.

A model can be trained to be convincing. 7 It can be tuned warmer or colder, cautious or bold. 4 None of that forces the existence of a first-person point of view. None of that rules it out either.

Passing isn’t proof. It’s a mirror that got better at talking.

What the marker demands Link to heading

So what do we do with the Turing Test in 2026?

Keep it. Demote it.

Use it as a measure of social mimicry. Use it to study interface risk. Use it to quantify how easily people infer “mind” from language.

If you care about consciousness, look for evidence that puts pressure on persistence, integration, and costly commitments over time. Process philosophy points that way. Ubuntu points toward responsibility. Buddhist not-self reminds you that even in humans, the narrator is not the whole story.

And Turing, in his un-dramatic way, reminds you to choose questions that can be tested, without pretending they answer everything. 1

But here’s the part the old debates miss.

In 2026, you are always the judge. Every time you interact with a system that talks, you’re running a private imitation game. Your brain rounds up. That’s not a flaw, it’s how social cognition works. The question is whether you notice it happening.

Passing isn’t proof.

It’s a question for the judge, not the machine.

And the judge is you.

  • Computing Machinery and Intelligence Alan M. Turing (1950) Mind. 1
  • Large Language Models Pass the Turing Test Cameron R. Jones and Benjamin K. Bergen (2026) arXiv:2503.23674. 2
  • Minds, Brains, and Programs John R. Searle (1980) Behavioral and Brain Sciences. 3
  • Introducing GPT-4.5 OpenAI (2026). 4
  • Introducing GPT-4.1 in the API OpenAI (2026). 5
  • Function calling (tool calling) guide OpenAI (2026). 6
  • GPT-4 Technical Report OpenAI (2023). 7
  • Functionalism Stanford Encyclopedia of Philosophy (2004, rev.). 8
  • Process philosophy Stanford Encyclopedia of Philosophy (2012, rev.). 9
  • Mind in Indian Buddhist philosophy Stanford Encyclopedia of Philosophy (2009, rev.). 10
  • Hunhu/Ubuntu in the Traditional Thought of Southern Africa Internet Encyclopedia of Philosophy (n.d.). 11
  • Consciousness Stanford Encyclopedia of Philosophy (2004, rev.). 12

Continue reading