Continuous Thought Machines: What If Time Is the Missing Piece in AI?

Sakana AI's Continuous Thought Machine gives each neuron a memory and reasons through synchronization between neurons rather than raw outputs. On mazes and images it shows step-by-step behavior that emerged without being designed in, though no LLM-scale results exist yet.

By Editorial · Published Jul 3, 2026 · 8 min read

Key Takeaways

→ CTM gives each neuron access to its own history of activity, computing its next output from a short private timeline instead of only its current state
→ The model's core representation is the synchronization between neurons over time, measured directly and used as the signal the network reasons with
→ CTM operates in an internal "thinking dimension" decoupled from its input, processing a static photo the same way it processes a sequence: step by step over internal time
→ On maze-solving, CTM's attention visibly traces the route as it reasons, an emergent behavior that was never designed in and persists even with more thinking steps than trained on
→ On ImageNet classification, accuracy rises the longer CTM thinks, and it learns on its own to spend fewer steps on easy images: adaptive compute allocation as an emergent property

The artificial neuron has barely changed since the 1980s. A unit fires, produces a single output, and that output is what the rest of the network sees, with timing discarded. Sakana AI's Continuous Thought Machine (CTM), released in May 2025, argues that the discarded signal was load-bearing. Biological neurons don't just fire; when they fire relative to one another carries information, and that timing underpins well-documented mechanisms such as spike-timing-dependent plasticity. CTM is what happens when you build that timing back into a working model, and the behavior that falls out is a network that appears to think in visible, human-legible steps.

The idea: give the neuron a memory, then measure the choir

Most of the "reasoning" progress in AI over the past two years has come from a blunt instrument: run the model longer, generate more tokens, sample more attempts. Sakana's CTM is a different kind of bet. Instead of scaling around the standard neuron, it changes what the neuron is.

The mechanism has two parts, and both are simple to state even though the resulting dynamics are not.

Each neuron gets a history. Rather than computing its next output purely from its current input, a CTM neuron has access to its own recent activity and learns how to use it. The neuron's behavior can now shift based on what it was doing moments earlier — a form of short-term memory built into the base unit, not bolted on as a separate recurrent module.
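To make the idea concrete, here is a minimal sketch of a single unit with a private history. It is a toy, not Sakana's implementation: the class name `HistoryNeuron`, the three-step memory, the fixed weights, and the tanh nonlinearity are all assumptions chosen for illustration, and a real model would learn the weights rather than hard-code them.

```python
import math
from collections import deque


class HistoryNeuron:
    """Toy neuron whose output depends on its last few pre-activations.

    Hypothetical illustration: one weight per remembered step, then tanh.
    """

    def __init__(self, weights, bias=0.0):
        self.weights = list(weights)  # ordered oldest -> newest
        self.bias = bias
        # Fixed-length buffer; the oldest value falls off automatically.
        self.history = deque([0.0] * len(self.weights), maxlen=len(self.weights))

    def step(self, pre_activation):
        self.history.append(pre_activation)
        total = self.bias + sum(w * h for w, h in zip(self.weights, self.history))
        return math.tanh(total)


neuron = HistoryNeuron(weights=[0.5, -1.0, 1.5])
first = neuron.step(1.0)   # history is now [0, 0, 1]
second = neuron.step(1.0)  # history is now [0, 1, 1]
```

Both calls pass the same input, yet `first` and `second` differ, because the second call also sees what the neuron received one step earlier. That is the whole point of the history: identical input, different behavior depending on the recent past.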

The representation is synchronization, not activation. This is the structurally novel move. In a standard network, what matters is each neuron's output value. In CTM, what matters is how neurons' activity lines up in time relative to each other. The network has to learn to coordinate — to synchronize — in order to solve a task, and that coordination pattern is the thing downstream computation reads from. Sakana measures this directly and uses it as the model's working representation.
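One simple way to picture "how activity lines up in time" is a pairwise dot product between neurons' activity traces. The sketch below assumes exactly that measure; the function name `synchronization` and the hand-written traces are illustrative, and the real model may weight or subsample these pairs differently.

```python
def synchronization(traces):
    """Pairwise dot products between neurons' activity traces.

    traces[i][t] is neuron i's output at internal step t.
    Returns a symmetric matrix S with S[i][j] = sum over t of
    traces[i][t] * traces[j][t].
    """
    n = len(traces)
    return [
        [sum(a * b for a, b in zip(traces[i], traces[j])) for j in range(n)]
        for i in range(n)
    ]


traces = [
    [1.0, -1.0, 1.0, -1.0],   # neuron 0
    [1.0, -1.0, 1.0, -1.0],   # neuron 1: in phase with neuron 0
    [-1.0, 1.0, -1.0, 1.0],   # neuron 2: in anti-phase with neuron 0
    [1.0, 1.0, -1.0, -1.0],   # neuron 3: half the frequency of neuron 0
]
sync = synchronization(traces)
# sync[0][1] == 4.0   (perfectly aligned)
# sync[0][2] == -4.0  (perfectly opposed)
# sync[0][3] == 0.0   (unrelated rhythm)
```

Every neuron here has the same amplitude, so activation values alone would say little. The matrix separates them purely by timing, which is the kind of signal the article describes downstream computation reading from.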

The name follows from how the model uses this machinery: CTM operates in an internal "thinking dimension" that's decoupled from the shape of the input. It reasons about a single static photograph the same way it reasons about a sequence — by taking discrete internal steps and letting synchronization evolve across them. Time isn't something the data provides. It's something the model generates for itself, on every input, whether or not the input has a temporal structure at all.
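The decoupling between input and internal time can be shown with a deliberately tiny loop. Everything in this sketch is an assumption for illustration: the function name `think`, the scalar state, and the two fixed weights stand in for what would be a learned, high-dimensional update in the real model.

```python
import math


def think(x, ticks, w_state=0.5, w_input=1.0):
    """Run `ticks` internal steps over one fixed input x.

    Returns the internal state after each step.
    """
    state = 0.0
    trajectory = []
    for _ in range(ticks):
        # The input never changes; only the internal state evolves.
        state = math.tanh(w_state * state + w_input * x)
        trajectory.append(state)
    return trajectory


shorter = think(x=0.3, ticks=3)
longer = think(x=0.3, ticks=10)
```

The input `x` has no time axis at all, yet the model produces a trajectory over it, and the number of steps is a free choice at run time. `longer` begins with the same three states as `shorter` and then simply keeps going.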

What happens when you actually build this

The interesting part isn't the architecture description — it's what it does once trained. Sakana ran CTM on tasks chosen specifically because you can watch the reasoning happen, and the resulting behavior wasn't specified by anyone; it emerged from optimization.

Maze solving. Given a 2D top-down maze, CTM has to output the sequence of moves that solves it — not render a path visually, but actually plan one. Because the model takes multiple internal thinking steps, its attention at each step can be visualized. What shows up is a trace that follows the actual route through the maze, the way a person tracing a path with a finger would. Nobody designed CTM to do this. It's a side effect of giving the model a time dimension and training it to solve mazes. And when researchers let the model think for more steps than it saw during training, it kept following the correct path past that point — evidence it had learned a general planning procedure, not a memorized number of moves.

Image classification on ImageNet. Standard classifiers commit to an answer in a single forward pass. CTM instead takes several internal steps, and Sakana's team found two things worth noting. First, accuracy improves the longer the model thinks — more internal steps, better answers, up to a point. Second, and more strikingly, the model learned on its own to think less on images it found easy and more on images it found hard, without being told to. That's adaptive compute allocation showing up as an emergent property of the architecture rather than a scheduling heuristic someone hand-wrote. On a gorilla photo, the attention pattern in one example moved from eyes to nose to mouth — a sequence that reads as recognizably close to how a human eye scans a face.
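A rough way to read "thinking less on easy images" off such a model is to look at when its per-step prediction becomes confident. The sketch below assumes a certainty score of one minus normalized entropy and an external stopping threshold of 0.8; the function names, the threshold, and the probability vectors are invented for illustration and are not Sakana's reported numbers or training procedure.

```python
import math


def certainty(probs):
    """1 minus normalized entropy: 0 for a uniform guess, 1 for a one-hot one."""
    entropy = -sum(p * math.log(p) for p in probs if p > 0)
    return 1.0 - entropy / math.log(len(probs))


def ticks_used(predictions, threshold=0.8):
    """1-based index of the first internal step whose certainty reaches
    the threshold; falls back to the last step if none does."""
    for tick, probs in enumerate(predictions, start=1):
        if certainty(probs) >= threshold:
            return tick
    return len(predictions)


# Hypothetical per-step class probabilities for two images.
easy = [[0.98, 0.01, 0.01], [0.99, 0.005, 0.005]]
hard = [[0.4, 0.3, 0.3], [0.6, 0.2, 0.2], [0.98, 0.01, 0.01]]

easy_ticks = ticks_used(easy)  # confident on the first step
hard_ticks = ticks_used(hard)  # needs all three steps
```

The threshold here is imposed from outside to make the effect visible. The article's claim is stronger: the trained model itself arrives at confident answers sooner on easy inputs, without a hand-written rule like this one.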

Compare this to an LSTM, the classic recurrent architecture built to handle sequences. Sakana's side-by-side comparison of neuron dynamics shows the LSTM producing comparatively flat, low-diversity activity. CTM's neurons oscillate at different frequencies and amplitudes, sometimes shifting frequency within a single neuron mid-task — a much richer dynamical signature, and one the researchers describe as closer to what's actually measured in biological neural tissue, without claiming to be a strict emulation of it.

Why interpretability is the actual headline

Reasoning models in 2025 mostly buy interpretability, if they offer it at all, by narrating — the model writes out a chain of thought in natural language, and you read the narration and hope it reflects the real computation. CTM offers something structurally different: you can watch the attention pattern move across the maze or the image as the model computes, because the steps are architectural, not a post-hoc text summary the model was trained to produce.

That distinction matters for anyone thinking about AI trustworthiness. A narrated chain of thought is a separate output the model learned to generate; it can drift from what's actually driving the answer. CTM's attention trace is an intermediate state of the computation itself rather than a description of it. When the model fails, you have a better shot at seeing where and why, which is part of why Sakana frames interpretability as valuable not only for understanding correct decisions but for surfacing biases and failure modes.

Where this fits, and where it doesn't — yet

CTM is not a transformer replacement and Sakana doesn't pitch it as one. The demonstrated results are on maze-solving and ImageNet-scale image classification — clean, visualizable domains chosen to make the internal dynamics legible. There's no published result showing CTM operating at large language model scale, and Sakana's own framing is explicit: this is a first attempt at narrowing the gap between how brains compute and how artificial networks compute, not a claim that the gap is closed.

The broader point connects to a pattern showing up across 2025's most interesting machine learning research: performance gains are increasingly coming from how a model spends its computation — over time, in parallel, through better verification — rather than purely from adding parameters. CTM attacks that question architecturally, by making time itself a resource the network learns to use. It's a different lever from the inference-time scaling and parallel-computation approaches showing up elsewhere in the field, and it's evidence that neuroscience-inspired mechanisms can still produce genuinely new model behavior, not just marginal efficiency.

Limitations and honest caveats

CTM's published results are on maze-solving and ImageNet classification — both chosen because they're interpretable, not because they're the hardest tasks in AI. Whether the synchronization mechanism holds up, computationally or in training stability, at language-model scale is untested and unclaimed by Sakana. The "human-like" framing of its attention traces is a visual resemblance, not a claim of shared mechanism with biological cognition — Sakana is explicit that CTM is inspired by, not a model of, the brain. As with any single-lab release, independent replication and adversarial testing are still early.

FAQ

What is a Continuous Thought Machine? The Continuous Thought Machine (CTM) is a neural network architecture from Sakana AI, released in May 2025, in which individual neurons retain a short history of their own activity and the model's core representation is the synchronization of neural activity across neurons over time, rather than each neuron's raw output.

How is CTM different from a standard neural network? Standard artificial neurons compute an output from their current input alone — a design largely unchanged since the 1980s. CTM neurons additionally use their own recent activity history, and the model reasons using the timing coordination between neurons, not just their individual outputs.

Does CTM actually "think" like a human? Not in a literal sense — it has no claim to consciousness or biological equivalence. What it does have is a step-by-step internal reasoning process whose intermediate states can be visualized, and in tasks like maze-solving, the visualized attention pattern closely resembles how a person would trace a path by eye, an emergent behavior Sakana did not explicitly design.

Can CTM run large language models? Not yet demonstrated. Sakana's published results cover maze-solving and image classification on ImageNet — tasks chosen for interpretability. Whether the architecture scales to LLM-sized language modeling is an open question the paper does not claim to answer.

Why does the synchronization mechanism matter? Because it changes what "interpretability" means in practice. Rather than reading a natural-language explanation the model generated separately from its computation, you can observe the attention and synchronization patterns that are the computation itself — a more direct window into why the model produced a given answer.
