Can concepts arise from Thousand Brains Theory?

The attached document proposes that they can, using only TBT primitives, by extending “reference frames” beyond 3D object space into ordered predictive structures. From there, and from ordinary toddler behaviour like stacking blocks, it sketches how proto-concepts and concepts can emerge as structured patterns of biasing between cortical columns, influencing both brain dynamics and observable behaviour.

ConMonty Proposal by Trung Doan submitted10feb26 to TBT Discourse.pdf (501.6 KB)


@Trung_Doan, thanks for the thoughtful proposal, I enjoyed reading it!

Three questions came up as I worked through the proposal.


Could hierarchical SECCs with voting already give us these four functions?

Your clear definitions of the different types of predictions (i.e., sensory, compatibility, regime persistence, and context) make sense to me. We would certainly need all these predictive functions, but I’m wondering whether they require distinct column types, or whether they might emerge naturally from SECC columns with voting connections organized hierarchically.

  • ACCC: TBT’s voting mechanism already has columns checking whether their predictions are mutually consistent. Columns that don’t fit the consensus get suppressed. This seems functionally equivalent to what the ACCC does. A distributed mechanism like voting would not require specifying a new distinct column type that aggregates predictions from multiple SECC columns and predicts their compatibility.

  • RSCC: Each cortical column already predicts sequences. If we combine that sequence-prediction ability with voting, we would get columns collectively predicting whether compatibility will hold across time. This can also apply in higher cortical regions to cover broader context at different timescales.

  • CSCC: Higher-region columns with similar architecture make higher-order predictions at broader context. A higher-region column that has learned the “stacking” pattern would vote for compatibility when it detects a hard surface and against it on a cushion. This can bias lower regions through top-down connections. Could context selection (CSCC) be what compatibility prediction (ACCC) looks like when it operates one level up in the hierarchy?

Is the leap from proto-concept to concept doing too much work?

The jump from proto-concept to full concept worries me a little. The proto-concept of bigness is grounded in specific patterns (e.g., wide hand opening, upward gaze, greater resistance). But for Tom to apply “bigness” to a new situation the system has to recognize that this new situation is an instance of bigness. And to do that, it seems like it already needs a more general notion of bigness than what the proto-concept provides. That feels a bit circular; generalizing the concept requires the generalized concept.

The proposal says the context-selection mechanism picks the bigness regime because it “supports predictive success,” but that doesn’t resolve the circularity. The system still needs some basis for trying the bigness regime in a novel situation where the sensorimotor patterns look quite different from the ones that formed the proto-concept.

How does the system get from “these specific bodily patterns go together” to “try this structure in situations that don’t share those patterns”?

One reasonable answer is what you wrote in P21 about “predictive horizons lengthen with experience”, but it still feels like a big gap that might require other mechanisms (e.g., hierarchical processing, resource constraints, enforcing sparsity constraints) to force this generalization behavior.

How are ordered predictive axes learned?

The proposal describes compatibility and stability as ordered predictive axes. An ACCC has an axis from “highly compatible” to “incompatible,” an RSCC has one from “very stable” to “about to collapse.” But compatibility and stability aren’t simple signals a single column can read off its inputs. They are themselves abstract, emergent properties of how many columns are behaving together. Representing them as axes inside specialized columns is essentially asking a column to have a built-in concept of compatibility or stability, which is the kind of abstract concept ConMonty is trying to explain in the first place.

This is part of why I think these functions can emerge naturally from existing voting and sensory prediction ability. Voting across columns doesn’t require any single column to represent “compatibility” as an axis; compatibility is simply represented by whether predictions agree or not. Similarly, temporal predictions across voting columns don’t require a stability axis; stability is represented by whether agreement between columns keeps happening over successive transitions. The abstraction emerges from the collective behaviour rather than being encoded in a specialized column.
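As a rough illustration of this point, here is a minimal Python sketch. It assumes each column's vote can be reduced to a single label, which is a strong simplification, and the names `agreement` and `persistence` are hypothetical (they are not part of Monty or the ConMonty proposal). Compatibility is read off as the fraction of columns that agree at one time step, and stability as how often that agreement holds over successive steps, with no column storing either quantity as an axis.

```python
from collections import Counter

def agreement(votes):
    """Fraction of columns whose vote matches the most common vote."""
    if not votes:
        return 0.0
    _, top_count = Counter(votes).most_common(1)[0]
    return top_count / len(votes)

def persistence(vote_history, threshold=0.75):
    """Fraction of time steps on which agreement met the threshold."""
    if not vote_history:
        return 0.0
    agreeing = [agreement(votes) >= threshold for votes in vote_history]
    return sum(agreeing) / len(agreeing)

# Hypothetical votes from four columns over three time steps.
history = [
    ["stack", "stack", "stack", "stack"],
    ["stack", "stack", "stack", "cushion"],
    ["stack", "cushion", "ball", "cushion"],
]
compat = [agreement(votes) for votes in history]  # per-step "compatibility"
stability = persistence(history)                  # "stability" over the episode
```

The 0.75 threshold is an arbitrary choice for the example. Both quantities here are computed by an outside observer of the votes; whether the cortex needs anything to read them out is exactly the open question in this thread.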


Additional Notes

  • Re. the historical accident. Before TBT, the focus was mostly on sequence prediction, for things like predicting the next note in a melody. This led to the temporal memory algorithm (HTM). But HTM couldn’t explain how the brain learns structured representations of objects where features are organized relative to each other in a spatial map. The reference frame solved this by associating features with locations and allowing movements to transition from one location to another. This is an extension of sequence prediction that allows learning a more structured representation, when available. The first part of this video might be useful context.
  • I think Diagram 2 for ACCC may have an incorrect prediction description. My understanding is that ACCC predicts compatibility of SECC columns but not over time.
  • Also Diagram 3 looks like it should be for RSCC not ACCC.

@rmounir, thank you, and I hope my response below helps to advance our discussion.

1. Can voting by hierarchies of cortical columns predict compatibility, regime, and context?

I struggled with this “1 CC or a network of CCs” question. It is elegant for evolution to make use of what it already has, i.e. just wire columns together. But evolution also uses specialisation, via cell differentiation. The existence of cortical areas with denser dendritic trees and extensive long-range connections favors the 1-CC solution, but not decisively.

In the end, I chose the 1-CC solution because it is more mechanistic and more falsifiable, and I just hoped that evolution chose it too. More mechanistic, because for a single physical CC I was able to model how the mechanics work, whereas a network was too hard to model. More falsifiable, because once an experimenter has identified a candidate column, the claim is easy to test – for example, by silencing or perturbing tactile afferents. With a network of CCs, it’s tougher to know which levers to push, and interpreting experimental results may involve more ifs and buts.

2. Is the leap from proto-concept to concept doing too much work?

You questioned how, as a growing child’s predictive horizons lengthen, proto-concepts can lead to concepts without additional mechanisms such as hierarchical processing.

Let’s consider toddler Tom doing these: He wrestles with his cousin’s huge dinosaur, he carries a big box Dad just bought, and he wears Mum’s jacket. The active sensory-expectation CCs – wider arm opening, gaze looking up, clothes not fitting – differ, but in each situation they are mutually compatible and they persist over the whole episodes of play. The same action-compatibility CCs and regime-stability CCs tend to work across different situations.

Context CCs receive biasing inputs from these ACCCs and RSCCs, plus various situation signatures (own home or cousin’s, shopping or play, ...). Over episodes, CSCC synapses are trained by whether an RSCC regime was repeatedly confirmed or violated in a situation signature.
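A minimal sketch of that training rule, under loud assumptions: the class `ContextSelector`, its single scalar weight per (situation signature, regime) pair, and the learning rate are all hypothetical stand-ins chosen for illustration, not a model of real synapses or of anything specified in the proposal.

```python
from collections import defaultdict

class ContextSelector:
    """Toy stand-in for a CSCC: one weight per (situation signature, regime)."""

    def __init__(self, learning_rate=0.2):
        self.learning_rate = learning_rate
        self.weights = defaultdict(float)  # 0.0 means "no expectation yet"

    def update(self, signature, regime, confirmed):
        """Move the weight toward +1 if the regime held, toward -1 if it failed."""
        target = 1.0 if confirmed else -1.0
        key = (signature, regime)
        self.weights[key] += self.learning_rate * (target - self.weights[key])

    def select(self, signature, regimes):
        """Pick the regime with the highest learned weight for this signature."""
        return max(regimes, key=lambda r: self.weights[(signature, r)])

cscc = ContextSelector()
for _ in range(5):  # five episodes at the cousin's home
    cscc.update("cousins_home", "bigness", confirmed=True)
    cscc.update("cousins_home", "smallness", confirmed=False)
choice = cscc.select("cousins_home", ["bigness", "smallness"])
```

After repeated confirmation the selector prefers the "bigness" regime for that signature. The sketch keys on whole signatures, so it says nothing about transfer to an unseen signature; that step is what the next paragraph argues for.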

Let’s say Tom has a small cubby house, and this week his cousin got a bigger one. Now, Tom walks inside to play. Would his context CCs, based on those same ACCCs and RSCCs being active again, in this new context, predict that they will continue to be applicable? I can’t see why not. So, the applicability of the same regime is generalising another step here. And it keeps generalising as he goes inside a school bus, which is bigger than his parents’ car.

3. How are ordered predictive axes learned?

You wondered how a CC learns an axis that goes from “highly compatible” to “incompatible”, or from “very stable” to “about to collapse”. Compatibility and stability are abstract, they are not simple signals that a CC can read off its inputs.

You are right: If a cortical column participating in building up a concept must use that concept in its operation, then this would be a circular argument.

But let’s consider toddler Tom playing stacking blocks. At various times, his muscles did their jobs exactly right, or nearly exactly right, and the stacking succeeded. There have been other times where his muscles did their jobs badly or very badly, and the stack collapsed. There are also times in between, where the stack wobbled before collapsing or staying.

If we represent all the above as a bunch of dots, then the dots form a region that you and I, using English, would describe as ranging from stable to unstable. Tom’s regime-stability CCs of course use no such words, they simply predict this: given that the biasing inputs are currently a dot in a previously-learned region, and they are staying put or transitioning to another dot in the region, then will the current regime hold? No circularity is involved here.
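The “dots in a region” idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the two coordinates (alignment error, wobble) are hypothetical stand-ins for the biasing inputs, and a k-nearest-neighbour vote is a swapped-in technique chosen for brevity, not a claim about how an RSCC computes.

```python
import math

def predict_regime_holds(current, episodes, k=3):
    """Estimate P(regime holds) from the k nearest previously seen dots."""
    if not episodes:
        raise ValueError("no learned episodes to compare against")
    nearest = sorted(episodes, key=lambda e: math.dist(current, e[0]))[:k]
    return sum(held for _, held in nearest) / len(nearest)

# Hypothetical dots: (alignment error, wobble) -> did the stack stay up?
episodes = [
    ((0.1, 0.0), True), ((0.2, 0.1), True), ((0.3, 0.2), True),
    ((0.8, 0.7), False), ((0.9, 0.9), False), ((0.7, 0.8), False),
]
p_steady = predict_regime_holds((0.15, 0.05), episodes)
p_shaky = predict_regime_holds((0.85, 0.8), episodes)
```

Nothing in the function mentions “stability”: it only relates the current dot to outcomes of past dots. The word appears only in the comments written by the observer.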

4. A few other matters

“Historical accident”: Imagine that while waiting for his wife, Jeff was not only holding a cup but also listening to a familiar song. Some years prior, he had already developed the sequence-memory theory. So, A Thousand Brains would be different: The book would say that to predict the next tactile input his brain needed a 3-D coordinate frame attached to the cup, and to predict the next note of the melody his brain needed a learned sequence of notes. His Thousand Brains Theory would say both a 3-D frame and a learned sequence can be a reference frame. My term, “ordered predictive structure”, would then simply be a general term applying to both.

Diagram 3 for regime-stability CC: Oops! It should be the attached one. I mistakenly pasted Diagram 2 (ACCC) again and labelled it Diagram 3, which was meant for the RSCC. This is what working late at night did to my brain (the night before submitting it to Discourse, I worked until 4 AM). Thank you for spotting it! Attachment: cm. the correct Diagram 3 for the proposal.jpg


@Trung_Doan, thanks for the detailed reply.

“The active sensory-expectation CCs – wider arm opening, gaze looking up, clothes not fitting – differ, but in each situation they are mutually compatible and they persist over the whole episodes of play. The same action-compatibility CCs and regime-stability CCs tend to work across different situations.”

I agree that an incremental mechanism like this would be needed for concepts to become more general over time. It works well when there is enough sensorimotor overlap between situations. But, e.g., for cases like Tom watching someone throw a rock that causes a “big” splash of water, where none of the original sensorimotor patterns are present, the gap might be too large. I think other mechanisms could play a complementary role to bridge that gap.

“Tom’s regime-stability CCs of course use no such words, they simply predict this: given that the biasing inputs are currently a dot in a previously-learned region, and they are staying put or transitioning to another dot in the region, then will the current regime hold? No circularity is involved here.”

The “dots in a region” framing is helpful, but I think there are two layers to the problem.

First, the column has to learn what stability means within a single regime (and not necessarily label it), extracting from many episodes of stacking which features are relevant to persistence versus collapse and which are incidental. Stability looks different at the sensory level when looking at the blocks from different angles, in different lighting, or looking at stable blocks that were stacked differently. Some generalization across episodes is involved here. A 1D predictive axis is effectively reducing a high-dimensional problem to a generalized concept of stability in a single dimension.

Second, the column has to generalize stability across regimes that look completely different at the sensory level. The stability of a block stack (visual alignment, even pressure, no wobble) and the stability of pushing a heavy object before it topples (resistance, tilt angle, momentum) share almost no sensory features. For these to land on the same axis, the column has to have extracted something abstract enough to span both.

Hi @Trung_Doan, @rmounir

I read your proposal; pardon me if I misinterpret it.

The core question seems to be whether abstraction of concepts can arise purely from heterarchical interactions of columns, through mutual constraint and stabilization of predictions, without introducing explicit hierarchical representational levels.

My own intuition is biased toward concepts existing as high-dimensional predictive states, more like HTM-style activations, but operating across levels (i.e. hierarchies of heterarchies in cortico-cortical loops), with active dendritic integration providing the contextual and top-down modulation that links these levels.

So, let’s take visual recognition of my guitar and side table, and their spatial relations. That is well captured by composable object detection: visual input leads to a stable coalition of confident predictions in a shared reference frame. A heterarchical model seems sufficient to me for coherence at this level.

But when I consider the temporal aspect of picking up the guitar: avoiding the table, choosing where to grip, adjusting my movement as I feel the weight, the ‘concept’ involved feels less like a static coalition and more like a temporally extended predictive structure that only resolves through action and feedback.

Your proposal does address temporality in terms of prediction, regime stability, and persistence of compatible patterns over time, but I’m not sure whether that notion of temporality is sufficient for these kinds of embodied, goal-directed action concepts.

The uncertainty about how I will rotate the guitar only collapses once I actually feel its weight. This makes me suspect that while heterarchy may be sufficient for stabilizing object identities and spatial relations, temporal abstraction and action planning may require representations that persist and evolve across levels, rather than emerging solely from peer-level constraint and stabilization.

So in my view (influenced by TBT), abstraction emerges as columns learn to model the outputs of other columns. Active dendrites allow features to participate in different concepts depending on context. Goal direction corresponds to a preferred region of conceptual state space, and action emerges as active inference over trajectories that satisfy the evolving constraints of the goal context, with larger cortico-subcortical loops gating which trajectories are expressed as action/behavior.

It’s obviously more complicated than I can fully grasp, but I tend to think of goal direction as something like an attractor basin in the free energy sense. Not a fixed plan, but a constrained, directional pull on action trajectories toward a region of state space corresponding to an imagined or anticipated goal. In that framing, behavior emerges from biased dynamics over predictive states, with uncertainty progressively collapsing through embodied interaction and feedback.

@rmounir,

Thank you for the thoughtful comments, and apologies for the slow reply. I’ve just come back from a holiday.

1. On the “1D axis”

Yes, you are right, describing a reference frame as a single axis is an oversimplification.

A cortical column learns a reference frame, which is a multidimensional structure. In the Conceptual Mountcastle proposal document, I use the term Ordered Predictive Structure to express my claim that the reference frame concept applies to much more than just physical objects. The term Ordered Predictive Axis just aids intuition.

2. Bridging gaps between non-overlapping situations

Your deeper concern is how toddler Tom generalises between bodily experiences (e.g., grasping large toys, vs observing a large splash), whose sensorimotor patterns do not overlap.

The CC functions below, proposed in the ConMonty proposal, can create this bridge above the level of sensorimotor similarity.

Consider early episodes: Tom hugs a large teddy bear, lifts a wide toy truck, or catches a large ball.

In each episode:

  • SECCs (Cortical Columns performing the Sensory Expectations function) predict bodily configurations,
  • ACCCs (Action Compatibility CCs) predict that the above sensory expectations remain mutually compatible in a regime,
  • RSCCs (Regime Stability CCs) predict that the above regime persists during that episode.

Across many episodes in various past situations, similar ACCC-RSCC success patterns recurred even though sensorimotor details differed. CSCCs (Context Selection CCs) learn that these regimes tend to succeed across diverse contexts. Generalisation therefore occurs not because situations look alike, but because the same predictive regime keeps working.

Now consider Tom watching a rock produce a large splash. The grasp-related SECCs are absent, yet new visual and auditory SECC coalitions again produce coherent large-scale change that persists momentarily. When similar ACCC and RSCC success dynamics recur, CSCCs can link this episode to previously learned regimes despite minimal sensory overlap.

So, the abstraction bridge is built from shared predictive success structure, not shared sensorimotor modalities.
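To make the contrast concrete, here is a toy comparison in Python. Everything in it is an assumption made for illustration: the feature names, and the idea of summarising an episode's ACCC/RSCC dynamics as a short "success profile" vector (hypothetically: compatibility, persistence, prediction confirmation). It only shows that two episodes can have zero sensory overlap while being close in success-profile space.

```python
import math

def jaccard(a, b):
    """Overlap between two sets of active sensory features."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def cosine(u, v):
    """Cosine similarity between two success-profile vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0

# Hypothetical episodes; all values are made up for the example.
hug_teddy = {"sensory": {"wide_arms", "soft_touch", "upward_gaze"},
             "success": [0.9, 0.8, 0.9]}
rock_splash = {"sensory": {"loud_sound", "wide_visual_spread"},
               "success": [0.8, 0.9, 0.8]}

sensory_overlap = jaccard(hug_teddy["sensory"], rock_splash["sensory"])
success_similarity = cosine(hug_teddy["success"], rock_splash["success"])
```

A caveat on the sketch itself: with only a few coarse components, many unrelated successful episodes would also look similar, so a real system would need a richer success structure than this to avoid linking everything to everything.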

@Daniel_Brownell

You covered many interesting topics, but I want to focus on your ideas about concepts. You seem to propose 2 ways to look at concept processing:

  • “concepts existing as high-dimensional predictive states...”
  • “abstraction emerges as columns learn to model the outputs of other columns”

I’d be interested to learn how you plan to take these ideas forward.

Nature has given us a huge gift: sleep and coma. If we develop a theory that can’t explain how people wake up and don’t have to learn concepts from scratch like babies do, that theory is likely incorrect. Therefore, I believe that a theory about concepts must explain how learned concepts are quickly “rehydrated” upon waking.

Hi @Trung_Doan,

So in my view, concepts are attractor regions in the learned dynamical system formed by columns modeling each other (with motor gating occurring through context-dependent modulation).

Activity may collapse in coma or sleep, but connectivity remains. When sensory input resumes, the system ‘rehydrates’ back into pre-shaped attractor basins. A baby must sculpt these basins from scratch.
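A classic toy model of this "connectivity remains, activity rehydrates" idea is a small Hopfield network, sketched below in plain Python. This is a swapped-in, textbook technique used purely as an analogy; it is not a model of cortical columns, and the patterns and names (`concept_a`, `concept_b`, `settle`) are hypothetical.

```python
def train(patterns):
    """Hebbian weights: the 'connectivity' that survives sleep."""
    n = len(patterns[0])
    w = [[0.0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:
                    w[i][j] += p[i] * p[j] / len(patterns)
    return w

def settle(w, state, sweeps=5):
    """Update units one at a time until the state stops changing."""
    state = list(state)
    for _ in range(sweeps):
        changed = False
        for i in range(len(state)):
            total = sum(w[i][j] * state[j] for j in range(len(state)))
            new = 1 if total >= 0 else -1
            if new != state[i]:
                state[i], changed = new, True
        if not changed:
            break
    return state

concept_a = [1, 1, 1, 1, -1, -1, -1, -1]
concept_b = [1, -1, 1, -1, 1, -1, 1, -1]
weights = train([concept_a, concept_b])

# "Waking up": prior activity is gone; only a noisy sensory cue arrives.
cue = [1, 1, 1, -1, -1, -1, -1, -1]  # concept_a with one unit flipped
recovered = settle(weights, cue)
```

The cue falls back into the stored pattern because the basin was already shaped by learning. A network with untrained (all-zero) weights has no such basin to fall into, which is the "baby" case.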

So my earlier two statements are sequential rather than alternative views:

  • Columns learn predictive models of each other
  • Stable predictive regimes emerge
  • These regimes shape the joint state space
  • The resulting attractor basins are what we call ‘concepts’

The hardest part is operationalizing the attractor-basin interpretation.

The way I would take this forward is to build a minimal embodied predictive system with the following structure:

Generic columns

  • Multiple identical predictive modules receive different projections of sensory input. Each learns to predict its own future input and exchanges predictions with peers. No column is predefined as higher or lower.

Persistent belief state

  • A recurrent latent state integrates across columns. It effectively encodes the attractor landscape.

Goal as density over belief space

  • Rather than encoding task structure explicitly, the goal is represented as a learned density over belief trajectories that historically led to successful outcomes. The system then biases its predicted evolution toward regions of belief space with high success density.

Action via predicted potential minimization

  • At each step, the system rolls forward candidate action sequences using its predictive model and evaluates them by minimizing a combined potential:

    • prediction error
    • uncertainty
    • distance from preferred belief regions

Behavior emerges from flowing toward low-surprise regions in the shaped landscape.

If this interpretation is correct, temporally extended behaviors (for example probing before lifting, adjusting after feedback, etc.) should emerge from the geometry of the learned dynamical landscape rather than from scripted sequencing or explicit hierarchical representations.
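A deliberately tiny sketch of the action-selection step described above, with heavy assumptions: the belief state is just a (position, uncertainty) pair, the forward model `predict` is hand-written rather than learned, the three terms of the potential are weighted equally, and the search is exhaustive. All names and numbers are hypothetical.

```python
from itertools import product

GOAL = 2.0  # hypothetical preferred belief region: position 2

def predict(state, action):
    """Toy forward model. Returns (next_state, expected_error).

    Action 0 is a 'probe': the position stays put and uncertainty drops.
    Actions -1/+1 move the position; moving while uncertain is error-prone.
    """
    position, uncertainty = state
    if action == 0:
        return (position, uncertainty * 0.25), 0.0
    return (position + action, uncertainty), uncertainty

def potential(state, expected_error):
    """Combined potential: prediction error + uncertainty + distance to goal."""
    position, uncertainty = state
    return expected_error + uncertainty + abs(GOAL - position)

def plan(state, horizon=3, actions=(-1, 0, 1)):
    """Exhaustively roll out action sequences; return the lowest-potential one."""
    best_seq, best_cost = None, float("inf")
    for seq in product(actions, repeat=horizon):
        current, cost = state, 0.0
        for action in seq:
            current, err = predict(current, action)
            cost += potential(current, err)
        if cost < best_cost:
            best_seq, best_cost = seq, cost
    return best_seq

best = plan((0.0, 1.0))  # start at position 0 with high uncertainty
```

With these numbers the lowest-potential sequence is `(0, 1, 1)`: probe first, then move twice. No rule says "probe before moving"; the ordering falls out of the potential. That outcome is of course built into the hand-chosen model, so it illustrates the mechanism rather than providing evidence for it.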

In a POC, I’d still need to specify what counts as success, but it would just be the terminal condition.

Because the search space is large, I’d probably use curriculum learning, in the sense of progressively hardening the environment. For the guitar example, something like:

  1. Pick-up only (no obstacles)
     • Learn reliable grasp/lift and gross orientation control.
  2. Add hidden dynamics / partial observability
     • Randomize mass / center-of-mass / friction so the agent must probe and adapt (uncertainty collapses through interaction).
  3. Introduce obstacle constraints
     • Add the side table and treat collisions as high surprisal / strong penalties, forcing clearance strategies.
  4. Final combined task
     • Same terminal condition as (1), but now under hidden variables + obstacle constraints, so the full “pick up and orient safely” behavior must emerge.

The key is that the agent wouldn’t be told ‘first do X then Y.’ It only gets the terminal success criterion.
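As a configuration sketch, the curriculum could be written down as data, with one shared terminal condition and no per-stage instructions. The `Stage` fields, the advancement threshold of 0.9, and the reading that the obstacle stage does not yet include hidden dynamics are all assumptions made for the example.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Stage:
    name: str
    randomize_dynamics: bool  # hidden mass / centre of mass / friction
    obstacles: bool           # side table present, collisions penalised

# The only task signal the agent ever receives (hypothetical label).
TERMINAL_CONDITION = "guitar lifted and oriented"

CURRICULUM = [
    Stage("pick-up only", randomize_dynamics=False, obstacles=False),
    Stage("hidden dynamics", randomize_dynamics=True, obstacles=False),
    Stage("obstacle constraints", randomize_dynamics=False, obstacles=True),
    Stage("final combined task", randomize_dynamics=True, obstacles=True),
]

def next_stage(index, success_rate, threshold=0.9):
    """Advance when the agent is reliable enough at the current stage."""
    if success_rate >= threshold and index < len(CURRICULUM) - 1:
        return index + 1
    return index
```

Only the environment hardens from stage to stage; the success criterion stays fixed, which is what keeps the sequencing from being scripted.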

It learns generic physical constraints through interaction, and learns a shaped belief-space landscape in which temporally extended sequences (approach → probe → adjust → lift → orient) become the natural low-surprise route to the goal.

That’s generally how I might take it forward, though I haven’t thought through every detail yet.

In reinforcement learning, reward shapes policy. In the model I’m envisioning, a learned density over belief trajectories associated with successful outcomes shapes the effective geometry of the system’s internal state space. The negative log of that density acts as a soft potential, biasing predicted trajectories toward regions of belief dynamics that historically led to success.
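Written out, the idea in the previous paragraph might look as follows. The notation is an assumption introduced here for illustration: $\tau$ is a belief trajectory, $p_{\text{success}}$ the learned density, $\hat{e}_t$ and $\hat{u}_t$ the predicted error and uncertainty at step $t$, and the $\lambda$ weights are free parameters.

```latex
% Soft potential from the learned success density
U(\tau) = -\log p_{\text{success}}(\tau)

% One possible combined objective for a candidate action sequence a_{1:T},
% where \hat{\tau}(a_{1:T}) is the belief trajectory the model predicts for it
J(a_{1:T}) = \sum_{t=1}^{T} \left( \lambda_e \, \hat{e}_t + \lambda_u \, \hat{u}_t \right)
             + \lambda_g \, U\!\left(\hat{\tau}(a_{1:T})\right)
```

Since $-\nabla U = \nabla \log p_{\text{success}}$, descending the potential moves predicted trajectories toward regions of higher success density, which is the "soft pull" described above.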


@Trung_Doan

Just to clarify, I think the cortex is physically heterarchical. My thought is just that, over time, learning will create functional asymmetries, where some columns end up predicting from the output of others. It’s just a hunch. But since vision already shows fairly clear hierarchical gradients, I figure something similar would emerge for concepts in general.

You mentioned heterarchy. Viviane Clay’s presentation on hierarchy vs heterarchy was fascinating. But that was about architecture, and thinking in terms of architecture is only one way to solve a problem. For the problem that I tried to wrestle with, i.e. how TBT handles concepts, if I had tackled it in an architectural way I think I wouldn’t have gone very far.