Latent context in Monty

Daniel_Brownell · February 3, 2026, 12:28pm

I have a small observation after reading the active dendrites paper and looking through Monty’s code.

In the active dendrites work, pyramidal neurons are modeled as having two input streams: basal (feedforward data) and apical (contextual/modulatory). In that paper, the apical input is provided explicitly as a one-hot task or context signal, which makes sense from an engineering point of view.

From a more biological perspective, though, that context would presumably be inferred and fed back from elsewhere in the system, rather than supplied as a clean label.

Looking through Monty, I see several control mechanisms: step-type switching (matching vs exploratory), train/eval mode, motor-only steps, policy heuristics, salience inhibition, and slope-triggered resampling. These all make sense, and clearly work well.

What I’m wondering about is: those controls seem to be expressed procedurally and locally (as conditionals inside specific components), rather than as a shared, explicit signal representing inferred context or regime.

I’m not suggesting Monty needs such a signal. I’m mainly curious whether this distinction matters as Monty is pushed toward more open-ended or continual settings.

Daniel_Brownell · February 4, 2026, 9:39am

One possible implementation direction (as an experiment) would be to add an SDR-based side channel per LM or SM, with a basal/apical split:

Basal SDR
B: encode current sensory features.

Apical SDR
A: encode “current global hypothesis/context” (e.g., top object IDs, MLH summary, policy state, etc.) into another sparse code.

Compute a gated output
O=f(B,A), e.g. prefer basal winners that have learned dendritic segments matching A, or prefer coincidence between B and A for stronger activation.

The goal would be a side-mechanism for context-sensitive representations and gating (continual learning / fast regime switching) that could sit alongside the current hypothesis/evidence machinery.
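A minimal sketch of the gating step O=f(B,A), assuming NumPy boolean arrays for the SDRs and exactly one learned apical segment per cell. The function name `gated_output`, the `threshold` and `k` parameters, and the scoring rule are all illustrative assumptions. None of this is existing Monty code.

```python
import numpy as np

def gated_output(basal, apical, segments, threshold=2, k=2):
    """Pick up to k winners among basal-active cells, preferring apical matches.

    basal:    (n_cells,) bool, cells driven by feedforward input (B).
    apical:   (n_ctx,) bool, context SDR (A).
    segments: (n_cells, n_ctx) bool, one learned apical segment per cell.
    """
    # Overlap between each cell's apical segment and the current context.
    overlap = (segments & apical).sum(axis=1)
    matched = overlap >= threshold
    # A basal-active cell scores 1; a matching apical segment raises it to 2.
    # Cells without basal drive score 0 regardless of context.
    score = basal.astype(int) * (1 + matched.astype(int))
    # Stable sort keeps the original cell order among equal scores.
    order = np.argsort(-score, kind="stable")[:k]
    winners = order[score[order] > 0]
    out = np.zeros_like(basal)
    out[winners] = True
    return out

basal = np.array([True, True, True, False])
segments = np.array([
    [False, False, True, True],   # cell 0: tuned to a different context
    [True, True, False, False],   # cell 1: tuned to the context used below
    [True, False, False, False],  # cell 2: weak overlap only
    [True, True, False, False],   # cell 3: matching segment, no basal drive
])
context = np.array([True, True, False, False])
no_context = np.zeros(4, dtype=bool)

with_ctx = gated_output(basal, context, segments, k=1)
without_ctx = gated_output(basal, no_context, segments, k=1)
```

With a context present, the winner is the basal-active cell whose segment matches it. With an empty context, the selection falls back to basal input alone, so the apical input only modulates and never drives a cell by itself.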

I’ll think about it some more.

nleadholm · February 6, 2026, 9:10am

Hi @Daniel_Brownell, yeah this is a great observation. As you say, at the moment the various control mechanisms provided are local. One long-term limitation is that they are often hard-coded for a particular behavior we want from the system. This works for now, but we don’t want to have to manually enumerate all of them. In the future, we imagine that the equivalent of apical dendrites in Monty would provide a variety of context signals, which could enable more flexible context and control signaling that is actually based on learning. These would include:

Goal states (what is the state that the column needs to achieve/get into?)
- For example, you want a light bulb to be illuminated; the column modeling the behavior of light bulbs has learned that the column modeling light switches needs to be in the “flipped-up” state for the light to turn on. It therefore sends an appropriate goal state.
Predicted/hypothesized states (based on top-down feedback, causal influences, or timing signals, what is the likely state of the column?)
- Top-down feedback example: you have inferred that you are on the TBP mug, and move to where the logo is; the column representing TBP mug tells the column with a logo model to expect the logo.
- Causal influence example: you turn on a light switch, and the column modeling this behavior tells a column modeling the light bulb behavior that it is likely going to enter the illuminated state.
- Timing signal example: you are at a particular point in a song, and a period of time passes by; the activity of time cells tells the column modeling this song that it should imminently experience the next learned element in the sequence.

We think all of the above might be coming in at the L1 apical dendrites, based on the neuroanatomy.

It is definitely appealing to move a lot of the control that currently exists in Monty (like policy heuristics) into learned context of this kind. Some of the control/context signals might also be removed entirely (e.g. train/eval mode is more of an artifact of our experiment setup), while others might not fit neatly into this kind of L1 input (like motor-only steps), and so would still be defined another way.

Daniel_Brownell · February 24, 2026, 1:34pm

Thanks for the reply.

I had some chats with AI about how this might fit into the codebase, to sanity-check that this could be added without redesigning anything.

Conceptually, building on your reply, I’m thinking of goal/context as a soft bias field over inference and action trajectories (i.e. shaping which hypotheses and next actions are dynamically preferred over time). From a code point of view, that seems like it could be represented pretty cleanly as a small top-down prior applied during hypothesis scoring, without touching CMP or the voting logic.

So the idea would be to add a typed ‘apical’ side-channel (parallel to CMP), and initially use it in one place: as a bias term during evidence-matching hypothesis scoring.

This would just be an explicit way to represent and apply top-down feedback (goal states, predicted states, causal/timing influences). The biological differentiation of these categories is obviously fuzzy, but for code it probably helps to have a small enum for clarity.

1) Minimal types (one new file)

Add apical_signals.py containing something like:

ApicalSignal(
  kind=GOAL_STATE | PREDICTED_STATE | CAUSAL_HINT | TIMING_HINT | POLICY_BIAS | ATTENTION_HINT,
  ref=<typed ref>,
  confidence,
  horizon,
  priority,
  ttl,
  source_id,
  target_scope/ids
)
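One way the outline above could look as concrete Python, as a sketch only. The field names come from the outline. The defaults, the [0, 1] range for `confidence`, the step-based reading of `horizon` and `ttl`, and the `aged` helper are assumptions added for illustration.

```python
from dataclasses import dataclass, replace
from enum import Enum, auto
from typing import Any, Optional, Tuple

class ApicalKind(Enum):
    GOAL_STATE = auto()
    PREDICTED_STATE = auto()
    CAUSAL_HINT = auto()
    TIMING_HINT = auto()
    POLICY_BIAS = auto()
    ATTENTION_HINT = auto()

@dataclass(frozen=True)
class ApicalSignal:
    kind: ApicalKind
    ref: Any                      # typed handle, e.g. a HypothesisRef
    source_id: str                # module that emitted the signal
    target_scope: str             # e.g. "lm" or "sm" (assumed convention)
    target_ids: Tuple[str, ...] = ()
    confidence: float = 1.0       # assumed to lie in [0, 1]
    priority: float = 1.0
    horizon: int = 1              # assumed: steps ahead the signal refers to
    ttl: int = 1                  # assumed: steps the signal stays valid

    def __post_init__(self):
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must be in [0, 1]")
        if self.ttl < 1:
            raise ValueError("ttl must be at least 1")

    def aged(self) -> Optional["ApicalSignal"]:
        """Return a copy with one step less to live, or None once expired."""
        return replace(self, ttl=self.ttl - 1) if self.ttl > 1 else None

goal = ApicalSignal(
    kind=ApicalKind.GOAL_STATE,
    ref="light_switch:flipped_up",
    source_id="lm_light_bulb",
    target_scope="lm",
    target_ids=("lm_light_switch",),
    confidence=0.8,
    ttl=2,
)
```

Freezing the dataclass keeps signals immutable once emitted, so a merge step can compare or discard them without worrying about a producer mutating them later.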

ApicalBundle(step_index).merge(incoming) with deterministic merge rules, e.g.:

GOAL_STATE
- At most one active per (target_scope, target_ids).
- If multiple arrive, keep the highest (priority, confidence) and log which were suppressed.

PREDICTED_STATE
- Keep top-k by (confidence * priority) for a given target scope.

POLICY_BIAS / ATTENTION_HINT
- Combine as normalized weights (e.g. weighted average), so multiple weak hints can sum to a meaningful bias.

CAUSAL_HINT / TIMING_HINT
- Accumulate as annotations on the bundle (don’t replace payload identity), mainly for conditioning downstream logic or debugging.
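The merge rules could be sketched roughly as follows. This uses a simplified stand-in `Signal` type with string kinds and a single `target` key, and it assumes `merge` is called once per step with all incoming signals. Normalizing bias weights so they sum to 1 per target is one reading of "normalized weights", not the only possible one.

```python
from collections import defaultdict
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Signal:  # simplified stand-in for ApicalSignal
    kind: str
    ref: str
    target: str
    confidence: float = 1.0
    priority: float = 1.0

@dataclass
class ApicalBundle:
    step_index: int
    top_k: int = 2
    goals: dict = field(default_factory=dict)        # target -> Signal
    predictions: dict = field(default_factory=dict)  # target -> [Signal]
    biases: dict = field(default_factory=dict)       # (target, ref) -> weight
    annotations: list = field(default_factory=list)
    suppressed: list = field(default_factory=list)

    def merge(self, incoming):
        raw_bias = defaultdict(float)
        # Sorting first makes the result independent of arrival order.
        key = lambda s: (s.kind, s.target, s.ref, s.priority, s.confidence)
        for s in sorted(incoming, key=key):
            if s.kind == "GOAL_STATE":
                held = self.goals.get(s.target)
                rank = (s.priority, s.confidence)
                if held is None or rank > (held.priority, held.confidence):
                    if held is not None:
                        self.suppressed.append(held)
                    self.goals[s.target] = s
                else:
                    self.suppressed.append(s)
            elif s.kind == "PREDICTED_STATE":
                ranked = self.predictions.get(s.target, []) + [s]
                ranked.sort(key=lambda p: -(p.confidence * p.priority))
                self.predictions[s.target] = ranked[: self.top_k]
            elif s.kind in ("POLICY_BIAS", "ATTENTION_HINT"):
                # Repeated weak hints for the same ref accumulate.
                raw_bias[(s.target, s.ref)] += s.confidence * s.priority
            else:  # CAUSAL_HINT / TIMING_HINT are kept as annotations
                self.annotations.append(s)
        # Normalize accumulated bias weights so they sum to 1 per target.
        totals = defaultdict(float)
        for (target, _), weight in raw_bias.items():
            totals[target] += weight
        for (target, ref), weight in raw_bias.items():
            if totals[target] > 0:
                self.biases[(target, ref)] = weight / totals[target]
        return self

signals = [
    Signal("GOAL_STATE", "flipped_up", "lm_switch", 0.9, 1.0),
    Signal("GOAL_STATE", "flipped_down", "lm_switch", 0.9, 2.0),
    Signal("POLICY_BIAS", "look_left", "motor", 0.2),
    Signal("POLICY_BIAS", "look_left", "motor", 0.2),
    Signal("POLICY_BIAS", "look_right", "motor", 0.1),
    Signal("TIMING_HINT", "next_note", "lm_song"),
]
bundle = ApicalBundle(step_index=0).merge(signals)
```

In this example the higher-priority goal wins and the other is recorded in `suppressed`, while the two weak `look_left` hints add up before normalization.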

Payload refs for MVP would just be typed handles:

HypothesisRef, SymbolRef, FeatureRef, ActionRef
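A sketch of what such typed handles might look like. Every field name here (`lm_id`, `object_id`, `name`, `feature_name`, `action_name`) is a placeholder chosen for illustration and is not taken from Monty.

```python
from dataclasses import dataclass
from typing import Union

@dataclass(frozen=True)
class HypothesisRef:
    lm_id: str       # hypothetical: which learning module holds the hypothesis
    object_id: str   # hypothetical: which object the hypothesis is about

@dataclass(frozen=True)
class SymbolRef:
    name: str

@dataclass(frozen=True)
class FeatureRef:
    feature_name: str

@dataclass(frozen=True)
class ActionRef:
    action_name: str

# Any of the four handle types can serve as the payload of a signal.
ApicalRef = Union[HypothesisRef, SymbolRef, FeatureRef, ActionRef]

# Frozen dataclasses are hashable, so refs can key a lookup table directly.
bias_by_ref = {HypothesisRef("lm_0", "mug"): 0.3}
```

Keeping the handles as plain value objects means two modules can refer to the same hypothesis without sharing the underlying hypothesis data.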

2) Minimal plumbing (backward compatible)

Optional no-op hooks on modules/policy:

produce_apical_signals() -> list[ApicalSignal]
consume_apical_bundle(bundle) -> None

In the MontyBase step loop:

collect signals → merge bundle → dispatch bundle

If the bundle is empty or the feature flag is off, behavior is standard Monty.
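The plumbing could be sketched like this, independent of the real step loop. `ApicalHooks`, `apical_phase`, and the dict used as a stand-in bundle are hypothetical names. How this would actually attach to MontyBase is not shown and would need checking against the codebase.

```python
class ApicalHooks:
    """Default no-op hooks, so existing modules behave exactly as before."""

    def produce_apical_signals(self):
        return []

    def consume_apical_bundle(self, bundle):
        return None

def apical_phase(modules, step_index, enabled=True):
    """Collect signals, merge them into a bundle, and dispatch it."""
    if not enabled:
        return None  # feature flag off: standard behavior
    signals = [s for m in modules for s in m.produce_apical_signals()]
    if not signals:
        return None  # nothing to dispatch: standard behavior
    # A plain dict stands in for the merged ApicalBundle in this sketch.
    bundle = {"step_index": step_index, "signals": signals}
    for m in modules:
        m.consume_apical_bundle(bundle)
    return bundle

class Sender(ApicalHooks):
    def produce_apical_signals(self):
        return ["expect_logo"]

class Receiver(ApicalHooks):
    def __init__(self):
        self.seen = None

    def consume_apical_bundle(self, bundle):
        self.seen = bundle

receiver = Receiver()
result = apical_phase([Sender(), receiver], step_index=3)

untouched = Receiver()
skipped = apical_phase([Sender(), untouched], step_index=3, enabled=False)
```

Because the defaults do nothing, a module only takes part once it overrides one of the two hooks.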

3) MVP: one choke point, no CMP/voting changes

Implement Option A (MVP): apical priors in hypothesis scoring inside the evidence-matching LM:

Add one function:

apply_apical_bias(scores, apical_bundle)

Conceptually:

For each hypothesis h with base score S(h)
If there is an apical signal targeting h (or its feature/action ref), adjust:

S'(h) = S(h) * (1 + w * confidence * priority)

or, equivalently in log space, add a small log-prior / offset term.

This keeps the effect soft:

- no hypothesis is forced on/off
- apical signals only tilt rankings
- if signals are wrong or absent, the system falls back to pure evidence

No changes to CMP, voting, or hypothesis structures.
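A minimal sketch of the bias step, assuming scores arrive as a flat array with a parallel list of hypothesis IDs and that each signal reduces to a `(target_id, confidence, priority)` tuple. The signature differs from `apply_apical_bias(scores, apical_bundle)` above only to keep the example self-contained. The `mode` switch is an added assumption.

```python
import numpy as np

def apply_apical_bias(scores, hypothesis_ids, signals, w=0.1,
                      mode="multiplicative"):
    """Tilt hypothesis scores toward apically targeted hypotheses.

    signals: iterable of (target_id, confidence, priority) tuples.
    """
    scores = np.asarray(scores, dtype=float)
    boost = np.zeros_like(scores)
    index = {h: i for i, h in enumerate(hypothesis_ids)}
    for target, confidence, priority in signals:
        i = index.get(target)
        if i is not None:  # signals for unknown hypotheses are ignored
            boost[i] += w * confidence * priority
    if mode == "multiplicative":
        # S'(h) = S(h) * (1 + w * confidence * priority)
        return scores * (1.0 + boost)
    # Additive variant: S'(h) = S(h) + w * confidence * priority
    return scores + boost

ids = ["mug", "bowl", "spoon"]
base = np.array([1.0, 1.05, 0.2])
biased = apply_apical_bias(base, ids, [("mug", 1.0, 1.0)])
unbiased = apply_apical_bias(base, ids, [])
```

Here a small boost is enough to move "mug" ahead of "bowl", and an empty signal list returns the scores unchanged. One caveat: the multiplicative form assumes non-negative scores. If a score is negative, multiplying by a factor above 1 pushes it further down, so the additive form may be the safer choice wherever evidence values can go below zero.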

4) Debugging / introspection

Log signals received and their effect:

- which hypotheses/actions got deltas
- by how much
- which signal IDs caused the change

So it’s obvious when top-down signals actually mattered vs when they were inert.
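The introspection step could be as small as the sketch below, which compares scores before and after biasing and uses the standard `logging` module. The function name, the record layout, and the `signal_ids_by_hypothesis` mapping are assumptions for illustration.

```python
import logging

logger = logging.getLogger("apical")

def log_bias_effect(hypothesis_ids, before, after,
                    signal_ids_by_hypothesis, tol=1e-12):
    """Record which hypotheses moved, by how much, and due to which signals."""
    records = []
    for h, b, a in zip(hypothesis_ids, before, after):
        delta = a - b
        if abs(delta) > tol:
            record = {
                "hypothesis": h,
                "delta": delta,
                "signals": signal_ids_by_hypothesis.get(h, []),
            }
            records.append(record)
            logger.info("apical delta %+.4f on %s from %s",
                        delta, h, record["signals"])
    if not records:
        # Makes it explicit when top-down signals had no effect this step.
        logger.info("apical signals inert this step")
    return records

changed = log_bias_effect(
    ["mug", "bowl"], [1.0, 1.05], [1.1, 1.05], {"mug": ["sig-7"]}
)
inert = log_bias_effect(["mug"], [1.0], [1.0], {})
```

Returning the records as well as logging them makes it easy to assert on them in tests or attach them to per-step statistics.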

I haven’t looked at the roadmap yet and this is just a general idea, but I can try a small PR if it’s useful. It might take me a bit of time though. If the team is already working on this area, we can just leave it as discussion.