2025/09 - Proposal for How Columns Learn Sequences and Behaviors

@vclay presents a proposal for how timing signals can be generated and then interpreted in a cortical column, allowing it to learn sequences of object behaviors and to use these behaviors to make predictions about morphology.

Watch the summary video here:

00:00 Introduction
00:46 Illustrative Melody Sequence Example
09:28 The Direction of Sequences
11:34 Stapler Example Sequence
22:54 Potential Cyclical Reset Signal
31:07 Paper: Complex Context Dependent Expectations in Mouse V1


Hi

It’s possible you may be overthinking this :) Why can’t a melody be just another object where one of the dimensions is time? The same techniques of invariant representation apply, such that you can expand or compress the time axis and the power axis and still recognise it as the same melody. We can memorise extremely long melodies (Tubular Bells comes to mind) and will notice really quite small discrepancies between what we are hearing and what we have memorised.

[The need for the grey parts, if I have understood you correctly, would be to terminate the melody; otherwise it has to be memorised as a continuous sequence until the boredom timeout occurs.]

As I think I mentioned in another post somewhere, physical movements can be treated as melodies and so by implication can be treated as objects. Walking is a pattern of motor actions which can be played back fast or slow, and if the feedback from the world is not what we anticipate (for example the ground is not there when we put our foot down) we notice it immediately.

Treating physical objects, melodies and motor actions as different types of the same thing supports the supposition that the brain uses the same structures for vision, touch, hearing and motor.

Pressing a stapler is a combined melody of touch, motor and sound. If any of those don’t play out as expected we will notice.

For sure the brain marks the passing of time, but I doubt it has any measure of absolute time.

In another video about behaviour, Jeff said something that we should always keep in mind: all the brain receives is nerve signals. There is nothing to identify where in the body they come from or what they represent; the only thing the brain has to work from is the patterns those nerve signals exhibit over an approximate timescale. So everything our senses pick up must be handled in a similar manner.

Alex


Hi Alex. I think we’re basically in agreement about treating melodies like objects with time as a dimension, but learning temporal sequences does present some problems unique to time. As discussed in the video, learning to recognize when a sequence repeats is a different kind of problem, since there is no motion-like signal telling us that we have “moved” back to a previously visited location in sequence time. In a way, learning temporal sequences is analogous to learning a repeating spatial pattern laid on an object’s surface. Yes, we could just learn every point on such an object uniquely and ignore the repeating patterns, but it would be more efficient (and I think more informative) to recognize when a texture contains repeating subunits. In the time domain, recognizing starts, stops, and repeating units is pretty necessary, which is why we focused a lot on those kinds of problems in this video rather than the aspects of the problem that are common with object recognition.

Additionally, we can’t move back and forth through melodies freely like we can with purely spatial sensorimotor contexts, so the passive and unidirectional aspect adds some additional complexity when it comes to treating melodies like objects.

But in general, I think a more complete description of how we model melodies will include mechanisms common to object recognition, but in this meeting we focused more on the parts that make it a bit different.

Cheers,

Scott


True that, but there are various ways to finesse the passive and unidirectional aspects of examining melodies in (near) Real Time. For example, various data structures (e.g., indexable FIFO, ring buffer) can be used to store historical data, allowing programs to examine (e.g., sample, traverse, transform) sets of this data pretty freely. AFAIK, the main gotchas are:

  • Incorporating context from the “future” is limited by the amount of delay the program is willing to experience with respect to the incoming signal(s).

  • Incorporating massive amounts of context (e.g., snapshots) is limited by memory and processing constraints.
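To make the ring-buffer idea concrete, here is a minimal sketch built on Python's `collections.deque`. The `NoteHistory` class and its method names are made up for illustration; they are not part of Monty or any existing library. The `capacity` argument is where the memory constraint from the second bullet shows up.

```python
from collections import deque


class NoteHistory:
    """Fixed-size ring buffer of recent notes (hypothetical helper)."""

    def __init__(self, capacity):
        # A deque with maxlen drops the oldest item once it is full.
        self._buf = deque(maxlen=capacity)

    def push(self, note):
        """Store the newest note, evicting the oldest if at capacity."""
        self._buf.append(note)

    def last(self, n):
        """Return up to the n most recent notes, oldest first."""
        items = list(self._buf)
        return items[-n:] if n > 0 else []

    def __getitem__(self, i):
        # Index 0 is the oldest note still held; -1 is the newest.
        return self._buf[i]

    def __len__(self):
        return len(self._buf)


history = NoteHistory(capacity=4)
for note in ["C", "D", "E", "F", "G"]:
    history.push(note)

recent = history.last(2)  # the two newest notes
oldest = history[0]       # "C" has already been evicted
```

The buffer only ever looks backwards, so any "future" context still has to be bought with delay, as in the first bullet.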

That said, recognizing repetitive patterns may be largely immune from these issues, if only because:

  • It’s possible to process and store data as it comes in, then examine the (pre-processed) data at need. For example, time-tagged events can be recorded.

  • At least a few of the repeating events will have to be “in the past” (else how would we know of the repetition?).
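As a rough sketch of those two points, the snippet below records time-tagged events as they arrive and then checks whether the most recent history ends in a repeating unit. The function name `find_repeat_period` and the `(timestamp, symbol)` event format are assumptions made for this example only, and the exact-match test is a simplification; real signals would need some tolerance.

```python
def find_repeat_period(symbols, min_repeats=2):
    """Return the shortest unit length that the tail of `symbols`
    repeats at least `min_repeats` times, or None if there is none."""
    n = len(symbols)
    for period in range(1, n // min_repeats + 1):
        tail = symbols[n - period * min_repeats:]
        unit = tail[:period]
        if all(tail[i] == unit[i % period] for i in range(len(tail))):
            return period
    return None


# Time-tagged events recorded as they come in: (timestamp in s, symbol).
events = [(i * 0.5, sym) for i, sym in enumerate("ABCABCABC")]

symbols = [sym for _, sym in events]
period = find_repeat_period(symbols)

# Duration of one repeating unit, read off the stored timestamps.
unit_duration = None
if period is not None:
    unit_duration = events[-1][0] - events[-1 - period][0]
```

Everything examined here is already "in the past", so the repetition check itself adds no extra delay beyond the events it needs to see.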

In a way, these are similar problems to those faced by a Monty-based vision system: CCD image data may need to be pre-processed (e.g., divided into “patches”) in order to provide data that an LM can absorb.

Around 28:25 there is a discussion of how to distinguish the start of a new sequence from the continuation of a longer one when the first sequence element is detected again: what counts as a significant event to reset the clock? This is definitely a hard problem if there is nothing except a continuous change of elements, like letters, and in that case there is probably nothing to do except hope that the sequence will be learned. But maybe this scenario is not that frequent in practice.

Maybe neighboring columns can detect significant events that reset the clock? This would give a kind of spatial context for the sequence being learned. For example, one column is about to learn the letter sequence GATTACA, which repeats with no delimiters, but on every third occurrence of “A” a neighboring column detects a flashing light, so the first column can associate the flash with the end of the sequence.

The example is too artificial, but maybe something similar happens in the example that Viviane suggested, where a rhythm section can highlight which note in a melody should be stressed, splitting the melody into chunks. If different groups of the cochlea’s cells (different frequencies) map to different columns, then a “rhythm” column could provide significant events to a “melody” column via horizontal connections. Maybe voting is required to agree on what is significant in a particular column’s locality.
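A toy sketch of the GATTACA example, just to make the idea explicit: one function stands in for the neighboring column (it "flashes" on every third “A”), and another splits the stream wherever that flash arrives. Both function names are invented for this illustration, and nothing here is meant as a claim about how columns actually implement it.

```python
def flash_on_every_third(stream, target="A"):
    """Stand-in for the neighboring column: True at every 3rd `target`."""
    flags, count = [], 0
    for element in stream:
        if element == target:
            count += 1
        flags.append(element == target and count % 3 == 0)
    return flags


def segment_by_reset(stream, reset_flags):
    """Split the stream, closing a chunk whenever the reset flag is True."""
    chunks, current = [], []
    for element, reset in zip(stream, reset_flags):
        current.append(element)
        if reset:
            chunks.append("".join(current))
            current = []
    if current:
        # Keep any unfinished chunk at the end of the stream.
        chunks.append("".join(current))
    return chunks


stream = "GATTACA" * 3  # repeats with no delimiters
flags = flash_on_every_third(stream)
chunks = segment_by_reset(stream, flags)
```

Because GATTACA contains exactly three “A”s and ends on one, the flash lines up with the end of each repetition, which is what lets the external signal act as a delimiter.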


Hey artem, nice insights. I don’t remember where the literature stands on this (or if it has settled at all), but the best you might be able to do with a continuous change of elements is learning transition probabilities. It’s still something, but not really sequence learning. Just thought I’d mention it though since learning transition probabilities is kind of interesting in and of itself.
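For what it's worth, here is a minimal sketch of what learning transition probabilities could look like for an undelimited stream, using plain first-order counts. The function name is made up for this example, and first-order counting is only one of several ways to do it.

```python
from collections import Counter, defaultdict


def transition_probabilities(sequence):
    """Estimate P(next | current) from adjacent pairs in the sequence."""
    counts = defaultdict(Counter)
    for current, nxt in zip(sequence, sequence[1:]):
        counts[current][nxt] += 1
    return {
        current: {
            nxt: c / sum(followers.values())
            for nxt, c in followers.items()
        }
        for current, followers in counts.items()
    }


# Two back-to-back repetitions with no delimiter between them.
probs = transition_probabilities("GATTACAGATTACA")
```

In this toy stream, “G” and “C” are always followed by “A”, while “A” and “T” each have more than one possible successor. So you do pick up some structure, but nothing in the table says where a sequence starts or ends.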

As per your suggestion about neighboring columns: I’m imagining an experiment that one could do if they happened to be in a lab… get your mouse in front of a screen, but physically divide the visual field (i.e., place a board going from the nose to the screen). Have both the left and right eyes view the same sequence ABCDE, except E is a regular element for the right eye, but E is a gray screen for the left eye. If you record from binocular V1, where columns have either left- or right-eye ocular dominance, then the gray-screen E from the left-eye columns should “bleed into” or inform right-eye columns, thereby recovering the plasticity you normally get when E is always gray for both eyes. Just a thought, but figuring out how to record from only right-eye ocular dominance columns would be a haul.

That’d be one way to test the neighboring column idea. But in general, columns might also be receiving information from much more distant sources that could be used as a kind of resetting cue. I’d imagine higher sensory/association areas would be the most natural place for that, but there is some degree of multi-modality in lower sensory areas as well.


The “remarkable event” signal somewhat reminds me of this paper [1]. However, it doesn’t quite match, because it sounds like a remarkable event doesn’t necessarily need to be an unexpected event in the way you discuss it.

[1] Varela, Carmen, et al. “A mechanism for deviance detection and contextual routing in the thalamus: a review and theoretical proposal.” Frontiers in Neuroscience 18 (2024): 1359180.
