2025/08 - Recap of Behavior Solutions and Surface Model Proposal

@vclay recaps key ideas from behavior modeling discussions, exploring how behavior and morphology models can be separated, learned, and connected. The team also covers hierarchy and open questions around column separations, segmentation, and model types. Viviane proposes the idea of surface models and generally thinking about columns that receive different types of input and how this can be utilized. We talk about how the ideas could map onto neuroanatomy and how they could be implemented in Monty.

0:00 Introduction
1:49 Behavior Models Work like Morphology Models
6:25 First Learn Coarse Models and Interpolate
8:04 Learn Associations Between Object and Behavior
13:38 Making Predictions about an Object at an Arbitrary Point in the Behavior Sequence
19:55 Segmenting Child Objects Based on Movement of Parts
26:50 Circle and Oval Example
33:00 3 Types of Models
51:26 Observation and Question: When to Split Knowledge Into Separate Models
56:24 Elegant Principles vs Hacky Solutions
1:01:24 Inputs that Create the Different Models
1:04:12 How Could this be Achieved in the Brain
1:08:30 Where Would Surface Models Be in the Brain
1:12:01 More Abstract Models in Higher Regions and Further Questions
1:25:14 Implications to Monty
1:39:06 Wrap up and Other Discussions


Hello all,

I’m new here. I’ve read both of Jeff’s books, and I’m just getting into Monty. The whole idea seems very interesting to me.

I was watching this video and an idea came to me. I’m just scratching the surface here, so maybe it makes no sense.

One of the ideas is to create an explicit representation of the surface of the object that is being represented. Just positions and RF give an implicit representation of the surface (since you have a normal direction per point), but it seems there is not enough information in that representation to easily infer things like logos. The proposed solution seems to be to create an explicit representation of the surface, although this brings other problems.

It looks to me that a richer, still implicit representation of the surface could work better. I think it could be argued that we only see surfaces as 2D manifolds in 3D, not surfaces in 2D of course. The idea is that if one were to use something like Gaussian splatting as the model description, most of the characteristics of the object would be implicit and could be learned.

I’m planning to apply to a PhD program, so I may be able to test some of these ideas in the future. I just thought it might be worth mentioning here in case somebody finds it interesting.

By the way, I have no idea if this has any reflection on human biology. My biology knowledge is extremely limited.

Hi Alberto,

Yes, that is an interesting idea. The topic of how to represent surfaces has come up many times and in various contexts for us, so I’d be curious to hear more about what problem you would solve with that and how you would represent the surface.

For example, one other context in which we are thinking about surface representations is when retrieving stored features near locations in our model. We use a basic heuristic (search further in the plane orthogonal to the surface normal and less far in the direction of the SN, more details here), but it could surely be improved.
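To illustrate the kind of heuristic described above, here is a minimal sketch of a distance that stretches offsets along the surface normal so that a fixed search radius reaches further within the tangent plane than along the normal. This is not Monty's actual code: the function names and the `normal_weight` parameter are hypothetical and chosen only for illustration.

```python
import math


def anisotropic_distance(query, stored, normal, normal_weight=3.0):
    """Distance that penalizes offsets along the normal more than in-plane offsets.

    `normal_weight` is a hypothetical parameter: values above 1 shrink the
    search range along the surface normal relative to the tangent plane.
    """
    n_len = math.sqrt(sum(c * c for c in normal))
    n = [c / n_len for c in normal]
    d = [s - q for s, q in zip(stored, query)]
    # Signed offset along the normal direction.
    along = sum(di * ni for di, ni in zip(d, n))
    # Squared offset within the tangent plane (clamped against rounding error).
    tangential_sq = max(sum(di * di for di in d) - along * along, 0.0)
    return math.sqrt(tangential_sq + (normal_weight * along) ** 2)


def features_near(query, normal, stored_points, radius, normal_weight=3.0):
    """Return the stored points that fall inside the flattened search region."""
    return [
        p
        for p in stored_points
        if anisotropic_distance(query, p, normal, normal_weight) <= radius
    ]


stored = [(0.5, 0.0, 0.0), (0.0, 0.0, 0.5)]
nearby = features_near((0.0, 0.0, 0.0), (0.0, 0.0, 1.0), stored, radius=1.0)
```

With the normal pointing along z, the point offset by 0.5 within the surface plane is retrieved, while the point offset by 0.5 along the normal is not.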

Regarding surface models for modeling and recognizing logos on differently shaped objects: this is active work in progress, and @hlee has implemented and tested a sensor module for this that is currently being integrated into tbp.monty (prototype here: https://github.com/thousandbrainsproject/feat.2d_sensor/pull/2)

I’d be excited to hear more from you if you get to look into this (or another TBT related topic) for your PhD!

Best wishes,
Viviane

Hi Viviane,

I was thinking about this when, in the video, you discussed the idea of having a separate representation for the surface. I’ve worked in research on finite element models in the past, and it got me thinking about how the representation of the surface could help with recognizing deformed objects. Using a mesh seems perfectly fine to me as long as the objects are rigid. Deforming them and still identifying them as the same object seems quite hard based on a surface with normals. I think that an implicit representation in which only points move should be easier. I imagine a sheet of paper with a drawing on it. If we curve that paper, we still understand that it is the same paper. If all we have is distances between points and information at each point, it should be easier to deal with. I agree that a normal field could be re-estimated over a new mesh each time, but I fail to see how the system would recognize that as the same object.
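The curved-paper example can be made concrete with a small sketch. It assumes the paper bends without stretching around a cylinder, and the function names are made up for illustration. Under that assumption, the distance measured along the surface stays the same, while the straight-line 3D distance between the same two points changes.

```python
import math


def bend_onto_cylinder(x, y, radius):
    """Map a flat-paper point (x, y) onto a cylinder by bending along x without stretching."""
    theta = x / radius
    return (radius * math.sin(theta), y, radius * (1.0 - math.cos(theta)))


def surface_distance(a, b):
    """Distance along the paper; unchanged by bending because the sheet does not stretch."""
    return math.dist(a, b)


a_flat, b_flat = (0.0, 0.0), (math.pi, 0.0)
a_bent = bend_onto_cylinder(*a_flat, radius=1.0)
b_bent = bend_onto_cylinder(*b_flat, radius=1.0)

along_surface = surface_distance(a_flat, b_flat)  # pi, before and after bending
straight_line = math.dist(a_bent, b_bent)         # 2.0 after bending (was pi when flat)
```

So a representation built on along-surface distances would be unaffected by this kind of bending, whereas one built on 3D Euclidean distances would not.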

I’m likely jumping the gun here as you are still dealing with rigid objects. It was just my thought process.

Regarding information that forms features on the surface (a logo, for example), the way I understand it is that Monty does this hierarchically (I’m just starting with this, so I may have misunderstood). The information on the surface, defined as distances between points of a particular color, brightness, etc., should be invariant to position on the surface. I believe Monty does that already, so there should be no problem.

Obviously, I have to think more deeply about this, and the distance between an idea and an implementation is huge.

In practical terms, I was wondering what it would take to recognize biological systems, and in particular how to recognize individual animals. Current systems are good at distinguishing a cow from a horse, but what about a particular horse or cow? A human caretaker working with animals can do that very efficiently, while current systems would require many pictures of each individual in the population to do it.

If I manage to do something useful I’ll let you guys know for sure.

Best regards,
Alberto

Hi @alberto

I think the surface model proposal is slightly different from what you are describing but achieves the same effect (i.e. we can recognize things even as they deform). The idea is that we use the 3D point normals to map our 3D movement into a 2D space. That 2D space would be distortion invariant. It just represents movement along the surface instead of movement in 3D space. We have some videos coming up where we describe it in more detail, but maybe this section of our recent focus week demos is useful:

Basically, we project the movement input to the LM into a lower dimensionality, which is robust to distortions in 3D space. The internals of the LM are unchanged; it just receives different movement input from the sensor module.
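As a rough illustration of that projection (not the implementation in the linked prototype), the sketch below drops the component of a 3D displacement along the surface normal and expresses the remainder in a 2D basis of the tangent plane. The function name and the `tangent_hint` argument are assumptions. How to keep that in-plane reference direction consistent while moving over a curved surface is the hard part, and it is not addressed here.

```python
import math


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def _normalize(a):
    length = math.sqrt(_dot(a, a))
    return tuple(x / length for x in a)


def surface_movement(displacement, normal, tangent_hint):
    """Express a 3D displacement as 2D movement along the surface.

    `tangent_hint` (hypothetical) is any vector not parallel to the normal;
    it fixes the orientation of the first in-plane axis.
    """
    n = _normalize(normal)
    # Gram-Schmidt step: remove the normal component of the hint to get axis u.
    h = tuple(t - _dot(tangent_hint, n) * c for t, c in zip(tangent_hint, n))
    u = _normalize(h)
    v = _cross(n, u)  # second in-plane axis, orthogonal to both n and u
    return (_dot(displacement, u), _dot(displacement, v))


# The same step along the surface, once on a flat patch and once on a patch
# rotated by 90 degrees, yields the same 2D movement.
flat = surface_movement((1.0, 0.0, 0.2), (0.0, 0.0, 1.0), (1.0, 0.0, 0.0))
tilted = surface_movement((0.2, 0.0, -1.0), (1.0, 0.0, 0.0), (0.0, 0.0, -1.0))
```

Both calls give a movement of about (1, 0), so a learning module receiving this 2D input would see the same step regardless of how the patch is oriented in 3D.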

Regarding learning and recognizing classes vs. specific instances of a class, you might find our latest discussion on that topic interesting: https://www.youtube.com/watch?v=TpNzJrF3cKw

Best wishes,
Viviane