Unsupervisedly determining the end of an episode

xavier · May 1, 2026, 1:39am

I am coming back today with some more naive questions to the great people of this forum.

So far, Monty knows where an episode boundary is only because we tell it.

I know you have been doing some great work on determining this in an unsupervised way: present a view with multiple objects at once, and Monty autonomously determines when it should reset its temporal context because it is now looking at a different object. Is this working well?

And do you guys believe that an LM should be able to locally determine when its context is not coherent anymore (hence an episode boundary), or should this need some higher-level signal (probably related to attention)?

If there is already a discussion or a research video on this, I can’t seem to find it!

nleadholm · May 1, 2026, 8:33am

Hi Xavier, great questions. @rmounir has done some nice work on this with burst sampling, which you can watch a video about here: https://www.youtube.com/watch?v=DMuFICwGbWY

I think that might address most of your questions, but feel free to follow up with clarifications. The short answer is that when Monty starts to make poor predictions based on its current model, it will sample new hypotheses about the state of the world. So far this is indeed working well, enabling an LM on its own to determine that the context has changed. We haven’t yet had a need for this to be a global signal for object inference, although you could imagine it being somewhat global through voting effects.
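To make the short answer concrete, here is a toy sketch of the general idea (poor predictions trigger a burst of new hypotheses). It is not Monty's implementation: the `BurstTrigger` class, the `find_bursts` helper, and the `threshold` and `window` parameters are all hypothetical names and values chosen for illustration, and the rule used (mean of recent prediction errors above a fixed threshold) is an assumption rather than the criterion described in the video.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class BurstTrigger:
    """Toy detector: flags a context change when recent prediction error is high.

    All names and defaults here are hypothetical, not part of Monty.
    """

    threshold: float = 0.5  # assumed error level that counts as "poor"
    window: int = 3         # assumed number of recent steps to average over
    errors: List[float] = field(default_factory=list)

    def step(self, prediction_error: float) -> bool:
        """Record one error; return True if new hypotheses should be sampled."""
        self.errors.append(prediction_error)
        recent = self.errors[-self.window:]
        if len(recent) < self.window:
            return False  # not enough evidence yet
        if sum(recent) / len(recent) > self.threshold:
            self.errors.clear()  # start fresh after the burst
            return True
        return False


def find_bursts(error_stream: List[float]) -> List[int]:
    """Return the step indices at which a burst would be triggered."""
    trigger = BurstTrigger()
    return [i for i, err in enumerate(error_stream) if trigger.step(err)]


# Low error on one object, then high error after moving to another object.
stream = [0.1, 0.1, 0.1, 0.9, 0.9, 0.9, 0.1, 0.1, 0.1]
print(find_bursts(stream))
```

The point of the sketch is only that the decision is local: it uses nothing but the module's own stream of prediction errors, with no externally supplied episode boundary.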

One thing to note is that unsupervised interactions with the world can happen in the context of both inference and learning. Our “unsupervised inference” and burst sampling, covered in the above video, only deal with the former, i.e. removing the use of the episode boundary as a means of resetting the system. I think this is what your questions are mostly about.

However, a related (and arguably even harder) problem is for the system to detect when it should learn about an object because its existing models are insufficient. This could consist of either updating existing models or creating an entirely new one. Our unsupervised learning does have a way of approaching this, but we think we need to make it more sophisticated to perform better. We talk about this in this Future Work item, and if you’re interested, this is something that @YiannosD has been exploring here on the forums.

As a final complication, there is also the notion of resetting the system when modelling time-evolving dynamics, i.e. object behaviors. This is a case where we imagine there being a more global signal that resets the representation of time in all the LMs, akin to the attention effect that you refer to. Note that this is specifically for object behaviors, which are not yet implemented in Monty. You can read more about this here: Global Interval Timer

Hope that helps!

xavier · May 1, 2026, 11:22am

This is fantastic, thank you Niels!