Monty + Reward-Free Empowerment for Intrinsically Motivated Autonomous Agents

Has anyone explored combining Monty with empowerment/valence methods to create intrinsically motivated agents that learn from their environment? Because the approach isn’t reward-based and only requires an agent, its sensorimotor system, some abstraction space, and an environment, it seems like a good fit for Monty.

Any thoughts or feedback would be exciting to hear!

Below is Ringstrom’s thesis, “Reward is not Necessary: Foundations for Compositional Non-Stationary Non-Markovian Hierarchical Planning and Intrinsically Motivated Autonomous Agents”.

Thanks for sharing that work, it definitely looks interesting. You’re absolutely right that we want most learning in Monty to be intrinsically motivated. In general, our thinking is that Monty will be driven by a curiosity-type motivation to explore the world and understand it. This would take the form of a learning module (LM) receiving a “goal-state” that encourages it to study unknown information, and eventually unknown dynamics, in its models. It’s important to note that a Monty system is always learning, in the sense that it will always try to integrate new information into its models based on what it is experiencing.

A “curiosity” policy that supports learning by sampling underexplored areas is not currently implemented, although you might find some of the early work we would like to do on that interesting, discussed in this open PR on top-down exploration policies.
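To make "sampling underexplored areas" concrete, here is a minimal count-based sketch. It is not Monty code and does not use any Monty API; the location names and the `pick_least_visited` helper are hypothetical, and the rule (always go to the least-visited candidate) is only one simple way such a policy could be expressed.

```python
from collections import Counter


def pick_least_visited(candidates, visit_counts):
    """Return the candidate with the fewest recorded visits.

    Ties are broken by order in `candidates` (min keeps the first minimum).
    """
    return min(candidates, key=lambda loc: visit_counts[loc])


# Hypothetical locations on an object; in a real system these would come
# from the parts of a learned model that have few observations.
locations = ["handle", "rim", "base"]
visit_counts = Counter()
visited_order = []

for _ in range(6):
    target = pick_least_visited(locations, visit_counts)
    visit_counts[target] += 1  # pretend we moved there and sensed it
    visited_order.append(target)

print(visited_order)
```

With equal starting counts this simply cycles through the locations; the behaviour becomes more interesting once some areas start out better explored than others.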

Empowerment certainly seems related. I’d be interested to better understand to what degree it can be captured with a curiosity-style objective. That is, in order to learn more about the world, you generally need to be able to manipulate it in interesting ways, so wouldn’t a curiosity objective already encourage developing more complex models and policies?
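For readers less familiar with the term, here is a small sketch of empowerment in its classic information-theoretic form (the channel capacity between an agent's action sequences and its resulting states). It assumes a toy deterministic grid world, where that capacity reduces to the log of the number of distinct states reachable within the horizon. The grid, the action set, and the function names are illustrative assumptions, and the thesis linked above may define empowerment and valence differently.

```python
import math
from itertools import product

# Illustrative action set for a toy grid world (an assumption, not from the thesis).
ACTIONS = {
    "up": (0, -1),
    "down": (0, 1),
    "left": (-1, 0),
    "right": (1, 0),
    "stay": (0, 0),
}


def step(state, action, size):
    """Deterministic move on a size x size grid; moving into a wall leaves the state unchanged."""
    x, y = state
    dx, dy = ACTIONS[action]
    nx, ny = x + dx, y + dy
    if 0 <= nx < size and 0 <= ny < size:
        return (nx, ny)
    return state


def empowerment(state, horizon, size):
    """n-step empowerment in bits, valid for deterministic dynamics only.

    With deterministic transitions, the channel capacity from action
    sequences to end states equals log2 of the number of distinct end states.
    """
    reachable = set()
    for sequence in product(ACTIONS, repeat=horizon):
        s = state
        for action in sequence:
            s = step(s, action, size)
        reachable.add(s)
    return math.log2(len(reachable))


corner = empowerment((0, 0), horizon=1, size=5)
center = empowerment((2, 2), horizon=1, size=5)
print(f"corner: {corner:.3f} bits, center: {center:.3f} bits")
```

The center of the grid scores higher than the corner because more distinct outcomes are under the agent's control there. No reward signal is involved, which is what makes the comparison with a curiosity objective interesting: curiosity favours states where the model is uncertain, while empowerment favours states where the agent has many reliably distinguishable options.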