Structured representation with deep learning

Hello, I recently read this paper: Latent Particle World Models: Self-supervised Object-centric Stochastic Dynamics Modeling.

It shares some ideas with Monty, mainly the use of keypoint proposals to learn a structured representation of objects. Here the representation is a set of particles, each with a 2D position, a bounding-box size, a depth value, etc. The depth is not real 3D, since the model learns from single-camera video; it is more like a relative ordering of the objects in the scene.
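
To make the particle idea a bit more concrete, here is a rough sketch of how I picture one particle as a data structure. This is only my own illustration, not code from the paper: the class name, the field names, and the convention that a smaller depth value means "closer to the camera" are all assumptions on my part.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Particle:
    """Hypothetical container for one object-centric particle (names are illustrative)."""
    x: float       # 2D position in the image, horizontal
    y: float       # 2D position in the image, vertical
    width: float   # bounding-box width
    height: float  # bounding-box height
    depth: float   # relative ordering only, not a metric 3D distance


def front_to_back(particles: List[Particle]) -> List[Particle]:
    """Order particles by relative depth.

    Assumption: a smaller depth value means closer to the camera.
    """
    return sorted(particles, key=lambda p: p.depth)


if __name__ == "__main__":
    scene = [
        Particle(x=0.2, y=0.3, width=0.1, height=0.1, depth=0.8),
        Particle(x=0.5, y=0.5, width=0.2, height=0.2, depth=0.1),
    ]
    for p in front_to_back(scene):
        print(p)
```

The point of the sketch is that depth is only used to compare particles against each other, which matches the "relative order" reading rather than a true 3D position.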

It relies on distinguishing background from foreground, requiring either little camera movement or many episodes of the same scene.

Just wanted to share it as I found it interesting.
