Surface + multiple learning modules

Tech_LingQi · February 7, 2026, 9:39am

Hi,

I wonder whether there is any experiment that uses the surface view finder mount together with multiple learning modules.

From the codebase it looks like:

The surface view finder setup is always single-patch + view finder (e.g. surf_agent_2obj_train)

Multi-LM configs use the dist naive policy, not the surface moving policy (forward->hori->vert cycle)

Is there an existing config that does this and I missed it? Or are there deeper reasons we are not doing this right now?

Thanks for any reply.

nleadholm · February 9, 2026, 8:44am

Hi @Tech_LingQi, good question. There is no existing config for this; however, in principle it should be possible to set this up. If you do, I’d be very curious to see any visualizations or results you make (tbp.plot might be useful).

The main thing I can think of that might be an issue is that the surface agent pivots around the patch, trying to keep it at a certain orientation and distance to the surface. If there were multiple SM/LMs, it is currently hard-coded that the SM with patch ID 0 will determine the re-orientation of the agent. In theory there is a chance that the other SMs would be off the object, or clipping into the object, but in practice I think this is low risk if they are reasonably close together. It also isn’t the end of the world if some SMs are occasionally off the object (this also happens with the distant agent when using multiple patches).
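To make the re-orientation point concrete, here is a minimal sketch, not Monty's actual code. It assumes the agent only corrects its orientation from the surface normal sensed by patch 0, and that a patch counts as "on the object" when the depth at its centre is below a threshold. The function names, the `max_depth` threshold, and the example readings are all hypothetical.

```python
import math


def angle_to_face_surface(forward, normal):
    """Angle (radians) between the viewing direction and the inward surface normal.

    A result of 0 means the sensor already looks straight at the surface.
    """
    inward = [-c for c in normal]
    dot = sum(f * n for f, n in zip(forward, inward))
    length = math.sqrt(sum(f * f for f in forward)) * math.sqrt(
        sum(n * n for n in inward)
    )
    cos_angle = max(-1.0, min(1.0, dot / length))  # guard against rounding
    return math.acos(cos_angle)


def patches_on_object(center_depths, max_depth):
    """IDs of patches whose centre depth reading is within max_depth."""
    return [i for i, d in enumerate(center_depths) if d <= max_depth]


# Hypothetical readings for four patches that are fixed relative to each other.
forward = (0.0, 0.0, -1.0)          # direction the agent is looking
normal_patch_0 = (0.0, 0.0, 1.0)    # surface normal sensed by patch 0 only
center_depths = [0.05, 0.06, 0.90, 0.05]

# Only patch 0 drives the correction; the others are just checked afterwards.
correction = angle_to_face_surface(forward, normal_patch_0)
on_object = patches_on_object(center_depths, max_depth=0.2)
print(f"correction: {correction:.3f} rad, on-object patches: {on_object}")
```

In this toy case patch 2 reads a much larger depth than the others, which is the "off the object" situation described above; the re-orientation itself is unaffected because it only looks at patch 0.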

Hope that helps and as said if you do run some experiments it would be interesting to see how it works!

Tech_LingQi · February 9, 2026, 12:03pm

Thank you for your reply.

This is exactly what I’ve been wondering. I’ve grasped the core concepts of the single-SM algorithm but haven’t tried the multi-SM code. If I had 5 SMs, which SM should be used for the align-camera-z-axis-to-surface-normal process, or should I go through the process for each SM in turn?

As you’ve said, if we use the first SM, the other SMs may run into unexpected situations. Also, I wouldn’t assume they are close to each other, because it’s reasonable to imagine thousands of SMs covering a field of view even larger than the object. So I think maybe we need an algorithm that can handle multiple SMs in a more general way?

Any suggestions? Thanks in advance

nleadholm · February 9, 2026, 1:13pm

No worries. Re. the nearness of the SMs, it’s important to note that in Monty, an “agent” is any component that can move in a semi-independent manner from other agents. Thus, two different finger tips are both agents, but the different patches of skin on a single finger tip are not different agents, because they are fixed relative to one another. Having support for multiple agents that move independently is indeed on the roadmap for Monty, but there are a lot of pieces that need to come together for that. In the meantime, I would suggest putting the SMs close to one another (potentially even overlapping in some areas, but with different zoom-factors/receptive field sizes). You can see an artistic example of this below. This would be a good simulation of a finger tip with multiple patches of skin, which is something we haven’t tried yet.

If you were to do the above, then re. your first question, you would just need to configure a multi-SM/LM experiment with the surface agent, and visualize what happens. The existing code will make use of the first SM to re-orient, which should be sufficient for this setup where the SMs are near each other.
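As a rough illustration of "close together, possibly overlapping, with different zoom factors", the sketch below estimates how wide each patch's footprint is on a flat surface and whether two footprints overlap. It is not a Monty config: `PatchSpec`, the 90 degree base field of view, the parallel viewing axes, and the pinhole-style assumption that zoom divides the footprint width are all assumptions made for the example.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class PatchSpec:
    """Hypothetical description of one patch rigidly attached to the agent."""

    patch_id: int
    offset_x: float  # fixed lateral offset from the agent, in metres
    zoom: float      # larger zoom means a smaller receptive field


def footprint_width(spec, distance, base_fov_deg=90.0):
    """Width of the surface area seen by a patch at the given distance."""
    half_angle = math.radians(base_fov_deg) / 2
    return 2 * distance * math.tan(half_angle) / spec.zoom


def footprints_overlap(a, b, distance):
    """True if the two footprints share some area along the x axis."""
    gap = abs(a.offset_x - b.offset_x)
    return gap < (footprint_width(a, distance) + footprint_width(b, distance)) / 2


patches = [
    PatchSpec(patch_id=0, offset_x=0.0, zoom=10.0),
    PatchSpec(patch_id=1, offset_x=0.005, zoom=5.0),   # wider, overlaps patch 0
    PatchSpec(patch_id=2, offset_x=0.05, zoom=10.0),   # far enough to be separate
]

distance_to_surface = 0.025  # metres, an arbitrary example value
for p in patches[1:]:
    overlap = footprints_overlap(patches[0], p, distance_to_surface)
    print(f"patch 0 vs patch {p.patch_id}: overlap={overlap}")
```

A quick check like this can help pick offsets that keep all patches near patch 0, since patch 0 is the one that determines the re-orientation.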

Tech_LingQi · February 9, 2026, 1:52pm

Thanks, I now understand the difference between agent and patch, and that multiple-agent support is on the way.

Actually, because I am doing robotics, what I have in mind is:
I have a camera on a robot arm. The camera is the agent. The whole image captured by the camera is the observation of the view_finder (a special patch, I guess). I can define as many small patches as I want, say, a thousand for a 512*512 image, all attached to the agent. Moving the robot arm equals moving the agent and all the patches attached to it.

If the above understanding is correct, then what I want to experiment with (either in Isaac Simulator or, later, in reality) is: by moving the arm around an object a few times or even just once, Monty can learn a very detailed model. Also, since we have so many patches, Monty can recognize the object in just a couple of movements.

So can you give me some suggestions on how to conduct the experiment with the current version of Monty? If I can do it successfully, it would be useful in some real-world robot tasks.

Thanks again.

nleadholm · February 9, 2026, 3:46pm

Ok great, that’s useful context. In that case, I would focus on the distant agent with voting, more than the surface agent. Even if your camera is attached to a robotic arm, as long as it maintains a reasonable distance from the object, then it is more analogous to the distant agent.

For a first experiment, you might want to see the Monty Meets World hackathon we did. In that, we take a single, large RGB-Depth image, and have the agent move “within” that image. It could be a good way of testing your setup before your robotic arm actually starts moving. @Zachary_Danzig has been working on something similar as part of a research project he’s doing, so you may also be interested in the discussion in this thread.

Note that once you start setting up voting and having the robotic arm move, I would suggest that you define the multiple SMs as part of a camera pre-processing stage, rather than trying to use a “view-finder” as a single sensor module. I think that is what you were planning on doing anyways, but I just thought I’d clarify that for voting to work, you should define multiple SMs, rather than trying to route a single (large) SM to multiple LMs.
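A pre-processing stage of this kind could be sketched as follows: split the camera image into a regular grid and treat each tile as the input of one SM. This is only an illustration under stated assumptions, not Monty's API; `tile_bounds` and `extract_patch` are hypothetical helpers, and the 32 x 32 grid is just one way to get roughly a thousand patches from a 512 x 512 image.

```python
def tile_bounds(height, width, rows, cols):
    """Return (patch_id, row_start, row_end, col_start, col_end) for each tile."""
    if height % rows or width % cols:
        raise ValueError("image size must be divisible by the grid size")
    patch_h, patch_w = height // rows, width // cols
    return [
        (r * cols + c, r * patch_h, (r + 1) * patch_h, c * patch_w, (c + 1) * patch_w)
        for r in range(rows)
        for c in range(cols)
    ]


def extract_patch(image, bounds):
    """Cut one tile out of an image stored as a list of rows."""
    _, r0, r1, c0, c1 = bounds
    return [row[c0:c1] for row in image[r0:r1]]


# Placeholder 512 x 512 depth image; a real one would come from the camera.
depth_image = [[float(r * 512 + c) for c in range(512)] for r in range(512)]

tiles = tile_bounds(512, 512, rows=32, cols=32)
first_patch = extract_patch(depth_image, tiles[0])

print(len(tiles), "patches of size", len(first_patch), "x", len(first_patch[0]))
```

Each tile would then be handled by its own SM and paired with its own LM so that the LMs can vote, rather than routing the whole image through a single SM.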

Tech_LingQi · February 10, 2026, 2:43am

I see, I will check the multiple-SM code and see what I can do in my context. Also thanks for the reference:

If the experiment goes well, I will share some video here. Thank you for your time!

nleadholm · February 10, 2026, 8:00am

Sounds great and looking forward to it!