Ok great, that’s useful context. In that case, I would focus on the distant agent with voting rather than the surface agent. Even if your camera is attached to a robotic arm, as long as it keeps a reasonable distance from the object, the setup is more analogous to the distant agent.
For a first experiment, you might want to look at the Monty Meets World hackathon we did. There, we take a single, large RGB-Depth image and have the agent move “within” that image. It could be a good way of testing your setup before your robotic arm actually starts moving. @Zachary_Danzig has been working on something similar as part of a research project he’s doing, so you may also be interested in the discussion in this thread.
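To make the "moving within an image" idea concrete, here is a rough sketch in plain Python. It is not the hackathon code and does not use Monty's API: `PatchAgent`, `move`, and `observe` are hypothetical names, and the RGB-D image is assumed to be a nested list of `(r, g, b, depth)` tuples. The agent's "position" is just a pixel location, and each observation is a small patch cut out around it.

```python
from dataclasses import dataclass

# Assumed stand-in for an RGB-D image: rows of (r, g, b, depth) tuples.
Pixel = tuple[int, int, int, float]
Image = list[list[Pixel]]


@dataclass
class PatchAgent:
    """Hypothetical agent whose position is a pixel location in a fixed image."""

    row: int
    col: int
    patch_size: int  # side length of the square patch, in pixels

    def move(self, d_row: int, d_col: int, image: Image) -> None:
        """Shift the patch center, clamping so the patch stays inside the image."""
        half = self.patch_size // 2
        max_row = len(image) - (self.patch_size - half)
        max_col = len(image[0]) - (self.patch_size - half)
        self.row = min(max(self.row + d_row, half), max_row)
        self.col = min(max(self.col + d_col, half), max_col)

    def observe(self, image: Image) -> Image:
        """Return the patch_size x patch_size patch around the current position."""
        half = self.patch_size // 2
        top, left = self.row - half, self.col - half
        return [
            image_row[left:left + self.patch_size]
            for image_row in image[top:top + self.patch_size]
        ]


# Toy 10 x 12 image where each pixel encodes its own coordinates.
image = [[(r, c, 0, float(r + c)) for c in range(12)] for r in range(10)]
agent = PatchAgent(row=1, col=1, patch_size=3)
first_patch = agent.observe(image)
agent.move(2, 4, image)  # "move" the sensor without moving any hardware
second_patch = agent.observe(image)
```

Because nothing physical moves, you can check patch extraction and the movement logic on a single captured frame before involving the arm.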
Note that once you start setting up voting and having the robotic arm move, I would suggest defining the multiple SMs as part of a camera pre-processing stage, rather than using a “view-finder” as a single sensor module. I think that is what you were planning on doing anyway, but just to clarify: for voting to work, you should define multiple SMs, rather than trying to route a single (large) SM to multiple LMs.
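As a rough illustration of that pre-processing stage, the sketch below cuts one camera frame into several small patches, one per SM. It is plain Python rather than Monty's API: `split_into_sensor_patches`, the `sm_*` ids, and the pixel offsets are all hypothetical, and the one-to-one SM-to-LM pairing at the end is my assumption about how you would wire things up for voting.

```python
def split_into_sensor_patches(frame, patch_size, offsets):
    """Cut one camera frame into small patches, one per sensor module.

    frame:      rows of pixels (any per-pixel value, e.g. RGB-D tuples)
    patch_size: side length of each square patch, in pixels
    offsets:    {sm_id: (d_row, d_col)} patch centers relative to the frame center
    """
    height, width = len(frame), len(frame[0])
    center_row, center_col = height // 2, width // 2
    half = patch_size // 2

    patches = {}
    for sm_id, (d_row, d_col) in offsets.items():
        top = center_row + d_row - half
        left = center_col + d_col - half
        # Refuse patches that would fall outside the frame instead of truncating.
        if top < 0 or left < 0 or top + patch_size > height or left + patch_size > width:
            raise ValueError(f"patch for {sm_id} falls outside the frame")
        patches[sm_id] = [row[left:left + patch_size] for row in frame[top:top + patch_size]]
    return patches


# Toy 9 x 9 frame where each pixel stores its own (row, col) coordinates.
frame = [[(r, c) for c in range(9)] for r in range(9)]

# Hypothetical layout: three small patches at fixed offsets from the frame center.
offsets = {"sm_0": (0, 0), "sm_1": (0, 3), "sm_2": (3, 0)}
patches = split_into_sensor_patches(frame, patch_size=3, offsets=offsets)

# Assumed wiring: each SM's patch goes to its own LM, and the LMs then vote.
sm_to_lm = {sm_id: f"lm_{i}" for i, sm_id in enumerate(patches)}
```

The point of the design is that each patch sees a different part of the object at a known offset, so each LM receives its own small observation instead of all LMs sharing one large one.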