Using MCP as the cognitive layer in a robotics stack

Rich_Morin · December 25, 2025, 8:54pm

I’ve been speculating about modules (i.e., support services) that might be useful for Monty in a robotic context. Some of these will be found in (or adapted from) Elixir or MCP archive offerings; others will need to be created from scratch. Comments and suggestions welcome…

Modules

Image Grabber

Monty will use a digital camera for image acquisition. The Image Grabber can collect either single images or timed sequences (e.g., video).

The output will be a set of rectangular pixel arrays, one per image plane, e.g., RGBD (red, green, blue, depth), infrared, ultraviolet, …
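To make the "one array per plane" idea concrete, here is a minimal sketch of what a grabbed frame might look like. The `Frame` class, its field names, and the plane names are all hypothetical, not part of Monty or of any existing library; pixel arrays are plain nested lists to keep the example dependency-free.

```python
from dataclasses import dataclass, field


@dataclass
class Frame:
    """One captured image: a timestamp plus named, same-sized pixel planes."""

    timestamp_ms: int
    width: int
    height: int
    planes: dict = field(default_factory=dict)

    def add_plane(self, name, pixels):
        # Every plane must be a height x width rectangle.
        if len(pixels) != self.height or any(len(row) != self.width for row in pixels):
            raise ValueError(f"plane {name!r} is not {self.height}x{self.width}")
        self.planes[name] = pixels


# A tiny 2x3 frame with a red plane and a depth plane (made-up values).
frame = Frame(timestamp_ms=0, width=3, height=2)
frame.add_plane("red", [[10, 20, 30], [40, 50, 60]])
frame.add_plane("depth", [[1.0, 1.1, 1.2], [1.3, 1.4, 1.5]])
```

A timed sequence would then simply be a list (or stream) of such frames, ordered by timestamp.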

Temporal Manager

This serves as an indexable queue (FIFO) and/or time-series database, allowing Monty to request images taken at specified times. One use for this is to emulate cortical transmission delays, but it can also remove glitches caused by asynchronous message handling.
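As a rough illustration of the "indexable FIFO" idea, the sketch below keeps a bounded queue of timestamped frames and looks them up by time. The `TemporalBuffer` name, its methods, and the use of integer millisecond timestamps are assumptions made for the example; a real version might sit on top of a time-series database instead.

```python
from bisect import bisect_right
from collections import deque


class TemporalBuffer:
    """Bounded FIFO of (timestamp_ms, frame) pairs, queryable by time."""

    def __init__(self, capacity=100):
        # deque(maxlen=...) drops the oldest entry automatically when full.
        self._entries = deque(maxlen=capacity)

    def push(self, timestamp_ms, frame):
        # Assumes frames arrive in non-decreasing timestamp order.
        if self._entries and timestamp_ms < self._entries[-1][0]:
            raise ValueError("timestamps must be non-decreasing")
        self._entries.append((timestamp_ms, frame))

    def at(self, timestamp_ms):
        """Return the newest frame captured at or before timestamp_ms, else None."""
        times = [t for t, _ in self._entries]
        i = bisect_right(times, timestamp_ms)
        return self._entries[i - 1][1] if i > 0 else None

    def delayed(self, now_ms, delay_ms):
        """Return the frame as it stood delay_ms ago (e.g., an emulated delay)."""
        return self.at(now_ms - delay_ms)


buf = TemporalBuffer(capacity=3)
for t in (0, 100, 200, 300):
    buf.push(t, f"frame@{t}")  # the frame at t=0 is evicted by the fourth push
```

Here `buf.delayed(300, 100)` returns the frame stamped 200 ms, which is how a fixed transmission delay could be emulated; asking for a time before the oldest retained frame returns `None`.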

Patch Grabber

Monty isn’t prepared to handle large arrays of pixels, so the Patch Grabber will retrieve small patches of pixels at designated locations from (much larger) pixel arrays.
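A minimal sketch of patch extraction, assuming the image is a list of pixel rows. The function name `grab_patch` and the border policy (shifting the window inward so the patch keeps its size) are choices made for this example; padding or rejecting out-of-range requests would be equally reasonable.

```python
def grab_patch(image, center_row, center_col, size):
    """Return a size x size patch centered near (center_row, center_col).

    The window is shifted inward where it would cross an image border,
    so the returned patch always has the requested shape.
    """
    height, width = len(image), len(image[0])
    if size > height or size > width:
        raise ValueError("patch is larger than the image")
    half = size // 2
    top = min(max(center_row - half, 0), height - size)
    left = min(max(center_col - half, 0), width - size)
    return [row[left:left + size] for row in image[top:top + size]]


# A 5x6 test image whose pixel value encodes its position (row * 10 + col).
image = [[r * 10 + c for c in range(6)] for r in range(5)]
center_patch = grab_patch(image, 2, 3, 3)
corner_patch = grab_patch(image, 0, 0, 3)
```

With this test image, the patch around (2, 3) covers rows 1 to 3 and columns 2 to 4, while the request at the top-left corner is shifted so that it starts at (0, 0).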

Sensor Module

This is a coordination and interface module, allowing Monty’s Learning Modules to access desired data in their preferred manner (e.g., CMP).

Transform Manager

There are all sorts of data transformations that could make the input images more usable by Monty. These include Fourier transforms, limiting, log scaling, smoothing, etc. The Transform Manager can construct and manage a data transformation pipeline for any needed processing.
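One way such a pipeline could be structured is as an ordered list of named functions, each taking and returning pixel data. The sketch below works on a single row of pixel values to stay short; `TransformPipeline`, `limit`, `log_scale`, and `smooth` are hypothetical names, and a Fourier step would slot in the same way.

```python
import math


class TransformPipeline:
    """Ordered list of named transforms applied one after another."""

    def __init__(self):
        self._steps = []

    def add(self, name, fn):
        self._steps.append((name, fn))
        return self  # allow chained .add(...) calls

    def names(self):
        return [name for name, _ in self._steps]

    def run(self, pixels):
        for _, fn in self._steps:
            pixels = fn(pixels)
        return pixels


def limit(lo, hi):
    """Build a transform that clamps every value into [lo, hi]."""
    return lambda pixels: [min(max(p, lo), hi) for p in pixels]


def log_scale(pixels):
    """Compress dynamic range with log(1 + p); expects non-negative input."""
    return [math.log1p(p) for p in pixels]


def smooth(pixels):
    """3-tap moving average; the window shrinks to 2 taps at the edges."""
    out = []
    for i in range(len(pixels)):
        window = pixels[max(i - 1, 0):i + 2]
        out.append(sum(window) / len(window))
    return out


pipeline = TransformPipeline().add("limit", limit(0, 255)).add("smooth", smooth)
result = pipeline.run([-5, 0, 300, 3])
```

Because each step is just a callable, the manager can reorder, insert, or drop transforms at run time without the Patch Grabber or Sensor Module needing to know.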

Discussion

Back in "Mermaid musings: simple graphs of actors", I presented this diagram:

graph LR;
  EP_MM["Eye Position<br>Motor Module"];
  LE_SH["Left Eye<br>Sensor Hardware"];
  LE_SM["Left Eye<br>Sensor Module"];
  RE_SH["Right Eye<br>Sensor Hardware"];
  RE_SM["Right Eye<br>Sensor Module"];
  BV_LM["Binocular Vision<br>Learning Modules"];

  LE_SH <-- Raw --> LE_SM;
  RE_SH <-- Raw --> RE_SM;
  
  LE_SM <-- CMP --> BV_LM;
  RE_SM <-- CMP --> BV_LM;
  
  BV_LM <-- CMP --> EP_MM;

Let’s decompose an Eye Sensor Module, using the modules described above:

graph LR;
  E1a["<br>"];
  E1b["<br>"];
  IG["Image<br>Grabber"];
  PG["Patch<br>Grabber"];
  SH["Sensor<br>Hardware"];
  SM["Sensor<br>Module"];
  TM1["Temporal<br>Manager"];
  TM2["Transform<br>Manager"];

  SH --> IG --> TM1 --> E1a;
  E1b --> TM2 --> PG --> SM;