@xavier from ETH Zurich presents his master’s thesis, exploring how Thousand Brains systems could scale on modern hardware. His research examines how scaling the number of learning modules affects computing performance on GPUs, CPUs, and processing-in-memory (PiM) architectures. GPUs aren’t a great fit because autoregressive algorithms like Monty have low operational intensity, meaning they perform few arithmetic operations per byte of data moved. CPUs can scale reasonably well but require more computation time as the number of modules increases. PiM chips offer a promising alternative by placing computation near memory, which enables large-scale parallelism. Xavier shows results from scaling to 2,500 learning modules, representing millions of neurons and billions of synapses.
0:00 Introduction
0:49 An Overview of Monty’s Structure
2:12 Motivation: Rapid, Continuous, and Compute Efficient Learning
2:51 Scaling Cortical Columns
3:48 Scaling an Algorithm with an Auto-regressive Loop
4:39 Investigate the Scalability of Thousand Brains Systems
5:41 Montyll – A Novel Thousand Brains System
6:25 Why HTM Networks?
8:03 Scaling on GPUs
9:56 Scaling on GPUs: Operational Intensity
11:54 Scaling on CPUs
13:07 Can Multicore CPUs Handle the Amount of Data Movement?
15:03 Scaling on Processing-in-Memory (PiM) Chips
15:47 Scaling on PiMs: DRAM Banks
24:14 Scaling in Data Centers
25:13 The Montyll Implementation
30:20 Cat Cortex Scale System (2500 Learning Modules)
32:33 Why Is Logic Frequency Low?
33:58 Processing-in-Memory Chip Illustration
38:20 WRAM
39:26 MRAM
40:41 Connection Transfer
41:12 Tasklet Level Parallelism
42:08 Barriers and Synchronization
43:36 The Results
45:35 Results: Time per Step
56:40 Neurons and Synapses vs Devices



