“Do generative video models understand physical principles?”
Perhaps one could attempt a Monty version of the benchmark at some point.