Mozes Jacobs

I am a third-year computer science PhD candidate at Harvard University, advised by Professor Demba Ba. I am supported by the Kempner Institute Graduate Fellowship.

Representation Learning  /  Interpretability  /  Foundation Models

I'm interested in the computational principles of deep learning: what neural networks learn and why they learn it. My research sits at the intersection of representation learning and interpretability. I seek both mechanistic understanding of how networks compute and normative understanding of why those computations emerge, and aim to leverage these insights to improve neural network efficiency, generalization, and safety.

Previously, I worked at the AI Institute in Dynamic Systems with Nathan Kutz and Ryan Raut. I earned my B.S. in computer science from the Allen School at the University of Washington, where I worked with Rajesh Rao and William Noble.

Email  /  CV  /  Google Scholar  /  Twitter  /  LinkedIn

Selected Publications

* denotes equal contribution

ICLR 2026
Block-Recurrent Dynamics in ViTs
Mozes Jacobs*, Thomas Fel*, Richard Hakim*, Alessandra Brondetta, Demba Ba, T. Andy Keller

We introduce the Block-Recurrent Hypothesis (BRH), which holds that trained ViTs admit a block-recurrent depth structure. To test this, we train recurrent surrogates called Raptor, and show that a Raptor model recovers 96% of DINOv2's ImageNet-1k linear-probe accuracy with only two blocks at equivalent runtime. We then leverage the hypothesis to perform dynamical interpretability, revealing directional convergence into class-dependent basins, token-specific trajectory dynamics, and low-rank attractor structure in late layers.

CCN 2025 (Oral)
Traveling Waves Integrate Spatial Information Through Time
Mozes Jacobs, Robert C. Budzinski, Lyle Muller, Demba Ba, T. Andy Keller
blog / talk

We investigate how traveling waves of neural activity enable spatial information integration in convolutional recurrent networks. Our models learn to generate traveling waves in response to visual stimuli, effectively expanding the receptive fields of locally connected neurons. This mechanism substantially outperforms local feed-forward networks on semantic segmentation tasks that require global spatial context, matching the performance of non-local U-Nets while using far fewer parameters.

Other Work
Traveling Waves Integrate Spatial Information Into Spectral Representations
Mozes Jacobs, Robert C. Budzinski, Lyle Muller, Demba Ba, T. Andy Keller
ICLR 2025 Re-Align Workshop
HyperSINDY: Deep Generative Modeling of Nonlinear Stochastic Governing Equations
Mozes Jacobs, Bingni W. Brunton, Steven L. Brunton, J. Nathan Kutz, Ryan V. Raut
arXiv, 2023
Gradient Origin Predictive Coding
Mozes Jacobs, Linxing Preston Jiang, Rajesh N.P. Rao
Undergraduate senior thesis, 2022

Website template from Jon Barron