Life, Research, People, and Ideas
A Unified Theory of Sparse Dictionary Learning in Mechanistic Interpretability: Piecewise Biconvexity and Spurious Minima
The first unified theoretical framework for Sparse Dictionary Learning in mechanistic interpretability, explaining why SDL methods suffer from dead neurons, polysemanticity, and feature absorption—and providing a principled fix grounded in piecewise biconvexity and non-identifiability.
Modality Gap–Driven Subspace Alignment Training Paradigm for Multimodal Large Language Models
A walkthrough of a paper that goes beyond the isotropic assumption to formally decompose the modality gap in CLIP-style models into a stable bias and anisotropic residuals. The paper introduces ReAlign, a closed-form alignment strategy, and ReVision, a training pipeline that enables text-only pretraining for MLLMs.
BrainExplore: Large-Scale Discovery of Interpretable Visual Representations in the Human Brain
An automated framework that discovers thousands of interpretable visual concepts encoded across the human visual cortex—from "hands holding objects" to "forest scenes"—revealing fine-grained brain representations previously unreported.
The Evolution of Multimodal Model Architectures: A Taxonomy
A comprehensive walkthrough of Wadekar et al.'s taxonomy of multimodal architectures, covering the four distinct architectural patterns—Type-A through Type-D—that define how models like GPT-4V, LLaVA, and Gemini combine vision, language, and other modalities.
The Linear Representation Hypothesis and the Geometry of Large Language Models
A walkthrough of Park et al.'s ICML 2024 paper on the linear representation hypothesis. We explore how high-level concepts are represented as linear directions in a language model's representation space, with formal definitions and theoretical foundations.