Summary
State Space Models (SSMs), popularized for language modeling by the Mamba architecture, are an alternative to Transformers for sequence modeling. They replace the attention mechanism with learned state-space dynamics that compress the sequence seen so far into a fixed-size hidden state. Inference time is therefore linear in sequence length, and the architecture places no hard limit on context length, although how much of a long context is actually retained is bounded by the capacity of the state.
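To make the idea concrete, a discretized linear SSM is commonly written as the recurrence below. The symbols are notation assumed here for illustration, not taken from any specific model: x_t is the input at step t, h_t the fixed-size hidden state, y_t the output, and \bar{A}, \bar{B}, C the learned parameters.

```latex
\begin{aligned}
h_t &= \bar{A}\, h_{t-1} + \bar{B}\, x_t, \qquad h_0 = 0, \\
y_t &= C\, h_t .
\end{aligned}
```

Each step touches only h_{t-1} and x_t, so the cost per token does not depend on how many tokens came before. Summed over n tokens, this gives the linear total cost.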
Key Characteristics
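- Linear-Time Inference: Processes a sequence of length n in O(n) time, rather than the O(n^2) of self-attention, which makes very long context windows practical
- Recurrent Formulation: Runs as a recurrent neural network at inference time, so each generated token needs only the fixed-size state rather than a cache that grows with the sequence
- Selection Mechanism: Selective SSMs such as Mamba make their state-space parameters depend on the input, so the model learns to propagate or ignore information based on content
- Hardware-Aware Design: Computes the recurrence with a parallel scan algorithm implemented with the GPU memory hierarchy in mind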
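The characteristics above can be illustrated with a toy sketch in plain Python. This is not Mamba's implementation: it assumes a single scalar input channel, a diagonal state matrix, and a deliberately simplified selection rule. The names `selective_params`, `w_delta`, and `w_B` are hypothetical. It shows a recurrent pass with a fixed-size state, and the same result computed with an associative scan, which is the structure that parallel hardware can exploit.

```python
import math


def softplus(z):
    """Smooth positive function, used here to keep the step size > 0."""
    return math.log1p(math.exp(z))


def selective_params(x, A, w_delta, w_B):
    """Map one input value to per-step coefficients (a_t, b_t).

    Assumed simplification: the step size and the input projection both
    depend on x, which is the 'selection' idea in miniature.
    """
    delta = softplus(w_delta * x)
    a_t = [math.exp(delta * A_i) for A_i in A]        # decay in (0, 1) when A_i < 0
    b_t = [delta * (wB_i * x) * x for wB_i in w_B]    # input-dependent B, times x
    return a_t, b_t


def run_recurrent(xs, A, w_delta, w_B, C):
    """Sequential pass: O(n) time, state size independent of len(xs)."""
    h = [0.0] * len(A)
    ys = []
    for x in xs:
        a_t, b_t = selective_params(x, A, w_delta, w_B)
        h = [a * h_i + b for a, h_i, b in zip(a_t, h, b_t)]
        ys.append(sum(c * h_i for c, h_i in zip(C, h)))
    return ys


def combine(left, right):
    """Compose two maps h -> a*h + b (left applied first). Associative."""
    (a1, b1), (a2, b2) = left, right
    return (a1 * a2, a2 * b1 + b2)


def inclusive_scan(items, op):
    """Recursive-doubling inclusive scan, executed sequentially here.

    Each of the roughly log2(n) rounds could update all positions in parallel.
    """
    out = list(items)
    shift = 1
    while shift < len(out):
        out = [out[i] if i < shift else op(out[i - shift], out[i])
               for i in range(len(out))]
        shift *= 2
    return out


def run_scan(xs, A, w_delta, w_B, C):
    """Same outputs as run_recurrent, computed through the scan operator."""
    params = [selective_params(x, A, w_delta, w_B) for x in xs]
    states = []
    for i in range(len(A)):  # one independent scalar scan per state dimension
        pairs = [(a_t[i], b_t[i]) for a_t, b_t in params]
        states.append([b for _, b in inclusive_scan(pairs, combine)])
    return [sum(C[i] * states[i][t] for i in range(len(A)))
            for t in range(len(xs))]


# Usage: both formulations agree on a small example.
A, w_B, C = [-1.0, -0.5], [1.0, -2.0], [0.5, 1.5]
xs = [0.3, -1.2, 0.0, 2.0, 0.7]
ys_rec = run_recurrent(xs, A, 0.8, w_B, C)
ys_scan = run_scan(xs, A, 0.8, w_B, C)
assert all(math.isclose(p, q, rel_tol=1e-9, abs_tol=1e-12)
           for p, q in zip(ys_rec, ys_scan))
```

The scan works because each step is an affine map of the state, and composing affine maps is associative. Associativity lets the prefix compositions be grouped in any order, which a GPU kernel can use to process many positions at once.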
Popular Models
- Mamba: The pioneering selective SSM, reported to match similarly sized Transformers on language modeling while keeping linear-time inference
- Mamba-2: Simplified architecture built on the state-space duality (SSD) framework, which relates SSMs to a form of attention, for improved throughput
- Jamba: Hybrid architecture combining Mamba layers with Transformer attention layers
- S4 (Structured State Space Sequence Model): Foundational SSM, predating Mamba, that introduced the structured state-space parameterization