Reasoning Models & Test-Time Compute

Architecture Patterns

Summary

Reasoning models and test-time compute (also called inference scaling) describe a paradigm in which a model spends additional computation during inference to improve output quality. Rather than committing to the first answer it decodes, the model explores multiple reasoning paths, verifies its own output, or iteratively refines a response before producing a final answer.
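One simple way to "explore multiple reasoning paths" is self-consistency: sample several independent answers and keep the most common one. The sketch below is illustrative only. `self_consistency` and `noisy_solver` are hypothetical names, and `noisy_solver` is a random stand-in for a real model call, assumed here to be right about 60% of the time.

```python
import random
from collections import Counter
from typing import Callable


def self_consistency(
    sample_answer: Callable[[random.Random], str],
    n_samples: int,
    seed: int = 0,
) -> tuple[str, float]:
    """Sample n_samples answers and return the majority answer with its vote share."""
    rng = random.Random(seed)
    answers = [sample_answer(rng) for _ in range(n_samples)]
    winner, votes = Counter(answers).most_common(1)[0]
    return winner, votes / n_samples


def noisy_solver(rng: random.Random) -> str:
    # Hypothetical stand-in for one sampled reasoning path (assumption, not a real model):
    # it returns the correct answer "42" about 60% of the time, otherwise a wrong one.
    if rng.random() < 0.6:
        return "42"
    return rng.choice(["41", "43", "44"])


if __name__ == "__main__":
    # More samples cost more inference compute but make the vote more reliable.
    for n in (1, 5, 201):
        print(n, self_consistency(noisy_solver, n_samples=n))
```

Each extra sample is another full generation, so the cost grows linearly with the number of samples while the reliability of the majority vote improves.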

Key Characteristics

  • Inference-Time Scaling: Performance improves with more compute allocated at inference time, not just training time
  • Internal Reasoning: Model engages in self-directed reasoning steps before producing the final output
  • Verification Loops: Model checks its own work and corrects errors before finalizing
  • Search Over Outputs: Multiple candidate outputs are generated and the best one is selected
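The last two characteristics can be sketched on a toy problem. This is a minimal illustration, not how any particular model works: the function names, the integer task, and the scoring rule are all assumptions made for the example. `best_of_n` picks the highest-scoring candidate, and `verify_and_revise` keeps correcting a draft until a check passes or the round budget is spent.

```python
from typing import Callable, Iterable, TypeVar

T = TypeVar("T")


def best_of_n(candidates: Iterable[T], score: Callable[[T], float]) -> T:
    """Search over outputs: keep the candidate the scorer rates highest."""
    return max(candidates, key=score)


def verify_and_revise(
    draft: T,
    is_valid: Callable[[T], bool],
    revise: Callable[[T], T],
    max_rounds: int,
) -> tuple[T, int]:
    """Verification loop: revise until the check passes or the budget runs out."""
    rounds = 0
    while not is_valid(draft) and rounds < max_rounds:
        draft = revise(draft)
        rounds += 1
    return draft, rounds


# Toy task (assumption for illustration): find an integer x with x * x == 144.
TARGET = 144


def score(x: int) -> float:
    return -abs(x * x - TARGET)  # closer to the target scores higher


def is_valid(x: int) -> bool:
    return x * x == TARGET


def revise(x: int) -> int:
    return x + 1 if x * x < TARGET else x - 1  # step toward the target


if __name__ == "__main__":
    start = best_of_n([3, 9, 20], score)
    print(start)                                                   # 9
    print(verify_and_revise(start, is_valid, revise, max_rounds=2))  # (11, 2)
    print(verify_and_revise(start, is_valid, revise, max_rounds=5))  # (12, 3)
```

With a budget of two rounds the loop stops at 11, which is still wrong; with five rounds it reaches 12 after three revisions. That is the inference-time scaling idea in miniature: a larger compute budget at answer time yields a better result from the same starting point.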

Popular Models

  • OpenAI o1 / o3: Reasoning models that think before responding, excelling at complex problem-solving
  • DeepSeek R1: Open-weight reasoning model with chain-of-thought during inference
  • Claude Opus: Extended thinking mode for complex reasoning tasks
  • Gemini 2.0 Flash Thinking: Google's reasoning-enabled model with visible thought process

Build This Pattern

Copy this prompt and paste it into Claude Code, OpenCode, Codex, or Cursor to implement this pattern.

Explain reasoning models and test-time compute. Cover: the shift from scaling pretraining to scaling inference-time computation. Key approaches: chain-of-thought with verification, self-consistency ensembles, process reward models, and tree or Monte Carlo search at inference time. Models: OpenAI o1/o3, DeepSeek R1. Trade-offs: latency (30s+ for hard problems), cost (10-100x more compute per query), and variable response time. When to use: math, coding, science, planning. When not to use: simple Q&A, creative writing.