Summary

Self-Consistency is a decoding strategy that samples multiple reasoning paths from an LLM for the same problem and selects the most frequent final answer. By aggregating diverse reasoning traces, it reduces the errors that come from relying on a single reasoning path and improves answer reliability. It builds on Chain of Thought (CoT) by sampling multiple CoT outputs rather than using greedy decoding.

How it works

  1. Generate multiple paths: Sample N reasoning traces for the same problem
  2. Extract answers: Parse final answers from each trace
  3. Vote/aggregate: Select the most common answer
  4. Return result: Output the consensus answer
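
The four steps above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not a reference implementation: `TraceSampler` is a hypothetical stand-in for whatever LLM call is used (with sampling enabled), and the `Answer: ...` final-line convention is assumed here purely so that answers can be parsed.

```typescript
// Hypothetical sampler: any function that returns one reasoning trace for a
// prompt. In practice it would wrap an LLM call made with temperature > 0.
type TraceSampler = (prompt: string) => Promise<string>;
type VoteResult = { answer: string; votes: number };

// Assumed convention: a trace contains a line such as "Answer: 42".
// Returns the first such answer, trimmed and lowercased, or null if none.
function extractAnswer(trace: string): string | null {
  const raw = trace.match(/Answer:\s*(.+?)\s*$/im)?.[1];
  return raw === undefined ? null : raw.trim().toLowerCase();
}

// Plain majority vote. Ties go to the answer that was seen first.
function majorityVote(answers: string[]): VoteResult | null {
  const counts = new Map<string, number>();
  for (const a of answers) counts.set(a, (counts.get(a) ?? 0) + 1);
  let best: VoteResult | null = null;
  for (const [answer, votes] of counts) {
    if (best === null || votes > best.votes) best = { answer, votes };
  }
  return best;
}

async function selfConsistency(prompt: string, sample: TraceSampler, n = 5) {
  // Step 1: sample N traces for the same prompt (issued concurrently).
  const traces = await Promise.all(
    Array.from({ length: n }, () => sample(prompt)),
  );
  // Step 2: parse final answers, dropping traces with no parseable answer.
  const answers = traces
    .map(extractAnswer)
    .filter((a): a is string => a !== null);
  // Step 3: vote. Step 4: return the consensus along with the raw traces.
  const winner = majorityVote(answers);
  return { answer: winner?.answer ?? null, votes: winner?.votes ?? 0, traces };
}
```

Returning the raw traces next to the consensus keeps disagreeing paths available for inspection. The first-seen tie-break is only a placeholder; a real system would likely want an explicit rule.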

Key considerations

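  • Sample count: More samples tend to improve accuracy, but each additional sample adds token cost
  • Aggregation: Majority vote, weighted vote, or confidence-based selection
  • Diversity: Temperature and other sampling parameters control how varied the paths are; near-greedy settings yield near-identical paths, which leaves little to vote on
  • Speed: Paths are independent, so generating them in parallel can offset the added latency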
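
As a rough illustration of the aggregation options listed above, here is a minimal sketch of a weighted vote. The `weight` field is an assumption made for this example: it stands for any non-negative per-path score, such as a self-reported confidence, and is not something a specific API provides. With all weights equal to 1, the function reduces to a plain majority vote.

```typescript
// One parsed answer plus a hypothetical, non-negative confidence weight.
interface ScoredAnswer {
  answer: string;
  weight: number;
}

// Sums weights per distinct answer and returns the heaviest answer together
// with its share of the total weight (a simple agreement-style score).
// Returns null when there is nothing to aggregate.
function weightedVote(
  samples: ScoredAnswer[],
): { answer: string; share: number } | null {
  const totals = new Map<string, number>();
  let sum = 0;
  for (const s of samples) {
    totals.set(s.answer, (totals.get(s.answer) ?? 0) + s.weight);
    sum += s.weight;
  }
  if (sum <= 0) return null;

  let best: string | null = null;
  let bestWeight = -Infinity;
  for (const [answer, w] of totals) {
    if (w > bestWeight) {
      best = answer;
      bestWeight = w;
    }
  }
  return best === null ? null : { answer: best, share: bestWeight / sum };
}
```

Note that weighting can overturn the raw count: one high-weight path can outvote two low-weight paths that agree with each other, so the weights need to be trustworthy before they replace a simple majority.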

When to use

  • Tasks where several distinct reasoning paths can lead to the same answer
  • Applications requiring high reliability
  • Scenarios where the extra cost and latency of generating multiple samples are acceptable
  • Math, logic, and factual reasoning tasks

Build This Pattern

Copy this prompt and paste it into Claude Code, OpenCode, Codex, or Cursor to implement this pattern.

Build me a self-consistency system for LLM reasoning.
Architecture: implement a parallel generation module that creates N independent reasoning paths (default 5, temperature 0.7) with different random seeds. Each path produces a structured output. An aggregation module then combines answers using type-specific strategies: majority voting for multiple-choice, median with variance for numeric answers, and theme clustering for free-text. Include an answer type detection module that automatically selects the aggregation strategy.
Error handling: handle paths producing no answer by excluding them from aggregation. If all paths disagree (agreement rate below 0.3), return all answers with a low-confidence warning. Handle empty or invalid path outputs gracefully.
Edge cases: handle ties in majority voting by using secondary criteria (path confidence). Handle numeric answers with units by normalizing before aggregation. Support answers that are lists or sets by using Jaccard similarity for clustering.
Best practices: return consensus answer plus individual path outputs, aggregation results, confidence score (agreement rate), and which paths disagreed. Make N and temperature configurable. Log per-path generation time.
Testing: unit test each aggregation strategy independently. Test with known consensus and dissensus scenarios. Verify confidence scores reflect agreement accurately.
TypeScript.
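
For orientation, a few of the building blocks the prompt names (median with variance for numeric answers, Jaccard similarity for set-valued answers, and the agreement-rate check) might look roughly like the sketch below. All function names are hypothetical and chosen for this illustration. The 0.3 threshold is taken from the prompt, and population variance is assumed, since the prompt does not say which variant it means.

```typescript
// Median of a non-empty list of numeric answers (input is not mutated).
function median(values: number[]): number {
  if (values.length === 0) throw new Error("median of empty list");
  const sorted = values.slice().sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1
    ? sorted[mid]
    : (sorted[mid - 1] + sorted[mid]) / 2;
}

// Population variance (assumed variant), as a spread signal for numeric answers.
function variance(values: number[]): number {
  if (values.length === 0) throw new Error("variance of empty list");
  const mean = values.reduce((acc, v) => acc + v, 0) / values.length;
  return values.reduce((acc, v) => acc + (v - mean) ** 2, 0) / values.length;
}

// Jaccard similarity |A ∩ B| / |A ∪ B| for set-valued answers.
// Two empty sets are treated as identical (similarity 1).
function jaccard(a: Set<string>, b: Set<string>): number {
  if (a.size === 0 && b.size === 0) return 1;
  let shared = 0;
  a.forEach((item) => {
    if (b.has(item)) shared++;
  });
  return shared / (a.size + b.size - shared);
}

// Fraction of paths whose answer equals the consensus answer.
function agreementRate(answers: string[], consensus: string): number {
  if (answers.length === 0) return 0;
  return answers.filter((a) => a === consensus).length / answers.length;
}

// Low-confidence flag, using the 0.3 threshold stated in the prompt.
function isLowConfidence(rate: number, threshold = 0.3): boolean {
  return rate < threshold;
}
```

For example, four paths answering `x`, `y`, `x`, `z` give an agreement rate of 0.5 for `x`, which sits above the 0.3 threshold and would not trigger the low-confidence warning.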