Self-Consistency is a decoding strategy that samples multiple reasoning paths from the LLM and selects the most frequent answer. By aggregating diverse reasoning traces, this technique reduces errors from single-path reasoning and improves answer reliability. It builds on Chain of Thought by sampling multiple CoT outputs rather than using greedy decoding.
How it works
Generate multiple paths: Sample N reasoning traces for the same problem
Extract answers: Parse final answers from each trace
Vote/aggregate: Select the most common answer
Return result: Output the consensus answer
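The four steps above can be sketched roughly as follows. This is a minimal illustration, not a reference implementation: the `Sampler` type is a hypothetical stand-in for whatever LLM client is in use, and the convention that each trace ends with an `Answer: ...` line is an assumption about the prompt format.

```typescript
// Hypothetical sampler: one call returns one reasoning trace as plain text.
type Sampler = (prompt: string) => Promise<string>;

interface VoteResult {
  answer: string | null; // null when no trace produced a parseable answer
  votes: number;         // how many traces agreed with the winning answer
  total: number;         // how many traces produced any answer at all
}

// Step 2: parse the final answer. Assumes a trailing "Answer: ..." line.
function extractAnswer(trace: string): string | null {
  const lines = trace.trim().split(/\r?\n/);
  // Scan from the end so the last "Answer:" line wins.
  for (let i = lines.length - 1; i >= 0; i--) {
    const match = /^\s*answer:\s*(.+)$/i.exec(lines[i]);
    if (match) return match[1].trim().toLowerCase();
  }
  return null;
}

// Step 3: plain majority vote. On a tie, the answer seen first wins.
function majorityVote(answers: string[]): VoteResult {
  const counts = new Map<string, number>();
  for (const a of answers) counts.set(a, (counts.get(a) ?? 0) + 1);

  let best: string | null = null;
  let bestCount = 0;
  for (const [answer, count] of counts) {
    if (count > bestCount) {
      best = answer;
      bestCount = count;
    }
  }
  return { answer: best, votes: bestCount, total: answers.length };
}

// Steps 1 and 4: sample N traces, drop unparseable ones, return the consensus.
async function selfConsistency(
  prompt: string,
  sample: Sampler,
  n: number = 5
): Promise<VoteResult> {
  const traces: string[] = [];
  for (let i = 0; i < n; i++) traces.push(await sample(prompt));
  const answers = traces
    .map(extractAnswer)
    .filter((a): a is string => a !== null);
  return majorityVote(answers);
}
```

Sampling here is sequential to keep the sketch short. The ratio `votes / total` gives a simple agreement rate that a caller could use as a rough confidence signal.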
Key considerations
Sample count: More samples generally improve accuracy but also increase cost
Aggregation: Majority vote, weighted vote, or confidence-based selection
Diversity: Temperature and sampling parameters affect variety
Speed: Parallel generation can mitigate latency
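Two of these considerations, parallel generation and weighted aggregation, might look like the sketch below. The `ScoredSampler` type and the per-path `confidence` field are hypothetical; how a confidence score is obtained (for example from token log-probabilities or a self-rating) is left open and is an assumption of this example.

```typescript
// Hypothetical shape of one sampled path; `confidence` is an assumed score in [0, 1].
interface ScoredAnswer {
  answer: string;
  confidence: number;
}

// Hypothetical sampler: one call yields one scored answer at the given temperature.
type ScoredSampler = (prompt: string, temperature: number) => Promise<ScoredAnswer>;

// Start all N requests at once so wall-clock latency is close to that of the
// slowest single call rather than the sum of all calls.
// Note: Promise.all rejects if any single call rejects.
function sampleInParallel(
  prompt: string,
  sample: ScoredSampler,
  n: number = 5,
  temperature: number = 0.7
): Promise<ScoredAnswer[]> {
  const calls = Array.from({ length: n }, () => sample(prompt, temperature));
  return Promise.all(calls);
}

// Weighted vote: sum the confidence of every path behind each distinct answer
// and pick the answer with the largest total weight.
function weightedVote(paths: ScoredAnswer[]): { answer: string | null; share: number } {
  const totals = new Map<string, number>();
  let sum = 0;
  for (const p of paths) {
    totals.set(p.answer, (totals.get(p.answer) ?? 0) + p.confidence);
    sum += p.confidence;
  }

  let best: string | null = null;
  let bestWeight = 0;
  for (const [answer, weight] of totals) {
    if (weight > bestWeight) {
      best = answer;
      bestWeight = weight;
    }
  }
  // `share` is the winning answer's fraction of the total weight.
  return { answer: best, share: sum > 0 ? bestWeight / sum : 0 };
}
```

A weighted vote can disagree with a plain majority vote: one high-confidence path may outweigh two low-confidence paths that agree with each other. Whether that is desirable depends on how trustworthy the confidence scores are.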
When to use
Tasks where multiple reasoning paths exist
Applications requiring high reliability
Scenarios where the added cost and latency of multiple samples are acceptable
Math, logic, and factual reasoning tasks
Build This Pattern
Copy this prompt and paste it into Claude Code, OpenCode, Codex, or Cursor to implement this pattern.
Build me a self-consistency system for LLM reasoning.
Architecture: implement a parallel generation module that creates N independent reasoning paths (default 5, temperature 0.7) with different random seeds. Each path produces a structured output. An aggregation module then combines answers using type-specific strategies: majority voting for multiple-choice, median with variance for numeric answers, and theme clustering for free-text. Include an answer type detection module that automatically selects the aggregation strategy.
Error handling: handle paths producing no answer by excluding them from aggregation. If all paths disagree (agreement rate below 0.3), return all answers with a low-confidence warning. Handle empty or invalid path outputs gracefully.
Edge cases: handle ties in majority voting by using secondary criteria (path confidence). Handle numeric answers with units by normalizing before aggregation. Support answers that are lists or sets by using Jaccard similarity for clustering.
Best practices: return consensus answer plus individual path outputs, aggregation results, confidence score (agreement rate), and which paths disagreed. Make N and temperature configurable. Log per-path generation time.
Testing: unit test each aggregation strategy independently. Test with known consensus and dissensus scenarios. Verify confidence scores reflect agreement accurately.
Language: TypeScript.
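For orientation, two of the aggregation helpers the prompt asks for could look roughly like this. This is a hedged sketch and not part of the prompt: the function names are hypothetical, and the use of population variance (dividing by N) is an assumption, since the prompt does not say which variance to report.

```typescript
// Median with variance for numeric answers.
// Returns null when no path produced a numeric answer.
function aggregateNumeric(
  values: number[]
): { median: number; variance: number } | null {
  if (values.length === 0) return null;

  // Copy before sorting so the caller's array is left untouched.
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  const median =
    sorted.length % 2 === 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;

  // Population variance (assumption: divide by N, not N - 1).
  const mean = sorted.reduce((s, v) => s + v, 0) / sorted.length;
  const variance =
    sorted.reduce((s, v) => s + (v - mean) ** 2, 0) / sorted.length;

  return { median, variance };
}

// Jaccard similarity between two set-valued answers: |A ∩ B| / |A ∪ B|.
// Two empty sets are treated as identical (similarity 1) by convention here.
function jaccard(a: Set<string>, b: Set<string>): number {
  if (a.size === 0 && b.size === 0) return 1;
  let intersection = 0;
  a.forEach((item) => {
    if (b.has(item)) intersection++;
  });
  return intersection / (a.size + b.size - intersection);
}
```

The median is less sensitive than the mean to a single path that goes badly wrong, while a large variance signals that the paths disagree even when a median exists.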