
Zoher15/Aha


Detecting Aha Moments in Chain-of-Thought Reasoning

Executive Summary

Problem

Chain-of-thought (CoT) reasoning improves model performance - whether elicited through prompting, trained via fine-tuning or RL, or aggregated via self-consistency. But after which token in the reasoning trace does the model realize the answer?

Detecting this "Aha moment" - the generation step at which the model's answer crystallizes - could unlock:

  • Efficiency: Terminate generation early once the Aha moment occurs, saving compute
  • Better training data: Select high-quality reasoning traces for self-training or distillation - either traces with strong Aha magnitude (good reasoning) or early Aha moments (efficient reasoning). Could also identify candidate tokens for gradient masking during RL.
  • Smarter decoding: Use Aha magnitude instead of token log-probability as the pruning criterion in beam search, discarding unpromising reasoning paths earlier

My approach: Track answer-token logits throughout CoT generation to detect when and how strongly the model's prediction emerges.
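The tracking loop can be sketched as follows. This is a minimal illustration under my own assumptions, not the repo's implementation: it presumes you have already collected per-step next-token logits (e.g., via `output_scores=True` in Hugging Face's `generate`) into one token-to-logit dict per step, and the helper name is mine.

```python
# Sketch: collect the logit trajectory of each candidate answer token
# across all chain-of-thought generation steps.
# `step_logits` holds one {token: logit} dict per generated step;
# `answer_tokens` is the candidate answer set (e.g., ["True", "False"]).

def track_answer_logits(step_logits, answer_tokens):
    """Return {answer_token: [logit at step 0, logit at step 1, ...]}."""
    return {
        tok: [logits[tok] for logits in step_logits]
        for tok in answer_tokens
    }

# Toy trace: "False" overtakes "True" at the final step.
steps = [
    {"True": 2.0, "False": 1.0},
    {"True": 2.1, "False": 1.2},
    {"True": 1.5, "False": 4.0},  # candidate Aha moment
]
trajectories = track_answer_logits(steps, ["True", "False"])
```

Once the trajectories exist, detecting when the prediction emerges is a matter of looking for sharp jumps within each answer token's sequence.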

Key Takeaways

Core finding: Tracking answer-token logits (e.g., "a", "b", "c", "d" or "True", "False") throughout CoT generation reveals "Aha moments" - the answer logits spike sharply at key moments in the reasoning.

Comparison with controls:

| Method | Tracked | Behavior | Interpretation |
| --- | --- | --- | --- |
| Answer tokens | Logits of a, b, c, d or True, False | Structured spikes at key moments | Tracks Aha moments |
| Control tokens | Logits of 1, 2, 3, 4 or Yes, No | Random fluctuations | No consistent pattern; the signal is answer-specific |
| Top logits | Logit of the argmax token | Spikes on token transitions | Tracks next-token prediction, not the final answer |
| Top probs | Logprob of the argmax token | Correlates with prompt tokens | Reflects surface-level token likelihood |
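For concreteness, the four tracked quantities could be read off a single step's logit vector as below. This is an illustrative sketch, not code from this repo; the function and argument names are mine.

```python
import math

def step_signals(logits, answer_ids, control_ids):
    """Extract the four tracked signals from one generation step.

    logits: list of next-token logits over the vocabulary.
    answer_ids / control_ids: vocab indices of the answer and control tokens.
    """
    top = max(range(len(logits)), key=lambda i: logits[i])
    # Numerically stable softmax, needed only for the top log-probability.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    return {
        "answer_logits": [logits[i] for i in answer_ids],    # table row 1
        "control_logits": [logits[i] for i in control_ids],  # table row 2
        "top_logit": logits[top],                            # table row 3
        "top_logprob": math.log(exps[top] / sum(exps)),      # table row 4
    }
```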

Implication: Answer-token logit fluctuations may identify Aha moments in CoT reasoning.

Key Experiments

Experiment 1: Logit dynamics across reasoning traces

I track the Δlogit (relative change in logit) at each generation step: (logit_t - logit_{t-1}) / |logit_{t-1}|. This measures how much an answer token's logit changed relative to its previous value. I visualize this by color-coding each generated token by its Δlogit magnitude.

  • Model: Qwen3-30B-A3B-Instruct
  • Dataset: StrategyQA (True/False)
  • Methods tracked: All 4 from the comparison table
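The Δlogit definition above can be sketched directly; the epsilon guard against a zero previous logit is my addition, and the helper names are illustrative.

```python
def delta_logits(trajectory, eps=1e-9):
    """Relative change (logit_t - logit_{t-1}) / |logit_{t-1}| per step."""
    return [
        (trajectory[t] - trajectory[t - 1]) / max(abs(trajectory[t - 1]), eps)
        for t in range(1, len(trajectory))
    ]

def aha_step(trajectory):
    """Index of the generation step with the largest |Δlogit| spike."""
    deltas = delta_logits(trajectory)
    return 1 + max(range(len(deltas)), key=lambda t: abs(deltas[t]))
```

Running `aha_step` over an answer token's logit trajectory picks out the candidate Aha token that the color-coded visualizations highlight.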

Example from StrategyQA: "Were Raphael's paintings influenced by the country of Guam?" (Answer: False)

Answer tokens: The token "he" spikes sharply (Δlogit = 3801) right before the decisive fact - "he died in 1520", before Europeans ever reached Guam (1521). This is the Aha moment.

Control tokens: The largest spike (Δlogit = 300) lands on the "1" in "14th century" - not semantically meaningful.

Top logits: The spike (Δlogit = 0.9) lands on "-ious" in "harmonious" - just a token-transition artifact.

Top probs: The spike (Δlogit = 3.3) lands on "Raphael" - simply echoing the question's subject.

Finding: Answer-token logits spike at semantically meaningful reasoning moments; controls do not.

Experiment 2: Does Aha magnitude matter?

For the same question, I sample 10 reasoning traces and compare the traces with the highest and lowest peak Δlogit magnitudes.

Max sample (Δlogit = 3801): The reasoning explicitly states "he died in 1520", before European contact with Guam - a concrete, decisive fact.

Min sample (Δlogit = 52): The trace also concludes False, but never states the specific 1520 death date; its spikes occur at vague moments like "What is Guam" and "was not known to Europeans".

Finding: Higher Δlogit magnitude correlates with more concrete, factual reasoning.
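Selecting traces by Aha magnitude (as in the training-data idea in the summary) reduces to ranking sampled traces by their peak |Δlogit|. A sketch with hypothetical names, operating on each trace's precomputed Δlogit sequence:

```python
def aha_magnitude(deltas):
    """Peak |Δlogit| across one trace's per-step relative logit changes."""
    return max(abs(d) for d in deltas)

def rank_traces(traces_deltas):
    """Return trace indices sorted from strongest to weakest Aha spike."""
    return sorted(
        range(len(traces_deltas)),
        key=lambda i: aha_magnitude(traces_deltas[i]),
        reverse=True,
    )

# Toy example: trace 1 has the sharpest spike, trace 2 the weakest... no,
# trace 0 the weakest (peaks of 5.0, 3801.0, and 52.0 respectively).
sampled = [[0.2, 5.0, 0.1], [0.3, 3801.0], [0.4, 52.0, 0.2]]
order = rank_traces(sampled)
```

Taking the head of `order` would select max-Aha traces for self-training; the tail would flag vague traces like the min sample above.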
