"The Entropy Trajectory"

The Entropy Trajectory

When a language model reasons step by step, it maintains a distribution over possible answers at each step. This distribution has an entropy — a measure of how uncertain the model is. The total entropy reduction from first step to last tells you something about the model’s confidence. But it doesn’t tell you whether the model will get the right answer.
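To make the quantity concrete, here is a minimal sketch of the per-step entropy computation, assuming we can read off the model's probability distribution over candidate answers at each reasoning step (the step distributions below are illustrative, not from any real model):

```python
import math

def entropy(probs):
    """Shannon entropy (in bits) of a distribution over candidate answers."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Hypothetical per-step answer distributions over four candidates.
steps = [
    [0.25, 0.25, 0.25, 0.25],  # step 1: maximally uncertain
    [0.60, 0.20, 0.10, 0.10],  # step 2: one answer pulling ahead
    [0.90, 0.05, 0.03, 0.02],  # step 3: nearly resolved
]

# The trajectory is just the sequence of entropies, step by step.
trajectory = [entropy(p) for p in steps]  # decreasing: confidence rising
```

The trajectory here falls at every step, which is the monotone shape the article is about.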

The shape of the trajectory does.

A monotone entropy trajectory — one where uncertainty decreases at every step — predicts 68.8% accuracy. A non-monotone trajectory — where uncertainty increases at any step — predicts 46.8%. The gap is large and statistically significant. Total entropy reduction, by contrast, has no predictive power. Two reasoning chains can reduce entropy by the same total amount, but the one that does so smoothly is far more likely to be correct.

Performance degrades with the number of violations. Zero violations: 68.8% accuracy. Two: 28.6%, with one violation falling in between. Each moment where the model becomes more confused mid-reasoning is a signal that something has gone wrong — not a temporary detour but a structural failure in the reasoning process.
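The monotonicity test itself is trivial to implement. A sketch, assuming per-step entropies have already been extracted into a list (`count_violations` and `is_monotone` are hypothetical helper names, not from the original work):

```python
def count_violations(trajectory):
    """Count steps where entropy rises, i.e. the model gets *more* uncertain."""
    return sum(1 for prev, curr in zip(trajectory, trajectory[1:]) if curr > prev)

def is_monotone(trajectory):
    """A trajectory is monotone when uncertainty never increases."""
    return count_violations(trajectory) == 0

# A smooth chain vs. one that backtracks mid-reasoning:
smooth   = [2.0, 1.5, 0.9, 0.3]        # 0 violations
confused = [2.0, 1.5, 1.8, 0.6]        # 1 violation at step 3
```

Under the article's numbers, `smooth` would fall in the 68.8%-accuracy bucket and `confused` in a lower one.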

The practical advantage is efficiency. Monitoring entropy trajectory requires approximately 1,500 tokens per question — one-eighth the cost of sampling 40 independent chains for self-consistency. The structural signal is cheaper to extract than the statistical one.
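The back-of-envelope arithmetic implied by those figures (the per-chain cost is my inference, not stated in the article):

```python
monitor_cost = 1_500                 # tokens per question, entropy monitoring
chains = 40                          # independent chains for self-consistency

# "One-eighth the cost" implies the self-consistency budget is:
sc_cost = monitor_cost * 8           # 12,000 tokens total
per_chain = sc_cost // chains        # 300 tokens per sampled chain
```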

The deeper point: aggregate measures (total reduction, final confidence) lose information that sequential measures (step-by-step trajectory) preserve. A correct reasoning chain resolves uncertainty monotonically because each step genuinely constrains the answer space. A wrong chain wanders — it discovers contradictions, backtracks, introduces new uncertainty. The shape of the path is evidence about the path’s validity.
