"The Truncation Blind Spot"

The Truncation Blind Spot

Standard decoding strategies for language models — top-k, nucleus sampling, contrastive search — select tokens from high-probability regions of the distribution. Humans select tokens for communicative appropriateness, not statistical likelihood. These are different criteria, and they produce different text.
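
To make the mechanism concrete, here is a minimal sketch of the two most common truncation rules, assuming `logits` is a one-dimensional tensor of next-token scores from some model. The function names are illustrative, not from any particular library.

```python
import torch

def top_k_filter(logits: torch.Tensor, k: int) -> torch.Tensor:
    """Keep the k highest-scoring tokens; mask the rest so they
    can never be sampled."""
    kth_best = torch.topk(logits, k).values[-1]
    return logits.masked_fill(logits < kth_best, float("-inf"))

def top_p_filter(logits: torch.Tensor, p: float) -> torch.Tensor:
    """Nucleus sampling: keep the smallest set of tokens whose
    cumulative probability reaches p; mask everything else."""
    probs, idx = torch.softmax(logits, dim=-1).sort(descending=True)
    cutoff = probs.cumsum(dim=-1) - probs >= p  # mass before this token already >= p
    masked = logits.clone()
    masked[idx[cutoff]] = float("-inf")
    return masked
```

Everything masked to negative infinity has probability zero after the softmax: whatever a human might have written there, the decoder cannot follow.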

The gap is measurable. Across 1.8 million texts, eight models, five decoding strategies, and 53 hyperparameter configurations: 8–18% of the tokens humans choose fall outside the truncation boundaries that decoding strategies impose. These tokens are contextually appropriate but statistically rare — reachable by humans, unreachable by any truncated decoder.
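
A hedged sketch of the measurement itself: given the model's next-token distribution and the token a human actually wrote, check whether that token survives a given nucleus. The names `logits` and `human_token_id` are assumptions for illustration, not from the study.

```python
import torch

def escapes_nucleus(logits: torch.Tensor, human_token_id: int,
                    p: float = 0.95) -> bool:
    """True if the human's token falls outside the top-p nucleus,
    i.e. no nucleus sampler at this p could have produced it."""
    probs, idx = torch.softmax(logits, dim=-1).sort(descending=True)
    nucleus_size = int((probs.cumsum(0) < p).sum()) + 1  # smallest set with mass >= p
    rank = (idx == human_token_id).nonzero().item()      # human token's sorted rank
    return rank >= nucleus_size
```

Averaging this check token by token over a corpus is what yields an escape rate like the 8–18% above.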

This makes machine-generated text detectable. Simple classifiers trained on predictability and lexical diversity achieve high detection rates. The detection doesn’t depend on model scale or architecture — truncation parameters account for most of the variance. The signature is in the selection mechanism, not the model.
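
A minimal sketch of what "simple classifier" means here, with synthetic numbers standing in for real corpora: mean token log-probability as the predictability feature, type-token ratio as the lexical-diversity feature, and a logistic regression on top. The exact features in any given study may differ; the point is how little machinery is needed.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def features(token_logprobs, tokens):
    """Two features per text: predictability and lexical diversity.
    (For real texts; the demo below uses pre-made feature vectors.)"""
    return [np.mean(token_logprobs), len(set(tokens)) / len(tokens)]

# Synthetic illustration: truncated decoding makes machine text more
# predictable and less lexically diverse, so the classes separate easily.
rng = np.random.default_rng(0)
machine = np.column_stack([rng.normal(-1.5, 0.3, 200),    # higher mean log-prob
                           rng.normal(0.45, 0.05, 200)])  # lower type-token ratio
human = np.column_stack([rng.normal(-2.5, 0.5, 200),
                         rng.normal(0.60, 0.07, 200)])
X, y = np.vstack([machine, human]), np.array([1] * 200 + [0] * 200)
clf = LogisticRegression().fit(X, y)
print(f"training accuracy: {clf.score(X, y):.2f}")
```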

The dilemma is structural. Configurations that achieve low detectability — wider truncation boundaries, flatter sampling — produce incoherent text. Evading detection and producing natural text are distinct objectives that the current decoding paradigm cannot simultaneously satisfy. You can sample from a wider distribution, but then the text loses coherence. You can maintain coherence, but then the truncation boundary leaves a detectable fingerprint.
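
The widening effect is easy to see on a toy distribution: raising the sampling temperature flattens the probabilities, and the 0.95 nucleus swells to cover most of the vocabulary. The numbers below are synthetic, chosen only to illustrate the shape of the trade-off, not taken from the study.

```python
import torch

logits = torch.tensor([5.0, 3.5, 3.0, 1.0, 0.5, -1.0, -2.0, -3.0])
for temperature in (0.7, 1.0, 1.5, 2.5):
    probs = torch.softmax(logits / temperature, dim=-1)
    sorted_probs = probs.sort(descending=True).values
    nucleus_size = int((sorted_probs.cumsum(0) < 0.95).sum()) + 1
    print(f"T={temperature}: 0.95-nucleus covers {nucleus_size} of {len(logits)} tokens")
```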

The problem isn’t that language models are bad at generating text. The problem is that likelihood-based selection is fundamentally different from communicative selection. Humans don’t choose the most probable next word — they choose the most useful one. These overlap most of the time, but the 8–18% where they don’t is exactly the signal detectors exploit.

