"The Nonparametric Upset"
The Nonparametric Upset
Textbook statistics teaches a clear hierarchy: if you know the true model, use parametric methods. Their mean integrated squared error shrinks at rate O(n^{-1}), while the best nonparametric kernel estimators manage only O(n^{-4/5}). Knowing the truth buys a faster rate. End of story.
Byholt and Hjort show the story is wrong for small samples.
A simple kernel density estimator can outperform the correctly specified parametric estimator in mean integrated squared error, on the parametric model’s home turf. The demonstration is clean: the true distribution is normal, the parametric estimator assumes normality, and the kernel estimator assumes nothing. At small sample sizes, the kernel estimator wins anyway.
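A minimal Monte Carlo sketch of this kind of setup, assuming a standard normal truth, a maximum-likelihood normal fit, and a Gaussian kernel with Silverman’s rule-of-thumb bandwidth. The grid, sample size, replication count, and bandwidth rule here are illustrative choices, not the paper’s exact design:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
grid = np.linspace(-5, 5, 1001)      # evaluation grid for the ISE integral
dx = grid[1] - grid[0]
true_pdf = norm.pdf(grid)            # the true density: standard normal

def parametric_fit(x):
    """Correctly specified model: normal density with MLE mean and sd."""
    return norm.pdf(grid, loc=x.mean(), scale=x.std())

def kernel_fit(x):
    """Gaussian KDE with Silverman's rule-of-thumb bandwidth."""
    h = 1.06 * x.std() * len(x) ** (-1 / 5)
    # Average of Gaussian kernels centred at the observations.
    return norm.pdf((grid[:, None] - x[None, :]) / h).mean(axis=1) / h

def mise(fit, n, reps=2000):
    """Monte Carlo estimate of the mean integrated squared error."""
    ises = [np.sum((fit(rng.standard_normal(n)) - true_pdf) ** 2) * dx
            for _ in range(reps)]
    return float(np.mean(ises))

n = 15  # a deliberately small sample; which estimator wins is the question
print(f"parametric MISE at n={n}: {mise(parametric_fit, n):.5f}")
print(f"kernel     MISE at n={n}: {mise(kernel_fit, n):.5f}")
```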
The mechanism is the bias-variance tradeoff at finite sample sizes. The parametric estimator has essentially zero bias, since the true density lies inside the model, but its variance concentrates estimation error in particular regions of the density. The kernel estimator carries nonzero smoothing bias but distributes its error more evenly across the density. At small sample sizes, the kernel’s smoother error profile produces lower integrated squared error than the parametric estimator’s spikier one.
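The same tradeoff in symbols, using the standard kernel-smoothing decomposition from textbook asymptotics; the constants and notation here are generic, not taken from the paper:

```latex
% MISE splits into integrated squared bias plus integrated variance:
\mathrm{MISE}(\hat f)
  = \int \bigl(\mathbb{E}\,\hat f(x) - f(x)\bigr)^2\,dx
  + \int \mathrm{Var}\bigl(\hat f(x)\bigr)\,dx .

% Correctly specified parametric fit: the bias term vanishes to first
% order, leaving a pure variance term of size c_par / n:
\mathrm{MISE}_{\text{par}} \approx \frac{c_{\text{par}}}{n} .

% Kernel estimator with bandwidth h: squared bias of order h^4, variance
% of order 1/(nh). Here R(g) = \int g^2 and \mu_2(K) = \int x^2 K(x)\,dx:
\mathrm{MISE}_{\text{ker}} \approx \frac{h^4 \mu_2(K)^2}{4}\,R(f'')
  + \frac{R(K)}{n h} ,

% which, at the optimal bandwidth h \asymp n^{-1/5}, gives
\mathrm{MISE}_{\text{ker}}^{*} \approx c_{\text{ker}}\, n^{-4/5} .
```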
Asymptotically, the parametric method wins — its faster rate eventually dominates. But “eventually” can mean sample sizes larger than what’s available. In the finite regime where actual statistical practice lives, the model-free approach can be better even when the model is right.
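Extending the sketch above (reusing mise, parametric_fit, and kernel_fit) to sweep the sample size shows where “eventually” arrives. The crossover point, if the kernel wins at all under these illustrative choices, depends on the true density, the bandwidth rule, and Monte Carlo noise:

```python
# Sweep n to locate the regime change between the two estimators.
for n in (5, 10, 20, 50, 100, 500):
    p, k = mise(parametric_fit, n), mise(kernel_fit, n)
    print(f"n = {n:4d}: parametric {p:.5f}  kernel {k:.5f}  "
          f"-> {'kernel' if k < p else 'parametric'} wins")
```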
The parametric rate is an asymptotic guarantee. Asymptotic guarantees are guarantees about infinity. Small samples are not infinity. The upset is that the textbook hierarchy assumes the limit has been reached.