"The Contrastive Teacher"

The Contrastive Teacher

In-context learning selects demonstration examples to show the model before asking it to solve a new instance. The standard approach: find the most similar examples. If the query is a photo of a dog, show other dog photos. Similarity should help — the closer the example, the more relevant the demonstration.

Retrieving counterfactual examples inverts this. Instead of finding examples that are similar to the query, find examples that are dissimilar along the causal dimension. If the query is a dog photo and the task is species classification, show a cat photo with the same background, lighting, and pose. The counterfactual example matches on every dimension except the one that matters for the task.
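A minimal sketch of the two retrieval rules, assuming each example is an embedding vector and — a strong assumption — that we know which dimensions encode the causal feature (in practice those dimensions would have to be estimated, e.g. from labeled pairs). Nearest-neighbor retrieval maximizes overall similarity; counterfactual retrieval maximizes similarity on the nuisance dimensions while maximizing distance on the causal ones:

```python
import numpy as np

def nearest_neighbor_scores(query, pool):
    """Standard retrieval: overall cosine similarity to the query."""
    q = query / np.linalg.norm(query)
    p = pool / np.linalg.norm(pool, axis=1, keepdims=True)
    return p @ q

def counterfactual_scores(query, pool, causal_dims):
    """Counterfactual retrieval: match on everything EXCEPT the
    causal dimensions, and differ ON them.
    `causal_dims` is a hypothetical input; identifying the causal
    subspace is the hard part and is assumed away here."""
    mask = np.zeros(query.shape[0], dtype=bool)
    mask[causal_dims] = True
    # Reward closeness on nuisance (non-causal) dimensions...
    nuisance_sim = -np.linalg.norm(pool[:, ~mask] - query[~mask], axis=1)
    # ...and reward distance on the causal dimensions.
    causal_dist = np.linalg.norm(pool[:, mask] - query[mask], axis=1)
    return nuisance_sim + causal_dist

# Toy features: [species, background, lighting]. Query: a dog (species=0).
query = np.array([0.0, 0.9, 0.5])
pool = np.array([
    [0.0, 0.9, 0.5],   # dog, same background and lighting
    [1.0, 0.9, 0.5],   # cat, same background and lighting
    [1.0, 0.1, 0.0],   # cat, different background and lighting
])

print(np.argmax(nearest_neighbor_scores(query, pool)))        # picks the matching dog
print(np.argmax(counterfactual_scores(query, pool, [0])))     # picks the matched-scene cat
```

On this toy pool, nearest-neighbor retrieval returns the near-duplicate dog; counterfactual retrieval returns the cat that shares the scene — the minimal pair the section describes.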

This works better. The counterfactual examples teach the model what to attend to by showing what changes when the label changes. A similar example says “this is also a dog.” A counterfactual says “this is everything except a dog.” The contrastive signal is richer because it isolates the causal feature by holding confounds constant.

The structural point: teaching is not the same as demonstrating. A demonstration shows what the answer looks like. A counterfactual shows what makes the answer what it is. The difference is between ostension (pointing at an instance) and contrast (revealing the decision boundary). Nearest-neighbor retrieval finds good ostensions. Counterfactual retrieval finds good contrasts. For tasks where the decision boundary is subtle — where the causal feature is entangled with irrelevant features — contrast teaches faster than example.
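The contrast-versus-ostension distinction shows up concretely in how the prompt is assembled. A hedged sketch, with an invented prompt format: instead of listing lone demonstrations, each demonstration is a minimal pair whose members differ only on the causal feature, so the label flip marks the decision boundary for the model:

```python
def contrastive_prompt(pairs, query):
    """Build an in-context prompt from counterfactual pairs.
    Each pair is (input_a, label_a, input_b, label_b), where the two
    inputs match on everything except the causal feature.
    The prompt wording here is illustrative, not a fixed standard."""
    lines = []
    for input_a, label_a, input_b, label_b in pairs:
        lines.append(f"Input: {input_a}\nLabel: {label_a}")
        lines.append(f"Input: {input_b}\nLabel: {label_b}")
        lines.append("(These two differ only in the feature that decides the label.)")
    lines.append(f"Input: {query}\nLabel:")
    return "\n".join(lines)

pairs = [("a dog on a red sofa", "dog", "a cat on a red sofa", "cat")]
print(contrastive_prompt(pairs, "a dog in a garden"))
```

A nearest-neighbor prompt would instead fill the context with near-duplicates of the query; the paired format is what carries the "what changes when the label changes" signal.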

The assumption that similar examples are the best teachers is the assumption that learning is imitation. Counterfactual retrieval assumes learning is discrimination. The evidence favors discrimination.

