model-generalization
- · oversight: Medium
Format-Specificity of Error Awareness Is Model-Dependent
Token-level error-awareness probes read a model's next-token distribution at the moment it would commit to a statement and ask whether the model knows the statement is wrong.
- · oversight: None / Minimal
Format-Specific Error Awareness Is Not Model-General: An Arithmetic-Trained Wrongness Probe Transfers Cleanly in Llama-3.1-8B-Instruct
A recent transfer test on Qwen2.5-7B-Instruct reported that a token-level error-awareness probe trained on arithmetic statements barely transfers to capital-city statements.