Oversight: None / Minimal
Format-Specific Error Awareness Is Not Model-General: An Arithmetic-Trained Wrongness Probe Transfers Cleanly in Llama-3.1-8B-Instruct
A recent transfer test on Qwen2.5-7B-Instruct reported that a token-level error-awareness probe trained on arithmetic statements barely transfers to capital-city statements.
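The report does not say which metric sits behind "barely transfers." One common way to score this kind of test is to train the probe on one format (here, arithmetic) and report AUROC on the other (capital cities). An AUROC of 0.5 means chance level, and 1.0 means perfect separation. The sketch below assumes AUROC and shows only that scoring step. The function name `auroc` is hypothetical, and the scores in the usage lines are invented for illustration, not results from either model.

```python
from typing import Sequence


def auroc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUROC via the Mann-Whitney U statistic (assumed metric, hypothetical helper).

    scores: probe outputs, higher meaning "more likely wrong"
    labels: 1 for a false statement, 0 for a true one
    A tie between a false and a true statement counts as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one false and one true statement")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Invented scores, for illustration only.
print(auroc([0.9, 0.8, 0.2, 0.1], [1, 1, 0, 0]))  # perfect separation
print(auroc([0.6, 0.4, 0.6, 0.4], [1, 1, 0, 0]))  # chance level
```

The pairwise loop is quadratic in the number of statements. That is fine for a sketch, but a sort-based rank computation scales better on large evaluation sets.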
Error Awareness Is Format-Specific: An Arithmetic-Trained Wrongness Probe Does Not Transfer to Capital-City Facts
A language model's next-token distribution immediately after it completes a statement often differs depending on whether that statement is true or false, and a small classifier reading that distribution can recover whether the statement was correct. Prior work asks whether such a signal exists; we ask a narrower question.
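The idea of "a small classifier reading that distribution" can be sketched as a logistic-regression probe over a few summary statistics of the next-token distribution. This is a minimal illustration under stated assumptions, not the probe used in this work. The text does not specify the features, the classifier, or the training procedure. The three features (entropy, top-1 probability, top-1 minus top-2 margin), the helper names, and the hyperparameters are all hypothetical. The toy logits also bake in the assumption that wrong statements produce flatter distributions. That assumption exists only to make the example learnable and is not a finding.

```python
import math
from typing import List, Sequence, Tuple


def softmax(logits: Sequence[float]) -> List[float]:
    """Convert raw logits to probabilities, subtracting the max for numerical stability."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def distribution_features(logits: Sequence[float]) -> List[float]:
    """Hypothetical summary of a next-token distribution: [entropy, top-1 prob, top-1 minus top-2 margin]."""
    probs = softmax(logits)
    entropy = -sum(p * math.log(p) for p in probs if p > 0.0)
    top = sorted(probs, reverse=True)
    margin = top[0] - top[1] if len(top) > 1 else top[0]
    return [entropy, top[0], margin]


def sigmoid(z: float) -> float:
    """Logistic function written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic_probe(
    X: Sequence[Sequence[float]], y: Sequence[int], lr: float = 0.5, epochs: int = 500
) -> Tuple[List[float], float]:
    """Fit weights and bias by full-batch gradient descent on the mean log loss.

    y[i] = 1 marks a false statement, 0 a true one.
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def probe_score(w: Sequence[float], b: float, x: Sequence[float]) -> float:
    """Probability the probe assigns to 'this statement was wrong'."""
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))


# Toy logits invented for illustration: peaked rows are labelled true (0),
# flat rows are labelled false (1). Not data from any model.
train_logits = [[5.0, 0.0, 0.0, 0.0], [4.0, 0.5, 0.0, 0.0],
                [1.0, 0.9, 0.8, 0.7], [0.5, 0.5, 0.4, 0.5]]
train_labels = [0, 0, 1, 1]
X = [distribution_features(z) for z in train_logits]
w, b = train_logistic_probe(X, train_labels)
scores = [probe_score(w, b, x) for x in X]
print([round(s, 3) for s in scores])
```

In a real setup the logits would come from the model's output at the token following the completed statement, and the feature set would likely be richer (for example, the full log-probability vector or hidden states). The handful of scalar features here just keeps the sketch small and dependency-free.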