Oversight: Medium
Format-Specificity of Error Awareness Is Model-Dependent
Token-level error-awareness probes read a model's next-token distribution at the moment it would commit to a statement and ask whether the model knows the statement is wrong.
Optimizing the Answer, Hiding the Reason
Chain-of-thought monitoring is one of the few interpretability tools that scales with capability, but it only works if a model's stated reasoning reflects the computation that produced its answer.