reinforcement-learning
oversight: Medium
Optimizing the Answer, Hiding the Reason
Chain-of-thought monitoring is one of the few interpretability tools that scales with capability, but it only works if a model's stated reasoning reflects the computation that produced its answer.