# Error-awareness probe portability: workshop synthesis summary

Models: Qwen2.5-7B-Instruct, Llama-3.1-8B-Instruct, Mistral-7B-Instruct-v0.3

Pipeline: single-forward-pass top-50 next-token distributions at the commitment position; top-50 |Cohen's d| feature selection on the training format only; logistic regression; AUC-ROC with 2000-resample percentile bootstrap 95% CIs, seed 42.

Every number below is carried verbatim from a verified run artifact (empty recompute diff + independent rank-based reproduction); per-run sources are in curve.csv's source_artifact column and the project repository.

## FORWARD (arithmetic-trained probe on capital-city statements)

- Qwen cross-format AUC-ROC = 0.6490 (95% CI 0.5834-0.7075, n=300; in-format 0.9683, internal control 0.9905; re-executed in-pipeline, byte-identical to the published run)
- Llama cross-format AUC-ROC = 0.9817 (95% CI 0.9636-0.9956, n=300; in-format 0.9922, internal control 1.0000)
- Mistral cross-format AUC-ROC = 0.9795 (95% CI 0.9610-0.9936, n=300; in-format 0.9658, internal control 0.9962)

## REVERSE (capitals-trained probe on arithmetic statements)

- Qwen cross-format AUC-ROC = 0.6782 (95% CI 0.6324-0.7200, n=600; arithmetic-internal control 0.9751)
- Llama cross-format AUC-ROC = 0.4321 (95% CI 0.3867-0.4790, n=600; below chance; arithmetic-internal control 0.9813)

## HARD TARGET (arithmetic-trained probe on same-country confusable-city capitals)

- Qwen cross-format AUC-ROC = 0.3372 (95% CI 0.2651-0.4164, n=232; below chance; hard-internal control 0.9758)
- Llama cross-format AUC-ROC = 0.9891 (95% CI 0.9718-0.9996, n=232; hard-internal control 0.9881)

## Training-time predictability

Weight-weighted selected-token sign agreement vs observed transfer AUC, Spearman rho = 0.357 over 7 conditions; both below-chance conditions carry positive agreement (predictor misses anti-transfer).
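For readers who want to sanity-check the AUC-ROC values and percentile bootstrap intervals reported in the sections above, the following is a minimal standard-library sketch, not the project's actual pipeline code. It computes a rank-based (Mann-Whitney) AUC and a percentile bootstrap CI with the stated settings (2000 resamples, seed 42). The function names `auc_roc` and `bootstrap_auc_ci`, the choice to redraw single-class resamples, and the simple order-statistic percentile rule are all assumptions made for illustration; a different random number generator or percentile convention will not reproduce the published digits exactly.

```python
import random


def auc_roc(labels, scores):
    """Rank-based (Mann-Whitney) AUC-ROC for 0/1 labels; ties get average ranks."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the current group of tied scores.
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC-ROC needs at least one example of each class")
    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def bootstrap_auc_ci(labels, scores, n_resamples=2000, seed=42, alpha=0.05):
    """Point AUC plus a percentile bootstrap CI (assumed resampling scheme)."""
    rng = random.Random(seed)
    n = len(labels)
    aucs = []
    while len(aucs) < n_resamples:
        idx = [rng.randrange(n) for _ in range(n)]  # resample with replacement
        ys = [labels[i] for i in idx]
        # Assumption: resamples containing a single class are redrawn.
        if 0 < sum(ys) < n:
            aucs.append(auc_roc(ys, [scores[i] for i in idx]))
    aucs.sort()
    lo = aucs[round((alpha / 2) * n_resamples)]
    hi = aucs[round((1 - alpha / 2) * n_resamples) - 1]
    return auc_roc(labels, scores), lo, hi


if __name__ == "__main__":
    # Toy inputs for illustration only; these are not project data.
    toy_labels = [0, 0, 1, 1]
    toy_scores = [0.1, 0.4, 0.35, 0.8]
    print(bootstrap_auc_ci(toy_labels, toy_scores, n_resamples=200))
```

In this sketch, `labels` would be the per-statement correctness labels and `scores` the probe's predicted probabilities on the target format. A point estimate below 0.5 with an upper bound below 0.5, as in the two below-chance conditions, means the probe's ranking is systematically inverted on that target rather than merely uninformative.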
## Verdict

Probe portability belongs to the (model, training format, target task) triple; in-format validation (0.966 to 1.000 everywhere) predicts none of it.