Add cluster-bootstrap 95% CIs for the perturbation recall tables#103

Draft
dangng2004 wants to merge 1 commit into main from feat/recall-cis
@dangng2004

Contributor

Split out of #98 (which is now AUC-only).

Adds benchmarks/perturbation/ci_recall.py: 95% confidence intervals around the §5 recall numbers via a cluster bootstrap over papers (resampling unit = paper, pooled recall = sum(detected)/sum(injected), 5000 draws, seed 42, percentile method). Point estimates match the perturbation scorer, e.g. GPT-5.5 progressive = 571/797 = 71.6%.

Also gitignores generated figures under benchmarks/perturbation/plots/.

Result JSONs are gitignored (large), so running requires the local data.
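
For reviewers without the local data, the procedure described above can be sketched roughly as follows. This is a minimal illustration and not the contents of ci_recall.py: the function name `cluster_bootstrap_recall_ci`, the array-based inputs, and the toy counts are all assumptions made for the example. It resamples whole papers with replacement, recomputes pooled recall as sum(detected)/sum(injected) on each draw, and takes percentile bounds.

```python
import numpy as np


def cluster_bootstrap_recall_ci(detected, injected, n_draws=5000, seed=42, level=0.95):
    """Percentile cluster-bootstrap CI for pooled recall (hypothetical helper).

    detected[i] / injected[i] are the per-paper counts of detected and
    injected perturbations. The resampling unit is the paper, so both
    counts for a paper always move together.
    """
    detected = np.asarray(detected, dtype=float)
    injected = np.asarray(injected, dtype=float)
    if detected.shape != injected.shape or detected.ndim != 1:
        raise ValueError("detected and injected must be 1-D arrays of equal length")

    n_papers = detected.shape[0]
    rng = np.random.default_rng(seed)

    # Each row is one bootstrap draw: n_papers paper indices, with replacement.
    idx = rng.integers(0, n_papers, size=(n_draws, n_papers))

    # Pooled recall per draw: sum the counts first, then divide.
    num = detected[idx].sum(axis=1)
    den = injected[idx].sum(axis=1)
    valid = den > 0  # skip degenerate draws with no injected perturbations
    recalls = num[valid] / den[valid]

    alpha = 1.0 - level
    lo, hi = np.quantile(recalls, [alpha / 2, 1 - alpha / 2])
    point = detected.sum() / injected.sum()
    return float(point), float(lo), float(hi)


if __name__ == "__main__":
    # Toy per-paper counts, made up for illustration only (not benchmark data).
    toy_detected = [3, 1, 4, 0, 2]
    toy_injected = [4, 2, 5, 1, 3]
    point, lo, hi = cluster_bootstrap_recall_ci(toy_detected, toy_injected)
    print(f"recall = {point:.1%}, 95% CI = [{lo:.1%}, {hi:.1%}]")
```

Summing counts before dividing keeps the point estimate equal to the pooled figure (for example 571/797), whereas averaging per-paper recalls would weight papers equally regardless of how many perturbations each one carries.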

ci_recall.py reports the §5 recall tables with 95% bootstrap CIs over
papers (resampling unit = paper, pooled recall = sum(detected)/sum(injected)).
Point estimates match the perturbation scorer. Also gitignores generated
figures under perturbation/plots/.

Split out of the AUC CI PR.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>