Add cluster-bootstrap 95% CIs for the perturbation recall tables by dangng2004 · Pull Request #103 · ChicagoHAI/OpenAIReview

dangng2004 · 2026-06-24T22:41:14Z

Split out of #98 (which is now AUC-only).

Adds benchmarks/perturbation/ci_recall.py: 95% confidence intervals around the §5 recall numbers via a cluster bootstrap over papers (resampling unit = paper, pooled recall = sum(detected)/sum(injected), 5000 draws, seed 42, percentile method). Point estimates match the perturbation scorer, e.g. GPT-5.5 progressive = 571/797 = 71.6%.
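For reviewers without the local data, the resampling scheme described above can be sketched roughly as follows. This is not the code in ci_recall.py: the function names, the `(detected, injected)` tuple layout, and the toy counts are hypothetical and chosen only for illustration. The sketch assumes every paper has at least one injected error, so no resample has a zero denominator.

```python
import random


def percentile(sorted_vals, q):
    """Linearly interpolated quantile q (0..1) of an already sorted list."""
    pos = q * (len(sorted_vals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (pos - lo) * (sorted_vals[hi] - sorted_vals[lo])


def cluster_bootstrap_recall(papers, n_draws=5000, seed=42, alpha=0.05):
    """Percentile cluster-bootstrap CI for pooled recall.

    papers: list of (detected, injected) counts, one tuple per paper
            (hypothetical layout, not the real result-JSON schema).
    Returns (point_estimate, ci_low, ci_high).
    """
    rng = random.Random(seed)
    n = len(papers)

    # Point estimate: pooled recall = sum(detected) / sum(injected).
    point = sum(d for d, _ in papers) / sum(i for _, i in papers)

    draws = []
    for _ in range(n_draws):
        # Resampling unit = paper: draw n whole papers with replacement,
        # so all injected errors from one paper stay together.
        sample = rng.choices(papers, k=n)
        draws.append(sum(d for d, _ in sample) / sum(i for _, i in sample))

    draws.sort()
    return point, percentile(draws, alpha / 2), percentile(draws, 1 - alpha / 2)


# Toy counts, purely illustrative (not taken from the benchmark).
toy_papers = [(3, 4), (5, 7), (2, 5), (7, 8), (4, 6), (6, 7)]
point, ci_low, ci_high = cluster_bootstrap_recall(toy_papers)
print(f"recall = {point:.3f}, 95% CI = [{ci_low:.3f}, {ci_high:.3f}]")
```

Pooling the counts inside each resample, rather than averaging per-paper recalls, keeps each draw on the same scale as the reported point estimate. The sketch shares only the seed value with the script and uses a different random number generator, so it would not reproduce the script's interval endpoints.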

Also gitignores generated figures under benchmarks/perturbation/plots/.

Result JSONs are gitignored (large), so running requires the local data.

ci_recall.py reports the §5 recall tables with 95% bootstrap CIs over papers (resampling unit = paper, pooled recall = sum(detected)/sum(injected)). Point estimates match the perturbation scorer. Also gitignores generated figures under perturbation/plots/. Split out of the AUC CI PR.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

dangng2004 marked this pull request as draft June 24, 2026 22:42

dangng2004 mentioned this pull request Jun 24, 2026

Add cluster-bootstrap 95% CIs for the paper's AUC tables #98

Merged

dangng2004 wants to merge 1 commit into main from feat/recall-cis
