From 1de58680820f1d001252f89e65f413288c269f5c Mon Sep 17 00:00:00 2001
From: connerlambden
Date: Thu, 4 Jun 2026 22:40:48 -0600
Subject: [PATCH] Add BGPT REFUTE benchmark (scientific critique & calibration)

---
 README.md | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/README.md b/README.md
index 670cf015d..45c4bfac8 100644
--- a/README.md
+++ b/README.md
@@ -158,3 +158,8 @@ If you use VLMEvalKit in your research or wish to refer to published OpenSource
 [github-license-shield]: https://img.shields.io/github/license/open-compass/VLMEvalKit?color=white&labelColor=black&style=flat-square
 [github-stars-link]: https://github.com/open-compass/VLMEvalKit/stargazers
 [github-stars-shield]: https://img.shields.io/github/stars/open-compass/VLMEvalKit?color=ffcb47&labelColor=black&style=flat-square
+
+
+## Benchmarks
+
+- [REFUTE](https://huggingface.co/datasets/BGPT-OFFICIAL/refute) — Scientific critique & epistemic calibration on recent science summaries (Apache-2.0). [Leaderboard](https://huggingface.co/spaces/BGPT-OFFICIAL/refute-leaderboard) · [Technical report](https://huggingface.co/datasets/BGPT-OFFICIAL/refute/blob/main/TECHNICAL_REPORT.md) · [Integrators](https://huggingface.co/datasets/BGPT-OFFICIAL/refute/blob/main/INTEGRATORS.md)