Enhance SciDocBench evaluation handling #1574

Open
yuhangzang wants to merge 6 commits into open-compass:main from yuhangzang:main

@yuhangzang

Collaborator
  • Strip thinking tags for answer scoring while preserving raw predictions for reasoning review
  • Support segmented image/text prompts in SciDocBench examples
  • Make evaluation item serialization JSON-safe
  • Add major-category score summaries
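The first item above, scoring on a cleaned answer while keeping the raw prediction for reasoning review, could look roughly like the sketch below. It is not taken from the PR diff: the `<think>` tag name, the `split_prediction` helper, and the returned field names are all assumptions made for illustration.

```python
import re

# Assumption: reasoning is wrapped in <think>...</think>. The PR text does not
# name the tag, so adjust the pattern to whatever the model actually emits.
_THINK_BLOCK = re.compile(r"<think>.*?</think>", flags=re.DOTALL | re.IGNORECASE)
_THINK_OPEN = re.compile(r"<think>", flags=re.IGNORECASE)


def split_prediction(raw: str) -> dict:
    """Hypothetical helper: keep the raw prediction and derive a tag-free answer."""
    # Remove every complete thinking block, including multi-line ones.
    answer = _THINK_BLOCK.sub("", raw)
    # If an opening tag was never closed, treat everything after it as reasoning.
    unclosed = _THINK_OPEN.search(answer)
    if unclosed:
        answer = answer[:unclosed.start()]
    return {"raw_prediction": raw, "answer": answer.strip()}


result = split_prediction("<think>step 1\nstep 2</think>\nThe answer is B.")
print(result["answer"])
```

Keeping both fields means the answer scorer only ever sees `answer`, while a separate reasoning review can still read `raw_prediction` unchanged.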

yuhangzang and others added 6 commits April 21, 2026 12:22
- Add SciDocBench dataset with VQA-style evaluation
- Skip DATASET_MD5 hash check to accommodate TSV updates
- Decouple reasoning evaluation from answer score
- Strip thinking tags for answer scoring while preserving raw predictions for reasoning review
- Support segmented image/text prompts in SciDocBench examples
- Make evaluation item serialization JSON-safe
- Add major-category score summaries
@mzr1996 added this pull request to the merge queue on Jun 12, 2026
@github-merge-queue (bot) removed this pull request from the merge queue due to no response for status checks on Jun 12, 2026