[Fix] Restore missing commas in extract_characters_regex prefix list #1560

Open
YoungZSh wants to merge 2 commits into open-compass:main from YoungZSh:fix/extract-characters-regex-missing-commas

Conversation

@YoungZSh

Summary

Two pairs of adjacent string literals in the answer_prefixes list inside
extract_characters_regex (vlmeval/dataset/utils/multiple_choice.py) are
missing commas, so Python silently concatenates them at parse time. Four
prefixes that were intended to be stripped before regex matching are
therefore never stripped:

  • 'The best option is' + 'The correct option is'
    -> 'The best option isThe correct option is'
  • 'Best answer:' + 'Best option:'
    -> 'Best answer:Best option:'

The two Best ... prefixes are the harmful pair, because the B in Best
is itself an option letter: a model that responds with e.g.
"Best answer: D" will have the leading B matched by
re.search(r'[ABCDE]', s) and be scored as B instead of D.
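
The failure mode can be reproduced in isolation. The sketch below is a simplified stand-in, not the actual body of extract_characters_regex: the helper name `first_option_letter` and the strip-by-`replace` step are assumptions made for illustration. Only the fused literals and the `re.search(r'[ABCDE]', s)` call come from the description above.

```python
import re

# Missing comma: Python fuses the two adjacent literals into ONE element.
buggy_prefixes = [
    'Best answer:'
    'Best option:',
]

# With the comma restored, each prefix is its own element.
fixed_prefixes = [
    'Best answer:',
    'Best option:',
]


def first_option_letter(s, prefixes):
    """Hypothetical simplification: strip known prefixes, then return
    the first option letter found in what remains."""
    for prefix in prefixes:
        s = s.replace(prefix, '')
    match = re.search(r'[ABCDE]', s)
    return match[0] if match else ''


print(buggy_prefixes)                                         # ['Best answer:Best option:']
print(first_option_letter('Best answer: D', buggy_prefixes))  # B
print(first_option_letter('Best answer: D', fixed_prefixes))  # D
```

With the fused element, nothing is stripped from "Best answer: D", so the search stops at the B of "Best".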

Fix

Add the two missing commas so each of the four prefixes is its own list
element and gets stripped as intended. No behaviour change for predictions
that did not already contain "Best answer:" / "Best option:" prefixes.
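
Because this class of bug is silent at parse time, a small check can help catch it in review. The following is a minimal sketch, not part of this PR: the function name `find_implicit_concats` is hypothetical, and it relies only on the standard-library `tokenize` module to flag a string literal that directly follows another one. It will also flag intentional implicit concatenation, so its output needs manual review.

```python
import io
import tokenize


def find_implicit_concats(source):
    """Return the line numbers of string literals that directly follow
    another string literal (ignoring line breaks and comments)."""
    hits = []
    prev_was_string = False
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.STRING:
            if prev_was_string:
                hits.append(tok.start[0])
            prev_was_string = True
        elif tok.type in (tokenize.NL, tokenize.COMMENT):
            # Non-logical newlines and comments do not separate literals.
            continue
        else:
            # A comma, bracket, or any other token ends the run.
            prev_was_string = False
    return hits


sample = (
    "answer_prefixes = [\n"
    "    'Best answer:'\n"
    "    'Best option:',\n"
    "    'Answer:',\n"
    "]\n"
)
print(find_implicit_concats(sample))  # [3]
```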

YoungZSh and others added 2 commits May 30, 2026 17:07
Two pairs of adjacent string literals in the answer_prefixes list
inside extract_characters_regex are missing commas, so Python silently
concatenates them at parse time. Four prefixes that were intended to be
stripped before regex matching are therefore never stripped:

- 'The best option is' + 'The correct option is'
  -> 'The best option isThe correct option is'
- 'Best answer:' + 'Best option:'
  -> 'Best answer:Best option:'

The two 'Best ...' prefixes are the harmful pair: the 'B' in 'Best' is
itself an option letter, so a model that responds with e.g.
'Best answer: D' has the leading 'B' matched by re.search(r'[ABCDE]', s)
and is scored as 'B' instead of 'D'.

Add the two missing commas so each of the four prefixes is its own list
element and gets stripped as intended.