[Fix] Restore missing commas in extract_characters_regex prefix list #1560
Open
YoungZSh wants to merge 2 commits into
Summary
Two pairs of adjacent string literals in the answer_prefixes list inside extract_characters_regex (vlmeval/dataset/utils/multiple_choice.py) are missing commas, so Python silently concatenates them at parse time. Four prefixes that were intended to be stripped before regex matching are therefore never stripped:
- 'The best option is' + 'The correct option is' → 'The best option isThe correct option is'
- 'Best answer:' + 'Best option:' → 'Best answer:Best option:'
The two 'Best ...' prefixes are the harmful pair, because the 'B' in 'Best' is itself an option letter: a model that responds with e.g. "Best answer: D" will have the leading 'B' matched by re.search(r'[ABCDE]', s) and be scored as 'B' instead of 'D'.
Fix
Add the two missing commas so each of the four prefixes is its own list element and gets stripped as intended. No behaviour change for predictions that did not already contain "Best answer:" / "Best option:" prefixes.