Add entropy regularization to GRPO #6140
Open
albertvillanova wants to merge 42 commits into main from worktree-fix-3320
+369 −0
Commits
ac50a11 Add fields to GRPOConfig (albertvillanova)
dcaaf67 Add init fields to GRPOTrainer (albertvillanova)
0f6306e Update _compute_loss (albertvillanova)
9b1cc65 Add checkpoint persistence (albertvillanova)
e944713 Update GRPO docs (albertvillanova)
f47d5a5 Add tests (albertvillanova)
2484e70 Address issues from review (albertvillanova)
4507747 Fix wrong entropy for adaptive control (albertvillanova)
9b70a4a Fix Liger skips adaptive entropy guard (albertvillanova)
9d79e4a Fix inconsistent inequality (albertvillanova)
46c8a64 Fix mean reduction with sum-count-divide (albertvillanova)
3f7a669 Set _last_world_entropy at init (albertvillanova)
a05c979 Cache world_entropy at sync point and use that cached value for apply… (albertvillanova)
fe03dd1 Persist also _last_world_entropy (albertvillanova)
f099349 Add paper_index entry (albertvillanova)
5288cd5 Capture the pure policy loss before normalization (albertvillanova)
03f4208 Fix luspo loss (albertvillanova)
dbc0c75 Gate policy_loss logging and align style (albertvillanova)
391da7a Merge remote-tracking branch 'upstream/main' into worktree-fix-3320 (albertvillanova)
506fbf9 Fix entropy state written to wrong path (albertvillanova)
8a6b53d Fix is_world_process_zero() vs args.should_save guard mismatch (albertvillanova)
474b30c Update docs: policy_loss only logged inside entropy block (albertvillanova)
a0b9ec6 Log entropy_coef only when sync_gradients=True (albertvillanova)
608b1e0 Add guard for entropy-loss dispatch matching policy-loss dispatch (albertvillanova)
81841ad Remove entropy_loss (albertvillanova)
bee5126 Gate on train mode to avoid entropy state update during eval (albertvillanova)
5c442a0 Merge remote-tracking branch 'upstream/main' into worktree-fix-3320 (albertvillanova)
2f34d15 Fix entropy bonus ignores quantile mask (albertvillanova)
806078d Use effective_mask for the world_entropy all-reduce too (albertvillanova)
2845ef4 Update docs (albertvillanova)
2ed11c0 Use unified formula with mean per-token entropy of active tokens (albertvillanova)
7f0562b Merge remote-tracking branch 'upstream/main' into worktree-fix-3320 (albertvillanova)
76255d3 Make three-branch entropy-loss split (albertvillanova)
fc76d4b Compute bonus from frozen state, update per optimizer step (albertvillanova)
bed5188 Fix "nearly always triggers" docs (albertvillanova)
6e8f498 Add scale test and grad-accumulation adaptive test (albertvillanova)
607d911 Fix dr_grpo entropy scale mismatch (albertvillanova)
0cfad37 Accumulate to mean per-token entropy, independent of how each loss ty… (albertvillanova)
8e05132 Update tests (albertvillanova)
f15e04a Merge remote-tracking branch 'upstream/main' into worktree-fix-3320 (albertvillanova)
bccd8eb Add clarifying sentence (albertvillanova)
0f3e145 Merge branch 'main' into worktree-fix-3320 (qgallouedec)