feat(grpo_trainer.py): STARE — Surprisal-guided Token-Level Advantage Reweighting #6167
Open
smellslikeml wants to merge 2 commits into