Align data collators across DPO / SFT / Reward / KTO by qgallouedec · Pull Request #6178 · huggingface/trl

qgallouedec · 2026-06-25T15:33:44Z

Consistency pass over the data collators so the same thing is written the same way everywhere. No behavior change.

Docstrings: unified the intro lines, the argument wording (max_length, truncation_mode, return_tensors), and the "# special case for Qwen2.5-VL" comment. Added the missing Examples blocks to the KTO collators.
Naming: in KTO, batch → output, ex → example, torch.int64 → torch.long.
Structure: the KTO vision collator now checks "token_type_ids" in processed_prompts inline (dropping the has_tti/has_mm_tti locals), so its flush-left / truncate / output blocks match DPO/SFT word-for-word; simplified the mm_token_type_ids handling to match.
Fixed a misplaced comma in the repeated BOS comment.
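
The structural change to the KTO vision collator can be sketched roughly as follows. This is an illustration only, not the TRL implementation: `build_output_with_locals` and `build_output_inline` are hypothetical helpers, and `processed_prompts` is assumed to be a plain dict standing in for the processor output.

```python
def build_output_with_locals(processed_prompts):
    # Old style: cache the membership test in a local flag, then branch on it.
    has_tti = "token_type_ids" in processed_prompts
    output = {"input_ids": processed_prompts["input_ids"]}
    if has_tti:
        output["token_type_ids"] = processed_prompts["token_type_ids"]
    return output


def build_output_inline(processed_prompts):
    # New style: test for the optional key inline, right where it is used.
    output = {"input_ids": processed_prompts["input_ids"]}
    if "token_type_ids" in processed_prompts:
        output["token_type_ids"] = processed_prompts["token_type_ids"]
    return output


# Hypothetical processor outputs, with and without the optional key.
with_tti = {"input_ids": [[1, 2, 3]], "token_type_ids": [[0, 0, 1]]}
without_tti = {"input_ids": [[1, 2, 3]]}

# Both styles build the same output in either case.
same_with = build_output_with_locals(with_tti) == build_output_inline(with_tti)
same_without = build_output_with_locals(without_tti) == build_output_inline(without_tti)
print(same_with, same_without)
```

The inline form carries no extra state between the check and its use, which is what lets the blocks read the same across collators.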

Deeper output-key naming/semantics (e.g. KTO's completion_input_ids holds prompt+completion) is left for a follow-up PR.

Note

Low Risk
Collator-only refactor with stated no behavior change; the mm_token_type_ids merge simplification could affect edge-case VLMs if completion tensors previously carried non-zero mm types.
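
To make the edge case concrete, here is a rough sketch that uses plain Python lists in place of tensors. The helper names and sample values are hypothetical, and the real collators work on batched tensors; the sketch only shows when the two strategies described in this summary could diverge.

```python
def merge_processor_types(prompt_mm, completion_mm):
    # Previous approach, as described: append the completion-side types
    # returned by the processor.
    return prompt_mm + completion_mm


def zero_completion_types(prompt_mm, completion_ids):
    # New approach, as described: the completion side is all zeros,
    # one entry per completion token (the list analogue of zeros_like).
    return prompt_mm + [0] * len(completion_ids)


prompt_mm = [0, 1, 1, 0]          # hypothetical prompt with two image tokens
completion_ids = [11, 12, 13]     # hypothetical completion token ids

text_only_completion_mm = [0, 0, 0]   # the usual case: completion is pure text
unusual_completion_mm = [0, 1, 0]     # the edge case: a non-zero type in the completion

same_for_text = (
    merge_processor_types(prompt_mm, text_only_completion_mm)
    == zero_completion_types(prompt_mm, completion_ids)
)
differs_for_unusual = (
    merge_processor_types(prompt_mm, unusual_completion_mm)
    != zero_completion_types(prompt_mm, completion_ids)
)
print(same_for_text, differs_for_unusual)
```

When completions are text-only, both strategies agree; they differ only if the processor ever reported a non-zero type on the completion side.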

Overview
Consistency pass across DPO, SFT, and KTO data collators so docstrings, naming, and vision collator control flow match; the PR description states no intended behavior change.

Documentation: Unified max_length, truncation_mode, and return_tensors wording; expanded KTO text and vision collator docs with Examples blocks; reordered DPO vision output key list; clarified SFT max_length as truncate-before-pad for text collators.

KTO text collator: Renames batch → output, ex → example; uses torch.long instead of explicit int64 on tensors; adds inline Truncate / Pad comments aligned with DPO.

Vision collators (KTO, DPO, SFT): Replaces has_tti / has_mm_tti locals with inline "token_type_ids" in processed_prompts checks; aligns flush-left, truncate, and output blocks with DPO/SFT; sets completion-side mm_token_type_ids via torch.zeros_like(completion_ids) instead of merging processor completion mm_token_type_ids (KTO KL path similarly); fixes BOS comment typo (BOS, twice → BOS twice); DPO adds a Truncate if necessary comment before the vision truncation block.
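
The truncate-before-pad ordering mentioned for max_length can be illustrated with a small list-based sketch. `collate_text` is a hypothetical helper, not a TRL function; right-side padding, keeping the start of each sequence on truncation, and a pad id of 0 are all assumptions made for the example.

```python
def collate_text(examples, max_length=None, pad_token_id=0):
    # Step 1: truncate each sequence to max_length (assumed: keep the start).
    if max_length is not None:
        examples = [ids[:max_length] for ids in examples]
    # Step 2: pad to the longest remaining sequence (assumed: pad on the right).
    longest = max(len(ids) for ids in examples)
    input_ids = [ids + [pad_token_id] * (longest - len(ids)) for ids in examples]
    attention_mask = [[1] * len(ids) + [0] * (longest - len(ids)) for ids in examples]
    return {"input_ids": input_ids, "attention_mask": attention_mask}


output = collate_text([[5, 6, 7, 8, 9], [1, 2]], max_length=3)
print(output["input_ids"])       # [[5, 6, 7], [1, 2, 0]]
print(output["attention_mask"])  # [[1, 1, 1], [1, 1, 0]]
```

Because truncation runs first, the padded width never exceeds max_length, and short sequences are padded only up to the longest truncated sequence in the batch.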

Reviewed by Cursor Bugbot for commit 640f53b. Bugbot is set up for automated code reviews on this repo.

bot-ci-comment · 2026-06-25T15:37:03Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Align data collators across DPO / SFT / Reward / KTO

ab57c31

Merge branch 'main' into align-collators

d86cf14

qgallouedec requested review from AmineDiro, albertvillanova and kashif June 25, 2026 22:32

Merge branch 'main' into align-collators

640f53b

qgallouedec wants to merge 3 commits into main from align-collators
