Align experimental KTOTrainer docstring and signature with DPOTrainer by qgallouedec · Pull Request #6183 · huggingface/trl

qgallouedec · 2026-06-25T16:36:29Z

Make the experimental KTOTrainer mirror DPOTrainer as closely as possible.

Rewrite the class docstring to match DPO word-for-word, diverging only where KTO genuinely differs (KTO intro/paper, kto-mix-14k example, unpaired-preference dataset type, KTOConfig, and the DataCollatorForUnpairedPreference / DataCollatorForVisionUnpairedPreference defaults).
Add the DPO-style intro paragraph and runnable Example block.
Reorder the __init__ signature and docstring args to DPO's order: model, ref_model, args, data_collator, train_dataset, eval_dataset, processing_class, compute_metrics, callbacks, optimizers, peft_config.

No behavior change. All call sites pass arguments by keyword (only model/ref_model are positional), so the reorder is safe.

Note

Low Risk
Docstring and signature reorder only; no logic changes. Positional callers beyond model/ref_model could theoretically break, but the PR states keyword usage at call sites.

Overview
Documentation-only alignment for experimental KTOTrainer with DPOTrainer: no training or runtime behavior changes.

The class docstring is expanded to match DPO’s structure—KTO-specific intro (paper link), a runnable example using trl-lib/kto-mix-14k, and richer Args text for data_collator (default unpaired / vision collators), unpaired dataset formats, compute_metrics (including batch_eval_metrics), callbacks, and optimizers.

__init__ parameter order is reordered to mirror DPO: data_collator moves before datasets; compute_metrics moves before callbacks; type hints for optimizers now allow None in the tuple elements. Call sites are expected to use keywords, so reordering is safe.

^{Reviewed by Cursor Bugbot for commit a76851d. Bugbot is set up for automated code reviews on this repo. Configure here.}

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 7ef07980e6

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-06-25T16:38:26Z

        model: "str | PreTrainedModel | PeftModel",
        ref_model: PreTrainedModel | None = None,
        args: KTOConfig | None = None,
+        data_collator: DataCollator | None = None,


Preserve positional KTOTrainer argument order

This moves data_collator ahead of train_dataset without making the following arguments keyword-only. Any existing script using the previous positional signature, for example KTOTrainer(model, ref_model, args, train_dataset), now binds the dataset to data_collator and leaves train_dataset as None, so initialization raises ValueError("train_dataset is required"); the public trl.KTOTrainer wrapper also forwards positional args directly. Please keep the old positional slots or add a compatibility shim before reordering.

Useful? React with 👍 / 👎.

bot-ci-comment · 2026-06-25T16:39:48Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Align experimental KTOTrainer docstring and signature with DPOTrainer

7ef0798

chatgpt-codex-connector Bot reviewed Jun 25, 2026

View reviewed changes

qgallouedec requested review from AmineDiro, albertvillanova and kashif June 25, 2026 22:30

Merge branch 'main' into align-docstring-and-args

a76851d

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Align experimental KTOTrainer docstring and signature with DPOTrainer#6183

Align experimental KTOTrainer docstring and signature with DPOTrainer#6183
qgallouedec wants to merge 2 commits into
mainfrom
align-docstring-and-args

qgallouedec commented Jun 25, 2026 •

edited by cursor Bot

Loading

Uh oh!

chatgpt-codex-connector Bot left a comment

Uh oh!

chatgpt-codex-connector Bot Jun 25, 2026

Uh oh!

bot-ci-comment Bot commented Jun 25, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Uh oh!

Conversation

qgallouedec commented Jun 25, 2026 • edited by cursor Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector Bot Jun 25, 2026

Choose a reason for hiding this comment

Uh oh!

bot-ci-comment Bot commented Jun 25, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

qgallouedec commented Jun 25, 2026 •

edited by cursor Bot

Loading