Skip to content

Align experimental KTOTrainer docstring and signature with DPOTrainer#6183

Open
qgallouedec wants to merge 2 commits into
mainfrom
align-docstring-and-args
Open

Align experimental KTOTrainer docstring and signature with DPOTrainer#6183
qgallouedec wants to merge 2 commits into
mainfrom
align-docstring-and-args

Conversation

@qgallouedec

@qgallouedec qgallouedec commented Jun 25, 2026

Copy link
Copy Markdown
Member

Make the experimental KTOTrainer mirror DPOTrainer as closely as possible.

  • Rewrite the class docstring to match DPO word-for-word, diverging only where KTO genuinely differs (KTO intro/paper, kto-mix-14k example, unpaired-preference dataset type, KTOConfig, and the DataCollatorForUnpairedPreference / DataCollatorForVisionUnpairedPreference defaults).
  • Add the DPO-style intro paragraph and runnable Example block.
  • Reorder the __init__ signature and docstring args to DPO's order: model, ref_model, args, data_collator, train_dataset, eval_dataset, processing_class, compute_metrics, callbacks, optimizers, peft_config.

No behavior change. All call sites pass arguments by keyword (only model/ref_model are positional), so the reorder is safe.


Note

Low Risk
Docstring and signature reorder only; no logic changes. Positional callers beyond model/ref_model could theoretically break, but the PR states keyword usage at call sites.

Overview
Documentation-only alignment for experimental KTOTrainer with DPOTrainer: no training or runtime behavior changes.

The class docstring is expanded to match DPO’s structure—KTO-specific intro (paper link), a runnable example using trl-lib/kto-mix-14k, and richer Args text for data_collator (default unpaired / vision collators), unpaired dataset formats, compute_metrics (including batch_eval_metrics), callbacks, and optimizers.

__init__ parameter order is reordered to mirror DPO: data_collator moves before datasets; compute_metrics moves before callbacks; type hints for optimizers now allow None in the tuple elements. Call sites are expected to use keywords, so reordering is safe.

Reviewed by Cursor Bugbot for commit a76851d. Bugbot is set up for automated code reviews on this repo. Configure here.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 7ef07980e6

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

model: "str | PreTrainedModel | PeftModel",
ref_model: PreTrainedModel | None = None,
args: KTOConfig | None = None,
data_collator: DataCollator | None = None,

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Preserve positional KTOTrainer argument order

This moves data_collator ahead of train_dataset without making the following arguments keyword-only. Any existing script using the previous positional signature, for example KTOTrainer(model, ref_model, args, train_dataset), now binds the dataset to data_collator and leaves train_dataset as None, so initialization raises ValueError("train_dataset is required"); the public trl.KTOTrainer wrapper also forwards positional args directly. Please keep the old positional slots or add a compatibility shim before reordering.

Useful? React with 👍 / 👎.

@bot-ci-comment

Copy link
Copy Markdown

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant