Align experimental KTOTrainer docstring and signature with DPOTrainer #6183
qgallouedec wants to merge 2 commits into
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 7ef07980e6
    model: "str | PreTrainedModel | PeftModel",
    ref_model: PreTrainedModel | None = None,
    args: KTOConfig | None = None,
    data_collator: DataCollator | None = None,
Preserve positional KTOTrainer argument order
This moves data_collator ahead of train_dataset without making the following arguments keyword-only. Any existing script using the previous positional signature, for example KTOTrainer(model, ref_model, args, train_dataset), now binds the dataset to data_collator and leaves train_dataset as None, so initialization raises ValueError("train_dataset is required"); the public trl.KTOTrainer wrapper also forwards positional args directly. Please keep the old positional slots or add a compatibility shim before reordering.
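The binding problem described above can be reproduced without TRL. The sketch below uses two hypothetical stand-in functions, `old_init` and `new_init`, not the real `KTOTrainer.__init__`. Only the first four positional slots of the old signature are taken from the comment; the old position of `data_collator` is an assumption made for illustration.

```python
import inspect


# Hypothetical stand-ins, NOT the real TRL signatures.
# Old order: train_dataset is the fourth positional slot (per the review comment).
# The old position of data_collator is assumed here.
def old_init(model, ref_model=None, args=None, train_dataset=None, data_collator=None):
    ...


# New order: data_collator now occupies the fourth positional slot.
def new_init(model, ref_model=None, args=None, data_collator=None, train_dataset=None):
    ...


def bind_positional(func, *call_args):
    """Return the parameter-name -> value mapping Python would use for this call."""
    bound = inspect.signature(func).bind(*call_args)
    bound.apply_defaults()
    return dict(bound.arguments)


# The same positional call, bound against both signatures.
call = ("model", "ref_model", "kto_config", "my_dataset")
old_binding = bind_positional(old_init, *call)
new_binding = bind_positional(new_init, *call)

print(old_binding["train_dataset"])  # my_dataset
print(new_binding["train_dataset"])  # None
print(new_binding["data_collator"])  # my_dataset
```

With the new order, the dataset silently lands in `data_collator` and `train_dataset` stays at its default, which is the condition that triggers the `ValueError` mentioned in the comment.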
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
Make the experimental KTOTrainer mirror DPOTrainer as closely as possible.
- Class docstring restructured to match DPO's, with KTO-specific content (kto-mix-14k example, unpaired-preference dataset type, KTOConfig, and the DataCollatorForUnpairedPreference / DataCollatorForVisionUnpairedPreference defaults).
- Add an Example block.
- Reorder the __init__ signature and docstring args to DPO's order: model, ref_model, args, data_collator, train_dataset, eval_dataset, processing_class, compute_metrics, callbacks, optimizers, peft_config.

No behavior change. All call sites pass arguments by keyword (only model/ref_model are positional), so the reorder is safe.

Note
Low Risk
Docstring and signature reorder only; no logic changes. Positional callers beyond model/ref_model could theoretically break, but the PR states keyword usage at call sites.
Overview
Documentation-only alignment for experimental KTOTrainer with DPOTrainer: no training or runtime behavior changes.

The class docstring is expanded to match DPO's structure: a KTO-specific intro (paper link), a runnable example using trl-lib/kto-mix-14k, and richer Args text for data_collator (default unpaired / vision collators), unpaired dataset formats, compute_metrics (including batch_eval_metrics), callbacks, and optimizers.

The __init__ parameter order is changed to mirror DPO: data_collator moves before the datasets; compute_metrics moves before callbacks; type hints for optimizers now allow None in the tuple elements. Call sites are expected to use keywords, so reordering is safe.

Reviewed by Cursor Bugbot for commit a76851d.
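One way to enforce the keyword expectation, rather than rely on it, is a keyword-only marker after the positional parameters. The following is a minimal sketch with a hypothetical `make_trainer` function, not the actual change in this PR. It assumes only `model` and `ref_model` remain positional, as the PR description states for existing call sites.

```python
# Hypothetical stand-in for a trainer constructor; NOT the real KTOTrainer.
# The bare `*` makes every parameter after ref_model keyword-only, so a
# stale positional call fails loudly instead of binding to the wrong slot.
def make_trainer(model, ref_model=None, *, args=None, data_collator=None, train_dataset=None):
    if train_dataset is None:
        raise ValueError("train_dataset is required")
    return {
        "model": model,
        "ref_model": ref_model,
        "args": args,
        "data_collator": data_collator,
        "train_dataset": train_dataset,
    }


# Keyword call: the order of the keyword arguments does not matter.
ok = make_trainer("model", "ref_model", train_dataset="my_dataset", args="kto_config")

# Positional call in the old style: rejected with a TypeError at call time.
try:
    make_trainer("model", "ref_model", "kto_config", "my_dataset")
    positional_error = None
except TypeError as exc:
    positional_error = exc
```

The trade-off is that this turns a silent mis-binding into an immediate `TypeError` for any caller that still passes `args` or the dataset positionally, so it is itself a breaking change for those callers.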