Promote KTO to stable API #6175

Open
albertvillanova wants to merge 10 commits into main from align-kto-dpo-stable

albertvillanova commented Jun 25, 2026

Member

Promote KTO to stable API.

Close #4786.

This PR promotes the KTOTrainer and KTOConfig from the experimental API (trl.experimental.kto) to the stable API (trl). It updates all relevant documentation, scripts, and tests to use the new import paths, and deprecates the old experimental import with a warning. This change simplifies usage for end users and signals that the KTO API is now considered stable.

Changes

API Promotion and Deprecation:

  • The experimental KTOConfig and KTOTrainer classes now inherit from the stable API and emit a deprecation warning if imported from the experimental path, indicating users should switch to from trl import ....
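
The shim pattern described above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the PR's actual code: `KTOConfig` here is a stand-in for the stable class, `ExperimentalKTOConfig` is a hypothetical name for the experimental subclass, and emitting the warning in `__init__` (rather than at module import time) is an assumption.

```python
import warnings


class KTOConfig:
    """Stand-in for the stable config class (the real one lives in trl)."""

    def __init__(self, beta: float = 0.1):
        self.beta = beta


class ExperimentalKTOConfig(KTOConfig):
    """Hypothetical shim: delegates to the stable class and adds a warning."""

    def __init__(self, *args, **kwargs):
        # Assumption: the warning fires on instantiation; the message text is illustrative.
        warnings.warn(
            "KTOConfig from the experimental path is deprecated; "
            "use `from trl import KTOConfig` instead.",
            FutureWarning,
            stacklevel=2,
        )
        super().__init__(*args, **kwargs)


# Usage: the shim behaves like the stable class but records one FutureWarning.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    config = ExperimentalKTOConfig(beta=0.2)
```

Because the shim subclasses the stable type, `isinstance` checks against the stable class keep working for code that still uses the old path.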

Documentation Updates:

  • docs/source/kto_trainer.md, docs/source/paper_index.md, docs/source/reducing_memory_usage.md, docs/source/speeding_up_training.md: All references and code examples now use the stable import path for KTOTrainer and KTOConfig.
  • The warning about KTO being experimental has been removed.

Code and Script Updates:

  • examples/scripts/kto.py, trl/scripts/kto.py: Updated imports to use from trl import KTOConfig, KTOTrainer and related types.

Test Updates:

  • tests/test_kto_trainer.py (renamed from tests/experimental/test_kto_trainer.py): Updated to import from the stable API and moved to the main test directory, reflecting the stable status.

Note

Medium Risk
Large code move with import-path churn for downstream users still on trl.experimental.kto, though behavior is intended to be unchanged via delegation. KTO training touches reference models, KL batching, and PEFT paths, so regressions would affect alignment workflows.
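
One way a downstream project might surface lingering uses of the deprecated path is to escalate `FutureWarning` to an error in its test suite. The sketch below uses only the standard library; both loader functions are hypothetical stand-ins, and the warning text is an assumption rather than the message this PR emits.

```python
import warnings


def load_from_experimental_path():
    """Hypothetical stand-in for code that still uses the deprecated import."""
    warnings.warn(
        "trl.experimental.kto is deprecated; use `from trl import KTOTrainer`.",
        FutureWarning,
        stacklevel=2,
    )
    return "KTOTrainer"


def load_from_stable_path():
    """Hypothetical stand-in for code that already uses the stable import."""
    return "KTOTrainer"


def triggers_future_warning(fn) -> bool:
    """Return True if calling fn emits a FutureWarning."""
    with warnings.catch_warnings():
        # Turn FutureWarning into an exception only inside this block.
        warnings.simplefilter("error", FutureWarning)
        try:
            fn()
        except FutureWarning:
            return True
    return False
```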

Overview
KTO (KTOTrainer, KTOConfig) is promoted from trl.experimental.kto to the stable trl / trl.trainer surface. The full trainer and config implementations now live under trl/trainer/; the experimental modules are thin subclasses that delegate to the stable types and emit a FutureWarning (removal planned in v2.0.0).

Docs, examples (examples/scripts/kto.py, trl/scripts/kto.py), and memory/speed guides now show from trl import KTOConfig, KTOTrainer. The KTO trainer doc drops the experimental-only warning block. tests/test_kto_trainer.py imports the stable API and trainer helpers from trl.trainer.kto_trainer.

Reviewed by Cursor Bugbot for commit 021039c.

@bot-ci-comment

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@qgallouedec

Member

I'm a bit worried that this would break the git history:

Both trl/trainer/kto_trainer.py and trl/experimental/kto/kto_trainer.py already exist on main. This PR swaps their contents rather than renaming a file, so git sees ~1500 lines deleted from one file and ~1500 lines added to the other as independent edits. No rename is detected, and git blame / git log --follow on the promoted implementation will collapse to this one commit.

We could make it a real rename instead: git mv the experimental implementation onto trl/trainer/kto_trainer.py (removing the old shim first), then add the new experimental shim as a fresh file. That lets git detect the rename and preserves blame history. Same applies to kto_config.py.

albertvillanova commented Jun 26, 2026

Member Author

@qgallouedec, thanks for pointing this out. I agree with the concern in principle: this promotion creates a large history boundary for kto_trainer.py/kto_config.py.

That said, since this PR will be squash-merged, I don’t think using git mv inside the branch would materially change the final repository history. Git does not store renames explicitly; rename detection is inferred from the final diff. Because both files already exist on main and both still exist after the PR, the squashed commit would still look like a large rewrite/content swap rather than a clean rename.

I tried the suggested approach anyway, but GitHub still shows additions/deletions rather than a rename.
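
The point about rename inference can be illustrated with a deliberately simplified model. This is a sketch, not git's actual algorithm: it assumes renames are only ever inferred by pairing a path that disappeared with a path that appeared, and it uses `difflib.SequenceMatcher` as a stand-in for git's similarity index. `detect_renames` and the toy file contents are hypothetical.

```python
from difflib import SequenceMatcher


def detect_renames(
    before: dict[str, str], after: dict[str, str], threshold: float = 0.5
) -> list[tuple[str, str]]:
    """Pair deleted paths with added paths whose contents are similar enough."""
    deleted = [path for path in before if path not in after]
    added = [path for path in after if path not in before]
    renames = []
    for old in deleted:
        for new in added:
            similarity = SequenceMatcher(None, before[old], after[new]).ratio()
            if similarity >= threshold:
                renames.append((old, new))
    return renames


implementation = "class KTOTrainer:\n    def train(self):\n        pass\n"
shim = "import warnings\n"

# Content swap: both paths exist before and after, so no path is deleted or added.
swap_before = {
    "trl/trainer/kto_trainer.py": shim,
    "trl/experimental/kto/kto_trainer.py": implementation,
}
swap_after = {
    "trl/trainer/kto_trainer.py": implementation,
    "trl/experimental/kto/kto_trainer.py": shim,
}
swap_renames = detect_renames(swap_before, swap_after)

# Clean move: the old path disappears and a new path appears with the same content.
move_renames = detect_renames(
    {"trl/experimental/kto/kto_trainer.py": implementation},
    {"trl/trainer/kto_trainer.py": implementation},
)
```

In this model the swap yields no rename candidates at all, regardless of how the intermediate commits were made, while the clean move is paired as a rename.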

Development

Successfully merging this pull request may close these issues.

KTO refactoring
