Promote KTO to stable API #6175
I'm a bit worried that this would break the git history: Both [...] We could make it a real rename instead: [...]

@qgallouedec, thanks for pointing this out. I agree with the concern in principle: this promotion creates a large history boundary for `kto_trainer.py`/`kto_config.py`. That said, since this PR will be squash-merged, I don’t think using [...] I tried the suggested approach anyway, but GitHub still shows additions/deletions rather than a rename.

Promote KTO to stable API.
Close #4786.
This PR promotes `KTOTrainer` and `KTOConfig` from the experimental API (`trl.experimental.kto`) to the stable API (`trl`). It updates all relevant documentation, scripts, and tests to use the new import paths, and deprecates the old experimental import with a warning. This change simplifies usage for end users and signals that the KTO API is now considered stable.

Changes

API Promotion and Deprecation:
`KTOConfig` and `KTOTrainer` classes now inherit from the stable API and emit a deprecation warning if imported from the experimental path, indicating users should switch to `from trl import ...`.

Documentation Updates:
`docs/source/kto_trainer.md`, `docs/source/paper_index.md`, `docs/source/reducing_memory_usage.md`, `docs/source/speeding_up_training.md`: All references and code examples now use the stable import path for `KTOTrainer` and `KTOConfig`.

Code and Script Updates:
`examples/scripts/kto.py`, `trl/scripts/kto.py`: Updated imports to use `from trl import KTOConfig, KTOTrainer` and related types.

Test Updates:
`tests/test_kto_trainer.py` (renamed from `tests/experimental/test_kto_trainer.py`): Updated to import from the stable API and moved to the main test directory, reflecting the stable status.

Note
Medium Risk
Large code move with import-path churn for downstream users still on `trl.experimental.kto`, though behavior is intended to be unchanged via delegation. KTO training touches reference models, KL batching, and PEFT paths, so regressions would affect alignment workflows.

Overview
KTO (`KTOTrainer`, `KTOConfig`) is promoted from `trl.experimental.kto` to the stable `trl` / `trl.trainer` surface. The full trainer and config implementations now live under `trl/trainer/`; the experimental modules are thin subclasses that delegate to the stable types and emit a `FutureWarning` (removal planned in v2.0.0).

Docs, examples (`examples/scripts/kto.py`, `trl/scripts/kto.py`), and memory/speed guides now show `from trl import KTOConfig, KTOTrainer`. The KTO trainer doc drops the experimental-only warning block. `tests/test_kto_trainer.py` imports the stable API and trainer helpers from `trl.trainer.kto_trainer`.

Reviewed by Cursor Bugbot for commit 021039c.
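
For readers unfamiliar with the "thin subclass plus `FutureWarning`" pattern mentioned in the overview, here is a minimal, self-contained sketch of the idea. It does not reproduce the code in this PR: `StableKTOConfig`, `ExperimentalKTOConfig`, and the `beta` field are hypothetical stand-ins, and emitting the warning at instantiation time (rather than at import time) is an assumption made for illustration.

```python
import warnings
from dataclasses import dataclass


@dataclass
class StableKTOConfig:
    """Hypothetical stand-in for the stable config class; the field is illustrative only."""

    beta: float = 0.1


class ExperimentalKTOConfig(StableKTOConfig):
    """Hypothetical shim: adds no behavior, only warns and delegates to the stable class."""

    def __init__(self, *args, **kwargs):
        # Assumption: warn when the deprecated class is instantiated.
        # stacklevel=2 attributes the warning to the caller's line, not this one.
        warnings.warn(
            "The experimental KTO import path is deprecated; "
            "use `from trl import KTOConfig` instead.",
            FutureWarning,
            stacklevel=2,
        )
        super().__init__(*args, **kwargs)
```

Because the shim only subclasses the stable type, `isinstance` checks against the stable class keep passing for code that still uses the old path. On the user side, the migration described in this PR amounts to replacing `from trl.experimental.kto import KTOConfig, KTOTrainer` with `from trl import KTOConfig, KTOTrainer`.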