Support PEFT with Liger in DPO by albertvillanova · Pull Request #6159 · huggingface/trl

albertvillanova · 2026-06-24T07:26:12Z

Support PEFT models with use_liger_kernel=True in DPOTrainer.

This PR enables PEFT models (e.g. LoRA) to be used with use_liger_kernel=True in DPOTrainer, lifting the blanket NotImplementedError that previously blocked all PEFT+Liger combinations.

Motivation

The Liger fused DPO loss bypasses the model's forward() and multiplies hidden states by lm_head.weight directly. The previous guard raised NotImplementedError for any PEFT model, but this was too broad: the only genuinely incompatible case is when lm_head itself is wrapped by a PEFT adapter (e.g. "lm_head" in target_modules), because then lm_head.weight is the frozen base weight and the adapter delta is silently ignored. When lm_head is not adapted, PEFT+Liger works correctly.
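To make the incompatible case concrete, here is a minimal pure-Python sketch of why reading the head weight directly drops the adapter contribution. `ToyLoraHead` and `matvec` are hypothetical stand-ins written for this illustration; they are not PEFT or Liger classes, and the numbers are arbitrary.

```python
def matvec(matrix, vector):
    """Multiply a (rows x cols) matrix by a vector of length cols."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


class ToyLoraHead:
    """Hypothetical stand-in for an lm_head wrapped by a LoRA adapter."""

    def __init__(self, weight, lora_a, lora_b):
        self.weight = weight  # frozen base weight, shape (vocab, hidden)
        self.lora_a = lora_a  # trainable, shape (rank, hidden)
        self.lora_b = lora_b  # trainable, shape (vocab, rank)

    def forward(self, hidden):
        # Regular path: base projection plus the low-rank adapter delta.
        base = matvec(self.weight, hidden)
        delta = matvec(self.lora_b, matvec(self.lora_a, hidden))
        return [b + d for b, d in zip(base, delta)]


head = ToyLoraHead(
    weight=[[1.0, 0.0], [0.0, 1.0]],
    lora_a=[[1.0, 1.0]],
    lora_b=[[0.5], [-0.5]],
)
hidden = [2.0, 3.0]

logits_forward = head.forward(hidden)       # includes the adapter delta
logits_fused = matvec(head.weight, hidden)  # weight-only read, delta is dropped
```

The two results differ, which is the silent mismatch the targeted guard is meant to prevent. When the head is not adapted there is no delta, so both paths agree.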

Changes

Compatibility and Error Handling

Added a check to prevent using use_liger_kernel=True when a PEFT adapter is applied to lm_head, raising a clear error if this unsupported configuration is detected. This avoids silent failures where the head adapter would not be trained.
Imported BaseTunerLayer from peft.tuners.tuners_utils to enable the above compatibility check.

Model Unwrapping and Reference Handling

Added a PEFT double-unwrap (model = model.base_model.model) in _compute_loss_liger before backbone resolution, mirroring GRPOTrainer._get_last_hidden_state.
Handled self.ref_model is None in _compute_loss_liger (PEFT with no explicit reference model) by recovering reference behaviour via adapter disabling/switching, consistent with the existing _compute_loss logit path.
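The two items above can be sketched with toy objects. This is an illustration under stated assumptions, not the code in the PR: every `Toy*` class and the `hidden_states` helper are hypothetical, the adapter effect is faked as a constant shift, and only the `base_model.model` nesting and the idea of a `disable_adapter()` context manager are taken from how PEFT models are described here. Adapter switching to a dedicated reference adapter is not modelled.

```python
from contextlib import contextmanager


class ToyBackbone:
    """Hypothetical backbone; the 'adapter' shifts every hidden value by 1.0."""

    def __init__(self):
        self.adapter_enabled = True

    def __call__(self, token_ids):
        shift = 1.0 if self.adapter_enabled else 0.0
        return [float(t) + shift for t in token_ids]


class ToyCausalLM:
    def __init__(self):
        self.model = ToyBackbone()  # backbone that produces hidden states


class ToyTuner:
    def __init__(self, causal_lm):
        self.model = causal_lm


class ToyPeftModel:
    """Mimics the wrapper nesting: peft_model.base_model.model is the causal LM."""

    def __init__(self, causal_lm):
        self.base_model = ToyTuner(causal_lm)

    @contextmanager
    def disable_adapter(self):
        backbone = self.base_model.model.model
        backbone.adapter_enabled = False
        try:
            yield
        finally:
            backbone.adapter_enabled = True  # always restore the adapter


def hidden_states(model, token_ids, ref_model=None):
    # Double-unwrap the PEFT wrapper before resolving the backbone.
    unwrapped = model.base_model.model if isinstance(model, ToyPeftModel) else model
    policy = unwrapped.model(token_ids)
    if ref_model is not None:
        reference = ref_model.model(token_ids)
    else:
        # No explicit reference model: run the same weights with the adapter off.
        with model.disable_adapter():
            reference = unwrapped.model(token_ids)
    return policy, reference


peft_model = ToyPeftModel(ToyCausalLM())
policy, reference = hidden_states(peft_model, [1, 2])
```

In this sketch the reference pass reuses the policy's base weights, so no second model copy is held in memory; the adapter is re-enabled even if the reference pass raises.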

Note

Medium Risk
Changes core DPO training loss paths for PEFT+Liger; incorrect reference or backbone unwrapping could skew gradients, though tests and explicit guards reduce silent failure risk.

Overview
DPOTrainer now allows use_liger_kernel=True with PEFT when the setup is actually safe, instead of rejecting every PEFT model.

The old blanket NotImplementedError is gone. Init-time ValueError checks block lm_head in target_modules (Liger reads frozen lm_head.weight and would skip head LoRA) and prompt-learning PEFT (Liger bypasses PeftModel.forward(), so virtual tokens never apply).
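A rough sketch of the two init-time checks described above, using stand-in classes so it runs without PEFT installed. `ToyTunerLayer` stands in for `BaseTunerLayer`, `ToyPeftConfig` and `check_liger_peft_compat` are hypothetical names, and the `is_prompt_learning` flag is assumed to mirror the attribute of the same name on PEFT configs. The final guard in the PR may be implemented differently.

```python
class ToyTunerLayer:
    """Stand-in for peft.tuners.tuners_utils.BaseTunerLayer (assumption)."""


class ToyLinear:
    """Stand-in for a plain, non-adapted output projection."""


class ToyPeftConfig:
    """Hypothetical config exposing only the flag this sketch needs."""

    def __init__(self, is_prompt_learning=False):
        self.is_prompt_learning = is_prompt_learning


def check_liger_peft_compat(output_embeddings, peft_config, use_liger_kernel):
    """Raise ValueError for the PEFT setups the fused loss path cannot honour."""
    if not use_liger_kernel:
        return  # the standard logit path goes through forward(), nothing to check
    if peft_config.is_prompt_learning:
        # The fused path skips the wrapper's forward(), so virtual tokens never apply.
        raise ValueError("use_liger_kernel=True is incompatible with prompt-learning PEFT.")
    if isinstance(output_embeddings, ToyTunerLayer):
        # The fused path reads the frozen head weight, so a head adapter is ignored.
        raise ValueError("use_liger_kernel=True is incompatible with a PEFT adapter on lm_head.")


def raises_value_error(*args):
    """Small helper for the usage below: True if the check rejects the setup."""
    try:
        check_liger_peft_compat(*args)
    except ValueError:
        return True
    return False


lora_ok = raises_value_error(ToyLinear(), ToyPeftConfig(), True)
head_adapted = raises_value_error(ToyTunerLayer(), ToyPeftConfig(), True)
prompt_tuning = raises_value_error(ToyLinear(), ToyPeftConfig(is_prompt_learning=True), True)
liger_off = raises_value_error(ToyTunerLayer(), ToyPeftConfig(), False)
```

Only the two unsafe combinations are rejected; LoRA on other modules passes, and nothing is checked when the Liger kernel is off.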

_compute_loss_liger unwraps PEFT via model.base_model.model, and when there is no separate ref_model it builds reference hidden states using the same use_adapter / ref-adapter path as the standard DPO loss.

Tests cover the new error cases and an end-to-end LoRA (no lm_head) + Liger training run.

Reviewed by Cursor Bugbot for commit 5cb1748. Bugbot is set up for automated code reviews on this repo.

bot-ci-comment · 2026-06-24T07:29:12Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Copilot

Pull request overview

Enables using use_liger_kernel=True with PEFT (e.g., LoRA) in DPOTrainer by replacing the previous blanket rejection with targeted compatibility checks and by extending the Liger reference-computation path to work when ref_model is not instantiated for PEFT.

Changes:

Replace the PEFT+Liger blanket NotImplementedError with a targeted validation that rejects PEFT adapters applied to lm_head (to avoid silently ignoring head adapters in the fused loss path).
Update _compute_loss_liger to unwrap PEFT models before backbone execution and to compute reference hidden states via adapter disabling/switching when ref_model is None.
Remove the test that asserted PEFT+Liger init always fails.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

File	Description
`trl/trainer/dpo_trainer.py`	Adds PEFT+Liger compatibility checks and extends Liger loss computation to support PEFT reference behavior.
`tests/test_dpo_trainer.py`	Removes the outdated PEFT+Liger init-failure test (needs replacement coverage for the new behavior).

albertvillanova · 2026-06-26T06:10:45Z

-                peft_config=LoraConfig(),
-            )
-
    def test_train_with_iterable_dataset(self):


I'm adding tests.

albertvillanova · 2026-06-26T06:08:44Z

+                output_embeddings = model.get_output_embeddings()
+                if isinstance(output_embeddings, BaseTunerLayer):
+                    raise ValueError(
+                        "`use_liger_kernel=True` is incompatible with applying a PEFT adapter to `lm_head`. The Liger "
+                        "fused DPO loss reads `lm_head.weight` directly, so the adapter on the head is ignored and "
+                        "never trained. Either remove `'lm_head'` from your `target_modules`, or set "
+                        "`use_liger_kernel=False`."
+                    )


OK, I'm implementing the guard differently. Additionally, I'm checking if other trainers need this guard as well.

I opened:

Add prompt-learning guard for PEFT with Liger in GRPO #6186

albertvillanova added 2 commits June 24, 2026 09:12

Support peft in _compute_loss_liger with double-unwrap and None self.ref_model

e3415db

Raise if lm_head is a BaseTunerLayer

230a45f

Remove test fails with peft and liger

e880a14

qgallouedec requested review from Copilot and kashif and removed request for Copilot June 24, 2026 13:36

Copilot started reviewing on behalf of qgallouedec June 24, 2026 13:37 View session

albertvillanova requested a review from Copilot June 25, 2026 08:58

Copilot started reviewing on behalf of albertvillanova June 25, 2026 08:59 View session

Merge remote-tracking branch 'upstream/main' into dpo-support-liger-peft

71f7817

Copilot AI reviewed Jun 25, 2026

albertvillanova added 3 commits June 26, 2026 06:16

Merge remote-tracking branch 'upstream/main' into dpo-support-liger-peft

24ec973

Add prompt-learning guard

f89e5d2

Add tests

5cb1748

albertvillanova mentioned this pull request Jun 26, 2026

Add prompt-learning guard for PEFT with Liger in GRPO #6186

Open

albertvillanova wants to merge 7 commits into main from dpo-support-liger-peft
