Support PEFT with Liger in DPO #6159

Open: albertvillanova wants to merge 7 commits into main from dpo-support-liger-peft

albertvillanova commented Jun 24, 2026

Member

Support PEFT models with use_liger_kernel=True in DPOTrainer.

This PR enables PEFT models (e.g. LoRA) to be used with use_liger_kernel=True in DPOTrainer, lifting the blanket NotImplementedError that previously blocked all PEFT+Liger combinations.

Motivation

The Liger fused DPO loss bypasses the model's forward() and multiplies hidden states by lm_head.weight directly. The previous guard raised NotImplementedError for any PEFT model, but this was too broad: the only genuinely incompatible case is when lm_head itself is wrapped by a PEFT adapter (e.g. "lm_head" in target_modules), because then lm_head.weight is the frozen base weight and the adapter delta is silently ignored. When lm_head is not adapted, PEFT+Liger works correctly.
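To make the failure mode concrete, here is a tiny numeric sketch in plain Python. It does not use PEFT or Liger; the matrices, the hidden state `h`, and the helpers `matmul` and `matvec` are invented for illustration. It compares the logits obtained from the frozen base weight alone (what a fused loss reading `lm_head.weight` would see) with the logits that include a LoRA-style delta `B @ A`.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def matvec(w, h):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(wi * hi for wi, hi in zip(row, h)) for row in w]


# Hypothetical frozen head weight: vocabulary size 3, hidden size 2.
W = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
# Hypothetical rank-1 LoRA factors on the head: delta = B @ A.
B = [[1.0], [0.0], [0.0]]
A = [[0.5, 0.5]]
# Hypothetical last hidden state for one token.
h = [2.0, 4.0]

delta = matmul(B, A)
W_adapted = [[w + d for w, d in zip(w_row, d_row)] for w_row, d_row in zip(W, delta)]

# What a path that reads only the base weight computes.
logits_base = matvec(W, h)
# What the adapted head would compute through its own forward pass.
logits_adapted = matvec(W_adapted, h)

print(logits_base, logits_adapted)
```

The two results differ only in the row touched by the adapter, which is exactly the contribution a base-weight-only path drops. When no adapter sits on the head, the delta is zero and both paths agree.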

Changes

Compatibility and Error Handling

  • Added a check to prevent using use_liger_kernel=True when a PEFT adapter is applied to lm_head, raising a clear error if this unsupported configuration is detected. This avoids silent failures where the head adapter would not be trained.
  • Imported BaseTunerLayer from peft.tuners.tuners_utils to enable the above compatibility check.

Model Unwrapping and Reference Handling

  • Added a PEFT double unwrap (model = model.base_model.model) in _compute_loss_liger before backbone resolution, mirroring GRPOTrainer._get_last_hidden_state.
  • Handled the self.ref_model is None case in _compute_loss_liger (PEFT with no explicit reference model) by recovering the reference behaviour via adapter disabling/switching, consistent with the existing _compute_loss logit path.
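The double unwrap can be pictured with stand-in classes. The sketch below is not the PR's code: `FakeCausalLM`, `FakeTuner`, `FakePeftModel`, and `unwrap_for_liger` are hypothetical names that only imitate the nesting implied by `model.base_model.model` (a PEFT wrapper holding a tuner, which in turn holds the original model).

```python
class FakeCausalLM:
    """Stand-in for the underlying causal LM (assumption, not a real class)."""


class FakeTuner:
    """Stand-in for the tuner layer that wraps the original model."""

    def __init__(self, model):
        self.model = model


class FakePeftModel:
    """Stand-in for the outer PEFT wrapper."""

    def __init__(self, model):
        self.base_model = FakeTuner(model)


def unwrap_for_liger(model, peft_cls=FakePeftModel):
    """Return the underlying model, peeling off the two PEFT layers if present."""
    if isinstance(model, peft_cls):
        # First hop: wrapper -> tuner. Second hop: tuner -> original model.
        return model.base_model.model
    return model


causal_lm = FakeCausalLM()
wrapped = FakePeftModel(causal_lm)
print(unwrap_for_liger(wrapped) is causal_lm)
```

The sketch uses an explicit `isinstance` check instead of probing for a `base_model` attribute, because non-PEFT models may also expose an attribute with that name.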

Note

Medium Risk
Changes core DPO training loss paths for PEFT+Liger; incorrect reference or backbone unwrapping could skew gradients, though tests and explicit guards reduce silent failure risk.

Overview
DPOTrainer now allows use_liger_kernel=True with PEFT when the setup is actually safe, instead of rejecting every PEFT model.

The old blanket NotImplementedError is gone. Init-time ValueError checks block lm_head in target_modules (Liger reads frozen lm_head.weight and would skip head LoRA) and prompt-learning PEFT (Liger bypasses PeftModel.forward(), so virtual tokens never apply).
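A rough sketch of what such init-time checks could look like is shown below. It is an illustration under assumptions, not the code in this PR: `check_liger_peft_compat` and `FakePeftConfig` are hypothetical, the config is read through duck typing (`target_modules`, `is_prompt_learning`), and only the list form of `target_modules` is handled.

```python
from dataclasses import dataclass, field


@dataclass
class FakePeftConfig:
    """Hypothetical stand-in for a PEFT config object."""

    target_modules: list = field(default_factory=list)
    is_prompt_learning: bool = False


def check_liger_peft_compat(peft_config, use_liger_kernel):
    """Raise ValueError for PEFT setups that a fused head-weight path would mishandle."""
    if not use_liger_kernel or peft_config is None:
        return
    if getattr(peft_config, "is_prompt_learning", False):
        # The fused path skips the wrapper's forward, so virtual tokens never apply.
        raise ValueError("use_liger_kernel=True is incompatible with prompt-learning PEFT.")
    targets = getattr(peft_config, "target_modules", None) or []
    if isinstance(targets, str):
        # The string (pattern) form is deliberately not covered by this sketch.
        return
    if any(t == "lm_head" or t.endswith(".lm_head") for t in targets):
        # The fused path reads the frozen head weight, so a head adapter is ignored.
        raise ValueError("use_liger_kernel=True is incompatible with a PEFT adapter on lm_head.")


# Attention-only LoRA passes silently.
check_liger_peft_compat(FakePeftConfig(["q_proj", "v_proj"]), use_liger_kernel=True)
```

Failing at init time rather than inside the loss keeps the error close to the misconfiguration and avoids a run that trains without ever updating the head adapter.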

_compute_loss_liger unwraps PEFT via model.base_model.model, and when there is no separate ref_model it builds reference hidden states using the same use_adapter / ref-adapter path as the standard DPO loss.
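The reference fallback can be sketched with a toy model. Everything here is hypothetical (`FakePeftModel`, its `hidden_states` method, and `reference_hidden_states`); it covers only the adapter-disabling case, not switching to a dedicated reference adapter, and a real implementation would also run the reference pass without gradients.

```python
from contextlib import contextmanager


class FakePeftModel:
    """Toy policy model whose adapter adds 1.0 to the hidden state (assumption)."""

    def __init__(self):
        self.adapter_enabled = True

    @contextmanager
    def disable_adapter(self):
        # Temporarily turn the adapter off, restoring the previous state on exit.
        previous = self.adapter_enabled
        self.adapter_enabled = False
        try:
            yield
        finally:
            self.adapter_enabled = previous

    def hidden_states(self, x):
        return x + (1.0 if self.adapter_enabled else 0.0)


def reference_hidden_states(model, ref_model, x):
    """Use ref_model when given; otherwise recover the reference from the base weights."""
    if ref_model is not None:
        return ref_model.hidden_states(x)
    with model.disable_adapter():
        return model.hidden_states(x)


policy = FakePeftModel()
print(policy.hidden_states(2.0), reference_hidden_states(policy, None, 2.0))
```

With the adapter disabled, the policy model behaves like the frozen base model, so no second copy of the weights is needed for the reference pass.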

Tests cover the new error cases and an end-to-end LoRA (no lm_head) + Liger training run.

Reviewed by Cursor Bugbot for commit 5cb1748.

@bot-ci-comment

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Copilot AI left a comment

Contributor

Pull request overview

Enables using use_liger_kernel=True with PEFT (e.g., LoRA) in DPOTrainer by replacing the previous blanket rejection with targeted compatibility checks and by extending the Liger reference-computation path to work when ref_model is not instantiated for PEFT.

Changes:

  • Replace the PEFT+Liger blanket NotImplementedError with a targeted validation that rejects PEFT adapters applied to lm_head (to avoid silently ignoring head adapters in the fused loss path).
  • Update _compute_loss_liger to unwrap PEFT models before backbone execution and to compute reference hidden states via adapter disabling/switching when ref_model is None.
  • Remove the test that asserted PEFT+Liger init always fails.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| trl/trainer/dpo_trainer.py | Adds PEFT+Liger compatibility checks and extends Liger loss computation to support PEFT reference behavior. |
| tests/test_dpo_trainer.py | Removes the outdated PEFT+Liger init-failure test (needs replacement coverage for the new behavior). |

Comment thread tests/test_dpo_trainer.py
            peft_config=LoraConfig(),
        )

    def test_train_with_iterable_dataset(self):

Member Author

I'm adding tests.

Comment on lines +762 to +769
output_embeddings = model.get_output_embeddings()
if isinstance(output_embeddings, BaseTunerLayer):
    raise ValueError(
        "`use_liger_kernel=True` is incompatible with applying a PEFT adapter to `lm_head`. The Liger "
        "fused DPO loss reads `lm_head.weight` directly, so the adapter on the head is ignored and "
        "never trained. Either remove `'lm_head'` from your `target_modules`, or set "
        "`use_liger_kernel=False`."
    )

albertvillanova commented Jun 26, 2026

Member Author

OK, I'm implementing the guard differently. Additionally, I'm checking if other trainers need this guard as well.
