Add `quantization_config` trainer argument (streamline QLoRA) by qgallouedec · Pull Request #6157 · huggingface/trl

qgallouedec · 2026-06-24T00:13:19Z

Adds a quantization_config argument to SFTTrainer, DPOTrainer, GRPOTrainer, RLOOTrainer, and RewardTrainer, so QLoRA no longer requires reaching into model_init_kwargs (or worse, manual model loading).

After:

SFTTrainer(
    model="meta-llama/Llama-2-7b-hf",
    quantization_config=BitsAndBytesConfig(load_in_4bit=True),
    peft_config=LoraConfig(),
    train_dataset=dataset,
)

Compare with before (many resources are written like this!):

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",
    quantization_config=BitsAndBytesConfig(load_in_4bit=True),
    device_map="auto",
)
SFTTrainer(
    model=model,
    train_dataset=dataset,
    peft_config=LoraConfig(),
)

Before (the "right" way, but not very popular):

SFTTrainer(
    model="meta-llama/Llama-2-7b-hf",
    args=SFTConfig(model_init_kwargs={"quantization_config": BitsAndBytesConfig(load_in_4bit=True)}),
    peft_config=LoraConfig(),
    train_dataset=dataset,
)

The new argument sits next to peft_config (the other non-serializable QLoRA ingredient), is forwarded to from_pretrained, and raises an error if quantization_config is also set in args.model_init_kwargs.

Changes

- New quantization_config arg on the five trainers above (+ docstrings).
- The trl/scripts/{sft,dpo,grpo,rloo,reward}.py CLIs now pass it directly instead of injecting into model_init_kwargs.
  - This drops the redundant model_init_kwargs["device_map"] = get_kbit_device_map() line: verified on 8×H100 that QLoRA trains identically with and without it, across transformers 4.56.2 (min supported) and 5.13; distributed runs override device_map to None anyway, and single-process runs auto-place quantized weights on the current CUDA device. See Remove redundant get_kbit_device_map() #6158.
- Updated the QLoRA example in docs/source/peft_integration.md.

Note

Medium Risk
Touches model loading for all major TRL trainers and reference-model paths; behavior change for QLoRA users but scoped to optional loading kwargs with explicit conflict checks.

Overview
Adds a quantization_config trainer argument (alongside peft_config) on SFTTrainer, DPOTrainer, GRPOTrainer, RLOOTrainer, and RewardTrainer, so QLoRA can pass a model id string and let the trainer load/quantize via from_pretrained instead of pre-loading with AutoModelForCausalLM.

When the model is loaded from a string, the trainer merges quantization_config into model_init_kwargs (and the same for reference models where applicable), errors if it is also set in args.model_init_kwargs, and warns if a pre-instantiated model is passed. CLI entrypoints and example scripts now pass get_quantization_config(model_args) directly to the trainer and no longer inject quantization_config / get_kbit_device_map() into model_init_kwargs.
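The load-time rules described above (merge, error on conflict, warn for a pre-instantiated model) can be sketched in plain Python. This is only an illustration under stated assumptions: `resolve_model_init_kwargs` is a hypothetical helper name, the order of the checks and the message wording are guesses, and none of this is taken from the actual TRL implementation.

```python
import warnings
from typing import Any, Optional


def resolve_model_init_kwargs(
    model: Any,
    model_init_kwargs: Optional[dict[str, Any]] = None,
    quantization_config: Optional[Any] = None,
) -> dict[str, Any]:
    """Hypothetical helper (not TRL's code): build the kwargs for from_pretrained."""
    # Copy so the caller's dict is never mutated.
    kwargs = dict(model_init_kwargs or {})
    if quantization_config is None:
        return kwargs
    if not isinstance(model, str):
        # A pre-instantiated model is already loaded, so the config cannot be applied.
        warnings.warn(
            "quantization_config is ignored because the model is already instantiated.",
            UserWarning,
            stacklevel=2,
        )
        return kwargs
    if "quantization_config" in kwargs:
        # Two sources for the same setting: refuse to guess which one was intended.
        raise ValueError(
            "quantization_config was passed both to the trainer and in "
            "model_init_kwargs; pass it to the trainer only."
        )
    kwargs["quantization_config"] = quantization_config
    return kwargs
```

In this sketch the model id path returns a merged dict that a trainer could forward to from_pretrained, while the other two paths surface the misuse instead of silently picking one setting.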

Docs and Colab notebooks are updated to the new pattern (model id + quantization_config on the trainer, model_init_kwargs for attn/dtype where needed).

Reviewed by Cursor Bugbot for commit 7be97c4. Bugbot is set up for automated code reviews on this repo.

bot-ci-comment · 2026-06-24T00:16:01Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 45d6a2decd


chatgpt-codex-connector · 2026-06-24T00:17:49Z

        optimizers: tuple[torch.optim.Optimizer | None, torch.optim.lr_scheduler.LambdaLR | None] = (None, None),
        optimizer_cls_and_kwargs: tuple[type[torch.optim.Optimizer], dict[str, Any]] | None = None,
        preprocess_logits_for_metrics: Callable[[torch.Tensor, torch.Tensor], torch.Tensor] | None = None,
+        quantization_config: "BitsAndBytesConfig | None" = None,


Preserve positional peft_config compatibility

Adding quantization_config before the existing peft_config parameter shifts any current positional peft_config argument into quantization_config because this public constructor is not keyword-only. In existing calls that pass peft_config positionally, a model id will forward a PeftConfig object to from_pretrained(..., quantization_config=...) and fail, while an already-instantiated model will ignore it and train without the adapter; the same signature insertion appears in the other updated trainers. Put the new argument after peft_config or otherwise preserve the old positional layout.

qgallouedec · 2026-06-24

Although not specifically disallowed, it would be very surprising for peft_config to be passed as a positional argument.
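The positional-shift concern discussed in this thread can be reproduced with toy functions. The three signatures below are simplified stand-ins, not the real trainer constructors (which take many more parameters), and `lora` is a placeholder string rather than an actual LoraConfig.

```python
def trainer_before(model, args=None, peft_config=None):
    """Toy stand-in for the original parameter layout."""
    return {"quantization_config": None, "peft_config": peft_config}


def trainer_inserted(model, args=None, quantization_config=None, peft_config=None):
    """New parameter inserted before peft_config."""
    return {"quantization_config": quantization_config, "peft_config": peft_config}


def trainer_appended(model, args=None, peft_config=None, quantization_config=None):
    """New parameter appended after peft_config, preserving the old positions."""
    return {"quantization_config": quantization_config, "peft_config": peft_config}


lora = "lora-config"  # placeholder for a LoraConfig instance

# The same positional call binds the third argument differently under each layout.
print(trainer_before("model-id", None, lora))
print(trainer_inserted("model-id", None, lora))
print(trainer_appended("model-id", None, lora))
```

Callers that pass peft_config by keyword are unaffected by either layout, which is why the practical impact depends on how common positional use actually is.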

cursor

Cursor Bugbot has reviewed your changes using default effort and found 1 potential issue.

There are 3 total unresolved issues (including 2 from previous reviews).

Reviewed by Cursor Bugbot for commit 0bb426c.

sergiopaniego

the example scripts/notebooks in the examples/ folder should also be reviewed and updated

qgallouedec · 2026-06-26T15:51:18Z

Right @sergiopaniego, updated!

Add quantization_config trainer argument (streamline QLoRA)

45d6a2d

cursor Bot reviewed Jun 24, 2026

Comment thread trl/trainer/dpo_trainer.py

chatgpt-codex-connector Bot reviewed Jun 24, 2026

style

38626f3

qgallouedec requested review from AmineDiro, albertvillanova and kashif June 24, 2026 03:39

cursor Bot reviewed Jun 24, 2026

Comment thread trl/trainer/grpo_trainer.py

Clarify error message for quantization_config to prefer trainer argument

0bb426c

cursor Bot reviewed Jun 24, 2026

Comment thread trl/trainer/sft_trainer.py

qgallouedec added 2 commits June 24, 2026 13:01

Merge branch 'main' into native-quantization-config

261f974

Merge branch 'main' into native-quantization-config

65f79a9

sergiopaniego reviewed Jun 26, 2026

qgallouedec and others added 3 commits June 26, 2026 10:37

Merge branch 'main' into native-quantization-config

2e37125

fix quantization configuration handling in trainers and scripts

f6a660b

update notebooks

7be97c4

qgallouedec wants to merge 8 commits into main from native-quantization-config
