Add quantization_config trainer argument (streamline QLoRA) #6157
qgallouedec wants to merge 8 commits into
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 45d6a2decd
    optimizers: tuple[torch.optim.Optimizer | None, torch.optim.lr_scheduler.LambdaLR | None] = (None, None),
    optimizer_cls_and_kwargs: tuple[type[torch.optim.Optimizer], dict[str, Any]] | None = None,
    preprocess_logits_for_metrics: Callable[[torch.Tensor, torch.Tensor], torch.Tensor] | None = None,
    quantization_config: "BitsAndBytesConfig | None" = None,
Preserve positional peft_config compatibility
Adding quantization_config before the existing peft_config parameter shifts any current positional peft_config argument into quantization_config because this public constructor is not keyword-only. In existing calls that pass peft_config positionally, a model id will forward a PeftConfig object to from_pretrained(..., quantization_config=...) and fail, while an already-instantiated model will ignore it and train without the adapter; the same signature insertion appears in the other updated trainers. Put the new argument after peft_config or otherwise preserve the old positional layout.
although not specifically disallowed, it would be very surprising that peft_config is used as positional arg
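For reference, the positional shift discussed in this thread can be reproduced with plain Python. This is a minimal sketch with toy classes, not the TRL constructors: the class names and the three-parameter signatures are hypothetical simplifications, and the keyword-only variant is just one way to "otherwise preserve the old positional layout", not necessarily what this PR does.

```python
# Toy stand-ins for a trainer constructor; the real signatures have many more parameters.


class OldTrainer:
    """Layout before the change: peft_config is the third positional slot."""

    def __init__(self, model, args=None, peft_config=None):
        self.quantization_config = None
        self.peft_config = peft_config


class InsertedBeforeTrainer:
    """New argument inserted before peft_config, as flagged in the review."""

    def __init__(self, model, args=None, quantization_config=None, peft_config=None):
        self.quantization_config = quantization_config
        self.peft_config = peft_config


class KeywordOnlyTrainer:
    """Hypothetical alternative: both arguments can only be passed by keyword."""

    def __init__(self, model, args=None, *, quantization_config=None, peft_config=None):
        self.quantization_config = quantization_config
        self.peft_config = peft_config


lora = {"r": 16}  # hypothetical stand-in for a PeftConfig object

# The same positional call lands in different parameters.
old = OldTrainer("some/model-id", None, lora)
shifted = InsertedBeforeTrainer("some/model-id", None, lora)

print(old.peft_config is lora)              # True
print(shifted.quantization_config is lora)  # True: the adapter config went to the wrong slot
print(shifted.peft_config is None)          # True: the adapter is silently dropped

# With keyword-only parameters the ambiguous call fails loudly instead.
try:
    KeywordOnlyTrainer("some/model-id", None, lora)
    rejected = False
except TypeError:
    rejected = True
print(rejected)  # True
```

Callers that already write `peft_config=...` as a keyword are unaffected in all three layouts.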
Cursor Bugbot has reviewed your changes using default effort and found 1 potential issue.
There are 3 total unresolved issues (including 2 from previous reviews).
Reviewed by Cursor Bugbot for commit 0bb426c.
sergiopaniego left a comment
the example scripts/notebooks in the examples/ folder should also be reviewed and updated
right @sergiopaniego, updated!

Adds a `quantization_config` argument to `SFTTrainer`, `DPOTrainer`, `GRPOTrainer`, `RLOOTrainer`, and `RewardTrainer`, so QLoRA no longer requires reaching into `model_init_kwargs` (or worse, manual model loading).

After:

Compare with before (many resources are written like this!):

Before (the "right" way, but not very popular):
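A minimal sketch of the three patterns named above, in the same order. These are illustrative and not the exact snippets from the PR description: the function names, the `model_id` and `train_dataset` parameters, and the bare `BitsAndBytesConfig(load_in_4bit=True)` / `LoraConfig()` settings are assumptions, and the first function only works on a TRL build that includes this PR's `quantization_config` argument. Imports are kept inside the functions so the sketch can be read and imported without a GPU or the libraries installed.

```python
def after_trainer_argument(model_id, train_dataset):
    """After: model id plus quantization_config passed directly to the trainer."""
    from peft import LoraConfig
    from transformers import BitsAndBytesConfig
    from trl import SFTTrainer

    return SFTTrainer(
        model=model_id,
        train_dataset=train_dataset,
        quantization_config=BitsAndBytesConfig(load_in_4bit=True),  # argument added by this PR
        peft_config=LoraConfig(),
    )


def before_manual_loading(model_id, train_dataset):
    """Before (common in tutorials): load and quantize the model by hand."""
    from peft import LoraConfig
    from transformers import AutoModelForCausalLM, BitsAndBytesConfig
    from trl import SFTTrainer

    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        quantization_config=BitsAndBytesConfig(load_in_4bit=True),
    )
    return SFTTrainer(model=model, train_dataset=train_dataset, peft_config=LoraConfig())


def before_model_init_kwargs(model_id, train_dataset):
    """Before (the "right" way): route the config through model_init_kwargs."""
    from peft import LoraConfig
    from transformers import BitsAndBytesConfig
    from trl import SFTConfig, SFTTrainer

    args = SFTConfig(
        model_init_kwargs={"quantization_config": BitsAndBytesConfig(load_in_4bit=True)},
    )
    return SFTTrainer(
        model=model_id,
        args=args,
        train_dataset=train_dataset,
        peft_config=LoraConfig(),
    )
```

In all three cases the trainer ends up with a 4-bit base model and a LoRA adapter; the difference is only where the quantization settings are supplied.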
It sits next to `peft_config` (the other non-serializable QLoRA ingredient), flows into `from_pretrained`, and raises if also set in `args.model_init_kwargs`.

Changes

- `quantization_config` arg on the five trainers above (+ docstrings).
- `trl/scripts/{sft,dpo,grpo,rloo,reward}.py` CLIs now pass it directly instead of injecting into `model_init_kwargs`.
- `model_init_kwargs["device_map"] = get_kbit_device_map()` line: verified on 8×H100 that QLoRA trains identically with and without it, across transformers 4.56.2 (min supported) and 5.13; distributed runs override `device_map` to `None` anyway, and single-process runs auto-place quantized weights on the current CUDA device. See Remove redundant `get_kbit_device_map()` #6158.
- `docs/source/peft_integration.md`.

Note
Medium Risk
Touches model loading for all major TRL trainers and reference-model paths; behavior change for QLoRA users but scoped to optional loading kwargs with explicit conflict checks.
Overview
Adds a `quantization_config` trainer argument (alongside `peft_config`) on `SFTTrainer`, `DPOTrainer`, `GRPOTrainer`, `RLOOTrainer`, and `RewardTrainer`, so QLoRA can pass a model id string and let the trainer load/quantize via `from_pretrained` instead of pre-loading with `AutoModelForCausalLM`.

When the model is loaded from a string, the trainer merges `quantization_config` into `model_init_kwargs` (and the same for reference models where applicable), errors if it is also set in `args.model_init_kwargs`, and warns if a pre-instantiated model is passed. CLI entrypoints and example scripts now pass `get_quantization_config(model_args)` directly to the trainer and no longer inject `quantization_config`/`get_kbit_device_map()` into `model_init_kwargs`.

Docs and Colab notebooks are updated to the new pattern (model id + `quantization_config` on the trainer, `model_init_kwargs` for attn/dtype where needed).

Reviewed by Cursor Bugbot for commit 7be97c4.
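The merge, conflict, and warning behavior summarized in the overview can be sketched as a small standalone helper. This is a hedged illustration of the described rules, not the code in the PR: the function name `resolve_model_init_kwargs`, the use of `ValueError`, and the message wording are all assumptions.

```python
import warnings


def resolve_model_init_kwargs(model, model_init_kwargs=None, quantization_config=None):
    """Return the kwargs that would be forwarded to from_pretrained (hypothetical helper)."""
    kwargs = dict(model_init_kwargs or {})  # copy so the caller's dict is not mutated

    if quantization_config is None:
        return kwargs

    if not isinstance(model, str):
        # Pre-instantiated model: nothing is loaded, so the config cannot take effect.
        warnings.warn("quantization_config is ignored when a model instance is passed.")
        return kwargs

    if "quantization_config" in kwargs:
        # Two sources for the same setting: refuse to guess which one wins.
        raise ValueError(
            "quantization_config was passed to the trainer and is also set in "
            "args.model_init_kwargs; set it in only one place."
        )

    kwargs["quantization_config"] = quantization_config
    return kwargs


# Model id string: the config is merged next to the other loading kwargs.
merged = resolve_model_init_kwargs("some/model-id", {"dtype": "bfloat16"}, "4bit-config")
print(merged)  # {'dtype': 'bfloat16', 'quantization_config': '4bit-config'}
```

Copying `model_init_kwargs` before merging keeps the user's training arguments unchanged, which matters if the same arguments object is reused to load a reference model.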