Add quantization_config trainer argument (streamline QLoRA) #6157
qgallouedec wants to merge 11 commits
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 45d6a2decd
    optimizers: tuple[torch.optim.Optimizer | None, torch.optim.lr_scheduler.LambdaLR | None] = (None, None),
    optimizer_cls_and_kwargs: tuple[type[torch.optim.Optimizer], dict[str, Any]] | None = None,
    preprocess_logits_for_metrics: Callable[[torch.Tensor, torch.Tensor], torch.Tensor] | None = None,
    quantization_config: "BitsAndBytesConfig | None" = None,
Preserve positional peft_config compatibility
Adding quantization_config before the existing peft_config parameter shifts any current positional peft_config argument into quantization_config because this public constructor is not keyword-only. In existing calls that pass peft_config positionally, a model id will forward a PeftConfig object to from_pretrained(..., quantization_config=...) and fail, while an already-instantiated model will ignore it and train without the adapter; the same signature insertion appears in the other updated trainers. Put the new argument after peft_config or otherwise preserve the old positional layout.
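The positional shift described in this comment can be reproduced with plain functions. This is a minimal sketch: `old_init`, `new_init`, and `keyword_only_init` are hypothetical stand-ins, not TRL's actual constructors, and `lora` is a placeholder for a `PeftConfig` instance.

```python
# Hypothetical stand-ins for a trainer constructor; not TRL's real signatures.
def old_init(model, args=None, peft_config=None):
    return {"quantization_config": None, "peft_config": peft_config}


def new_init(model, args=None, quantization_config=None, peft_config=None):
    # The new parameter sits before peft_config, so positional callers shift.
    return {"quantization_config": quantization_config, "peft_config": peft_config}


def keyword_only_init(model, args=None, *, quantization_config=None, peft_config=None):
    # The bare `*` makes both trailing parameters keyword-only.
    return {"quantization_config": quantization_config, "peft_config": peft_config}


lora = object()  # placeholder for a PeftConfig instance

before = old_init("model-id", None, lora)   # peft_config receives `lora`
shifted = new_init("model-id", None, lora)  # same call, `lora` lands in quantization_config
explicit = new_init("model-id", None, peft_config=lora)  # keyword call is unaffected
```

A keyword-only signature such as `keyword_only_init` would turn the silent shift into a `TypeError`, at the cost of breaking any positional callers outright.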
Although not specifically disallowed, it would be very surprising for peft_config to be passed as a positional arg.
Cursor Bugbot has reviewed your changes using default effort and found 1 potential issue.
There are 3 total unresolved issues (including 2 from previous reviews).
Reviewed by Cursor Bugbot for commit 0bb426c.
sergiopaniego left a comment:
The example scripts/notebooks in the examples/ folder should also be reviewed and updated.
Right @sergiopaniego, updated!

Adds a `quantization_config` argument to `SFTTrainer`, `DPOTrainer`, `GRPOTrainer`, `RLOOTrainer`, and `RewardTrainer`, so QLoRA no longer requires reaching into `model_init_kwargs` (or worse, manual model loading).
After:
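A minimal sketch of the trainer-driven pattern, assuming the `quantization_config` argument proposed in this PR. The function name `build_qlora_trainer` and the 4-bit and LoRA settings are illustrative assumptions; imports are deferred into the function because they need the full training stack.

```python
def build_qlora_trainer(model_id, train_dataset):
    """Sketch: QLoRA via the trainer argument proposed in this PR."""
    # Deferred imports: these need torch, transformers, peft, trl and bitsandbytes.
    import torch
    from peft import LoraConfig
    from transformers import BitsAndBytesConfig
    from trl import SFTTrainer

    return SFTTrainer(
        model=model_id,  # a model id string; the trainer loads the model itself
        train_dataset=train_dataset,
        # Argument added by this PR (assumed here, not available before it lands).
        quantization_config=BitsAndBytesConfig(
            load_in_4bit=True,
            bnb_4bit_compute_dtype=torch.bfloat16,
        ),
        peft_config=LoraConfig(r=16, lora_alpha=32, task_type="CAUSAL_LM"),
    )
```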
Compare with before (many resources are written like this!):
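A sketch of the manual-loading pattern this line refers to, where the caller quantizes the model and hands the instance to the trainer. The function name and settings are illustrative assumptions, and imports are deferred for the same reason as any GPU-dependent example.

```python
def build_trainer_manual_loading(model_id, train_dataset):
    """Sketch: load the quantized model by hand, then pass the instance."""
    # Deferred imports: these need torch, transformers, peft, trl and bitsandbytes.
    import torch
    from peft import LoraConfig
    from transformers import AutoModelForCausalLM, BitsAndBytesConfig
    from trl import SFTTrainer

    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        quantization_config=BitsAndBytesConfig(
            load_in_4bit=True,
            bnb_4bit_compute_dtype=torch.bfloat16,
        ),
    )
    return SFTTrainer(
        model=model,  # an already-instantiated, already-quantized model
        train_dataset=train_dataset,
        peft_config=LoraConfig(r=16, lora_alpha=32, task_type="CAUSAL_LM"),
    )
```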
Before (the "right" way, but not very popular):
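A sketch of the `model_init_kwargs` route, assuming `SFTConfig` as the training-arguments class. The function name and settings are illustrative assumptions.

```python
def build_trainer_via_model_init_kwargs(model_id, train_dataset):
    """Sketch: pass the quantization config through args.model_init_kwargs."""
    # Deferred imports: these need torch, transformers, peft, trl and bitsandbytes.
    import torch
    from peft import LoraConfig
    from transformers import BitsAndBytesConfig
    from trl import SFTConfig, SFTTrainer

    args = SFTConfig(
        model_init_kwargs={
            # Forwarded to from_pretrained when the trainer loads the model id.
            "quantization_config": BitsAndBytesConfig(
                load_in_4bit=True,
                bnb_4bit_compute_dtype=torch.bfloat16,
            ),
        },
    )
    return SFTTrainer(
        model=model_id,
        args=args,
        train_dataset=train_dataset,
        peft_config=LoraConfig(r=16, lora_alpha=32, task_type="CAUSAL_LM"),
    )
```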
It sits next to `peft_config` (the other non-serializable QLoRA ingredient), flows into `from_pretrained`, and raises if also set in `args.model_init_kwargs`.
Changes
- `quantization_config` arg on the five trainers above (+ docstrings).
- `trl/scripts/{sft,dpo,grpo,rloo,reward}.py` CLIs now pass it directly instead of injecting into `model_init_kwargs`.
- Removes the `model_init_kwargs["device_map"] = get_kbit_device_map()` line: verified on 8×H100 that QLoRA trains identically with and without it, across transformers 4.56.2 (min supported) and 5.13; distributed runs override `device_map` to `None` anyway, and single-process runs auto-place quantized weights on the current CUDA device. See "Remove redundant `get_kbit_device_map()`" #6158.
- Updates `docs/source/peft_integration.md`.
Note
Medium Risk
Touches core model-loading paths for all major trainers and reference-model creation; behavior change if callers relied on `get_kbit_device_map()` in `model_init_kwargs`, though distributed runs still force `device_map=None`.
Overview
Adds a `quantization_config` trainer argument on `SFTTrainer`, `DPOTrainer`, `GRPOTrainer`, `RLOOTrainer`, and `RewardTrainer`, so QLoRA can pass a model id plus `BitsAndBytesConfig` next to `peft_config` instead of pre-loading the model or stuffing quantization into `args.model_init_kwargs`. When the model is loaded from a string, the trainer merges `quantization_config` into `from_pretrained` kwargs, errors if it is also set in `model_init_kwargs`, and warns if a pre-instantiated model is passed with `quantization_config`. Reference models in DPO/GRPO/RLOO get the same quantization kwargs when built from a path.
`trl/scripts` (`sft`, `dpo`, `grpo`, `rloo`, `reward`) and example scripts now pass `get_quantization_config(model_args)` directly to the trainer and no longer inject `device_map` via `get_kbit_device_map()` into `model_init_kwargs`.
Docs and Colab notebooks (`peft_integration.md`, SFT/GRPO QLoRA notebooks) are updated to the trainer-driven loading pattern (`model=model_id`, `quantization_config=...`).
Reviewed by Cursor Bugbot for commit 516d977.
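The merge, error, and warning behavior summarized in the overview can be sketched in plain Python. `resolve_model_init_kwargs` is a hypothetical helper, not the PR's actual code; ignoring the config for a pre-instantiated model is an assumption based on the review discussion, and the messages are illustrative.

```python
import warnings


def resolve_model_init_kwargs(model, model_init_kwargs=None, quantization_config=None):
    """Hypothetical helper: decide which kwargs would reach from_pretrained."""
    kwargs = dict(model_init_kwargs or {})  # copy so the caller's dict is untouched
    if quantization_config is None:
        return kwargs
    if not isinstance(model, str):
        # Assumption: a pre-instantiated model cannot be re-quantized, so warn and skip.
        warnings.warn("quantization_config is ignored for an already-instantiated model.")
        return kwargs
    if "quantization_config" in kwargs:
        raise ValueError(
            "quantization_config was passed both as a trainer argument and in model_init_kwargs."
        )
    kwargs["quantization_config"] = quantization_config
    return kwargs


# "cfg" stands in for a BitsAndBytesConfig instance.
merged = resolve_model_init_kwargs("model-id", {"dtype": "bfloat16"}, "cfg")
```

Raising on the duplicate rather than letting one source win silently keeps the two configuration paths from drifting apart unnoticed.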