wip#1284

Open
haoyangli0109 wants to merge 1 commit into ROCm:wuhuikx/atom-m3-bf16-to-main from haoyangli0109:lhy/mxfp8_dequant

@haoyangli0109 haoyangli0109 commented Jun 18, 2026

Contributor
  1. Atom does not currently support loading or running inference on minimax-m3-mxfp8; this PR only implements online quantization for minimax-m3-mxfp8.
  2. Quantized inference with mxfp4 and with ptpc_fp8 both run, but ptpc_fp8 inference hits a kernel error in aiter.
HIP_VISIBLE_DEVICES=4,5,6,7 \
python -m atom.entrypoints.openai_server \
  --model /shareddata/MiniMaxAI/MiniMax-M3-MXFP8 \
  -tp 4 \
  --trust-remote-code \
  --block-size 128 \
  --server-port 01345 \
  --online_quant_config '{"global_quant_config": "ptpc_fp8", "layer_quant_config": {"*expert*": "mxfp4"}, "exclude_layer": ["lm_head", "model.embed_tokens", "vision_tower", "multi_modal_projector", "patch_merge_mlp", "*block_sparse_moe.gate"]}'


lm_eval --model local-completions \
  --model_args "model=/shareddata/MiniMaxAI/MiniMax-M3-MXFP8,base_url=http://localhost:01345/v1/completions,num_concurrent=65,max_retries=3,tokenized_requests=False,trust_remote_code=True" \
  --tasks gsm8k \
  --num_fewshot 5
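| Tasks | Version | Filter           | n-shot | Metric      | Value  | Stderr   |
|-------|--------:|------------------|-------:|-------------|-------:|---------:|
| gsm8k |       3 | flexible-extract |      5 | exact_match | 0.8810 | ± 0.0089 |
|       |         | strict-match     |      5 | exact_match | 0.8787 | ± 0.0090 |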
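
To make the `--online_quant_config` value in the server command easier to review, here is a minimal Python sketch of how such a config could map layer names to quantization schemes. It is not Atom's implementation. The precedence (exclusions first, then per-layer overrides, then the global default), the matching rule (patterns containing `*` are globbed against the full layer name, other patterns are treated as substrings), the helper names `matches` and `resolve_scheme`, and the example layer names are all assumptions made for illustration.

```python
import json
from fnmatch import fnmatchcase

# The JSON passed to --online_quant_config in the server command.
CONFIG = json.loads(
    '{"global_quant_config": "ptpc_fp8", '
    '"layer_quant_config": {"*expert*": "mxfp4"}, '
    '"exclude_layer": ["lm_head", "model.embed_tokens", "vision_tower", '
    '"multi_modal_projector", "patch_merge_mlp", "*block_sparse_moe.gate"]}'
)


def matches(pattern, name):
    """Assumed rule: glob if the pattern has a wildcard, else substring match."""
    if "*" in pattern:
        return fnmatchcase(name, pattern)
    return pattern in name


def resolve_scheme(name, config=CONFIG):
    """Return the scheme for a layer, or None if the layer stays unquantized.

    Assumed precedence: exclude_layer, then layer_quant_config, then
    global_quant_config.
    """
    if any(matches(p, name) for p in config.get("exclude_layer", [])):
        return None
    for pattern, scheme in config.get("layer_quant_config", {}).items():
        if matches(pattern, name):
            return scheme
    return config.get("global_quant_config")


if __name__ == "__main__":
    # Hypothetical layer names, chosen only to exercise each branch.
    for layer in [
        "model.layers.0.block_sparse_moe.experts.3.w1",
        "model.layers.0.block_sparse_moe.gate",
        "model.layers.0.self_attn.q_proj",
        "lm_head",
    ]:
        print(f"{layer}: {resolve_scheme(layer)}")
```

Under these assumptions, expert weights resolve to `mxfp4`, the MoE router gate and `lm_head` are left unquantized, and every other layer falls back to `ptpc_fp8`.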

Signed-off-by: Haoyang Li <lihaoyang0109@gmail.com>
@haoyangli0109 haoyangli0109 marked this pull request as ready for review June 18, 2026 09:48