wip by haoyangli0109 · Pull Request #1284 · ROCm/ATOM

haoyangli0109 · 2026-06-18T09:38:40Z

ATOM does not currently support loading or running inference with MiniMax-M3-MXFP8; this PR only implements online quantization for it.
Quantized inference runs smoothly with the mixed mxfp4 and ptpc_fp8 configuration below, but ptpc_fp8 inference on its own hits a kernel error in aiter.

HIP_VISIBLE_DEVICES=4,5,6,7 \
python -m atom.entrypoints.openai_server \
  --model /shareddata/MiniMaxAI/MiniMax-M3-MXFP8 \
  -tp 4 \
  --trust-remote-code \
  --block-size 128 \
  --server-port 01345 \
  --online_quant_config '{"global_quant_config": "ptpc_fp8", "layer_quant_config": {"*expert*": "mxfp4"}, "exclude_layer": ["lm_head", "model.embed_tokens", "vision_tower", "multi_modal_projector", "patch_merge_mlp", "*block_sparse_moe.gate"]}'
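
To make the `--online_quant_config` value easier to read, here is a minimal Python sketch that builds the same JSON and shows one plausible way the patterns could resolve per layer. The `resolve_quant` helper, its precedence order (exclusions, then per-layer globs, then the global default), the substring matching for bare names, and the layer names are all assumptions for illustration; ATOM's actual matching logic may differ.

```python
import json
import shlex
from fnmatch import fnmatchcase

# Same values as the --online_quant_config argument in the command above.
online_quant_config = {
    "global_quant_config": "ptpc_fp8",
    "layer_quant_config": {"*expert*": "mxfp4"},
    "exclude_layer": [
        "lm_head",
        "model.embed_tokens",
        "vision_tower",
        "multi_modal_projector",
        "patch_merge_mlp",
        "*block_sparse_moe.gate",
    ],
}


def _matches(name, pattern):
    # Assumption: wildcard patterns are shell-style globs; bare names match
    # when they appear anywhere in the layer name.
    if "*" in pattern:
        return fnmatchcase(name, pattern)
    return pattern in name


def resolve_quant(name, cfg):
    """Hypothetical resolver: returns a quant scheme, or None if excluded."""
    if any(_matches(name, p) for p in cfg["exclude_layer"]):
        return None
    for pattern, scheme in cfg["layer_quant_config"].items():
        if _matches(name, pattern):
            return scheme
    return cfg["global_quant_config"]


# Serialize for the command line; shlex.quote adds the shell quoting.
cli_arg = json.dumps(online_quant_config)
print("--online_quant_config", shlex.quote(cli_arg))

# Hypothetical layer names, used only to exercise the patterns.
for layer in (
    "model.layers.0.block_sparse_moe.experts.3.w1",
    "model.layers.0.block_sparse_moe.gate",
    "model.layers.0.self_attn.q_proj",
    "lm_head",
):
    print(layer, "->", resolve_quant(layer, online_quant_config))
```

Under these assumptions, expert weights get mxfp4, the MoE router gate and `lm_head` stay unquantized, and every other layer falls back to ptpc_fp8.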


lm_eval --model local-completions \
  --model_args "model=/shareddata/MiniMaxAI/MiniMax-M3-MXFP8,base_url=http://localhost:01345/v1/completions,num_concurrent=65,max_retries=3,tokenized_requests=False,trust_remote_code=True" \
  --tasks gsm8k \
  --num_fewshot 5

|Tasks|Version|Filter|n-shot|Metric| |Value| |Stderr|
|-----|------:|------|-----:|------|---|-----:|---|-----:|
|gsm8k|3|flexible-extract|5|exact_match|↑|0.8810|±|0.0089|
| | |strict-match|5|exact_match|↑|0.8787|±|0.0090|
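
As a rough sanity check on the reported numbers, the sketch below turns each value and stderr into an approximate 95% confidence interval and compares the stderr with a plain binomial estimate. The sample count of 1319 assumes the run covered the full GSM8K test split, which the log above does not state, and the normal approximation is only a rough guide.

```python
import math

# Scores reported in the table above: filter -> (value, stderr).
results = {
    "flexible-extract": (0.8810, 0.0089),
    "strict-match": (0.8787, 0.0090),
}

# Assumption: the evaluation used the full GSM8K test split.
N_GSM8K_TEST = 1319


def ci95(value, stderr):
    """Normal-approximation 95% confidence interval."""
    half_width = 1.96 * stderr
    return value - half_width, value + half_width


def binomial_stderr(p, n):
    """Standard error of a proportion p measured over n samples."""
    return math.sqrt(p * (1.0 - p) / n)


for name, (value, stderr) in results.items():
    low, high = ci95(value, stderr)
    expected = binomial_stderr(value, N_GSM8K_TEST)
    print(
        f"{name}: {value:.4f} "
        f"(95% CI {low:.4f} to {high:.4f}), "
        f"reported stderr {stderr:.4f}, binomial estimate {expected:.4f}"
    )
```

The two filters differ by less than either stderr, so the gap between flexible-extract and strict-match is within noise for this run.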

Signed-off-by: Haoyang Li <lihaoyang0109@gmail.com>

wip

0bc353f

Signed-off-by: Haoyang Li <lihaoyang0109@gmail.com>

haoyangli0109 marked this pull request as ready for review June 18, 2026 09:48

haoyangli0109 wants to merge 1 commit into ROCm:wuhuikx/atom-m3-bf16-to-main from haoyangli0109:lhy/mxfp8_dequant
