
merge build sparse block into topk #1282

Open: ganyi1996ppo wants to merge 1 commit into wuhuikx/atom-m3-bf16-to-main from ganyi/merge_build_sparse_table

ganyi1996ppo commented Jun 18, 2026

Motivation

Depends on:
ROCm/aiter#3795

Launch script:

export AITER_LOG_LEVEL=WARNING ATOM_M3_SPARSE_USE_ASM_PA=1 HIP_VISIBLE_DEVICES=4,5,6,7
python -m atom.entrypoints.openai_server --model /workspace/shared/data/amd_int/models/MiniMax-M3-MXFP4 \
  -tp 4 --server-port 8013 --trust-remote-code --gpu-memory-utilization 0.8 --block-size 128 --no-enable_prefix_caching \
  --max-num-batched-tokens 32768 --max-model-len 32768 --max-num-seqs 128

Accuracy (gsm8k, 5-shot):

|Tasks|Version|     Filter     |n-shot|  Metric   |   |Value |   |Stderr|
|-----|------:|----------------|-----:|-----------|---|-----:|---|-----:|
|gsm8k|      3|flexible-extract|     5|exact_match|↑  |0.9393|±  |0.0066|
|     |       |strict-match    |     5|exact_match|↑  |0.9401|±  |0.0065|
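The Stderr column can be reproduced from the Value column, which is a quick way to confirm that the two filters (0.9393 vs 0.9401) agree well within one standard error. This is a minimal sketch, assuming the 1,319-problem gsm8k test split and the sample-variance standard error of the mean for a 0/1 exact_match score; neither detail is stated in this PR, and `mean_stderr` / `GSM8K_TEST_SIZE` are hypothetical names.

```python
import math

# Assumption: size of the gsm8k test split (not stated in the PR).
GSM8K_TEST_SIZE = 1319


def mean_stderr(p: float, n: int) -> float:
    """Standard error of the mean of n 0/1 scores whose mean is p.

    For 0/1 data the sample variance is n * p * (1 - p) / (n - 1); dividing
    by n under the square root gives sqrt(p * (1 - p) / (n - 1)).
    """
    return math.sqrt(p * (1.0 - p) / (n - 1))


if __name__ == "__main__":
    # Values copied from the accuracy table above.
    for name, value in [("flexible-extract", 0.9393), ("strict-match", 0.9401)]:
        err = mean_stderr(value, GSM8K_TEST_SIZE)
        print(f"{name:<17} value={value:.4f} stderr={err:.4f}")
```

Both rounded results match the table (0.0066 and 0.0065), and the 0.0008 gap between the two filters is roughly an eighth of one standard error.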

Serving benchmark before this change:

============ Serving Benchmark Result ============
Successful requests:                     640       
Failed requests:                         0         
Maximum request concurrency:             64        
Benchmark duration (s):                  331.70    
Total input tokens:                      5419318   
Total generated tokens:                  657257    
Request throughput (req/s):              1.93      
Output token throughput (tok/s):         1981.47   
Peak output token throughput (tok/s):    4864.00   
Peak concurrent requests:                73.00     
Total token throughput (tok/s):          18319.41  
---------------Time to First Token----------------
Mean TTFT (ms):                          1121.23   
Median TTFT (ms):                        418.59    
P99 TTFT (ms):                           14163.40  
-----Time per Output Token (excl. 1st token)------
Mean TPOT (ms):                          30.37     
Median TPOT (ms):                        30.77     
P99 TPOT (ms):                           46.36     
---------------Inter-token Latency----------------
Mean ITL (ms):                           30.06     
Median ITL (ms):                         15.50     
P99 ITL (ms):                            376.82    
==================================================

Serving benchmark after this change:

============ Serving Benchmark Result ============
Successful requests:                     640       
Failed requests:                         0         
Maximum request concurrency:             64        
Benchmark duration (s):                  320.99    
Total input tokens:                      5419318   
Total generated tokens:                  657257    
Request throughput (req/s):              1.99      
Output token throughput (tok/s):         2047.58   
Peak output token throughput (tok/s):    4800.00   
Peak concurrent requests:                69.00     
Total token throughput (tok/s):          18930.65  
---------------Time to First Token----------------
Mean TTFT (ms):                          1042.35   
Median TTFT (ms):                        382.99    
P99 TTFT (ms):                           12908.83  
-----Time per Output Token (excl. 1st token)------
Mean TPOT (ms):                          29.29     
Median TPOT (ms):                        29.69     
P99 TPOT (ms):                           39.90     
---------------Inter-token Latency----------------
Mean ITL (ms):                           29.05     
Median ITL (ms):                         15.70     
P99 ITL (ms):                            334.21    
==================================================
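The before/after runs can be compared metric by metric with a few lines of Python. This is a hypothetical helper, not part of the PR or of ATOM; `RESULTS`, `pct_change`, and `summarize` are illustrative names. It only re-uses the figures printed in the two benchmark blocks and labels each change as better or worse depending on whether a higher or lower value is preferable for that metric.

```python
# Figures copied from the two "Serving Benchmark Result" blocks above.
# Each entry: metric name -> (before, after, higher_is_better)
RESULTS = {
    "Output token throughput (tok/s)": (1981.47, 2047.58, True),
    "Total token throughput (tok/s)": (18319.41, 18930.65, True),
    "Mean TTFT (ms)": (1121.23, 1042.35, False),
    "Median TTFT (ms)": (418.59, 382.99, False),
    "P99 TTFT (ms)": (14163.40, 12908.83, False),
    "Mean TPOT (ms)": (30.37, 29.29, False),
    "P99 TPOT (ms)": (46.36, 39.90, False),
    "Mean ITL (ms)": (30.06, 29.05, False),
    "Median ITL (ms)": (15.50, 15.70, False),
    "P99 ITL (ms)": (376.82, 334.21, False),
}


def pct_change(before: float, after: float) -> float:
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


def summarize(results):
    """Return (metric, pct_change, improved) rows; `improved` respects metric direction."""
    rows = []
    for name, (before, after, higher_is_better) in results.items():
        delta = pct_change(before, after)
        improved = delta > 0 if higher_is_better else delta < 0
        rows.append((name, delta, improved))
    return rows


if __name__ == "__main__":
    for name, delta, improved in summarize(RESULTS):
        tag = "better" if improved else "worse"
        print(f"{name:<34} {delta:+7.2f}%  ({tag})")
```

Run as-is, it reports about +3.3% output and total token throughput, about -7.0% mean TTFT, and -13.9% P99 TPOT. Median ITL is the only listed metric that moves the other way (+1.3%).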

Technical Details

Test Plan

Test Result

Submission Checklist

Signed-off-by: ganyi <ygan@amd.com>