
[Triton] DSV4 replace einsum with Triton BMM #1270

Open
k50112113 wants to merge 2 commits into main from shaoclee/dsv4_einsum_bmm

Conversation

@k50112113

Collaborator

The tuned config is at AITER PR: ROCm/aiter#3784

GFX1250, DSV4-Flash, TP1

einsum:

============ Serving Benchmark Result ============
Successful requests:                     4         
Benchmark duration (s):                  68.67     
Total input tokens:                      3804      
Total generated tokens:                  3650      
Request throughput (req/s):              0.06      
Output token throughput (tok/s):         53.16     
Total Token throughput (tok/s):          108.56    
---------------Time to First Token----------------
Mean TTFT (ms):                          174.14    
Median TTFT (ms):                        176.35    
P99 TTFT (ms):                           178.23    
-----Time per Output Token (excl. 1st token)------
Mean TPOT (ms):                          18.64     
Median TPOT (ms):                        18.61     
P99 TPOT (ms):                           18.71     
---------------Inter-token Latency----------------
Mean ITL (ms):                           18.62     
Median ITL (ms):                         18.56     
P99 ITL (ms):                            21.27     
----------------End-to-end Latency----------------
Mean E2EL (ms):                          17165.68  
Median E2EL (ms):                        16982.13  
P99 E2EL (ms):                           19072.43  
==================================================

============ Serving Benchmark Result ============
Successful requests:                     256       
Benchmark duration (s):                  142.02    
Total input tokens:                      236166    
Total generated tokens:                  234891    
Request throughput (req/s):              1.80      
Output token throughput (tok/s):         1653.97   
Total Token throughput (tok/s):          3316.93   
---------------Time to First Token----------------
Mean TTFT (ms):                          1190.33   
Median TTFT (ms):                        281.47    
P99 TTFT (ms):                           6298.33   
-----Time per Output Token (excl. 1st token)------
Mean TPOT (ms):                          35.43     
Median TPOT (ms):                        35.96     
P99 TPOT (ms):                           39.05     
---------------Inter-token Latency----------------
Mean ITL (ms):                           35.45     
Median ITL (ms):                         27.72     
P99 ITL (ms):                            172.39    
----------------End-to-end Latency----------------
Mean E2EL (ms):                          33712.90  
Median E2EL (ms):                        32984.94  
P99 E2EL (ms):                           42550.89  
==================================================

Tuned Triton BMM:

============ Serving Benchmark Result ============
Successful requests:                     4         
Benchmark duration (s):                  50.88     
Total input tokens:                      3804      
Total generated tokens:                  3650      
Request throughput (req/s):              0.08      
Output token throughput (tok/s):         71.74     
Total Token throughput (tok/s):          146.51    
---------------Time to First Token----------------
Mean TTFT (ms):                          164.75    
Median TTFT (ms):                        168.53    
P99 TTFT (ms):                           171.25    
-----Time per Output Token (excl. 1st token)------
Mean TPOT (ms):                          13.77     
Median TPOT (ms):                        13.77     
P99 TPOT (ms):                           13.89     
---------------Inter-token Latency----------------
Mean ITL (ms):                           13.76     
Median ITL (ms):                         13.64     
P99 ITL (ms):                            20.01     
----------------End-to-end Latency----------------
Mean E2EL (ms):                          12718.34  
Median E2EL (ms):                        12602.44  
P99 E2EL (ms):                           14022.22  
==================================================

============ Serving Benchmark Result ============
Successful requests:                     256       
Benchmark duration (s):                  119.07    
Total input tokens:                      236166    
Total generated tokens:                  234891    
Request throughput (req/s):              2.15      
Output token throughput (tok/s):         1972.73   
Total Token throughput (tok/s):          3956.16   
---------------Time to First Token----------------
Mean TTFT (ms):                          858.36    
Median TTFT (ms):                        207.14    
P99 TTFT (ms):                           4395.17   
-----Time per Output Token (excl. 1st token)------
Mean TPOT (ms):                          29.88     
Median TPOT (ms):                        30.53     
P99 TPOT (ms):                           33.18     
---------------Inter-token Latency----------------
Mean ITL (ms):                           29.89     
Median ITL (ms):                         24.02     
P99 ITL (ms):                            145.91    
----------------End-to-end Latency----------------
Mean E2EL (ms):                          28286.96  
Median E2EL (ms):                        27832.29  
P99 E2EL (ms):                           35253.11  
==================================================
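To make the comparison easier to read, the headline numbers from the four result blocks above can be reduced to relative gains. The following is a minimal sketch, not part of this PR: the dictionary layout and the helper names (`throughput_speedup`, `latency_reduction_pct`, `summarize`) are hypothetical, and the values are copied by hand from the tables above.

```python
# Values copied from the benchmark output above, keyed by number of requests.
# The structure and helper names are illustrative assumptions, not PR code.
RESULTS = {
    4: {
        "einsum": {"output_tok_s": 53.16, "mean_tpot_ms": 18.64, "mean_ttft_ms": 174.14},
        "triton_bmm": {"output_tok_s": 71.74, "mean_tpot_ms": 13.77, "mean_ttft_ms": 164.75},
    },
    256: {
        "einsum": {"output_tok_s": 1653.97, "mean_tpot_ms": 35.43, "mean_ttft_ms": 1190.33},
        "triton_bmm": {"output_tok_s": 1972.73, "mean_tpot_ms": 29.88, "mean_ttft_ms": 858.36},
    },
}


def throughput_speedup(baseline: float, candidate: float) -> float:
    """Ratio of candidate to baseline throughput (higher is better)."""
    return candidate / baseline


def latency_reduction_pct(baseline: float, candidate: float) -> float:
    """Percentage by which the candidate latency is lower than the baseline."""
    return 100.0 * (baseline - candidate) / baseline


def summarize(results: dict) -> dict:
    """Compute relative gains of the Triton BMM run over the einsum run."""
    summary = {}
    for num_requests, runs in results.items():
        base, cand = runs["einsum"], runs["triton_bmm"]
        summary[num_requests] = {
            "throughput_speedup": throughput_speedup(
                base["output_tok_s"], cand["output_tok_s"]
            ),
            "tpot_reduction_pct": latency_reduction_pct(
                base["mean_tpot_ms"], cand["mean_tpot_ms"]
            ),
            "ttft_reduction_pct": latency_reduction_pct(
                base["mean_ttft_ms"], cand["mean_ttft_ms"]
            ),
        }
    return summary


if __name__ == "__main__":
    for num_requests, row in summarize(RESULTS).items():
        print(
            f"{num_requests:>3} requests: "
            f"output throughput x{row['throughput_speedup']:.2f}, "
            f"mean TPOT -{row['tpot_reduction_pct']:.1f}%, "
            f"mean TTFT -{row['ttft_reduction_pct']:.1f}%"
        )
```

With these inputs, output token throughput improves by roughly 1.35x for the 4-request run and roughly 1.19x for the 256-request run, and mean TPOT drops by about 26% and 16% respectively.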
