support TBO decode in Deepseek v4 #1275
Open
ZhangLirong-amd wants to merge 3 commits into
Pull request overview
This PR adds Two-Batch Overlap (TBO) decode support for the Deepseek V4 stack, with the goal of keeping DP collectives and CUDAGraph/HIPGraph replay stable when decode is split into concurrent micro-batches.
Changes:
- Add DP-synchronized per-ubatch sizing/metadata in the TBO wrapper so MoE DP collectives use consistent per-ubatch token counts.
- Introduce Deepseek V4 decode-path adjustments for TBO (stable scratch buffers for graph capture, avoid using padded block_table rows, and disable async/dual-stream paths that are unsafe under concurrent ubatch threads).
- Add Deepseek V4 attention metadata support for building per-ubatch decode metadata into `ub{0,1}_*` buffer sets, and tighten TBO capture gating to `bs > 2`.
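The DP-synchronized sizing in the first bullet can be illustrated with a small, framework-free sketch. It assumes each rank splits its decode batch into near-equal ubatches and that the unified size for a given ubatch index is the maximum across DP ranks; the PR does not show the actual rule, and `split_sizes` and `dp_unified_ubatch_sizes` are hypothetical names, not part of `atom`. In a real run the maximum would come from a collective rather than a Python list.

```python
from typing import List


def split_sizes(bs: int, num_ubatches: int) -> List[int]:
    """Split `bs` requests into `num_ubatches` near-equal contiguous chunks."""
    base, rem = divmod(bs, num_ubatches)
    return [base + (1 if i < rem else 0) for i in range(num_ubatches)]


def dp_unified_ubatch_sizes(per_rank_bs: List[int], num_ubatches: int = 2) -> List[int]:
    """Return one padded size per ubatch index, shared by every DP rank.

    Assumption: the unified size is the max over ranks, so that every rank
    contributes the same token count to the MoE collective for ubatch i.
    """
    per_rank = [split_sizes(bs, num_ubatches) for bs in per_rank_bs]
    return [max(sizes[i] for sizes in per_rank) for i in range(num_ubatches)]


if __name__ == "__main__":
    # Three DP ranks with different local decode batch sizes.
    print(dp_unified_ubatch_sizes([5, 3, 8]))
```

Ranks whose local ubatch is smaller than the unified size would pad up to it, which is why the later bullets care about not reading padded `block_table` rows.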
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| `atom/utils/tbo/ubatch_wrapper.py` | Builds per-ubatch DPMetadata and uses DP-unified per-ubatch padded decode batch sizing; threads receive per-ubatch DP metadata. |
| `atom/models/deepseek_v4.py` | Adds per-ubatch fixed-address scratch for graph stability; fixes decode top-k sizing under padded TBO metadata; disables async compressor under TBO. |
| `atom/model_ops/moe.py` | Disables custom CA/IPC all-gather during TBO overlap to avoid cross-thread corruption/deadlock. |
| `atom/model_ops/module_dispatch_ops.py` | Disables dual-stream MoE forwarding while TBO overlap is active. |
| `atom/model_ops/attentions/deepseek_v4_attn.py` | Adds TBO decode metadata preparation + per-ubatch buffer allocation and prefixes to avoid cross-ubatch buffer sharing. |
| `atom/model_engine/model_runner.py` | Updates TBO capture gating from `bs >= 2` to `bs > 2` for decode. |
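The "fixed-address scratch" and "per-ubatch buffer prefixes" rows describe the same idea: a buffer is allocated once per ubatch under a `ub{i}_` key and then reused, so graph replay always sees the same storage and the two ubatch threads never share a buffer. A minimal sketch of that pattern, with Python lists standing in for device tensors and a hypothetical `UbatchBufferPool` class that is not taken from the PR:

```python
from typing import Dict, List


class UbatchBufferPool:
    """Hypothetical pool: one persistent buffer per (ubatch index, name)."""

    def __init__(self, num_ubatches: int = 2) -> None:
        self.num_ubatches = num_ubatches
        self._buffers: Dict[str, List[int]] = {}

    def get(self, ub_idx: int, name: str, size: int) -> List[int]:
        if not 0 <= ub_idx < self.num_ubatches:
            raise IndexError(f"ubatch index {ub_idx} out of range")
        key = f"ub{ub_idx}_{name}"  # prefix keeps ubatches apart
        buf = self._buffers.get(key)
        if buf is None:
            # First request: allocate once and keep the object alive.
            buf = [0] * size
            self._buffers[key] = buf
        elif len(buf) < size:
            # Growing would change the address a captured graph relies on.
            raise ValueError(f"{key} was allocated with {len(buf)} < {size}")
        return buf


if __name__ == "__main__":
    pool = UbatchBufferPool()
    a = pool.get(0, "block_table", 8)
    b = pool.get(0, "block_table", 8)
    c = pool.get(1, "block_table", 8)
    print(a is b, a is c)
```

Raising on a larger request instead of reallocating is a deliberate choice in this sketch: a silent reallocation would invalidate any address captured earlier.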
Comment on lines +1232 to +1236

    from atom.utils.tbo.ubatch_wrapper import UBatchWrapper

    ctx = get_forward_context()
    padded_list = [
        UBatchWrapper._decode_ub_padded_bs(ctx, i, N, bs) for i in range(N)
Comment on lines +1185 to +1190

    self._ubatch_decode_meta = None
    if (
        self.model_runner.config.enable_tbo_decode
        and scheduled_bs > 2
        and not batch.is_dummy_run
    ):
Comment on lines +1232 to +1237

    from atom.utils.tbo.ubatch_wrapper import UBatchWrapper

    ctx = get_forward_context()
    padded_list = [
        UBatchWrapper._decode_ub_padded_bs(ctx, i, N, bs) for i in range(N)
    ]
Comment on lines +2314 to 2317

      # Create ubatch slices for TBO capture (need > 2 requests)
      ubatch_slices = None
    - if is_tbo and self.config.enable_tbo_decode and bs >= 2:
    + if is_tbo and self.config.enable_tbo_decode and bs > 2:
          ubatch_slices = maybe_create_ubatch_slices(
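The effect of tightening the gate from `bs >= 2` to `bs > 2` is easiest to see as a standalone predicate. The sketch below mirrors the three conditions visible in the metadata-preparation hunk (`enable_tbo_decode`, batch size above 2, not a dummy run); `should_split_decode` is a hypothetical helper for illustration and does not exist in the PR.

```python
def should_split_decode(
    enable_tbo_decode: bool,
    bs: int,
    is_dummy_run: bool = False,
) -> bool:
    """Return True when a decode batch would be split into TBO ubatches.

    Mirrors the gating shown in the diff excerpts: the feature flag must be
    on, the batch must hold more than 2 requests, and dummy runs are skipped.
    """
    return enable_tbo_decode and bs > 2 and not is_dummy_run


if __name__ == "__main__":
    for bs in (1, 2, 3, 4):
        print(bs, should_split_decode(True, bs))
```

With the old `>=` comparison a batch of exactly 2 would have been split into two single-request ubatches; with `>` it stays on the non-TBO path.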
Comment on lines +251 to 256

    from atom.utils.tbo.ubatching import tbo_active

    use_cag = use_cag and not tbo_active()
    gathered_hidden_states = get_dp_group().all_gather(
        padded_x, use_custom=use_cag, dim=0
    )
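The hunk above downgrades the custom all-gather to the default path whenever TBO overlap is active. A self-contained sketch of that pattern follows. The real `tbo_active()` lives in `atom.utils.tbo.ubatching` and its implementation is not shown in this PR, so `tbo_active_sketch`, `tbo_overlap`, and `pick_all_gather_backend` are stand-ins that assume a process-wide flag visible to every ubatch thread.

```python
import threading
from contextlib import contextmanager

_tbo_lock = threading.Lock()
_tbo_depth = 0  # process-wide so both ubatch threads observe the same state


@contextmanager
def tbo_overlap():
    """Mark the enclosed region as running under TBO overlap (stand-in)."""
    global _tbo_depth
    with _tbo_lock:
        _tbo_depth += 1
    try:
        yield
    finally:
        with _tbo_lock:
            _tbo_depth -= 1


def tbo_active_sketch() -> bool:
    """Stand-in for tbo_active(): True while any overlap region is open."""
    with _tbo_lock:
        return _tbo_depth > 0


def pick_all_gather_backend(use_cag: bool) -> str:
    """Apply the same rule as the diff: custom path only outside TBO overlap."""
    use_cag = use_cag and not tbo_active_sketch()
    return "custom" if use_cag else "default"


if __name__ == "__main__":
    print(pick_all_gather_backend(True))
    with tbo_overlap():
        print(pick_all_gather_backend(True))
```

Checking the flag at the call site, rather than disabling the custom path globally, leaves non-TBO batches on the faster path.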
Motivation
Technical Details
Test Plan
Test Result
Submission Checklist