mla #1280

Open

feifei14119 wants to merge 6 commits into main from feiw/pr/mla2

@feifei14119

Motivation

Technical Details

Test Plan

Test Result

Submission Checklist

Copilot AI review requested due to automatic review settings June 18, 2026 08:25

Copilot AI left a comment

Pull request overview

This PR updates ATOM’s MLA attention stack to support and use segmented MLA KV-cache kernels and a configurable MLA page size, and propagates the new fused “_seg” kernel entry points into the vLLM and SGLang plugin integrations.

Changes:

  • Add ATOM_MLA_PAGE_SIZE env var and use it to configure MLA metadata builder block/page sizing.
  • Switch multiple call sites to segmented fused MLA cache-update kernels (*_mla_seg) and add segmented-layout handling/validation in MLAAttention.
  • Adjust the MLA decode and prefill paths to pass and use the actual KV-cache page size instead of implicitly assuming a page size of 1.
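To make the page-size change concrete, here is a minimal sketch assuming a conventional paged KV cache in which each sequence's block table holds one entry per page. The helper name `pages_per_sequence` is hypothetical and not part of ATOM; it only shows how the number of block-table entries depends on the page size.

```python
def pages_per_sequence(seq_len: int, page_size: int) -> int:
    """Number of KV-cache pages (block-table entries) needed for seq_len tokens.

    Hypothetical helper for illustration; not part of ATOM.
    """
    if page_size < 1:
        raise ValueError("page_size must be >= 1")
    # Ceiling division: a partially filled last page still occupies a whole block.
    return (seq_len + page_size - 1) // page_size


# Implicit old assumption (page size 1) vs. a configured page size.
print(pages_per_sequence(1000, 1))   # 1000: one page per token
print(pages_per_sequence(1000, 64))  # 16: 1000 / 64 rounded up
```

When the cache really uses 64-token blocks, code that assumes a page size of 1 expects 1000 table entries where only 16 exist.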

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 5 comments.

Summary per file

| File | Description |
| --- | --- |
| atom/utils/envs.py | Adds ATOM_MLA_PAGE_SIZE env var for configuring MLA page/block sizing. |
| atom/model_ops/attentions/aiter_mla.py | Uses ATOM_MLA_PAGE_SIZE to set the metadata builder’s block_size. |
| atom/model_ops/attention_mla.py | Implements segmented KV-cache layout support, adds validation, adjusts page-size handling, and updates kernel call paths. |
| atom/plugin/vllm/attention/layer_mla.py | Updates vLLM plugin calls to use segmented fused MLA cache-update kernel. |
| atom/plugin/sglang/models/deepseek_mla_attention.py | Updates SGLang plugin call to use segmented fused MLA cache-update kernel wrapper. |

Comment thread atom/utils/envs.py
```python
    os.getenv("ATOM_USE_TRITON_MLA_SHUFFLE_KV", "0") == "1"
),
"ATOM_USE_TRITON_MOE": lambda: os.getenv("ATOM_USE_TRITON_MOE", "0") == "1",
"ATOM_MLA_PAGE_SIZE": lambda: int(os.getenv("ATOM_MLA_PAGE_SIZE", "1")),
```
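The ATOM_MLA_PAGE_SIZE entry follows the same lazy, lambda-per-variable pattern as its neighbors and defaults to 1. A minimal sketch of reading it with a range check follows. The dict name `environment_variables` and the `get_mla_page_size` accessor are hypothetical, and rejecting values below 1 is an assumption. The diff excerpt shows only the `int(...)` parse.

```python
import os

# Hypothetical registry mirroring the lambda-per-variable pattern:
# each value is re-read from the environment every time it is accessed.
environment_variables = {
    "ATOM_MLA_PAGE_SIZE": lambda: int(os.getenv("ATOM_MLA_PAGE_SIZE", "1")),
}


def get_mla_page_size() -> int:
    """Return ATOM_MLA_PAGE_SIZE, rejecting non-positive values (assumed policy)."""
    value = environment_variables["ATOM_MLA_PAGE_SIZE"]()
    if value < 1:
        raise ValueError(f"ATOM_MLA_PAGE_SIZE must be >= 1, got {value}")
    return value
```

Because `int()` already raises `ValueError` for non-numeric strings such as `"abc"`, a single exception type covers both malformed and out-of-range settings.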
Comment thread atom/model_ops/attention_mla.py Outdated
Comment on lines 842 to 846
```python
# DEBUG(seg): zero-init instead of empty so any region the decode asm
# does not write shows up as 0 rather than garbage (isolates
# uninitialized-read bugs in the seg pass).
o = torch.zeros(
    B,
```
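The DEBUG(seg) comment describes a standard technique for isolating uninitialized reads. The output is allocated with zeros instead of uninitialized memory (`torch.zeros` rather than `torch.empty`), so any region the kernel never writes stays at 0. The sketch below shows the idea in plain Python. `fake_partial_kernel` is a hypothetical stand-in for the decode asm kernel.

```python
from typing import List


def fake_partial_kernel(buf: List[List[float]], rows_to_write: List[int]) -> None:
    """Hypothetical stand-in for the decode kernel: writes only some rows."""
    for i in rows_to_write:
        for j in range(len(buf[i])):
            buf[i][j] = 1.0 + i + j  # always nonzero payload


def find_unwritten_rows(buf: List[List[float]]) -> List[int]:
    """Rows still entirely 0 after the kernel ran were (probably) never written."""
    return [i for i, row in enumerate(buf) if all(v == 0.0 for v in row)]


B, D = 4, 3
out = [[0.0] * D for _ in range(B)]  # zero-init, analogous to torch.zeros((B, D))
fake_partial_kernel(out, rows_to_write=[0, 2])
unwritten = find_unwritten_rows(out)
print(unwritten)  # [1, 3]
```

There are two caveats. First, a row the kernel legitimately writes as all zeros is indistinguishable from an unwritten one. Second, the zero fill adds a memset per call, which is presumably why the allocation is tagged DEBUG rather than left in the hot path.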
Comment thread atom/model_ops/attention_mla.py Outdated
Comment on lines 942 to 946
```python
# DEBUG(seg): zero-init instead of empty so any region the decode asm
# does not write shows up as 0 rather than garbage (isolates
# uninitialized-read bugs in the seg pass).
o = torch.zeros(
    B,
```
Comment thread atom/model_ops/attention_mla.py Outdated
```python
# ids at block granularity, so PAGE_SIZE must be the real KV cache
# block size for the kernel's page// and intra-page% addressing.
page_size = get_current_atom_config().kv_cache_block_size
logger.info("triton_mla decode: page_size=%d", page_size)
```
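The comment above says the kernel locates tokens with `page //` and intra-page `%` arithmetic over a block-granular table. The sketch below spells out that addressing. It assumes a flat slot index of `page * page_size + offset`, a common paged-KV convention that is not confirmed by this thread. The function name and block table are hypothetical.

```python
from typing import List, Tuple


def kv_slot(block_table: List[int], pos: int, page_size: int) -> Tuple[int, int, int]:
    """Map a logical token position to (physical page, intra-page offset, flat slot).

    Assumes block_table holds one physical page id per page_size tokens and that
    the cache is addressed as page * page_size + offset (illustrative convention).
    """
    page = block_table[pos // page_size]  # which physical page holds the token
    offset = pos % page_size              # where the token sits inside that page
    return page, offset, page * page_size + offset


table = [7, 2, 5]  # hypothetical block table for one sequence, page size 4
print(kv_slot(table, 1, page_size=4))  # (7, 1, 29): correct, uses the real block size
print(kv_slot(table, 1, page_size=1))  # (2, 0, 2): wrong, implicitly assumes page size 1
```

With a block-granular table, assuming a page size of 1 resolves token 1 to page 2 instead of offset 1 of page 7. The kernel then reads the wrong cache entry, and any position at or beyond `len(table)` indexes past the end of the table entirely.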
Comment thread atom/model_ops/attention_mla.py Outdated
Comment on lines +1201 to +1209
```python
q_out = torch.zeros(
    (
        q_nope.shape[0],
        self.num_heads,
        _MLA_Q_OUT_PADDED_DIM,
    ),
    dtype=attn_metadata.dtype_q,
    device=q_nope.device,
)
```
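The `q_out` buffer's last dimension is `_MLA_Q_OUT_PADDED_DIM`, which, judging by the name, is wider than the logical query width. Allocating with `torch.zeros` keeps the padding columns at 0 rather than leftover memory. The plain-Python sketch below works through that arithmetic. It assumes the logical width is kv_lora_rank + qk_rope_head_dim and uses a hypothetical 128-element alignment. The actual value of `_MLA_Q_OUT_PADDED_DIM` is not shown in this thread.

```python
def round_up(n: int, multiple: int) -> int:
    """Round n up to the next multiple of `multiple`."""
    return ((n + multiple - 1) // multiple) * multiple


def pad_row(row, width):
    """Zero-pad a row to `width`, mirroring padding columns left at 0 by torch.zeros."""
    return list(row) + [0.0] * (width - len(row))


# Illustrative values only: DeepSeek-V3 uses kv_lora_rank=512 and qk_rope_head_dim=64.
# ALIGN is hypothetical and need not match the real _MLA_Q_OUT_PADDED_DIM.
KV_LORA_RANK, QK_ROPE_HEAD_DIM, ALIGN = 512, 64, 128
logical_dim = KV_LORA_RANK + QK_ROPE_HEAD_DIM  # 576
padded_dim = round_up(logical_dim, ALIGN)        # 640

row = pad_row([1.0] * logical_dim, padded_dim)
valid = row[:logical_dim]  # consumers slice the logical width back out
```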
Copilot AI review requested due to automatic review settings June 18, 2026 12:29

Copilot AI left a comment

Copilot was unable to review this pull request because the user who requested the review is ineligible. To be eligible to request a review, you need a paid Copilot license, or your organization must enable Copilot code review.

Copilot AI review requested due to automatic review settings June 19, 2026 06:15

Copilot AI review requested due to automatic review settings June 19, 2026 12:59

Copilot AI review requested due to automatic review settings June 19, 2026 13:32

Labels

None yet

Projects

None yet

3 participants