Update Nemotron 3 Super B200 BF16 config #4621

Merged
malay-nagda merged 1 commit into r0.5.0 from zuriz/nemotron-b200-config-r0.5.0 on Jul 3, 2026
Conversation

@zuriz-nv zuriz-nv commented Jul 1, 2026

Update the Nemotron 3 Super B200 BF16 config.

Signed-off-by: Zuri Zheng <zuriz@nvidia.com>
@zuriz-nv zuriz-nv self-assigned this Jul 1, 2026

copy-pr-bot Bot commented Jul 1, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@yaoyu-33 yaoyu-33 added the labels area:perf (Performance optimizations and benchmarking), feature (New capabilities, enhancements, or enablement work), and needs-review (PR is ready for code review and waiting on a reviewer) on Jul 2, 2026
@malay-nagda

Contributor

/claude review

claude Bot commented Jul 3, 2026

Contributor

Light review — LGTM

Small, self-contained perf-config update for Nemotron 3 Super on B200 BF16. No correctness issues found.

Observations (non-blocking):

  • Removing expert_model_parallel_size=64 from BASE_NEMOTRON_3_SUPER_CONFIG_B200 is a no-op: the parent BASE_NEMOTRON_3_SUPER_CONFIG already sets EP=64, so B200_FP8_MX_V1 (= base) and B200_NVFP4_V1 (replace from base, TP=2/EP=64) are functionally unchanged. Good cleanup.
  • B200_BF16_V1 now recomputes core_attn while attn is in cuda_graph_scope. Other configs in this file that recompute core_attn keep attn out of the graph scope (e.g. NANO H100 FP8 uses scope ["mamba"]). Worth confirming the measured B200 run captured cleanly with attn graphed and core_attn recomputed, since it differs from the sibling pattern.
  • cuda_graph_scope order here is ["mamba", "attn", ...] vs ["attn", "mamba", ...] elsewhere — cosmetic only, order is not functional.
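
For readers unfamiliar with the override pattern, the observations above can be sketched roughly as follows. This is a minimal illustration, not the repository's code: `PerfConfig`, its field names, `same_scope`, and `recompute_overlaps_graph` are hypothetical stand-ins, and only the values EP=64, TP=2, the `mamba`/`attn` scope entries, and `core_attn` recompute come from the review. Scope entries elided in the review ("...") are left out rather than guessed.

```python
from dataclasses import dataclass, replace
from typing import Tuple


# Hypothetical stand-in for the perf config class; the real class and its
# field names may differ.
@dataclass(frozen=True)
class PerfConfig:
    tensor_model_parallel_size: int = 1
    expert_model_parallel_size: int = 1
    cuda_graph_scope: Tuple[str, ...] = ()
    recompute_modules: Tuple[str, ...] = ()


# Parent config already sets EP=64 (as stated in the review).
BASE = PerfConfig(expert_model_parallel_size=64)

# Before the PR: the B200 base restated EP=64 explicitly.
B200_BEFORE = replace(BASE, expert_model_parallel_size=64)
# After the PR: the override is dropped and EP is inherited from BASE.
B200_AFTER = replace(BASE)

# Dropping a redundant override leaves the derived config unchanged.
assert B200_BEFORE == B200_AFTER

# A derived variant still sees EP=64 through inheritance.
NVFP4 = replace(B200_AFTER, tensor_model_parallel_size=2)
assert NVFP4.expert_model_parallel_size == 64

# BF16 variant: attn is inside the graph scope while core_attn is recomputed.
BF16 = replace(
    B200_AFTER,
    cuda_graph_scope=("mamba", "attn"),
    recompute_modules=("core_attn",),
)


def same_scope(a: Tuple[str, ...], b: Tuple[str, ...]) -> bool:
    """Compare scopes as sets, treating order as cosmetic."""
    return set(a) == set(b)


def recompute_overlaps_graph(cfg: PerfConfig) -> bool:
    """Flag the combination the review asks to double-check."""
    return "core_attn" in cfg.recompute_modules and "attn" in cfg.cuda_graph_scope


assert same_scope(("mamba", "attn"), ("attn", "mamba"))
assert recompute_overlaps_graph(BF16)
assert not recompute_overlaps_graph(NVFP4)
```

A check like `recompute_overlaps_graph` only flags the combination for a human to confirm against the measured run; it says nothing about whether the capture itself is valid.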

Suggested test cases

  • nemotron_3_super_64gpu_b200_bf16_perf
  • nemotron_3_super_64gpu_b200_fp8_mx_perf
  • nemotron_3_super_64gpu_b200_nvfp4_perf

(The base BASE_NEMOTRON_3_SUPER_CONFIG_B200 was edited, so all three B200 precisions derived from it are in scope; BF16 changed materially, FP8-MX/NVFP4 should be net-neutral.)

@malay-nagda malay-nagda enabled auto-merge (squash) July 3, 2026 07:43
@malay-nagda malay-nagda added the labels docs-only (With great power comes great responsibility.) and 26.06.01 on Jul 3, 2026
@malay-nagda

Contributor

/ok to test 2e28c81

@malay-nagda malay-nagda merged commit fd4b01c into r0.5.0 Jul 3, 2026
41 checks passed
@malay-nagda malay-nagda deleted the zuriz/nemotron-b200-config-r0.5.0 branch July 3, 2026 07:46