Update Nemotron 3 Super B200 BF16 config by zuriz-nv · Pull Request #4621 · NVIDIA-NeMo/Megatron-Bridge

zuriz-nv · 2026-07-01T23:32:11Z

Update Nemotron3 Super B200 BF16 Config

Signed-off-by: Zuri Zheng <zuriz@nvidia.com>

copy-pr-bot · 2026-07-01T23:32:15Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

malay-nagda · 2026-07-03T07:34:44Z

/claude review

claude · 2026-07-03T07:40:11Z

Light review — LGTM

Small, self-contained perf-config update for Nemotron 3 Super on B200 BF16. No correctness issues found.

Observations (non-blocking):

Removing expert_model_parallel_size=64 from BASE_NEMOTRON_3_SUPER_CONFIG_B200 is a no-op: the parent BASE_NEMOTRON_3_SUPER_CONFIG already sets EP=64, so B200_FP8_MX_V1 (= base) and B200_NVFP4_V1 (replace() of base with TP=2/EP=64) are functionally unchanged. Good cleanup.
B200_BF16_V1 now recomputes core_attn while attn is in cuda_graph_scope. Other configs in this file that recompute core_attn keep attn out of the graph scope (e.g. NANO H100 FP8 uses scope ["mamba"]). Worth confirming the measured B200 run captured cleanly with attn graphed and core_attn recomputed, since it differs from the sibling pattern.
cuda_graph_scope order here is ["mamba", "attn", ...] vs. ["attn", "mamba", ...] elsewhere. This is cosmetic only; the order is not functional.
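The three observations above can be checked mechanically. What follows is a minimal sketch, assuming the presets behave like frozen Python dataclasses derived with `dataclasses.replace`. The class `PerfConfig`, its field names, and the helpers `derived_presets`, `same_graph_scope`, and `recomputes_graphed_attn` are hypothetical stand-ins, not the actual Megatron-Bridge definitions. Only EP=64, TP=2, the module names, and the graph-scope patterns come from the review, and the scope tuples are truncated to the modules the review names.

```python
from dataclasses import dataclass, replace


# HYPOTHETICAL stand-in for a perf preset: the real Megatron-Bridge
# config class and field names may differ.
@dataclass(frozen=True)
class PerfConfig:
    tensor_model_parallel_size: int = 1
    expert_model_parallel_size: int = 1
    cuda_graph_scope: tuple = ()
    recompute_modules: tuple = ()


# Parent preset: per the review, it already sets EP=64.
BASE = PerfConfig(expert_model_parallel_size=64)

# B200 base before this PR (EP=64 re-stated) and after (override dropped).
b200_before = replace(BASE, expert_model_parallel_size=64)
b200_after = replace(BASE)


def derived_presets(b200_base):
    """Return (FP8-MX, NVFP4) presets shaped as the review describes them."""
    fp8_mx = b200_base  # FP8-MX is the B200 base as-is
    nvfp4 = replace(b200_base, tensor_model_parallel_size=2,
                    expert_model_parallel_size=64)
    return fp8_mx, nvfp4


def same_graph_scope(a, b):
    """Compare scopes ignoring order, assuming (as the review states)
    that cuda_graph_scope order has no functional effect."""
    return sorted(a) == sorted(b)


def recomputes_graphed_attn(cfg):
    """True when core_attn is recomputed while attn is CUDA-graphed,
    the pattern the review asks to confirm on B200."""
    return "core_attn" in cfg.recompute_modules and "attn" in cfg.cuda_graph_scope


# Dropping an override that equals the parent's value changes nothing.
assert derived_presets(b200_before) == derived_presets(b200_after)

# Hypothetical BF16 preset and a NANO-H100-FP8-like sibling (scope ["mamba"]).
bf16 = replace(b200_after, cuda_graph_scope=("mamba", "attn"),
               recompute_modules=("core_attn",))
nano_like = PerfConfig(cuda_graph_scope=("mamba",),
                       recompute_modules=("core_attn",))
print(recomputes_graphed_attn(bf16))       # True
print(recomputes_graphed_attn(nano_like))  # False
print(same_graph_scope(("mamba", "attn"), ("attn", "mamba")))  # True
```

Comparing whole frozen dataclasses by value is what makes the no-op claim checkable: if any derived preset picked up a different field, the equality assertion would fail.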

Suggested test cases

nemotron_3_super_64gpu_b200_bf16_perf
nemotron_3_super_64gpu_b200_fp8_mx_perf
nemotron_3_super_64gpu_b200_nvfp4_perf

(The base BASE_NEMOTRON_3_SUPER_CONFIG_B200 was edited, so all three B200 precisions derived from it are in scope; BF16 changed materially, FP8-MX/NVFP4 should be net-neutral.)

malay-nagda · 2026-07-03T07:45:02Z

/ok to test 2e28c81

Update Nemotron 3 Super B200 BF16 config

2e28c81

Signed-off-by: Zuri Zheng <zuriz@nvidia.com>

zuriz-nv requested review from ko3n1g, malay-nagda, rhmukundan and scsudhakaran July 1, 2026 23:32

zuriz-nv self-assigned this Jul 1, 2026

zuriz-nv mentioned this pull request Jul 1, 2026

Improve Nemotron3 Super B200 BF16 Config #4474

Closed

yaoyu-33 added the area:perf (Performance optimizations and benchmarking), feature (New capabilities, enhancements, or enablement work), and needs-review (PR is ready for code review and waiting on a reviewer) labels Jul 2, 2026

malay-nagda approved these changes Jul 3, 2026


malay-nagda enabled auto-merge (squash) July 3, 2026 07:43

malay-nagda added the docs-only (With great power comes great responsibility.) and 26.06.01 labels Jul 3, 2026

malay-nagda merged commit fd4b01c into r0.5.0 Jul 3, 2026
41 checks passed

malay-nagda deleted the zuriz/nemotron-b200-config-r0.5.0 branch July 3, 2026 07:46

malay-nagda merged 1 commit into r0.5.0 from zuriz/nemotron-b200-config-r0.5.0

zuriz-nv commented Jul 1, 2026

Uh oh!

copy-pr-bot Bot commented Jul 1, 2026

Uh oh!

malay-nagda commented Jul 3, 2026

Uh oh!

claude Bot commented Jul 3, 2026

Uh oh!

malay-nagda commented Jul 3, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Uh oh!

Conversation

zuriz-nv commented Jul 1, 2026

Uh oh!

copy-pr-bot Bot commented Jul 1, 2026

Uh oh!

malay-nagda commented Jul 3, 2026

Uh oh!

claude Bot commented Jul 3, 2026

Uh oh!

malay-nagda commented Jul 3, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants