
Improve Nemotron3 Super B200 BF16 Config#4474

Closed
zuriz-nv wants to merge 1 commit into main from zuriz/nemotron-b200-config


@zuriz-nv


Update Nemotron3 Super B200 BF16 Config to use GB200 Config

  • Logic remains the same for B200 FP8 and NVFP4 Configs.
  • NEMOTRON_3_SUPER_PRETRAIN_CONFIG_B200_NVFP4_V1 already follows BASE_NEMOTRON_3_SUPER_CONFIG_GB200, so changing BASE_NEMOTRON_3_SUPER_CONFIG_B200 to match lets BF16 and NVFP4 use it directly. Only NEMOTRON_3_SUPER_PRETRAIN_CONFIG_B200_FP8_MX_V1 then needs to change.
  • BASE_NEMOTRON_3_SUPER_CONFIG_B200's expert_model_parallel_size=64 is removed as it is already in BASE_NEMOTRON_3_SUPER_CONFIG


copy-pr-bot Bot commented Jun 23, 2026


This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

yaoyu-33 added the labels area:perf (Performance optimizations and benchmarking), feature (New capabilities, enhancements, or enablement work), and needs-review (PR is ready for code review and waiting on a reviewer) on Jun 23, 2026

claude Bot commented Jun 23, 2026

Contributor

LGTM. The refactoring is correct. BF16 is intentionally changed to match the GB200 config (tp=2, CUDA graphs enabled, no recompute). FP8_MX and NVFP4 are preserved with identical effective configs. Removing expert_model_parallel_size=64 from BASE_B200 is correct since BASE_NEMOTRON_3_SUPER_CONFIG already sets it to 64. Suggested test cases: test_nemotron_3_super_perf_config_instantiation and test_nemotron_3_super_perf_config_nvfp4 (both existing, only exercise GB300, do not cover B200). No perf tests impacted.
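Since the existing tests mentioned above do not cover B200, one way to check a claim like "preserved with identical effective configs" is to diff the effective fields before and after a refactor. The sketch below is an assumption-laden illustration: `PerfConfig`, its fields, and `config_diff` are hypothetical names, and the "before" values are placeholders rather than the repository's real settings.

```python
from dataclasses import asdict, dataclass, replace


@dataclass(frozen=True)
class PerfConfig:
    """Hypothetical stand-in for a perf config; field names are assumptions."""
    expert_model_parallel_size: int = 64
    tensor_model_parallel_size: int = 1
    cuda_graphs: bool = False
    recompute: bool = True


def config_diff(old: PerfConfig, new: PerfConfig) -> dict:
    """Return {field: (old_value, new_value)} for every field that differs."""
    old_fields, new_fields = asdict(old), asdict(new)
    return {
        name: (old_fields[name], new_fields[name])
        for name in old_fields
        if old_fields[name] != new_fields[name]
    }


base = PerfConfig()

# Re-stating a value the base already sets is a no-op, so removing it is safe.
with_redundant_override = replace(base, expert_model_parallel_size=64)
redundant_diff = config_diff(base, with_redundant_override)

# An intentional change (here: adopting GB200-style settings) shows up as a diff.
gb200_style = replace(
    base, tensor_model_parallel_size=2, cuda_graphs=True, recompute=False
)
intentional_diff = config_diff(base, gb200_style)
```

An empty diff corresponds to an effective config that is unchanged, while a non-empty diff lists exactly the fields a reviewer should expect to see change.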

zuriz-nv force-pushed the zuriz/nemotron-b200-config branch 2 times, most recently from fec1633 to b2d5de5, on June 23, 2026 20:57

@malay-nagda malay-nagda left a comment

Contributor

@zuriz-nv can you post the TFLOPs and step time you get for all 3 precisions with these changes?

yaoyu-33 added the label waiting-on-customer (Waiting on the original author to respond) and removed the label needs-review (PR is ready for code review and waiting on a reviewer) on Jun 26, 2026
Signed-off-by: Zuri Zheng <zuriz@nvidia.com>
zuriz-nv force-pushed the zuriz/nemotron-b200-config branch from b2d5de5 to 8a39cf8 on July 1, 2026 23:17

zuriz-nv commented Jul 1, 2026

Author

Closing this PR as it has been superseded by PR #4621

zuriz-nv closed this on Jul 1, 2026
zuriz-nv deleted the zuriz/nemotron-b200-config branch on July 1, 2026 23:36

Labels

  • area:perf: Performance optimizations and benchmarking
  • feature: New capabilities, enhancements, or enablement work
  • waiting-on-customer: Waiting on the original author to respond

3 participants