chore(beep boop 🤖): Bump uv.lock (r0.5.0, mcore-core_r0.18.0) (2026-06-30)#4590

Closed
svcnvidia-nemo-ci wants to merge 1 commit into r0.5.0 from bump-ci-container-2026-06-30-r0.5.0-core_r0.18.0

@svcnvidia-nemo-ci

Contributor

🚀 PR to bump uv.lock in r0.5.0.

🤖 This PR will be merged automatically once CI passes.

…-06-30)

Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
@svcnvidia-nemo-ci

Contributor Author

/ok to test 87f27ab

@copy-pr-bot

copy-pr-bot Bot commented Jun 30, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@yaoyu-33

Contributor

MCore bump auto-fix status for release-r0.5.0:

Classification: Bridge broke itself
Evidence: As of 2026-06-30 07:17 PDT, PR #4590 advances 3rdparty/Megatron-LM and .main.commit from d30c93ffae858b22eece3fa71c734c8f43161eff to 458c8d0ecafdf6d9e36771600d62ade27f2a67b7. The MCore range is two commits; 22d950d25a5f adopts TransformerEngine b9d690e042b1c4e455214e7dab65d6d3512c05d6, and 458c8d0ecafd is the MCore release lock bump. However, base branch r0.5.0 already adopted that same TransformerEngine revision through merged PR #4535 on 2026-06-26, while nvidia-modelopt==0.44.0rc5 remained unchanged. PR #4535 did not run the L2 Qwen quantization jobs. On PR #4590, H100 job 84293940807 and GB200 job 84293940813 both fail TestQwen3MoeQuantizationWorkflow::test_qwen3_moe_quantization_and_generation_with_expert_parallelism. TransformerEngine grouped_linear.py:1764 calls ModelOpt te_grouped_quantized_linear_fn; ModelOpt reads non_tensor_args[0] as the split sequence and raises TypeError: object of type 'bool' has no len(). The TransformerEngine compare confirms the live signature changed to include explicit m_splits before non_tensor_args. A 2026-06-30 search found no open Bridge or ModelOpt fix PR covering this failure, and current ModelOpt main still uses the incompatible non_tensor_args[0] path.
Fix PR: not opened
Guards: none added or removed. A Bridge-side monkeypatch of ModelOpt internals would be brittle and is not appropriate for an automated release-line fix.
Validation: Linear issue MB-618, PR #4590 metadata/diff/checks, both failed Actions job logs, MCore compare d30c93ffae858b22eece3fa71c734c8f43161eff...458c8d0ecafdf6d9e36771600d62ade27f2a67b7, TransformerEngine compare 4220403e831d29e93868f7793693ea83f6b8b05b...b9d690e042b1c4e455214e7dab65d6d3512c05d6, release PR #4535, and current ModelOpt source were inspected on 2026-06-30. No local or CW interactive test was run because no safe Bridge patch was produced.
Next action: maintainer decision needed. Preferred implementation is for ModelOpt/quantization owners to update te_grouped_quantized_linear_fn to use explicit m_splits when that parameter is present, while preserving the older non_tensor_args[0] path, then publish the fix and land any Bridge dependency update under dependency policy. If the TransformerEngine signature change is unintended, explicitly roll back the TE revision instead. Validate the ModelOpt TransformerEngine plugin tests across the old and new signatures, then rerun both the H100 and GB200 Qwen3 MoE quantization jobs on PR #4590.
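
For illustration only, the failure mode and the backward-compatible handling described under "Next action" might look roughly like the sketch below. It is not ModelOpt or TransformerEngine code: `legacy_read_splits`, `resolve_m_splits`, and the tuple layouts are hypothetical stand-ins, and the only facts taken from the report are that the old path reads `non_tensor_args[0]` as the split sequence and that the new signature passes `m_splits` explicitly before `non_tensor_args`.

```python
from __future__ import annotations

from typing import Any, Optional, Sequence


def legacy_read_splits(non_tensor_args: Sequence[Any]) -> int:
    """Hypothetical stand-in for the failing pattern: assume slot 0 holds the splits."""
    return len(non_tensor_args[0])


def resolve_m_splits(
    non_tensor_args: Sequence[Any],
    m_splits: Optional[Sequence[int]] = None,
) -> list[int]:
    """Hypothetical helper: prefer explicit m_splits, else fall back to slot 0."""
    if m_splits is not None:
        # Newer call layout (assumed): the splits arrive as their own argument.
        return list(m_splits)
    first = non_tensor_args[0] if non_tensor_args else None
    if isinstance(first, (list, tuple)):
        # Older call layout (assumed): the splits are the first non-tensor argument.
        return list(first)
    raise TypeError(
        "could not locate m_splits: no explicit value was given and "
        f"non_tensor_args[0] is {type(first).__name__}, not a sequence"
    )


# Assumed argument layouts, for illustration only.
old_style = ([4, 2, 6], True)  # splits in slot 0
new_style = (True, False)      # slot 0 is now a flag

# The legacy read fails on the new layout because slot 0 is a bool.
try:
    legacy_read_splits(new_style)
except TypeError as exc:
    error_message = str(exc)

print(error_message)
print(resolve_m_splits(old_style))
print(resolve_m_splits(new_style, m_splits=[4, 2, 6]))
```

Under these assumptions, one resolver serves both layouts, which is the property the proposed plugin tests would need to check across the old and new TransformerEngine signatures.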

@yaoyu-33

yaoyu-33 commented Jul 2, 2026

Contributor

Closing stale daily bump PR; keeping the latest 2026-07-02 bump PR open.

@yaoyu-33 yaoyu-33 closed this Jul 2, 2026
3 participants