[Common] Update NCCL submodule to have the fix for MAX_SUPPORTED_TOKENS_PER_RANK #3150
phu0ngng wants to merge 8 commits into
Signed-off-by: Phuong Nguyen <phuonguyen@nvidia.com>
Greptile Summary
This PR bumps the
Confidence Score: 5/5
Safe to merge: the submodule bump carries a targeted upstream fix, the setup.py reorder aligns build-time and runtime NCCL resolution, and the NVLink guards are defensive skip-on-absence checks. All changes are narrowly scoped. The submodule update is a single-commit upstream fix, the setup.py probe reordering preserves the existing fallback paths and adds no new failure modes, and the NVLink shell guards only make a test skip gracefully rather than altering any test logic. No files require special attention.
Flowchart
%%{init: {'theme': 'neutral'}}%%
flowchart TD
A[_discover_nccl_home] --> B{NCCL_HOME\nenv var set?}
B -- yes --> C{nccl.h + libnccl\nexist at path?}
C -- yes --> D[Return NCCL_HOME]
C -- no --> E[Warning: NCCL_HOME\nset but invalid]
B -- no --> F[ldconfig -p probe\nnew: checked first]
E --> F
F -- found --> G[Return ldconfig path]
F -- not found --> H[Well-known prefix scan\n/opt/nvidia/nccl\n/usr/local/nccl\n/usr]
H -- found --> I[Return prefix path]
H -- not found --> J[pip wheel probe\nnvidia.nccl site-packages\nnew: last resort]
J -- found --> K[Return pip wheel path]
J -- not found --> L[RuntimeError:\nCould not locate NCCL]
style F fill:#d4edda,stroke:#28a745
style J fill:#fff3cd,stroke:#ffc107
Reviews (4): Last reviewed commit: "Merge branch 'main' into update_nccl"
/te-ci L1
jberchtold-nvidia left a comment
LGTM, thanks!
Description
Update NCCL submodule to have the fix for MAX_SUPPORTED_TOKENS_PER_RANK
Type of change
Checklist: