Keep GFD product label within the label-value length limit #1880
Draft
rajathagasthya wants to merge 1 commit into
gpu-feature-discovery builds the product label value by joining the GPU model with the MIG marker and profile (<model>-MIG-<profile>). For GPUs with long product names this can exceed the Kubernetes 63-character label-value limit, and NFD then drops the label, so nvidia.com/gpu.product never lands on the node and product-based node selectors break.

Bound the value in getProductName, the single point all product labels flow through. The first part is the GPU model and everything after it (MIG, the profile, SHARED) is discriminating, so only the model is truncated and the profile is preserved. The truncated model has any trailing separator removed so the value remains a valid label value. Values already within the limit are returned unchanged.

Signed-off-by: Rajath Agasthya <ragasthya@nvidia.com>
Summary
gpu-feature-discovery builds the `*.product` label value by joining the GPU model with the MIG marker and profile (`<model>-MIG-<profile>`). For GPUs with long product names this can exceed the Kubernetes 63-character label-value limit, and NFD then drops the label entirely, so `nvidia.com/gpu.product` never lands on the node and product-based node selectors break.
Example (RTX PRO 6000 Blackwell + `1g.24gb`), 67 characters:
Change
Bound the generated product value to 63 characters in `getProductName` (the single point all product labels flow through, so it covers full-GPU, MIG, the `-MIG-INVALID` path, the `-SHARED` suffix, and both output backends).

The first part is the GPU model; everything after it (`MIG`, the profile, `SHARED`) is discriminating, so only the model is truncated. The profile, which distinguishes MIG labels, is always preserved. The truncated model has any trailing `-`, `.`, or `_` removed so the value remains a valid label value (it must end with an alphanumeric character). Values already within the limit are returned unchanged, so existing labels are byte-identical.
Example for the case above:
A `+me` profile behaves the same and shows the trailing-separator trim: `...-Workstation-MIG-1g.24gb.me` (the model's truncation point landed on a `-`, which is trimmed).
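
The bounding rule described in this section can be sketched roughly as follows. This is a minimal illustration, not the PR's actual `getProductName`: the helper name `boundProductName`, its signature, and the fallback branches for a suffix that is itself too long are assumptions made for the example. The model string in `main` is also assumed; it was chosen only because its length matches the 67-character case quoted above.

```go
package main

import (
	"fmt"
	"strings"
)

// maxLabelValueLen is the Kubernetes limit on the length of a label value.
const maxLabelValueLen = 63

// separators are the non-alphanumeric characters a label value may contain
// but must not start or end with.
const separators = "-._"

// boundProductName is a hypothetical helper, not the PR's getProductName.
// It joins the model and suffix parts with "-" and, if the result is too
// long, shortens only the model so the suffix (MIG marker, profile, SHARED)
// is kept. Label values are assumed to be ASCII, so byte slicing is safe.
func boundProductName(model string, suffixParts ...string) string {
	full := strings.Join(append([]string{model}, suffixParts...), "-")
	if len(full) <= maxLabelValueLen {
		return full // already within the limit: returned unchanged
	}

	suffix := ""
	if len(suffixParts) > 0 {
		suffix = "-" + strings.Join(suffixParts, "-")
	}

	budget := maxLabelValueLen - len(suffix)
	if budget <= 0 {
		// Assumption: if the suffix alone exceeds the limit, fall back to a
		// plain cut of the whole value.
		return strings.Trim(full[:maxLabelValueLen], separators)
	}

	// Cut the model to the remaining budget and drop any trailing separator
	// so the joined value does not contain a doubled or dangling separator.
	truncated := strings.TrimRight(model[:budget], separators)
	if truncated == "" {
		return strings.TrimLeft(suffix, separators)
	}
	return truncated + suffix
}

func main() {
	// Assumed model string, used for illustration only.
	model := "NVIDIA-RTX-PRO-6000-Blackwell-Max-Q-Workstation-Edition"
	for _, profile := range []string{"1g.24gb", "1g.24gb.me"} {
		v := boundProductName(model, "MIG", profile)
		fmt.Printf("%d %s\n", len(v), v)
	}
}
```

Truncating the model rather than the tail keeps two MIG profiles on the same GPU distinguishable, which a plain cut at 63 characters would not guarantee.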
Notes
NFD was dropping it), so no label applied today changes. The truncated form
is deterministic and can be used in node selectors.
Resolves #1876