
Keep GFD product label within the label-value length limit#1880

Draft
rajathagasthya wants to merge 1 commit into main from fix/gfd-product-label-length

Conversation

@rajathagasthya

Contributor

Summary

gpu-feature-discovery builds the *.product label value by joining the GPU
model with the MIG marker and profile (<model>-MIG-<profile>). For GPUs with
long product names this can exceed the Kubernetes 63-character label-value
limit, and NFD then drops the label entirely — so nvidia.com/gpu.product
never lands on the node and product-based node selectors break.

Example (RTX PRO 6000 Blackwell + 1g.24gb), 67 characters:

NVIDIA-RTX-PRO-6000-Blackwell-Max-Q-Workstation-Edition-MIG-1g.24gb
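To see why this value is rejected, here is a minimal Go sketch of the label-value rule (at most 63 characters; empty, or alphanumeric at both ends with `-`, `_`, `.` allowed in between). The helper `isValidLabelValue` is a hypothetical name used only for illustration; it is not a function from GFD or NFD, which rely on their own validation.

```go
package main

import (
	"fmt"
	"regexp"
)

// maxLabelValueLen is the Kubernetes limit on label-value length.
const maxLabelValueLen = 63

// labelValueRE follows the documented label-value syntax: empty, or
// alphanumeric at both ends with '-', '_', '.' allowed in between.
var labelValueRE = regexp.MustCompile(`^(([A-Za-z0-9][-A-Za-z0-9_.]*)?[A-Za-z0-9])?$`)

// isValidLabelValue is a hypothetical helper for illustration only.
func isValidLabelValue(v string) bool {
	return len(v) <= maxLabelValueLen && labelValueRE.MatchString(v)
}

func main() {
	v := "NVIDIA-RTX-PRO-6000-Blackwell-Max-Q-Workstation-Edition-MIG-1g.24gb"
	// The characters are all legal; only the length (67) breaks the rule.
	fmt.Println(len(v), isValidLabelValue(v))
}
```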

Change

Bound the generated product value to 63 characters in getProductName (the
single point all product labels flow through, so it covers full-GPU, MIG, the
-MIG-INVALID path, the -SHARED suffix, and both output backends).

The first part is the GPU model; everything after it (MIG, the profile,
SHARED) is discriminating, so only the model is truncated; the profile,
which distinguishes MIG labels, is always preserved. The truncated model has
any trailing -, ., or _ removed so the result remains a valid label value
(a label value must end with an alphanumeric character). Values already within
the limit are returned unchanged, so existing labels are byte-identical.
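
The bounding rule described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in this PR: `boundProductName` and its signature are hypothetical (the real `getProductName` is not shown here), names are assumed to be ASCII so byte slicing is safe, and the fallback for a suffix that leaves no room for the model is a guess at reasonable behavior.

```go
package main

import (
	"fmt"
	"strings"
)

// maxLabelValueLen is the Kubernetes limit on label-value length.
const maxLabelValueLen = 63

// boundProductName is a hypothetical stand-in for the bounding logic.
// It joins model and suffix parts with '-' and, if the result is too long,
// truncates only the model so the suffix (MIG, profile, SHARED) survives.
func boundProductName(model string, suffixParts ...string) string {
	full := strings.Join(append([]string{model}, suffixParts...), "-")
	if len(full) <= maxLabelValueLen {
		return full // already within the limit: returned unchanged
	}

	suffix := strings.Join(suffixParts, "-")
	room := maxLabelValueLen - len(suffix)
	if suffix != "" {
		room-- // account for the '-' between model and suffix
	}
	if room <= 0 {
		// Assumption: with no room left for the model, fall back to a
		// plain truncation of the whole value.
		return strings.TrimRight(full[:maxLabelValueLen], "-._")
	}

	// Truncate the model, then drop trailing separators so the model part
	// ends with an alphanumeric character.
	trimmed := strings.TrimRight(model[:room], "-._")
	if suffix == "" {
		return trimmed
	}
	if trimmed == "" {
		return suffix
	}
	return trimmed + "-" + suffix
}

func main() {
	model := "NVIDIA-RTX-PRO-6000-Blackwell-Max-Q-Workstation-Edition"
	fmt.Println(boundProductName(model, "MIG", "1g.24gb"))
	fmt.Println(boundProductName(model, "MIG", "1g.24gb.me"))
}
```

Truncating from the model side keeps labels for different MIG profiles on the same GPU distinct, at the cost of two long models with a shared prefix mapping to the same truncated value.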

Example for the case above:

full   = "NVIDIA-...-Edition-MIG-1g.24gb"   (67, over limit)
suffix = "MIG-1g.24gb"                       (11)
room for model = 63 - (11 + 1) = 51
model  = "NVIDIA-RTX-PRO-6000-Blackwell-Max-Q-Workstation-Edi"   (first 51 chars)
result = "NVIDIA-RTX-PRO-6000-Blackwell-Max-Q-Workstation-Edi-MIG-1g.24gb"   (63)

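A +me profile behaves the same and also shows the trailing-separator trim.
The suffix MIG-1g.24gb.me is 14 characters, leaving 63 - (14 + 1) = 48 for the
model. The 48th character of the model is a -, which is trimmed, so the result
is the 62-character value ...-Workstation-MIG-1g.24gb.me.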

Notes

  • Values change only for hardware where the label is currently absent (because
    NFD was dropping it), so no label applied today changes. The truncated form
    is deterministic and can be used in node selectors.

Resolves #1876

gpu-feature-discovery builds the product label value by joining the GPU
model with the MIG marker and profile (<model>-MIG-<profile>). For GPUs
with long product names this can exceed the Kubernetes 63-character
label-value limit, and NFD then drops the label, so nvidia.com/gpu.product
never lands on the node and product-based node selectors break.

Bound the value in getProductName, the single point all product labels
flow through. The first part is the GPU model and everything after it
(MIG, the profile, SHARED) is discriminating, so only the model is
truncated and the profile is preserved. The truncated model has any
trailing separator removed so the value remains a valid label value.
Values already within the limit are returned unchanged.

Signed-off-by: Rajath Agasthya <ragasthya@nvidia.com>
Development

Successfully merging this pull request may close these issues.

[Bug]: GFD generates nvidia.com/gpu.product label exceeding Kubernetes 63-character limit with long MIG profile names
