api/config/v1: keep user-specified rename/devices for shared resources #1825
jonyhy96 wants to merge 3 commits
`disableResoureRenaming` currently resets the `Rename` and `Devices`
fields of every entry under `sharing.timeSlicing.resources` (and
`sharing.mps.resources`) to their defaults. That makes it impossible to:
* pin a shared resource to a specific subset of GPU UUIDs
(per-UUID time-slicing), and
* expose two resource names on the same node, e.g.
`nvidia.com/gpu` for full cards and `nvidia.com/gpu.shared` for
the time-sliced subset.
Preserve both fields and emit a warning instead, so that user intent
is honored. Downstream resource-manager code already supports
per-UUID device lists and custom rename targets, so no additional
plumbing is required.
Signed-off-by: haoyun <haoyun.96@bytedance.com>
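The behavior described above (keep `Rename` and `Devices`, warn instead of resetting) can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `sharedResource`, `sharedResources`, and `keepUserSpec` are hypothetical simplified stand-ins, `Devices` is modelled as a plain list of UUID strings, and the warnings are returned rather than logged so the sketch stays dependency-free.

```go
package main

import "fmt"

// sharedResource is a hypothetical, simplified stand-in for one entry under
// sharing.timeSlicing.resources or sharing.mps.resources.
type sharedResource struct {
	Name    string
	Rename  string   // user-specified rename target; "" means unset
	Devices []string // selected GPU UUIDs; empty means "all devices" (assumption)
}

// sharedResources is a hypothetical stand-in for the enclosing config block.
type sharedResources struct {
	RenameByDefault bool
	Resources       []sharedResource
}

// keepUserSpec leaves Rename and Devices untouched and only reports which
// entries carry user-specified values. Returning the warnings instead of
// logging them is a choice made for this sketch.
func (s *sharedResources) keepUserSpec(id string) []string {
	if s == nil {
		return nil // nil receiver is a no-op
	}
	var warnings []string
	for _, r := range s.Resources {
		if r.Rename != "" {
			warnings = append(warnings, fmt.Sprintf(
				"sharing.%s.resources: keeping user-specified rename %q for %q",
				id, r.Rename, r.Name))
		}
		if len(r.Devices) > 0 {
			warnings = append(warnings, fmt.Sprintf(
				"sharing.%s.resources: keeping user-specified devices %v for %q",
				id, r.Devices, r.Name))
		}
	}
	return warnings
}
```

Calling `keepUserSpec("timeSlicing")` on a config whose single entry sets both fields yields two warnings and leaves the entry unchanged, which is the property the PR is after.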
@jonyhy96 Can you provide a corresponding unit test case here? Possibly one that shows how two resource names are used within a node.
Adds TestDisableResoureRenamingKeepsUserSpec covering:
* per-UUID rename preserved when renameByDefault=false
* per-UUID devices list preserved when renameByDefault=true
* two resource names on a single node: nvidia.com/gpu for full cards
and nvidia.com/gpu.shared for a UUID-selected subset (the use case
requested by the reviewer)
* nil receiver is a no-op
This exercises the behavior change in disableResoureRenaming so that
user-specified rename / devices fields survive the call, enabling
per-UUID time-slicing alongside an unmodified nvidia.com/gpu resource
on the same node.
Signed-off-by: haoyun <haoyun.96@bytedance.com>
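To make the two-resource-name case concrete, here is a toy model of what a node would end up advertising. It is only a sketch under stated assumptions and not the plugin's resource-manager logic: `sharedSpec` and `advertised` are hypothetical names, and the `Replicas` count per selected GPU is an assumption introduced for the illustration.

```go
package main

// sharedSpec is a hypothetical, simplified view of one shared-resource entry.
type sharedSpec struct {
	Name     string   // original resource name, e.g. "nvidia.com/gpu"
	Rename   string   // name advertised for the shared subset
	Devices  []string // GPU UUIDs selected for sharing
	Replicas int      // assumed number of time-sliced replicas per selected GPU
}

// advertised returns how many allocatable units the node exposes under each
// resource name in this toy model: every selected GPU counts Replicas times
// under Rename, every other GPU counts once under Name.
func advertised(nodeUUIDs []string, spec sharedSpec) map[string]int {
	selected := make(map[string]bool, len(spec.Devices))
	for _, u := range spec.Devices {
		selected[u] = true
	}
	counts := map[string]int{}
	for _, u := range nodeUUIDs {
		if selected[u] {
			counts[spec.Rename] += spec.Replicas
		} else {
			counts[spec.Name]++
		}
	}
	return counts
}
```

With four GPUs on the node, two of them listed in `Devices`, and `Replicas` set to 4, the model reports 2 units of `nvidia.com/gpu` and 8 units of `nvidia.com/gpu.shared`. If `Rename` or `Devices` were reset to defaults, this split could not be expressed.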
force-pushed from b5d1686 to 85954f6
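@rajatchopra Thanks for the review! I've pushed a unit test, `TestDisableResoureRenamingKeepsUserSpec`, in `api/config/v1/replicas_test.go` (commit 85954f6) covering four cases:
* per-UUID rename preserved when `renameByDefault=false`
* per-UUID devices list preserved when `renameByDefault=true`
* two resource names on a single node: `nvidia.com/gpu` for full cards and `nvidia.com/gpu.shared` for a UUID-selected subset
* nil receiver is a no-op

All four sub-tests pass locally (`go test ./api/config/v1/...` is green). PTAL.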