The current pull deduplication key is based on volumeName and mountID, not on model reference.
For static inline volumes, nodePublishVolumeStaticInlineVolume calls worker.PullModel(..., volumeName, "", reference, modelDir, ...), and Worker.pullModel uses:
```go
inflightKey := fmt.Sprintf("pull-%s/%s", volumeName, mountID)
```
As a result, if two pods on the same node publish inline volumes with the same model reference but different volume names, they will not share the same in-flight pull.
That means that on a single node, two pods requesting the same model reference can still trigger duplicate pulls if they use different volume identities.
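To make the mismatch concrete, here is a minimal Go sketch that compares the current key format with a key derived only from the model reference. The helper names `currentInflightKey` and `referenceInflightKey`, the volume names, and the reference string are hypothetical. It also assumes that the empty string passed to `PullModel` on the static inline path is the `mountID`.

```go
package main

import "fmt"

// currentInflightKey mirrors the key built in Worker.pullModel today:
// it depends on the volume identity, not on what is being pulled.
func currentInflightKey(volumeName, mountID string) string {
	return fmt.Sprintf("pull-%s/%s", volumeName, mountID)
}

// referenceInflightKey is a hypothetical alternative keyed only by the
// model reference, so concurrent pulls of the same model share one key.
func referenceInflightKey(reference string) string {
	return "pull-" + reference
}

func main() {
	ref := "registry.example.com/models/llama:v1" // hypothetical reference

	// Two inline volumes on one node, both pointing at the same model.
	// mountID is assumed to be "" on the static inline path.
	a := currentInflightKey("pod-a-model", "")
	b := currentInflightKey("pod-b-model", "")
	fmt.Println(a == b) // false: each volume starts its own pull

	fmt.Println(referenceInflightKey(ref) == referenceInflightKey(ref)) // true
}
```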
There should be two levels of pull concurrency control: singleflight deduplication by model reference, and a semaphore bounding global download parallelism.
- Per-reference deduplication with singleflight
  - If multiple requests on the same node pull the same model reference concurrently, only one real pull should run.
  - Other requests should wait on the same in-flight result.
- Global concurrency limiting with a semaphore
  - If multiple requests pull different model references concurrently, the node should limit how many model pulls can run at once.
  - This should prevent uncontrolled fan-out of network and disk IO.