Deduplicate concurrent pulls for identical model references across pods with node-level singleflight #35

The current pull deduplication key is based on `volumeName` and `mountID`, not on the model reference.
For static inline volumes, `nodePublishVolumeStaticInlineVolume` calls `worker.PullModel(..., volumeName, "", reference, modelDir, ...)`, and `Worker.pullModel` builds the in-flight key as:

`inflightKey := fmt.Sprintf("pull-%s/%s", volumeName, mountID)`

As a result, if two pods on the same node publish inline volumes with the same model reference but different volume names, they will not share the same in-flight pull.
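To make the mismatch concrete, here is a minimal Go sketch contrasting the key format quoted above with a key derived from the model reference alone. `currentInflightKey` and `referenceInflightKey` are hypothetical helpers, the reference string is made up, and the sketch assumes the empty string in the `worker.PullModel` call is the `mountID` argument.

```go
package main

import "fmt"

// currentInflightKey mirrors the key format quoted from Worker.pullModel.
func currentInflightKey(volumeName, mountID string) string {
	return fmt.Sprintf("pull-%s/%s", volumeName, mountID)
}

// referenceInflightKey is a hypothetical reference-based key: every caller
// pulling the same model reference on this node maps to the same key.
func referenceInflightKey(reference string) string {
	return "pull-ref/" + reference
}

func main() {
	const ref = "registry.example.com/models/demo:v1" // hypothetical reference

	// Assumption: static inline volumes pass an empty mountID, so only
	// volumeName varies between the two pods.
	a := currentInflightKey("pod-a-model", "")
	b := currentInflightKey("pod-b-model", "")
	fmt.Println(a == b) // false: two separate pulls of the same reference

	fmt.Println(referenceInflightKey(ref) == referenceInflightKey(ref)) // true: one shared pull
}
```

If pulls for one reference can target different `modelDir`s, a reference-only key would also need some way to make the result available to every requesting volume; the sketch does not address that.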

That means that, on a single node, two pods requesting the same model reference can still trigger duplicate pulls if they use different volume identities.

There should be two levels of pull concurrency control: singleflight keyed by model reference, and a semaphore bounding global download parallelism.

1. Per-reference deduplication with singleflight
   - If multiple requests on the same node pull the same model reference concurrently, only one real pull should run.
   - Other requests should wait on the same in-flight result.
2. Global concurrency limiting with a semaphore
   - If multiple requests pull different model references concurrently, the node should limit how many model pulls can run at once.
   - This should prevent uncontrolled fan-out of network and disk I/O.
