test(jobs-manager): lock spawned-job RESOURCE_REQUESTS/LIMITS default at 8Gi by saadqbal · Pull Request #260 · tracebloc/client

saadqbal · 2026-06-16T09:03:41Z

What

Adds helm-unittest assertions pinning the rendered per-spawned-training-job RESOURCE_REQUESTS / RESOURCE_LIMITS env to cpu=2,memory=8Gi on both containers that receive it (api + pods-monitor), plus an operator-override case.

Why

The chart is the single effective source of truth for these values — jobs-manager-deployment.yaml always injects them with a templated "cpu=2,memory=8Gi" fallback, and client-runtime's jobs_manager.py only falls back to its own default when they're absent. Those two had silently drifted (the chart said 8Gi; the code's dead-code default was ~202Mi request / 20G limit). This guard renders the template and asserts the value so the drift can't recur unnoticed — the "render-and-assert test" called for in tracebloc/backend#745.

The two contains blocks per container also guard against one of the template's two RESOURCE_* blocks being edited without the other.
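For readers unfamiliar with helm-unittest, the guard described above might look roughly like the sketch below. It is not the merged test file: the suite and test names are made up for illustration, the container indices (api at containers[0], pods-monitor at containers[1]) are taken from the review overview on this PR, and it assumes the template renders each variable as a plain name/value env entry.

```yaml
# Hypothetical sketch, not the contents of tests/jobs_manager_test.yaml.
suite: jobs-manager spawned-job resource defaults
templates:
  - jobs-manager-deployment.yaml
tests:
  - it: defaults RESOURCE_REQUESTS and RESOURCE_LIMITS to cpu=2,memory=8Gi
    asserts:
      # api container: one block per variable, each expected exactly once
      - contains:
          path: spec.template.spec.containers[0].env
          content:
            name: RESOURCE_REQUESTS
            value: "cpu=2,memory=8Gi"
          count: 1
      - contains:
          path: spec.template.spec.containers[0].env
          content:
            name: RESOURCE_LIMITS
            value: "cpu=2,memory=8Gi"
          count: 1
      # pods-monitor container: same pair, so editing only one of the
      # template's RESOURCE_* blocks fails the suite
      - contains:
          path: spec.template.spec.containers[1].env
          content:
            name: RESOURCE_REQUESTS
            value: "cpu=2,memory=8Gi"
          count: 1
      - contains:
          path: spec.template.spec.containers[1].env
          content:
            name: RESOURCE_LIMITS
            value: "cpu=2,memory=8Gi"
          count: 1
```

Using count: 1 rather than a bare contains means a duplicated env entry is also caught, not only a missing or changed one.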

Test

helm unittest -f 'tests/jobs_manager_test.yaml' ./client   # 17 passed

Companion

The actual reconciliation (code fallback → 8Gi) lives in the runtime: tracebloc/client-runtime#111.

Refs tracebloc/backend#745

🤖 Generated with Claude Code

Note

Low Risk
Test-only change; no Helm template or runtime behavior is modified in this PR.

Overview
Adds helm-unittest coverage in jobs_manager_test.yaml so the jobs-manager chart cannot silently drift from the intended per-spawned-training-job resource env vars.

New cases render jobs-manager-deployment.yaml and assert that RESOURCE_REQUESTS and RESOURCE_LIMITS default to cpu=2,memory=8Gi on both the api (containers[0]) and pods-monitor (containers[1]) containers, each emitted once. A second test confirms operators can override those values via values.env.RESOURCE_REQUESTS / RESOURCE_LIMITS.
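The operator-override case could be sketched along these lines. The override values (cpu=4,memory=16Gi) and the suite and test names are illustrative assumptions; only the values path env.RESOURCE_REQUESTS / env.RESOURCE_LIMITS and the container indices come from this PR.

```yaml
# Hypothetical sketch of the override case, not the merged test.
suite: jobs-manager spawned-job resource override
templates:
  - jobs-manager-deployment.yaml
tests:
  - it: lets operators override the spawned-job resource env
    set:
      env:
        RESOURCE_REQUESTS: "cpu=4,memory=16Gi"  # illustrative override
        RESOURCE_LIMITS: "cpu=4,memory=16Gi"    # illustrative override
    asserts:
      # the overridden value reaches the api container ...
      - contains:
          path: spec.template.spec.containers[0].env
          content:
            name: RESOURCE_REQUESTS
            value: "cpu=4,memory=16Gi"
          count: 1
      # ... and the pods-monitor container
      - contains:
          path: spec.template.spec.containers[1].env
          content:
            name: RESOURCE_LIMITS
            value: "cpu=4,memory=16Gi"
          count: 1
      # the chart default must no longer be rendered once overridden
      - notContains:
          path: spec.template.spec.containers[0].env
          content:
            name: RESOURCE_REQUESTS
            value: "cpu=2,memory=8Gi"
```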

This is a contract test for tracebloc/backend#745: the chart is the effective source of truth for what client-runtime sees when spawning training Jobs (companion runtime change is elsewhere).

Reviewed by Cursor Bugbot for commit 431fe9d. Bugbot is set up for automated code reviews on this repo.

… at 8Gi

Add helm-unittest assertions pinning the rendered per-spawned-job RESOURCE_REQUESTS / RESOURCE_LIMITS env to "cpu=2,memory=8Gi" on both containers (api + pods-monitor), plus an operator-override case.

The chart is the single effective source of truth for these values (it always injects them), so this guards against silent drift between the chart and client-runtime's jobs_manager.py fallback (the drift reconciled in tracebloc/backend#745).

Refs tracebloc/backend#745

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

LukasWodka · 2026-06-16T09:04:57Z

👋 Heads-up — Code review queue is at 19 / 8

Above the WIP limit. The team convention is to review existing PRs before opening new work.

Open PRs currently in Code review (oldest first):

averaging-service#115 — Release: staging → main (averaging correctness sweep → production) · author: @aptracebloc · no reviewer assigned
backend#806 — feat(#805): store & distribute contributor tokenizer as a model artifact (Task 1 — backend) · author: @shujaatTracebloc · reviewer: @aptracebloc
cli#78 — fix(dataset rm): delete staging files from a uid-65532 pod, not jobs-manager (bug: dataset rm cannot delete staging files — ingestor (uid 65534) vs jobs-manager uid mismatch, no shared fsGroup #259) · author: @LukasWodka · no reviewer assigned
cli#79 — chore(schema): re-sync vendored ingest.v1.json from data-ingestors master · author: @LukasWodka · no reviewer assigned
client-runtime#108 — fix(authz): match ingest table prefixes at a segment boundary (close cross-tenant straddle) · author: @LukasWodka · no reviewer assigned
client-runtime#111 — fix(resources): align jobs_manager fallback defaults with chart 8Gi default · author: @saadqbal · no reviewer assigned
data-ingestors#270 — docs(releasing): correct ingestor rollout — floating tag + imagePullPolicy=Always, not INGESTOR_IMAGE_DIGEST rewrite · author: @saadqbal · no reviewer assigned
data-ingestors#271 — refactor(P4a): stop reconfiguring root logging at import in 20 modules · author: @LukasWodka · no reviewer assigned
data-ingestors#272 — refactor(P4b): inject the run's Config into validators via map_validators · author: @LukasWodka · no reviewer assigned
data-ingestors#273 — refactor(P4c): thread Config into file_transfer; delete the env-var bridge · author: @LukasWodka · no reviewer assigned

Pull from review before opening new work. (This is a nudge from the kanban WIP check, not a block.)

saadqbal self-assigned this Jun 16, 2026

saadqbal requested a review from aptracebloc June 16, 2026 09:09

shujaatTracebloc approved these changes Jun 16, 2026


aptracebloc approved these changes Jun 16, 2026


saadqbal merged commit ac78ba8 into develop Jun 16, 2026
19 checks passed

saadqbal merged 1 commit into develop from fix/job-resource-defaults-745
