bug: `dataset rm` cannot delete staging files — ingestor (uid 65534) vs jobs-manager uid mismatch, no shared fsGroup #259

## Summary

`tracebloc dataset rm <name>` drops the table but fails to delete the dataset's staging files on the shared PVC. The error is:

```
teardown incomplete — the table <schema>.<dataset> was dropped, but removing its files failed;
re-run `tracebloc dataset rm <dataset>` to remove the leftover files: removing PVC paths:
exec stream against tracebloc/<jobs-manager-pod>: command terminated with exit code 1
(rm: cannot remove '/data/shared/.tracebloc-staging/<dataset>/labels.csv': Permission denied)
```

The suggested "re-run" never succeeds — it fails on the same permission error every time. Orphaned staging files accumulate on the shared PVC while the dataset *appears* removed (the table is gone), masking the leak.

Reported during dataset ingestion/removal testing.

## Root cause (verified)

A UID mismatch with no `fsGroup` bridge:

- The ingestor Job writes staging files as **uid 65534** — `tracebloc/data-ingestors` `Dockerfile:55` → `USER 65534`.
- The teardown `rm -rf` is exec'd **inside the jobs-manager pod**, which runs as its image UID (not 65534):
  - `tracebloc/cli` `internal/push/teardown.go:91` builds the rm (`append([]string{"rm","-rf"}, plan.PVCPaths...)`), error wrap `:93`; exec stream `internal/push/stream.go:100`; user-facing wrap `internal/cli/dataset_rm.go:190-191`.
- The jobs-manager pod sets `runAsNonRoot: true` but **no `runAsUser` and no `fsGroup`** — `client` chart `client/templates/jobs-manager-deployment.yaml:30-33`.

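Deleting a file requires write and search permission on its *parent directory*. Ownership of the file itself only matters when that directory has the sticky bit. The staging directories are owned by uid 65534, the jobs-manager UID shares no group with them, and they are not group- or world-writable. The jobs-manager's non-root UID therefore falls into the "other" permission class, and `rm` fails with `Permission denied`. The cli comment at `internal/cli/dataset_rm.go:187` ("idempotent, so re-running completes the cleanup") assumes a *transient* failure. That assumption does not hold for a permission error, so the retry advice is a dead end.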

## Caveat that makes this non-trivial

`fsGroup` is **not applied to hostPath volumes** (kubernetes/kubernetes#138411 — already noted in this chart for the bare-metal mysql init). On bare-metal / hostPath clusters, adding `fsGroup` alone will not fix it.

## Options (design decision needed before coding)

1. **Shared `fsGroup`** on both pods + group-writable staging — clean on CSI / dynamic PVs, **no-op on hostPath**.
2. **Ingestor creates staging dirs group-writable / setgid** so any group member can clean up.
3. **Ingestor owns cleanup** of its own staging (delete from a uid-65534 context); cli only drops the table.
4. **Run the teardown `rm` as uid 65534** (dedicated pod / initContainer).

Affected repos: `client` (chart securityContext — this issue's home), `client-runtime` (jobs-manager image / uid), `data-ingestors` (staging dir perms), `cli` (teardown path + the misleading retry message).

## Secondary fix (cli)

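`tracebloc/cli` `internal/cli/dataset_rm.go:187-191`: do not advise "re-run … to remove the leftover files" when the failure is a permission error, because re-running cannot help. The `rm` runs inside the jobs-manager pod, so the cli sees only the exec exit code and `rm`'s stderr, not an errno. Detect the `EACCES` case from that output (`Permission denied`) and give accurate guidance, or point to an operator-side privileged cleanup path.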

## Acceptance criteria

- [ ] `tracebloc dataset rm <name>` removes both the table **and** all staging files on supported volume types; hostPath behavior documented explicitly.
- [ ] No "re-run" advice surfaced for a non-recoverable permission failure.

## Refs

- Related (a failed ingest that reaches file-transfer leaks the same staging files this can't clean up): tracebloc/data-ingestors#260
