Watch-first ingestion with (optional!) audit-based attribution#174
Watch-first ingestion with (optional!) audit-based attribution#174sunib wants to merge 37 commits into
Conversation
…erate testing project to exactly get how the kube-api versions are responding to edge cases
… structure way (so that we can also use this to evalaute new (or older) Kubernetes versions
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughAdds a validating admission webhook, a mutation-capture lab binary and recorders, watch-state streaming, normalization and corpus generation, and e2e scenarios with committed fixtures and deployment scripts. ChangesMutation capture lab and admission webhook
Estimated code review effort🎯 5 (Critical) | ⏱️ ~120 minutes Poem
🚥 Pre-merge checks | ✅ 3 | ❌ 2❌ Failed checks (2 warnings)
✅ Passed checks (3 passed)
✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Code Review
This pull request introduces a validating admission webhook to the main controller and adds a new "mutation-capture lab" utility (cmd/mutation-capture-lab) along with its associated packages under internal/mutationlab/ to record and normalize Kubernetes watch, audit, and admission events into a versioned corpus. Feedback on the changes highlights a concurrency issue in the lab's shutdown sequence where multiple HTTP servers are shut down sequentially using a single shared timeout context, which could prevent subsequent servers from shutting down gracefully.
Important
The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.
| for _, s := range servers { | ||
| _ = s.srv.Shutdown(shutdownCtx) | ||
| } |
There was a problem hiding this comment.
Calling Shutdown sequentially on multiple servers using a single shared timeout context (shutdownCtx) can prevent subsequent servers from shutting down gracefully if a preceding server takes a long time to shut down. If the first server takes the entire 5 seconds to shut down, the context will expire, and the remaining servers will immediately fail to shut down gracefully.
We should shut down the servers concurrently.
var shutdownWg sync.WaitGroup
for _, s := range servers {
shutdownWg.Add(1)
go func(srv *http.Server) {
defer shutdownWg.Done()
_ = srv.Shutdown(shutdownCtx)
}(s.srv)
}
shutdownWg.Wait()There was a problem hiding this comment.
Actionable comments posted: 11
🧹 Nitpick comments (2)
hack/e2e/lab-port-forward.sh (1)
71-71: 📐 Maintainability & Code Quality | 🔵 Trivial | 💤 Low valueOptional: rename the unused loop counter to
_.Shellcheck SC2034 flags
ias unused; the loop body never references it.♻️ Proposed tweak
-for i in {1..15}; do +for _ in {1..15}; do🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@hack/e2e/lab-port-forward.sh` at line 71, The loop counter in the port-forward retry loop is unused, triggering Shellcheck SC2034. Update the `for` loop in `lab-port-forward.sh` to use `_` instead of `i` since the body never references the counter, keeping the retry behavior unchanged.Source: Linters/SAST tools
test/mutationlab/e2e/harness_test.go (1)
232-256: 🩺 Stability & Availability | 🔵 Trivial | ⚡ Quick win
docan hang indefinitely:http.DefaultClienthas no timeout.The 30s
deadlineonly bounds the retry loop between attempts. A singlehttp.DefaultClient.Dohas no client-side timeout, so if the port-forward accepts the connection but stalls without responding (a plausible failure for the watchdog-restarted forward described in the comment), the call blocks forever and the loop deadline is never re-evaluated. Use a client with an explicit per-request timeout (or a request context with deadline).♻️ Proposed fix
-func (h *harness) do(method, path string) ([]byte, int, error) { - var lastErr error - deadline := time.Now().Add(30 * time.Second) - for time.Now().Before(deadline) { - req, err := http.NewRequest(method, h.apiURL+path, nil) - if err != nil { - return nil, 0, err - } - resp, err := http.DefaultClient.Do(req) +func (h *harness) do(method, path string) ([]byte, int, error) { + var lastErr error + client := &http.Client{Timeout: 10 * time.Second} + deadline := time.Now().Add(30 * time.Second) + for time.Now().Before(deadline) { + req, err := http.NewRequest(method, h.apiURL+path, nil) + if err != nil { + return nil, 0, err + } + resp, err := client.Do(req)🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@test/mutationlab/e2e/harness_test.go` around lines 232 - 256, The harness do method can block forever because http.DefaultClient.Do has no per-request timeout, so the loop deadline is never checked during a stalled request. Update do to use a client or request context with an explicit timeout, and keep the existing retry/deadline logic in h.do so each attempt fails fast before the 30s overall limit is reached.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@cmd/main.go`:
- Around line 708-710: The admission webhook port validation in main’s startup
config check only rejects non-positive values and still allows values above the
valid TCP range. Update the validation around cfg.admissionWebhookPort to reject
ports greater than 65535 as well, and keep the error message clear in the same
check. Also add an invalid-flag test case covering
--admission-webhook-port=65536 to ensure the new range check is enforced.
In `@config/webhook/kustomization.yaml`:
- Around line 4-5: The kustomization.yaml entry currently leaves an extra
trailing blank line, which triggers YAMLlint. Update the kustomization content
so the final listed resource/reference (the validating-webhook.yaml entry) is
the last line in the file, with no blank line after it.
In `@config/webhook/validating-webhook.yaml`:
- Line 52: The YAML file has an extra trailing blank line that triggers the
empty-lines lint error; remove the final empty line so the document ends
cleanly. Update the webhook manifest content in validating-webhook.yaml and
verify there are no stray blank lines after the last YAML entry.
- Around line 38-51: Shorten the validating webhook timeout and clean up the
YAML formatting: in the webhook configuration with the rules block and
timeoutSeconds setting, reduce timeoutSeconds from 10 to a few seconds so all
CREATE/UPDATE/DELETE requests are less likely to stall, and remove the trailing
blank line at the end of the file to satisfy the repo’s YAML lint rules.
In `@docs/design/mutation-capture-lab-design.md`:
- Around line 21-23: The deletecollection admission contract text is
inconsistent with the captured behavior and corpus references. Update the
mutation-capture design doc so the `deletecollection` row and any related
matrix/layout text consistently describe per-object validating admission calls
and per-object corpus outputs, and make sure the references in the
`deletecollection` section match the actual captured files and stack context
used by `admission.delete.collection.yaml` and the per-object
`admission.delete.cm-*` entries.
- Around line 24-27: The document has two conflicting statements about
deletecollection audit bodies, so reconcile them by making one versioned source
of truth in the design doc. Update the relevant sections around the
deletecollection discussion and the later guidance so they agree on whether
`responseObject` contains removed objects for the named cluster version, or
clearly label one as an earlier hypothesis that was superseded. Use the
`deletecollection` audit event and `responseObject` references to locate both
passages and align their wording.
- Around line 639-646: The setup description is inconsistent about how `task
lab-e2e` runs the lab: the isolated image-swap flow in `swap-image.sh` says it
reuses the already-prepared e2e cluster, but the M0/setup wording claims it
brings up its own k3d profile. Update the M0/setup text and any related wording
in the design doc to match the image-swap strategy, using the `task lab-e2e` and
`test/mutationlab/swap-image.sh` references consistently and removing any
mention of a separate cluster bring-up.
In `@internal/mutationlab/record.go`:
- Line 57: The Record.Key field currently uses omitempty on a plain ObjectKey,
which will still serialize as an empty object and mislead callers. Update the
Record struct to either remove omitempty from Key in record.go, or change Key to
*ObjectKey if it is meant to be optional; use the Record type and its Key field
as the place to make this adjustment.
In `@test/mutationlab/corpus/configmap/create-succeeds/admission.create.yaml`:
- Around line 53-56: Normalize the cluster-specific
`userInfo.extra["authentication.kubernetes.io/credential-id"]` value in the
admission/audit corpus so it stays stable across runs. Update the relevant
corpus normalization path to replace this fingerprint with a fixed placeholder,
or normalize the entire `userInfo.extra` map if that is the shared pattern, and
ensure the fixture under the admission create-succeeds case follows the same
rule.
In
`@test/mutationlab/corpus/configmap/deletecollection/audit.deletecollection.yaml`:
- Around line 60-63: The audit fixture still hardcodes a concrete
authentication.kubernetes.io/credential-id fingerprint, so update this corpus
entry to use a placeholder value instead of a real cluster-specific SHA256.
Locate the audit payload fixture under the deletecollection corpus and replace
the literal credential-id in the extra section so compare/write logic sees a
normalized, stable value.
In `@test/mutationlab/e2e/harness_test.go`:
- Around line 221-226: clearRecords currently only checks the error from
harness.do and can silently accept non-2xx responses, leaving stale records
behind. Update clearRecords in harness_test.go to validate the HTTP status
returned by h.do for the "/records" DELETE call, and fail the test unless it is
the expected http.StatusNoContent; keep the existing test helper flow and use
the clearRecords and do methods to locate the change.
---
Nitpick comments:
In `@hack/e2e/lab-port-forward.sh`:
- Line 71: The loop counter in the port-forward retry loop is unused, triggering
Shellcheck SC2034. Update the `for` loop in `lab-port-forward.sh` to use `_`
instead of `i` since the body never references the counter, keeping the retry
behavior unchanged.
In `@test/mutationlab/e2e/harness_test.go`:
- Around line 232-256: The harness do method can block forever because
http.DefaultClient.Do has no per-request timeout, so the loop deadline is never
checked during a stalled request. Update do to use a client or request context
with an explicit timeout, and keep the existing retry/deadline logic in h.do so
each attempt fails fast before the 30s overall limit is reached.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro Plus
Run ID: 321027e0-fe96-4101-9dba-8bdfb939a640
📒 Files selected for processing (57)
cmd/main.gocmd/main_audit_server_test.gocmd/mutation-capture-lab/main.goconfig/certs/certificates.yamlconfig/deployment.yamlconfig/kustomization.yamlconfig/service.yamlconfig/webhook/kustomization.yamlconfig/webhook/validating-webhook.yamldocs/design/mutation-capture-lab-design.mdgo.modhack/e2e/lab-port-forward.shinternal/mutationlab/corpus/golden.gointernal/mutationlab/corpus/golden_test.gointernal/mutationlab/labserver/api.gointernal/mutationlab/labserver/api_test.gointernal/mutationlab/labserver/resources.gointernal/mutationlab/labserver/resources_test.gointernal/mutationlab/normalize/normalize.gointernal/mutationlab/normalize/normalize_test.gointernal/mutationlab/record.gointernal/mutationlab/recorder/admission.gointernal/mutationlab/recorder/audit.gointernal/mutationlab/recorder/recorder_test.gointernal/mutationlab/recorder/scenario.gointernal/mutationlab/recorder/watch.gointernal/mutationlab/store/store.gointernal/mutationlab/store/store_test.gointernal/webhook/admission_allow_handler.gointernal/webhook/admission_allow_handler_test.gotest/e2e/Taskfile.ymltest/e2e/gittarget_overlap_e2e_test.gotest/mutationlab/Dockerfiletest/mutationlab/README.mdtest/mutationlab/corpus/CLUSTER.mdtest/mutationlab/corpus/configmap/create-succeeds/admission.create.yamltest/mutationlab/corpus/configmap/create-succeeds/audit.create.yamltest/mutationlab/corpus/configmap/create-succeeds/watch.added.yamltest/mutationlab/corpus/configmap/deletecollection/admission.delete.cm-a.yamltest/mutationlab/corpus/configmap/deletecollection/admission.delete.cm-b.yamltest/mutationlab/corpus/configmap/deletecollection/admission.delete.cm-c.yamltest/mutationlab/corpus/configmap/deletecollection/audit.deletecollection.yamltest/mutationlab/corpus/configmap/deletecollection/watch.deleted.cm-a.yamltest/mutationlab/corpus/configmap/deletecollection/watch.deleted.cm-b.yamltest/mutationlab/corpus/configmap/deletecollection/watch.deleted.cm-c.yamltest/mutationlab/corpus/configmap/dry-run-create/admission.create.yamltest/mutationlab/corpus/configmap/dry-run-create/audit.create.yamltest/mutationlab/corpus/configmap/record-and-reject/admission.create.yamltest/mutationlab/corpus/configmap/record-and-reject/audit.create.yamltest/mutationlab/corpus/configmap/update/admission.update.yamltest/mutationlab/corpus/configmap/update/audit.update.yamltest/mutationlab/corpus/configmap/update/watch.modified.yamltest/mutationlab/e2e/create_succeeds_test.gotest/mutationlab/e2e/doc.gotest/mutationlab/e2e/harness_test.gotest/mutationlab/e2e/m1_scenarios_test.gotest/mutationlab/swap-image.sh
| Source Source `json:"source"` | ||
| Scenario string `json:"scenario,omitempty"` | ||
| ObservedAt time.Time `json:"observedAt"` | ||
| Key ObjectKey `json:"key,omitempty"` |
There was a problem hiding this comment.
📐 Maintainability & Code Quality | 🟡 Minor
🧩 Analysis chain
🌐 Web query:
Does Go encoding/json omitempty work on non-pointer struct fields?
💡 Result:
Yes—omitempty can omit non-pointer struct fields, but only when the field’s value is considered “empty” by encoding/json’s definition of emptiness. In encoding/json, the “omitempty” option omits a struct field from the output if the field value is an empty Go value, defined as false, 0, a nil pointer, a nil interface value, and any array/slice/map/string of length zero [1]. A struct value is not inherently an “empty value” under that definition, so a non-pointer struct field will usually still be emitted. Concrete implications for non-pointer struct fields: 1) If the field is a non-pointer struct type (e.g., a nested struct), encoding/json does not treat it as empty just because its subfields are zero; omitempty generally does not omit it. This is why time.Time (a non-pointer struct) does not get omitted with omitempty in many cases—zero time.Time is still a struct value, and the encoder can’t decide it’s empty based solely on subfields [2][3]. 2) However, if the struct field is actually of a type that can be treated as empty by encoding/json’s rules (for example, via nil being possible only for pointers/interfaces, not plain struct values), then omitempty can omit it—but for plain non-pointer struct values, that usually doesn’t apply. Recommended pattern: use pointers for optional nested structs/values, so the field can be nil and thus “empty” to encoding/json’s omitempty logic [4][5]. Alternative (Go 1.24+): use the “omitzero” tag, which is designed to omit struct fields when their value is the zero value (and can use an IsZero method), allowing non-pointer types like time.Time to be omitted when they’re zero [1][6].
Citations:
- 1: https://pkg.go.dev/encoding/json
- 2: https://cyrillschumacher.com/2016/02/01/go-encoding/jsonxml-omitempty-with-non-pointer-structs/
- 3: https://groups.google.com/g/golang-nuts/c/CaF9dlrC24Y
- 4: https://www.sohamkamani.com/golang/omitempty/
- 5: https://golangbot.com/omitzero-go/
- 6: encoding/json: add omitzero option golang/go#45669
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Inspect the relevant files and definitions.
git ls-files internal/mutationlab/record.go internal/mutationlab | sed -n '1,120p'
echo '--- record.go ---'
cat -n internal/mutationlab/record.go
echo '--- ObjectKey search ---'
rg -n "type ObjectKey|ObjectKey struct|ObjectKey" internal -g '!**/*_test.go'Repository: ConfigButler/gitops-reverser
Length of output: 7786
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Inspect the relevant files and definitions.
git ls-files internal/mutationlab/record.go internal/mutationlab | sed -n '1,120p'
echo '--- record.go ---'
cat -n internal/mutationlab/record.go
echo '--- ObjectKey search ---'
rg -n "type ObjectKey|ObjectKey struct|ObjectKey" internal -g '!**/*_test.go'Repository: ConfigButler/gitops-reverser
Length of output: 7786
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Inspect the relevant files and definitions.
git ls-files internal/mutationlab/record.go internal/mutationlab | sed -n '1,120p'
echo '--- record.go ---'
cat -n internal/mutationlab/record.go
echo '--- ObjectKey search ---'
rg -n "type ObjectKey|ObjectKey struct|ObjectKey" internal -g '!**/*_test.go'Repository: ConfigButler/gitops-reverser
Length of output: 7786
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Inspect the relevant file and the ObjectKey type before rewriting the review comment.
echo '--- internal/mutationlab/record.go ---'
cat -n internal/mutationlab/record.go
echo '--- ObjectKey definition(s) ---'
rg -n "type ObjectKey|ObjectKey struct|type .*ObjectKey" internal -g '!**/*_test.go'
echo '--- surrounding matches ---'
rg -n -A3 -B3 "ObjectKey" internal/mutationlab internal -g '!**/*_test.go'Repository: ConfigButler/gitops-reverser
Length of output: 21090
Remove omitempty from Record.Key
encoding/json does not treat a plain struct field as empty, so Key will still be emitted as "key": {} when all of its fields are zero. If the field should be optional, make it *ObjectKey; otherwise drop omitempty to avoid misleading behavior.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@internal/mutationlab/record.go` at line 57, The Record.Key field currently
uses omitempty on a plain ObjectKey, which will still serialize as an empty
object and mislead callers. Update the Record struct to either remove omitempty
from Key in record.go, or change Key to *ObjectKey if it is meant to be
optional; use the Record type and its Key field as the place to make this
adjustment.
There was a problem hiding this comment.
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
internal/mutationlab/normalize/normalize.go (2)
398-420: 🎯 Functional Correctness | 🟠 Major | 🏗️ Heavy liftDon't rewrite namespaces with global substring replacement.
replaceNamespacesrewrites every matching substring in the URI/key, not just the namespace field/segment. A path like/api/v1/namespaces/lab/configmaps/labwould normalize bothlaboccurrences, corrupting the object name as well as the namespace. This should replace only the namespace segment / JSON value, not every textual match.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@internal/mutationlab/normalize/normalize.go` around lines 398 - 420, The namespace normalization in replaceNamespaces is too broad because it does a global substring replace, which can rewrite unrelated occurrences like object names inside paths or keys. Update replaceNamespaces and its call sites such as rewriteKey so namespaces are replaced only in the actual namespace segment or JSON value being normalized, not every matching substring; keep the placeholder logic in indices/nsByLen/ns and preserve the IP rewriting behavior in rewriteKey.
201-220: 🩺 Stability & Availability | 🟠 Major | ⚡ Quick winNormalize
authentication.kubernetes.io/credential-idbefore writing corpus.The array path here only rewrites
sourceIPs, souser.extra["authentication.kubernetes.io/credential-id"]survives verbatim. The new corpus already commits raw fingerprint values intest/mutationlab/corpus/deployment/scale-patch/audit.patch.yaml, Lines 41-42, andtest/mutationlab/corpus/pod/graceful-delete/admission.delete.yaml, Lines 210-212. A rotated admin cert or fresh cluster bootstrap will change those values and fail corpus comparison even when behavior is unchanged.Also applies to: 344-357
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@internal/mutationlab/normalize/normalize.go` around lines 201 - 220, The corpus normalization in collector.collectScalar only special-cases sourceIPs, so authentication.kubernetes.io/credential-id is still written verbatim and causes unstable corpus diffs. Add normalization for that key in collectScalar, ideally by rewriting its values to a deterministic token before they reach c.orderedFor, and keep the handling scoped to the existing collector logic so user.extra values are normalized consistently across corpus generation.
🧹 Nitpick comments (1)
test/mutationlab/e2e/m2_scenarios_test.go (1)
90-90: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick winRename these tests to the repo's
TestFunctionName_Scenarioform.
TestStatusSubresource,TestScaleSubresource, andTestGracefulDeletedon't follow the*_test.gonaming convention the repo documents.TestStatusSubresource_WatchOnly, etc., would align with the rest of the suite. As per coding guidelines, "Follow naming convention 'TestFunctionName_Scenario(t *testing.T)' for Go tests".Also applies to: 127-127, 190-190
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@test/mutationlab/e2e/m2_scenarios_test.go` at line 90, Rename the Go test functions to match the repo’s TestFunctionName_Scenario convention so they align with the rest of the suite. Update TestStatusSubresource, TestScaleSubresource, and TestGracefulDelete in m2_scenarios_test.go to scenario-style names such as TestStatusSubresource_WatchOnly, and keep the rest of each test body unchanged.Source: Coding guidelines
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@docs/design/mutation-capture-lab-design.md`:
- Around line 280-281: The matrix row about `matchPolicy: Equivalent` overstates
that the webhook always sees the submitted version; update the wording in the
design note to reflect that the apiserver may send a converted object while
`requestKind` and `requestResource` preserve the original version, or change the
policy mention to `Exact` if the intent is to require the raw submitted version.
Use the existing `matchPolicy`, `requestKind`, `requestResource`, and
`apiVersions: ['*']` references in the same section to keep the statement
aligned with how the recorder observes the admission request.
In `@test/mutationlab/e2e/m2_scenarios_test.go`:
- Around line 171-172: The gracefulPod scenario uses the mutable busybox:1.36
tag, which can change the recorded imageID and drift the golden corpus. Update
the pod spec in the gracefulPod setup to pin the image by digest, or adjust the
normalization/corpus generation around the pod status imageID so it is scrubbed
consistently; use the gracefulPod test case and the image field in
m2_scenarios_test.go as the place to fix it.
---
Outside diff comments:
In `@internal/mutationlab/normalize/normalize.go`:
- Around line 398-420: The namespace normalization in replaceNamespaces is too
broad because it does a global substring replace, which can rewrite unrelated
occurrences like object names inside paths or keys. Update replaceNamespaces and
its call sites such as rewriteKey so namespaces are replaced only in the actual
namespace segment or JSON value being normalized, not every matching substring;
keep the placeholder logic in indices/nsByLen/ns and preserve the IP rewriting
behavior in rewriteKey.
- Around line 201-220: The corpus normalization in collector.collectScalar only
special-cases sourceIPs, so authentication.kubernetes.io/credential-id is still
written verbatim and causes unstable corpus diffs. Add normalization for that
key in collectScalar, ideally by rewriting its values to a deterministic token
before they reach c.orderedFor, and keep the handling scoped to the existing
collector logic so user.extra values are normalized consistently across corpus
generation.
---
Nitpick comments:
In `@test/mutationlab/e2e/m2_scenarios_test.go`:
- Line 90: Rename the Go test functions to match the repo’s
TestFunctionName_Scenario convention so they align with the rest of the suite.
Update TestStatusSubresource, TestScaleSubresource, and TestGracefulDelete in
m2_scenarios_test.go to scenario-style names such as
TestStatusSubresource_WatchOnly, and keep the rest of each test body unchanged.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro Plus
Run ID: 56f06b1b-0d31-4cf8-a9c0-4822cbb804bb
📒 Files selected for processing (31)
docs/design/mutation-capture-lab-design.mdinternal/mutationlab/normalize/normalize.gointernal/mutationlab/normalize/normalize_test.gotest/mutationlab/corpus/configmap/create-succeeds/admission.create.yamltest/mutationlab/corpus/configmap/create-succeeds/audit.create.yamltest/mutationlab/corpus/configmap/create-succeeds/watch.added.yamltest/mutationlab/corpus/configmap/deletecollection/admission.delete.cm-a.yamltest/mutationlab/corpus/configmap/deletecollection/admission.delete.cm-b.yamltest/mutationlab/corpus/configmap/deletecollection/admission.delete.cm-c.yamltest/mutationlab/corpus/configmap/deletecollection/audit.deletecollection.yamltest/mutationlab/corpus/configmap/deletecollection/watch.deleted.cm-a.yamltest/mutationlab/corpus/configmap/deletecollection/watch.deleted.cm-b.yamltest/mutationlab/corpus/configmap/deletecollection/watch.deleted.cm-c.yamltest/mutationlab/corpus/configmap/dry-run-create/admission.create.yamltest/mutationlab/corpus/configmap/dry-run-create/audit.create.yamltest/mutationlab/corpus/configmap/record-and-reject/admission.create.yamltest/mutationlab/corpus/configmap/record-and-reject/audit.create.yamltest/mutationlab/corpus/configmap/update/admission.update.yamltest/mutationlab/corpus/configmap/update/audit.update.yamltest/mutationlab/corpus/configmap/update/watch.modified.yamltest/mutationlab/corpus/deployment/scale-patch/audit.patch.yamltest/mutationlab/corpus/deployment/scale-patch/watch.modified.1.yamltest/mutationlab/corpus/deployment/scale-patch/watch.modified.2.yamltest/mutationlab/corpus/deployment/status-update/watch.modified.1.yamltest/mutationlab/corpus/deployment/status-update/watch.modified.2.yamltest/mutationlab/corpus/pod/graceful-delete/admission.delete.yamltest/mutationlab/corpus/pod/graceful-delete/watch.deleted.yamltest/mutationlab/corpus/pod/graceful-delete/watch.modified.yamltest/mutationlab/e2e/harness_test.gotest/mutationlab/e2e/m2_scenarios_test.gotest/mutationlab/swap-image.sh
✅ Files skipped from review due to trivial changes (21)
- test/mutationlab/corpus/configmap/update/watch.modified.yaml
- test/mutationlab/corpus/configmap/deletecollection/watch.deleted.cm-b.yaml
- test/mutationlab/corpus/deployment/scale-patch/watch.modified.1.yaml
- test/mutationlab/corpus/pod/graceful-delete/watch.deleted.yaml
- test/mutationlab/corpus/configmap/deletecollection/watch.deleted.cm-a.yaml
- test/mutationlab/corpus/configmap/record-and-reject/admission.create.yaml
- test/mutationlab/corpus/configmap/create-succeeds/watch.added.yaml
- test/mutationlab/corpus/configmap/dry-run-create/audit.create.yaml
- test/mutationlab/corpus/configmap/update/audit.update.yaml
- test/mutationlab/corpus/configmap/deletecollection/admission.delete.cm-b.yaml
- test/mutationlab/corpus/pod/graceful-delete/watch.modified.yaml
- test/mutationlab/corpus/deployment/status-update/watch.modified.2.yaml
- test/mutationlab/corpus/configmap/deletecollection/watch.deleted.cm-c.yaml
- test/mutationlab/corpus/configmap/dry-run-create/admission.create.yaml
- test/mutationlab/corpus/configmap/record-and-reject/audit.create.yaml
- test/mutationlab/corpus/configmap/create-succeeds/admission.create.yaml
- test/mutationlab/corpus/configmap/deletecollection/admission.delete.cm-a.yaml
- test/mutationlab/corpus/deployment/scale-patch/watch.modified.2.yaml
- test/mutationlab/corpus/configmap/deletecollection/audit.deletecollection.yaml
- test/mutationlab/corpus/deployment/status-update/watch.modified.1.yaml
- test/mutationlab/corpus/configmap/create-succeeds/audit.create.yaml
🚧 Files skipped from review as they are similar to previous changes (4)
- test/mutationlab/corpus/configmap/deletecollection/admission.delete.cm-c.yaml
- test/mutationlab/corpus/configmap/update/admission.update.yaml
- test/mutationlab/swap-image.sh
- test/mutationlab/e2e/harness_test.go
| Image: "busybox:1.36", | ||
| Command: []string{"sleep", "3600"}, |
There was a problem hiding this comment.
🩺 Stability & Availability | 🟠 Major | ⚡ Quick win
Pin this Pod image by digest or scrub imageID from normalization.
The committed corpus for this scenario preserves status.containerStatuses[].imageID in test/mutationlab/corpus/pod/graceful-delete/admission.delete.yaml, Lines 162-164, but gracefulPod pulls busybox:1.36 by mutable tag. If that tag is republished or resolves to a different digest, the golden corpus will drift even though the control-plane behavior under test did not change.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@test/mutationlab/e2e/m2_scenarios_test.go` around lines 171 - 172, The
gracefulPod scenario uses the mutable busybox:1.36 tag, which can change the
recorded imageID and drift the golden corpus. Update the pod spec in the
gracefulPod setup to pin the image by digest, or adjust the normalization/corpus
generation around the pod status imageID so it is scrubbed consistently; use the
gracefulPod test case and the image field in m2_scenarios_test.go as the place
to fix it.
Code Review SummaryStatus: No Issues Found | Recommendation: Merge No new issues were found in the incremental changes. The changes in this PR are cleanup operations:
All previously identified issues have been addressed in prior commits. Previous Review Summaries (16 snapshots, latest commit 1c61666)Current summary above is authoritative. Previous snapshots are kept for context only. Previous review (commit 1c61666)Status: 1 Issue Found | Recommendation: Address before merge Overview
Issue Details (click to expand)WARNING
Files Reviewed (24 files)
Fix these issues in Kilo Cloud Previous review (commit 12f3aa2)Status: 1 Issue Found | Recommendation: Address before merge Overview
Issue Details (click to expand)WARNING
Files Reviewed (32 files)
Fix these issues in Kilo Cloud Previous review (commit 92fa490)Status: No Issues Found | Recommendation: Merge Files Reviewed (31 files)
Previous review (commit f2773a8)Status: No Issues Found | Recommendation: Merge Files Reviewed (5 files)
Previous review (commit 893e17f)Status: No Issues Found | Recommendation: Merge Files Reviewed (5 files)
Previous review (commit d09ab73)Status: No Issues Found | Recommendation: Merge Files Reviewed (5 files)
Previous review (commit cb8d4b0)Status: No Issues Found | Recommendation: Merge Files Reviewed (11 files)
Previous review (commit 915b524)Status: No Issues Found | Recommendation: Merge Files Reviewed (14 files)
Previous review (commit 04aa391)Status: No Issues Found | Recommendation: Merge Files Reviewed (8 files)
Previous review (commit acaea33)Status: No Issues Found | Recommendation: Merge Files Reviewed (11 files)
Previous review (commit acf73d5)Status: 1 Issue Found | Recommendation: Address before merge Overview
Issue Details (click to expand)WARNING
Files Reviewed (16 files)
Fix these issues in Kilo Cloud Previous review (commit d7bdb16)Status: 1 Issue Found | Recommendation: Address before merge Overview
Issue Details (click to expand)WARNING
Files Reviewed (16 files)
Fix these issues in Kilo Cloud Previous review (commit 7e011db)Status: No Issues Found | Recommendation: Merge Files Reviewed (16 files)
Previous review (commit b524d83)Status: No Issues Found | Recommendation: Merge Overview
Incremental Review (b2d5bc7..79a9fe6)No new code issues were identified in this incremental diff. The changes consist of documentation updates aligning with the watch-first architecture:
The incremental diff shows no logic changes in executable code paths — only documentation and configuration alignment. Files Reviewed (16 files)
Previous review (commit 79a9fe6)Status: No Issues Found | Recommendation: Merge Overview
Incremental Review (b2d5bc7..79a9fe6)No new code issues were identified in this incremental diff. The changes consist of documentation updates aligning with the watch-first architecture:
The incremental diff shows no logic changes in executable code paths — only documentation and configuration alignment. Files Reviewed (16 files)
Previous review (commit b2d5bc7)Status: Active Issues Found | Recommendation: Review existing CodeRabbit comments Overview
Incremental Review (e0b6369..HEAD)No new code issues were identified in this incremental diff. The changes consist mainly of documentation restructuring and test-file consolidation (M1→ Previously identified issues on unchanged lines remain active (see Unresolved Findings below). The following issues from prior reviews that fell within the incremental diff are now resolved:
|
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (1)
internal/mutationlab/recorder/conversion_test.go (1)
30-65: 🎯 Functional Correctness | 🔵 Trivial | 💤 Low valueOptional: add a large-
sizeBytescase.Both directions only test
1024/2048, which round-trip cleanly throughfloat64. A value above 2^53 would expose the precision/json.Numberissue flagged inconversion.go.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@internal/mutationlab/recorder/conversion_test.go` around lines 30 - 65, Add a regression case to TestConvertWidget_RenamesFieldBothDirections that uses a very large spec.sizeBytes value above 2^53 and verifies both convertWidget directions preserve it correctly. The current test only covers small values that survive json.Unmarshal as float64, so extend the existing v1->v2 and v2->v1 assertions to catch precision loss and ensure conversion.go handles large integers without truncation.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@internal/mutationlab/recorder/conversion.go`:
- Around line 100-140: The size parsing in convertWidget/widgetSize is losing
precision because json.Unmarshal decodes numbers into float64, so the
json.Number branch in widgetSize is never reached for sizeBytes. Update
convertWidget to decode the raw JSON with a Decoder that calls UseNumber()
before populating obj, so spec["sizeBytes"] can be preserved as json.Number and
parsed exactly; keep the existing widgetSize logic and ensure any needed bytes
import is added.
In `@test/mutationlab/e2e/m4_scenarios_test.go`:
- Around line 113-118: Remove the committed audit-additional.create.yaml from
the flunder/aggregated-api-write corpus so the golden corpus matches what
m4_scenarios_test actually generates. The corpus is built in h.syncCorpus from
corpusRecords in test/mutationlab/e2e/m4_scenarios_test.go, where enriched is
appended only when present, so the audit-additional record must not be checked
in as a required golden file. Update the corpus under
mutationlab/corpus/flunder/aggregated-api-write and keep the generated set
aligned with the conditional logic in m4_scenarios_test and Compare() behavior
in golden.go.
---
Nitpick comments:
In `@internal/mutationlab/recorder/conversion_test.go`:
- Around line 30-65: Add a regression case to
TestConvertWidget_RenamesFieldBothDirections that uses a very large
spec.sizeBytes value above 2^53 and verifies both convertWidget directions
preserve it correctly. The current test only covers small values that survive
json.Unmarshal as float64, so extend the existing v1->v2 and v2->v1 assertions
to catch precision loss and ensure conversion.go handles large integers without
truncation.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro Plus
Run ID: 9ddc6031-6047-4583-a86d-368b5d8d099e
📒 Files selected for processing (33)
cmd/mutation-capture-lab/main.godocs/design/mutation-capture-lab-design.mdinternal/mutationlab/normalize/normalize.gointernal/mutationlab/normalize/normalize_test.gointernal/mutationlab/record.gointernal/mutationlab/recorder/conversion.gointernal/mutationlab/recorder/conversion_test.gotest/mutationlab/corpus/configmap/create-succeeds/admission.create.yamltest/mutationlab/corpus/configmap/create-succeeds/audit.create.yamltest/mutationlab/corpus/configmap/deletecollection/admission.delete.cm-a.yamltest/mutationlab/corpus/configmap/deletecollection/admission.delete.cm-b.yamltest/mutationlab/corpus/configmap/deletecollection/admission.delete.cm-c.yamltest/mutationlab/corpus/configmap/deletecollection/audit.deletecollection.yamltest/mutationlab/corpus/configmap/dry-run-create/admission.create.yamltest/mutationlab/corpus/configmap/dry-run-create/audit.create.yamltest/mutationlab/corpus/configmap/record-and-reject/admission.create.yamltest/mutationlab/corpus/configmap/record-and-reject/audit.create.yamltest/mutationlab/corpus/configmap/update/admission.update.yamltest/mutationlab/corpus/configmap/update/audit.update.yamltest/mutationlab/corpus/deployment/scale-patch/audit.patch.yamltest/mutationlab/corpus/flunder/aggregated-api-write/audit-additional.create.yamltest/mutationlab/corpus/flunder/aggregated-api-write/audit.create.yamltest/mutationlab/corpus/flunder/aggregated-api-write/watch.added.yamltest/mutationlab/corpus/pod/graceful-delete/admission.delete.yamltest/mutationlab/corpus/widget/crd-conversion/admission.create.yamltest/mutationlab/corpus/widget/crd-conversion/audit.create.yamltest/mutationlab/corpus/widget/crd-conversion/conversion.to-v1.yamltest/mutationlab/corpus/widget/crd-conversion/conversion.to-v2.yamltest/mutationlab/corpus/widget/crd-conversion/watch.added.yamltest/mutationlab/e2e/harness_test.gotest/mutationlab/e2e/m3_scenarios_test.gotest/mutationlab/e2e/m4_scenarios_test.gotest/mutationlab/swap-image.sh
✅ Files skipped from review due to trivial changes (16)
- test/mutationlab/corpus/configmap/dry-run-create/audit.create.yaml
- test/mutationlab/corpus/flunder/aggregated-api-write/audit.create.yaml
- test/mutationlab/corpus/configmap/deletecollection/audit.deletecollection.yaml
- test/mutationlab/corpus/configmap/record-and-reject/audit.create.yaml
- test/mutationlab/corpus/flunder/aggregated-api-write/audit-additional.create.yaml
- test/mutationlab/corpus/widget/crd-conversion/audit.create.yaml
- test/mutationlab/corpus/widget/crd-conversion/conversion.to-v2.yaml
- test/mutationlab/corpus/configmap/update/admission.update.yaml
- test/mutationlab/corpus/widget/crd-conversion/admission.create.yaml
- test/mutationlab/corpus/deployment/scale-patch/audit.patch.yaml
- test/mutationlab/corpus/configmap/record-and-reject/admission.create.yaml
- test/mutationlab/corpus/configmap/deletecollection/admission.delete.cm-a.yaml
- test/mutationlab/corpus/configmap/create-succeeds/audit.create.yaml
- test/mutationlab/corpus/configmap/deletecollection/admission.delete.cm-b.yaml
- test/mutationlab/corpus/pod/graceful-delete/admission.delete.yaml
- test/mutationlab/corpus/configmap/update/audit.update.yaml
🚧 Files skipped from review as they are similar to previous changes (10)
- test/mutationlab/corpus/configmap/deletecollection/admission.delete.cm-c.yaml
- test/mutationlab/corpus/configmap/create-succeeds/admission.create.yaml
- internal/mutationlab/record.go
- test/mutationlab/corpus/configmap/dry-run-create/admission.create.yaml
- test/mutationlab/swap-image.sh
- test/mutationlab/e2e/harness_test.go
- internal/mutationlab/normalize/normalize_test.go
- cmd/mutation-capture-lab/main.go
- internal/mutationlab/normalize/normalize.go
- docs/design/mutation-capture-lab-design.md
| func convertWidget(raw json.RawMessage, desiredAPIVersion string) json.RawMessage { | ||
| var obj map[string]any | ||
| if err := json.Unmarshal(raw, &obj); err != nil { | ||
| return raw | ||
| } | ||
| obj["apiVersion"] = desiredAPIVersion | ||
| spec, ok := obj["spec"].(map[string]any) | ||
| if !ok { | ||
| return remarshal(obj, raw) | ||
| } | ||
| switch versionOf(desiredAPIVersion) { | ||
| case "v2": | ||
| if n, ok := widgetSize(spec["sizeBytes"]); ok { | ||
| spec["size"] = strconv.FormatInt(n, 10) | ||
| delete(spec, "sizeBytes") | ||
| } | ||
| case "v1": | ||
| if n, ok := widgetSize(spec["size"]); ok { | ||
| spec["sizeBytes"] = n | ||
| delete(spec, "size") | ||
| } | ||
| } | ||
| return remarshal(obj, raw) | ||
| } | ||
|
|
||
| // widgetSize reads the Widget size from either the v1 integer or the v2 string | ||
| // form, returning the canonical integer. | ||
| func widgetSize(v any) (int64, bool) { | ||
| switch t := v.(type) { | ||
| case json.Number: | ||
| n, err := t.Int64() | ||
| return n, err == nil | ||
| case float64: | ||
| return int64(t), true | ||
| case string: | ||
| n, err := strconv.ParseInt(t, 10, 64) | ||
| return n, err == nil | ||
| default: | ||
| return 0, false | ||
| } | ||
| } |
There was a problem hiding this comment.
🎯 Functional Correctness | 🟡 Minor
🧩 Analysis chain
🌐 Web query:
In Go, does json.Unmarshal into map[string]any decode JSON numbers as float64 unless json.Decoder.UseNumber is set?
💡 Result:
Yes, that is correct. By default, when the Go encoding/json package unmarshals JSON numbers into an interface{} value (such as within a map[string]any), it decodes them as float64 [1][2]. If you use json.Decoder.UseNumber, the decoder will instead unmarshal JSON numbers into a json.Number type [3][4]. A json.Number is a string-based type that holds the literal text of the JSON number, which allows you to handle the number without losing precision or converting it to a float64 prematurely [5][6]. Summary of behavior: - Default: JSON numbers are decoded as float64 [1][2]. - With UseNumber: JSON numbers are decoded as json.Number [3][6].
Citations:
- 1: https://pkg.go.dev/encoding/json
- 2: https://go.dev/blog/json
- 3: https://go.dev/src/encoding/json/stream.go?m=text
- 4: https://github.com/golang/go/blob/master/src/encoding/json/stream.go
- 5: https://go.dev/src/encoding/json/decode.go
- 6: https://www.gofaq.org/en/validate-request-bodies/
json.Number branch in widgetSize is unreachable; large sizeBytes lose precision.
convertWidget uses json.Unmarshal, which defaults to decoding JSON numbers as float64 when populating a map[string]any. Consequently, spec["sizeBytes"] is never a json.Number, the case json.Number in widgetSize never executes, and integers larger than 2^53 lose precision before any conversion.
Update the decoder to use UseNumber() to preserve the exact integer representation:
Proposed fix
- var obj map[string]any
- if err := json.Unmarshal(raw, &obj); err != nil {
- return raw
- }
+ var obj map[string]any
+ dec := json.NewDecoder(bytes.NewReader(raw))
+ dec.UseNumber()
+ if err := dec.Decode(&obj); err != nil {
+ return raw
- }Ensure "bytes" is imported.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| func convertWidget(raw json.RawMessage, desiredAPIVersion string) json.RawMessage { | |
| var obj map[string]any | |
| if err := json.Unmarshal(raw, &obj); err != nil { | |
| return raw | |
| } | |
| obj["apiVersion"] = desiredAPIVersion | |
| spec, ok := obj["spec"].(map[string]any) | |
| if !ok { | |
| return remarshal(obj, raw) | |
| } | |
| switch versionOf(desiredAPIVersion) { | |
| case "v2": | |
| if n, ok := widgetSize(spec["sizeBytes"]); ok { | |
| spec["size"] = strconv.FormatInt(n, 10) | |
| delete(spec, "sizeBytes") | |
| } | |
| case "v1": | |
| if n, ok := widgetSize(spec["size"]); ok { | |
| spec["sizeBytes"] = n | |
| delete(spec, "size") | |
| } | |
| } | |
| return remarshal(obj, raw) | |
| } | |
| // widgetSize reads the Widget size from either the v1 integer or the v2 string | |
| // form, returning the canonical integer. | |
| func widgetSize(v any) (int64, bool) { | |
| switch t := v.(type) { | |
| case json.Number: | |
| n, err := t.Int64() | |
| return n, err == nil | |
| case float64: | |
| return int64(t), true | |
| case string: | |
| n, err := strconv.ParseInt(t, 10, 64) | |
| return n, err == nil | |
| default: | |
| return 0, false | |
| } | |
| } | |
| func convertWidget(raw json.RawMessage, desiredAPIVersion string) json.RawMessage { | |
| var obj map[string]any | |
| dec := json.NewDecoder(bytes.NewReader(raw)) | |
| dec.UseNumber() | |
| if err := dec.Decode(&obj); err != nil { | |
| return raw | |
| } | |
| obj["apiVersion"] = desiredAPIVersion | |
| spec, ok := obj["spec"].(map[string]any) | |
| if !ok { | |
| return remarshal(obj, raw) | |
| } | |
| switch versionOf(desiredAPIVersion) { | |
| case "v2": | |
| if n, ok := widgetSize(spec["sizeBytes"]); ok { | |
| spec["size"] = strconv.FormatInt(n, 10) | |
| delete(spec, "sizeBytes") | |
| } | |
| case "v1": | |
| if n, ok := widgetSize(spec["size"]); ok { | |
| spec["sizeBytes"] = n | |
| delete(spec, "size") | |
| } | |
| } | |
| return remarshal(obj, raw) | |
| } |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@internal/mutationlab/recorder/conversion.go` around lines 100 - 140, The size
parsing in convertWidget/widgetSize is losing precision because json.Unmarshal
decodes numbers into float64, so the json.Number branch in widgetSize is never
reached for sizeBytes. Update convertWidget to decode the raw JSON with a
Decoder that calls UseNumber() before populating obj, so spec["sizeBytes"] can
be preserved as json.Number and parsed exactly; keep the existing widgetSize
logic and ensure any needed bytes import is added.
| corpusRecords := []mutationlab.Record{*official, *added} | ||
| if enriched != nil { | ||
| corpusRecords = append(corpusRecords, *enriched) | ||
| } | ||
| h.syncCorpus(t, "flunder/aggregated-api-write", corpusRecords) | ||
| } |
There was a problem hiding this comment.
🩺 Stability & Availability | 🟡 Minor
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Confirm the additional-audit corpus is committed and check how Compare treats missing captures.
fd -t f 'audit-additional' test/mutationlab/corpus
rg -n 'stray corpus|SourceAuditAdditional|audit-webhook-additional' internal/mutationlab cmd/mutation-capture-labRepository: ConfigButler/gitops-reverser
Length of output: 1729
Remove committed audit-additional.create.yaml to prevent flaky E2E failures.
The audit-additional record is conditionally appended in test/mutationlab/e2e/m4_scenarios_test.go (line 115), meaning it is absent when enriched is nil. However, test/mutationlab/corpus/flunder/aggregated-api-write/audit-additional.create.yaml is committed to the repo. Per internal/mutationlab/corpus/golden.go (line 181), if a committed file is not generated during the test run, Compare() fails with stray corpus files no longer captured. This causes the test to flake if the e2e cluster's audit proxy does not emit the additional event.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@test/mutationlab/e2e/m4_scenarios_test.go` around lines 113 - 118, Remove the
committed audit-additional.create.yaml from the flunder/aggregated-api-write
corpus so the golden corpus matches what m4_scenarios_test actually generates.
The corpus is built in h.syncCorpus from corpusRecords in
test/mutationlab/e2e/m4_scenarios_test.go, where enriched is appended only when
present, so the audit-additional record must not be checked in as a required
golden file. Update the corpus under
mutationlab/corpus/flunder/aggregated-api-write and keep the generated set
aligned with the conditional logic in m4_scenarios_test and Compare() behavior
in golden.go.
Phase 0 ("finish the evidence") of the watch-only ingestion proposal: two new
"difficult cases" corpus rows captured against a live k8s v1.35.2 cluster.
- Row 10 (owner-ref cascade): one user delete fans out to a watch DELETED for
both the parent and the cascaded child; the child delete is audited under
generic-garbage-collector, not the human — concrete delete-attribution
evidence for the conservative resolver.
- Row 13 (optimistic-concurrency conflict): a stale-resourceVersion update is
rejected at the storage layer BEFORE validating admission runs, so audit is
the sole witness — no admission, no watch. Stronger than rows 11/12.
Also fixes a normalizer non-determinism the cascade surfaced: owner-ref UIDs
embedded in managedFields association keys (k:{"uid":...}) were not collapsed,
churning the corpus every run. Adds uidByLen handling mirroring the existing IP
rule.
Full lab suite 13/13 deterministic in compare mode. Updates the README scenario
index and the design doc (corpus table, rows 10/13, and the rows 16/17
resync/bookmark recorder-readiness findings).
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Phase 1 of docs/design/watch-only-ingestion-architecture.md ("build watch state
in parallel"): for every Synced type, optionally hold a long-lived WATCH and
record each ADDED/MODIFIED/DELETED into a per-type ⌚stream, written
ALONGSIDE the authoritative :audit:stream so the watch-derived desired set can
be diffed against the audit-derived one. Off by default; changes no Git write.
- queue: AppendWatchEvent / DeleteTypeWatchStream on RedisByTypeStreamQueue,
reusing the per-type key machinery and the :objects:items envelope shape so
the checkpoint and watch log fold byte-identically.
- watch: a runner twinned with the audit tail (started on Synced beside
startTypeAuditTail, stopped on Release); the resume cursor advances on
bookmarks and resets to the live edge after repeated un-resumable (410-style)
failures — correctness stays owned by the checkpoint, so a watch gap costs
only freshness on this experimental stream.
- cmd: --watch-state-stream flag wiring (off by default), reusing the mirror's
write-side client.
Unit-tested with a fake watch + writer (records/bookmark-cursor/relist-reset/
lifecycle). fmt/vet/lint/test green. Product test-e2e deferred (flag off, no
behavior change).
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…1 payoff)
Slice 2 of the parallel watch-state stream: now that --watch-state-stream records
a ⌚stream, periodically fold it into a watch-derived desired set and diff
that against the audit-derived set, metering the divergence. This is the question
the corpus cannot answer — does a continuous watch reproduce the same desired
manifests the audit log does, on a live cluster? It changes no Git write.
- queue: RedisTypeSplicer.SpliceWatchType folds the checkpoint with the
⌚stream entries whose recorded rv > checkpoint R (the stream uses
arrival-order auto-IDs, so the post-checkpoint slice is by the rv field,
mirroring SpliceType's exclusive "(R" fold).
- watch: compareDesiredSets diffs the two sets by identity + sanitized body
(agree / mismatch / audit-only / watch-only); a per-minute Manager loop runs it
for every serviceable type and meters the result. Wired only when
--watch-state-stream is on.
- telemetry: gitopsreverser_watch_audit_divergence{gvr,reason} gauge +
gitopsreverser_watch_audit_comparisons_total{gvr,result} counter.
Unit-tested: the fold (rv-gate + delete), the diff buckets, the Manager wiring
with fake splices, and a Valkey-backed SpliceWatchType. fmt/vet/lint/test green.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
There was a problem hiding this comment.
Actionable comments posted: 4
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
test/mutationlab/e2e/aggregated_api_test.go (1)
115-127: 🗄️ Data Integrity & Integration | 🟡 Minor | ⚡ Quick winFail fast before syncing the corpus on a bad watch payload.
The
t.Errorfon Lines 115-117 still allowsh.syncCorpusto run on Lines 126-127. InMUTATIONLAB_UPDATEmode that can rewrite the golden flunder fixtures with the exact broken watch body this scenario is supposed to reject.Proposed fix
- if !added.Summary.HasObject || flunderReference(added) != "some-flunder" { - t.Errorf("watch ADDED did not carry the full flunder object (spec.reference=%q, hasObject=%v)", - flunderReference(added), added.Summary.HasObject) - } + if !added.Summary.HasObject || flunderReference(added) != "some-flunder" { + t.Fatalf("watch ADDED did not carry the full flunder object (spec.reference=%q, hasObject=%v)", + flunderReference(added), added.Summary.HasObject) + }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@test/mutationlab/e2e/aggregated_api_test.go` around lines 115 - 127, The aggregated API test is logging a bad watch payload but still continues into h.syncCorpus, which can rewrite golden fixtures with invalid data. In aggregated_api_test.go, update the check around the watch ADDED validation so the test stops immediately on a failed flunderReference(added) or added.Summary.HasObject condition, and only call h.syncCorpus after the payload has passed validation. Keep the fix localized to the existing test flow in the aggregated_api_test.go scenario.
🧹 Nitpick comments (1)
test/mutationlab/e2e/configmap_scenarios_test.go (1)
64-65: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick winRename these tests to the
TestFunctionName_Scenariopattern.These new e2e cases use bare names like
TestUpdateandTestOwnerRefCascade, which doesn't match the repo's required*_test.gonaming convention.As per coding guidelines,
**/*_test.go: Follow naming conventionTestFunctionName_Scenario(t *testing.T).Also applies to: 81-82, 103-104, 128-129, 180-181, 287-288, 344-345, 441-442
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@test/mutationlab/e2e/configmap_scenarios_test.go` around lines 64 - 65, Rename the e2e test functions in configmap_scenarios_test.go to follow the required TestFunctionName_Scenario pattern, using the existing test names like TestCreateSucceeds, TestUpdate, and TestOwnerRefCascade as the base and adding a clear scenario suffix where needed. Update each affected test function declaration so the names are descriptive and consistent with the repo’s testing convention, and ensure the renamed functions still match their existing harness-based bodies such as newHarness(t).Source: Coding guidelines
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@docs/design/mutation-capture-lab-design.md`:
- Around line 9-12: The captured-scenarios inventory is stale in the
mutation-capture lab design doc. Update the scenario counts and row status in
the sections that describe the corpus inventory so they match the committed
corpus, including the new owner-ref cascade and conflict coverage referenced by
the existing corpus/test symbols and the sibling watch-only design doc. Also
revise the “What To Capture Next” list and the affected Difficult Cases Catalog
entries together so the totals, planned rows, and captured rows are consistent
throughout the document.
In `@internal/watch/watch_state.go`:
- Around line 141-150: The watch-state restart logic is treating every session
close as a resumable failure and clearing sinceRV too aggressively. Update
runTypeWatchStream/watchStateSession handling so only definitive
resource-version expiration (for example 410 Gone / unresumable resume points)
increments toward watchStateRelistThreshold and resets sinceRV, while ordinary
watch rollovers or errWatchStateClosed channel closures should keep the current
resume cursor and reopen from the same RV. Use the existing gvr, sinceRV,
failures, and watchStateRelistThreshold flow to distinguish these cases before
logging "watch-state stream resetting to live edge".
In `@test/mutationlab/corpus/configmap/finalizer-delete/watch.deleted.yaml`:
- Around line 10-34: The terminal DELETED snapshot for the finalizer-delete
scenario still shows the old finalizer, so update the watch.deleted.yaml fixture
to reflect the post-patch state by removing the metadata.finalizers block
entirely. Make sure the deleted event matches the effect of audit.patch.yaml on
the object, and verify the remaining metadata in watch.deleted.yaml still aligns
with the finalizer-delete flow.
In `@test/mutationlab/e2e/configmap_scenarios_test.go`:
- Around line 118-120: The dry-run verification in configmap_scenarios_test
should assert a NotFound error from the ConfigMaps().Get call instead of
treating any error as success. Update the check around
h.kube.CoreV1().ConfigMaps(s.ns).Get for "cm-dry" to explicitly validate that
the failure is due to the object not existing, so transport or auth errors do
not incorrectly pass the test.
---
Outside diff comments:
In `@test/mutationlab/e2e/aggregated_api_test.go`:
- Around line 115-127: The aggregated API test is logging a bad watch payload
but still continues into h.syncCorpus, which can rewrite golden fixtures with
invalid data. In aggregated_api_test.go, update the check around the watch ADDED
validation so the test stops immediately on a failed flunderReference(added) or
added.Summary.HasObject condition, and only call h.syncCorpus after the payload
has passed validation. Keep the fix localized to the existing test flow in the
aggregated_api_test.go scenario.
---
Nitpick comments:
In `@test/mutationlab/e2e/configmap_scenarios_test.go`:
- Around line 64-65: Rename the e2e test functions in
configmap_scenarios_test.go to follow the required TestFunctionName_Scenario
pattern, using the existing test names like TestCreateSucceeds, TestUpdate, and
TestOwnerRefCascade as the base and adding a clear scenario suffix where needed.
Update each affected test function declaration so the names are descriptive and
consistent with the repo’s testing convention, and ensure the renamed functions
still match their existing harness-based bodies such as newHarness(t).
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro Plus
Run ID: cb5de786-dc51-4d30-99da-7f1342cfe8bc
📒 Files selected for processing (28)
cmd/main.godocs/design/mutation-capture-lab-design.mddocs/design/watch-only-ingestion-architecture.mdinternal/mutationlab/normalize/normalize.gointernal/queue/redis_watch_stream.gointernal/watch/manager.gointernal/watch/materialization.gointernal/watch/watch_state.gointernal/watch/watch_state_test.gotest/mutationlab/README.mdtest/mutationlab/corpus/configmap/conflict-update/audit.update.yamltest/mutationlab/corpus/configmap/finalizer-delete/admission.delete.yamltest/mutationlab/corpus/configmap/finalizer-delete/admission.update.yamltest/mutationlab/corpus/configmap/finalizer-delete/audit.delete.yamltest/mutationlab/corpus/configmap/finalizer-delete/audit.patch.yamltest/mutationlab/corpus/configmap/finalizer-delete/watch.deleted.yamltest/mutationlab/corpus/configmap/finalizer-delete/watch.modified.yamltest/mutationlab/corpus/configmap/owner-ref-cascade/audit.delete.cm-child.yamltest/mutationlab/corpus/configmap/owner-ref-cascade/audit.delete.cm-parent.yamltest/mutationlab/corpus/configmap/owner-ref-cascade/watch.deleted.cm-child.yamltest/mutationlab/corpus/configmap/owner-ref-cascade/watch.deleted.cm-parent.yamltest/mutationlab/corpus/widget/crd-conversion/conversion.to-v1.yamltest/mutationlab/corpus/widget/crd-conversion/conversion.to-v2.yamltest/mutationlab/e2e/aggregated_api_test.gotest/mutationlab/e2e/configmap_scenarios_test.gotest/mutationlab/e2e/crd_conversion_test.gotest/mutationlab/e2e/harness_test.gotest/mutationlab/e2e/workload_scenarios_test.go
✅ Files skipped from review due to trivial changes (7)
- test/mutationlab/corpus/widget/crd-conversion/conversion.to-v2.yaml
- test/mutationlab/corpus/configmap/owner-ref-cascade/watch.deleted.cm-child.yaml
- test/mutationlab/corpus/configmap/finalizer-delete/audit.patch.yaml
- test/mutationlab/corpus/configmap/owner-ref-cascade/audit.delete.cm-parent.yaml
- test/mutationlab/corpus/configmap/conflict-update/audit.update.yaml
- test/mutationlab/corpus/widget/crd-conversion/conversion.to-v1.yaml
- test/mutationlab/README.md
🚧 Files skipped from review as they are similar to previous changes (2)
- internal/mutationlab/normalize/normalize.go
- test/mutationlab/e2e/harness_test.go
| failures++ | ||
| log.V(1).Info("watch-state session ended; will re-open", | ||
| "gvr", gvr.String(), "resumeRV", sinceRV, "failures", failures, "err", err.Error()) | ||
| if failures >= watchStateRelistThreshold { | ||
| // An un-resumable resume point (e.g. 410 Gone after compaction): drop to the live edge. | ||
| // The freshness gap is healed by the next checkpoint re-anchor; this stream is never a | ||
| // correctness source. | ||
| log.V(1).Info("watch-state stream resetting to live edge", "gvr", gvr.String()) | ||
| sinceRV = "" | ||
| failures = 0 |
There was a problem hiding this comment.
🎯 Functional Correctness | 🟠 Major
Don't drop the resume RV after ordinary watch rollovers.
watchStateSession returns errWatchStateClosed for any channel closure, and runTypeWatchStream treats this as a failure that increments the failures counter. Once failures reaches watchStateRelistThreshold, sinceRV is cleared to "". This logic fails to distinguish between a normal API server watch timeout (which is expected and resumable) and an actual resource version expiration (410 Gone). Consequently, the stream discards its resume cursor after just three healthy rollovers, forcing a restart from the live edge and silently skipping mutations that occurred during the reconnect window. Only reset sinceRV on definitive resource expiration; treat ordinary channel closes as resumable events.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@internal/watch/watch_state.go` around lines 141 - 150, The watch-state
restart logic is treating every session close as a resumable failure and
clearing sinceRV too aggressively. Update runTypeWatchStream/watchStateSession
handling so only definitive resource-version expiration (for example 410 Gone /
unresumable resume points) increments toward watchStateRelistThreshold and
resets sinceRV, while ordinary watch rollovers or errWatchStateClosed channel
closures should keep the current resume cursor and reopen from the same RV. Use
the existing gvr, sinceRV, failures, and watchStateRelistThreshold flow to
distinguish these cases before logging "watch-state stream resetting to live
edge".
| finalizers: | ||
| - mutationlab.configbutler.ai/hold | ||
| labels: | ||
| mutationlab.configbutler.ai/scenario: finalizer-delete | ||
| managedFields: | ||
| - apiVersion: v1 | ||
| fieldsType: FieldsV1 | ||
| fieldsV1: | ||
| f:data: | ||
| .: {} | ||
| f:key: {} | ||
| f:metadata: | ||
| f:finalizers: | ||
| .: {} | ||
| v:"mutationlab.configbutler.ai/hold": {} | ||
| f:labels: | ||
| .: {} | ||
| f:mutationlab.configbutler.ai/scenario: {} | ||
| manager: e2e.test | ||
| operation: Update | ||
| time: <ts> | ||
| name: cm-hold | ||
| namespace: <ns-1> | ||
| resourceVersion: <rv-3> | ||
| uid: <uid-1> |
There was a problem hiding this comment.
🎯 Functional Correctness | 🔴 Critical
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
set -euo pipefail
files=(
test/mutationlab/corpus/configmap/finalizer-delete/watch.modified.yaml
test/mutationlab/corpus/configmap/finalizer-delete/watch.deleted.yaml
test/mutationlab/corpus/configmap/finalizer-delete/admission.update.yaml
test/mutationlab/corpus/configmap/finalizer-delete/audit.patch.yaml
)
for f in "${files[@]}"; do
echo "===== $f ====="
sed -n '1,120p' "$f"
echo
done
python - <<'PY'
from pathlib import Path
import difflib
a = Path("test/mutationlab/corpus/configmap/finalizer-delete/watch.modified.yaml").read_text().splitlines()
b = Path("test/mutationlab/corpus/configmap/finalizer-delete/watch.deleted.yaml").read_text().splitlines()
print("===== diff: watch.modified.yaml vs watch.deleted.yaml =====")
for line in difflib.unified_diff(a, b, fromfile="watch.modified.yaml", tofile="watch.deleted.yaml", lineterm=""):
print(line)
PYRepository: ConfigButler/gitops-reverser
Length of output: 5661
The watch.deleted.yaml fixture is missing the finalizer removal.
Although audit.patch.yaml sends a request to clear finalizers (null), the resulting watch.deleted.yaml snapshot incorrectly retains the mutationlab.configbutler.ai/hold entry. In the finalizer-delete scenario, the terminal DELETED event must reflect the object state after the patch, which requires the finalizers array to be empty or absent.
Update watch.deleted.yaml to remove the finalizers block.
See patch request vs. current deleted state
Current audit.patch.yaml request:
requestObject:
metadata:
finalizers: nullIncorrect watch.deleted.yaml state:
metadata:
finalizers:
- mutationlab.configbutler.ai/hold🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@test/mutationlab/corpus/configmap/finalizer-delete/watch.deleted.yaml` around
lines 10 - 34, The terminal DELETED snapshot for the finalizer-delete scenario
still shows the old finalizer, so update the watch.deleted.yaml fixture to
reflect the post-patch state by removing the metadata.finalizers block entirely.
Make sure the deleted event matches the effect of audit.patch.yaml on the
object, and verify the remaining metadata in watch.deleted.yaml still aligns
with the finalizer-delete flow.
| if _, err := h.kube.CoreV1().ConfigMaps(s.ns).Get(ctx, "cm-dry", metav1.GetOptions{}); err == nil { | ||
| t.Error("dry-run create persisted an object; want none") | ||
| } |
There was a problem hiding this comment.
🎯 Functional Correctness | 🟡 Minor | ⚡ Quick win
Require NotFound here, not just “any error”.
This assertion passes on transport/auth failures too, so the test can claim “not persisted” when the lookup actually failed for an unrelated reason.
Suggested fix
- if _, err := h.kube.CoreV1().ConfigMaps(s.ns).Get(ctx, "cm-dry", metav1.GetOptions{}); err == nil {
- t.Error("dry-run create persisted an object; want none")
- }
+ if _, err := h.kube.CoreV1().ConfigMaps(s.ns).Get(ctx, "cm-dry", metav1.GetOptions{}); err == nil {
+ t.Error("dry-run create persisted an object; want none")
+ } else if !apierrors.IsNotFound(err) {
+ t.Fatalf("dry-run create lookup failed with %v; want NotFound", err)
+ }📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| if _, err := h.kube.CoreV1().ConfigMaps(s.ns).Get(ctx, "cm-dry", metav1.GetOptions{}); err == nil { | |
| t.Error("dry-run create persisted an object; want none") | |
| } | |
| if _, err := h.kube.CoreV1().ConfigMaps(s.ns).Get(ctx, "cm-dry", metav1.GetOptions{}); err == nil { | |
| t.Error("dry-run create persisted an object; want none") | |
| } else if !apierrors.IsNotFound(err) { | |
| t.Fatalf("dry-run create lookup failed with %v; want NotFound", err) | |
| } |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@test/mutationlab/e2e/configmap_scenarios_test.go` around lines 118 - 120, The
dry-run verification in configmap_scenarios_test should assert a NotFound error
from the ConfigMaps().Get call instead of treating any error as success. Update
the check around h.kube.CoreV1().ConfigMaps(s.ns).Get for "cm-dry" to explicitly
validate that the failure is due to the object not existing, so transport or
auth errors do not incorrectly pass the test.
Add the watch-transport rows via a targeted /watch-probe lab endpoint so 410-Expired ERROR (row 16) and BOOKMARK (row 17) events can be scenario-attributed, with a watch_probe recorder and watch_transport_test driver. Commit the watch-resync and watch-bookmark corpus. Update the design doc + README to mark rows 8, 10, 13, 16, 17 captured (fifteen of seventeen); only server-side apply and no-op apply (rows 3, 4) remain. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
There was a problem hiding this comment.
🧹 Nitpick comments (3)
internal/mutationlab/labserver/api.go (1)
126-129: 🎯 Functional Correctness | 🔵 Trivial | ⚡ Quick winProbe validation errors get a misleading 504.
Probereturns plain errors for client-side problems (emptyscenario, unsupportedmode, missing resource), but this path maps every probe error to504 Gateway Timeout. The handler validates the resource up front yet never checksscenario/mode, so a malformed request body surfaces as a timeout rather than a400. Consider distinguishing a real timeout (ctx.Err()/context.DeadlineExceeded) from validation/transport failures.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@internal/mutationlab/labserver/api.go` around lines 126 - 129, The probe error handling in the API handler is mapping all failures from Probe to 504 Gateway Timeout, which makes validation and transport errors look like timeouts. Update the handler around the Probe call to distinguish real deadline/timeout cases (for example via ctx.Err() or context.DeadlineExceeded) from client-side validation failures such as empty scenario, unsupported mode, or missing resource, and return a 400 for the latter. Use the existing Probe and handler logic in api.go to keep the response codes aligned with the actual failure source.internal/watch/watch_compare.go (1)
141-153: 🚀 Performance & Scalability | 🔵 Trivial | 💤 Low valueDivergence gauge can go stale for types that stop being serviceable.
recordWatchAuditDivergencere-records zeros each pass, which correctly clears a healed divergence. But the loop only records gauges for types still returned byInventory()withServiceable() == true. When a type is released/unclaimed (or removed from inventory) while diverging, its last non-zero{gvr, reason}gauge value is never reset to 0 and lingers, falsely signalling ongoing divergence for a type no longer being compared.For an experimental Phase-1 meter this is low impact, but consider emitting a final zero-out when a previously-recorded GVR drops out of the serviceable set.
Also applies to: 172-194
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@internal/watch/watch_compare.go` around lines 141 - 153, The divergence gauge handling in compareWatchAuditAllTypes only updates currently serviceable Inventory() entries, so stale non-zero values can remain when a GVR drops out of the serviceable set. Update the watch audit flow to track previously recorded GVRs and, after iterating the current serviceable types, emit a final zero-value record for any GVR that was seen before but is no longer returned by Inventory() or Serviceable(). Apply the same cleanup behavior anywhere the watch-audit comparison loop is duplicated so recordWatchAuditDivergence no longer leaves lingering divergence metrics.internal/queue/redis_watch_splice.go (1)
73-79: 🚀 Performance & Scalability | 🔵 TrivialEnsure
MaxLenconstraint on:watch:streamto prevent unbounded read costs.
XRangeinSpliceWatchTypereads the entire stream (fromstart="-"toend="+") on every pass. WhilefoldWatchEntrydiscards old entries, the network and memory cost scales with the total event history if the stream is not trimmed.The write path (
AppendWatchEvent->xaddID) appliesMaxLenonly ifq.maxLen > 0. If the queue instance is initialized without a length limit or if the watch stream is configured independently, it will grow indefinitely.Consider either:
- Enforcing a
MaxLenorXTrimstrategy on theRedisByTypeStreamQueueinitialization for watch streams.- Alternatively, reading only from the checkpoint position in the stream key's native ID space (though this is complex with auto-IDs, making the
MaxLenapproach preferred).Relevant Code
// internal/queue/redis_watch_splice.go:73 msgs, err := s.client.XRange(ctx, base+byTypeWatchStreamSuffix, "-", "+").Result() ... // internal/queue/redis_bytype_queue.go:644-647 args := &redis.XAddArgs{Stream: stream, ID: id, Values: values} if q.maxLen > 0 { args.MaxLen = q.maxLen args.Approx = true }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@internal/queue/redis_watch_splice.go` around lines 73 - 79, The watch-stream read path in SpliceWatchType currently scans the entire :watch:stream with XRange, so it can become unbounded unless the stream is trimmed. Update the RedisByTypeStreamQueue/watch-stream setup to enforce a MaxLen or XTrim policy for the watch stream, and ensure AppendWatchEvent/xaddID applies that limit consistently even when the queue is otherwise unbounded. Keep the fix localized around RedisByTypeStreamQueue, AppendWatchEvent, and the XRange-based splice in redis_watch_splice.go so the stream history stays bounded.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@internal/mutationlab/labserver/api.go`:
- Around line 126-129: The probe error handling in the API handler is mapping
all failures from Probe to 504 Gateway Timeout, which makes validation and
transport errors look like timeouts. Update the handler around the Probe call to
distinguish real deadline/timeout cases (for example via ctx.Err() or
context.DeadlineExceeded) from client-side validation failures such as empty
scenario, unsupported mode, or missing resource, and return a 400 for the
latter. Use the existing Probe and handler logic in api.go to keep the response
codes aligned with the actual failure source.
In `@internal/queue/redis_watch_splice.go`:
- Around line 73-79: The watch-stream read path in SpliceWatchType currently
scans the entire :watch:stream with XRange, so it can become unbounded unless
the stream is trimmed. Update the RedisByTypeStreamQueue/watch-stream setup to
enforce a MaxLen or XTrim policy for the watch stream, and ensure
AppendWatchEvent/xaddID applies that limit consistently even when the queue is
otherwise unbounded. Keep the fix localized around RedisByTypeStreamQueue,
AppendWatchEvent, and the XRange-based splice in redis_watch_splice.go so the
stream history stays bounded.
In `@internal/watch/watch_compare.go`:
- Around line 141-153: The divergence gauge handling in
compareWatchAuditAllTypes only updates currently serviceable Inventory()
entries, so stale non-zero values can remain when a GVR drops out of the
serviceable set. Update the watch audit flow to track previously recorded GVRs
and, after iterating the current serviceable types, emit a final zero-value
record for any GVR that was seen before but is no longer returned by Inventory()
or Serviceable(). Apply the same cleanup behavior anywhere the watch-audit
comparison loop is duplicated so recordWatchAuditDivergence no longer leaves
lingering divergence metrics.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro Plus
Run ID: 878bbb6f-c46c-40ee-b1a5-b18c3fdbe802
📒 Files selected for processing (27)
cmd/main.gocmd/mutation-capture-lab/main.godocs/design/mutation-capture-lab-design.mddocs/design/watch-only-ingestion-architecture.mdinternal/mutationlab/labserver/api.gointernal/mutationlab/labserver/api_test.gointernal/mutationlab/normalize/normalize.gointernal/mutationlab/normalize/normalize_test.gointernal/mutationlab/recorder/recorder_test.gointernal/mutationlab/recorder/watch_probe.gointernal/queue/redis_watch_splice.gointernal/queue/redis_watch_splice_test.gointernal/telemetry/exporter.gointernal/watch/manager.gointernal/watch/watch_compare.gointernal/watch/watch_compare_test.gotest/mutationlab/README.mdtest/mutationlab/corpus/configmap/no-op-apply/admission.update.yamltest/mutationlab/corpus/configmap/no-op-apply/audit.patch.yamltest/mutationlab/corpus/configmap/server-side-apply/admission.update.yamltest/mutationlab/corpus/configmap/server-side-apply/audit.patch.yamltest/mutationlab/corpus/configmap/server-side-apply/watch.modified.yamltest/mutationlab/corpus/configmap/watch-bookmark/watch.bookmark.yamltest/mutationlab/corpus/configmap/watch-resync/watch.error.yamltest/mutationlab/e2e/configmap_scenarios_test.gotest/mutationlab/e2e/harness_test.gotest/mutationlab/e2e/watch_transport_test.go
✅ Files skipped from review due to trivial changes (7)
- test/mutationlab/corpus/configmap/watch-resync/watch.error.yaml
- test/mutationlab/corpus/configmap/watch-bookmark/watch.bookmark.yaml
- test/mutationlab/corpus/configmap/server-side-apply/admission.update.yaml
- test/mutationlab/corpus/configmap/no-op-apply/audit.patch.yaml
- test/mutationlab/corpus/configmap/server-side-apply/audit.patch.yaml
- test/mutationlab/README.md
- docs/design/watch-only-ingestion-architecture.md
🚧 Files skipped from review as they are similar to previous changes (6)
- internal/mutationlab/recorder/recorder_test.go
- cmd/mutation-capture-lab/main.go
- test/mutationlab/e2e/harness_test.go
- internal/mutationlab/normalize/normalize_test.go
- cmd/main.go
- internal/mutationlab/normalize/normalize.go
| func (m *Manager) ForgetGitTargetDeclaration(gitDest types.ResourceReference) { | ||
| m.clearTargetTypeWatermarks(gitDest) | ||
| m.forgetGitTargetWatches(gitDest) | ||
| m.forgetGitTargetUID(gitDest) |
There was a problem hiding this comment.
WARNING: forgetGitTargetUID can race with DeclareForGitTarget for recreated GitTargets
When a GitTarget is deleted and recreated with the same namespace/name, the controller calls ForgetGitTargetDeclaration with a UID-less gitDest (gittarget_controller.go:828). If DeclareForGitTarget for the new target has already stored its UID, this unconditional delete in forgetGitTargetUID wipes the new UID from the map. The new target's cursor resume then falls back to a fresh replay instead of resuming from the stored position.
Consider only deleting when the stored UID matches the one being forgotten, or skipping UID cleanup here and relying on DeclareForGitTarget overwrites plus the cursor TTL to evict stale entries.
Reply with @kilocode-bot fix it to have Kilo Code address this issue.
…nt in the writer The acceptance gate that classifies unsafe/unsupported folder content existed but was wired only into the manifest-analyzer CLI — the running operator built its store with an empty allowlist and never called Accept, so a folder with a hard-Kustomize kustomization.yaml (patches/generators/components/...), duplicate identities, impure managed files, or standalone non-KRM YAML was detected but written into anyway. This enforces the gate in the live writer and the resync (first-materialization) apply, for the cases we already know are a problem from structure alone: - manifestanalyzer: add IssueUnsupportedKustomize + RetainedDocument.Unsupported (surfaced from the existing hasUnsupportedKustomizeFeature predicate), an AcceptStructureOnly entrypoint that runs the structural refusals but never the discovery-derived mapping refusals (which can blink on a wobble), a typed AcceptanceRefusedError, and a WriterAllowlist that retains the operator's own .sops.yaml + kustomize build directives so they are never mis-refused. - git writer: run the structure-only gate over the built store in both flushEventsToWorktree (live) and applyResyncToWorktree (resync); a refusal aborts the commit before any file is touched and returns the typed error. Surfacing the refusal on GitTarget status (a Blocked stream) follows in the next commit. Design: docs/design/unsupported-folder-refusal-plan.md. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
When the first-materialization resync is refused by the acceptance gate (an unsupported kustomization, duplicate identity, impure or non-KRM file), the watch layer now surfaces it on GitTarget status instead of only logging: - thread the (GVR, namespace) key through enqueueReplayResync to drainScopedResync so a refusal can mark the right type's stream; - in the drain, recover the typed *AcceptanceRefusedError via errors.As and mark the stream Blocked with reason UnsupportedContent and a message naming the file; a refusal is NOT counted as a generic background resync failure (nothing was written; it is user-actionable, not a transient fault); - add the StreamReasonUnsupportedContent reason. This flows through the existing data-plane surface with no schema change: status.streams.blocked rises, the StreamsReady condition goes False with reason UnsupportedContent, and phase becomes Degraded — a clean fit for the current two-axis GitTarget status. The folder is left untouched until a human cleans it. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Adds test/e2e/unsupported_folder_e2e_test.go: seeds a GitTarget path with a hard-Kustomize kustomization.yaml (a patches block) plus the ConfigMap it references, creates the GitTarget + a ConfigMap WatchRule, and asserts the control plane is configured (Ready=True) while the data plane is blocked (StreamsReady=False, reason UnsupportedContent) and the operator commits nothing on top of the seed. - architecture.md: the Mark-and-Sweep Resync section now documents the structure-only acceptance gate and the Blocked-stream refusal, replacing the stale "unsafe YAML is left alone" claim. - e2e-coverage-gaps plan: flip Test D from "blocked on a product decision" to IMPLEMENTED. - lint: add IssueUnsupportedKustomize to the exhaustive analyzer_test map; use require.ErrorAs and wrap a long line in the git acceptance-gate test. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
… go True createGitTarget always enables SOPS encryption referencing the shared sops-age-key secret, so without it the GitTarget's EncryptionConfigured gate fails and Ready never becomes True — which made Test D time out on the Ready check before reaching the refusal assertion. Apply the key in BeforeAll like the other manager specs. Verified end to end against a redeployed operator: the hard-Kustomize folder is refused with StreamsReady=False/UnsupportedContent and no commit is produced. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
| Expect(condition.Reason).To(Equal(ReasonProgressing)) | ||
| Expect(streamsRunning.Status).To(Equal(metav1.ConditionFalse)) | ||
| Expect(streamsRunning.Reason).To(Equal(watch.StreamReasonNoResolvedTypes)) | ||
| Expect(gitTargetReady.Status).To(Equal(metav1.ConditionFalse)) |
There was a problem hiding this comment.
WARNING: Test asserts GitTargetReady=False but controller produces Unknown for unevaluated GitTarget
The gitTargetReadyCondition function returns Unknown when the GitTarget has no status conditions (gittarget_dependency_status.go:57), which correctly signals "not yet evaluated." This test creates a GitTarget without any prior reconcile, so the WatchRule’s setGitTargetReadyCondition assigns Unknown — not False.
Change the expectation to metav1.ConditionUnknown so the test matches the actual status of a freshly created GitTarget that has not yet published a Ready condition.
Reply with @kilocode-bot fix it to have Kilo Code address this issue.
| BeforeAll(func() { | ||
| By("creating the foreign-content test namespace") | ||
| testNs = testNamespaceFor("manager-foreign-content") | ||
| _, _ = kubectlRun("create", "namespace", testNs) |
There was a problem hiding this comment.
WARNING: Error discarded without explanatory comment
kubectlRun("create", "namespace", testNs) silently ignores the returned error. Other e2e tests in this codebase annotate such calls with // idempotent; ignore AlreadyExists when the error is intentionally discarded. Without that annotation a genuine failure here (invalid name, RBAC denial, API server unavailable) will propagate into later steps and produce confusing assertion failures instead of a clear namespace-creation error.
Reply with @kilocode-bot fix it to have Kilo Code address this issue.
GitOps Reverser mirrors changes from the Kubernetes API server into clean manifests in Git. Until now, the ingestion path was audit-first: which requires configuring kube-apiserver audit delivery to GitOps Reverser. That works well on clusters where the control plane is configurable, which is not always possible (AKS, GKE etc.).
Do note that Redis (or Valkey) is still a required dependency, since we are also going to introduce HA support on short term.
This PR investigates and implements a watch-first ingestion model. Kubernetes watches now provide the source of object state, while audit events are used only for author attribution when they are available. That simplifies the code path and makes installation easier: clusters without audit webhook support can still run in committer-only mode, while clusters with audit delivery get per-user commit attribution.
Open points:
This also improves several related areas:
Fixes: #168