Watch-first ingestion with (optional!) audit-based attribution by sunib · Pull Request #174 · ConfigButler/gitops-reverser

sunib · 2026-06-24T14:20:12Z

GitOps Reverser mirrors changes from the Kubernetes API server into clean manifests in Git. Until now, the ingestion path was audit-first, which required configuring kube-apiserver audit delivery to GitOps Reverser. That works well on clusters with a configurable control plane, but that is not always possible (for example on AKS or GKE).

Do note that Redis (or Valkey) is still a required dependency, since we are also going to introduce HA support in the short term.

This PR investigates and implements a watch-first ingestion model. Kubernetes watches now provide the source of object state, while audit events are used only for author attribution when they are available. That simplifies the code path and makes installation easier: clusters without audit webhook support can still run in committer-only mode, while clusters with audit delivery get per-user commit attribution.
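This is a minimal sketch of the split described above. The types `WatchEvent`, `AttributionKey` and `Commit` and the function `buildCommit` are hypothetical and do not come from the PR. The watch event alone decides what changes in Git; an optional attribution fact only chooses the author. Joining on object UID plus resourceVersion is an assumption made for illustration.

```go
package main

// WatchEvent is a hypothetical, simplified watch notification. In the
// watch-first model the object state always comes from events like this one.
type WatchEvent struct {
	Type            string // "ADDED", "MODIFIED" or "DELETED"
	UID             string
	ResourceVersion string
	Path            string // target file in the Git repository
}

// AttributionKey is an assumed join key between a watch event and an
// audit-derived attribution fact (hypothetical, not the PR's Redis schema).
type AttributionKey struct {
	UID             string
	ResourceVersion string
}

// Commit is a hypothetical description of one Git change.
type Commit struct {
	Path   string
	Remove bool
	Author string
}

// buildCommit derives the Git change from the watch event alone. Audit data
// only selects the author; without a matching fact the committer is used.
func buildCommit(ev WatchEvent, facts map[AttributionKey]string, committer string) Commit {
	author := committer
	if user, ok := facts[AttributionKey{UID: ev.UID, ResourceVersion: ev.ResourceVersion}]; ok {
		author = user
	}
	return Commit{Path: ev.Path, Remove: ev.Type == "DELETED", Author: author}
}
```

Without audit delivery, the facts map stays empty and every commit falls back to the committer identity. That corresponds to the committer-only mode.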

Open points:

- Should there be an explicit opt-in or opt-out for audit events? Or do we just accept and apply them (that's what I do now)?
- Fix metrics
- Add missing e2e tests

This also improves several related areas:

- deletecollection now becomes normal per-resource delete events through watch delivery, so we no longer need to infer affected objects from audit data.
- Object bodies are no longer stored in Redis. Redis now holds attribution facts and watch cursors, reducing Secret exposure compared with the old audit body queue.
- The data model is clearer: each GitTarget owns the watches it needs, instead of sharing/reusing a more complex global materialization model.
- The new mutation capture lab records and compares admission, audit, and watch behavior. This gives us independent test tooling for new Kubernetes versions and future ingestion decisions.
- Removes the need for apiservice-audit-proxy in aggregated API situations: there appears to be no shallow-body problem in watch events (if only I had known that earlier!).
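The first point can be illustrated with a small sketch of folding per-object watch events into repository state. The names `Event` and `apply` and the path layout are hypothetical. A `deletecollection` of three ConfigMaps arrives as three `DELETED` events, so each file is removed on its own.

```go
package main

// Event is a hypothetical, simplified watch event for a single object.
type Event struct {
	Type string // "ADDED", "MODIFIED" or "DELETED"
	Path string // repository path derived from the object's identity
	YAML string // normalized manifest; unused for DELETED
}

// apply folds watch events into an in-memory view of the repository files.
// A deletecollection request reaches the watcher as one DELETED event per
// removed object, so each file is removed individually and nothing has to be
// reconstructed from the audit event.
func apply(files map[string]string, events []Event) {
	for _, ev := range events {
		switch ev.Type {
		case "ADDED", "MODIFIED":
			files[ev.Path] = ev.YAML
		case "DELETED":
			delete(files, ev.Path)
		}
	}
}
```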

Fixes: #168

…erate testing project to exactly get how the kube-api versions are responding to edge cases

… structure way (so that we can also use this to evaluate new (or older) Kubernetes versions

coderabbitai · 2026-06-24T14:20:39Z

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

@coderabbitai resume to resume automatic reviews.
@coderabbitai review to trigger a single review.

📝 Walkthrough

Adds a validating admission webhook, a mutation-capture lab binary and recorders, watch-state streaming, normalization and corpus generation, and e2e scenarios with committed fixtures and deployment scripts.

Changes

Mutation capture lab and admission webhook

Layer / File(s)	Summary
Admission webhook wiring `cmd/main.go`, `cmd/main_audit_server_test.go`, `internal/webhook/`, `config/`	`cmd/main.go` adds admission-webhook flags, validation, and webhook-server wiring; manifests expose the HTTPS listener, certificate, and validating webhook configuration.
Lab record model, store, and API `internal/mutationlab/record.go`, `internal/mutationlab/store/`, `internal/mutationlab/labserver/`, `internal/mutationlab/recorder/*`	The lab record shape, in-memory store, records API, health endpoints, GVR parsing utilities, and recorder handlers are added with unit tests.
Normalization and corpus `go.mod`, `internal/mutationlab/normalize/`, `internal/mutationlab/corpus/golden.`	Normalization rewrites volatile Kubernetes fields into placeholders, and the golden corpus helpers build, compare, and write deterministic YAML snapshots.
Watch-state stream plumbing `cmd/main.go`, `internal/queue/`, `internal/watch/`, `internal/telemetry/exporter.go`	The parallel watch-state stream is wired through the main flag set, Redis stream storage, watch-manager lifecycle, materialization transitions, comparison metrics, and restart logic with tests.
Lab runtime and packaging `cmd/mutation-capture-lab/main.go`, `test/mutationlab/Dockerfile`, `test/mutationlab/swap-image.sh`, `hack/e2e/lab-port-forward.sh`, `test/e2e/Taskfile.yml`, `test/mutationlab/README.md`, `test/mutationlab/e2e/doc.go`, `test/mutationlab/corpus/CLUSTER.md`, `docs/design/*`, `test/e2e/gittarget_overlap_e2e_test.go`	The lab binary, container image, cluster swap script, port-forward helper, Taskfile targets, and design/readme docs are added for building, deploying, and running the lab.
ConfigMap scenarios and corpus `test/mutationlab/e2e/harness_test.go`, `test/mutationlab/e2e/configmap_scenarios_test.go`, `test/mutationlab/corpus/configmap/*`	The shared harness and ConfigMap scenarios cover create, update, dry-run, reject, deletecollection, conflict, finalizer-delete, and owner-ref-cascade flows with matching fixtures.
Workload, CRD, and aggregated API scenarios `test/mutationlab/e2e/aggregated_api_test.go`, `test/mutationlab/e2e/crd_conversion_test.go`, `test/mutationlab/e2e/workload_scenarios_test.go`, `test/mutationlab/corpus/deployment/`, `test/mutationlab/corpus/pod/`, `test/mutationlab/corpus/widget/`, `test/mutationlab/corpus/flunder/`	Aggregated API, CRD conversion, and workload scenarios add live tests plus the corresponding deployment, pod, widget, and flunder fixtures.

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120 minutes

Poem

I hop through webhooks, soft and bright,
and sniff the corpus in the light.
With watch-state streams and carrot cheer,
my whiskers twinkle, year by year.
Thump! The lab is tame and near.

🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (2 warnings)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 45.02% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.
Description check	⚠️ Warning	The PR description explains the goal but does not follow the required template and omits the Type of Change, Testing, and Checklist sections.	Add the template sections with selected change type(s), testing performed, checklist items, and any screenshots/notes if relevant.

✅ Passed checks (3 passed)

Check name	Status	Explanation
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Title check	✅ Passed	The title accurately summarizes the main shift toward watch-first ingestion with optional audit attribution.

gemini-code-assist

Code Review

This pull request introduces a validating admission webhook to the main controller and adds a new "mutation-capture lab" utility (cmd/mutation-capture-lab) along with its associated packages under internal/mutationlab/ to record and normalize Kubernetes watch, audit, and admission events into a versioned corpus. Feedback on the changes highlights an issue in the lab's shutdown sequence: multiple HTTP servers are shut down sequentially using a single shared timeout context, so a slow server can exhaust the deadline and prevent subsequent servers from shutting down gracefully.

gemini-code-assist · 2026-06-24T14:23:19Z

+	for _, s := range servers {
+		_ = s.srv.Shutdown(shutdownCtx)
+	}


Calling Shutdown sequentially on multiple servers using a single shared timeout context (shutdownCtx) can prevent subsequent servers from shutting down gracefully if a preceding server takes a long time to shut down. If the first server takes the entire 5 seconds to shut down, the context will expire, and the remaining servers will immediately fail to shut down gracefully.

We should shut down the servers concurrently.

	var shutdownWg sync.WaitGroup
	for _, s := range servers {
		shutdownWg.Add(1)
		go func(srv *http.Server) {
			defer shutdownWg.Done()
			_ = srv.Shutdown(shutdownCtx)
		}(s.srv)
	}
	shutdownWg.Wait()

coderabbitai

Actionable comments posted: 11

🧹 Nitpick comments (2)

hack/e2e/lab-port-forward.sh (1)
71-71: 📐 Maintainability & Code Quality | 🔵 Trivial | 💤 Low value

Optional: rename the unused loop counter to _.

Shellcheck SC2034 flags i as unused; the loop body never references it.
♻️ Proposed tweak
-for i in {1..15}; do
+for _ in {1..15}; do
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@hack/e2e/lab-port-forward.sh` at line 71, The loop counter in the
port-forward retry loop is unused, triggering Shellcheck SC2034. Update the
`for` loop in `lab-port-forward.sh` to use `_` instead of `i` since the body
never references the counter, keeping the retry behavior unchanged.
Source: Linters/SAST tools
test/mutationlab/e2e/harness_test.go (1)
232-256: 🩺 Stability & Availability | 🔵 Trivial | ⚡ Quick win

do can hang indefinitely: http.DefaultClient has no timeout.

The 30s deadline only bounds the retry loop between attempts. A single http.DefaultClient.Do has no client-side timeout, so if the port-forward accepts the connection but stalls without responding (a plausible failure for the watchdog-restarted forward described in the comment), the call blocks forever and the loop deadline is never re-evaluated. Use a client with an explicit per-request timeout (or a request context with deadline).
♻️ Proposed fix
-func (h *harness) do(method, path string) ([]byte, int, error) {
-	var lastErr error
-	deadline := time.Now().Add(30 * time.Second)
-	for time.Now().Before(deadline) {
-		req, err := http.NewRequest(method, h.apiURL+path, nil)
-		if err != nil {
-			return nil, 0, err
-		}
-		resp, err := http.DefaultClient.Do(req)
+func (h *harness) do(method, path string) ([]byte, int, error) {
+	var lastErr error
+	client := &http.Client{Timeout: 10 * time.Second}
+	deadline := time.Now().Add(30 * time.Second)
+	for time.Now().Before(deadline) {
+		req, err := http.NewRequest(method, h.apiURL+path, nil)
+		if err != nil {
+			return nil, 0, err
+		}
+		resp, err := client.Do(req)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/mutationlab/e2e/harness_test.go` around lines 232 - 256, The harness do
method can block forever because http.DefaultClient.Do has no per-request
timeout, so the loop deadline is never checked during a stalled request. Update
do to use a client or request context with an explicit timeout, and keep the
existing retry/deadline logic in h.do so each attempt fails fast before the 30s
overall limit is reached.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@cmd/main.go`:
- Around line 708-710: The admission webhook port validation in main’s startup
config check only rejects non-positive values and still allows values above the
valid TCP range. Update the validation around cfg.admissionWebhookPort to reject
ports greater than 65535 as well, and keep the error message clear in the same
check. Also add an invalid-flag test case covering
--admission-webhook-port=65536 to ensure the new range check is enforced.

In `@config/webhook/kustomization.yaml`:
- Around line 4-5: The kustomization.yaml entry currently leaves an extra
trailing blank line, which triggers YAMLlint. Update the kustomization content
so the final listed resource/reference (the validating-webhook.yaml entry) is
the last line in the file, with no blank line after it.

In `@config/webhook/validating-webhook.yaml`:
- Line 52: The YAML file has an extra trailing blank line that triggers the
empty-lines lint error; remove the final empty line so the document ends
cleanly. Update the webhook manifest content in validating-webhook.yaml and
verify there are no stray blank lines after the last YAML entry.
- Around line 38-51: Shorten the validating webhook timeout and clean up the
YAML formatting: in the webhook configuration with the rules block and
timeoutSeconds setting, reduce timeoutSeconds from 10 to a few seconds so all
CREATE/UPDATE/DELETE requests are less likely to stall, and remove the trailing
blank line at the end of the file to satisfy the repo’s YAML lint rules.

In `@docs/design/mutation-capture-lab-design.md`:
- Around line 21-23: The deletecollection admission contract text is
inconsistent with the captured behavior and corpus references. Update the
mutation-capture design doc so the `deletecollection` row and any related
matrix/layout text consistently describe per-object validating admission calls
and per-object corpus outputs, and make sure the references in the
`deletecollection` section match the actual captured files and stack context
used by `admission.delete.collection.yaml` and the per-object
`admission.delete.cm-*` entries.
- Around line 24-27: The document has two conflicting statements about
deletecollection audit bodies, so reconcile them by making one versioned source
of truth in the design doc. Update the relevant sections around the
deletecollection discussion and the later guidance so they agree on whether
`responseObject` contains removed objects for the named cluster version, or
clearly label one as an earlier hypothesis that was superseded. Use the
`deletecollection` audit event and `responseObject` references to locate both
passages and align their wording.
- Around line 639-646: The setup description is inconsistent about how `task
lab-e2e` runs the lab: the isolated image-swap flow in `swap-image.sh` says it
reuses the already-prepared e2e cluster, but the M0/setup wording claims it
brings up its own k3d profile. Update the M0/setup text and any related wording
in the design doc to match the image-swap strategy, using the `task lab-e2e` and
`test/mutationlab/swap-image.sh` references consistently and removing any
mention of a separate cluster bring-up.

In `@internal/mutationlab/record.go`:
- Line 57: The Record.Key field currently uses omitempty on a plain ObjectKey,
which will still serialize as an empty object and mislead callers. Update the
Record struct to either remove omitempty from Key in record.go, or change Key to
*ObjectKey if it is meant to be optional; use the Record type and its Key field
as the place to make this adjustment.

In `@test/mutationlab/corpus/configmap/create-succeeds/admission.create.yaml`:
- Around line 53-56: Normalize the cluster-specific
`userInfo.extra["authentication.kubernetes.io/credential-id"]` value in the
admission/audit corpus so it stays stable across runs. Update the relevant
corpus normalization path to replace this fingerprint with a fixed placeholder,
or normalize the entire `userInfo.extra` map if that is the shared pattern, and
ensure the fixture under the admission create-succeeds case follows the same
rule.

In
`@test/mutationlab/corpus/configmap/deletecollection/audit.deletecollection.yaml`:
- Around line 60-63: The audit fixture still hardcodes a concrete
authentication.kubernetes.io/credential-id fingerprint, so update this corpus
entry to use a placeholder value instead of a real cluster-specific SHA256.
Locate the audit payload fixture under the deletecollection corpus and replace
the literal credential-id in the extra section so compare/write logic sees a
normalized, stable value.

In `@test/mutationlab/e2e/harness_test.go`:
- Around line 221-226: clearRecords currently only checks the error from
harness.do and can silently accept non-2xx responses, leaving stale records
behind. Update clearRecords in harness_test.go to validate the HTTP status
returned by h.do for the "/records" DELETE call, and fail the test unless it is
the expected http.StatusNoContent; keep the existing test helper flow and use
the clearRecords and do methods to locate the change.

---

Nitpick comments:
In `@hack/e2e/lab-port-forward.sh`:
- Line 71: The loop counter in the port-forward retry loop is unused, triggering
Shellcheck SC2034. Update the `for` loop in `lab-port-forward.sh` to use `_`
instead of `i` since the body never references the counter, keeping the retry
behavior unchanged.

In `@test/mutationlab/e2e/harness_test.go`:
- Around line 232-256: The harness do method can block forever because
http.DefaultClient.Do has no per-request timeout, so the loop deadline is never
checked during a stalled request. Update do to use a client or request context
with an explicit timeout, and keep the existing retry/deadline logic in h.do so
each attempt fails fast before the 30s overall limit is reached.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: 321027e0-fe96-4101-9dba-8bdfb939a640

📥 Commits

Reviewing files that changed from the base of the PR and between 1eb59fd and ec49609.

📒 Files selected for processing (57)

cmd/main.go
cmd/main_audit_server_test.go
cmd/mutation-capture-lab/main.go
config/certs/certificates.yaml
config/deployment.yaml
config/kustomization.yaml
config/service.yaml
config/webhook/kustomization.yaml
config/webhook/validating-webhook.yaml
docs/design/mutation-capture-lab-design.md
go.mod
hack/e2e/lab-port-forward.sh
internal/mutationlab/corpus/golden.go
internal/mutationlab/corpus/golden_test.go
internal/mutationlab/labserver/api.go
internal/mutationlab/labserver/api_test.go
internal/mutationlab/labserver/resources.go
internal/mutationlab/labserver/resources_test.go
internal/mutationlab/normalize/normalize.go
internal/mutationlab/normalize/normalize_test.go
internal/mutationlab/record.go
internal/mutationlab/recorder/admission.go
internal/mutationlab/recorder/audit.go
internal/mutationlab/recorder/recorder_test.go
internal/mutationlab/recorder/scenario.go
internal/mutationlab/recorder/watch.go
internal/mutationlab/store/store.go
internal/mutationlab/store/store_test.go
internal/webhook/admission_allow_handler.go
internal/webhook/admission_allow_handler_test.go
test/e2e/Taskfile.yml
test/e2e/gittarget_overlap_e2e_test.go
test/mutationlab/Dockerfile
test/mutationlab/README.md
test/mutationlab/corpus/CLUSTER.md
test/mutationlab/corpus/configmap/create-succeeds/admission.create.yaml
test/mutationlab/corpus/configmap/create-succeeds/audit.create.yaml
test/mutationlab/corpus/configmap/create-succeeds/watch.added.yaml
test/mutationlab/corpus/configmap/deletecollection/admission.delete.cm-a.yaml
test/mutationlab/corpus/configmap/deletecollection/admission.delete.cm-b.yaml
test/mutationlab/corpus/configmap/deletecollection/admission.delete.cm-c.yaml
test/mutationlab/corpus/configmap/deletecollection/audit.deletecollection.yaml
test/mutationlab/corpus/configmap/deletecollection/watch.deleted.cm-a.yaml
test/mutationlab/corpus/configmap/deletecollection/watch.deleted.cm-b.yaml
test/mutationlab/corpus/configmap/deletecollection/watch.deleted.cm-c.yaml
test/mutationlab/corpus/configmap/dry-run-create/admission.create.yaml
test/mutationlab/corpus/configmap/dry-run-create/audit.create.yaml
test/mutationlab/corpus/configmap/record-and-reject/admission.create.yaml
test/mutationlab/corpus/configmap/record-and-reject/audit.create.yaml
test/mutationlab/corpus/configmap/update/admission.update.yaml
test/mutationlab/corpus/configmap/update/audit.update.yaml
test/mutationlab/corpus/configmap/update/watch.modified.yaml
test/mutationlab/e2e/create_succeeds_test.go
test/mutationlab/e2e/doc.go
test/mutationlab/e2e/harness_test.go
test/mutationlab/e2e/m1_scenarios_test.go
test/mutationlab/swap-image.sh

coderabbitai · 2026-06-24T14:32:30Z

+	Source     Source          `json:"source"`
+	Scenario   string          `json:"scenario,omitempty"`
+	ObservedAt time.Time       `json:"observedAt"`
+	Key        ObjectKey       `json:"key,omitempty"`


📐 Maintainability & Code Quality | 🟡 Minor

🧩 Analysis chain

🌐 Web query:

Does Go encoding/json omitempty work on non-pointer struct fields?

💡 Result:

Yes—omitempty can omit non-pointer struct fields, but only when the field’s value is considered “empty” by encoding/json’s definition of emptiness. In encoding/json, the “omitempty” option omits a struct field from the output if the field value is an empty Go value, defined as false, 0, a nil pointer, a nil interface value, and any array/slice/map/string of length zero [1]. A struct value is not inherently an “empty value” under that definition, so a non-pointer struct field will usually still be emitted. Concrete implications for non-pointer struct fields: 1) If the field is a non-pointer struct type (e.g., a nested struct), encoding/json does not treat it as empty just because its subfields are zero; omitempty generally does not omit it. This is why time.Time (a non-pointer struct) does not get omitted with omitempty in many cases—zero time.Time is still a struct value, and the encoder can’t decide it’s empty based solely on subfields [2][3]. 2) However, if the struct field is actually of a type that can be treated as empty by encoding/json’s rules (for example, via nil being possible only for pointers/interfaces, not plain struct values), then omitempty can omit it—but for plain non-pointer struct values, that usually doesn’t apply. Recommended pattern: use pointers for optional nested structs/values, so the field can be nil and thus “empty” to encoding/json’s omitempty logic [4][5]. Alternative (Go 1.24+): use the “omitzero” tag, which is designed to omit struct fields when their value is the zero value (and can use an IsZero method), allowing non-pointer types like time.Time to be omitted when they’re zero [1][6].

Citations:

1: https://pkg.go.dev/encoding/json

2: https://cyrillschumacher.com/2016/02/01/go-encoding/jsonxml-omitempty-with-non-pointer-structs/

3: https://groups.google.com/g/golang-nuts/c/CaF9dlrC24Y

4: https://www.sohamkamani.com/golang/omitempty/

5: https://golangbot.com/omitzero-go/

6: encoding/json: add omitzero option golang/go#45669

🏁 Script executed:

#!/bin/bash
set -euo pipefail
# Inspect the relevant files and definitions.
git ls-files internal/mutationlab/record.go internal/mutationlab | sed -n '1,120p'
echo '--- record.go ---'
cat -n internal/mutationlab/record.go
echo '--- ObjectKey search ---'
rg -n "type ObjectKey|ObjectKey struct|ObjectKey" internal -g '!**/*_test.go'

Repository: ConfigButler/gitops-reverser

Length of output: 7786

🏁 Script executed:

#!/bin/bash
set -euo pipefail
# Inspect the relevant file and the ObjectKey type before rewriting the review comment.
echo '--- internal/mutationlab/record.go ---'
cat -n internal/mutationlab/record.go
echo '--- ObjectKey definition(s) ---'
rg -n "type ObjectKey|ObjectKey struct|type .*ObjectKey" internal -g '!**/*_test.go'
echo '--- surrounding matches ---'
rg -n -A3 -B3 "ObjectKey" internal/mutationlab internal -g '!**/*_test.go'

Repository: ConfigButler/gitops-reverser

Length of output: 21090

Remove omitempty from Record.Key

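encoding/json does not treat a plain struct field as empty, so Key is still emitted when all of its fields are zero. It appears as "key": {} if every ObjectKey field is itself omitempty; otherwise the zero values are spelled out. If the field should be optional, make it *ObjectKey (or, on Go 1.24+, use omitzero); otherwise drop omitempty to avoid misleading behavior.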
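The difference can be shown with a small, self-contained demonstration. It uses a simplified hypothetical `ObjectKey` whose own fields are `omitempty`; the real type's fields may differ. The value field still encodes as `"key":{}`, while a nil pointer is omitted.

```go
package main

import "encoding/json"

// ObjectKey is a simplified, hypothetical stand-in for the lab's key type.
type ObjectKey struct {
	Namespace string `json:"namespace,omitempty"`
	Name      string `json:"name,omitempty"`
}

// withValue mirrors the current field: omitempty has no effect on a struct value.
type withValue struct {
	Key ObjectKey `json:"key,omitempty"`
}

// withPointer makes the key optional: a nil pointer counts as empty.
type withPointer struct {
	Key *ObjectKey `json:"key,omitempty"`
}

// encode marshals v and panics on error, which keeps the demo short.
func encode(v any) string {
	b, err := json.Marshal(v)
	if err != nil {
		panic(err)
	}
	return string(b)
}
```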

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@internal/mutationlab/record.go` at line 57, The Record.Key field currently uses omitempty on a plain ObjectKey, which will still serialize as an empty object and mislead callers. Update the Record struct to either remove omitempty from Key in record.go, or change Key to *ObjectKey if it is meant to be optional; use the Record type and its Key field as the place to make this adjustment.

coderabbitai

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)

internal/mutationlab/normalize/normalize.go (2)
398-420: 🎯 Functional Correctness | 🟠 Major | 🏗️ Heavy lift

Don't rewrite namespaces with global substring replacement.

replaceNamespaces rewrites every matching substring in the URI/key, not just the namespace field/segment. A path like /api/v1/namespaces/lab/configmaps/lab would normalize both lab occurrences, corrupting the object name as well as the namespace. This should replace only the namespace segment / JSON value, not every textual match.
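Here is a sketch of segment-aware replacement for URI paths. It assumes a hypothetical `replaceNamespaceSegment` helper and a namespace-to-placeholder map; the real `replaceNamespaces`/`rewriteKey` code and its placeholder format are not reproduced. Only the segment after `namespaces` is rewritten, so an object that shares the namespace's name keeps its name.

```go
package main

import "strings"

// replaceNamespaceSegment rewrites only the path segment that follows
// "namespaces", using placeholders from ns (namespace -> placeholder).
// Other segments, such as an object that happens to share the namespace's
// name, are left untouched.
func replaceNamespaceSegment(path string, ns map[string]string) string {
	parts := strings.Split(path, "/")
	for i := 0; i+1 < len(parts); i++ {
		if parts[i] != "namespaces" {
			continue
		}
		if placeholder, ok := ns[parts[i+1]]; ok {
			parts[i+1] = placeholder
		}
		i++ // skip the namespace segment itself
	}
	return strings.Join(parts, "/")
}
```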
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/mutationlab/normalize/normalize.go` around lines 398 - 420, The
namespace normalization in replaceNamespaces is too broad because it does a
global substring replace, which can rewrite unrelated occurrences like object
names inside paths or keys. Update replaceNamespaces and its call sites such as
rewriteKey so namespaces are replaced only in the actual namespace segment or
JSON value being normalized, not every matching substring; keep the placeholder
logic in indices/nsByLen/ns and preserve the IP rewriting behavior in
rewriteKey.
201-220: 🩺 Stability & Availability | 🟠 Major | ⚡ Quick win

Normalize authentication.kubernetes.io/credential-id before writing corpus.

The array path here only rewrites sourceIPs, so user.extra["authentication.kubernetes.io/credential-id"] survives verbatim. The new corpus already commits raw fingerprint values in test/mutationlab/corpus/deployment/scale-patch/audit.patch.yaml, Lines 41-42, and test/mutationlab/corpus/pod/graceful-delete/admission.delete.yaml, Lines 210-212. A rotated admin cert or fresh cluster bootstrap will change those values and fail corpus comparison even when behavior is unchanged.

Also applies to: 344-357
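Here is a minimal sketch of the suggested normalization. It assumes a hypothetical `normalizeUserExtra` helper that works on the decoded `user.extra` map directly rather than inside `collector.collectScalar`, and an assumed `<credential-id>` placeholder.

```go
package main

// credentialIDKey is the user.extra key that carries a per-credential fingerprint.
const credentialIDKey = "authentication.kubernetes.io/credential-id"

// normalizeUserExtra returns a copy of a user.extra map in which every
// credential-id value is replaced by a fixed placeholder, so a rotated client
// certificate or a freshly bootstrapped cluster does not change the corpus.
// The input map is not modified.
func normalizeUserExtra(extra map[string][]string, placeholder string) map[string][]string {
	out := make(map[string][]string, len(extra))
	for k, vs := range extra {
		if k == credentialIDKey {
			norm := make([]string, len(vs))
			for i := range norm {
				norm[i] = placeholder
			}
			out[k] = norm
			continue
		}
		out[k] = append([]string(nil), vs...) // copy so callers can't alias the input
	}
	return out
}
```

If a scenario ever needs to tell two different credentials apart, a deterministic per-value token (one placeholder per distinct fingerprint) would keep them distinguishable while staying stable across runs.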
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/mutationlab/normalize/normalize.go` around lines 201 - 220, The
corpus normalization in collector.collectScalar only special-cases sourceIPs, so
authentication.kubernetes.io/credential-id is still written verbatim and causes
unstable corpus diffs. Add normalization for that key in collectScalar, ideally
by rewriting its values to a deterministic token before they reach c.orderedFor,
and keep the handling scoped to the existing collector logic so user.extra
values are normalized consistently across corpus generation.

🧹 Nitpick comments (1)

test/mutationlab/e2e/m2_scenarios_test.go (1)
90-90: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win

Rename these tests to the repo's TestFunctionName_Scenario form.

TestStatusSubresource, TestScaleSubresource, and TestGracefulDelete don't follow the test function naming convention the repo documents. TestStatusSubresource_WatchOnly, etc., would align with the rest of the suite. As per coding guidelines, "Follow naming convention 'TestFunctionName_Scenario(t *testing.T)' for Go tests".

Also applies to: 127-127, 190-190
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/mutationlab/e2e/m2_scenarios_test.go` at line 90, Rename the Go test
functions to match the repo’s TestFunctionName_Scenario convention so they align
with the rest of the suite. Update TestStatusSubresource, TestScaleSubresource,
and TestGracefulDelete in m2_scenarios_test.go to scenario-style names such as
TestStatusSubresource_WatchOnly, and keep the rest of each test body unchanged.
Source: Coding guidelines

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@docs/design/mutation-capture-lab-design.md`:
- Around line 280-281: The matrix row about `matchPolicy: Equivalent` overstates
that the webhook always sees the submitted version; update the wording in the
design note to reflect that the apiserver may send a converted object while
`requestKind` and `requestResource` preserve the original version, or change the
policy mention to `Exact` if the intent is to require the raw submitted version.
Use the existing `matchPolicy`, `requestKind`, `requestResource`, and
`apiVersions: ['*']` references in the same section to keep the statement
aligned with how the recorder observes the admission request.

In `@test/mutationlab/e2e/m2_scenarios_test.go`:
- Around line 171-172: The gracefulPod scenario uses the mutable busybox:1.36
tag, which can change the recorded imageID and drift the golden corpus. Update
the pod spec in the gracefulPod setup to pin the image by digest, or adjust the
normalization/corpus generation around the pod status imageID so it is scrubbed
consistently; use the gracefulPod test case and the image field in
m2_scenarios_test.go as the place to fix it.

---

Outside diff comments:
In `@internal/mutationlab/normalize/normalize.go`:
- Around line 398-420: The namespace normalization in replaceNamespaces is too
broad because it does a global substring replace, which can rewrite unrelated
occurrences like object names inside paths or keys. Update replaceNamespaces and
its call sites such as rewriteKey so namespaces are replaced only in the actual
namespace segment or JSON value being normalized, not every matching substring;
keep the placeholder logic in indices/nsByLen/ns and preserve the IP rewriting
behavior in rewriteKey.
- Around line 201-220: The corpus normalization in collector.collectScalar only
special-cases sourceIPs, so authentication.kubernetes.io/credential-id is still
written verbatim and causes unstable corpus diffs. Add normalization for that
key in collectScalar, ideally by rewriting its values to a deterministic token
before they reach c.orderedFor, and keep the handling scoped to the existing
collector logic so user.extra values are normalized consistently across corpus
generation.

---

Nitpick comments:
In `@test/mutationlab/e2e/m2_scenarios_test.go`:
- Line 90: Rename the Go test functions to match the repo’s
TestFunctionName_Scenario convention so they align with the rest of the suite.
Update TestStatusSubresource, TestScaleSubresource, and TestGracefulDelete in
m2_scenarios_test.go to scenario-style names such as
TestStatusSubresource_WatchOnly, and keep the rest of each test body unchanged.

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: 56f06b1b-0d31-4cf8-a9c0-4822cbb804bb

📥 Commits

Reviewing files that changed from the base of the PR and between ec49609 and 876ff60.

📒 Files selected for processing (31)

docs/design/mutation-capture-lab-design.md
internal/mutationlab/normalize/normalize.go
internal/mutationlab/normalize/normalize_test.go
test/mutationlab/corpus/configmap/create-succeeds/admission.create.yaml
test/mutationlab/corpus/configmap/create-succeeds/audit.create.yaml
test/mutationlab/corpus/configmap/create-succeeds/watch.added.yaml
test/mutationlab/corpus/configmap/deletecollection/admission.delete.cm-a.yaml
test/mutationlab/corpus/configmap/deletecollection/admission.delete.cm-b.yaml
test/mutationlab/corpus/configmap/deletecollection/admission.delete.cm-c.yaml
test/mutationlab/corpus/configmap/deletecollection/audit.deletecollection.yaml
test/mutationlab/corpus/configmap/deletecollection/watch.deleted.cm-a.yaml
test/mutationlab/corpus/configmap/deletecollection/watch.deleted.cm-b.yaml
test/mutationlab/corpus/configmap/deletecollection/watch.deleted.cm-c.yaml
test/mutationlab/corpus/configmap/dry-run-create/admission.create.yaml
test/mutationlab/corpus/configmap/dry-run-create/audit.create.yaml
test/mutationlab/corpus/configmap/record-and-reject/admission.create.yaml
test/mutationlab/corpus/configmap/record-and-reject/audit.create.yaml
test/mutationlab/corpus/configmap/update/admission.update.yaml
test/mutationlab/corpus/configmap/update/audit.update.yaml
test/mutationlab/corpus/configmap/update/watch.modified.yaml
test/mutationlab/corpus/deployment/scale-patch/audit.patch.yaml
test/mutationlab/corpus/deployment/scale-patch/watch.modified.1.yaml
test/mutationlab/corpus/deployment/scale-patch/watch.modified.2.yaml
test/mutationlab/corpus/deployment/status-update/watch.modified.1.yaml
test/mutationlab/corpus/deployment/status-update/watch.modified.2.yaml
test/mutationlab/corpus/pod/graceful-delete/admission.delete.yaml
test/mutationlab/corpus/pod/graceful-delete/watch.deleted.yaml
test/mutationlab/corpus/pod/graceful-delete/watch.modified.yaml
test/mutationlab/e2e/harness_test.go
test/mutationlab/e2e/m2_scenarios_test.go
test/mutationlab/swap-image.sh

✅ Files skipped from review due to trivial changes (21)

test/mutationlab/corpus/configmap/update/watch.modified.yaml
test/mutationlab/corpus/configmap/deletecollection/watch.deleted.cm-b.yaml
test/mutationlab/corpus/deployment/scale-patch/watch.modified.1.yaml
test/mutationlab/corpus/pod/graceful-delete/watch.deleted.yaml
test/mutationlab/corpus/configmap/deletecollection/watch.deleted.cm-a.yaml
test/mutationlab/corpus/configmap/record-and-reject/admission.create.yaml
test/mutationlab/corpus/configmap/create-succeeds/watch.added.yaml
test/mutationlab/corpus/configmap/dry-run-create/audit.create.yaml
test/mutationlab/corpus/configmap/update/audit.update.yaml
test/mutationlab/corpus/configmap/deletecollection/admission.delete.cm-b.yaml
test/mutationlab/corpus/pod/graceful-delete/watch.modified.yaml
test/mutationlab/corpus/deployment/status-update/watch.modified.2.yaml
test/mutationlab/corpus/configmap/deletecollection/watch.deleted.cm-c.yaml
test/mutationlab/corpus/configmap/dry-run-create/admission.create.yaml
test/mutationlab/corpus/configmap/record-and-reject/audit.create.yaml
test/mutationlab/corpus/configmap/create-succeeds/admission.create.yaml
test/mutationlab/corpus/configmap/deletecollection/admission.delete.cm-a.yaml
test/mutationlab/corpus/deployment/scale-patch/watch.modified.2.yaml
test/mutationlab/corpus/configmap/deletecollection/audit.deletecollection.yaml
test/mutationlab/corpus/deployment/status-update/watch.modified.1.yaml
test/mutationlab/corpus/configmap/create-succeeds/audit.create.yaml

🚧 Files skipped from review as they are similar to previous changes (4)

test/mutationlab/corpus/configmap/deletecollection/admission.delete.cm-c.yaml
test/mutationlab/corpus/configmap/update/admission.update.yaml
test/mutationlab/swap-image.sh
test/mutationlab/e2e/harness_test.go

coderabbitai · 2026-06-24T20:44:36Z

+				Image:   "busybox:1.36",
+				Command: []string{"sleep", "3600"},


🩺 Stability & Availability | 🟠 Major | ⚡ Quick win

Pin this Pod image by digest or scrub imageID from normalization.

The committed corpus for this scenario preserves status.containerStatuses[].imageID in test/mutationlab/corpus/pod/graceful-delete/admission.delete.yaml, Lines 162-164, but gracefulPod pulls busybox:1.36 by mutable tag. If that tag is republished or resolves to a different digest, the golden corpus will drift even though the control-plane behavior under test did not change.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@test/mutationlab/e2e/m2_scenarios_test.go` around lines 171 - 172, The gracefulPod scenario uses the mutable busybox:1.36 tag, which can change the recorded imageID and drift the golden corpus. Update the pod spec in the gracefulPod setup to pin the image by digest, or adjust the normalization/corpus generation around the pod status imageID so it is scrubbed consistently; use the gracefulPod test case and the image field in m2_scenarios_test.go as the place to fix it.
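For the "pin by digest" option, the scenario could guard against an accidental return to a mutable tag. The check below is a hypothetical helper, and its regular expression is a simplifying assumption rather than the full OCI reference grammar. It does not supply a real digest for busybox; that value would have to come from the registry.

```go
package main

import "regexp"

// digestRef matches an image reference pinned by a sha256 digest, e.g.
// "busybox@sha256:<64 hex chars>". Deliberately simple; not a full parser.
var digestRef = regexp.MustCompile(`@sha256:[a-f0-9]{64}$`)

// isDigestPinned reports whether an image reference is pinned by digest
// rather than only by a mutable tag such as "busybox:1.36".
func isDigestPinned(image string) bool {
	return digestRef.MatchString(image)
}
```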

kilo-code-bot · 2026-06-24T21:19:49Z

Code Review Summary

Status: No Issues Found | Recommendation: Merge

No new issues were found in the incremental changes. The changes in this PR are cleanup operations:

Removal of the obsolete ServiceAccountNamingPolicy (SANamePolicyBot) from the author resolver
Removal of AttributionServiceAccountCollapsed constant
Removal of legacy late nudge code from materialization.go
Removal of materializer fields from manager.go
Documentation updates for watch-first architecture

All previously identified issues have been addressed in prior commits.

Previous Review Summaries (16 snapshots, latest commit 1c61666)

Current summary above is authoritative. Previous snapshots are kept for context only.

Previous review (commit `1c61666`)

Status: 1 Issue Found | Recommendation: Address before merge

Overview

Severity	Count
WARNING	1

Issue Details

WARNING

File	Line	Issue
`test/e2e/foreign_content_e2e_test.go`	55	Error discarded without explanatory comment

Files Reviewed (24 files)

docs/design/gitpath-foreign-content-stringency.md - no issues
internal/controller/gittarget_controller.go - no issues
internal/controller/gittarget_kstatus_test.go - no issues
internal/controller/stream_status.go - no issues
internal/git/bootstrapped-repo-template/.gittargetignore - no issues
internal/git/bootstrapped_repo_template.go - no issues
internal/git/branch_worker_split_test.go - no issues
internal/git/branch_worker_test.go - no issues
internal/git/git_operations_test.go - no issues
internal/git/gittargetignore_writer_test.go - no issues
internal/git/plan_flush.go - no issues
internal/git/resync_flush.go - no issues
internal/manifestanalyzer/acceptance.go - no issues
internal/manifestanalyzer/acceptance_refusal.go - no issues
internal/manifestanalyzer/analyzer.go - no issues
internal/manifestanalyzer/analyzer_test.go - no issues
internal/manifestanalyzer/gittargetignore.go - no issues
internal/manifestanalyzer/gittargetignore_test.go - no issues
internal/manifestanalyzer/scan.go - no issues
internal/manifestanalyzer/store.go - no issues
internal/watch/event_router.go - no issues
internal/watch/event_router_test.go - no issues
test/e2e/foreign_content_e2e_test.go - 1 issue
test/e2e/watchrule_configmap_secret_e2e_test.go - no issues

Previous review (commit `12f3aa2`)

Status: 1 Issue Found | Recommendation: Address before merge

Overview

Severity	Count
WARNING	1

Issue Details

WARNING

File	Line	Issue
`internal/controller/watchrule_controller_test.go`	285	Test asserts `GitTargetReady=False` but controller produces `Unknown` for unevaluated GitTarget

Files Reviewed (32 files)

api/v1alpha2/clusterwatchrule_types.go - API additions
api/v1alpha2/gittarget_types.go - Phase removal, GitPathAccepted added
api/v1alpha2/watchrule_types.go - GitTargetReady printer column
config/crd/bases/configbutler.ai_clusterwatchrules.yaml - CRD update
config/crd/bases/configbutler.ai_gittargets.yaml - CRD update
config/crd/bases/configbutler.ai_watchrules.yaml - CRD update
docs/architecture.md - Watch-first and condition docs
docs/configuration.md - Updated status docs
docs/design/e2e-coverage-gaps-and-improvements-plan.md - Test D marked implemented
docs/design/gitpath-foreign-content-stringency.md - New design doc
docs/design/status-conditions-guide.md - Status conditions updated
docs/design/unsupported-folder-refusal-plan.md - Refusal plan revised
internal/controller/clusterwatchrule_controller.go - GitTargetReady condition
internal/controller/clusterwatchrule_controller_test.go - GitTargetReady assertions
internal/controller/constants.go - New condition types
internal/controller/gittarget_controller.go - Folder→Path rename, phase removal
internal/controller/gittarget_controller_test.go - Phase assertion removed
internal/controller/gittarget_dependency_status.go - New GitTargetReady mirror
internal/controller/gittarget_kstatus_test.go - GitPathAccepted test updates
internal/controller/gittarget_status_test.go - Removed phase, folder→path
internal/controller/stream_status.go - GitTargetReady stall logic
internal/controller/stream_status_test.go - New test for GitTargetReady stall
internal/controller/watchrule_controller.go - GitTargetReady condition
internal/controller/watchrule_controller_test.go - GitTargetReady assertion bug
internal/manifestanalyzer/acceptance_refusal.go - folder→path in messages
internal/watch/event_router.go - folder→path rename
internal/watch/event_router_test.go - folder→path rename
internal/watch/git_path_acceptance.go - New file replacing folder_acceptance.go
internal/watch/manager.go - targetGitPathAcceptance field
internal/watch/target_watch.go - dropTargetGitPathAcceptanceLocked
test/e2e/unsupported_folder_e2e_test.go - ClusterWatchRule and GitPathAccepted checks

Previous review (commit `92fa490`)

Status: No Issues Found | Recommendation: Merge

Files Reviewed (31 files)

api/v1alpha2/clusterwatchrule_types.go - API type definitions
api/v1alpha2/gittarget_types.go - API type definitions
api/v1alpha2/watchrule_types.go - API type definitions
config/crd/bases/configbutler.ai_clusterwatchrules.yaml - Generated CRD
config/crd/bases/configbutler.ai_gittargets.yaml - Generated CRD
config/crd/bases/configbutler.ai_watchrules.yaml - Generated CRD
docs/architecture.md - Documentation update for structure-only acceptance gate
docs/configuration.md - Configuration documentation
docs/design/e2e-coverage-gaps-and-improvements-plan.md - Documentation update marking Test D as implemented
docs/design/status-conditions-guide.md - Status conditions design
docs/design/unsupported-folder-refusal-plan.md - Unsupported folder refusal plan
go.mod - Dependency updates
go.sum - Dependency updates
internal/controller/clusterwatchrule_controller.go - Controller logic
internal/controller/clusterwatchrule_controller_test.go - Controller tests
internal/controller/condition_helper.go - Condition helper functions
internal/controller/constants.go - Shared constants
internal/controller/gittarget_controller.go - GitTarget controller
internal/controller/gittarget_controller_test.go - Controller tests
internal/controller/gittarget_kstatus_test.go - KStatus tests
internal/controller/gittarget_status_test.go - Status tests
internal/controller/stream_status.go - Stream status logic
internal/controller/watchrule_controller.go - WatchRule controller
internal/controller/watchrule_controller_test.go - Controller tests
internal/git/acceptance_gate_test.go - Acceptance gate tests
internal/git/resync_flush.go - Resync flush logic
internal/watch/event_router.go - Event router
internal/watch/event_router_test.go - Event router tests
internal/watch/folder_acceptance.go - Folder acceptance status
internal/watch/manager.go - Watch manager (added targetFolderAcceptance field)
internal/watch/stream_readiness.go - Stream readiness (removed UnsupportedContent constant)
internal/watch/stream_readiness_test.go - Updated to use StreamsRunning()
internal/watch/target_watch.go - Target watch with forceRecheck parameter
test/e2e/*.go - Multiple e2e test files updated

Previous review (commit `f2773a8`)

Status: No Issues Found | Recommendation: Merge

Files Reviewed (5 files)

docs/architecture.md - Documentation update for structure-only acceptance gate
docs/design/e2e-coverage-gaps-and-improvements-plan.md - Documentation update marking Test D as implemented
internal/git/acceptance_gate_test.go - Test improvements (require.ErrorAs, variable extraction)
internal/manifestanalyzer/analyzer_test.go - Added IssueUnsupportedKustomize to exhaustive test map
test/e2e/unsupported_folder_e2e_test.go - New e2e test for unsupported folder refusal

Previous review (commit `893e17f`)

Status: No Issues Found | Recommendation: Merge

Files Reviewed (5 files)

docs/architecture.md - Documentation update for structure-only acceptance gate
docs/design/e2e-coverage-gaps-and-improvements-plan.md - Documentation update marking Test D as implemented
internal/git/acceptance_gate_test.go - Test improvements (require.ErrorAs, variable extraction)
internal/manifestanalyzer/analyzer_test.go - Added IssueUnsupportedKustomize to exhaustive test map
test/e2e/unsupported_folder_e2e_test.go - New e2e test for unsupported folder refusal

Previous review (commit `d09ab73`)

Status: No Issues Found | Recommendation: Merge

Files Reviewed (5 files)

docs/architecture.md - Documentation update for structure-only acceptance gate
docs/design/e2e-coverage-gaps-and-improvements-plan.md - Documentation update marking Test D as implemented
internal/git/acceptance_gate_test.go - Test improvements (require.ErrorAs, variable extraction)
internal/manifestanalyzer/analyzer_test.go - Added IssueUnsupportedKustomize to exhaustive test map
test/e2e/unsupported_folder_e2e_test.go - New e2e test for unsupported folder refusal

Previous review (commit `cb8d4b0`)

Status: No Issues Found | Recommendation: Merge

Files Reviewed (11 files)

internal/git/acceptance_gate_test.go - Tests for unsupported kustomization folder refusal
internal/git/plan_flush.go - Added structure-only acceptance gate to live writer
internal/git/resync_flush.go - Added structure-only acceptance gate to resync apply
internal/manifestanalyzer/acceptance.go - Added AcceptStructureOnly, WriterAllowlist, unsupported-kustomize refusals
internal/manifestanalyzer/acceptance_refusal.go - Added AcceptanceRefusedError for writer-facing refusal
internal/manifestanalyzer/acceptance_test.go - Tests for new structure-only acceptance and refusal
internal/manifestanalyzer/store.go - Added Unsupported flag on RetainedDocument, kustomizationUsesUnsupportedFeature
internal/watch/event_router.go - Threaded targetWatchKey through drainScopedResync, added handleScopedResyncError
internal/watch/event_router_test.go - Test for refusal marking stream Blocked
internal/watch/stream_readiness.go - Added StreamReasonUnsupportedContent
internal/watch/target_watch.go - Threaded targetWatchKey through enqueueReplayResync

Previous review (commit `915b524`)

Status: No Issues Found | Recommendation: Merge

Files Reviewed (14 files)

docs/interpreting-metrics.md - Documentation updates for new attribution metrics
docs/markdown-triage-inventory.md - Minor path reference fix
internal/git/branch_worker.go - Added author kind labeling for commits
internal/git/branch_worker_metrics_test.go - Tests for author kind labeling
internal/git/pending_writes.go - Added createdCommit() and authorKind() helpers
internal/git/resync_flush.go - Added resync sweep delete metric
internal/git/resync_flush_test.go - Tests for resync sweep delete metric
internal/queue/attribution_index.go - Added AttributionResult types and resolution tracking
internal/queue/attribution_index_test.go - Tests for resolution outcomes
internal/telemetry/exporter.go - Added new metric declarations
internal/telemetry/exporter_test.go - Tests for new metrics
internal/watch/author_resolver.go - Updated to use AuthorResolution interface
internal/watch/author_resolver_test.go - Updated tests for resolution interface
internal/watch/target_watch_test.go - Updated test mocks

Previous review (commit `04aa391`)

Status: No Issues Found | Recommendation: Merge

Files Reviewed (8 files)

config/webhook/kustomization.yaml
config/webhook/validating-webhook.yaml
docs/interpreting-metrics.md
docs/markdown-triage-inventory.md
internal/git/branch_worker.go - queue-full error propagation implemented
internal/reconcile/git_target_event_stream.go - OnWatchEvent returns error on queue full
internal/reconcile/git_target_event_stream_test.go - tests updated for new error return
internal/reconcile/gittarget_lifecycle_integration_test.go - tests updated for goroutine-safe assertions
internal/telemetry/exporter.go - removed unused metrics (GitPushDurationSeconds, ObjectsScannedTotal, etc.)
internal/telemetry/exporter_test.go - tests updated for removed metrics
internal/watch/event_router.go - error handling for RouteToGitTargetEventStream and RouteEvent
internal/watch/target_watch_test.go - recordingEnqueuer.Enqueue returns bool

Previous review (commit `acaea33`)

Status: No Issues Found | Recommendation: Merge

Files Reviewed (11 files)

docs/design/e2e-serial-registry.md
test/e2e/aggregated_apiserver_e2e_test.go
test/e2e/bi_directional_e2e_test.go
test/e2e/commit_window_batching_e2e_test.go
test/e2e/crd_lifecycle_e2e_test.go
test/e2e/gittarget_isolation_e2e_test.go
test/e2e/inplace_edit_e2e_test.go
test/e2e/quickstart_framework_e2e_test.go
test/e2e/repo_assertions_test.go
test/e2e/signing_e2e_test.go
test/e2e/watchrule_configmap_secret_e2e_test.go

Previous review (commit `acf73d5`)

Status: 1 Issue Found | Recommendation: Address before merge

Overview

Severity	Count
WARNING	1

Issue Details

WARNING

File	Line	Issue
`internal/watch/materialization.go`	97	`ForgetGitTargetDeclaration` unconditionally deletes the UID map entry, which can race with `DeclareForGitTarget` for recreated GitTargets with the same namespace/name
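The race described in this warning is usually closed with a compare-and-delete: the forget path removes the entry only if the entry still belongs to the UID being forgotten. The sketch below uses hypothetical names and does not reproduce `materialization.go`. The latest summary reports that all earlier issues have been addressed, so this only illustrates the pattern the warning implies.

```go
package main

import "sync"

// declarations maps a GitTarget's "namespace/name" key to the UID that
// currently owns that key.
type declarations struct {
	mu   sync.Mutex
	byNN map[string]string
}

func newDeclarations() *declarations {
	return &declarations{byNN: map[string]string{}}
}

// Declare records uid as the current owner of key, replacing any older UID
// (for example when a GitTarget is deleted and recreated with the same name).
func (d *declarations) Declare(key, uid string) {
	d.mu.Lock()
	defer d.mu.Unlock()
	d.byNN[key] = uid
}

// Forget deletes the entry only if it still belongs to uid, so a late Forget
// for the old object cannot remove the recreated object's declaration.
func (d *declarations) Forget(key, uid string) bool {
	d.mu.Lock()
	defer d.mu.Unlock()
	if cur, ok := d.byNN[key]; !ok || cur != uid {
		return false
	}
	delete(d.byNN, key)
	return true
}

// Owner returns the UID currently declared for key, if any.
func (d *declarations) Owner(key string) (string, bool) {
	d.mu.Lock()
	defer d.mu.Unlock()
	uid, ok := d.byNN[key]
	return uid, ok
}
```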

Files Reviewed (16 files)

docs/design/stream/per-type-streaming-readiness-plan.md
docs/design/stream/streaming-readiness-status-machine-design.md
docs/design/stream/watch-replay-watermark-stream-readiness-investigation.md
internal/mutationlab/recorder/recorder_test.go
internal/mutationlab/recorder/watch_probe.go
internal/queue/attribution_index.go
internal/queue/attribution_index_test.go
internal/watch/author_resolver.go
internal/watch/manager.go
internal/watch/materialization.go - 1 issue
internal/watch/target_watch.go
internal/watch/target_watch_test.go
test/mutationlab/corpus/configmap/watch-replay-collapse/watch.added.yaml
test/mutationlab/corpus/configmap/watch-replay-collapse/watch.bookmark.yaml
test/mutationlab/e2e/watch_transport_test.go

Previous review (commit `d7bdb16`)

Status: 1 Issue Found | Recommendation: Address before merge

Overview

Severity	Count
WARNING	1

Issue Details

WARNING

File	Line	Issue
`internal/watch/materialization.go`	97	`ForgetGitTargetDeclaration` unconditionally deletes the UID map entry, which can race with `DeclareForGitTarget` for recreated GitTargets with the same namespace/name

Files Reviewed (16 files)

docs/design/stream/per-type-streaming-readiness-plan.md
docs/design/stream/streaming-readiness-status-machine-design.md
docs/design/stream/watch-replay-watermark-stream-readiness-investigation.md
internal/mutationlab/recorder/recorder_test.go
internal/mutationlab/recorder/watch_probe.go
internal/queue/attribution_index.go
internal/queue/attribution_index_test.go
internal/watch/author_resolver.go
internal/watch/manager.go
internal/watch/materialization.go - 1 issue
internal/watch/target_watch.go
internal/watch/target_watch_test.go
test/mutationlab/corpus/configmap/watch-replay-collapse/watch.added.yaml
test/mutationlab/corpus/configmap/watch-replay-collapse/watch.bookmark.yaml
test/mutationlab/e2e/watch_transport_test.go

Previous review (commit `7e011db`)

Status: No Issues Found | Recommendation: Merge

Files Reviewed (16 files)

charts/gitops-reverser/README.md
charts/gitops-reverser/templates/deployment.yaml
charts/gitops-reverser/values.yaml
cmd/main.go
cmd/main_audit_server_test.go
docs/architecture.md
docs/configuration.md
internal/controller/gittarget_controller.go
internal/git/worker_manager.go
internal/queue/attribution_index.go
internal/queue/attribution_index_test.go
internal/types/reference.go
internal/watch/author_resolver.go
internal/watch/target_watch.go
internal/watch/target_watch_test.go
test/e2e/tilt_playground_e2e_test.go

Previous review (commit `b524d83`)

Status: No Issues Found | Recommendation: Merge

Overview

Severity	Count
CRITICAL	0
WARNING	0
SUGGESTION	0

Incremental Review (`b2d5bc7`..`79a9fe6`)

No new code issues were identified in this incremental diff. The changes consist of documentation updates aligning with the watch-first architecture:

docs/architecture.md: Major restructuring to describe watch as the sole object-state source; audit is now purely an optional attribution lookup
docs/configuration.md: Updated audit/attribution settings, removing auditEventJoin and maxLen references, adding attribution.* flags
docs/future/ha-gittarget-distribution-plan.md: Added reconciliation note for watch-first changes
charts/: README and deployment template updates for new configuration structure

The incremental diff shows no logic changes in executable code paths — only documentation and configuration alignment.

Files Reviewed (16 files)

docs/architecture.md - architecture documentation, no code
docs/configuration.md - configuration documentation, no code
docs/future/ha-gittarget-distribution-plan.md - design doc, no code
charts/gitops-reverser/README.md - Helm chart docs
charts/gitops-reverser/values.yaml - Helm values
charts/gitops-reverser/templates/deployment.yaml - Helm templates
charts/gitops-reverser/templates/NOTES.txt - Helm templates
config/deployment.yaml - Deployment manifest
cmd/main.go - No functional changes in this diff range
cmd/main_audit_server_test.go - No functional changes in this diff range
cmd/readiness.go - No functional changes
test/e2e/controller_basics_e2e_test.go - No functional changes
test/e2e/e2e_suite_test.go - No functional changes
test/e2e/helpers.go - No functional changes
test/e2e/setup/flux/kustomization.yaml - Flux setup
test/e2e/setup/flux/namespaces/aggregated-api.yaml - deleted (aggregated-api removal)
test/e2e/setup/flux/releases/aggregated-api.yaml - deleted (aggregated-api removal)

Previous review (commit `79a9fe6`)

Status: No Issues Found | Recommendation: Merge

Overview

Severity	Count
CRITICAL	0
WARNING	0
SUGGESTION	0

Incremental Review (`b2d5bc7`..`79a9fe6`)

No new code issues were identified in this incremental diff. The changes consist of documentation updates aligning with the watch-first architecture:

docs/architecture.md: Major restructuring to describe watch as the sole object-state source; audit is now purely an optional attribution lookup
docs/configuration.md: Updated audit/attribution settings, removing auditEventJoin and maxLen references, adding attribution.* flags
docs/future/ha-gittarget-distribution-plan.md: Added reconciliation note for watch-first changes
charts/: README and deployment template updates for new configuration structure

The incremental diff shows no logic changes in executable code paths — only documentation and configuration alignment.

Files Reviewed (16 files)

docs/architecture.md - architecture documentation, no code
docs/configuration.md - configuration documentation, no code
docs/future/ha-gittarget-distribution-plan.md - design doc, no code
charts/gitops-reverser/README.md - Helm chart docs
charts/gitops-reverser/values.yaml - Helm values
charts/gitops-reverser/templates/deployment.yaml - Helm templates
charts/gitops-reverser/templates/NOTES.txt - Helm templates
config/deployment.yaml - Deployment manifest
cmd/main.go - No functional changes in this diff range
cmd/main_audit_server_test.go - No functional changes in this diff range
cmd/readiness.go - No functional changes
test/e2e/controller_basics_e2e_test.go - No functional changes
test/e2e/e2e_suite_test.go - No functional changes
test/e2e/helpers.go - No functional changes
test/e2e/setup/flux/kustomization.yaml - Flux setup
test/e2e/setup/flux/namespaces/aggregated-api.yaml - deleted (aggregated-api removal)
test/e2e/setup/flux/releases/aggregated-api.yaml - deleted (aggregated-api removal)

Previous review (commit `b2d5bc7`)

Status: Active Issues Found | Recommendation: Review existing CodeRabbit comments

Overview

Severity	Count
CRITICAL	0
WARNING	5
SUGGESTION	1

Incremental Review (`e0b6369`..HEAD)

No new code issues were identified in this incremental diff. The changes consist mainly of documentation restructuring and test-file consolidation (M1→configmap_scenarios_test.go, M2→workload_scenarios_test.go, M3→crd_conversion_test.go, M4→aggregated_api_test.go), the addition of rvLess for deterministic numeric resourceVersion ordering, and an update to sourceRank() to include SourceConversion.
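Numeric ordering matters because a plain string comparison puts "10" before "9". Below is a sketch of what a helper like `rvLess` might look like; it is an assumption, not the lab's code. Kubernetes documents `resourceVersion` as opaque, so clients should not rely on numeric ordering in general. The sketch is only reasonable in a test lab that knows its apiserver hands out numeric, etcd-backed versions, and it falls back to a string comparison otherwise.

```go
package main

import "strconv"

// rvLess orders two resourceVersion strings numerically when both parse as
// unsigned integers, and falls back to a plain string comparison otherwise.
func rvLess(a, b string) bool {
	na, errA := strconv.ParseUint(a, 10, 64)
	nb, errB := strconv.ParseUint(b, 10, 64)
	if errA == nil && errB == nil {
		return na < nb
	}
	return a < b
}
```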

Previously identified issues on unchanged lines remain active (see Unresolved Findings below). The following issues from prior reviews that fell within the incremental diff are now resolved:

Docs inconsistencies (docs/design/mutation-capture-lab-design.md): the deletecollection admission-contract inconsistency, conflicting audit-body claim, and M0 setup-text conflict are all fixed by the restructuring.
Flaky E2E corpus (test/mutationlab/e2e/aggregated_api_test.go): the proxy-enriched audit event (audit-additional) is now explicitly required and always committed, removing the conditional-appending path that caused stray-corpus-file failures.

[Snapshot truncated.]

Additional previous summary content was truncated to keep this comment within platform limits.

Reviewed by laguna-m.1-20260312:free · Input: 533K · Output: 23.4K · Cached: 8.1M

coderabbitai

Actionable comments posted: 2

🧹 Nitpick comments (1)

internal/mutationlab/recorder/conversion_test.go (1)
30-65: 🎯 Functional Correctness | 🔵 Trivial | 💤 Low value

Optional: add a large-sizeBytes case.

Both directions only test 1024/2048, which round-trip cleanly through float64. A value above 2^53 would expose the precision/json.Number issue flagged in conversion.go.
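The sketch below shows why 1024/2048 cannot catch the problem. With plain `json.Unmarshal` into `map[string]any`, every JSON number becomes a `float64`, and 2^53 + 1 (9007199254740993) rounds to 9007199254740992. The helper name is hypothetical; only the `spec.sizeBytes` path comes from the comment.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// sizeViaDefaultDecode decodes spec.sizeBytes with plain json.Unmarshal,
// which stores every JSON number held in an interface value as float64.
func sizeViaDefaultDecode(raw []byte) (string, error) {
	var obj map[string]any
	if err := json.Unmarshal(raw, &obj); err != nil {
		return "", err
	}
	spec, _ := obj["spec"].(map[string]any)
	f, ok := spec["sizeBytes"].(float64)
	if !ok {
		return "", fmt.Errorf("sizeBytes is %T, not float64", spec["sizeBytes"])
	}
	// %.0f prints the float64 without an exponent, exposing any rounding.
	return fmt.Sprintf("%.0f", f), nil
}
```

A regression case built on 9007199254740993 would fail under this decoding path and pass once numbers are decoded as `json.Number`.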
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/mutationlab/recorder/conversion_test.go` around lines 30 - 65, Add a
regression case to TestConvertWidget_RenamesFieldBothDirections that uses a very
large spec.sizeBytes value above 2^53 and verifies both convertWidget directions
preserve it correctly. The current test only covers small values that survive
json.Unmarshal as float64, so extend the existing v1->v2 and v2->v1 assertions
to catch precision loss and ensure conversion.go handles large integers without
truncation.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@internal/mutationlab/recorder/conversion.go`:
- Around line 100-140: The size parsing in convertWidget/widgetSize is losing
precision because json.Unmarshal decodes numbers into float64, so the
json.Number branch in widgetSize is never reached for sizeBytes. Update
convertWidget to decode the raw JSON with a Decoder that calls UseNumber()
before populating obj, so spec["sizeBytes"] can be preserved as json.Number and
parsed exactly; keep the existing widgetSize logic and ensure any needed bytes
import is added.

In `@test/mutationlab/e2e/m4_scenarios_test.go`:
- Around line 113-118: Remove the committed audit-additional.create.yaml from
the flunder/aggregated-api-write corpus so the golden corpus matches what
m4_scenarios_test actually generates. The corpus is built in h.syncCorpus from
corpusRecords in test/mutationlab/e2e/m4_scenarios_test.go, where enriched is
appended only when present, so the audit-additional record must not be checked
in as a required golden file. Update the corpus under
mutationlab/corpus/flunder/aggregated-api-write and keep the generated set
aligned with the conditional logic in m4_scenarios_test and Compare() behavior
in golden.go.
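When a record is appended only if present, a committed golden file for it shows up as stray on runs where it is absent. The set comparison below is a hypothetical helper, not the repo's `Compare()` in `golden.go`, and it shows how that mismatch surfaces. The later kilo-code-bot summary reports that the audit-additional record is now always required, which removes the conditional case.

```go
package main

import "sort"

// diffCorpus compares the file names a scenario generated with the names
// committed in its golden directory and returns what is missing (generated
// but not committed) and what is stray (committed but not generated).
func diffCorpus(generated, committed []string) (missing, stray []string) {
	gen := make(map[string]bool, len(generated))
	for _, name := range generated {
		gen[name] = true
	}
	com := make(map[string]bool, len(committed))
	for _, name := range committed {
		com[name] = true
		if !gen[name] {
			stray = append(stray, name)
		}
	}
	for _, name := range generated {
		if !com[name] {
			missing = append(missing, name)
		}
	}
	sort.Strings(missing)
	sort.Strings(stray)
	return missing, stray
}
```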

---

Nitpick comments:
In `@internal/mutationlab/recorder/conversion_test.go`:
- Around line 30-65: Add a regression case to
TestConvertWidget_RenamesFieldBothDirections that uses a very large
spec.sizeBytes value above 2^53 and verifies both convertWidget directions
preserve it correctly. The current test only covers small values that survive
json.Unmarshal as float64, so extend the existing v1->v2 and v2->v1 assertions
to catch precision loss and ensure conversion.go handles large integers without
truncation.

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: 9ddc6031-6047-4583-a86d-368b5d8d099e

📥 Commits

Reviewing files that changed from the base of the PR and between 876ff60 and e0b6369.

📒 Files selected for processing (33)

cmd/mutation-capture-lab/main.go
docs/design/mutation-capture-lab-design.md
internal/mutationlab/normalize/normalize.go
internal/mutationlab/normalize/normalize_test.go
internal/mutationlab/record.go
internal/mutationlab/recorder/conversion.go
internal/mutationlab/recorder/conversion_test.go
test/mutationlab/corpus/configmap/create-succeeds/admission.create.yaml
test/mutationlab/corpus/configmap/create-succeeds/audit.create.yaml
test/mutationlab/corpus/configmap/deletecollection/admission.delete.cm-a.yaml
test/mutationlab/corpus/configmap/deletecollection/admission.delete.cm-b.yaml
test/mutationlab/corpus/configmap/deletecollection/admission.delete.cm-c.yaml
test/mutationlab/corpus/configmap/deletecollection/audit.deletecollection.yaml
test/mutationlab/corpus/configmap/dry-run-create/admission.create.yaml
test/mutationlab/corpus/configmap/dry-run-create/audit.create.yaml
test/mutationlab/corpus/configmap/record-and-reject/admission.create.yaml
test/mutationlab/corpus/configmap/record-and-reject/audit.create.yaml
test/mutationlab/corpus/configmap/update/admission.update.yaml
test/mutationlab/corpus/configmap/update/audit.update.yaml
test/mutationlab/corpus/deployment/scale-patch/audit.patch.yaml
test/mutationlab/corpus/flunder/aggregated-api-write/audit-additional.create.yaml
test/mutationlab/corpus/flunder/aggregated-api-write/audit.create.yaml
test/mutationlab/corpus/flunder/aggregated-api-write/watch.added.yaml
test/mutationlab/corpus/pod/graceful-delete/admission.delete.yaml
test/mutationlab/corpus/widget/crd-conversion/admission.create.yaml
test/mutationlab/corpus/widget/crd-conversion/audit.create.yaml
test/mutationlab/corpus/widget/crd-conversion/conversion.to-v1.yaml
test/mutationlab/corpus/widget/crd-conversion/conversion.to-v2.yaml
test/mutationlab/corpus/widget/crd-conversion/watch.added.yaml
test/mutationlab/e2e/harness_test.go
test/mutationlab/e2e/m3_scenarios_test.go
test/mutationlab/e2e/m4_scenarios_test.go
test/mutationlab/swap-image.sh

✅ Files skipped from review due to trivial changes (16)

test/mutationlab/corpus/configmap/dry-run-create/audit.create.yaml
test/mutationlab/corpus/flunder/aggregated-api-write/audit.create.yaml
test/mutationlab/corpus/configmap/deletecollection/audit.deletecollection.yaml
test/mutationlab/corpus/configmap/record-and-reject/audit.create.yaml
test/mutationlab/corpus/flunder/aggregated-api-write/audit-additional.create.yaml
test/mutationlab/corpus/widget/crd-conversion/audit.create.yaml
test/mutationlab/corpus/widget/crd-conversion/conversion.to-v2.yaml
test/mutationlab/corpus/configmap/update/admission.update.yaml
test/mutationlab/corpus/widget/crd-conversion/admission.create.yaml
test/mutationlab/corpus/deployment/scale-patch/audit.patch.yaml
test/mutationlab/corpus/configmap/record-and-reject/admission.create.yaml
test/mutationlab/corpus/configmap/deletecollection/admission.delete.cm-a.yaml
test/mutationlab/corpus/configmap/create-succeeds/audit.create.yaml
test/mutationlab/corpus/configmap/deletecollection/admission.delete.cm-b.yaml
test/mutationlab/corpus/pod/graceful-delete/admission.delete.yaml
test/mutationlab/corpus/configmap/update/audit.update.yaml

🚧 Files skipped from review as they are similar to previous changes (10)

test/mutationlab/corpus/configmap/deletecollection/admission.delete.cm-c.yaml
test/mutationlab/corpus/configmap/create-succeeds/admission.create.yaml
internal/mutationlab/record.go
test/mutationlab/corpus/configmap/dry-run-create/admission.create.yaml
test/mutationlab/swap-image.sh
test/mutationlab/e2e/harness_test.go
internal/mutationlab/normalize/normalize_test.go
cmd/mutation-capture-lab/main.go
internal/mutationlab/normalize/normalize.go
docs/design/mutation-capture-lab-design.md

coderabbitai · 2026-06-25T05:39:40Z

+func convertWidget(raw json.RawMessage, desiredAPIVersion string) json.RawMessage {
+	var obj map[string]any
+	if err := json.Unmarshal(raw, &obj); err != nil {
+		return raw
+	}
+	obj["apiVersion"] = desiredAPIVersion
+	spec, ok := obj["spec"].(map[string]any)
+	if !ok {
+		return remarshal(obj, raw)
+	}
+	switch versionOf(desiredAPIVersion) {
+	case "v2":
+		if n, ok := widgetSize(spec["sizeBytes"]); ok {
+			spec["size"] = strconv.FormatInt(n, 10)
+			delete(spec, "sizeBytes")
+		}
+	case "v1":
+		if n, ok := widgetSize(spec["size"]); ok {
+			spec["sizeBytes"] = n
+			delete(spec, "size")
+		}
+	}
+	return remarshal(obj, raw)
+}
+
+// widgetSize reads the Widget size from either the v1 integer or the v2 string
+// form, returning the canonical integer.
+func widgetSize(v any) (int64, bool) {
+	switch t := v.(type) {
+	case json.Number:
+		n, err := t.Int64()
+		return n, err == nil
+	case float64:
+		return int64(t), true
+	case string:
+		n, err := strconv.ParseInt(t, 10, 64)
+		return n, err == nil
+	default:
+		return 0, false
+	}
+}


🎯 Functional Correctness | 🟡 Minor

🧩 Analysis chain

🌐 Web query:

In Go, does json.Unmarshal into map[string]any decode JSON numbers as float64 unless json.Decoder.UseNumber is set?

💡 Result:

Yes, that is correct. By default, when the Go encoding/json package unmarshals JSON numbers into an interface{} value (such as within a map[string]any), it decodes them as float64 [1][2]. If you use json.Decoder.UseNumber, the decoder will instead unmarshal JSON numbers into a json.Number type [3][4]. A json.Number is a string-based type that holds the literal text of the JSON number, which allows you to handle the number without losing precision or converting it to a float64 prematurely [5][6]. Summary of behavior: - Default: JSON numbers are decoded as float64 [1][2]. - With UseNumber: JSON numbers are decoded as json.Number [3][6].

Citations:

1: https://pkg.go.dev/encoding/json

2: https://go.dev/blog/json

3: https://go.dev/src/encoding/json/stream.go?m=text

4: https://github.com/golang/go/blob/master/src/encoding/json/stream.go

5: https://go.dev/src/encoding/json/decode.go

6: https://www.gofaq.org/en/validate-request-bodies/

json.Number branch in widgetSize is unreachable; large sizeBytes lose precision.

convertWidget uses json.Unmarshal, which defaults to decoding JSON numbers as float64 when populating a map[string]any. Consequently, spec["sizeBytes"] is never a json.Number, the case json.Number in widgetSize never executes, and integers larger than 2^53 lose precision before any conversion.

Decode through a json.Decoder with UseNumber() enabled so the exact integer text is preserved:

Proposed fix

-	var obj map[string]any
-	if err := json.Unmarshal(raw, &obj); err != nil {
-		return raw
-	}
+	var obj map[string]any
+	dec := json.NewDecoder(bytes.NewReader(raw))
+	dec.UseNumber()
+	if err := dec.Decode(&obj); err != nil {
+		return raw
+	}

Ensure "bytes" is imported.
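A minimal, self-contained sketch (not code from this PR) of the precision loss described above. The helper names sizeViaUnmarshal and sizeViaUseNumber are hypothetical; only standard encoding/json behaviour is being illustrated, using 2^53 + 1, the smallest positive integer a float64 cannot represent exactly.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

// sizeViaUnmarshal decodes with json.Unmarshal, which stores every JSON
// number inside a map[string]any as float64.
func sizeViaUnmarshal(raw []byte) (int64, error) {
	var obj map[string]any
	if err := json.Unmarshal(raw, &obj); err != nil {
		return 0, err
	}
	spec, _ := obj["spec"].(map[string]any)
	f, ok := spec["sizeBytes"].(float64)
	if !ok {
		return 0, fmt.Errorf("sizeBytes is %T, not float64", spec["sizeBytes"])
	}
	return int64(f), nil
}

// sizeViaUseNumber decodes with a Decoder in UseNumber mode, so JSON numbers
// arrive as json.Number and keep their exact literal text.
func sizeViaUseNumber(raw []byte) (int64, error) {
	var obj map[string]any
	dec := json.NewDecoder(bytes.NewReader(raw))
	dec.UseNumber()
	if err := dec.Decode(&obj); err != nil {
		return 0, err
	}
	spec, _ := obj["spec"].(map[string]any)
	n, ok := spec["sizeBytes"].(json.Number)
	if !ok {
		return 0, fmt.Errorf("sizeBytes is %T, not json.Number", spec["sizeBytes"])
	}
	return n.Int64()
}

func main() {
	raw := []byte(`{"spec":{"sizeBytes":9007199254740993}}`) // 2^53 + 1
	a, _ := sizeViaUnmarshal(raw)
	b, _ := sizeViaUseNumber(raw)
	fmt.Println(a) // 9007199254740992 (rounded by float64)
	fmt.Println(b) // 9007199254740993 (exact)
}
```

With UseNumber on, every other number in the object also becomes json.Number; json.Marshal writes json.Number back out as a bare number literal, so remarshal should round-trip those fields unchanged.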


🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@internal/mutationlab/recorder/conversion.go` around lines 100 - 140, The size parsing in convertWidget/widgetSize is losing precision because json.Unmarshal decodes numbers into float64, so the json.Number branch in widgetSize is never reached for sizeBytes. Update convertWidget to decode the raw JSON with a Decoder that calls UseNumber() before populating obj, so spec["sizeBytes"] can be preserved as json.Number and parsed exactly; keep the existing widgetSize logic and ensure any needed bytes import is added.

coderabbitai · 2026-06-25T05:39:40Z

+	corpusRecords := []mutationlab.Record{*official, *added}
+	if enriched != nil {
+		corpusRecords = append(corpusRecords, *enriched)
+	}
+	h.syncCorpus(t, "flunder/aggregated-api-write", corpusRecords)
+}


🩺 Stability & Availability | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Confirm the additional-audit corpus is committed and check how Compare treats missing captures.
fd -t f 'audit-additional' test/mutationlab/corpus
rg -n 'stray corpus|SourceAuditAdditional|audit-webhook-additional' internal/mutationlab cmd/mutation-capture-lab

Repository: ConfigButler/gitops-reverser

Length of output: 1729

Remove committed audit-additional.create.yaml to prevent flaky E2E failures.

The audit-additional record is appended conditionally in test/mutationlab/e2e/m4_scenarios_test.go (line 115), so it is absent whenever enriched is nil. However, test/mutationlab/corpus/flunder/aggregated-api-write/audit-additional.create.yaml is committed to the repo. Per internal/mutationlab/corpus/golden.go (line 181), a committed file that is not regenerated during the test run makes Compare() fail with "stray corpus files no longer captured". The test therefore flakes whenever the e2e cluster's audit proxy does not emit the additional event.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@test/mutationlab/e2e/m4_scenarios_test.go` around lines 113 - 118, Remove the committed audit-additional.create.yaml from the flunder/aggregated-api-write corpus so the golden corpus matches what m4_scenarios_test actually generates. The corpus is built in h.syncCorpus from corpusRecords in test/mutationlab/e2e/m4_scenarios_test.go, where enriched is appended only when present, so the audit-additional record must not be checked in as a required golden file. Update the corpus under mutationlab/corpus/flunder/aggregated-api-write and keep the generated set aligned with the conditional logic in m4_scenarios_test and Compare() behavior in golden.go.

Phase 0 ("finish the evidence") of the watch-only ingestion proposal: two new "difficult cases" corpus rows captured against a live k8s v1.35.2 cluster.

- Row 10 (owner-ref cascade): one user delete fans out to a watch DELETED for both the parent and the cascaded child; the child delete is audited under generic-garbage-collector, not the human. This is concrete delete-attribution evidence for the conservative resolver.
- Row 13 (optimistic-concurrency conflict): a stale-resourceVersion update is rejected at the storage layer BEFORE validating admission runs, so audit is the sole witness: no admission, no watch. Stronger than rows 11/12.

Also fixes a normalizer non-determinism the cascade surfaced: owner-ref UIDs embedded in managedFields association keys (k:{"uid":...}) were not collapsed, churning the corpus every run. Adds uidByLen handling mirroring the existing IP rule.

Full lab suite 13/13 deterministic in compare mode.

Updates the README scenario index and the design doc (corpus table, rows 10/13, and the rows 16/17 resync/bookmark recorder-readiness findings).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
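A minimal sketch of deterministic UID collapsing, assuming UIDs are lowercase hyphenated UUIDs and that placeholders follow the corpus's <uid-N> style. This swaps in a plain regex approach; it is not the PR's uidByLen rule, and collapseUIDs is a hypothetical name.

```go
package main

import (
	"fmt"
	"regexp"
)

// uidPattern matches a lowercase hyphenated UUID, the form Kubernetes uses
// for metadata.uid and for owner-ref UIDs inside managedFields keys.
var uidPattern = regexp.MustCompile(`[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}`)

// collapseUIDs rewrites every UID in s to <uid-N>, numbered by first
// appearance across calls sharing the same seen map, so one UID gets the same
// token in metadata.uid and in a k:{"uid":...} association key.
func collapseUIDs(s string, seen map[string]string) string {
	return uidPattern.ReplaceAllStringFunc(s, func(uid string) string {
		if p, ok := seen[uid]; ok {
			return p
		}
		p := fmt.Sprintf("<uid-%d>", len(seen)+1)
		seen[uid] = p
		return p
	})
}

func main() {
	seen := map[string]string{}
	parent := "3f1c2a9e-8b7d-4c21-9e0f-1a2b3c4d5e6f"
	fmt.Println(collapseUIDs(parent, seen))                  // <uid-1>
	fmt.Println(collapseUIDs(`k:{"uid":"`+parent+`"}`, seen)) // k:{"uid":"<uid-1>"}
}
```

Numbering by first appearance (rather than hashing) keeps the output stable across runs as long as documents are normalized in a fixed order.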

Phase 1 of docs/design/watch-only-ingestion-architecture.md ("build watch state in parallel"): for every Synced type, optionally hold a long-lived WATCH and record each ADDED/MODIFIED/DELETED into a per-type :watch:stream, written ALONGSIDE the authoritative :audit:stream so the watch-derived desired set can be diffed against the audit-derived one. Off by default; changes no Git write.

- queue: AppendWatchEvent / DeleteTypeWatchStream on RedisByTypeStreamQueue, reusing the per-type key machinery and the :objects:items envelope shape so the checkpoint and watch log fold byte-identically.
- watch: a runner twinned with the audit tail (started on Synced beside startTypeAuditTail, stopped on Release); the resume cursor advances on bookmarks and resets to the live edge after repeated un-resumable (410-style) failures. Correctness stays owned by the checkpoint, so a watch gap costs only freshness on this experimental stream.
- cmd: --watch-state-stream flag wiring (off by default), reusing the mirror's write-side client.

Unit-tested with a fake watch + writer (records/bookmark-cursor/relist-reset/lifecycle). fmt/vet/lint/test green. Product test-e2e deferred (flag off, no behavior change).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

…1 payoff)

Slice 2 of the parallel watch-state stream: now that --watch-state-stream records a :watch:stream, periodically fold it into a watch-derived desired set and diff that against the audit-derived set, metering the divergence. This is the question the corpus cannot answer: does a continuous watch reproduce the same desired manifests the audit log does, on a live cluster? It changes no Git write.

- queue: RedisTypeSplicer.SpliceWatchType folds the checkpoint with the :watch:stream entries whose recorded rv > checkpoint R (the stream uses arrival-order auto-IDs, so the post-checkpoint slice is selected by the rv field, mirroring SpliceType's exclusive "(R" fold).
- watch: compareDesiredSets diffs the two sets by identity + sanitized body (agree / mismatch / audit-only / watch-only); a per-minute Manager loop runs it for every serviceable type and meters the result. Wired only when --watch-state-stream is on.
- telemetry: gitopsreverser_watch_audit_divergence{gvr,reason} gauge + gitopsreverser_watch_audit_comparisons_total{gvr,result} counter.

Unit-tested: the fold (rv-gate + delete), the diff buckets, the Manager wiring with fake splices, and a Valkey-backed SpliceWatchType. fmt/vet/lint/test green.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

coderabbitai

Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

test/mutationlab/e2e/aggregated_api_test.go (1)

115-127: 🗄️ Data Integrity & Integration | 🟡 Minor | ⚡ Quick win

Fail fast before syncing the corpus on a bad watch payload.

The t.Errorf on Lines 115-117 still allows h.syncCorpus to run on Lines 126-127. In MUTATIONLAB_UPDATE mode that can rewrite the golden flunder fixtures with the exact broken watch body this scenario is supposed to reject.

Proposed fix

-	if !added.Summary.HasObject || flunderReference(added) != "some-flunder" {
-		t.Errorf("watch ADDED did not carry the full flunder object (spec.reference=%q, hasObject=%v)",
-			flunderReference(added), added.Summary.HasObject)
-	}
+	if !added.Summary.HasObject || flunderReference(added) != "some-flunder" {
+		t.Fatalf("watch ADDED did not carry the full flunder object (spec.reference=%q, hasObject=%v)",
+			flunderReference(added), added.Summary.HasObject)
+	}

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/mutationlab/e2e/aggregated_api_test.go` around lines 115 - 127, The
aggregated API test is logging a bad watch payload but still continues into
h.syncCorpus, which can rewrite golden fixtures with invalid data. In
aggregated_api_test.go, update the check around the watch ADDED validation so
the test stops immediately on a failed flunderReference(added) or
added.Summary.HasObject condition, and only call h.syncCorpus after the payload
has passed validation. Keep the fix localized to the existing test flow in the
aggregated_api_test.go scenario.

🧹 Nitpick comments (1)

test/mutationlab/e2e/configmap_scenarios_test.go (1)
64-65: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win

Rename these tests to the TestFunctionName_Scenario pattern.

These new e2e cases use bare names such as TestUpdate and TestOwnerRefCascade, which don't follow the repo's required naming convention for *_test.go files.

As per coding guidelines, **/*_test.go: Follow naming convention TestFunctionName_Scenario(t *testing.T).

Also applies to: 81-82, 103-104, 128-129, 180-181, 287-288, 344-345, 441-442
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/mutationlab/e2e/configmap_scenarios_test.go` around lines 64 - 65,
Rename the e2e test functions in configmap_scenarios_test.go to follow the
required TestFunctionName_Scenario pattern, using the existing test names like
TestCreateSucceeds, TestUpdate, and TestOwnerRefCascade as the base and adding a
clear scenario suffix where needed. Update each affected test function
declaration so the names are descriptive and consistent with the repo’s testing
convention, and ensure the renamed functions still match their existing
harness-based bodies such as newHarness(t).
Source: Coding guidelines

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@docs/design/mutation-capture-lab-design.md`:
- Around line 9-12: The captured-scenarios inventory is stale in the
mutation-capture lab design doc. Update the scenario counts and row status in
the sections that describe the corpus inventory so they match the committed
corpus, including the new owner-ref cascade and conflict coverage referenced by
the existing corpus/test symbols and the sibling watch-only design doc. Also
revise the “What To Capture Next” list and the affected Difficult Cases Catalog
entries together so the totals, planned rows, and captured rows are consistent
throughout the document.

In `@internal/watch/watch_state.go`:
- Around line 141-150: The watch-state restart logic is treating every session
close as a resumable failure and clearing sinceRV too aggressively. Update
runTypeWatchStream/watchStateSession handling so only definitive
resource-version expiration (for example 410 Gone / unresumable resume points)
increments toward watchStateRelistThreshold and resets sinceRV, while ordinary
watch rollovers or errWatchStateClosed channel closures should keep the current
resume cursor and reopen from the same RV. Use the existing gvr, sinceRV,
failures, and watchStateRelistThreshold flow to distinguish these cases before
logging "watch-state stream resetting to live edge".

In `@test/mutationlab/corpus/configmap/finalizer-delete/watch.deleted.yaml`:
- Around line 10-34: The terminal DELETED snapshot for the finalizer-delete
scenario still shows the old finalizer, so update the watch.deleted.yaml fixture
to reflect the post-patch state by removing the metadata.finalizers block
entirely. Make sure the deleted event matches the effect of audit.patch.yaml on
the object, and verify the remaining metadata in watch.deleted.yaml still aligns
with the finalizer-delete flow.

In `@test/mutationlab/e2e/configmap_scenarios_test.go`:
- Around line 118-120: The dry-run verification in configmap_scenarios_test
should assert a NotFound error from the ConfigMaps().Get call instead of
treating any error as success. Update the check around
h.kube.CoreV1().ConfigMaps(s.ns).Get for "cm-dry" to explicitly validate that
the failure is due to the object not existing, so transport or auth errors do
not incorrectly pass the test.

---

Outside diff comments:
In `@test/mutationlab/e2e/aggregated_api_test.go`:
- Around line 115-127: The aggregated API test is logging a bad watch payload
but still continues into h.syncCorpus, which can rewrite golden fixtures with
invalid data. In aggregated_api_test.go, update the check around the watch ADDED
validation so the test stops immediately on a failed flunderReference(added) or
added.Summary.HasObject condition, and only call h.syncCorpus after the payload
has passed validation. Keep the fix localized to the existing test flow in the
aggregated_api_test.go scenario.

---

Nitpick comments:
In `@test/mutationlab/e2e/configmap_scenarios_test.go`:
- Around line 64-65: Rename the e2e test functions in
configmap_scenarios_test.go to follow the required TestFunctionName_Scenario
pattern, using the existing test names like TestCreateSucceeds, TestUpdate, and
TestOwnerRefCascade as the base and adding a clear scenario suffix where needed.
Update each affected test function declaration so the names are descriptive and
consistent with the repo’s testing convention, and ensure the renamed functions
still match their existing harness-based bodies such as newHarness(t).


ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: cb5de786-dc51-4d30-99da-7f1342cfe8bc

📥 Commits

Reviewing files that changed from the base of the PR and between e0b6369 and 77e349f.

📒 Files selected for processing (28)

cmd/main.go
docs/design/mutation-capture-lab-design.md
docs/design/watch-only-ingestion-architecture.md
internal/mutationlab/normalize/normalize.go
internal/queue/redis_watch_stream.go
internal/watch/manager.go
internal/watch/materialization.go
internal/watch/watch_state.go
internal/watch/watch_state_test.go
test/mutationlab/README.md
test/mutationlab/corpus/configmap/conflict-update/audit.update.yaml
test/mutationlab/corpus/configmap/finalizer-delete/admission.delete.yaml
test/mutationlab/corpus/configmap/finalizer-delete/admission.update.yaml
test/mutationlab/corpus/configmap/finalizer-delete/audit.delete.yaml
test/mutationlab/corpus/configmap/finalizer-delete/audit.patch.yaml
test/mutationlab/corpus/configmap/finalizer-delete/watch.deleted.yaml
test/mutationlab/corpus/configmap/finalizer-delete/watch.modified.yaml
test/mutationlab/corpus/configmap/owner-ref-cascade/audit.delete.cm-child.yaml
test/mutationlab/corpus/configmap/owner-ref-cascade/audit.delete.cm-parent.yaml
test/mutationlab/corpus/configmap/owner-ref-cascade/watch.deleted.cm-child.yaml
test/mutationlab/corpus/configmap/owner-ref-cascade/watch.deleted.cm-parent.yaml
test/mutationlab/corpus/widget/crd-conversion/conversion.to-v1.yaml
test/mutationlab/corpus/widget/crd-conversion/conversion.to-v2.yaml
test/mutationlab/e2e/aggregated_api_test.go
test/mutationlab/e2e/configmap_scenarios_test.go
test/mutationlab/e2e/crd_conversion_test.go
test/mutationlab/e2e/harness_test.go
test/mutationlab/e2e/workload_scenarios_test.go

✅ Files skipped from review due to trivial changes (7)

test/mutationlab/corpus/widget/crd-conversion/conversion.to-v2.yaml
test/mutationlab/corpus/configmap/owner-ref-cascade/watch.deleted.cm-child.yaml
test/mutationlab/corpus/configmap/finalizer-delete/audit.patch.yaml
test/mutationlab/corpus/configmap/owner-ref-cascade/audit.delete.cm-parent.yaml
test/mutationlab/corpus/configmap/conflict-update/audit.update.yaml
test/mutationlab/corpus/widget/crd-conversion/conversion.to-v1.yaml
test/mutationlab/README.md

🚧 Files skipped from review as they are similar to previous changes (2)

internal/mutationlab/normalize/normalize.go
test/mutationlab/e2e/harness_test.go

coderabbitai · 2026-06-25T11:49:46Z

+			failures++
+			log.V(1).Info("watch-state session ended; will re-open",
+				"gvr", gvr.String(), "resumeRV", sinceRV, "failures", failures, "err", err.Error())
+			if failures >= watchStateRelistThreshold {
+				// An un-resumable resume point (e.g. 410 Gone after compaction): drop to the live edge.
+				// The freshness gap is healed by the next checkpoint re-anchor; this stream is never a
+				// correctness source.
+				log.V(1).Info("watch-state stream resetting to live edge", "gvr", gvr.String())
+				sinceRV = ""
+				failures = 0


🎯 Functional Correctness | 🟠 Major

Don't drop the resume RV after ordinary watch rollovers.

watchStateSession returns errWatchStateClosed for any channel closure, and runTypeWatchStream treats this as a failure that increments the failures counter. Once failures reaches watchStateRelistThreshold, sinceRV is cleared to "". This logic fails to distinguish between a normal API server watch timeout (which is expected and resumable) and an actual resource version expiration (410 Gone). Consequently, the stream discards its resume cursor after just three healthy rollovers, forcing a restart from the live edge and silently skipping mutations that occurred during the reconnect window. Only reset sinceRV on definitive resource expiration; treat ordinary channel closes as resumable events.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@internal/watch/watch_state.go` around lines 141 - 150, The watch-state restart logic is treating every session close as a resumable failure and clearing sinceRV too aggressively. Update runTypeWatchStream/watchStateSession handling so only definitive resource-version expiration (for example 410 Gone / unresumable resume points) increments toward watchStateRelistThreshold and resets sinceRV, while ordinary watch rollovers or errWatchStateClosed channel closures should keep the current resume cursor and reopen from the same RV. Use the existing gvr, sinceRV, failures, and watchStateRelistThreshold flow to distinguish these cases before logging "watch-state stream resetting to live edge".

coderabbitai · 2026-06-25T11:49:46Z

+    finalizers:
+    - mutationlab.configbutler.ai/hold
+    labels:
+      mutationlab.configbutler.ai/scenario: finalizer-delete
+    managedFields:
+    - apiVersion: v1
+      fieldsType: FieldsV1
+      fieldsV1:
+        f:data:
+          .: {}
+          f:key: {}
+        f:metadata:
+          f:finalizers:
+            .: {}
+            v:"mutationlab.configbutler.ai/hold": {}
+          f:labels:
+            .: {}
+            f:mutationlab.configbutler.ai/scenario: {}
+      manager: e2e.test
+      operation: Update
+      time: <ts>
+    name: cm-hold
+    namespace: <ns-1>
+    resourceVersion: <rv-3>
+    uid: <uid-1>


🎯 Functional Correctness | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail
files=(
  test/mutationlab/corpus/configmap/finalizer-delete/watch.modified.yaml
  test/mutationlab/corpus/configmap/finalizer-delete/watch.deleted.yaml
  test/mutationlab/corpus/configmap/finalizer-delete/admission.update.yaml
  test/mutationlab/corpus/configmap/finalizer-delete/audit.patch.yaml
)
for f in "${files[@]}"; do
  echo "===== $f ====="
  sed -n '1,120p' "$f"
  echo
done
python - <<'PY'
from pathlib import Path
import difflib
a = Path("test/mutationlab/corpus/configmap/finalizer-delete/watch.modified.yaml").read_text().splitlines()
b = Path("test/mutationlab/corpus/configmap/finalizer-delete/watch.deleted.yaml").read_text().splitlines()
print("===== diff: watch.modified.yaml vs watch.deleted.yaml =====")
for line in difflib.unified_diff(a, b, fromfile="watch.modified.yaml", tofile="watch.deleted.yaml", lineterm=""):
    print(line)
PY

Repository: ConfigButler/gitops-reverser

Length of output: 5661

The watch.deleted.yaml fixture is missing the finalizer removal.

Although audit.patch.yaml sends a request to clear finalizers (null), the resulting watch.deleted.yaml snapshot incorrectly retains the mutationlab.configbutler.ai/hold entry. In the finalizer-delete scenario, the terminal DELETED event must reflect the object state after the patch, which requires the finalizers array to be empty or absent.

Update watch.deleted.yaml to remove the finalizers block.

See patch request vs. current deleted state

Current audit.patch.yaml request:

requestObject:
  metadata:
    finalizers: null

Incorrect watch.deleted.yaml state:

metadata:
  finalizers:
  - mutationlab.configbutler.ai/hold

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@test/mutationlab/corpus/configmap/finalizer-delete/watch.deleted.yaml` around lines 10 - 34, The terminal DELETED snapshot for the finalizer-delete scenario still shows the old finalizer, so update the watch.deleted.yaml fixture to reflect the post-patch state by removing the metadata.finalizers block entirely. Make sure the deleted event matches the effect of audit.patch.yaml on the object, and verify the remaining metadata in watch.deleted.yaml still aligns with the finalizer-delete flow.

coderabbitai · 2026-06-25T11:49:46Z

+	if _, err := h.kube.CoreV1().ConfigMaps(s.ns).Get(ctx, "cm-dry", metav1.GetOptions{}); err == nil {
+		t.Error("dry-run create persisted an object; want none")
+	}


🎯 Functional Correctness | 🟡 Minor | ⚡ Quick win

Require NotFound here, not just “any error”.

This assertion passes on transport/auth failures too, so the test can claim “not persisted” when the lookup actually failed for an unrelated reason.

Suggested fix

-	if _, err := h.kube.CoreV1().ConfigMaps(s.ns).Get(ctx, "cm-dry", metav1.GetOptions{}); err == nil {
-		t.Error("dry-run create persisted an object; want none")
-	}
+	if _, err := h.kube.CoreV1().ConfigMaps(s.ns).Get(ctx, "cm-dry", metav1.GetOptions{}); err == nil {
+		t.Error("dry-run create persisted an object; want none")
+	} else if !apierrors.IsNotFound(err) {
+		t.Fatalf("dry-run create lookup failed with %v; want NotFound", err)
+	}

Ensure apierrors (k8s.io/apimachinery/pkg/api/errors) is imported.


🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@test/mutationlab/e2e/configmap_scenarios_test.go` around lines 118 - 120, The dry-run verification in configmap_scenarios_test should assert a NotFound error from the ConfigMaps().Get call instead of treating any error as success. Update the check around h.kube.CoreV1().ConfigMaps(s.ns).Get for "cm-dry" to explicitly validate that the failure is due to the object not existing, so transport or auth errors do not incorrectly pass the test.

Add the watch-transport rows via a targeted /watch-probe lab endpoint so 410-Expired ERROR (row 16) and BOOKMARK (row 17) events can be scenario-attributed, with a watch_probe recorder and watch_transport_test driver. Commit the watch-resync and watch-bookmark corpus. Update the design doc + README to mark rows 8, 10, 13, 16, 17 captured (fifteen of seventeen); only server-side apply and no-op apply (rows 3, 4) remain. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
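For context on rows 16 and 17: on a JSON watch stream, each event arrives as a {"type": ..., "object": ...} frame. An ERROR frame carries a Status object, and an expired resume point shows up with code 410; a BOOKMARK frame (sent only when the client sets allowWatchBookmarks) carries little more than metadata.resourceVersion. The classifier below is a hypothetical sketch of telling these apart, not the lab's watch_probe recorder.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// watchFrame is the outer shape of one event on a JSON watch stream.
type watchFrame struct {
	Type   string          `json:"type"`
	Object json.RawMessage `json:"object"`
}

// frameObject keeps only the fields this sketch inspects: the Status code of
// an ERROR frame and the resourceVersion of every other frame.
type frameObject struct {
	Code     int `json:"code"`
	Metadata struct {
		ResourceVersion string `json:"resourceVersion"`
	} `json:"metadata"`
}

// classifyFrame labels one frame and returns its resourceVersion, if any.
func classifyFrame(line []byte) (kind, rv string, err error) {
	var f watchFrame
	if err := json.Unmarshal(line, &f); err != nil {
		return "", "", err
	}
	var obj frameObject
	if err := json.Unmarshal(f.Object, &obj); err != nil {
		return "", "", err
	}
	switch f.Type {
	case "ERROR":
		if obj.Code == 410 {
			return "expired", "", nil // resume point compacted away: relist
		}
		return "error", "", nil
	case "BOOKMARK":
		return "bookmark", obj.Metadata.ResourceVersion, nil // safe resume point
	default:
		return f.Type, obj.Metadata.ResourceVersion, nil // ADDED / MODIFIED / DELETED
	}
}

func main() {
	for _, l := range []string{
		`{"type":"ERROR","object":{"kind":"Status","apiVersion":"v1","status":"Failure","reason":"Expired","code":410}}`,
		`{"type":"BOOKMARK","object":{"kind":"ConfigMap","apiVersion":"v1","metadata":{"resourceVersion":"12345"}}}`,
		`{"type":"ADDED","object":{"kind":"ConfigMap","apiVersion":"v1","metadata":{"name":"cm-a","resourceVersion":"12346"}}}`,
	} {
		kind, rv, err := classifyFrame([]byte(l))
		fmt.Println(kind, rv, err)
	}
}
```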

coderabbitai

🧹 Nitpick comments (3)

internal/mutationlab/labserver/api.go (1)
126-129: 🎯 Functional Correctness | 🔵 Trivial | ⚡ Quick win

Probe validation errors get a misleading 504.

Probe returns plain errors for client-side problems (empty scenario, unsupported mode, missing resource), but this path maps every probe error to 504 Gateway Timeout. The handler validates the resource up front yet never checks scenario/mode, so a malformed request body surfaces as a timeout rather than a 400. Consider distinguishing a real timeout (ctx.Err()/context.DeadlineExceeded) from validation/transport failures.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/mutationlab/labserver/api.go` around lines 126 - 129, The probe
error handling in the API handler is mapping all failures from Probe to 504
Gateway Timeout, which makes validation and transport errors look like timeouts.
Update the handler around the Probe call to distinguish real deadline/timeout
cases (for example via ctx.Err() or context.DeadlineExceeded) from client-side
validation failures such as empty scenario, unsupported mode, or missing
resource, and return a 400 for the latter. Use the existing Probe and handler
logic in api.go to keep the response codes aligned with the actual failure
source.
internal/watch/watch_compare.go (1)
141-153: 🚀 Performance & Scalability | 🔵 Trivial | 💤 Low value

Divergence gauge can go stale for types that stop being serviceable.

recordWatchAuditDivergence re-records zeros each pass, which correctly clears a healed divergence. But the loop only records gauges for types still returned by Inventory() with Serviceable() == true. When a type is released/unclaimed (or removed from inventory) while diverging, its last non-zero {gvr, reason} gauge value is never reset to 0 and lingers, falsely signalling ongoing divergence for a type no longer being compared.

For an experimental Phase-1 meter this is low impact, but consider emitting a final zero-out when a previously-recorded GVR drops out of the serviceable set.

Also applies to: 172-194
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/watch/watch_compare.go` around lines 141 - 153, The divergence gauge
handling in compareWatchAuditAllTypes only updates currently serviceable
Inventory() entries, so stale non-zero values can remain when a GVR drops out of
the serviceable set. Update the watch audit flow to track previously recorded
GVRs and, after iterating the current serviceable types, emit a final zero-value
record for any GVR that was seen before but is no longer returned by Inventory()
or Serviceable(). Apply the same cleanup behavior anywhere the watch-audit
comparison loop is duplicated so recordWatchAuditDivergence no longer leaves
lingering divergence metrics.
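Below is a minimal sketch of the suggested cleanup. It assumes the comparison loop can pass a per-pass map of {GVR, reason} values to a small tracker. `divergenceKey`, `gaugeSink`, and `divergenceTracker` are hypothetical stand-ins for the real meter and recordWatchAuditDivergence.

```go
package main

// divergenceKey identifies one gauge series: a GVR plus a divergence reason.
type divergenceKey struct {
	GVR    string
	Reason string
}

// gaugeSink is a hypothetical stand-in for the real gauge recorder.
type gaugeSink func(k divergenceKey, v float64)

// divergenceTracker remembers every series written on the previous pass.
type divergenceTracker struct {
	last map[divergenceKey]float64
}

// record writes this pass's values, then zeroes any series from the previous
// pass that is missing now (type released, unclaimed, or not serviceable).
func (t *divergenceTracker) record(current map[divergenceKey]float64, sink gaugeSink) {
	next := make(map[divergenceKey]float64, len(current))
	for k, v := range current {
		sink(k, v)
		next[k] = v
	}
	for k := range t.last {
		if _, still := current[k]; !still {
			sink(k, 0)
		}
	}
	t.last = next
}
```

If both comparison paths (lines 141-153 and 172-194) feed the same tracker, the duplicated loop gets the same cleanup without a second implementation.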
internal/queue/redis_watch_splice.go (1)
73-79: 🚀 Performance & Scalability | 🔵 Trivial

Ensure MaxLen constraint on :watch:stream to prevent unbounded read costs.

XRange in SpliceWatchType reads the entire stream (from start="-" to end="+") on every pass. While foldWatchEntry discards old entries, the network and memory cost scales with the total event history if the stream is not trimmed.

The write path (AppendWatchEvent -> xaddID) applies MaxLen only if q.maxLen > 0. If the queue instance is initialized without a length limit or if the watch stream is configured independently, it will grow indefinitely.

Consider either:

- enforcing a MaxLen or XTrim strategy on the RedisByTypeStreamQueue initialization for watch streams; or
- reading only from the checkpoint position in the stream key's native ID space (this is complex with auto-IDs, so the MaxLen approach is preferred).
Relevant Code
// internal/queue/redis_watch_splice.go:73
msgs, err := s.client.XRange(ctx, base+byTypeWatchStreamSuffix, "-", "+").Result()
...

// internal/queue/redis_bytype_queue.go:644-647
args := &redis.XAddArgs{Stream: stream, ID: id, Values: values}
if q.maxLen > 0 {
    args.MaxLen = q.maxLen
    args.Approx = true
}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/queue/redis_watch_splice.go` around lines 73 - 79, The watch-stream
read path in SpliceWatchType currently scans the entire :watch:stream with
XRange, so it can become unbounded unless the stream is trimmed. Update the
RedisByTypeStreamQueue/watch-stream setup to enforce a MaxLen or XTrim policy
for the watch stream, and ensure AppendWatchEvent/xaddID applies that limit
consistently even when the queue is otherwise unbounded. Keep the fix localized
around RedisByTypeStreamQueue, AppendWatchEvent, and the XRange-based splice in
redis_watch_splice.go so the stream history stays bounded.
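Here is a sketch of the first option. It assumes the watch stream should get a fallback cap when the queue was built with maxLen 0. It models only the MaxLen and Approx fields of go-redis's XAddArgs, which are the two fields the quoted xaddID code sets, so the decision can be tested without Redis. `defaultWatchStreamMaxLen` and its value are hypothetical.

```go
package main

// defaultWatchStreamMaxLen is a hypothetical cap for :watch:stream when the
// queue itself was built without a length limit.
const defaultWatchStreamMaxLen int64 = 10000

// xaddLimits mirrors the two redis.XAddArgs fields that control trimming.
type xaddLimits struct {
	MaxLen int64
	Approx bool
}

// watchStreamLimits returns the trim settings for an append to the watch
// stream: honour the queue's own maxLen when set, otherwise fall back to the
// default so the "-".."+" XRange in SpliceWatchType stays bounded.
func watchStreamLimits(queueMaxLen int64) xaddLimits {
	maxLen := queueMaxLen
	if maxLen <= 0 {
		maxLen = defaultWatchStreamMaxLen
	}
	// Approx corresponds to MAXLEN ~, letting Redis trim lazily and cheaply.
	return xaddLimits{MaxLen: maxLen, Approx: true}
}
```

The cap has to be sized against how far the splice checkpoint can lag. If the stream trims entries that have not been folded yet, watch events are silently dropped instead of the read simply being bounded.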

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: 878bbb6f-c46c-40ee-b1a5-b18c3fdbe802

📥 Commits

Reviewing files that changed from the base of the PR and between 77e349f and d41cfab.

📒 Files selected for processing (27)

cmd/main.go
cmd/mutation-capture-lab/main.go
docs/design/mutation-capture-lab-design.md
docs/design/watch-only-ingestion-architecture.md
internal/mutationlab/labserver/api.go
internal/mutationlab/labserver/api_test.go
internal/mutationlab/normalize/normalize.go
internal/mutationlab/normalize/normalize_test.go
internal/mutationlab/recorder/recorder_test.go
internal/mutationlab/recorder/watch_probe.go
internal/queue/redis_watch_splice.go
internal/queue/redis_watch_splice_test.go
internal/telemetry/exporter.go
internal/watch/manager.go
internal/watch/watch_compare.go
internal/watch/watch_compare_test.go
test/mutationlab/README.md
test/mutationlab/corpus/configmap/no-op-apply/admission.update.yaml
test/mutationlab/corpus/configmap/no-op-apply/audit.patch.yaml
test/mutationlab/corpus/configmap/server-side-apply/admission.update.yaml
test/mutationlab/corpus/configmap/server-side-apply/audit.patch.yaml
test/mutationlab/corpus/configmap/server-side-apply/watch.modified.yaml
test/mutationlab/corpus/configmap/watch-bookmark/watch.bookmark.yaml
test/mutationlab/corpus/configmap/watch-resync/watch.error.yaml
test/mutationlab/e2e/configmap_scenarios_test.go
test/mutationlab/e2e/harness_test.go
test/mutationlab/e2e/watch_transport_test.go

✅ Files skipped from review due to trivial changes (7)

test/mutationlab/corpus/configmap/watch-resync/watch.error.yaml
test/mutationlab/corpus/configmap/watch-bookmark/watch.bookmark.yaml
test/mutationlab/corpus/configmap/server-side-apply/admission.update.yaml
test/mutationlab/corpus/configmap/no-op-apply/audit.patch.yaml
test/mutationlab/corpus/configmap/server-side-apply/audit.patch.yaml
test/mutationlab/README.md
docs/design/watch-only-ingestion-architecture.md

🚧 Files skipped from review as they are similar to previous changes (6)

internal/mutationlab/recorder/recorder_test.go
cmd/mutation-capture-lab/main.go
test/mutationlab/e2e/harness_test.go
internal/mutationlab/normalize/normalize_test.go
cmd/main.go
internal/mutationlab/normalize/normalize.go

kilo-code-bot · 2026-06-26T18:16:19Z

 func (m *Manager) ForgetGitTargetDeclaration(gitDest types.ResourceReference) {
-	m.clearTargetTypeWatermarks(gitDest)
+	m.forgetGitTargetWatches(gitDest)
+	m.forgetGitTargetUID(gitDest)


WARNING: forgetGitTargetUID can race with DeclareForGitTarget for recreated GitTargets

When a GitTarget is deleted and recreated with the same namespace/name, the controller calls ForgetGitTargetDeclaration with a UID-less gitDest (gittarget_controller.go:828). If DeclareForGitTarget for the new target has already stored its UID, this unconditional delete in forgetGitTargetUID wipes the new UID from the map. The new target's cursor resume then falls back to a fresh replay instead of resuming from the stored position.

Consider only deleting when the stored UID matches the one being forgotten, or skipping UID cleanup here and relying on DeclareForGitTarget overwrites plus the cursor TTL to evict stale entries.
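The first suggestion amounts to a compare-and-delete. Below is a minimal sketch. It assumes the Manager keeps a mutex-guarded map from namespace/name to UID. `uidRegistry`, `declare`, and `forgetIfMatches` are hypothetical names. A UID-less forget, such as the one sent from gittarget_controller.go, is treated as a no-op. That matches the second suggestion of relying on overwrites plus the cursor TTL.

```go
package main

import "sync"

// targetKey is a namespace/name pair; a recreated GitTarget reuses it.
type targetKey struct{ Namespace, Name string }

// uidRegistry is a hypothetical stand-in for the Manager's UID map.
type uidRegistry struct {
	mu   sync.Mutex
	uids map[targetKey]string
}

// declare records (or overwrites) the UID for a GitTarget.
func (r *uidRegistry) declare(k targetKey, uid string) {
	r.mu.Lock()
	defer r.mu.Unlock()
	if r.uids == nil {
		r.uids = make(map[targetKey]string)
	}
	r.uids[k] = uid
}

// forgetIfMatches deletes the entry only if it still holds the UID being
// forgotten; an empty uid is "unknown" and never deletes anything.
func (r *uidRegistry) forgetIfMatches(k targetKey, uid string) bool {
	r.mu.Lock()
	defer r.mu.Unlock()
	if uid == "" || r.uids[k] != uid {
		return false
	}
	delete(r.uids, k)
	return true
}

// get returns the stored UID, or "" when none is recorded.
func (r *uidRegistry) get(k targetKey) string {
	r.mu.Lock()
	defer r.mu.Unlock()
	return r.uids[k]
}
```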

…nt in the writer

The acceptance gate that classifies unsafe/unsupported folder content existed but was wired only into the manifest-analyzer CLI. The running operator built its store with an empty allowlist and never called Accept. As a result, a folder with a hard-Kustomize kustomization.yaml (patches/generators/components/...), duplicate identities, impure managed files, or standalone non-KRM YAML was detected but written into anyway. This enforces the gate in the live writer and in the resync (first-materialization) apply, for the cases we already know are a problem from structure alone:

- manifestanalyzer: add IssueUnsupportedKustomize + RetainedDocument.Unsupported (surfaced from the existing hasUnsupportedKustomizeFeature predicate), an AcceptStructureOnly entrypoint that runs the structural refusals but never the discovery-derived mapping refusals (which can blink on a wobble), a typed AcceptanceRefusedError, and a WriterAllowlist that retains the operator's own .sops.yaml + kustomize build directives so they are never mis-refused.
- git writer: run the structure-only gate over the built store in both flushEventsToWorktree (live) and applyResyncToWorktree (resync); a refusal aborts the commit before any file is touched and returns the typed error.

Surfacing the refusal on GitTarget status (a Blocked stream) follows in the next commit.

Design: docs/design/unsupported-folder-refusal-plan.md.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

When the first-materialization resync is refused by the acceptance gate (an unsupported kustomization, duplicate identity, impure or non-KRM file), the watch layer now surfaces it on GitTarget status instead of only logging:

- thread the (GVR, namespace) key through enqueueReplayResync to drainScopedResync so a refusal can mark the right type's stream;
- in the drain, recover the typed *AcceptanceRefusedError via errors.As and mark the stream Blocked with reason UnsupportedContent and a message naming the file; a refusal is NOT counted as a generic background resync failure (nothing was written; it is user-actionable, not a transient fault);
- add the StreamReasonUnsupportedContent reason.

This flows through the existing data-plane surface with no schema change: status.streams.blocked rises, the StreamsReady condition goes False with reason UnsupportedContent, and phase becomes Degraded. That is a clean fit for the current two-axis GitTarget status. The folder is left untouched until a human cleans it.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Adds test/e2e/unsupported_folder_e2e_test.go: seeds a GitTarget path with a hard-Kustomize kustomization.yaml (a patches block) plus the ConfigMap it references, creates the GitTarget + a ConfigMap WatchRule, and asserts that the control plane is configured (Ready=True) while the data plane is blocked (StreamsReady=False, reason UnsupportedContent) and the operator commits nothing on top of the seed.

- architecture.md: the Mark-and-Sweep Resync section now documents the structure-only acceptance gate and the Blocked-stream refusal, replacing the stale "unsafe YAML is left alone" claim.
- e2e-coverage-gaps plan: flip Test D from "blocked on a product decision" to IMPLEMENTED.
- lint: add IssueUnsupportedKustomize to the exhaustive analyzer_test map; use require.ErrorAs and wrap a long line in the git acceptance-gate test.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

… go True

createGitTarget always enables SOPS encryption referencing the shared sops-age-key secret. Without that secret, the GitTarget's EncryptionConfigured gate fails and Ready never becomes True, so Test D timed out on the Ready check before reaching the refusal assertion. Apply the key in BeforeAll like the other manager specs.

Verified end to end against a redeployed operator: the hard-Kustomize folder is refused with StreamsReady=False/UnsupportedContent and no commit is produced.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

kilo-code-bot · 2026-06-27T12:49:52Z

+			Expect(condition.Reason).To(Equal(ReasonProgressing))
+			Expect(streamsRunning.Status).To(Equal(metav1.ConditionFalse))
+			Expect(streamsRunning.Reason).To(Equal(watch.StreamReasonNoResolvedTypes))
+			Expect(gitTargetReady.Status).To(Equal(metav1.ConditionFalse))


WARNING: Test asserts GitTargetReady=False but controller produces Unknown for unevaluated GitTarget

The gitTargetReadyCondition function returns Unknown when the GitTarget has no status conditions (gittarget_dependency_status.go:57), which correctly signals "not yet evaluated." This test creates a GitTarget without any prior reconcile, so the WatchRule’s setGitTargetReadyCondition assigns Unknown — not False.

Change the expectation to metav1.ConditionUnknown so the test matches the actual status of a freshly created GitTarget that has not yet published a Ready condition.
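The reviewer's reasoning comes down to a lookup that defaults to Unknown. Here is a sketch that uses local mirrors of metav1.ConditionStatus. `readyStatusOf` is a hypothetical reduction of gitTargetReadyCondition and is not its actual code.

```go
package main

// ConditionStatus mirrors metav1.ConditionStatus.
type ConditionStatus string

const (
	ConditionTrue    ConditionStatus = "True"
	ConditionFalse   ConditionStatus = "False"
	ConditionUnknown ConditionStatus = "Unknown"
)

// Condition keeps only the fields this sketch needs.
type Condition struct {
	Type   string
	Status ConditionStatus
}

// readyStatusOf returns the Ready status, treating a GitTarget that has not
// published Ready yet as "not evaluated" (Unknown), never as False.
func readyStatusOf(conds []Condition) ConditionStatus {
	for _, c := range conds {
		if c.Type == "Ready" {
			return c.Status
		}
	}
	return ConditionUnknown
}
```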

…nore

kilo-code-bot · 2026-06-27T21:44:22Z

+	BeforeAll(func() {
+		By("creating the foreign-content test namespace")
+		testNs = testNamespaceFor("manager-foreign-content")
+		_, _ = kubectlRun("create", "namespace", testNs)


WARNING: Error discarded without explanatory comment

kubectlRun("create", "namespace", testNs) silently ignores the returned error. Other e2e tests in this codebase annotate such calls with // idempotent; ignore AlreadyExists when the error is intentionally discarded. Without that annotation a genuine failure here (invalid name, RBAC denial, API server unavailable) will propagate into later steps and produce confusing assertion failures instead of a clear namespace-creation error.

sunib added 2 commits June 24, 2026 10:39

chore: add optional validatingAdmissionWebhook, and prepare for a sep…

b487bed

…erate testing project to exactly get how the kube-api versions are responding to edge cases

chore: creating plan and code to capture all the mechanisms in a more…

ec49609

… structured way (so that we can also use this to evaluate new (or older) Kubernetes versions

gemini-code-assist Bot reviewed Jun 24, 2026

coderabbitai Bot reviewed Jun 24, 2026

chore: here is M2!

876ff60

coderabbitai Bot reviewed Jun 24, 2026

chore: Interesting findings on the shallow body problem

e0b6369

coderabbitai Bot reviewed Jun 25, 2026

sunib and others added 7 commits June 25, 2026 06:16

chore: small fixes

758a629

chore: more improvements in the tests

2ccf07a

chore: first draft on architecture update

7c41668

chore: finishing for now

8dc3aaa

coderabbitai Bot reviewed Jun 25, 2026

sunib and others added 2 commits June 25, 2026 12:33

chore: finishing the design

d41cfab

coderabbitai Bot reviewed Jun 25, 2026

sunib added 9 commits June 25, 2026 18:25

feat: let's get all testing to Kubernetes 1.36

38f59f8

chore: reran all mutations on k8s v1.36.1

6dec610

chore: improving watch-ingestion document.

e612fc6

chore: getting the design docs better

d0e25a2

feat: watch-first ingestion

b2d5bc7

chore: next steps

5122032

docs: moving architecture along with the rewrite

79a9fe6

chore: relisten to a watch when possible

b524d83

chore: details on how Redis is needed

7e011db

docs: created new plan, and hopefully found why the tests are so flaky

d7bdb16

kilo-code-bot Bot reviewed Jun 26, 2026

sunib added 2 commits June 26, 2026 18:35

chore: easier status and streamsready

acf73d5

chore: e2e flake preventions

acaea33

sunib changed the title ~~chore: add lab-code to capture events for all types (so that we can make a better informed decision in edge cases).~~ Watch-first ingestion with (optional!) audit-based attribution Jun 26, 2026

chore: overall improvements, fixing things and cleaning docs

04aa391

sunib mentioned this pull request Jun 26, 2026

[FEATURE] Watch mode — native Kubernetes resource watcher as an alternative to the audit webhook #168

Open

sunib and others added 8 commits June 26, 2026 21:24

feat: reworking metrics to new architecture

915b524

docs: adding skills and working on status design

f2773a8

chore: improve status, support kstatus

92fa490

chore: refining names and more explicit e2e test for status behaviour

12f3aa2

kilo-code-bot Bot reviewed Jun 27, 2026

sunib added 2 commits June 27, 2026 21:30

docs: designing gittargetignore

419ab33

feat: refuse weird files in GitTarget path, but do allow .gittargetig…

1c61666

…nore

kilo-code-bot Bot reviewed Jun 27, 2026

chore: removing settings and preparing merge

b895ef3

Conversation

sunib commented Jun 24, 2026 • edited

coderabbitai Bot commented Jun 24, 2026 • edited

Reviews paused

Walkthrough

Changes

Estimated code review effort

Poem

❌ Failed checks (2 warnings)

gemini-code-assist Bot left a comment

Code Review

gemini-code-assist Bot Jun 24, 2026

coderabbitai Bot left a comment

coderabbitai Bot Jun 24, 2026

coderabbitai Bot left a comment

coderabbitai Bot Jun 24, 2026

kilo-code-bot Bot commented Jun 24, 2026 • edited

Code Review Summary

Previous review (commit 1c61666)

Overview

WARNING

Previous review (commit 12f3aa2)

Overview

WARNING

Previous review (commit 92fa490)

Previous review (commit f2773a8)

Previous review (commit 893e17f)

Previous review (commit d09ab73)

Previous review (commit cb8d4b0)

Previous review (commit 915b524)

Previous review (commit 04aa391)

Previous review (commit acaea33)

Previous review (commit acf73d5)

Overview

WARNING

Previous review (commit d7bdb16)

Overview

WARNING

Previous review (commit 7e011db)

Previous review (commit b524d83)

Overview

Incremental Review (b2d5bc7..79a9fe6)

Previous review (commit 79a9fe6)

Overview

Incremental Review (b2d5bc7..79a9fe6)

Previous review (commit b2d5bc7)

Overview

Incremental Review (e0b6369..HEAD)

coderabbitai Bot left a comment
