Watch-first ingestion with (optional!) audit-based attribution #174

Open
sunib wants to merge 37 commits into main from investigate

Conversation

@sunib sunib commented Jun 24, 2026

Contributor

GitOps Reverser mirrors changes from the Kubernetes API server into clean manifests in Git. Until now, the ingestion path was audit-first, which requires configuring kube-apiserver audit delivery to GitOps Reverser. That works well on clusters where the control plane is configurable, but that is not always the case on managed offerings (AKS, GKE, etc.).

Note that Redis (or Valkey) remains a required dependency, since we also plan to introduce HA support in the short term.

This PR investigates and implements a watch-first ingestion model. Kubernetes watches now provide the source of object state, while audit events are used only for author attribution when they are available. That simplifies the code path and makes installation easier: clusters without audit webhook support can still run in committer-only mode, while clusters with audit delivery get per-user commit attribution.
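
The fallback between per-user attribution and committer-only mode described above can be pictured roughly as follows. This is a minimal sketch, not the PR's implementation: `changeKey`, `authorFor`, and the idea of joining watch events to audit facts on UID plus resourceVersion are assumptions made for illustration.

```go
package main

import "fmt"

// changeKey is a hypothetical join key between a watch event and an audit
// attribution fact. The real PR may correlate the two differently.
type changeKey struct {
	UID             string
	ResourceVersion string
}

// authorFor returns the audited user when a matching attribution fact exists
// and falls back to the configured committer otherwise (committer-only mode).
func authorFor(key changeKey, facts map[changeKey]string, committer string) string {
	if user, ok := facts[key]; ok && user != "" {
		return user
	}
	return committer
}

func main() {
	// Made-up attribution facts, as they might arrive from audit delivery.
	facts := map[changeKey]string{
		{UID: "u1", ResourceVersion: "7"}: "alice",
	}

	// A watch event with a matching audit fact is attributed to the user.
	fmt.Println(authorFor(changeKey{UID: "u1", ResourceVersion: "7"}, facts, "gitops-reverser"))
	// Without audit delivery there are no facts, so the committer is used.
	fmt.Println(authorFor(changeKey{UID: "u2", ResourceVersion: "3"}, nil, "gitops-reverser"))
}
```

The point of the sketch is that object state never depends on the audit path: a missing fact only changes who the commit is attributed to.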

Open points:

  • Should there be an explicit opt-in or opt-out for audit events, or do we just accept and apply them? (The latter is what this PR currently does.)
  • Fix metrics
  • Add missing e2e tests

This also improves several related areas:

  • deletecollection now becomes normal per-resource delete events through watch delivery, so we no longer need to infer affected objects from audit data.
  • Object bodies are no longer stored in Redis. Redis now holds attribution facts and watch cursors, reducing Secret exposure compared with the old audit body queue.
  • The data model is clearer: each GitTarget owns the watches it needs, instead of sharing/reusing a more complex global materialization model.
  • The new mutation capture lab records and compares admission, audit, and watch behavior. This gives us independent test tooling for new Kubernetes versions and future ingestion decisions.
  • Removes the need for apiservice-audit-proxy in aggregated API situations: there appears to be no shallow-body problem in watch events (if only I had known that earlier!).

Fixes: #168

sunib added 2 commits June 24, 2026 10:39
…erate testing project to exactly get how the kube-api versions are responding to edge cases
… structure way (so that we can also use this to evalaute new (or older) Kubernetes versions

coderabbitai Bot commented Jun 24, 2026


Review Change Stack

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Walkthrough

Adds a validating admission webhook, a mutation-capture lab binary and recorders, watch-state streaming, normalization and corpus generation, and e2e scenarios with committed fixtures and deployment scripts.

Changes

Mutation capture lab and admission webhook

| Layer | File(s) | Summary |
|---|---|---|
| Admission webhook wiring | cmd/main.go, cmd/main_audit_server_test.go, internal/webhook/*, config/* | cmd/main.go adds admission-webhook flags, validation, and webhook-server wiring; manifests expose the HTTPS listener, certificate, and validating webhook configuration. |
| Lab record model, store, and API | internal/mutationlab/record.go, internal/mutationlab/store/*, internal/mutationlab/labserver/*, internal/mutationlab/recorder/* | The lab record shape, in-memory store, records API, health endpoints, GVR parsing utilities, and recorder handlers are added with unit tests. |
| Normalization and corpus | go.mod, internal/mutationlab/normalize/*, internal/mutationlab/corpus/golden.* | Normalization rewrites volatile Kubernetes fields into placeholders, and the golden corpus helpers build, compare, and write deterministic YAML snapshots. |
| Watch-state stream plumbing | cmd/main.go, internal/queue/*, internal/watch/*, internal/telemetry/exporter.go | The parallel watch-state stream is wired through the main flag set, Redis stream storage, watch-manager lifecycle, materialization transitions, comparison metrics, and restart logic with tests. |
| Lab runtime and packaging | cmd/mutation-capture-lab/main.go, test/mutationlab/Dockerfile, test/mutationlab/swap-image.sh, hack/e2e/lab-port-forward.sh, test/e2e/Taskfile.yml, test/mutationlab/README.md, test/mutationlab/e2e/doc.go, test/mutationlab/corpus/CLUSTER.md, docs/design/*, test/e2e/gittarget_overlap_e2e_test.go | The lab binary, container image, cluster swap script, port-forward helper, Taskfile targets, and design/readme docs are added for building, deploying, and running the lab. |
| ConfigMap scenarios and corpus | test/mutationlab/e2e/harness_test.go, test/mutationlab/e2e/configmap_scenarios_test.go, test/mutationlab/corpus/configmap/* | The shared harness and ConfigMap scenarios cover create, update, dry-run, reject, deletecollection, conflict, finalizer-delete, and owner-ref-cascade flows with matching fixtures. |
| Workload, CRD, and aggregated API scenarios | test/mutationlab/e2e/aggregated_api_test.go, test/mutationlab/e2e/crd_conversion_test.go, test/mutationlab/e2e/workload_scenarios_test.go, test/mutationlab/corpus/deployment/*, test/mutationlab/corpus/pod/*, test/mutationlab/corpus/widget/*, test/mutationlab/corpus/flunder/* | Aggregated API, CRD conversion, and workload scenarios add live tests plus the corresponding deployment, pod, widget, and flunder fixtures. |

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120 minutes

Poem

I hop through webhooks, soft and bright,
and sniff the corpus in the light.
With watch-state streams and carrot cheer,
my whiskers twinkle, year by year.
Thump! The lab is tame and near.

🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (2 warnings)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 45.02% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
| Description check | ⚠️ Warning | The PR description explains the goal but does not follow the required template and omits the Type of Change, Testing, and Checklist sections. | Add the template sections with selected change type(s), testing performed, checklist items, and any screenshots/notes if relevant. |

✅ Passed checks (3 passed)

| Check name | Status | Explanation |
|---|---|---|
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Title check | ✅ Passed | The title accurately summarizes the main shift toward watch-first ingestion with optional audit attribution. |

@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request introduces a validating admission webhook to the main controller and adds a new "mutation-capture lab" utility (cmd/mutation-capture-lab) along with its associated packages under internal/mutationlab/ to record and normalize Kubernetes watch, audit, and admission events into a versioned corpus. Feedback on the changes highlights a concurrency issue in the lab's shutdown sequence where multiple HTTP servers are shut down sequentially using a single shared timeout context, which could prevent subsequent servers from shutting down gracefully.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

Comment on lines +163 to +165
	for _, s := range servers {
		_ = s.srv.Shutdown(shutdownCtx)
	}


medium

Calling Shutdown sequentially on multiple servers using a single shared timeout context (shutdownCtx) can prevent subsequent servers from shutting down gracefully if a preceding server takes a long time to shut down. If the first server takes the entire 5 seconds to shut down, the context will expire, and the remaining servers will immediately fail to shut down gracefully.

We should shut down the servers concurrently.

	var shutdownWg sync.WaitGroup
	for _, s := range servers {
		shutdownWg.Add(1)
		go func(srv *http.Server) {
			defer shutdownWg.Done()
			_ = srv.Shutdown(shutdownCtx)
		}(s.srv)
	}
	shutdownWg.Wait()

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 11

🧹 Nitpick comments (2)
hack/e2e/lab-port-forward.sh (1)

71-71: 📐 Maintainability & Code Quality | 🔵 Trivial | 💤 Low value

Optional: rename the unused loop counter to _.

Shellcheck SC2034 flags i as unused; the loop body never references it.

♻️ Proposed tweak
-for i in {1..15}; do
+for _ in {1..15}; do
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@hack/e2e/lab-port-forward.sh` at line 71, The loop counter in the
port-forward retry loop is unused, triggering Shellcheck SC2034. Update the
`for` loop in `lab-port-forward.sh` to use `_` instead of `i` since the body
never references the counter, keeping the retry behavior unchanged.

Source: Linters/SAST tools

test/mutationlab/e2e/harness_test.go (1)

232-256: 🩺 Stability & Availability | 🔵 Trivial | ⚡ Quick win

`do` can hang indefinitely: `http.DefaultClient` has no timeout.

The 30s deadline only bounds the retry loop between attempts. A single http.DefaultClient.Do has no client-side timeout, so if the port-forward accepts the connection but stalls without responding (a plausible failure for the watchdog-restarted forward described in the comment), the call blocks forever and the loop deadline is never re-evaluated. Use a client with an explicit per-request timeout (or a request context with deadline).

♻️ Proposed fix
-func (h *harness) do(method, path string) ([]byte, int, error) {
-	var lastErr error
-	deadline := time.Now().Add(30 * time.Second)
-	for time.Now().Before(deadline) {
-		req, err := http.NewRequest(method, h.apiURL+path, nil)
-		if err != nil {
-			return nil, 0, err
-		}
-		resp, err := http.DefaultClient.Do(req)
+func (h *harness) do(method, path string) ([]byte, int, error) {
+	var lastErr error
+	client := &http.Client{Timeout: 10 * time.Second}
+	deadline := time.Now().Add(30 * time.Second)
+	for time.Now().Before(deadline) {
+		req, err := http.NewRequest(method, h.apiURL+path, nil)
+		if err != nil {
+			return nil, 0, err
+		}
+		resp, err := client.Do(req)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/mutationlab/e2e/harness_test.go` around lines 232 - 256, The harness do
method can block forever because http.DefaultClient.Do has no per-request
timeout, so the loop deadline is never checked during a stalled request. Update
do to use a client or request context with an explicit timeout, and keep the
existing retry/deadline logic in h.do so each attempt fails fast before the 30s
overall limit is reached.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@cmd/main.go`:
- Around line 708-710: The admission webhook port validation in main’s startup
config check only rejects non-positive values and still allows values above the
valid TCP range. Update the validation around cfg.admissionWebhookPort to reject
ports greater than 65535 as well, and keep the error message clear in the same
check. Also add an invalid-flag test case covering
--admission-webhook-port=65536 to ensure the new range check is enforced.

In `@config/webhook/kustomization.yaml`:
- Around line 4-5: The kustomization.yaml entry currently leaves an extra
trailing blank line, which triggers YAMLlint. Update the kustomization content
so the final listed resource/reference (the validating-webhook.yaml entry) is
the last line in the file, with no blank line after it.

In `@config/webhook/validating-webhook.yaml`:
- Line 52: The YAML file has an extra trailing blank line that triggers the
empty-lines lint error; remove the final empty line so the document ends
cleanly. Update the webhook manifest content in validating-webhook.yaml and
verify there are no stray blank lines after the last YAML entry.
- Around line 38-51: Shorten the validating webhook timeout and clean up the
YAML formatting: in the webhook configuration with the rules block and
timeoutSeconds setting, reduce timeoutSeconds from 10 to a few seconds so all
CREATE/UPDATE/DELETE requests are less likely to stall, and remove the trailing
blank line at the end of the file to satisfy the repo’s YAML lint rules.

In `@docs/design/mutation-capture-lab-design.md`:
- Around line 21-23: The deletecollection admission contract text is
inconsistent with the captured behavior and corpus references. Update the
mutation-capture design doc so the `deletecollection` row and any related
matrix/layout text consistently describe per-object validating admission calls
and per-object corpus outputs, and make sure the references in the
`deletecollection` section match the actual captured files and stack context
used by `admission.delete.collection.yaml` and the per-object
`admission.delete.cm-*` entries.
- Around line 24-27: The document has two conflicting statements about
deletecollection audit bodies, so reconcile them by making one versioned source
of truth in the design doc. Update the relevant sections around the
deletecollection discussion and the later guidance so they agree on whether
`responseObject` contains removed objects for the named cluster version, or
clearly label one as an earlier hypothesis that was superseded. Use the
`deletecollection` audit event and `responseObject` references to locate both
passages and align their wording.
- Around line 639-646: The setup description is inconsistent about how `task
lab-e2e` runs the lab: the isolated image-swap flow in `swap-image.sh` says it
reuses the already-prepared e2e cluster, but the M0/setup wording claims it
brings up its own k3d profile. Update the M0/setup text and any related wording
in the design doc to match the image-swap strategy, using the `task lab-e2e` and
`test/mutationlab/swap-image.sh` references consistently and removing any
mention of a separate cluster bring-up.

In `@internal/mutationlab/record.go`:
- Line 57: The Record.Key field currently uses omitempty on a plain ObjectKey,
which will still serialize as an empty object and mislead callers. Update the
Record struct to either remove omitempty from Key in record.go, or change Key to
*ObjectKey if it is meant to be optional; use the Record type and its Key field
as the place to make this adjustment.

In `@test/mutationlab/corpus/configmap/create-succeeds/admission.create.yaml`:
- Around line 53-56: Normalize the cluster-specific
`userInfo.extra["authentication.kubernetes.io/credential-id"]` value in the
admission/audit corpus so it stays stable across runs. Update the relevant
corpus normalization path to replace this fingerprint with a fixed placeholder,
or normalize the entire `userInfo.extra` map if that is the shared pattern, and
ensure the fixture under the admission create-succeeds case follows the same
rule.

In
`@test/mutationlab/corpus/configmap/deletecollection/audit.deletecollection.yaml`:
- Around line 60-63: The audit fixture still hardcodes a concrete
authentication.kubernetes.io/credential-id fingerprint, so update this corpus
entry to use a placeholder value instead of a real cluster-specific SHA256.
Locate the audit payload fixture under the deletecollection corpus and replace
the literal credential-id in the extra section so compare/write logic sees a
normalized, stable value.

In `@test/mutationlab/e2e/harness_test.go`:
- Around line 221-226: clearRecords currently only checks the error from
harness.do and can silently accept non-2xx responses, leaving stale records
behind. Update clearRecords in harness_test.go to validate the HTTP status
returned by h.do for the "/records" DELETE call, and fail the test unless it is
the expected http.StatusNoContent; keep the existing test helper flow and use
the clearRecords and do methods to locate the change.

---

Nitpick comments:
In `@hack/e2e/lab-port-forward.sh`:
- Line 71: The loop counter in the port-forward retry loop is unused, triggering
Shellcheck SC2034. Update the `for` loop in `lab-port-forward.sh` to use `_`
instead of `i` since the body never references the counter, keeping the retry
behavior unchanged.

In `@test/mutationlab/e2e/harness_test.go`:
- Around line 232-256: The harness do method can block forever because
http.DefaultClient.Do has no per-request timeout, so the loop deadline is never
checked during a stalled request. Update do to use a client or request context
with an explicit timeout, and keep the existing retry/deadline logic in h.do so
each attempt fails fast before the 30s overall limit is reached.
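
For the `cmd/main.go` port finding in the inline comments above, the range check could be sketched as follows. `validatePort` and `maxTCPPort` are hypothetical names; the actual validation lives in the startup config check around `cfg.admissionWebhookPort` and may be structured differently.

```go
package main

import "fmt"

// maxTCPPort is the highest valid TCP port number.
const maxTCPPort = 65535

// validatePort rejects values outside the valid TCP range 1..65535.
// flagName is only used to build the error message.
func validatePort(flagName string, port int) error {
	if port < 1 || port > maxTCPPort {
		return fmt.Errorf("--%s must be between 1 and %d, got %d", flagName, maxTCPPort, port)
	}
	return nil
}

func main() {
	for _, p := range []int{0, 9443, 65535, 65536} {
		fmt.Printf("%d: %v\n", p, validatePort("admission-webhook-port", p))
	}
}
```

A table-driven test with 0, 65535, and 65536 as cases would cover both boundaries, including the `--admission-webhook-port=65536` case the review asks for.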
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: 321027e0-fe96-4101-9dba-8bdfb939a640

📥 Commits

Reviewing files that changed from the base of the PR and between 1eb59fd and ec49609.

📒 Files selected for processing (57)
  • cmd/main.go
  • cmd/main_audit_server_test.go
  • cmd/mutation-capture-lab/main.go
  • config/certs/certificates.yaml
  • config/deployment.yaml
  • config/kustomization.yaml
  • config/service.yaml
  • config/webhook/kustomization.yaml
  • config/webhook/validating-webhook.yaml
  • docs/design/mutation-capture-lab-design.md
  • go.mod
  • hack/e2e/lab-port-forward.sh
  • internal/mutationlab/corpus/golden.go
  • internal/mutationlab/corpus/golden_test.go
  • internal/mutationlab/labserver/api.go
  • internal/mutationlab/labserver/api_test.go
  • internal/mutationlab/labserver/resources.go
  • internal/mutationlab/labserver/resources_test.go
  • internal/mutationlab/normalize/normalize.go
  • internal/mutationlab/normalize/normalize_test.go
  • internal/mutationlab/record.go
  • internal/mutationlab/recorder/admission.go
  • internal/mutationlab/recorder/audit.go
  • internal/mutationlab/recorder/recorder_test.go
  • internal/mutationlab/recorder/scenario.go
  • internal/mutationlab/recorder/watch.go
  • internal/mutationlab/store/store.go
  • internal/mutationlab/store/store_test.go
  • internal/webhook/admission_allow_handler.go
  • internal/webhook/admission_allow_handler_test.go
  • test/e2e/Taskfile.yml
  • test/e2e/gittarget_overlap_e2e_test.go
  • test/mutationlab/Dockerfile
  • test/mutationlab/README.md
  • test/mutationlab/corpus/CLUSTER.md
  • test/mutationlab/corpus/configmap/create-succeeds/admission.create.yaml
  • test/mutationlab/corpus/configmap/create-succeeds/audit.create.yaml
  • test/mutationlab/corpus/configmap/create-succeeds/watch.added.yaml
  • test/mutationlab/corpus/configmap/deletecollection/admission.delete.cm-a.yaml
  • test/mutationlab/corpus/configmap/deletecollection/admission.delete.cm-b.yaml
  • test/mutationlab/corpus/configmap/deletecollection/admission.delete.cm-c.yaml
  • test/mutationlab/corpus/configmap/deletecollection/audit.deletecollection.yaml
  • test/mutationlab/corpus/configmap/deletecollection/watch.deleted.cm-a.yaml
  • test/mutationlab/corpus/configmap/deletecollection/watch.deleted.cm-b.yaml
  • test/mutationlab/corpus/configmap/deletecollection/watch.deleted.cm-c.yaml
  • test/mutationlab/corpus/configmap/dry-run-create/admission.create.yaml
  • test/mutationlab/corpus/configmap/dry-run-create/audit.create.yaml
  • test/mutationlab/corpus/configmap/record-and-reject/admission.create.yaml
  • test/mutationlab/corpus/configmap/record-and-reject/audit.create.yaml
  • test/mutationlab/corpus/configmap/update/admission.update.yaml
  • test/mutationlab/corpus/configmap/update/audit.update.yaml
  • test/mutationlab/corpus/configmap/update/watch.modified.yaml
  • test/mutationlab/e2e/create_succeeds_test.go
  • test/mutationlab/e2e/doc.go
  • test/mutationlab/e2e/harness_test.go
  • test/mutationlab/e2e/m1_scenarios_test.go
  • test/mutationlab/swap-image.sh

Comment thread cmd/main.go
Comment thread config/webhook/kustomization.yaml Outdated
Comment thread config/webhook/validating-webhook.yaml Outdated
Comment thread config/webhook/validating-webhook.yaml Outdated
Comment thread docs/design/mutation-capture-lab-design.md Outdated
Comment thread docs/design/mutation-capture-lab-design.md Outdated
	Source     Source    `json:"source"`
	Scenario   string    `json:"scenario,omitempty"`
	ObservedAt time.Time `json:"observedAt"`
	Key        ObjectKey `json:"key,omitempty"`


📐 Maintainability & Code Quality | 🟡 Minor

🧩 Analysis chain

🌐 Web query:

Does Go encoding/json omitempty work on non-pointer struct fields?

💡 Result:

Only when the value counts as "empty" under encoding/json's definition, which a plain struct value never does. The omitempty option omits a field whose value is false, 0, a nil pointer, a nil interface value, or an array, slice, map, or string of length zero [1]. A struct value is not on that list, so a non-pointer struct field is still emitted even when all of its subfields are zero. This is why a zero time.Time (a non-pointer struct) is not omitted by omitempty [2][3].

Recommended pattern: use a pointer for an optional nested struct, so the field can be nil and therefore empty under omitempty [4][5].

Alternative (Go 1.24+): use the omitzero tag, which omits a field when its value is the zero value (and can consult an IsZero method), so non-pointer types like time.Time are omitted when zero [1][6].

Citations:


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Inspect the relevant files and definitions.
git ls-files internal/mutationlab/record.go internal/mutationlab | sed -n '1,120p'
echo '--- record.go ---'
cat -n internal/mutationlab/record.go
echo '--- ObjectKey search ---'
rg -n "type ObjectKey|ObjectKey struct|ObjectKey" internal -g '!**/*_test.go'

Repository: ConfigButler/gitops-reverser

Length of output: 7786


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Inspect the relevant file and the ObjectKey type before rewriting the review comment.
echo '--- internal/mutationlab/record.go ---'
cat -n internal/mutationlab/record.go

echo '--- ObjectKey definition(s) ---'
rg -n "type ObjectKey|ObjectKey struct|type .*ObjectKey" internal -g '!**/*_test.go'

echo '--- surrounding matches ---'
rg -n -A3 -B3 "ObjectKey" internal/mutationlab internal -g '!**/*_test.go'

Repository: ConfigButler/gitops-reverser

Length of output: 21090


Remove omitempty from Record.Key

encoding/json does not treat a plain struct field as empty, so Key will still be emitted as "key": {} when all of its fields are zero. If the field should be optional, make it *ObjectKey; otherwise drop omitempty to avoid misleading behavior.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/mutationlab/record.go` at line 57, The Record.Key field currently
uses omitempty on a plain ObjectKey, which will still serialize as an empty
object and mislead callers. Update the Record struct to either remove omitempty
from Key in record.go, or change Key to *ObjectKey if it is meant to be
optional; use the Record type and its Key field as the place to make this
adjustment.
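
The difference between the two options is easy to reproduce. In this sketch `ObjectKey` is a stand-in with assumed fields (the real type is defined in `record.go`), and `withValue` / `withPointer` are hypothetical wrappers that mirror only the `Key` field of `Record`.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// ObjectKey is a stand-in for the real type; its fields are assumptions.
type ObjectKey struct {
	Namespace string `json:"namespace,omitempty"`
	Name      string `json:"name,omitempty"`
}

// withValue mirrors the current shape: omitempty on a plain struct field.
type withValue struct {
	Key ObjectKey `json:"key,omitempty"`
}

// withPointer mirrors the optional shape: omitempty on a pointer field.
type withPointer struct {
	Key *ObjectKey `json:"key,omitempty"`
}

// mustJSON marshals v and panics on error, to keep the example short.
func mustJSON(v any) string {
	b, err := json.Marshal(v)
	if err != nil {
		panic(err)
	}
	return string(b)
}

func main() {
	fmt.Println(mustJSON(withValue{}))   // prints {"key":{}}
	fmt.Println(mustJSON(withPointer{})) // prints {}
}
```

If every record is expected to carry a key, dropping `omitempty` is the smaller change; the pointer form only pays off when an absent key is a meaningful state.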

Comment thread test/mutationlab/corpus/configmap/create-succeeds/admission.create.yaml Outdated
Comment thread test/mutationlab/e2e/harness_test.go

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
internal/mutationlab/normalize/normalize.go (2)

398-420: 🎯 Functional Correctness | 🟠 Major | 🏗️ Heavy lift

Don't rewrite namespaces with global substring replacement.

replaceNamespaces rewrites every matching substring in the URI/key, not just the namespace field/segment. A path like /api/v1/namespaces/lab/configmaps/lab would normalize both lab occurrences, corrupting the object name as well as the namespace. This should replace only the namespace segment / JSON value, not every textual match.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/mutationlab/normalize/normalize.go` around lines 398 - 420, The
namespace normalization in replaceNamespaces is too broad because it does a
global substring replace, which can rewrite unrelated occurrences like object
names inside paths or keys. Update replaceNamespaces and its call sites such as
rewriteKey so namespaces are replaced only in the actual namespace segment or
JSON value being normalized, not every matching substring; keep the placeholder
logic in indices/nsByLen/ns and preserve the IP rewriting behavior in
rewriteKey.
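
The difference between global and segment-scoped replacement can be illustrated with the path from the finding. This is a minimal sketch, not the normalizer's code: `replaceNamespaceSegment` and the `<NS-1>` placeholder format are assumptions, and it only handles URI-style paths, not JSON values.

```go
package main

import (
	"fmt"
	"strings"
)

// replaceNamespaceSegment rewrites only the path segment that directly follows
// the first "namespaces" segment, and only when it equals ns. Other segments,
// such as an object name that happens to match the namespace, stay untouched.
func replaceNamespaceSegment(path, ns, placeholder string) string {
	parts := strings.Split(path, "/")
	for i := 0; i+1 < len(parts); i++ {
		if parts[i] == "namespaces" {
			if parts[i+1] == ns {
				parts[i+1] = placeholder
			}
			break // only the first namespaces/<name> pair is the namespace
		}
	}
	return strings.Join(parts, "/")
}

func main() {
	path := "/api/v1/namespaces/lab/configmaps/lab"

	// Global substring replacement also rewrites the object name.
	fmt.Println(strings.ReplaceAll(path, "lab", "<NS-1>"))
	// prints /api/v1/namespaces/<NS-1>/configmaps/<NS-1>

	// Segment-scoped replacement leaves the object name intact.
	fmt.Println(replaceNamespaceSegment(path, "lab", "<NS-1>"))
	// prints /api/v1/namespaces/<NS-1>/configmaps/lab
}
```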

201-220: 🩺 Stability & Availability | 🟠 Major | ⚡ Quick win

Normalize authentication.kubernetes.io/credential-id before writing corpus.

The array path here only rewrites sourceIPs, so user.extra["authentication.kubernetes.io/credential-id"] survives verbatim. The new corpus already commits raw fingerprint values in test/mutationlab/corpus/deployment/scale-patch/audit.patch.yaml, Lines 41-42, and test/mutationlab/corpus/pod/graceful-delete/admission.delete.yaml, Lines 210-212. A rotated admin cert or fresh cluster bootstrap will change those values and fail corpus comparison even when behavior is unchanged.

Also applies to: 344-357

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/mutationlab/normalize/normalize.go` around lines 201 - 220, The
corpus normalization in collector.collectScalar only special-cases sourceIPs, so
authentication.kubernetes.io/credential-id is still written verbatim and causes
unstable corpus diffs. Add normalization for that key in collectScalar, ideally
by rewriting its values to a deterministic token before they reach c.orderedFor,
and keep the handling scoped to the existing collector logic so user.extra
values are normalized consistently across corpus generation.
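
One way to picture the credential-id scrubbing is a small helper over the `extra` map. This is a sketch under assumptions: `scrubCredentialID` and the `<CREDENTIAL-ID>` placeholder are hypothetical, the fingerprint value is made up, and the real fix would live inside `collector.collectScalar` rather than operate on a typed map.

```go
package main

import "fmt"

// credentialIDKey is the user.extra key named in the finding.
const credentialIDKey = "authentication.kubernetes.io/credential-id"

// scrubCredentialID returns a copy of extra in which every credential-id value
// is replaced by placeholder. All other keys and values are copied unchanged.
func scrubCredentialID(extra map[string][]string, placeholder string) map[string][]string {
	out := make(map[string][]string, len(extra))
	for key, vals := range extra {
		copied := make([]string, len(vals))
		for i, v := range vals {
			if key == credentialIDKey {
				copied[i] = placeholder
			} else {
				copied[i] = v
			}
		}
		out[key] = copied
	}
	return out
}

func main() {
	// Made-up input; the fingerprint is not a real certificate hash.
	extra := map[string][]string{
		credentialIDKey:   {"X509SHA256=0123abcd"},
		"example.com/team": {"platform"},
	}
	fmt.Println(scrubCredentialID(extra, "<CREDENTIAL-ID>"))
}
```

Replacing the value with a fixed token, rather than dropping the key, keeps the fixture honest about the field being present while making it stable across certificate rotation and fresh cluster bootstraps.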
🧹 Nitpick comments (1)
test/mutationlab/e2e/m2_scenarios_test.go (1)

90-90: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win

Rename these tests to the repo's TestFunctionName_Scenario form.

TestStatusSubresource, TestScaleSubresource, and TestGracefulDelete don't follow the *_test.go naming convention the repo documents. TestStatusSubresource_WatchOnly, etc., would align with the rest of the suite. As per coding guidelines, "Follow naming convention 'TestFunctionName_Scenario(t *testing.T)' for Go tests".

Also applies to: 127-127, 190-190

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/mutationlab/e2e/m2_scenarios_test.go` at line 90, Rename the Go test
functions to match the repo’s TestFunctionName_Scenario convention so they align
with the rest of the suite. Update TestStatusSubresource, TestScaleSubresource,
and TestGracefulDelete in m2_scenarios_test.go to scenario-style names such as
TestStatusSubresource_WatchOnly, and keep the rest of each test body unchanged.

Source: Coding guidelines

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@docs/design/mutation-capture-lab-design.md`:
- Around line 280-281: The matrix row about `matchPolicy: Equivalent` overstates
that the webhook always sees the submitted version; update the wording in the
design note to reflect that the apiserver may send a converted object while
`requestKind` and `requestResource` preserve the original version, or change the
policy mention to `Exact` if the intent is to require the raw submitted version.
Use the existing `matchPolicy`, `requestKind`, `requestResource`, and
`apiVersions: ['*']` references in the same section to keep the statement
aligned with how the recorder observes the admission request.

In `@test/mutationlab/e2e/m2_scenarios_test.go`:
- Around line 171-172: The gracefulPod scenario uses the mutable busybox:1.36
tag, which can change the recorded imageID and drift the golden corpus. Update
the pod spec in the gracefulPod setup to pin the image by digest, or adjust the
normalization/corpus generation around the pod status imageID so it is scrubbed
consistently; use the gracefulPod test case and the image field in
m2_scenarios_test.go as the place to fix it.
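
For the `matchPolicy: Equivalent` finding above, a minimal sketch of how a recorder could tell whether the apiserver delivered a converted object. It assumes only the `kind` and `requestKind` fields of the AdmissionReview request; the struct names, the `wasConverted` helper, and the sample payload are hypothetical and not taken from the lab's recorder.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// gvk mirrors the group/version/kind triple carried in an admission request.
type gvk struct {
	Group   string `json:"group"`
	Version string `json:"version"`
	Kind    string `json:"kind"`
}

// admissionRequest keeps only the two fields this comparison needs.
type admissionRequest struct {
	Kind        gvk  `json:"kind"`
	RequestKind *gvk `json:"requestKind,omitempty"`
}

type admissionReview struct {
	Request admissionRequest `json:"request"`
}

// wasConverted (hypothetical helper) reports whether the object sent to the
// webhook is in a different group/version/kind than the client submitted.
func wasConverted(raw []byte) (bool, error) {
	var review admissionReview
	if err := json.Unmarshal(raw, &review); err != nil {
		return false, err
	}
	req := review.Request
	if req.RequestKind == nil {
		// No original kind recorded, so nothing to compare against.
		return false, nil
	}
	return *req.RequestKind != req.Kind, nil
}

func main() {
	// Illustrative payload: the client submitted apps/v1beta1 while the
	// webhook rule matched apps/v1.
	raw := []byte(`{"request":{
		"kind":{"group":"apps","version":"v1","kind":"Deployment"},
		"requestKind":{"group":"apps","version":"v1beta1","kind":"Deployment"}}}`)
	converted, err := wasConverted(raw)
	if err != nil {
		panic(err)
	}
	fmt.Println("converted:", converted)
}
```

If the design note is meant to require the raw submitted version, a check like this would make a converted delivery visible in the recording instead of leaving it implicit.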

---

Outside diff comments:
In `@internal/mutationlab/normalize/normalize.go`:
- Around line 398-420: The namespace normalization in replaceNamespaces is too
broad because it does a global substring replace, which can rewrite unrelated
occurrences like object names inside paths or keys. Update replaceNamespaces and
its call sites such as rewriteKey so namespaces are replaced only in the actual
namespace segment or JSON value being normalized, not every matching substring;
keep the placeholder logic in indices/nsByLen/ns and preserve the IP rewriting
behavior in rewriteKey.
- Around line 201-220: The corpus normalization in collector.collectScalar only
special-cases sourceIPs, so authentication.kubernetes.io/credential-id is still
written verbatim and causes unstable corpus diffs. Add normalization for that
key in collectScalar, ideally by rewriting its values to a deterministic token
before they reach c.orderedFor, and keep the handling scoped to the existing
collector logic so user.extra values are normalized consistently across corpus
generation.
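
For the replaceNamespaces finding above, a minimal sketch of segment-scoped replacement. It assumes the value being normalized is a slash-separated API path and that the namespace always follows a `namespaces` segment; `replaceNamespaceSegment` and the `<ns-1>` placeholder format are hypothetical, not the normalizer's actual names.

```go
package main

import (
	"fmt"
	"strings"
)

// replaceNamespaceSegment (hypothetical helper) rewrites a namespace only when
// it is the path segment directly after "namespaces". Other segments that
// merely contain the namespace string are left untouched.
func replaceNamespaceSegment(path string, placeholders map[string]string) string {
	parts := strings.Split(path, "/")
	for i := 1; i < len(parts); i++ {
		if parts[i-1] != "namespaces" {
			continue
		}
		if token, ok := placeholders[parts[i]]; ok {
			parts[i] = token
		}
	}
	return strings.Join(parts, "/")
}

func main() {
	placeholders := map[string]string{"lab": "<ns-1>"}
	path := "/api/v1/namespaces/lab/configmaps/lab-config"

	// A global substring replace also rewrites the object name.
	fmt.Println(strings.ReplaceAll(path, "lab", "<ns-1>"))
	// The segment-scoped variant only touches the namespace segment.
	fmt.Println(replaceNamespaceSegment(path, placeholders))
}
```

The same idea would apply to JSON values: replace only when the whole value equals a known namespace, rather than matching substrings.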

---

Nitpick comments:
In `@test/mutationlab/e2e/m2_scenarios_test.go`:
- Line 90: Rename the Go test functions to match the repo’s
TestFunctionName_Scenario convention so they align with the rest of the suite.
Update TestStatusSubresource, TestScaleSubresource, and TestGracefulDelete in
m2_scenarios_test.go to scenario-style names such as
TestStatusSubresource_WatchOnly, and keep the rest of each test body unchanged.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: 56f06b1b-0d31-4cf8-a9c0-4822cbb804bb

📥 Commits

Reviewing files that changed from the base of the PR and between ec49609 and 876ff60.

📒 Files selected for processing (31)
  • docs/design/mutation-capture-lab-design.md
  • internal/mutationlab/normalize/normalize.go
  • internal/mutationlab/normalize/normalize_test.go
  • test/mutationlab/corpus/configmap/create-succeeds/admission.create.yaml
  • test/mutationlab/corpus/configmap/create-succeeds/audit.create.yaml
  • test/mutationlab/corpus/configmap/create-succeeds/watch.added.yaml
  • test/mutationlab/corpus/configmap/deletecollection/admission.delete.cm-a.yaml
  • test/mutationlab/corpus/configmap/deletecollection/admission.delete.cm-b.yaml
  • test/mutationlab/corpus/configmap/deletecollection/admission.delete.cm-c.yaml
  • test/mutationlab/corpus/configmap/deletecollection/audit.deletecollection.yaml
  • test/mutationlab/corpus/configmap/deletecollection/watch.deleted.cm-a.yaml
  • test/mutationlab/corpus/configmap/deletecollection/watch.deleted.cm-b.yaml
  • test/mutationlab/corpus/configmap/deletecollection/watch.deleted.cm-c.yaml
  • test/mutationlab/corpus/configmap/dry-run-create/admission.create.yaml
  • test/mutationlab/corpus/configmap/dry-run-create/audit.create.yaml
  • test/mutationlab/corpus/configmap/record-and-reject/admission.create.yaml
  • test/mutationlab/corpus/configmap/record-and-reject/audit.create.yaml
  • test/mutationlab/corpus/configmap/update/admission.update.yaml
  • test/mutationlab/corpus/configmap/update/audit.update.yaml
  • test/mutationlab/corpus/configmap/update/watch.modified.yaml
  • test/mutationlab/corpus/deployment/scale-patch/audit.patch.yaml
  • test/mutationlab/corpus/deployment/scale-patch/watch.modified.1.yaml
  • test/mutationlab/corpus/deployment/scale-patch/watch.modified.2.yaml
  • test/mutationlab/corpus/deployment/status-update/watch.modified.1.yaml
  • test/mutationlab/corpus/deployment/status-update/watch.modified.2.yaml
  • test/mutationlab/corpus/pod/graceful-delete/admission.delete.yaml
  • test/mutationlab/corpus/pod/graceful-delete/watch.deleted.yaml
  • test/mutationlab/corpus/pod/graceful-delete/watch.modified.yaml
  • test/mutationlab/e2e/harness_test.go
  • test/mutationlab/e2e/m2_scenarios_test.go
  • test/mutationlab/swap-image.sh
✅ Files skipped from review due to trivial changes (21)
  • test/mutationlab/corpus/configmap/update/watch.modified.yaml
  • test/mutationlab/corpus/configmap/deletecollection/watch.deleted.cm-b.yaml
  • test/mutationlab/corpus/deployment/scale-patch/watch.modified.1.yaml
  • test/mutationlab/corpus/pod/graceful-delete/watch.deleted.yaml
  • test/mutationlab/corpus/configmap/deletecollection/watch.deleted.cm-a.yaml
  • test/mutationlab/corpus/configmap/record-and-reject/admission.create.yaml
  • test/mutationlab/corpus/configmap/create-succeeds/watch.added.yaml
  • test/mutationlab/corpus/configmap/dry-run-create/audit.create.yaml
  • test/mutationlab/corpus/configmap/update/audit.update.yaml
  • test/mutationlab/corpus/configmap/deletecollection/admission.delete.cm-b.yaml
  • test/mutationlab/corpus/pod/graceful-delete/watch.modified.yaml
  • test/mutationlab/corpus/deployment/status-update/watch.modified.2.yaml
  • test/mutationlab/corpus/configmap/deletecollection/watch.deleted.cm-c.yaml
  • test/mutationlab/corpus/configmap/dry-run-create/admission.create.yaml
  • test/mutationlab/corpus/configmap/record-and-reject/audit.create.yaml
  • test/mutationlab/corpus/configmap/create-succeeds/admission.create.yaml
  • test/mutationlab/corpus/configmap/deletecollection/admission.delete.cm-a.yaml
  • test/mutationlab/corpus/deployment/scale-patch/watch.modified.2.yaml
  • test/mutationlab/corpus/configmap/deletecollection/audit.deletecollection.yaml
  • test/mutationlab/corpus/deployment/status-update/watch.modified.1.yaml
  • test/mutationlab/corpus/configmap/create-succeeds/audit.create.yaml
🚧 Files skipped from review as they are similar to previous changes (4)
  • test/mutationlab/corpus/configmap/deletecollection/admission.delete.cm-c.yaml
  • test/mutationlab/corpus/configmap/update/admission.update.yaml
  • test/mutationlab/swap-image.sh
  • test/mutationlab/e2e/harness_test.go

Comment thread docs/design/mutation-capture-lab-design.md Outdated
Comment on lines +171 to +172
Image: "busybox:1.36",
Command: []string{"sleep", "3600"},

🩺 Stability & Availability | 🟠 Major | ⚡ Quick win

Pin this Pod image by digest or scrub imageID from normalization.

The committed corpus for this scenario preserves status.containerStatuses[].imageID in test/mutationlab/corpus/pod/graceful-delete/admission.delete.yaml, Lines 162-164, but gracefulPod pulls busybox:1.36 by mutable tag. If that tag is republished or resolves to a different digest, the golden corpus will drift even though the control-plane behavior under test did not change.
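
A minimal sketch of the scrubbing option, assuming the recorded object has already been decoded into generic `map[string]any` / `[]any` values. `scrubImageIDs` and the `<image-id>` placeholder are hypothetical names for illustration; the lab's normalizer may structure this differently.

```go
package main

import "fmt"

// imageIDPlaceholder is a hypothetical stable token for the golden corpus.
const imageIDPlaceholder = "<image-id>"

// scrubImageIDs walks a decoded manifest and replaces every string value
// stored under an "imageID" key, so a re-published tag cannot change the corpus.
func scrubImageIDs(node any) {
	switch v := node.(type) {
	case map[string]any:
		for key, child := range v {
			if key == "imageID" {
				if _, ok := child.(string); ok {
					v[key] = imageIDPlaceholder
					continue
				}
			}
			scrubImageIDs(child)
		}
	case []any:
		for _, child := range v {
			scrubImageIDs(child)
		}
	}
}

func main() {
	// Illustrative status fragment; the digest value is made up.
	pod := map[string]any{
		"status": map[string]any{
			"containerStatuses": []any{
				map[string]any{
					"image":   "busybox:1.36",
					"imageID": "docker.io/library/busybox@sha256:example",
				},
			},
		},
	}
	scrubImageIDs(pod)
	fmt.Println(pod)
}
```

Pinning the image by digest in the pod spec would avoid the need for this pass, at the cost of updating the digest by hand when the image is bumped.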

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/mutationlab/e2e/m2_scenarios_test.go` around lines 171 - 172, The
gracefulPod scenario uses the mutable busybox:1.36 tag, which can change the
recorded imageID and drift the golden corpus. Update the pod spec in the
gracefulPod setup to pin the image by digest, or adjust the normalization/corpus
generation around the pod status imageID so it is scrubbed consistently; use the
gracefulPod test case and the image field in m2_scenarios_test.go as the place
to fix it.

@kilo-code-bot

kilo-code-bot Bot commented Jun 24, 2026

Contributor

Code Review Summary

Status: No Issues Found | Recommendation: Merge

No new issues were found in the incremental changes. The changes in this PR are cleanup operations:

  • Removal of the obsolete ServiceAccountNamingPolicy (SANamePolicyBot) from the author resolver
  • Removal of AttributionServiceAccountCollapsed constant
  • Removal of legacy late nudge code from materialization.go
  • Removal of materializer fields from manager.go
  • Documentation updates for watch-first architecture

All previously identified issues have been addressed in prior commits.

Previous Review Summaries (16 snapshots, latest commit 1c61666)

Current summary above is authoritative. Previous snapshots are kept for context only.

Previous review (commit 1c61666)

Status: 1 Issue Found | Recommendation: Address before merge

Overview

Severity Count
WARNING 1
Issue Details (click to expand)

WARNING

File Line Issue
test/e2e/foreign_content_e2e_test.go 55 Error discarded without explanatory comment
Files Reviewed (24 files)
  • docs/design/gitpath-foreign-content-stringency.md - no issues
  • internal/controller/gittarget_controller.go - no issues
  • internal/controller/gittarget_kstatus_test.go - no issues
  • internal/controller/stream_status.go - no issues
  • internal/git/bootstrapped-repo-template/.gittargetignore - no issues
  • internal/git/bootstrapped_repo_template.go - no issues
  • internal/git/branch_worker_split_test.go - no issues
  • internal/git/branch_worker_test.go - no issues
  • internal/git/git_operations_test.go - no issues
  • internal/git/gittargetignore_writer_test.go - no issues
  • internal/git/plan_flush.go - no issues
  • internal/git/resync_flush.go - no issues
  • internal/manifestanalyzer/acceptance.go - no issues
  • internal/manifestanalyzer/acceptance_refusal.go - no issues
  • internal/manifestanalyzer/analyzer.go - no issues
  • internal/manifestanalyzer/analyzer_test.go - no issues
  • internal/manifestanalyzer/gittargetignore.go - no issues
  • internal/manifestanalyzer/gittargetignore_test.go - no issues
  • internal/manifestanalyzer/scan.go - no issues
  • internal/manifestanalyzer/store.go - no issues
  • internal/watch/event_router.go - no issues
  • internal/watch/event_router_test.go - no issues
  • test/e2e/foreign_content_e2e_test.go - 1 issue
  • test/e2e/watchrule_configmap_secret_e2e_test.go - no issues

Previous review (commit 12f3aa2)

Status: 1 Issue Found | Recommendation: Address before merge

Overview

Severity Count
WARNING 1
Issue Details (click to expand)

WARNING

File Line Issue
internal/controller/watchrule_controller_test.go 285 Test asserts GitTargetReady=False but controller produces Unknown for unevaluated GitTarget
Files Reviewed (32 files)
  • api/v1alpha2/clusterwatchrule_types.go - API additions
  • api/v1alpha2/gittarget_types.go - Phase removal, GitPathAccepted added
  • api/v1alpha2/watchrule_types.go - GitTargetReady printer column
  • config/crd/bases/configbutler.ai_clusterwatchrules.yaml - CRD update
  • config/crd/bases/configbutler.ai_gittargets.yaml - CRD update
  • config/crd/bases/configbutler.ai_watchrules.yaml - CRD update
  • docs/architecture.md - Watch-first and condition docs
  • docs/configuration.md - Updated status docs
  • docs/design/e2e-coverage-gaps-and-improvements-plan.md - Test D marked implemented
  • docs/design/gitpath-foreign-content-stringency.md - New design doc
  • docs/design/status-conditions-guide.md - Status conditions updated
  • docs/design/unsupported-folder-refusal-plan.md - Refusal plan revised
  • internal/controller/clusterwatchrule_controller.go - GitTargetReady condition
  • internal/controller/clusterwatchrule_controller_test.go - GitTargetReady assertions
  • internal/controller/constants.go - New condition types
  • internal/controller/gittarget_controller.go - Folder→Path rename, phase removal
  • internal/controller/gittarget_controller_test.go - Phase assertion removed
  • internal/controller/gittarget_dependency_status.go - New GitTargetReady mirror
  • internal/controller/gittarget_kstatus_test.go - GitPathAccepted test updates
  • internal/controller/gittarget_status_test.go - Removed phase, folder→path
  • internal/controller/stream_status.go - GitTargetReady stall logic
  • internal/controller/stream_status_test.go - New test for GitTargetReady stall
  • internal/controller/watchrule_controller.go - GitTargetReady condition
  • internal/controller/watchrule_controller_test.go - GitTargetReady assertion bug
  • internal/manifestanalyzer/acceptance_refusal.go - folder→path in messages
  • internal/watch/event_router.go - folder→path rename
  • internal/watch/event_router_test.go - folder→path rename
  • internal/watch/git_path_acceptance.go - New file replacing folder_acceptance.go
  • internal/watch/manager.go - targetGitPathAcceptance field
  • internal/watch/target_watch.go - dropTargetGitPathAcceptanceLocked
  • test/e2e/unsupported_folder_e2e_test.go - ClusterWatchRule and GitPathAccepted checks

Previous review (commit 92fa490)

Status: No Issues Found | Recommendation: Merge

Files Reviewed (31 files)
  • api/v1alpha2/clusterwatchrule_types.go - API type definitions
  • api/v1alpha2/gittarget_types.go - API type definitions
  • api/v1alpha2/watchrule_types.go - API type definitions
  • config/crd/bases/configbutler.ai_clusterwatchrules.yaml - Generated CRD
  • config/crd/bases/configbutler.ai_gittargets.yaml - Generated CRD
  • config/crd/bases/configbutler.ai_watchrules.yaml - Generated CRD
  • docs/architecture.md - Documentation update for structure-only acceptance gate
  • docs/configuration.md - Configuration documentation
  • docs/design/e2e-coverage-gaps-and-improvements-plan.md - Documentation update marking Test D as implemented
  • docs/design/status-conditions-guide.md - Status conditions design
  • docs/design/unsupported-folder-refusal-plan.md - Unsupported folder refusal plan
  • go.mod - Dependency updates
  • go.sum - Dependency updates
  • internal/controller/clusterwatchrule_controller.go - Controller logic
  • internal/controller/clusterwatchrule_controller_test.go - Controller tests
  • internal/controller/condition_helper.go - Condition helper functions
  • internal/controller/constants.go - Shared constants
  • internal/controller/gittarget_controller.go - GitTarget controller
  • internal/controller/gittarget_controller_test.go - Controller tests
  • internal/controller/gittarget_kstatus_test.go - KStatus tests
  • internal/controller/gittarget_status_test.go - Status tests
  • internal/controller/stream_status.go - Stream status logic
  • internal/controller/watchrule_controller.go - WatchRule controller
  • internal/controller/watchrule_controller_test.go - Controller tests
  • internal/git/acceptance_gate_test.go - Acceptance gate tests
  • internal/git/resync_flush.go - Resync flush logic
  • internal/watch/event_router.go - Event router
  • internal/watch/event_router_test.go - Event router tests
  • internal/watch/folder_acceptance.go - Folder acceptance status
  • internal/watch/manager.go - Watch manager (added targetFolderAcceptance field)
  • internal/watch/stream_readiness.go - Stream readiness (removed UnsupportedContent constant)
  • internal/watch/stream_readiness_test.go - Updated to use StreamsRunning()
  • internal/watch/target_watch.go - Target watch with forceRecheck parameter
  • test/e2e/*.go - Multiple e2e test files updated

Previous review (commit f2773a8)

Status: No Issues Found | Recommendation: Merge

Files Reviewed (5 files)
  • docs/architecture.md - Documentation update for structure-only acceptance gate
  • docs/design/e2e-coverage-gaps-and-improvements-plan.md - Documentation update marking Test D as implemented
  • internal/git/acceptance_gate_test.go - Test improvements (require.ErrorAs, variable extraction)
  • internal/manifestanalyzer/analyzer_test.go - Added IssueUnsupportedKustomize to exhaustive test map
  • test/e2e/unsupported_folder_e2e_test.go - New e2e test for unsupported folder refusal

Previous review (commit 893e17f)

Status: No Issues Found | Recommendation: Merge

Files Reviewed (5 files)
  • docs/architecture.md - Documentation update for structure-only acceptance gate
  • docs/design/e2e-coverage-gaps-and-improvements-plan.md - Documentation update marking Test D as implemented
  • internal/git/acceptance_gate_test.go - Test improvements (require.ErrorAs, variable extraction)
  • internal/manifestanalyzer/analyzer_test.go - Added IssueUnsupportedKustomize to exhaustive test map
  • test/e2e/unsupported_folder_e2e_test.go - New e2e test for unsupported folder refusal

Previous review (commit d09ab73)

Status: No Issues Found | Recommendation: Merge

Files Reviewed (5 files)
  • docs/architecture.md - Documentation update for structure-only acceptance gate
  • docs/design/e2e-coverage-gaps-and-improvements-plan.md - Documentation update marking Test D as implemented
  • internal/git/acceptance_gate_test.go - Test improvements (require.ErrorAs, variable extraction)
  • internal/manifestanalyzer/analyzer_test.go - Added IssueUnsupportedKustomize to exhaustive test map
  • test/e2e/unsupported_folder_e2e_test.go - New e2e test for unsupported folder refusal

Previous review (commit cb8d4b0)

Status: No Issues Found | Recommendation: Merge

Files Reviewed (11 files)
  • internal/git/acceptance_gate_test.go - Tests for unsupported kustomization folder refusal
  • internal/git/plan_flush.go - Added structure-only acceptance gate to live writer
  • internal/git/resync_flush.go - Added structure-only acceptance gate to resync apply
  • internal/manifestanalyzer/acceptance.go - Added AcceptStructureOnly, WriterAllowlist, unsupported-kustomize refusals
  • internal/manifestanalyzer/acceptance_refusal.go - Added AcceptanceRefusedError for writer-facing refusal
  • internal/manifestanalyzer/acceptance_test.go - Tests for new structure-only acceptance and refusal
  • internal/manifestanalyzer/store.go - Added Unsupported flag on RetainedDocument, kustomizationUsesUnsupportedFeature
  • internal/watch/event_router.go - Threaded targetWatchKey through drainScopedResync, added handleScopedResyncError
  • internal/watch/event_router_test.go - Test for refusal marking stream Blocked
  • internal/watch/stream_readiness.go - Added StreamReasonUnsupportedContent
  • internal/watch/target_watch.go - Threaded targetWatchKey through enqueueReplayResync

Previous review (commit 915b524)

Status: No Issues Found | Recommendation: Merge

Files Reviewed (14 files)
  • docs/interpreting-metrics.md - Documentation updates for new attribution metrics
  • docs/markdown-triage-inventory.md - Minor path reference fix
  • internal/git/branch_worker.go - Added author kind labeling for commits
  • internal/git/branch_worker_metrics_test.go - Tests for author kind labeling
  • internal/git/pending_writes.go - Added createdCommit() and authorKind() helpers
  • internal/git/resync_flush.go - Added resync sweep delete metric
  • internal/git/resync_flush_test.go - Tests for resync sweep delete metric
  • internal/queue/attribution_index.go - Added AttributionResult types and resolution tracking
  • internal/queue/attribution_index_test.go - Tests for resolution outcomes
  • internal/telemetry/exporter.go - Added new metric declarations
  • internal/telemetry/exporter_test.go - Tests for new metrics
  • internal/watch/author_resolver.go - Updated to use AuthorResolution interface
  • internal/watch/author_resolver_test.go - Updated tests for resolution interface
  • internal/watch/target_watch_test.go - Updated test mocks

Previous review (commit 04aa391)

Status: No Issues Found | Recommendation: Merge

Files Reviewed (8 files)
  • config/webhook/kustomization.yaml
  • config/webhook/validating-webhook.yaml
  • docs/interpreting-metrics.md
  • docs/markdown-triage-inventory.md
  • internal/git/branch_worker.go - queue-full error propagation implemented
  • internal/reconcile/git_target_event_stream.go - OnWatchEvent returns error on queue full
  • internal/reconcile/git_target_event_stream_test.go - tests updated for new error return
  • internal/reconcile/gittarget_lifecycle_integration_test.go - tests updated for goroutine-safe assertions
  • internal/telemetry/exporter.go - removed unused metrics (GitPushDurationSeconds, ObjectsScannedTotal, etc.)
  • internal/telemetry/exporter_test.go - tests updated for removed metrics
  • internal/watch/event_router.go - error handling for RouteToGitTargetEventStream and RouteEvent
  • internal/watch/target_watch_test.go - recordingEnqueuer.Enqueue returns bool

Previous review (commit acaea33)

Status: No Issues Found | Recommendation: Merge

Files Reviewed (11 files)
  • docs/design/e2e-serial-registry.md
  • test/e2e/aggregated_apiserver_e2e_test.go
  • test/e2e/bi_directional_e2e_test.go
  • test/e2e/commit_window_batching_e2e_test.go
  • test/e2e/crd_lifecycle_e2e_test.go
  • test/e2e/gittarget_isolation_e2e_test.go
  • test/e2e/inplace_edit_e2e_test.go
  • test/e2e/quickstart_framework_e2e_test.go
  • test/e2e/repo_assertions_test.go
  • test/e2e/signing_e2e_test.go
  • test/e2e/watchrule_configmap_secret_e2e_test.go

Previous review (commit acf73d5)

Status: 1 Issue Found | Recommendation: Address before merge

Overview

Severity Count
WARNING 1
Issue Details (click to expand)

WARNING

File Line Issue
internal/watch/materialization.go 97 ForgetGitTargetDeclaration unconditionally deletes the UID map entry, which can race with DeclareForGitTarget for recreated GitTargets with the same namespace/name
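
One common way to avoid this kind of race is to delete the entry only when the stored UID still matches the object being forgotten. The sketch below assumes a mutex-guarded map keyed by namespace/name; the `declarations` type and its `declare` and `forget` methods are hypothetical stand-ins, not the code in materialization.go.

```go
package main

import (
	"fmt"
	"sync"
)

// declarations is a hypothetical registry mapping namespace/name to the UID
// of the object that currently owns the declaration.
type declarations struct {
	mu    sync.Mutex
	byKey map[string]string
}

func newDeclarations() *declarations {
	return &declarations{byKey: make(map[string]string)}
}

// declare records (or overwrites) the owner UID for key.
func (d *declarations) declare(key, uid string) {
	d.mu.Lock()
	defer d.mu.Unlock()
	d.byKey[key] = uid
}

// forget removes the entry only if it still belongs to uid. A late forget for
// a deleted object therefore cannot erase the entry of its recreated successor.
func (d *declarations) forget(key, uid string) bool {
	d.mu.Lock()
	defer d.mu.Unlock()
	if current, ok := d.byKey[key]; !ok || current != uid {
		return false
	}
	delete(d.byKey, key)
	return true
}

func main() {
	d := newDeclarations()
	d.declare("team-a/target", "uid-old")
	d.declare("team-a/target", "uid-new") // same name recreated with a new UID
	fmt.Println("stale forget removed entry:", d.forget("team-a/target", "uid-old"))
	fmt.Println("current forget removed entry:", d.forget("team-a/target", "uid-new"))
}
```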
Files Reviewed (16 files)
  • docs/design/stream/per-type-streaming-readiness-plan.md
  • docs/design/stream/streaming-readiness-status-machine-design.md
  • docs/design/stream/watch-replay-watermark-stream-readiness-investigation.md
  • internal/mutationlab/recorder/recorder_test.go
  • internal/mutationlab/recorder/watch_probe.go
  • internal/queue/attribution_index.go
  • internal/queue/attribution_index_test.go
  • internal/watch/author_resolver.go
  • internal/watch/manager.go
  • internal/watch/materialization.go - 1 issue
  • internal/watch/target_watch.go
  • internal/watch/target_watch_test.go
  • test/mutationlab/corpus/configmap/watch-replay-collapse/watch.added.yaml
  • test/mutationlab/corpus/configmap/watch-replay-collapse/watch.bookmark.yaml
  • test/mutationlab/e2e/watch_transport_test.go

Previous review (commit d7bdb16)

Status: 1 Issue Found | Recommendation: Address before merge

Overview

Severity Count
WARNING 1
Issue Details (click to expand)

WARNING

File Line Issue
internal/watch/materialization.go 97 ForgetGitTargetDeclaration unconditionally deletes the UID map entry, which can race with DeclareForGitTarget for recreated GitTargets with the same namespace/name
Files Reviewed (16 files)
  • docs/design/stream/per-type-streaming-readiness-plan.md
  • docs/design/stream/streaming-readiness-status-machine-design.md
  • docs/design/stream/watch-replay-watermark-stream-readiness-investigation.md
  • internal/mutationlab/recorder/recorder_test.go
  • internal/mutationlab/recorder/watch_probe.go
  • internal/queue/attribution_index.go
  • internal/queue/attribution_index_test.go
  • internal/watch/author_resolver.go
  • internal/watch/manager.go
  • internal/watch/materialization.go - 1 issue
  • internal/watch/target_watch.go
  • internal/watch/target_watch_test.go
  • test/mutationlab/corpus/configmap/watch-replay-collapse/watch.added.yaml
  • test/mutationlab/corpus/configmap/watch-replay-collapse/watch.bookmark.yaml
  • test/mutationlab/e2e/watch_transport_test.go

Previous review (commit 7e011db)

Status: No Issues Found | Recommendation: Merge

Files Reviewed (16 files)
  • charts/gitops-reverser/README.md
  • charts/gitops-reverser/templates/deployment.yaml
  • charts/gitops-reverser/values.yaml
  • cmd/main.go
  • cmd/main_audit_server_test.go
  • docs/architecture.md
  • docs/configuration.md
  • internal/controller/gittarget_controller.go
  • internal/git/worker_manager.go
  • internal/queue/attribution_index.go
  • internal/queue/attribution_index_test.go
  • internal/types/reference.go
  • internal/watch/author_resolver.go
  • internal/watch/target_watch.go
  • internal/watch/target_watch_test.go
  • test/e2e/tilt_playground_e2e_test.go

Previous review (commit b524d83)

Status: No Issues Found | Recommendation: Merge

Overview

Severity Count
CRITICAL 0
WARNING 0
SUGGESTION 0

Incremental Review (b2d5bc7..79a9fe6)

No new code issues were identified in this incremental diff. The changes consist of documentation updates aligning with the watch-first architecture:

  • docs/architecture.md: Major restructuring to describe watch as the sole object-state source; audit is now purely an optional attribution lookup
  • docs/configuration.md: Updated audit/attribution settings, removing auditEventJoin and maxLen references, adding attribution.* flags
  • docs/future/ha-gittarget-distribution-plan.md: Added reconciliation note for watch-first changes
  • charts/: README and deployment template updates for new configuration structure

The incremental diff shows no logic changes in executable code paths — only documentation and configuration alignment.

Files Reviewed (16 files)
  • docs/architecture.md - architecture documentation, no code
  • docs/configuration.md - configuration documentation, no code
  • docs/future/ha-gittarget-distribution-plan.md - design doc, no code
  • charts/gitops-reverser/README.md - Helm chart docs
  • charts/gitops-reverser/values.yaml - Helm values
  • charts/gitops-reverser/templates/deployment.yaml - Helm templates
  • charts/gitops-reverser/templates/NOTES.txt - Helm templates
  • config/deployment.yaml - Deployment manifest
  • cmd/main.go - No functional changes in this diff range
  • cmd/main_audit_server_test.go - No functional changes in this diff range
  • cmd/readiness.go - No functional changes
  • test/e2e/controller_basics_e2e_test.go - No functional changes
  • test/e2e/e2e_suite_test.go - No functional changes
  • test/e2e/helpers.go - No functional changes
  • test/e2e/setup/flux/kustomization.yaml - Flux setup
  • test/e2e/setup/flux/namespaces/aggregated-api.yaml - deleted (aggregated-api removal)
  • test/e2e/setup/flux/releases/aggregated-api.yaml - deleted (aggregated-api removal)

Previous review (commit 79a9fe6)

Status: No Issues Found | Recommendation: Merge

Overview

Severity Count
CRITICAL 0
WARNING 0
SUGGESTION 0

Incremental Review (b2d5bc7..79a9fe6)

No new code issues were identified in this incremental diff. The changes consist of documentation updates aligning with the watch-first architecture:

  • docs/architecture.md: Major restructuring to describe watch as the sole object-state source; audit is now purely an optional attribution lookup
  • docs/configuration.md: Updated audit/attribution settings, removing auditEventJoin and maxLen references, adding attribution.* flags
  • docs/future/ha-gittarget-distribution-plan.md: Added reconciliation note for watch-first changes
  • charts/: README and deployment template updates for new configuration structure

The incremental diff shows no logic changes in executable code paths — only documentation and configuration alignment.

Files Reviewed (16 files)
  • docs/architecture.md - architecture documentation, no code
  • docs/configuration.md - configuration documentation, no code
  • docs/future/ha-gittarget-distribution-plan.md - design doc, no code
  • charts/gitops-reverser/README.md - Helm chart docs
  • charts/gitops-reverser/values.yaml - Helm values
  • charts/gitops-reverser/templates/deployment.yaml - Helm templates
  • charts/gitops-reverser/templates/NOTES.txt - Helm templates
  • config/deployment.yaml - Deployment manifest
  • cmd/main.go - No functional changes in this diff range
  • cmd/main_audit_server_test.go - No functional changes in this diff range
  • cmd/readiness.go - No functional changes
  • test/e2e/controller_basics_e2e_test.go - No functional changes
  • test/e2e/e2e_suite_test.go - No functional changes
  • test/e2e/helpers.go - No functional changes
  • test/e2e/setup/flux/kustomization.yaml - Flux setup
  • test/e2e/setup/flux/namespaces/aggregated-api.yaml - deleted (aggregated-api removal)
  • test/e2e/setup/flux/releases/aggregated-api.yaml - deleted (aggregated-api removal)

Previous review (commit b2d5bc7)

Status: Active Issues Found | Recommendation: Review existing CodeRabbit comments

Overview

Severity Count
CRITICAL 0
WARNING 5
SUGGESTION 1

Incremental Review (e0b6369..HEAD)

No new code issues were identified in this incremental diff. The changes consist mainly of documentation restructuring and test-file consolidation (M1→configmap_scenarios_test.go, M2→workload_scenarios_test.go, M3→crd_conversion_test.go, M4→aggregated_api_test.go), the addition of rvLess for deterministic numeric resourceVersion ordering, and an update to sourceRank() to include SourceConversion.
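
To illustrate why numeric ordering matters here, a minimal sketch of a resourceVersion comparison. The body below is an assumption about what a function like rvLess could look like, not the PR's implementation. It treats resourceVersion as a decimal integer where it parses as one, which holds in practice on etcd-backed clusters even though the API documents the field as opaque.

```go
package main

import (
	"fmt"
	"sort"
	"strconv"
)

// rvLess (assumed shape) orders resourceVersions numerically when both parse
// as unsigned integers. Numeric values sort before non-numeric ones, and two
// non-numeric values fall back to string order, so the ordering stays consistent.
func rvLess(a, b string) bool {
	an, aErr := strconv.ParseUint(a, 10, 64)
	bn, bErr := strconv.ParseUint(b, 10, 64)
	switch {
	case aErr == nil && bErr == nil:
		return an < bn
	case aErr == nil:
		return true
	case bErr == nil:
		return false
	default:
		return a < b
	}
}

func main() {
	versions := []string{"100", "9", "10"}

	lexical := append([]string(nil), versions...)
	sort.Strings(lexical) // string order places "10" and "100" before "9"

	numeric := append([]string(nil), versions...)
	sort.Slice(numeric, func(i, j int) bool { return rvLess(numeric[i], numeric[j]) })

	fmt.Println("lexical:", lexical)
	fmt.Println("numeric:", numeric)
}
```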

Previously identified issues on unchanged lines remain active (see Unresolved Findings below). The following issues from prior reviews that fell within the incremental diff are now resolved:

  • Docs inconsistencies (docs/design/mutation-capture-lab-design.md): the deletecollection admission-contract inconsistency, conflicting audit-body claim, and M0 setup-text conflict are all fixed by the restructuring.
  • Flaky E2E corpus (test/mutationlab/e2e/aggregated_api_test.go): the proxy-enriched audit event (audit-additional) is now explicitly required and always committed, removing the conditional append path that caused failures from stray corpus files.

[Snapshot truncated.]

Additional previous summary content was truncated to keep this comment within platform limits.


Reviewed by laguna-m.1-20260312:free · Input: 533K · Output: 23.4K · Cached: 8.1M

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 2

🧹 Nitpick comments (1)
internal/mutationlab/recorder/conversion_test.go (1)

30-65: 🎯 Functional Correctness | 🔵 Trivial | 💤 Low value

Optional: add a large-sizeBytes case.

Both directions only test 1024/2048, which round-trip cleanly through float64. A value above 2^53 would expose the precision/json.Number issue flagged in conversion.go.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/mutationlab/recorder/conversion_test.go` around lines 30 - 65, Add a
regression case to TestConvertWidget_RenamesFieldBothDirections that uses a very
large spec.sizeBytes value above 2^53 and verifies both convertWidget directions
preserve it correctly. The current test only covers small values that survive
json.Unmarshal as float64, so extend the existing v1->v2 and v2->v1 assertions
to catch precision loss and ensure conversion.go handles large integers without
truncation.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@internal/mutationlab/recorder/conversion.go`:
- Around line 100-140: The size parsing in convertWidget/widgetSize is losing
precision because json.Unmarshal decodes numbers into float64, so the
json.Number branch in widgetSize is never reached for sizeBytes. Update
convertWidget to decode the raw JSON with a Decoder that calls UseNumber()
before populating obj, so spec["sizeBytes"] can be preserved as json.Number and
parsed exactly; keep the existing widgetSize logic and ensure any needed bytes
import is added.

In `@test/mutationlab/e2e/m4_scenarios_test.go`:
- Around line 113-118: Remove the committed audit-additional.create.yaml from
the flunder/aggregated-api-write corpus so the golden corpus matches what
m4_scenarios_test actually generates. The corpus is built in h.syncCorpus from
corpusRecords in test/mutationlab/e2e/m4_scenarios_test.go, where enriched is
appended only when present, so the audit-additional record must not be checked
in as a required golden file. Update the corpus under
test/mutationlab/corpus/flunder/aggregated-api-write and keep the generated set
aligned with the conditional logic in m4_scenarios_test and Compare() behavior
in golden.go.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: 9ddc6031-6047-4583-a86d-368b5d8d099e

📥 Commits

Reviewing files that changed from the base of the PR and between 876ff60 and e0b6369.

📒 Files selected for processing (33)
  • cmd/mutation-capture-lab/main.go
  • docs/design/mutation-capture-lab-design.md
  • internal/mutationlab/normalize/normalize.go
  • internal/mutationlab/normalize/normalize_test.go
  • internal/mutationlab/record.go
  • internal/mutationlab/recorder/conversion.go
  • internal/mutationlab/recorder/conversion_test.go
  • test/mutationlab/corpus/configmap/create-succeeds/admission.create.yaml
  • test/mutationlab/corpus/configmap/create-succeeds/audit.create.yaml
  • test/mutationlab/corpus/configmap/deletecollection/admission.delete.cm-a.yaml
  • test/mutationlab/corpus/configmap/deletecollection/admission.delete.cm-b.yaml
  • test/mutationlab/corpus/configmap/deletecollection/admission.delete.cm-c.yaml
  • test/mutationlab/corpus/configmap/deletecollection/audit.deletecollection.yaml
  • test/mutationlab/corpus/configmap/dry-run-create/admission.create.yaml
  • test/mutationlab/corpus/configmap/dry-run-create/audit.create.yaml
  • test/mutationlab/corpus/configmap/record-and-reject/admission.create.yaml
  • test/mutationlab/corpus/configmap/record-and-reject/audit.create.yaml
  • test/mutationlab/corpus/configmap/update/admission.update.yaml
  • test/mutationlab/corpus/configmap/update/audit.update.yaml
  • test/mutationlab/corpus/deployment/scale-patch/audit.patch.yaml
  • test/mutationlab/corpus/flunder/aggregated-api-write/audit-additional.create.yaml
  • test/mutationlab/corpus/flunder/aggregated-api-write/audit.create.yaml
  • test/mutationlab/corpus/flunder/aggregated-api-write/watch.added.yaml
  • test/mutationlab/corpus/pod/graceful-delete/admission.delete.yaml
  • test/mutationlab/corpus/widget/crd-conversion/admission.create.yaml
  • test/mutationlab/corpus/widget/crd-conversion/audit.create.yaml
  • test/mutationlab/corpus/widget/crd-conversion/conversion.to-v1.yaml
  • test/mutationlab/corpus/widget/crd-conversion/conversion.to-v2.yaml
  • test/mutationlab/corpus/widget/crd-conversion/watch.added.yaml
  • test/mutationlab/e2e/harness_test.go
  • test/mutationlab/e2e/m3_scenarios_test.go
  • test/mutationlab/e2e/m4_scenarios_test.go
  • test/mutationlab/swap-image.sh
✅ Files skipped from review due to trivial changes (16)
  • test/mutationlab/corpus/configmap/dry-run-create/audit.create.yaml
  • test/mutationlab/corpus/flunder/aggregated-api-write/audit.create.yaml
  • test/mutationlab/corpus/configmap/deletecollection/audit.deletecollection.yaml
  • test/mutationlab/corpus/configmap/record-and-reject/audit.create.yaml
  • test/mutationlab/corpus/flunder/aggregated-api-write/audit-additional.create.yaml
  • test/mutationlab/corpus/widget/crd-conversion/audit.create.yaml
  • test/mutationlab/corpus/widget/crd-conversion/conversion.to-v2.yaml
  • test/mutationlab/corpus/configmap/update/admission.update.yaml
  • test/mutationlab/corpus/widget/crd-conversion/admission.create.yaml
  • test/mutationlab/corpus/deployment/scale-patch/audit.patch.yaml
  • test/mutationlab/corpus/configmap/record-and-reject/admission.create.yaml
  • test/mutationlab/corpus/configmap/deletecollection/admission.delete.cm-a.yaml
  • test/mutationlab/corpus/configmap/create-succeeds/audit.create.yaml
  • test/mutationlab/corpus/configmap/deletecollection/admission.delete.cm-b.yaml
  • test/mutationlab/corpus/pod/graceful-delete/admission.delete.yaml
  • test/mutationlab/corpus/configmap/update/audit.update.yaml
🚧 Files skipped from review as they are similar to previous changes (10)
  • test/mutationlab/corpus/configmap/deletecollection/admission.delete.cm-c.yaml
  • test/mutationlab/corpus/configmap/create-succeeds/admission.create.yaml
  • internal/mutationlab/record.go
  • test/mutationlab/corpus/configmap/dry-run-create/admission.create.yaml
  • test/mutationlab/swap-image.sh
  • test/mutationlab/e2e/harness_test.go
  • internal/mutationlab/normalize/normalize_test.go
  • cmd/mutation-capture-lab/main.go
  • internal/mutationlab/normalize/normalize.go
  • docs/design/mutation-capture-lab-design.md

Comment on lines +100 to +140
func convertWidget(raw json.RawMessage, desiredAPIVersion string) json.RawMessage {
	var obj map[string]any
	if err := json.Unmarshal(raw, &obj); err != nil {
		return raw
	}
	obj["apiVersion"] = desiredAPIVersion
	spec, ok := obj["spec"].(map[string]any)
	if !ok {
		return remarshal(obj, raw)
	}
	switch versionOf(desiredAPIVersion) {
	case "v2":
		if n, ok := widgetSize(spec["sizeBytes"]); ok {
			spec["size"] = strconv.FormatInt(n, 10)
			delete(spec, "sizeBytes")
		}
	case "v1":
		if n, ok := widgetSize(spec["size"]); ok {
			spec["sizeBytes"] = n
			delete(spec, "size")
		}
	}
	return remarshal(obj, raw)
}

// widgetSize reads the Widget size from either the v1 integer or the v2 string
// form, returning the canonical integer.
func widgetSize(v any) (int64, bool) {
	switch t := v.(type) {
	case json.Number:
		n, err := t.Int64()
		return n, err == nil
	case float64:
		return int64(t), true
	case string:
		n, err := strconv.ParseInt(t, 10, 64)
		return n, err == nil
	default:
		return 0, false
	}
}


🎯 Functional Correctness | 🟡 Minor

🧩 Analysis chain

🌐 Web query:

In Go, does json.Unmarshal into map[string]any decode JSON numbers as float64 unless json.Decoder.UseNumber is set?

💡 Result:

Yes, that is correct. By default, when the Go encoding/json package unmarshals JSON numbers into an interface{} value (such as within a map[string]any), it decodes them as float64 [1][2]. If you use json.Decoder.UseNumber, the decoder will instead unmarshal JSON numbers into a json.Number type [3][4]. A json.Number is a string-based type that holds the literal text of the JSON number, which allows you to handle the number without losing precision or converting it to a float64 prematurely [5][6]. Summary of behavior: - Default: JSON numbers are decoded as float64 [1][2]. - With UseNumber: JSON numbers are decoded as json.Number [3][6].

Citations:


json.Number branch in widgetSize is unreachable; large sizeBytes lose precision.

convertWidget uses json.Unmarshal, which defaults to decoding JSON numbers as float64 when populating a map[string]any. Consequently, spec["sizeBytes"] is never a json.Number, the case json.Number in widgetSize never executes, and integers larger than 2^53 lose precision before any conversion.

Update the decoder to use UseNumber() to preserve the exact integer representation:

Proposed fix
-	var obj map[string]any
-	if err := json.Unmarshal(raw, &obj); err != nil {
-		return raw
-	}
+	var obj map[string]any
+	dec := json.NewDecoder(bytes.NewReader(raw))
+	dec.UseNumber()
+	if err := dec.Decode(&obj); err != nil {
+		return raw
+	}

Ensure "bytes" is imported.
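
The difference between the two decoding paths can be reproduced in isolation. The sketch below is illustrative only: `decodeSizeExact` and `decodeSizeLossy` are hypothetical helpers, not functions from `conversion.go`, and the payload is a made-up Widget fragment.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

// decodeSizeExact (hypothetical) reads spec.sizeBytes with UseNumber, so the
// value arrives as json.Number and is parsed as an exact int64.
func decodeSizeExact(raw []byte) (int64, error) {
	var obj map[string]any
	dec := json.NewDecoder(bytes.NewReader(raw))
	dec.UseNumber()
	if err := dec.Decode(&obj); err != nil {
		return 0, err
	}
	spec, ok := obj["spec"].(map[string]any)
	if !ok {
		return 0, fmt.Errorf("spec missing or not an object")
	}
	num, ok := spec["sizeBytes"].(json.Number)
	if !ok {
		return 0, fmt.Errorf("sizeBytes is %T, want json.Number", spec["sizeBytes"])
	}
	return num.Int64()
}

// decodeSizeLossy (hypothetical) uses plain json.Unmarshal, which stores JSON
// numbers in a map[string]any as float64.
func decodeSizeLossy(raw []byte) (int64, error) {
	var obj map[string]any
	if err := json.Unmarshal(raw, &obj); err != nil {
		return 0, err
	}
	spec, ok := obj["spec"].(map[string]any)
	if !ok {
		return 0, fmt.Errorf("spec missing or not an object")
	}
	f, ok := spec["sizeBytes"].(float64)
	if !ok {
		return 0, fmt.Errorf("sizeBytes is %T, want float64", spec["sizeBytes"])
	}
	return int64(f), nil
}

func main() {
	raw := []byte(`{"spec":{"sizeBytes":9007199254740993}}`) // 2^53 + 1

	exact, err := decodeSizeExact(raw)
	if err != nil {
		panic(err)
	}
	lossy, err := decodeSizeLossy(raw)
	if err != nil {
		panic(err)
	}
	fmt.Println("UseNumber:", exact)
	fmt.Println("Unmarshal:", lossy)
}
```

Note that `UseNumber` applies to every number in the document, so any other code that type-asserts `float64` on values from the same map would need to accept `json.Number` as well.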

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
func convertWidget(raw json.RawMessage, desiredAPIVersion string) json.RawMessage {
	var obj map[string]any
	// UseNumber keeps JSON numbers as json.Number instead of float64, so large
	// sizeBytes values reach widgetSize without being rounded.
	dec := json.NewDecoder(bytes.NewReader(raw))
	dec.UseNumber()
	if err := dec.Decode(&obj); err != nil {
		return raw
	}
	obj["apiVersion"] = desiredAPIVersion
	spec, ok := obj["spec"].(map[string]any)
	if !ok {
		return remarshal(obj, raw)
	}
	switch versionOf(desiredAPIVersion) {
	case "v2":
		if n, ok := widgetSize(spec["sizeBytes"]); ok {
			spec["size"] = strconv.FormatInt(n, 10)
			delete(spec, "sizeBytes")
		}
	case "v1":
		if n, ok := widgetSize(spec["size"]); ok {
			spec["sizeBytes"] = n
			delete(spec, "size")
		}
	}
	return remarshal(obj, raw)
}

// widgetSize reads the Widget size from either the v1 integer or the v2 string
// form, returning the canonical integer.
func widgetSize(v any) (int64, bool) {
	switch t := v.(type) {
	case json.Number:
		n, err := t.Int64()
		return n, err == nil
	case float64:
		return int64(t), true
	case string:
		n, err := strconv.ParseInt(t, 10, 64)
		return n, err == nil
	default:
		return 0, false
	}
}

Comment on lines +113 to +118
	corpusRecords := []mutationlab.Record{*official, *added}
	if enriched != nil {
		corpusRecords = append(corpusRecords, *enriched)
	}
	h.syncCorpus(t, "flunder/aggregated-api-write", corpusRecords)
}


🩺 Stability & Availability | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Confirm the additional-audit corpus is committed and check how Compare treats missing captures.
fd -t f 'audit-additional' test/mutationlab/corpus
rg -n 'stray corpus|SourceAuditAdditional|audit-webhook-additional' internal/mutationlab cmd/mutation-capture-lab

Repository: ConfigButler/gitops-reverser

Length of output: 1729


Remove committed audit-additional.create.yaml to prevent flaky E2E failures.

The audit-additional record is conditionally appended in test/mutationlab/e2e/m4_scenarios_test.go (line 115), meaning it is absent when enriched is nil. However, test/mutationlab/corpus/flunder/aggregated-api-write/audit-additional.create.yaml is committed to the repo. Per internal/mutationlab/corpus/golden.go (line 181), if a committed file is not generated during the test run, Compare() fails with stray corpus files no longer captured. This causes the test to flake if the e2e cluster's audit proxy does not emit the additional event.
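
The failure mode described above can be sketched as a set difference between committed and generated file names. This is a simplified illustration, not the actual logic in `golden.go`; `strayFiles` and the file lists are assumptions made for the example.

```go
package main

import (
	"fmt"
	"sort"
)

// strayFiles (hypothetical) returns committed golden files that the current
// run did not generate. A non-empty result is what would fail a strict compare.
func strayFiles(committed, generated []string) []string {
	seen := make(map[string]struct{}, len(generated))
	for _, name := range generated {
		seen[name] = struct{}{}
	}
	var stray []string
	for _, name := range committed {
		if _, ok := seen[name]; !ok {
			stray = append(stray, name)
		}
	}
	sort.Strings(stray)
	return stray
}

func main() {
	committed := []string{
		"audit.create.yaml",
		"audit-additional.create.yaml",
		"watch.added.yaml",
	}
	// A run where enriched == nil, so the optional record is never written.
	generated := []string{"audit.create.yaml", "watch.added.yaml"}

	fmt.Println(strayFiles(committed, generated))
}
```

Under this model, any file that is only conditionally generated will show up as stray on the runs where the condition is false, which is why the golden set and the generation logic need to agree.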


sunib and others added 7 commits June 25, 2026 06:16
Phase 0 ("finish the evidence") of the watch-only ingestion proposal: two new
"difficult cases" corpus rows captured against a live k8s v1.35.2 cluster.

- Row 10 (owner-ref cascade): one user delete fans out to a watch DELETED for
  both the parent and the cascaded child; the child delete is audited under
  generic-garbage-collector, not the human — concrete delete-attribution
  evidence for the conservative resolver.
- Row 13 (optimistic-concurrency conflict): a stale-resourceVersion update is
  rejected at the storage layer BEFORE validating admission runs, so audit is
  the sole witness — no admission, no watch. Stronger than rows 11/12.

Also fixes a normalizer non-determinism the cascade surfaced: owner-ref UIDs
embedded in managedFields association keys (k:{"uid":...}) were not collapsed,
churning the corpus every run. Adds uidByLen handling mirroring the existing IP
rule.

Full lab suite 13/13 deterministic in compare mode. Updates the README scenario
index and the design doc (corpus table, rows 10/13, and the rows 16/17
resync/bookmark recorder-readiness findings).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Phase 1 of docs/design/watch-only-ingestion-architecture.md ("build watch state
in parallel"): for every Synced type, optionally hold a long-lived WATCH and
record each ADDED/MODIFIED/DELETED into a per-type :watch:stream, written
ALONGSIDE the authoritative :audit:stream so the watch-derived desired set can
be diffed against the audit-derived one. Off by default; changes no Git write.

- queue: AppendWatchEvent / DeleteTypeWatchStream on RedisByTypeStreamQueue,
  reusing the per-type key machinery and the :objects:items envelope shape so
  the checkpoint and watch log fold byte-identically.
- watch: a runner twinned with the audit tail (started on Synced beside
  startTypeAuditTail, stopped on Release); the resume cursor advances on
  bookmarks and resets to the live edge after repeated un-resumable (410-style)
  failures — correctness stays owned by the checkpoint, so a watch gap costs
  only freshness on this experimental stream.
- cmd: --watch-state-stream flag wiring (off by default), reusing the mirror's
  write-side client.

Unit-tested with a fake watch + writer (records/bookmark-cursor/relist-reset/
lifecycle). fmt/vet/lint/test green. Product test-e2e deferred (flag off, no
behavior change).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…1 payoff)

Slice 2 of the parallel watch-state stream: now that --watch-state-stream records
a :watch:stream, periodically fold it into a watch-derived desired set and diff
that against the audit-derived set, metering the divergence. This is the question
the corpus cannot answer — does a continuous watch reproduce the same desired
manifests the audit log does, on a live cluster? It changes no Git write.

- queue: RedisTypeSplicer.SpliceWatchType folds the checkpoint with the
  :watch:stream entries whose recorded rv > checkpoint R (the stream uses
  arrival-order auto-IDs, so the post-checkpoint slice is by the rv field,
  mirroring SpliceType's exclusive "(R" fold).
- watch: compareDesiredSets diffs the two sets by identity + sanitized body
  (agree / mismatch / audit-only / watch-only); a per-minute Manager loop runs it
  for every serviceable type and meters the result. Wired only when
  --watch-state-stream is on.
- telemetry: gitopsreverser_watch_audit_divergence{gvr,reason} gauge +
  gitopsreverser_watch_audit_comparisons_total{gvr,result} counter.

Unit-tested: the fold (rv-gate + delete), the diff buckets, the Manager wiring
with fake splices, and a Valkey-backed SpliceWatchType. fmt/vet/lint/test green.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
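
For readers skimming the commit above, the four diff buckets (agree / mismatch / audit-only / watch-only) can be sketched roughly as follows. This is an assumption-laden illustration, not the PR's `compareDesiredSets`: it models each desired set as a map from an identity string to a sanitized body string, and `divergence` and `compareSets` are hypothetical names.

```go
package main

import "fmt"

// divergence (hypothetical) counts how two desired sets relate, one counter
// per bucket named in the commit message.
type divergence struct {
	Agree     int
	Mismatch  int
	AuditOnly int
	WatchOnly int
}

// compareSets (hypothetical) diffs two sets keyed by object identity. The
// values stand in for sanitized manifest bodies.
func compareSets(audit, watch map[string]string) divergence {
	var d divergence
	for id, auditBody := range audit {
		watchBody, ok := watch[id]
		switch {
		case !ok:
			d.AuditOnly++
		case watchBody == auditBody:
			d.Agree++
		default:
			d.Mismatch++
		}
	}
	for id := range watch {
		if _, ok := audit[id]; !ok {
			d.WatchOnly++
		}
	}
	return d
}

func main() {
	audit := map[string]string{
		"ns/cm-a": "data: 1",
		"ns/cm-b": "data: 2",
		"ns/cm-c": "data: 3",
	}
	watch := map[string]string{
		"ns/cm-a": "data: 1",
		"ns/cm-b": "data: changed",
		"ns/cm-d": "data: 4",
	}
	fmt.Printf("%+v\n", compareSets(audit, watch))
}
```

Counts like these map naturally onto a gauge labelled by reason, which appears to be the intent of the divergence metric listed in the commit.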

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
test/mutationlab/e2e/aggregated_api_test.go (1)

115-127: 🗄️ Data Integrity & Integration | 🟡 Minor | ⚡ Quick win

Fail fast before syncing the corpus on a bad watch payload.

The t.Errorf on Lines 115-117 still allows h.syncCorpus to run on Lines 126-127. In MUTATIONLAB_UPDATE mode that can rewrite the golden flunder fixtures with the exact broken watch body this scenario is supposed to reject.

Proposed fix
-	if !added.Summary.HasObject || flunderReference(added) != "some-flunder" {
-		t.Errorf("watch ADDED did not carry the full flunder object (spec.reference=%q, hasObject=%v)",
-			flunderReference(added), added.Summary.HasObject)
-	}
+	if !added.Summary.HasObject || flunderReference(added) != "some-flunder" {
+		t.Fatalf("watch ADDED did not carry the full flunder object (spec.reference=%q, hasObject=%v)",
+			flunderReference(added), added.Summary.HasObject)
+	}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/mutationlab/e2e/aggregated_api_test.go` around lines 115 - 127, The
aggregated API test is logging a bad watch payload but still continues into
h.syncCorpus, which can rewrite golden fixtures with invalid data. In
aggregated_api_test.go, update the check around the watch ADDED validation so
the test stops immediately on a failed flunderReference(added) or
added.Summary.HasObject condition, and only call h.syncCorpus after the payload
has passed validation. Keep the fix localized to the existing test flow in the
aggregated_api_test.go scenario.
🧹 Nitpick comments (1)
test/mutationlab/e2e/configmap_scenarios_test.go (1)

64-65: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win

Rename these tests to the TestFunctionName_Scenario pattern.

These new e2e cases use bare names like TestUpdate and TestOwnerRefCascade, which doesn't match the repo's required *_test.go naming convention.

As per coding guidelines, **/*_test.go: Follow naming convention TestFunctionName_Scenario(t *testing.T).

Also applies to: 81-82, 103-104, 128-129, 180-181, 287-288, 344-345, 441-442

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/mutationlab/e2e/configmap_scenarios_test.go` around lines 64 - 65,
Rename the e2e test functions in configmap_scenarios_test.go to follow the
required TestFunctionName_Scenario pattern, using the existing test names like
TestCreateSucceeds, TestUpdate, and TestOwnerRefCascade as the base and adding a
clear scenario suffix where needed. Update each affected test function
declaration so the names are descriptive and consistent with the repo’s testing
convention, and ensure the renamed functions still match their existing
harness-based bodies such as newHarness(t).

Source: Coding guidelines

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@docs/design/mutation-capture-lab-design.md`:
- Around line 9-12: The captured-scenarios inventory is stale in the
mutation-capture lab design doc. Update the scenario counts and row status in
the sections that describe the corpus inventory so they match the committed
corpus, including the new owner-ref cascade and conflict coverage referenced by
the existing corpus/test symbols and the sibling watch-only design doc. Also
revise the “What To Capture Next” list and the affected Difficult Cases Catalog
entries together so the totals, planned rows, and captured rows are consistent
throughout the document.

In `@internal/watch/watch_state.go`:
- Around line 141-150: The watch-state restart logic is treating every session
close as a resumable failure and clearing sinceRV too aggressively. Update
runTypeWatchStream/watchStateSession handling so only definitive
resource-version expiration (for example 410 Gone / unresumable resume points)
increments toward watchStateRelistThreshold and resets sinceRV, while ordinary
watch rollovers or errWatchStateClosed channel closures should keep the current
resume cursor and reopen from the same RV. Use the existing gvr, sinceRV,
failures, and watchStateRelistThreshold flow to distinguish these cases before
logging "watch-state stream resetting to live edge".

In `@test/mutationlab/corpus/configmap/finalizer-delete/watch.deleted.yaml`:
- Around line 10-34: The terminal DELETED snapshot for the finalizer-delete
scenario still shows the old finalizer, so update the watch.deleted.yaml fixture
to reflect the post-patch state by removing the metadata.finalizers block
entirely. Make sure the deleted event matches the effect of audit.patch.yaml on
the object, and verify the remaining metadata in watch.deleted.yaml still aligns
with the finalizer-delete flow.

In `@test/mutationlab/e2e/configmap_scenarios_test.go`:
- Around line 118-120: The dry-run verification in configmap_scenarios_test
should assert a NotFound error from the ConfigMaps().Get call instead of
treating any error as success. Update the check around
h.kube.CoreV1().ConfigMaps(s.ns).Get for "cm-dry" to explicitly validate that
the failure is due to the object not existing, so transport or auth errors do
not incorrectly pass the test.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: cb5de786-dc51-4d30-99da-7f1342cfe8bc

📥 Commits

Reviewing files that changed from the base of the PR and between e0b6369 and 77e349f.

📒 Files selected for processing (28)
  • cmd/main.go
  • docs/design/mutation-capture-lab-design.md
  • docs/design/watch-only-ingestion-architecture.md
  • internal/mutationlab/normalize/normalize.go
  • internal/queue/redis_watch_stream.go
  • internal/watch/manager.go
  • internal/watch/materialization.go
  • internal/watch/watch_state.go
  • internal/watch/watch_state_test.go
  • test/mutationlab/README.md
  • test/mutationlab/corpus/configmap/conflict-update/audit.update.yaml
  • test/mutationlab/corpus/configmap/finalizer-delete/admission.delete.yaml
  • test/mutationlab/corpus/configmap/finalizer-delete/admission.update.yaml
  • test/mutationlab/corpus/configmap/finalizer-delete/audit.delete.yaml
  • test/mutationlab/corpus/configmap/finalizer-delete/audit.patch.yaml
  • test/mutationlab/corpus/configmap/finalizer-delete/watch.deleted.yaml
  • test/mutationlab/corpus/configmap/finalizer-delete/watch.modified.yaml
  • test/mutationlab/corpus/configmap/owner-ref-cascade/audit.delete.cm-child.yaml
  • test/mutationlab/corpus/configmap/owner-ref-cascade/audit.delete.cm-parent.yaml
  • test/mutationlab/corpus/configmap/owner-ref-cascade/watch.deleted.cm-child.yaml
  • test/mutationlab/corpus/configmap/owner-ref-cascade/watch.deleted.cm-parent.yaml
  • test/mutationlab/corpus/widget/crd-conversion/conversion.to-v1.yaml
  • test/mutationlab/corpus/widget/crd-conversion/conversion.to-v2.yaml
  • test/mutationlab/e2e/aggregated_api_test.go
  • test/mutationlab/e2e/configmap_scenarios_test.go
  • test/mutationlab/e2e/crd_conversion_test.go
  • test/mutationlab/e2e/harness_test.go
  • test/mutationlab/e2e/workload_scenarios_test.go
✅ Files skipped from review due to trivial changes (7)
  • test/mutationlab/corpus/widget/crd-conversion/conversion.to-v2.yaml
  • test/mutationlab/corpus/configmap/owner-ref-cascade/watch.deleted.cm-child.yaml
  • test/mutationlab/corpus/configmap/finalizer-delete/audit.patch.yaml
  • test/mutationlab/corpus/configmap/owner-ref-cascade/audit.delete.cm-parent.yaml
  • test/mutationlab/corpus/configmap/conflict-update/audit.update.yaml
  • test/mutationlab/corpus/widget/crd-conversion/conversion.to-v1.yaml
  • test/mutationlab/README.md
🚧 Files skipped from review as they are similar to previous changes (2)
  • internal/mutationlab/normalize/normalize.go
  • test/mutationlab/e2e/harness_test.go

Comment thread docs/design/mutation-capture-lab-design.md Outdated
Comment thread internal/watch/watch_state.go Outdated
Comment on lines +141 to +150
failures++
log.V(1).Info("watch-state session ended; will re-open",
	"gvr", gvr.String(), "resumeRV", sinceRV, "failures", failures, "err", err.Error())
if failures >= watchStateRelistThreshold {
	// An un-resumable resume point (e.g. 410 Gone after compaction): drop to the live edge.
	// The freshness gap is healed by the next checkpoint re-anchor; this stream is never a
	// correctness source.
	log.V(1).Info("watch-state stream resetting to live edge", "gvr", gvr.String())
	sinceRV = ""
	failures = 0


🎯 Functional Correctness | 🟠 Major

Don't drop the resume RV after ordinary watch rollovers.

watchStateSession returns errWatchStateClosed for any channel closure, and runTypeWatchStream treats this as a failure that increments the failures counter. Once failures reaches watchStateRelistThreshold, sinceRV is cleared to "". This logic fails to distinguish between a normal API server watch timeout (which is expected and resumable) and an actual resource version expiration (410 Gone). Consequently, the stream discards its resume cursor after just three healthy rollovers, forcing a restart from the live edge and silently skipping mutations that occurred during the reconnect window. Only reset sinceRV on definitive resource expiration; treat ordinary channel closes as resumable events.
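
One way to express the distinction is to let only an expiry error count toward the relist threshold. The sketch below uses only the standard library and hypothetical names (`errWatchClosed`, `errResourceExpired`, `cursor`, `relistThreshold`); it is not the PR's `runTypeWatchStream`. In real client-go code the expiry check would more likely rely on helpers such as `apierrors.IsResourceExpired` or `apierrors.IsGone` from `k8s.io/apimachinery/pkg/api/errors`.

```go
package main

import (
	"errors"
	"fmt"
)

// Hypothetical sentinel errors standing in for the two ways a session ends.
var (
	errWatchClosed     = errors.New("watch channel closed")     // ordinary rollover
	errResourceExpired = errors.New("resource version expired") // stand-in for 410 Gone
)

// relistThreshold is an assumed value chosen for this example.
const relistThreshold = 3

// cursor (hypothetical) is the resume state carried between watch sessions.
type cursor struct {
	sinceRV  string
	failures int
}

// next returns the cursor for the following session. Only expiry errors count
// toward the threshold; an ordinary close keeps the resume point untouched.
func (c cursor) next(err error) cursor {
	if !errors.Is(err, errResourceExpired) {
		return c
	}
	c.failures++
	if c.failures >= relistThreshold {
		// The resume point cannot be used any more: restart from the live edge.
		return cursor{}
	}
	return c
}

func main() {
	c := cursor{sinceRV: "100"}

	// Several healthy rollovers leave the resume cursor in place.
	for i := 0; i < 5; i++ {
		c = c.next(errWatchClosed)
	}
	fmt.Printf("after rollovers: %+v\n", c)

	// Repeated expiry errors eventually reset to the live edge.
	for i := 0; i < relistThreshold; i++ {
		c = c.next(errResourceExpired)
	}
	fmt.Printf("after expiries:  %+v\n", c)
}
```

Whether a single expiry should reset immediately, rather than after a threshold, is a design choice this sketch leaves open.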


Comment on lines +10 to +34
finalizers:
- mutationlab.configbutler.ai/hold
labels:
  mutationlab.configbutler.ai/scenario: finalizer-delete
managedFields:
- apiVersion: v1
  fieldsType: FieldsV1
  fieldsV1:
    f:data:
      .: {}
      f:key: {}
    f:metadata:
      f:finalizers:
        .: {}
        v:"mutationlab.configbutler.ai/hold": {}
      f:labels:
        .: {}
        f:mutationlab.configbutler.ai/scenario: {}
  manager: e2e.test
  operation: Update
  time: <ts>
name: cm-hold
namespace: <ns-1>
resourceVersion: <rv-3>
uid: <uid-1>


🎯 Functional Correctness | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

files=(
  test/mutationlab/corpus/configmap/finalizer-delete/watch.modified.yaml
  test/mutationlab/corpus/configmap/finalizer-delete/watch.deleted.yaml
  test/mutationlab/corpus/configmap/finalizer-delete/admission.update.yaml
  test/mutationlab/corpus/configmap/finalizer-delete/audit.patch.yaml
)

for f in "${files[@]}"; do
  echo "===== $f ====="
  sed -n '1,120p' "$f"
  echo
done

python - <<'PY'
from pathlib import Path
import difflib

a = Path("test/mutationlab/corpus/configmap/finalizer-delete/watch.modified.yaml").read_text().splitlines()
b = Path("test/mutationlab/corpus/configmap/finalizer-delete/watch.deleted.yaml").read_text().splitlines()

print("===== diff: watch.modified.yaml vs watch.deleted.yaml =====")
for line in difflib.unified_diff(a, b, fromfile="watch.modified.yaml", tofile="watch.deleted.yaml", lineterm=""):
    print(line)
PY

Repository: ConfigButler/gitops-reverser

Length of output: 5661


The watch.deleted.yaml fixture is missing the finalizer removal.

Although audit.patch.yaml sends a request to clear finalizers (null), the resulting watch.deleted.yaml snapshot incorrectly retains the mutationlab.configbutler.ai/hold entry. In the finalizer-delete scenario, the terminal DELETED event must reflect the object state after the patch, which requires the finalizers array to be empty or absent.

Update watch.deleted.yaml to remove the finalizers block.

See patch request vs. current deleted state

Current audit.patch.yaml request:

requestObject:
  metadata:
    finalizers: null

Incorrect watch.deleted.yaml state:

metadata:
  finalizers:
  - mutationlab.configbutler.ai/hold

Comment on lines +118 to +120
if _, err := h.kube.CoreV1().ConfigMaps(s.ns).Get(ctx, "cm-dry", metav1.GetOptions{}); err == nil {
	t.Error("dry-run create persisted an object; want none")
}


🎯 Functional Correctness | 🟡 Minor | ⚡ Quick win

Require NotFound here, not just “any error”.

This assertion passes on transport/auth failures too, so the test can claim “not persisted” when the lookup actually failed for an unrelated reason.

Suggested fix
-	if _, err := h.kube.CoreV1().ConfigMaps(s.ns).Get(ctx, "cm-dry", metav1.GetOptions{}); err == nil {
-		t.Error("dry-run create persisted an object; want none")
-	}
+	if _, err := h.kube.CoreV1().ConfigMaps(s.ns).Get(ctx, "cm-dry", metav1.GetOptions{}); err == nil {
+		t.Error("dry-run create persisted an object; want none")
+	} else if !apierrors.IsNotFound(err) {
+		t.Fatalf("dry-run create lookup failed with %v; want NotFound", err)
+	}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/mutationlab/e2e/configmap_scenarios_test.go` around lines 118 - 120, The
dry-run verification in configmap_scenarios_test should assert a NotFound error
from the ConfigMaps().Get call instead of treating any error as success. Update
the check around h.kube.CoreV1().ConfigMaps(s.ns).Get for "cm-dry" to explicitly
validate that the failure is due to the object not existing, so transport or
auth errors do not incorrectly pass the test.
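
The three outcomes the comment distinguishes can be sketched with the standard library alone. This is a minimal illustration, not the test's actual code: `errNotFound` and `errTransport` are hypothetical sentinels standing in for what `apierrors.IsNotFound` would detect on a real client error, and `classifyDryRunLookup` is an illustrative name.

```go
package main

import (
	"errors"
	"fmt"
)

// errNotFound is a hypothetical stand-in for an API "NotFound" error.
var errNotFound = errors.New("not found")

// errTransport is a hypothetical stand-in for any unrelated lookup failure.
var errTransport = errors.New("connection refused")

// classifyDryRunLookup interprets the error returned by the lookup that
// follows a dry-run create. Only a NotFound error proves the object is absent.
func classifyDryRunLookup(err error) string {
	switch {
	case err == nil:
		return "persisted" // the object exists, so the dry run leaked
	case errors.Is(err, errNotFound):
		return "absent" // the expected outcome
	default:
		return "inconclusive" // the lookup itself failed
	}
}

func main() {
	inputs := []error{nil, fmt.Errorf("get cm-dry: %w", errNotFound), errTransport}
	for _, err := range inputs {
		fmt.Println(classifyDryRunLookup(err))
	}
}
```

Only the "absent" branch should let the test pass; the other two map to `t.Error` and `t.Fatalf` in the suggested fix.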

sunib and others added 2 commits June 25, 2026 12:33
Add the watch-transport rows via a targeted /watch-probe lab endpoint so
410-Expired ERROR (row 16) and BOOKMARK (row 17) events can be
scenario-attributed, with a watch_probe recorder and watch_transport_test
driver. Commit the watch-resync and watch-bookmark corpus.

Update the design doc + README to mark rows 8, 10, 13, 16, 17 captured
(fifteen of seventeen); only server-side apply and no-op apply (rows 3, 4)
remain.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

@coderabbitai coderabbitai Bot left a comment


🧹 Nitpick comments (3)
internal/mutationlab/labserver/api.go (1)

126-129: 🎯 Functional Correctness | 🔵 Trivial | ⚡ Quick win

Probe validation errors get a misleading 504.

Probe returns plain errors for client-side problems (empty scenario, unsupported mode, missing resource), but this path maps every probe error to 504 Gateway Timeout. The handler validates the resource up front yet never checks scenario/mode, so a malformed request body surfaces as a timeout rather than a 400. Consider distinguishing a real timeout (ctx.Err()/context.DeadlineExceeded) from validation/transport failures.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/mutationlab/labserver/api.go` around lines 126 - 129, The probe
error handling in the API handler is mapping all failures from Probe to 504
Gateway Timeout, which makes validation and transport errors look like timeouts.
Update the handler around the Probe call to distinguish real deadline/timeout
cases (for example via ctx.Err() or context.DeadlineExceeded) from client-side
validation failures such as empty scenario, unsupported mode, or missing
resource, and return a 400 for the latter. Use the existing Probe and handler
logic in api.go to keep the response codes aligned with the actual failure
source.
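
One way to separate the cases is a small mapping from error to status code. The sketch below is standard-library Go under stated assumptions: `errInvalidRequest` is a hypothetical sentinel that the probe's validation errors would have to wrap, `statusForProbeError` is an illustrative name, and returning 502 for remaining transport failures is an assumption (the comment only asks that they stop looking like timeouts).

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"net/http"
)

// errInvalidRequest is a hypothetical sentinel for client-side problems
// such as an empty scenario or an unsupported mode.
var errInvalidRequest = errors.New("invalid probe request")

// Example inputs used by main.
var (
	validationErr = fmt.Errorf("probe: empty scenario: %w", errInvalidRequest)
	timeoutErr    = fmt.Errorf("probe: %w", context.DeadlineExceeded)
	otherErr      = errors.New("probe: connection reset")
)

// statusForProbeError picks an HTTP status that reflects where the failure came from.
func statusForProbeError(err error) int {
	switch {
	case err == nil:
		return http.StatusOK
	case errors.Is(err, errInvalidRequest):
		return http.StatusBadRequest // the caller sent a malformed request
	case errors.Is(err, context.DeadlineExceeded):
		return http.StatusGatewayTimeout // the probe really timed out
	default:
		return http.StatusBadGateway // assumed mapping for upstream failures
	}
}

func main() {
	for _, err := range []error{nil, validationErr, timeoutErr, otherErr} {
		fmt.Println(statusForProbeError(err))
	}
}
```

Validating scenario and mode in the handler before calling the probe would reach the same 400 without a sentinel error.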
internal/watch/watch_compare.go (1)

141-153: 🚀 Performance & Scalability | 🔵 Trivial | 💤 Low value

Divergence gauge can go stale for types that stop being serviceable.

recordWatchAuditDivergence re-records zeros each pass, which correctly clears a healed divergence. But the loop only records gauges for types still returned by Inventory() with Serviceable() == true. When a type is released/unclaimed (or removed from inventory) while diverging, its last non-zero {gvr, reason} gauge value is never reset to 0 and lingers, falsely signalling ongoing divergence for a type no longer being compared.

For an experimental Phase-1 meter this is low impact, but consider emitting a final zero-out when a previously-recorded GVR drops out of the serviceable set.

Also applies to: 172-194

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/watch/watch_compare.go` around lines 141 - 153, The divergence gauge
handling in compareWatchAuditAllTypes only updates currently serviceable
Inventory() entries, so stale non-zero values can remain when a GVR drops out of
the serviceable set. Update the watch audit flow to track previously recorded
GVRs and, after iterating the current serviceable types, emit a final zero-value
record for any GVR that was seen before but is no longer returned by Inventory()
or Serviceable(). Apply the same cleanup behavior anywhere the watch-audit
comparison loop is duplicated so recordWatchAuditDivergence no longer leaves
lingering divergence metrics.
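
The zero-out can be sketched by remembering which keys the previous pass recorded. This is a standard-library illustration only: `divergenceRecorder`, `gaugeKey`, and `recordPass` are hypothetical names, and the plain map stands in for whatever metrics gauge the exporter actually uses.

```go
package main

import "fmt"

// gaugeKey identifies one {gvr, reason} gauge series.
type gaugeKey struct{ gvr, reason string }

// divergenceRecorder is a hypothetical wrapper around a gauge.
// The gauge map stands in for a real metrics instrument.
type divergenceRecorder struct {
	gauge    map[gaugeKey]int
	previous map[gaugeKey]struct{} // keys recorded during the last pass
}

func newDivergenceRecorder() *divergenceRecorder {
	return &divergenceRecorder{
		gauge:    map[gaugeKey]int{},
		previous: map[gaugeKey]struct{}{},
	}
}

// recordPass writes the counts for the currently serviceable types and
// zeroes every key that was recorded last pass but is missing from this one.
func (r *divergenceRecorder) recordPass(current map[gaugeKey]int) {
	for key := range r.previous {
		if _, ok := current[key]; !ok {
			r.gauge[key] = 0 // final zero-out for a type that dropped out
		}
	}
	r.previous = make(map[gaugeKey]struct{}, len(current))
	for key, count := range current {
		r.gauge[key] = count
		r.previous[key] = struct{}{}
	}
}

func main() {
	key := gaugeKey{gvr: "v1/configmaps", reason: "missing"}
	r := newDivergenceRecorder()
	r.recordPass(map[gaugeKey]int{key: 3})
	fmt.Println(r.gauge[key])
	r.recordPass(map[gaugeKey]int{}) // the type is no longer serviceable
	fmt.Println(r.gauge[key])
}
```

A dropped key is zeroed once and then forgotten, so the tracking set stays proportional to the currently serviceable types.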
internal/queue/redis_watch_splice.go (1)

73-79: 🚀 Performance & Scalability | 🔵 Trivial

Ensure MaxLen constraint on :watch:stream to prevent unbounded read costs.

XRange in SpliceWatchType reads the entire stream (from start="-" to end="+") on every pass. While foldWatchEntry discards old entries, the network and memory cost scales with the total event history if the stream is not trimmed.

The write path (AppendWatchEvent -> xaddID) applies MaxLen only if q.maxLen > 0. If the queue instance is initialized without a length limit or if the watch stream is configured independently, it will grow indefinitely.

Consider either:

  1. Enforcing a MaxLen or XTrim strategy on the RedisByTypeStreamQueue initialization for watch streams.
  2. Alternatively, reading only from the checkpoint position in the stream key's native ID space (though this is complex with auto-IDs, making the MaxLen approach preferred).
Relevant Code
// internal/queue/redis_watch_splice.go:73
msgs, err := s.client.XRange(ctx, base+byTypeWatchStreamSuffix, "-", "+").Result()
...

// internal/queue/redis_bytype_queue.go:644-647
args := &redis.XAddArgs{Stream: stream, ID: id, Values: values}
if q.maxLen > 0 {
    args.MaxLen = q.maxLen
    args.Approx = true
}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/queue/redis_watch_splice.go` around lines 73 - 79, The watch-stream
read path in SpliceWatchType currently scans the entire :watch:stream with
XRange, so it can become unbounded unless the stream is trimmed. Update the
RedisByTypeStreamQueue/watch-stream setup to enforce a MaxLen or XTrim policy
for the watch stream, and ensure AppendWatchEvent/xaddID applies that limit
consistently even when the queue is otherwise unbounded. Keep the fix localized
around RedisByTypeStreamQueue, AppendWatchEvent, and the XRange-based splice in
redis_watch_splice.go so the stream history stays bounded.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: 878bbb6f-c46c-40ee-b1a5-b18c3fdbe802

📥 Commits

Reviewing files that changed from the base of the PR and between 77e349f and d41cfab.

📒 Files selected for processing (27)
  • cmd/main.go
  • cmd/mutation-capture-lab/main.go
  • docs/design/mutation-capture-lab-design.md
  • docs/design/watch-only-ingestion-architecture.md
  • internal/mutationlab/labserver/api.go
  • internal/mutationlab/labserver/api_test.go
  • internal/mutationlab/normalize/normalize.go
  • internal/mutationlab/normalize/normalize_test.go
  • internal/mutationlab/recorder/recorder_test.go
  • internal/mutationlab/recorder/watch_probe.go
  • internal/queue/redis_watch_splice.go
  • internal/queue/redis_watch_splice_test.go
  • internal/telemetry/exporter.go
  • internal/watch/manager.go
  • internal/watch/watch_compare.go
  • internal/watch/watch_compare_test.go
  • test/mutationlab/README.md
  • test/mutationlab/corpus/configmap/no-op-apply/admission.update.yaml
  • test/mutationlab/corpus/configmap/no-op-apply/audit.patch.yaml
  • test/mutationlab/corpus/configmap/server-side-apply/admission.update.yaml
  • test/mutationlab/corpus/configmap/server-side-apply/audit.patch.yaml
  • test/mutationlab/corpus/configmap/server-side-apply/watch.modified.yaml
  • test/mutationlab/corpus/configmap/watch-bookmark/watch.bookmark.yaml
  • test/mutationlab/corpus/configmap/watch-resync/watch.error.yaml
  • test/mutationlab/e2e/configmap_scenarios_test.go
  • test/mutationlab/e2e/harness_test.go
  • test/mutationlab/e2e/watch_transport_test.go
✅ Files skipped from review due to trivial changes (7)
  • test/mutationlab/corpus/configmap/watch-resync/watch.error.yaml
  • test/mutationlab/corpus/configmap/watch-bookmark/watch.bookmark.yaml
  • test/mutationlab/corpus/configmap/server-side-apply/admission.update.yaml
  • test/mutationlab/corpus/configmap/no-op-apply/audit.patch.yaml
  • test/mutationlab/corpus/configmap/server-side-apply/audit.patch.yaml
  • test/mutationlab/README.md
  • docs/design/watch-only-ingestion-architecture.md
🚧 Files skipped from review as they are similar to previous changes (6)
  • internal/mutationlab/recorder/recorder_test.go
  • cmd/mutation-capture-lab/main.go
  • test/mutationlab/e2e/harness_test.go
  • internal/mutationlab/normalize/normalize_test.go
  • cmd/main.go
  • internal/mutationlab/normalize/normalize.go

func (m *Manager) ForgetGitTargetDeclaration(gitDest types.ResourceReference) {
	m.clearTargetTypeWatermarks(gitDest)
	m.forgetGitTargetWatches(gitDest)
	m.forgetGitTargetUID(gitDest)


WARNING: forgetGitTargetUID can race with DeclareForGitTarget for recreated GitTargets

When a GitTarget is deleted and recreated with the same namespace/name, the controller calls ForgetGitTargetDeclaration with a UID-less gitDest (gittarget_controller.go:828). If DeclareForGitTarget for the new target has already stored its UID, this unconditional delete in forgetGitTargetUID wipes the new UID from the map. The new target's cursor resume then falls back to a fresh replay instead of resuming from the stored position.

Consider only deleting when the stored UID matches the one being forgotten, or skipping UID cleanup here and relying on DeclareForGitTarget overwrites plus the cursor TTL to evict stale entries.
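
The first option, deleting only on a UID match, can be sketched as a compare-and-delete under a mutex. This is an illustration, not the manager's code: `uidRegistry` and its methods are hypothetical names, and treating an empty UID as "never delete" is an assumption that leans on the overwrite-plus-TTL fallback described above.

```go
package main

import (
	"fmt"
	"sync"
)

// uidRegistry is a hypothetical stand-in for the manager's UID map,
// keyed by "namespace/name".
type uidRegistry struct {
	mu   sync.Mutex
	uids map[string]string
}

func newUIDRegistry() *uidRegistry {
	return &uidRegistry{uids: map[string]string{}}
}

// declare stores (or overwrites) the UID for a target.
func (r *uidRegistry) declare(key, uid string) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.uids[key] = uid
}

// forget deletes the entry only when the caller names the UID that is stored.
// An empty uid never deletes, so a UID-less forget cannot wipe a newer entry.
func (r *uidRegistry) forget(key, uid string) bool {
	r.mu.Lock()
	defer r.mu.Unlock()
	stored, ok := r.uids[key]
	if !ok || uid == "" || stored != uid {
		return false
	}
	delete(r.uids, key)
	return true
}

// lookup returns the stored UID, or "" when none is stored.
func (r *uidRegistry) lookup(key string) string {
	r.mu.Lock()
	defer r.mu.Unlock()
	return r.uids[key]
}

func main() {
	reg := newUIDRegistry()
	reg.declare("ns/target", "uid-old")
	reg.declare("ns/target", "uid-new") // the recreated target declares first
	fmt.Println(reg.forget("ns/target", "uid-old"), reg.lookup("ns/target"))
}
```

With this guard, a late forget for the old target leaves the recreated target's UID in place regardless of call order.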



@sunib sunib changed the title chore: add lab-code to capture events for all types (so that we can make a better informed decision in edge cases). Watch-first ingestion with (optional!) audit-based attribution Jun 26, 2026
sunib and others added 8 commits June 26, 2026 21:24
…nt in the writer

The acceptance gate that classifies unsafe/unsupported folder content existed
but was wired only into the manifest-analyzer CLI — the running operator built
its store with an empty allowlist and never called Accept, so a folder with a
hard-Kustomize kustomization.yaml (patches/generators/components/...), duplicate
identities, impure managed files, or standalone non-KRM YAML was detected but
written into anyway.

This enforces the gate in the live writer and the resync (first-materialization)
apply, for the cases we already know are a problem from structure alone:

- manifestanalyzer: add IssueUnsupportedKustomize + RetainedDocument.Unsupported
  (surfaced from the existing hasUnsupportedKustomizeFeature predicate), an
  AcceptStructureOnly entrypoint that runs the structural refusals but never the
  discovery-derived mapping refusals (which can blink on a wobble), a typed
  AcceptanceRefusedError, and a WriterAllowlist that retains the operator's own
  .sops.yaml + kustomize build directives so they are never mis-refused.
- git writer: run the structure-only gate over the built store in both
  flushEventsToWorktree (live) and applyResyncToWorktree (resync); a refusal
  aborts the commit before any file is touched and returns the typed error.

Surfacing the refusal on GitTarget status (a Blocked stream) follows in the
next commit. Design: docs/design/unsupported-folder-refusal-plan.md.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
When the first-materialization resync is refused by the acceptance gate (an
unsupported kustomization, duplicate identity, impure or non-KRM file), the
watch layer now surfaces it on GitTarget status instead of only logging:

- thread the (GVR, namespace) key through enqueueReplayResync to drainScopedResync
  so a refusal can mark the right type's stream;
- in the drain, recover the typed *AcceptanceRefusedError via errors.As and mark
  the stream Blocked with reason UnsupportedContent and a message naming the file;
  a refusal is NOT counted as a generic background resync failure (nothing was
  written; it is user-actionable, not a transient fault);
- add the StreamReasonUnsupportedContent reason.

This flows through the existing data-plane surface with no schema change:
status.streams.blocked rises, the StreamsReady condition goes False with reason
UnsupportedContent, and phase becomes Degraded — a clean fit for the current
two-axis GitTarget status. The folder is left untouched until a human cleans it.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Adds test/e2e/unsupported_folder_e2e_test.go: seeds a GitTarget path with a
hard-Kustomize kustomization.yaml (a patches block) plus the ConfigMap it
references, creates the GitTarget + a ConfigMap WatchRule, and asserts the
control plane is configured (Ready=True) while the data plane is blocked
(StreamsReady=False, reason UnsupportedContent) and the operator commits nothing
on top of the seed.

- architecture.md: the Mark-and-Sweep Resync section now documents the
  structure-only acceptance gate and the Blocked-stream refusal, replacing the
  stale "unsafe YAML is left alone" claim.
- e2e-coverage-gaps plan: flip Test D from "blocked on a product decision" to
  IMPLEMENTED.
- lint: add IssueUnsupportedKustomize to the exhaustive analyzer_test map; use
  require.ErrorAs and wrap a long line in the git acceptance-gate test.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
… go True

createGitTarget always enables SOPS encryption referencing the shared sops-age-key
secret, so without it the GitTarget's EncryptionConfigured gate fails and Ready
never becomes True — which made Test D time out on the Ready check before reaching
the refusal assertion. Apply the key in BeforeAll like the other manager specs.

Verified end to end against a redeployed operator: the hard-Kustomize folder is
refused with StreamsReady=False/UnsupportedContent and no commit is produced.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Expect(condition.Reason).To(Equal(ReasonProgressing))
Expect(streamsRunning.Status).To(Equal(metav1.ConditionFalse))
Expect(streamsRunning.Reason).To(Equal(watch.StreamReasonNoResolvedTypes))
Expect(gitTargetReady.Status).To(Equal(metav1.ConditionFalse))


WARNING: Test asserts GitTargetReady=False but controller produces Unknown for unevaluated GitTarget

The gitTargetReadyCondition function returns Unknown when the GitTarget has no status conditions (gittarget_dependency_status.go:57), which correctly signals "not yet evaluated." This test creates a GitTarget without any prior reconcile, so the WatchRule’s setGitTargetReadyCondition assigns Unknown — not False.

Change the expectation to metav1.ConditionUnknown so the test matches the actual status of a freshly created GitTarget that has not yet published a Ready condition.
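
The tri-state behaviour the comment describes can be illustrated with a small lookup. This is a sketch with hypothetical local types, not the controller's `gitTargetReadyCondition`. It only shows why "no Ready condition published yet" reads as Unknown rather than False.

```go
package main

import "fmt"

// conditionStatus mirrors the three values a Kubernetes condition can take.
type conditionStatus string

const (
	statusTrue    conditionStatus = "True"
	statusFalse   conditionStatus = "False"
	statusUnknown conditionStatus = "Unknown"
)

// condition is a hypothetical, trimmed-down status condition.
type condition struct {
	Type   string
	Status conditionStatus
}

// readyStatus returns the published Ready status, or Unknown when the
// object has not been evaluated yet and so carries no Ready condition.
func readyStatus(conds []condition) conditionStatus {
	for _, c := range conds {
		if c.Type == "Ready" {
			return c.Status
		}
	}
	return statusUnknown
}

func main() {
	fmt.Println(readyStatus(nil)) // freshly created, never reconciled
	fmt.Println(readyStatus([]condition{{Type: "Ready", Status: statusFalse}}))
}
```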



BeforeAll(func() {
	By("creating the foreign-content test namespace")
	testNs = testNamespaceFor("manager-foreign-content")
	_, _ = kubectlRun("create", "namespace", testNs)


WARNING: Error discarded without explanatory comment

kubectlRun("create", "namespace", testNs) silently ignores the returned error. Other e2e tests in this codebase annotate such calls with // idempotent; ignore AlreadyExists when the error is intentionally discarded. Without that annotation a genuine failure here (invalid name, RBAC denial, API server unavailable) will propagate into later steps and produce confusing assertion failures instead of a clear namespace-creation error.
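
An alternative to a bare comment is to tolerate only the AlreadyExists case and surface everything else. The sketch below assumes the helper returns the command output plus an error, which the `_, _ =` call suggests but does not confirm. `tolerateAlreadyExists` and `errExit` are hypothetical names, and the check relies on kubectl's `Error from server (AlreadyExists)` message text.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// errExit is a hypothetical stand-in for the error a failed command returns.
var errExit = errors.New("exit status 1")

// tolerateAlreadyExists returns nil when the command succeeded or only
// reported that the namespace already exists; any other failure is returned.
func tolerateAlreadyExists(output string, err error) error {
	if err == nil || strings.Contains(output, "AlreadyExists") {
		return nil
	}
	return fmt.Errorf("create namespace: %w (output: %s)", err, strings.TrimSpace(output))
}

func main() {
	exists := `Error from server (AlreadyExists): namespaces "demo" already exists`
	forbidden := `Error from server (Forbidden): namespaces is forbidden`
	fmt.Println(tolerateAlreadyExists(exists, errExit))
	fmt.Println(tolerateAlreadyExists(forbidden, errExit))
}
```

In the spec, the returned error could then be asserted so that an RBAC denial or an unreachable API server fails at the namespace step instead of in a later assertion.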



Development

Successfully merging this pull request may close these issues.

[FEATURE] Watch mode — native Kubernetes resource watcher as an alternative to the audit webhook

1 participant