fix: cascade-delete registered_entries on attested node deletion#6946
fix: cascade-delete registered_entries on attested node deletion#6946angabini wants to merge 14 commits into
Conversation
Previously, deleteAttestedNodeAndSelectors deleted AttestedNode and NodeSelector rows but left registered_entries with parent_id matching the deleted node's SPIFFE ID untouched. In particular, the alias row that SPIRE auto-creates in createJoinTokenRegistrationEntry (when CreateJoinToken is called with AgentId) would survive the node it was minted for, accumulating as dead rows that still flow through the server's entry cache and the events-based cache. This extends deleteAttestedNodeAndSelectors to cascade: find entries whose parent_id equals the deleted node's SPIFFE ID, delete each via the existing deleteRegistrationEntrySupport helper, emit a RegistrationEntryEvent per deletion so event-cache consumers see them, and log each cascaded delete at Info with the usual SPIFFEID/ParentID/RegistrationID telemetry fields (matching the pruneRegistrationEntries pattern). The cascade fires for both DeleteAttestedNode (admin delete) and PruneAttestedExpiredNodes (periodic prune) since both paths reach this helper. Callers pass their existing logger (ds.log for DeleteAttestedNode, the prune logger for PruneAttestedExpiredNodes). Mirrors the shape of spiffe#3873 which added the node_resolver_map_entries cascade at the same function. Fixes spiffe#6944 Signed-off-by: angabini <494150+angabini@users.noreply.github.com>
Signed-off-by: angabini <494150+angabini@users.noreply.github.com>
Signed-off-by: angabini <494150+angabini@users.noreply.github.com>
Signed-off-by: angabini <494150+angabini@users.noreply.github.com>
Per review feedback on spiffe#6946: cascading every entry parented on a deleted attested node would also wipe legitimately user-registered workload entries that happen to be parented directly on a non-join- token agent SVID (e.g. spiffe://td/spire/agent/aws_iid/...). That is a supported registration pattern, not an orphan. Restrict the cascade in deleteAttestedNodeAndSelectors to nodes whose attested DataType is "join_token". The auto-alias that createJoinTokenRegistrationEntry writes is the only entry SPIRE itself creates with parent_id = node SVID, so this still cleans up the unbounded-growth case reported in spiffe#6944 without changing behavior for any other attestor. A broader explicit cascade can be added later behind a CLI flag, as suggested in the issue. Add TestDeleteAttestedNodeNonJoinTokenDoesNotCascade asserting an aws_iid-attested node's child entry survives node deletion and no RegistrationEntryEvent is emitted. Existing cascade tests already use join_token nodes and continue to pass. Update doc/spire_server.md to reflect the narrowed scope. Refs spiffe#6944 Signed-off-by: angabini <494150+angabini@users.noreply.github.com>
There was a problem hiding this comment.
Pull request overview
Note
Copilot was unable to run its full agentic suite in this review.
Adds cascade-deletion behavior for registration entries when join-token attested nodes are deleted or pruned, along with tests and documentation describing the behavior.
Changes:
- Implement cascading deletion of registration entries when deleting/pruning join-token attested nodes, emitting registration entry events and logs.
- Add SQL store tests covering cascade behavior (single, multiple w/ federation, prune) and a non-join-token non-cascade case.
- Update server docs to mention cascade behavior during node pruning.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| pkg/server/datastore/sqlstore/sqlstore.go | Adds join-token-only cascade deletion of registration entries during attested node deletion/pruning, including event emission and logging. |
| pkg/server/datastore/sqlstore/sqlstore_test.go | Adds tests verifying cascade deletion, federation association clearing, event emission, and non-join-token safety. |
| doc/spire_server.md | Documents the pruning behavior with cascade deletion details for join-token nodes. |
| logger.WithFields(logrus.Fields{ | ||
| telemetry.SPIFFEID: entry.SpiffeID, | ||
| telemetry.ParentID: entry.ParentID, | ||
| telemetry.RegistrationID: entry.EntryID, | ||
| }).Info("Cascade-deleted registration entry on attested node deletion") |
There was a problem hiding this comment.
This matches the existing pattern in this file. pruneRegistrationEntries (sqlstore.go:4197–4220) loops over expired entries and emits exactly the same per-entry Info log — "Pruned an expired registration" — with the same field set (spiffe_id / parent_id / registration_id).
The cascade I added uses "Cascade-deleted registration entry on attested node deletion" and the same fields, deliberately, so an operator auditing why an entry disappeared sees one consistently shaped log line per deletion regardless of which code path removed it.
Happy to move to Debug if the maintainers prefer the quieter default, but I'd argue we should change pruneRegistrationEntries at the same time so the two related lifecycle events log at the same level.
Per Copilot review feedback on spiffe#6946 (B): even within join-token-attested nodes, child entries can be user-managed (e.g. an operator-created entry parented on the agent SVID for a workload that happens to share that parent). Cascading on parent_id alone would still over-delete those. Tighten deleteAttestedNodeAndSelectors to match the exact shape that createJoinTokenRegistrationEntry writes: - exactly one selector - selector.Type == "spiffe_id" - selector.Value == entry.ParentID Anything else stays put. Tests: - Add TestDeleteAttestedNodeJoinTokenPreservesNonAliasChildEntries (replaces the federation multi-test): asserts an alias-shaped child cascades while a unix-selector workload child parented on the same join_token-attested node survives, and exactly one entry event is emitted. - Sharpen TestDeleteAttestedNodeNonJoinTokenDoesNotCascade: switch the child entry to alias-shaped selectors so the test isolates the DataType filter from the selector-shape filter. - Extract lastRegistrationEntryEventID() helper used by all four cascade tests (Copilot review comment 4). Refs spiffe#6944 Signed-off-by: angabini <494150+angabini@users.noreply.github.com>
The cascade note added in 61a42b9 widened the prune_attested_nodes_expired_for description to 581 chars while every other row in the table sat at 480, tripping markdownlint MD060/table-column-style. Pad the description column on every other row in the table to match, keeping pipe positions identical at columns 1, 38, 516, 581 across the entire config table. Pure padding change; no rendered content moved. Separator row padded with dashes (preserving continuous alignment marker), content rows with spaces. Signed-off-by: angabini <494150+angabini@users.noreply.github.com>
|
@amartinezfayo any action item on my side? |
Pull Request check list
Affected functionality
Server datastore — specifically the attested-node delete path (
DeleteAttestedNodeandPruneAttestedExpiredNodes), which flows throughdeleteAttestedNodeAndSelectorsinpkg/server/datastore/sqlstore/sqlstore.go. Also touches the events-based cache (oneRegistrationEntryEventemitted per cascaded entry).Description of change
deleteAttestedNodeAndSelectorsnow cascades toregistered_entries: before deleting the attested-node row, it queriesregistered_entries WHERE parent_id = <node.spiffe_id>, deletes each via the existingdeleteRegistrationEntrySupporthelper, and emits aRegistrationEntryEventper deletion so the events-based cache picks them up. Each cascaded delete is logged atInfowith the standardSPIFFEID/ParentID/RegistrationIDtelemetry fields, matching the shape ofpruneRegistrationEntries.This closes the gap between #6152 (which added the node prune scheduler, scoped its deletions to
attested_node_entries+node_selectors) and the need to clean up the correspondingregistered_entriesrows thatcreateJoinTokenRegistrationEntrywrites whenCreateJoinTokenis called withAgentId. The pattern mirrors #3873, which added thenode_resolver_map_entriescascade at the same function.The function signature now takes a
logrus.FieldLogger. Both existing callers are updated to pass their logger (ds.logforDeleteAttestedNode, the prune goroutine's logger forpruneAttestedExpiredNodes).Behavior change:
PruneAttestedExpiredNodesnow deletes matching entries atomically with each expired node.DeleteAttestedNode(and thusspire-server agent delete) now deletes matching entries atomically with the node.Scope: cascade fires only when the attested node's DataType == "join_token". The auto-alias written by createJoinTokenRegistrationEntry is the only registered_entries row SPIRE itself creates with parent_id = , so this addresses the unbounded-growth case in #6944 without touching user-registered workload entries that happen to be parented on a non-join-token agent SVID (e.g. spiffe://td/spire/agent/aws_iid/...). Broader cascade for other attestors would need to be opt-in (e.g. behind a --cascade flag on agent delete), per discussion on the issue.
Existing assertions about
attested_node_entries/node_selectorsbehavior are unchanged. Existing tests do not currently make claims aboutregistered_entriessurviving node delete, so there is no regression. Three new tests have been added:TestDeleteAttestedNodeCascadesEntries— asserts the single-child cascade + event emission on the admin delete path.TestDeleteAttestedNodeCascadesMultipleEntriesIncludingFederation— creates 3 child entries (one federated viaFederatesWith) plus node selectors on the same node; asserts all three children are cascaded,Association("FederatesWith").Clear()is exercised, the federated bundle itself is preserved, node selectors are cleared, and exactly 3 new entry events are emitted with matching IDs (order-independent).TestPruneAttestedExpiredNodesCascadesEntries— asserts the cascade fires on the prune path, and that a non-expired sibling node's child entries are preserved.Which issue this PR fixes
fixes #6944