Skip to content

fix: cascade-delete registered_entries on attested node deletion#6946

Open
angabini wants to merge 14 commits into
spiffe:mainfrom
angabini:cascade-entries-on-node-delete
Open

fix: cascade-delete registered_entries on attested node deletion#6946
angabini wants to merge 14 commits into
spiffe:mainfrom
angabini:cascade-entries-on-node-delete

Conversation

@angabini

@angabini angabini commented May 8, 2026

Copy link
Copy Markdown
Contributor

Pull Request check list

  • Commit conforms to CONTRIBUTING.md?
  • Proper tests/regressions included?
  • Documentation updated?

Affected functionality

Server datastore — specifically the attested-node delete path (DeleteAttestedNode and PruneAttestedExpiredNodes), which flows through deleteAttestedNodeAndSelectors in pkg/server/datastore/sqlstore/sqlstore.go. Also touches the events-based cache (one RegistrationEntryEvent emitted per cascaded entry).

Description of change

deleteAttestedNodeAndSelectors now cascades to registered_entries: before deleting the attested-node row, it queries registered_entries WHERE parent_id = <node.spiffe_id>, deletes each via the existing deleteRegistrationEntrySupport helper, and emits a RegistrationEntryEvent per deletion so the events-based cache picks them up. Each cascaded delete is logged at Info with the standard SPIFFEID/ParentID/RegistrationID telemetry fields, matching the shape of pruneRegistrationEntries.

This closes the gap between #6152 (which added the node prune scheduler, scoped its deletions to attested_node_entries + node_selectors) and the need to clean up the corresponding registered_entries rows that createJoinTokenRegistrationEntry writes when CreateJoinToken is called with AgentId. The pattern mirrors #3873, which added the node_resolver_map_entries cascade at the same function.

The function signature now takes a logrus.FieldLogger. Both existing callers are updated to pass their logger (ds.log for DeleteAttestedNode, the prune goroutine's logger for pruneAttestedExpiredNodes).

Behavior change:

  • Prune path: PruneAttestedExpiredNodes now deletes matching entries atomically with each expired node.
  • Admin delete path: DeleteAttestedNode (and thus spire-server agent delete) now deletes matching entries atomically with the node.
  • Both paths emit entry-delete events so servers using the events-based cache invalidate correctly.

Scope: cascade fires only when the attested node's DataType == "join_token". The auto-alias written by createJoinTokenRegistrationEntry is the only registered_entries row SPIRE itself creates with parent_id = , so this addresses the unbounded-growth case in #6944 without touching user-registered workload entries that happen to be parented on a non-join-token agent SVID (e.g. spiffe://td/spire/agent/aws_iid/...). Broader cascade for other attestors would need to be opt-in (e.g. behind a --cascade flag on agent delete), per discussion on the issue.

Existing assertions about attested_node_entries / node_selectors behavior are unchanged. Existing tests do not currently make claims about registered_entries surviving node delete, so there is no regression. Three new tests have been added:

  • TestDeleteAttestedNodeCascadesEntries — asserts the single-child cascade + event emission on the admin delete path.
  • TestDeleteAttestedNodeCascadesMultipleEntriesIncludingFederation — creates 3 child entries (one federated via FederatesWith) plus node selectors on the same node; asserts all three children are cascaded, Association("FederatesWith").Clear() is exercised, the federated bundle itself is preserved, node selectors are cleared, and exactly 3 new entry events are emitted with matching IDs (order-independent).
  • TestPruneAttestedExpiredNodesCascadesEntries — asserts the cascade fires on the prune path, and that a non-expired sibling node's child entries are preserved.

Which issue this PR fixes

fixes #6944

Previously, deleteAttestedNodeAndSelectors deleted AttestedNode and
NodeSelector rows but left registered_entries with parent_id matching
the deleted node's SPIFFE ID untouched. In particular, the alias row
that SPIRE auto-creates in createJoinTokenRegistrationEntry (when
CreateJoinToken is called with AgentId) would survive the node it was
minted for, accumulating as dead rows that still flow through the
server's entry cache and the events-based cache.

This extends deleteAttestedNodeAndSelectors to cascade: find entries
whose parent_id equals the deleted node's SPIFFE ID, delete each via
the existing deleteRegistrationEntrySupport helper, emit a
RegistrationEntryEvent per deletion so event-cache consumers see
them, and log each cascaded delete at Info with the usual
SPIFFEID/ParentID/RegistrationID telemetry fields (matching the
pruneRegistrationEntries pattern). The cascade fires for both
DeleteAttestedNode (admin delete) and PruneAttestedExpiredNodes
(periodic prune) since both paths reach this helper. Callers pass
their existing logger (ds.log for DeleteAttestedNode, the prune
logger for PruneAttestedExpiredNodes).

Mirrors the shape of spiffe#3873 which added the node_resolver_map_entries
cascade at the same function.

Fixes spiffe#6944

Signed-off-by: angabini <494150+angabini@users.noreply.github.com>
angabini added 4 commits May 11, 2026 13:53
Signed-off-by: angabini <494150+angabini@users.noreply.github.com>
Signed-off-by: angabini <494150+angabini@users.noreply.github.com>
Signed-off-by: angabini <494150+angabini@users.noreply.github.com>
@amartinezfayo

Copy link
Copy Markdown
Member

Thank you @angabini for this contribution!
Did you have a chance to look at this comment in the issue?

Per review feedback on spiffe#6946: cascading every entry parented on a
deleted attested node would also wipe legitimately user-registered
workload entries that happen to be parented directly on a non-join-
token agent SVID (e.g. spiffe://td/spire/agent/aws_iid/...). That
is a supported registration pattern, not an orphan.

Restrict the cascade in deleteAttestedNodeAndSelectors to nodes
whose attested DataType is "join_token". The auto-alias that
createJoinTokenRegistrationEntry writes is the only entry SPIRE
itself creates with parent_id = node SVID, so this still cleans up
the unbounded-growth case reported in spiffe#6944 without changing
behavior for any other attestor. A broader explicit cascade can be
added later behind a CLI flag, as suggested in the issue.

Add TestDeleteAttestedNodeNonJoinTokenDoesNotCascade asserting an
aws_iid-attested node's child entry survives node deletion and no
RegistrationEntryEvent is emitted. Existing cascade tests already
use join_token nodes and continue to pass.

Update doc/spire_server.md to reflect the narrowed scope.

Refs spiffe#6944

Signed-off-by: angabini <494150+angabini@users.noreply.github.com>
@amartinezfayo amartinezfayo self-assigned this May 26, 2026
@angabini

Copy link
Copy Markdown
Contributor Author

Thank you @angabini for this contribution! Did you have a chance to look at this comment in the issue?

Yes! and the new commit is to address that. see this.

Copilot AI review requested due to automatic review settings May 26, 2026 21:00

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

Adds cascade-deletion behavior for registration entries when join-token attested nodes are deleted or pruned, along with tests and documentation describing the behavior.

Changes:

  • Implement cascading deletion of registration entries when deleting/pruning join-token attested nodes, emitting registration entry events and logs.
  • Add SQL store tests covering cascade behavior (single, multiple w/ federation, prune) and a non-join-token non-cascade case.
  • Update server docs to mention cascade behavior during node pruning.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.

File Description
pkg/server/datastore/sqlstore/sqlstore.go Adds join-token-only cascade deletion of registration entries during attested node deletion/pruning, including event emission and logging.
pkg/server/datastore/sqlstore/sqlstore_test.go Adds tests verifying cascade deletion, federation association clearing, event emission, and non-join-token safety.
doc/spire_server.md Documents the pruning behavior with cascade deletion details for join-token nodes.

Comment thread pkg/server/datastore/sqlstore/sqlstore.go
Comment thread doc/spire_server.md
Comment on lines +2424 to +2428
logger.WithFields(logrus.Fields{
telemetry.SPIFFEID: entry.SpiffeID,
telemetry.ParentID: entry.ParentID,
telemetry.RegistrationID: entry.EntryID,
}).Info("Cascade-deleted registration entry on attested node deletion")

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This matches the existing pattern in this file. pruneRegistrationEntries (sqlstore.go:4197–4220) loops over expired entries and emits exactly the same per-entry Info log — "Pruned an expired registration" — with the same field set (spiffe_id / parent_id / registration_id).

The cascade I added uses "Cascade-deleted registration entry on attested node deletion" and the same fields, deliberately, so an operator auditing why an entry disappeared sees one consistently shaped log line per deletion regardless of which code path removed it.

Happy to move to Debug if the maintainers prefer the quieter default, but I'd argue we should change pruneRegistrationEntries at the same time so the two related lifecycle events log at the same level.

Comment thread pkg/server/datastore/sqlstore/sqlstore_test.go Outdated
angabini added 6 commits May 26, 2026 14:17
Per Copilot review feedback on spiffe#6946 (B): even within join-token-attested
nodes, child entries can be user-managed (e.g. an operator-created entry
parented on the agent SVID for a workload that happens to share that
parent). Cascading on parent_id alone would still over-delete those.

Tighten deleteAttestedNodeAndSelectors to match the exact shape that
createJoinTokenRegistrationEntry writes:
  - exactly one selector
  - selector.Type == "spiffe_id"
  - selector.Value == entry.ParentID

Anything else stays put.

Tests:
  - Add TestDeleteAttestedNodeJoinTokenPreservesNonAliasChildEntries
    (replaces the federation multi-test): asserts an alias-shaped child
    cascades while a unix-selector workload child parented on the same
    join_token-attested node survives, and exactly one entry event is
    emitted.
  - Sharpen TestDeleteAttestedNodeNonJoinTokenDoesNotCascade: switch
    the child entry to alias-shaped selectors so the test isolates the
    DataType filter from the selector-shape filter.
  - Extract lastRegistrationEntryEventID() helper used by all four
    cascade tests (Copilot review comment 4).

Refs spiffe#6944

Signed-off-by: angabini <494150+angabini@users.noreply.github.com>
The cascade note added in 61a42b9 widened the prune_attested_nodes_expired_for
description to 581 chars while every other row in the table sat at 480, tripping
markdownlint MD060/table-column-style. Pad the description column on every other
row in the table to match, keeping pipe positions identical at columns 1, 38,
516, 581 across the entire config table.

Pure padding change; no rendered content moved. Separator row padded with
dashes (preserving continuous alignment marker), content rows with spaces.

Signed-off-by: angabini <494150+angabini@users.noreply.github.com>
@angabini

Copy link
Copy Markdown
Contributor Author

@amartinezfayo any action item on my side?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

registered_entries with parent = pruned attested node are not cascade-deleted

3 participants