Skip to content

Retry user-initiated splices across restarts and disconnects#930

Draft
jkczyz wants to merge 7 commits into
lightningdevkit:mainfrom
jkczyz:2026-06-splicing-restart
Draft

Retry user-initiated splices across restarts and disconnects#930
jkczyz wants to merge 7 commits into
lightningdevkit:mainfrom
jkczyz:2026-06-splicing-restart

Conversation

@jkczyz

@jkczyz jkczyz commented Jun 11, 2026

Copy link
Copy Markdown
Contributor

LDK abandons an in-progress splice negotiation whenever the peer disconnects (which includes stopping the node) and only durably records a splice once it reaches signing. Between calling splice_in/splice_out/bump_channel_funding_fee and that point, a restart or an ill-timed disconnect silently drops the splice with no way to recover it.

This makes those splices durable and self-healing: the intent is persisted before the contribution is handed to LDK, and a startup reconciler plus a SpliceNegotiationFailed handler resubmit it (gated on LDK's NegotiationFailureReason::is_retriable) until the splice locks or is genuinely unrecoverable. As a result SpliceNegotiationFailed is now emitted only once a splice is given up on, rather than for every failed negotiation round.

Rather than adding a dedicated store, the intent lives in the existing PendingPaymentStore under a PaymentId generated at splice time — which becomes the splice's payment id, replacing the previous first-candidate-txid derivation. That required modeling the pending record as an enum, since a not-yet-broadcast splice has no funding transaction, and therefore no PaymentDetails, yet. The change is scoped to user-initiated splices; counterparty-initiated splices and V2 opens are untouched and keep the txid-derived id.

Restart resumption is covered by two integration tests (a splice-out and an RBF fee-bump), alongside unit coverage of the retry-decision matrix; the existing splice/funding/on-chain suites continue to pass.

Based on #962.

@ldk-reviews-bot

Copy link
Copy Markdown

👋 Hi! I see this is a draft PR.
I'll wait to assign reviewers until you mark it as ready for review.
Just convert it out of draft status when you're ready for review!

@jkczyz jkczyz self-assigned this Jun 11, 2026
@tnull tnull added this to the 0.8 milestone Jun 12, 2026
@jkczyz jkczyz mentioned this pull request Jun 12, 2026
@jkczyz jkczyz force-pushed the 2026-06-splicing-restart branch 2 times, most recently from 3ae2507 to 6098d0e Compare June 15, 2026 19:46

@joostjager joostjager left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The PR description explains that LDK currently does not durably record a splice until signature exchange, but I don’t think it explains why ldk-node should therefore become the owner of that durability?

This seems similar to the functionality added in #882: common enough, and close enough to channel/protocol state, that it feels like it should live in LDK if we want the behavior to be durable. Persisting it in ldk-node means we now have another store tracking protocol state alongside ChannelManager/ChannelMonitor, plus reconciliation logic to infer whether LDK still has or no longer has the splice. That creates desync risk between persistence layers.

I can see ldk-node owning product policy around retries or how/when to surface failures, but the durable record of an accepted splice contribution or in-flight splice intent feels like it should be owned by LDK.

@tnull

tnull commented Jun 30, 2026

Copy link
Copy Markdown
Collaborator

This will now need a (likely rather considerable) rebase now that #888 and a few related PRs landed.

jkczyz and others added 7 commits July 1, 2026 21:47
Funding broadcasts are classified into payment records off the
broadcaster's queue, which can run after wallet sync has already recorded
the transaction -- for instance when LDK re-broadcasts a still-pending
funding transaction on restart. In that case the classification overwrote
a record wallet sync had already advanced, downgrading a confirmed or
graduated funding payment back to unconfirmed/pending.

Merge only the classification and our contribution figures into an
existing record, leaving the confirmation state that the wallet-sync
events own in place.

Raised by Codex in the review of lightningdevkit#888.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
A funding payment's id is anchored to its first negotiated candidate, but
once it confirms the record is stamped with the candidate that actually
confirmed. After it graduates and its pending-store entry is removed, a
later reorg event carrying the confirmed candidate's txid could no longer
be resolved to the payment, so it was recorded as a duplicate generic
on-chain payment rather than reverting the splice payment. RBF splices,
whose confirmed candidate differs from the first, are the reachable case.

Resolve such an event against the payment store by on-chain txid once the
pending store no longer holds it, and revert the funding payment to
pending so wallet sync re-graduates it when it reconfirms.

Raised by Codex in the review of lightningdevkit#888.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Retrying a user-initiated splice across restarts requires persisting the
splice intent before handing it to LDK, which happens before negotiation
and therefore before any funding transaction exists. The pending-payment
record was built around an on-chain PaymentDetails carrying a txid, which
cannot represent a splice that has not been broadcast yet.

Reshape PendingPaymentDetails into an enum: a PendingSplice variant that
holds only the generated PaymentId and the splice intent, and a Tracked
variant that is the previous record plus an optional intent retained until
the splice locks. Add the SpliceIntent and SpliceKind types the intent
needs to resubmit or rebuild the contribution.

This is groundwork; nothing constructs a PendingSplice yet. The classify,
retry, and wiring that use it follow in subsequent commits.

Generated with assistance from Claude Code.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
A user-initiated splice will be keyed by a PaymentId generated at splice
time rather than derived from a candidate's txid, so its retry intent,
funding payment, and candidate history all share one record. Teach the
classifier to find a pre-broadcast splice intent by its channel and reuse
that id, promoting the intent record to a tracked funding payment while
preserving the intent until the splice locks. Splices we did not originate
(counterparty-initiated or V2 dual-funded opens) keep deriving the id from
the first candidate's txid via a fallback.

Also map any candidate txid back to the record in find_payment_by_txid,
since a splice under a generated id is no longer found by the txid-derived
lookup and an earlier RBF candidate may be the one that confirms.

No splice intents are created yet, so behavior is unchanged; the splice
entry points that persist them follow.

Generated with assistance from Claude Code.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
LDK abandons an in-progress splice negotiation whenever the peer
disconnects -- which includes stopping the node -- and only durably records
a splice once its negotiation reaches signing. A splice dropped before then,
after splice_in, splice_out, or bump_channel_funding_fee returned Ok, is
therefore silently lost across a restart or an ill-timed disconnect.

Persist a splice intent before handing the contribution to LDK, keyed by a
PaymentId generated at splice time and reusing the channel's existing intent
record when one is present, so a splice and its fee bumps share one id and at
most one intent exists per channel. At startup a reconciler probes each intent
against LDK's live channel state and resubmits any LDK dropped -- including
those lost to a crash before LDK persisted anything -- surfacing
SpliceNegotiationFailed only when the channel is gone, a fee bump has nothing
left to replace, or the resubmission budget is exhausted. Resubmitting does
not require the peer to be connected: LDK holds the contribution and initiates
quiescence on reconnect.

A payment-tracking merge (e.g. from wallet sync) must leave a live intent
untouched. The splice tests now locate a funding payment by its candidate
txid, since its id is generated rather than derived from the funding txid.

Generated with assistance from Claude Code.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
A user-initiated splice can fail mid-negotiation while the node is running
-- the peer disconnects, or the contribution goes stale behind a competing
negotiation -- and LDK reports each such round via SpliceNegotiationFailed.

Drive those events through the splice retrier: resubmit the same
contribution when the peer merely disconnected, rebuild a fresh one when it
went stale, and give up (surfacing the failure) only for a non-retriable
reason or once the resubmission budget is exhausted, using LDK's own
is_retriable classification. Clear a splice's intent once the channel locks
its new funding or the channel closes.

Event::SpliceNegotiationFailed is now emitted only when a splice is finally
abandoned, not for every failed negotiation round, since a recoverable
failure is retried transparently.

Generated with assistance from Claude Code.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Add integration coverage for resuming a dropped splice: splice_resumed_after_restart
initiates a splice-out while disconnected, restarts the node before anything is
negotiated, and asserts the reconciler resumes and completes the splice -- and that a
second restart does not resubmit the now-locked splice. splice_rbf_resumed_after_restart
does the same for a fee bump.

Document on splice_in, splice_out, and bump_channel_funding_fee that the splice is
retried automatically across restarts until it completes or is given up on.

Generated with assistance from Claude Code.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@jkczyz jkczyz force-pushed the 2026-06-splicing-restart branch from 6098d0e to c005623 Compare July 2, 2026 15:34
@jkczyz

jkczyz commented Jul 2, 2026

Copy link
Copy Markdown
Contributor Author

Re-wrote this to use the exiting PendingPaymentStore as discussed offline. See PR description for details. @tnull Not sure what you think about making PendingPaymentDetails an enum that transitions (see 29a93ef). But high-level feedback on that design would be appreciated.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Status: No status

Development

Successfully merging this pull request may close these issues.

4 participants