consolidate: v1.12.0 upgrade with all fixes (supersedes #26, #27, #29) by DrudgeRajen · Pull Request #30 · thxnet/thxnet-sdk

DrudgeRajen · 2026-04-19T09:38:46Z

Summary

Single consolidation PR for the full v1.12.0 upgrade work. Merges three previously-independent branches into upgrade/1.12.0-all:

upgrade/1.12.0-backports (was #26): async-backing cherry-picks from v1.13: paritytech/polkadot-sdk#4727 (statement-distribution: Fix false warning), paritytech/polkadot-sdk#4471 (Remove the prospective-parachains subsystem from collators), paritytech/polkadot-sdk#4724 (Fix core sharing and make use of scheduling_lookahead)
upgrade/1.12.0-capacity1-workaround (was #27): leafchain UNINCLUDED_SEGMENT_CAPACITY: 2 → 1 to sidestep the v1.12.0 fragment-chain fork deadlock
fix/v1.12.0-free-stuck-cores-on-setcode (was #29): migration frees stuck AvailabilityCores + ClaimQueue atomically with setCode

Why consolidate

Each piece was discovered and shipped iteratively during forked-testnet rehearsals. The three earlier PRs are closed in favor of a single landable PR so that the downstream branch (stable2512) rebases cleanly.

End-to-end measurement (forked-testnet, prod-runtime, 1h epoch)

| Scenario | Stuck time after rootchain setCode |
|---|---|
| Plain v1.12.0 migration (no patches) | 56 min (wait for session rotation) |
| Wrong order (rootchain-first) | permanent (API gap) |
| Correct order + this PR + validator restart | ~16 sec ✅ |

~210× improvement (56 min = 3360 s versus ~16 s).

Upgrade runbook (v0.9.40 → v1.12.0)

1. Rolling binary upgrade (validators + collators to v1.12.0)
2. Leafchain setCode  (spec 4 → 21, capacity=1)
3. Rootchain setCode  (spec 94000005 → 112000004, EnableAsyncBackingAndCoretime fires)
4. kubectl rollout restart deploy/validator-*   (flush SessionInfo caches)

No 2-session config wait, no manual setNodeFeature, no coretime.assignCore — all baked into migration.
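The ordering constraint in the runbook could be encoded in deployment tooling. The following is only a minimal sketch under that assumption and is not part of this PR: the `Step` enum and the `check_order` helper are hypothetical names that model the four runbook steps and flag the first step that is out of place, for example a rootchain-first plan.

```rust
/// Hypothetical model of the four runbook steps (names are illustrative only).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Step {
    RollingBinaryUpgrade,
    LeafchainSetCode,
    RootchainSetCode,
    ValidatorRestart,
}

/// The order given in the runbook above.
const EXPECTED: [Step; 4] = [
    Step::RollingBinaryUpgrade,
    Step::LeafchainSetCode,
    Step::RootchainSetCode,
    Step::ValidatorRestart,
];

/// Returns `Ok(())` if `plan` is exactly the runbook order, otherwise
/// `Err(i)` where `i` is the index of the first position that deviates.
pub fn check_order(plan: &[Step]) -> Result<(), usize> {
    for (i, expected) in EXPECTED.iter().enumerate() {
        if plan.get(i) != Some(expected) {
            return Err(i);
        }
    }
    if plan.len() > EXPECTED.len() {
        // Extra steps after the runbook are also treated as a deviation.
        return Err(EXPECTED.len());
    }
    Ok(())
}
```

Calling `check_order` with a plan that runs `RootchainSetCode` before `LeafchainSetCode` returns `Err(1)`, which corresponds to the "wrong order" row in the measurement table.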

What's included

| Fix | Purpose |
|---|---|
| `EnableAsyncBackingAndCoretime` migration | Writes ActiveConfig atomically with setCode (num_cores, async_backing, lookahead, node_features, max_validators_per_core=None) |
| Free stuck `AvailabilityCores` + `ClaimQueue` | Mirrors `Scheduler::push_occupied_cores_to_assignment_provider` at migration time so no session wait |
| `UNINCLUDED_SEGMENT_CAPACITY=1` | Prevents cumulus fork-authoring → fragment-chain fork deadlock (pre-paritytech#4937) |
| paritytech#4727 / paritytech#4471 / paritytech#4724 cherry-picks | Backport scheduler + collator improvements from v1.13 |

Tradeoffs

Synchronous-backing pace (~18s/block) under capacity=1 — acceptable for small networks using v1.12.0 as a stable endpoint
stable2512 hop restores capacity=2 (async-backing safe there because prospective-parachains rework: take II paritytech/polkadot-sdk#4937 is native)

Test plan

forked-testnet rehearsal 1 — plain migration, 56min stuck ❌
forked-testnet rehearsal 2 — migration + restart, 16s unstick ✅
forked-testnet rehearsal 3 — clean rebuild from branch HEAD, full flow, 16s unstick ✅
forked-testnet rehearsal 4 — on consolidated upgrade/1.12.0-all, full flow, 16s unstick ✅
rebase upgrade/stable2512 on top, re-rehearse stable2512 hop

Supersedes

cherry-pick: async-backing backports from v1.13 onto upgrade/1.12.0 (parked) #26 (cherry-picks)
fix(leafchain,v1.12.0): UNINCLUDED_SEGMENT_CAPACITY 2→1 to sidestep fragment-chain fork deadlock #27 (capacity=1)
fix(v1.12.0): free stuck AvailabilityCores + ClaimQueue atomically with setCode #29 (free-cores migration)

🤖 Generated with Claude Code

…h#4471)

Implements paritytech#4429.

Collators only need to maintain the implicit view for the paraid they are collating on. In this case, bypass prospective-parachains entirely. It's still useful to use the GetMinimumRelayParents message from prospective-parachains for validators, because the data is already present there.

This enables us to entirely remove the subsystem from collators, which consumed resources needlessly.

Aims to resolve paritytech#4167.

TODO:
- [x] fix unit tests

Implements most of paritytech#1797.

Core sharing (two or more parachains scheduled on the same core with the same `PartsOf57600` value) was not working correctly.

The expected behaviour is to have Backed and Included events in each block for the paras sharing the core, and the paras should take turns. E.g. for two cores we expect: Backed(a); Included(a)+Backed(b); Included(b)+Backed(a); etc. Instead of this, each block contains just one event and there are a lot of gaps (blocks w/o events) during the session.

Core sharing should also work when collators are building collations ahead of time.

TODOs:
- [x] Add a zombienet test verifying that the behaviour mentioned above works.
- [x] prdoc

---------

Co-authored-by: alindima <alin@parity.io>
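The expected event pattern described in this commit message can be written out as a small generator. This is a toy sketch, not code from the upstream PR: the `Event` enum and `expected_events` function are hypothetical, and the model assumes the paras sharing one core simply take turns in round-robin order, with each block including the candidate backed in the previous block.

```rust
/// Toy events for one relay-chain block (hypothetical, for illustration).
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Event {
    Backed(char),
    Included(char),
}

/// Expected events per relay block when `paras` take turns on one shared
/// core: block `n` includes what block `n - 1` backed and backs the next
/// para in round-robin order.
pub fn expected_events(paras: &[char], blocks: usize) -> Vec<Vec<Event>> {
    if paras.is_empty() {
        // No paras scheduled: every block is empty.
        return vec![Vec::new(); blocks];
    }
    let mut out = Vec::with_capacity(blocks);
    for n in 0..blocks {
        let mut events = Vec::new();
        if n > 0 {
            events.push(Event::Included(paras[(n - 1) % paras.len()]));
        }
        events.push(Event::Backed(paras[n % paras.len()]));
        out.push(events);
    }
    out
}
```

For `expected_events(&['a', 'b'], 3)` the result is `Backed(a)`, then `Included(a)` + `Backed(b)`, then `Included(b)` + `Backed(a)`, matching the sequence quoted above. A test harness could compare observed events against this pattern to detect the gaps the commit message describes.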

After cherry-picking paritytech#4724 (core sharing + scheduling_lookahead) on top of v1.12.0, five upstream references do not resolve because the enabling PRs landed later:

1. `polkadot_statement_table::{…}` → use local `statement-table` alias (crate package name is `polkadot-statement-table`, imported via `statement-table = { package = … }` in Cargo.toml).
2. `polkadot_primitives::{…}` in `scheduler.rs` and `runtime_api_impl/vstaging.rs` → use local `primitives` alias.
3. `AvailabilityStoreMessage::StoreAvailableData::{core_index, node_features}` → the two new fields were added by the systematic-chunks PR paritytech#1644, which is NOT cherry-picked (it's 7540 LoC / 84 files and brings an unrelated availability-recovery rewrite). Keep the outer function signature as-is for API parity with upstream and drop the two fields at the message-send boundary via `let _ = …;`. The erasure-root check that protects consensus continues to run.
4. `free_cores_and_fill_claimqueue` → in our tree the method is named `free_cores_and_fill_claim_queue` (the rename was a small cleanup that happened concurrently in upstream). Update the two call sites (paras_inherent and runtime_api v10).

Build clean: `cargo check -p polkadot -p polkadot-service -p polkadot-runtime-parachains -p thxnet-testnet-runtime` all pass with only pre-existing warnings.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
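Item 3 describes keeping a newer outer signature while discarding arguments that the older message cannot carry. The sketch below illustrates only that pattern. All types here are hypothetical, reduced stand-ins (the real `StoreAvailableData` message has more fields), and `make_store_message` is an invented name, not a function in the tree.

```rust
/// Hypothetical stand-in for the real core index type.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct CoreIndex(pub u32);

/// Hypothetical stand-in for the real node-features bitfield.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct NodeFeatures(pub Vec<bool>);

/// Hypothetical, reduced older-shape message: it has no `core_index`
/// and no `node_features` fields.
#[derive(Debug, PartialEq, Eq)]
pub struct StoreAvailableData {
    pub candidate_hash: [u8; 32],
    pub n_validators: u32,
}

/// The outer signature accepts the two newer arguments for API parity,
/// but they are dropped at the message-construction boundary.
pub fn make_store_message(
    candidate_hash: [u8; 32],
    n_validators: u32,
    core_index: CoreIndex,
    node_features: NodeFeatures,
) -> StoreAvailableData {
    // The older message cannot represent these values; binding them to `_`
    // marks them as intentionally unused and avoids compiler warnings.
    let _ = core_index;
    let _ = node_features;
    StoreAvailableData { candidate_hash, n_validators }
}
```

The design choice is that callers written against the newer signature keep compiling unchanged, so a later move to a release that has the extra fields only touches the body of this one function.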

… fragment-chain fork deadlock

# Problem

On v1.12.0 rootchain with async backing enabled (depth=1, ancestry=2) and a small backing topology (3 validators, 1 group, 1 core), the parachain stalls permanently after a few blocks.

Validator log symptom (repeats every ~18s):
- `Refusing to second candidate at leaf. Is not a potential member.`
- `Rejected v2 advertisement ... error=BlockedByBacking`
- Provisioner: `candidates_count=0` in every inherent

# Root cause

The v1.12.0 relay-side prospective-parachains subsystem is the pre-paritytech#4937 ("prospective-parachains rework: take II") fragment-chain. In `polkadot/node/core/prospective-parachains/src/fragment_chain/mod.rs:797`, `is_fork_or_cycle` returns true if ANY other candidate already has the same `parent_head_hash`, rejecting it as a fork.

When inclusion is slow, the collator's cumulus aura-ext consensus hook (FixedVelocityConsensusHook<V=1, C=2>) allows authoring a 2nd block every ~18s with the same parent while the first one is still waiting to be included. Each re-author produces a new block N with different extrinsics → same parent_head_hash → fragment-chain rejects all but the first → backing pipeline never completes a full cycle → permanent stall.

Upstream fix is paritytech#4937, first in stable2409. Not cherry-pickable into v1.12.0 without dragging Constraints types from later releases (~1355 LoC rewrite).

# Workaround

Set UNINCLUDED_SEGMENT_CAPACITY = 1. `FixedVelocityConsensusHook::can_build_upon` checks `size_after_included >= C` and returns false if true. With C=1, once the collator has produced one unincluded block, all further authoring attempts at the same parent are blocked until that block is included on the relay. No forks are ever created at the cumulus side → fragment-chain sees a single candidate per cycle → accepts it normally.

Tradeoff: para block production becomes synchronous-backing pace (one block per ~18s = 3 relay slots) instead of async's potential 1:1 with relay. Acceptable for small networks that need v1.12.0 as a stable endpoint rather than a transient upgrade hop.

# Validation

Rehearsed on forked-testnet 2026-04-18/19:
- v0.9.40 → binary upgrade → leafchain setCode spec 20 → 21 (capacity=1)
- rootchain setCode v1.12.0 (spec 112000003)
- Para advanced from stuck-at-13 → 4556+ overnight, finalization keeping pace with head, validator provisioner consistently reports `candidates_count=1` on backing cycles.

Prior capacity=2 rehearsal on the same topology reproducibly stalled para at 13–30 within minutes of rootchain setCode and never recovered.

Bump leafchain spec_version 20 → 21 so setCode applies after the v1.12.0 rootchain upgrade. The stable2512 hop restores capacity to 2.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
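The two rules in this commit message (the relay-side fork rejection and the collator-side capacity check) can be illustrated with a toy model. This is a simplified sketch, not the cumulus or polkadot implementation: `ToyFragmentChain`, `Candidate`, and `accepted_on_same_parent` are hypothetical, and `can_build_upon` here only reproduces the `size_after_included >= C` comparison quoted above, ignoring velocity and everything else the real hook checks.

```rust
/// Toy version of the capacity check: `C` plays the role of
/// UNINCLUDED_SEGMENT_CAPACITY. Authoring is refused once
/// `size_after_included >= C`.
pub fn can_build_upon<const C: usize>(size_after_included: usize) -> bool {
    size_after_included < C
}

/// Hypothetical candidate, identified only by head hashes.
pub struct Candidate {
    pub parent_head_hash: u64,
    pub head_hash: u64,
}

/// Hypothetical stand-in for the pre-#4937 fragment chain.
#[derive(Default)]
pub struct ToyFragmentChain {
    candidates: Vec<Candidate>,
}

impl ToyFragmentChain {
    /// A candidate counts as a fork if any stored candidate already has
    /// the same parent head hash (the rule described in "Root cause").
    pub fn is_fork(&self, c: &Candidate) -> bool {
        self.candidates
            .iter()
            .any(|x| x.parent_head_hash == c.parent_head_hash)
    }

    /// Stores the candidate unless it is a fork; returns whether it was kept.
    pub fn try_add(&mut self, c: Candidate) -> bool {
        if self.is_fork(&c) {
            return false;
        }
        self.candidates.push(c);
        true
    }
}

/// Offers `attempts` distinct candidates that all build on the same parent
/// and counts how many the toy fragment chain accepts.
pub fn accepted_on_same_parent(attempts: u64) -> usize {
    let mut chain = ToyFragmentChain::default();
    (0..attempts)
        .filter(|i| {
            chain.try_add(Candidate {
                parent_head_hash: 1,
                head_hash: 100 + i,
            })
        })
        .count()
}
```

In this model `accepted_on_same_parent(3)` accepts only the first candidate, which is the rejection pattern behind the stall. With capacity 1, `can_build_upon::<1>(1)` is false, so the competing same-parent candidates are never authored in the first place, whereas `can_build_upon::<2>(1)` is true and permits them.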

…th setCode

Extends `EnableAsyncBackingAndCoretime` migration to force-free any occupied `AvailabilityCores` and kill `ClaimQueue` right after rewriting `ActiveConfig`. Mirrors what `Scheduler::push_occupied_cores_to_assignment_provider` does at session rotation, so the next `ParaInherent` pass can schedule fresh candidates without waiting ~1 hour for the prod-testnet session boundary.

Background: when the v1.12.0 rootchain runtime enables async backing while a candidate is still occupying a core (common under capacity=2 cumulus + pre-paritytech#4937 relay), the occupying entry never reaches availability and the core stays stuck until the next session.

Empirically measured on forked-testnet:
- plain migration → para stuck 56 min (until session rotation)
- this patch + validator restart → para advancing 16 s after setCode InBlock

The restart step is required because prospective-parachains / SessionInfo caches live in validator-process memory and need flushing to pick up the new scheduler state. Runbook: `kubectl rollout restart deploy/validator-*` post-setCode.

- Bumps spec_version 112_000_003 → 112_000_004
- Idempotent: re-running on a chain where cores are already free is a no-op

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
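The idempotence claim for the free-stuck-cores step can be illustrated with an in-memory model. This is a sketch under stated assumptions, not the migration code: `CoreState`, `ToySchedulerState`, and `free_stuck_cores` are hypothetical stand-ins for the `AvailabilityCores` and `ClaimQueue` storage items, and the model leaves out whatever the real scheduler does with the displaced assignments.

```rust
use std::collections::{BTreeMap, VecDeque};

/// Hypothetical, simplified state of one availability core.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum CoreState {
    Free,
    Occupied { para_id: u32 },
}

/// Hypothetical in-memory stand-in for the two storage items
/// that the migration touches.
#[derive(Debug, Default, PartialEq, Eq)]
pub struct ToySchedulerState {
    /// One entry per core.
    pub availability_cores: Vec<CoreState>,
    /// Core index mapped to the queue of para ids claimed for it.
    pub claim_queue: BTreeMap<u32, VecDeque<u32>>,
}

/// Frees every occupied core and clears the claim queue.
/// Returns the number of cores that were actually freed.
pub fn free_stuck_cores(state: &mut ToySchedulerState) -> usize {
    let mut freed = 0;
    for core in state.availability_cores.iter_mut() {
        if matches!(core, CoreState::Occupied { .. }) {
            *core = CoreState::Free;
            freed += 1;
        }
    }
    // Dropping all queued claims lets the next scheduling pass start fresh.
    state.claim_queue.clear();
    freed
}
```

Running `free_stuck_cores` a second time on the same state returns 0 and leaves the state unchanged, which is the no-op behaviour the commit message describes for chains where the cores are already free.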

Brings the consolidated v1.12.0 fixes (PR #30) into the stable2512 upgrade path.

Resolution strategy:
- Took stable2512 side for polkadot/node/*, polkadot/runtime/parachains/*, zombienet_tests/*: stable2512 has upstream paritytech#4937 + related PRs natively, so the v1.12.0 backport cherry-picks (paritytech#4727, paritytech#4471, paritytech#4724) are redundant.
- Deleted prdoc/pr_4471.prdoc + prdoc/pr_4724.prdoc: those PRs are already in the stable2512 branch history.
- Kept stable2512 leafchain/runtime/general/src/lib.rs (capacity=2): paritytech#4937 fixes the fragment-chain fork deadlock natively so capacity=1 is not needed on stable2512.
- Manually merged thxnet/runtime/thxnet-testnet/src/lib.rs:
  - spec_version stays at 125_120_005 (stable2512)
  - EnableAsyncBackingAndCoretime now includes the free-stuck-cores logic from PR #29 (also useful on stable2512 for setCode-atomic unstick).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…e2512

PR #30 squash-merged into upgrade/1.12.0, bringing in the consolidated v1.12.0 fixes. Re-merging on top of stable2512.

Resolution strategy (same as prior merge of upgrade/1.12.0-all):
- polkadot/node/*, polkadot/runtime/parachains/*, zombienet_tests/*, .gitlab/*: taken from stable2512 side, because paritytech#4937 is native to stable2512 so v1.12.0 backport cherry-picks (paritytech#4727/paritytech#4471/paritytech#4724) are redundant.
- thxnet/leafchain/runtime/general/src/lib.rs: stable2512 (capacity=2 safe with native paritytech#4937).
- thxnet/runtime/thxnet-testnet/src/lib.rs: merged manually:
  - spec_version stays 125_120_005 (stable2512)
  - EnableAsyncBackingAndCoretime keeps ClaimQueue::kill only (stable2512 doesn't have AvailabilityCores storage after paritytech#4937 refactor)
- Deleted prdoc/pr_4471.prdoc, prdoc/pr_4724.prdoc: upstream in stable2512.
- Deleted newly-added zombienet test files from backports: upstream.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

alexggh and others added 9 commits April 17, 2026 17:52

statement-distribution: Fix false warning (paritytech#4727)

b036e8e

... when backing group is of size 1. Signed-off-by: Alexandru Gheorghe <alexandru.gheorghe@parity.io>

merge: upgrade/1.12.0-backports (PR #26 - async-backing cherry-picks from v1.13)

1deb8cd

merge: upgrade/1.12.0-capacity1-workaround (PR #27 - leafchain UNINCLUDED_SEGMENT_CAPACITY=1)

9fffd28

merge: fix/v1.12.0-free-stuck-cores-on-setcode (PR #29 - free AvailabilityCores in migration)

8ed5b2a

DrudgeRajen merged commit 91a64b5 into upgrade/1.12.0 Apr 19, 2026
3 of 21 checks passed

DrudgeRajen mentioned this pull request Apr 28, 2026

fix(release/v1.12.0): bring in v1.12.0 upgrade fixes + Docker compat #35

Merged

3 tasks

DrudgeRajen merged 9 commits into upgrade/1.12.0 from upgrade/1.12.0-all


4 participants
