79 commits
852f48d
spectrum: incremental per-tick digest via dirty-leaf list; net: throt…
hackerby888 Jun 11, 2026
ea153d4
net: bulk catch-up protocol (lite types 240/241)
hackerby888 Jun 12, 2026
eccf98a
net: show bulk chunk counters (received/served) on the status line
hackerby888 Jun 12, 2026
2be5387
net: bulk catch-up capability probe — target the peer that speaks 240…
hackerby888 Jun 12, 2026
5a92ef7
net: bulk responder must not overflow the chunk buffer
hackerby888 Jun 12, 2026
5ff4c74
net: parallel bulk chunk application; perf: Track 1b dirty-path diges…
hackerby888 Jun 12, 2026
2bb101c
net: bulk striping — N concurrent requests across multiple capable peers
hackerby888 Jun 12, 2026
9b3b098
net: cut catch-up prefetch fan-out 5->2 (request-don't-flood)
hackerby888 Jun 12, 2026
3935b33
net: revert catch-up fan-out cut (boot fragility)
hackerby888 Jun 12, 2026
4a9c5a3
net: gate prefetch to depth 2 while bulk active
hackerby888 Jun 12, 2026
bd2bdca
bulk: single-outstanding contiguous pull (kill gap-prone optimistic s…
hackerby888 Jun 12, 2026
dbb5211
bulk: drain work queue in the normal request loop, not only epoch tra…
hackerby888 Jun 12, 2026
79fd651
diag: bulk apply/queue/frontier/future-vote counters in status line (…
hackerby888 Jun 12, 2026
687cf85
net: revert prefetch gate; prefetch is the baseline path, bulk is a b…
hackerby888 Jun 12, 2026
ff63aba
bulk: pipelined fixed-span pull (N in-flight, gap-free, hole re-request)
hackerby888 Jun 12, 2026
7438edd
net: guard random(0) in peerReconnectIfInactive (SIGFPE on peerless n…
hackerby888 Jun 12, 2026
2a0495c
bulk: don't request/serve past the provider's tip (root-cause of ap=0)
hackerby888 Jun 12, 2026
9bed885
bulk: fixed prefetch window, skip-don't-resend (kills the churn)
hackerby888 Jun 12, 2026
dc28c53
bulk: prefetch window 128->512, in-flight 8->64 to fill it
hackerby888 Jun 12, 2026
0ec48e4
pump catch-up lookahead to 512
hackerby888 Jun 12, 2026
b23a3c0
tune catch-up windows: REQUEST_SPAN 512->128, CATCHUP_MAX_PREFETCH 51…
hackerby888 Jun 12, 2026
9c3bd3a
catch-up: CATCHUP_MAX_PREFETCH 32->128
hackerby888 Jun 12, 2026
4c649c7
remove lite bulk catch-up (240/241 chunk protocol)
hackerby888 Jun 12, 2026
92063f8
net: fix peers stuck in isClosing forever
hackerby888 Jun 12, 2026
df1505b
net: bound outgoing connect to 5s (non-blocking)
hackerby888 Jun 12, 2026
9f1ccc9
net: connect timeout 4s (< reaper); recycle forgotten peers via inact…
hackerby888 Jun 12, 2026
2545ede
net: cap incoming connections per IP (runtime --max-inbound-per-ip, d…
hackerby888 Jun 12, 2026
3de6619
net: don't cull handshaked peers in the 120s peer refresh
hackerby888 Jun 12, 2026
ce9851e
fine tune params
hackerby888 Jun 13, 2026
80c9ef2
accept loopback by default
hackerby888 Jun 15, 2026
20c2583
add fork-based tick rollback (--rollback-mode=fork)
hackerby888 Jun 15, 2026
f530f35
fork rollback WIP: child networking reset (Overload::resetForChildPro…
hackerby888 Jun 15, 2026
0733e89
fork rollback: stop-the-world networking + child net rebuild (Plan A)
hackerby888 Jun 15, 2026
1f1403d
fork rollback: test hooks (force-match, edge-case injector, bench) + …
hackerby888 Jun 15, 2026
c774b89
fork rollback: correct solution logs on the re-run (drop spurious dep…
hackerby888 Jun 15, 2026
b814ffc
fork rollback: drop legacy reprocess, revert logging to upstream-plain
hackerby888 Jun 15, 2026
b766647
testnet: 10 solution processors, 20 max processors (was 2/6)
hackerby888 Jun 15, 2026
5b0ce37
Merge origin/main into feat/tick-fork-rollback
hackerby888 Jun 15, 2026
b8efda9
fork rollback: discard parent shadow diverts on mismatch (fix /s orph…
hackerby888 Jun 15, 2026
aee0549
testnet: keep loopback peer so a single node self-echoes to quorum
hackerby888 Jun 16, 2026
1b074d0
fork rollback: supervisor shim to keep PID stable across promotes
hackerby888 Jun 16, 2026
98c9c3e
fork rollback: review-pass hardening + a force-mismatch test hook
hackerby888 Jun 16, 2026
3bd0151
fork rollback: RPC sidecar (design B) so RPC survives promotes
hackerby888 Jun 16, 2026
0aede10
fork rollback: checkpoint-and-replay window (k=16) to amortize fork
hackerby888 Jun 16, 2026
729112f
fork rollback + rpc: shorten verbose header/block comments to 1-2 lines
hackerby888 Jun 16, 2026
a27835c
qubic.cpp: consolidate extension/optimization includes into two group…
hackerby888 Jun 16, 2026
e806f44
rpc: share rate limiter, default to sidecar
hackerby888 Jun 16, 2026
0c5ed48
fork rollback: fail-safe on every fork error (strict or loud exit)
hackerby888 Jun 16, 2026
f2e184c
fork rollback: close inherited RPC unix listener on promote (fd leak)
hackerby888 Jun 16, 2026
376dd13
fix comments
hackerby888 Jun 16, 2026
114be0c
merge main
hackerby888 Jun 16, 2026
68179bc
fork rollback: keep disk_shadow.h Windows-safe (MSVC test build)
hackerby888 Jun 16, 2026
321d8c4
fork rollback: add --fork-force-rollback-every N test flag
hackerby888 Jun 16, 2026
d3b5209
fork rollback: only fork at the network frontier, not while catching up
hackerby888 Jun 16, 2026
2ade182
fork rollback: revert catch-up gate; log each BSP fork step
hackerby888 Jun 16, 2026
0a6c74e
net: lazy-spawn per-socket tx/rx workers (cv-blocked when idle)
hackerby888 Jun 16, 2026
0c26405
Merge fast-sync into feat/tick-fork-rollback
hackerby888 Jun 17, 2026
8ea0191
fork rollback: reset swapVM pins on child promote; cap catch-up prefetch
hackerby888 Jun 17, 2026
9f6bf1b
net: FAST_TX_WINDOW_TICKS 512->64 (~2.4GB -> ~300MB)
hackerby888 Jun 17, 2026
e46d01f
net: fix per-socket worker reuse on slot reconnect + worker stop-check
hackerby888 Jun 17, 2026
869a33e
fork rollback: never let a checkpoint window span the epoch boundary
hackerby888 Jun 17, 2026
08b7f24
rpc_routes.h: drop route-path divider comments that restate the RPC_R…
hackerby888 Jun 17, 2026
9cd0ab5
hide debug logs
hackerby888 Jun 17, 2026
fee4f38
log: line-buffer stdout so docker/pipe logs survive fork _exit
hackerby888 Jun 17, 2026
3db83fe
pump gForkWindowK
hackerby888 Jun 17, 2026
0e3773f
merge main
hackerby888 Jun 17, 2026
a16223c
rpc: release swap pins per request (fix tickData "all cache pages pin…
hackerby888 Jun 17, 2026
3484ed3
Merge main: checkin-thread SIGSEGV fix (serialize JSON per-call, fix …
hackerby888 Jun 17, 2026
f50d3e7
fix fork re-run double-reward: drop isRevalidation dedup bypass
hackerby888 Jun 18, 2026
3705094
merge main
hackerby888 Jun 20, 2026
41a3262
fork-rollback: quiesce swap writers before shadow commit, retire wind…
hackerby888 Jun 20, 2026
3120a64
fork-rollback: census-gated fork eligibility (no hand-maintained lock…
hackerby888 Jun 21, 2026
a2591ba
fork-stats: guard gmtime_r for MSVC build
hackerby888 Jun 22, 2026
882736c
fix more subtle bugs
hackerby888 Jun 23, 2026
0dbe64e
Increase gForkWindowK from 32 to 64
hackerby888 Jun 24, 2026
28a61b4
Merge branch 'main' into feat/tick-fork-rollback
hackerby888 Jun 24, 2026
99f6b94
merge main
hackerby888 Jun 25, 2026
b917597
Merge branch 'main' into feat/tick-fork-rollback
hackerby888 Jun 25, 2026
326e51c
fix duplicated options bug
hackerby888 Jun 25, 2026
15 changes: 15 additions & 0 deletions CMakeLists.txt
@@ -101,3 +101,18 @@ if(BUILD_BENCHMARK)
     message(STATUS "-- EFI Benchmark ---")
     add_subdirectory(benchmark_uefi)
 endif()
+
+# Fork-eligibility census enforcement (tick fork-rollback): fail the build on a bare
+# std::mutex declaration that would escape the lock census. Linux-only (the fork machinery + bash).
+if(UNIX)
+    add_custom_target(check_smart_mutex ALL
+        COMMAND bash ${CMAKE_CURRENT_SOURCE_DIR}/tools/check_smart_mutex.sh
+        WORKING_DIRECTORY ${CMAKE_CURRENT_SOURCE_DIR}
+        COMMENT "check_smart_mutex: scanning for unsanctioned std::mutex declarations (fork census)")
+    if(TARGET Qubic)
+        add_dependencies(Qubic check_smart_mutex)
+    endif()
+    if(TARGET qubic_core_tests)
+        add_dependencies(qubic_core_tests check_smart_mutex)
+    endif()
+endif()
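The script `tools/check_smart_mutex.sh` is not part of this view, so its exact rules are unknown. As a rough illustration of the kind of check the comment describes, the sketch below flags a line that declares a bare `std::mutex` or `std::shared_mutex` unless it carries the `SMARTMUTEX-EXEMPT` marker seen in the headers of this PR. The function name `isUnsanctionedMutexDecl` and the regular expression are assumptions made for illustration, not the script's actual logic.

```cpp
#include <regex>
#include <string>

// Hypothetical predicate (not taken from the PR): true when a source line declares a
// bare std::mutex / std::shared_mutex variable and has no SMARTMUTEX-EXEMPT marker.
inline bool isUnsanctionedMutexDecl(const std::string& line)
{
    // An explicit exemption comment on the same line sanctions the declaration.
    if (line.find("SMARTMUTEX-EXEMPT") != std::string::npos) return false;

    // Matches "std::mutex name;", "static std::shared_mutex name{...", "std::mutex name = ...".
    // Template arguments such as std::lock_guard<std::mutex> do not match, because the
    // pattern requires the type to start the declaration.
    static const std::regex decl(
        R"(^\s*(static\s+|inline\s+|mutable\s+)*std::(shared_)?mutex\s+\w+\s*[;{=])");
    return std::regex_search(line, decl);
}
```

A build step along these lines would run the predicate over every source line and exit non-zero on the first hit, which is what makes the `add_custom_target` above fail the build.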
7 changes: 5 additions & 2 deletions src/assets/assets.h
@@ -724,26 +724,29 @@ static void getUniverseDigest(m256i& digest)
     unsigned int digestIndex;
     for (digestIndex = 0; digestIndex < ASSETS_CAPACITY; digestIndex++)
     {
+        if ((digestIndex & 63) == 0 && assetChangeFlags[digestIndex >> 6] == 0) { digestIndex += 63; continue; }
         if (assetChangeFlags[digestIndex >> 6] & (1ULL << (digestIndex & 63)))
         {
             KangarooTwelve(&assets[digestIndex], sizeof(AssetRecord), &assetDigests[digestIndex], 32);
         }
     }
     unsigned int previousLevelBeginning = 0;
+    unsigned int writeBase = ASSETS_CAPACITY;
     unsigned int numberOfLeafs = ASSETS_CAPACITY;
     while (numberOfLeafs > 1)
     {
         for (unsigned int i = 0; i < numberOfLeafs; i += 2)
         {
+            if ((i & 63) == 0 && assetChangeFlags[i >> 6] == 0) { i += 62; continue; } // skip 32 clean pairs
             if (assetChangeFlags[i >> 6] & (3ULL << (i & 63)))
             {
-                KangarooTwelve64To32(&assetDigests[previousLevelBeginning + i], &assetDigests[digestIndex]);
+                KangarooTwelve64To32(&assetDigests[previousLevelBeginning + i], &assetDigests[writeBase + (i >> 1)]);
                 assetChangeFlags[i >> 6] &= ~(3ULL << (i & 63));
                 assetChangeFlags[i >> 7] |= (1ULL << ((i >> 1) & 63));
             }
-            digestIndex++;
         }
         previousLevelBeginning += numberOfLeafs;
+        writeBase += (numberOfLeafs >> 1);
         numberOfLeafs >>= 1;
     }
     assetChangeFlags[0] = 0;
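The hunk above replaces the running `digestIndex` write cursor with an index computed from the pair position (`writeBase + (i >> 1)`), which is what allows whole clean words to be skipped without losing track of where a parent digest belongs. The following is a minimal, self-contained sketch of the same dirty-flag idea on a small tree. `ToyMerkle`, `toyHash2`, and the per-node `std::vector<bool>` flags are illustrative stand-ins (the real code uses KangarooTwelve and 64-bit flag words), and the hash here is not cryptographic.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy stand-in for KangarooTwelve64To32: NOT cryptographic, illustration only.
static std::uint64_t toyHash2(std::uint64_t a, std::uint64_t b)
{
    std::uint64_t h = (a * 0x9E3779B97F4A7C15ULL) ^ (b + 0xC2B2AE3D27D4EB4FULL + (a << 6) + (a >> 2));
    h ^= h >> 29;
    return h * 0xBF58476D1CE4E5B9ULL;
}

struct ToyMerkle
{
    std::size_t capacity;             // number of leaves, must be a power of two
    std::vector<std::uint64_t> nodes; // [0, capacity) = leaves, then each parent level in turn
    std::vector<bool> dirty;          // flags for the level currently being scanned
    std::size_t parentHashes = 0;     // counts pair hashes performed (work done)

    explicit ToyMerkle(std::size_t cap) : capacity(cap), nodes(2 * cap - 1, 0), dirty(cap, true) {}

    void setLeaf(std::size_t i, std::uint64_t v) { nodes[i] = v; dirty[i] = true; }

    std::uint64_t root()
    {
        std::size_t levelBegin = 0;       // first node of the level being read
        std::size_t writeBase = capacity; // first node of the parent level being written
        for (std::size_t n = capacity; n > 1; n >>= 1)
        {
            for (std::size_t i = 0; i < n; i += 2)
            {
                if (dirty[i] || dirty[i + 1])
                {
                    // The parent slot depends only on the pair index, never on a running cursor.
                    nodes[writeBase + (i >> 1)] = toyHash2(nodes[levelBegin + i], nodes[levelBegin + i + 1]);
                    ++parentHashes;
                    dirty[i] = false;
                    dirty[i + 1] = false;
                    dirty[i >> 1] = true; // (i >> 1) <= i, so this slot was already scanned at this level
                }
            }
            levelBegin += n;
            writeBase += n >> 1;
        }
        dirty[0] = false; // the root's flag is the last one left set
        return nodes.back();
    }
};
```

With 8 leaves, the first call to `root()` hashes 7 pairs; after changing one leaf, the next call hashes only the 3 pairs on that leaf's path to the root, and a call with no changes hashes nothing.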
247 changes: 247 additions & 0 deletions src/extensions/disk_shadow.h
@@ -0,0 +1,247 @@
#pragma once

// Disk-rollback shadow for the fork tick rollback: during a window, parent VM page writes divert
// to a per-dir /s subdir (real files stay pristine for the child); commit on match, discard on
// mismatch. Include before virtual_memory.h. CHAR16 = 2-byte (-fshort-wchar): no std::wstring/libc-wide.
// The fork machinery is Linux-only; only the flags below + the VM hooks are cross-platform (the heavy
// std deps pull <process.h> on MSVC, which collides with system.h's `system` macro), so they are gated.

#include <atomic> // the fork-rollback flags below are CLI-settable on every platform

// Armed on the parent for the duration of one fork window.
inline volatile bool gForkWindowActive = false;
// Drives isRevalidation strict scoring during the child re-run (not forceVerifySolutions).
inline std::atomic<bool> gReRunStrict{ false };
// Checkpoint-and-replay: the promoted child re-runs strict through this tick (the window's last
// processed tick at mismatch), then resumes optimistic. 0 = single-tick strict (legacy/fork-fail).
inline std::atomic<unsigned int> gReRunStrictUntilTick{ 0 };
// Set when a shadow dir/commit disk op fails; verdict then forces a strict child replay from the
// pristine real files (the optimistic on-disk state can no longer be trusted). Cleared at arm().
inline std::atomic<bool> gShadowPoisoned{ false };
// Test: assert the fork re-run reproduces the quorum digest.
inline volatile bool gVerifyForkRollback = false;

// Test hooks (fork mode): force a fork every tick (exercise the MATCH path on clean ticks),
// force the verdict to take the match branch, and print per-fork timing + RSS.
inline volatile bool gForkForceFork = false;
inline volatile bool gForkForceMatch = false;
inline volatile bool gForkForceMismatch = false;
inline volatile bool gForkBench = false;
// Force a single-tick fork + rollback-replay every N ticks (0 = off).
inline unsigned int gForkForceRollbackEvery = 0;

// Request-processor quiesce for a consistent fork snapshot.
inline std::atomic<bool> gForkQuiesceRequest{ false };
inline std::atomic<int> gForkParked{ 0 };
inline std::atomic<unsigned> gForkParkGen{ 0 }; // bumped per fork window; see liteForkRequestPark

#ifdef __linux__ // fork-based disk rollback: Linux-only (fork/COW); these std deps pull <process.h> on MSVC

#include <map>
#include <set>
#include <string>
#include <vector>
#include <mutex>
#include <new>
#include <thread>
#include <chrono>
#include <filesystem>
#include <utility>
#include <cstdlib>
#include <unistd.h>

// Called by request processors at loop top; parks while a fork window is set up.
static inline void liteForkRequestPark()
{
    if (!gForkQuiesceRequest.load(std::memory_order_acquire)) return;
    // Count once per fork window (generation): a straggler from a prior window can't double-count or
    // underflow the barrier, and there is no decrement-on-release to race the next window's reset.
    static thread_local unsigned myGen = (unsigned)-1;
    unsigned g = gForkParkGen.load(std::memory_order_acquire);
    if (myGen != g) { myGen = g; gForkParked.fetch_add(1, std::memory_order_acq_rel); }
    while (gForkQuiesceRequest.load(std::memory_order_acquire)) std::this_thread::yield();
}

class DiskShadow
{
    std::mutex mtx; // SMARTMUTEX-EXEMPT: shadow-dir lock, owner-reinit in reinitForChildPromote; provably not held across fork()
    std::map<std::string, std::vector<CHAR16>> shadowDir; // real dir (utf8) -> shadow dir buffer
    std::set<std::pair<std::string, std::string>> written; // (real dir utf8, page name utf8)

    // 2-byte safe length; volatile defeats clang rewriting the scan into libc wcslen.
    static size_t len16(const CHAR16* s)
    {
        const volatile CHAR16* p = s;
        size_t n = 0;
        while (p[n]) ++n;
        return n;
    }

    // mtx held; cached "<realDir>/s" CHAR16 buffer, created on first use.
    CHAR16* ensure(const std::string& realUtf8, const CHAR16* realDir)
    {
        auto it = shadowDir.find(realUtf8);
        if (it == shadowDir.end())
        {
            size_t n = len16(realDir);
            std::vector<CHAR16> buf(n + 3);
            for (size_t i = 0; i < n; i++) buf[i] = realDir[i];
            buf[n] = (CHAR16)'/'; buf[n + 1] = (CHAR16)'s'; buf[n + 2] = 0;
            if (!createDir(buf.data()))
            {
                gShadowPoisoned.store(true, std::memory_order_release);
                fprintf(stderr, "[SHADOW] createDir failed for %s/s -> poison (force strict replay)\n", realUtf8.c_str());
                fflush(stderr);
            }
            it = shadowDir.emplace(realUtf8, std::move(buf)).first;
        }
        return it->second.data();
    }

    void clearWindow()
    {
        gForkWindowActive = false;
        active.store(false, std::memory_order_release);
        written.clear();
    }

public:
    std::atomic<bool> active{ false };

    void arm()
    {
        std::lock_guard<std::mutex> g(mtx);
        // Purge any orphan /s/ pages left on disk by a prior window (failed commit-rename / crash /
        // commit race) so this window starts from a clean divert dir. Clear shadowDir too so ensure()
        // recreates the dirs fresh on the next writeDir.
        for (const auto& kv : shadowDir)
        {
            std::error_code ec;
            std::filesystem::remove_all(kv.first + "/s", ec);
        }
        shadowDir.clear();
        written.clear();
        gShadowPoisoned.store(false, std::memory_order_release);
        active.store(true, std::memory_order_release);
        gForkWindowActive = true;
    }

    // Write choke-point: record the page and redirect to the shadow dir.
    CHAR16* writeDir(CHAR16* realDir, const CHAR16* pageName)
    {
        if (!active.load(std::memory_order_acquire)) return realDir;
        std::lock_guard<std::mutex> g(mtx);
        if (!active.load(std::memory_order_acquire)) return realDir; // re-check under mtx: commit()/discard() may have closed the window in the check->lock gap
        std::string realUtf8 = wchar_to_string(realDir);
        CHAR16* sd = ensure(realUtf8, realDir);
        std::string pageUtf8 = wchar_to_string(pageName);
        if (gForkBench) { fprintf(stderr, "[SHADOW] divert %s/%s\n", realUtf8.c_str(), pageUtf8.c_str()); fflush(stderr); }
        written.insert({ std::move(realUtf8), std::move(pageUtf8) });
        return sd;
    }

    // Read choke-point: serve from the shadow dir if the page was diverted, else real.
    CHAR16* readDir(CHAR16* realDir, const CHAR16* pageName)
    {
        if (!active.load(std::memory_order_acquire)) return realDir;
        std::lock_guard<std::mutex> g(mtx);
        if (!active.load(std::memory_order_acquire)) return realDir; // re-check under mtx (see writeDir)
        std::string realUtf8 = wchar_to_string(realDir);
        auto it = shadowDir.find(realUtf8);
        if (it == shadowDir.end()) return realDir;
        // Divert to /s/ ONLY if this window actually wrote the page (the `written` set), not merely
        // because a /s/ file exists on disk, which could be a stale orphan from a prior window.
        if (!written.count({ realUtf8, wchar_to_string(pageName) })) return realDir;
        if (getFileSize((CHAR16*)pageName, it->second.data()) < 0) return realDir;
        return it->second.data();
    }

    // Quorum match: move diverted pages into their real dirs. A failed rename is NOT benign: an evicted
    // page's only current copy is its /s/ file (it is no longer resident, despite the prior "RAM is
    // authoritative" claim), so the next arm() purge or the following snapshot would lose it -> silent
    // corruption / boot exit(1). Mirror the swap writeback: bounded retry, then fatal so restart reloads
    // the last good snapshot + re-syncs rather than persisting a stale on-disk page.
    void commit()
    {
        std::lock_guard<std::mutex> g(mtx);
        if (gForkBench && !written.empty()) { fprintf(stderr, "[SHADOW] commit %zu diverted page(s) -> real\n", written.size()); fflush(stderr); }
        for (const auto& [real, name] : written)
        {
            const std::string from = real + "/s/" + name;
            const std::string to = real + "/" + name;
            unsigned int delayMs = 100; // mirrors SWAPVM_IO_INITIAL_DELAY_MS
            bool ok = false;
            for (int attempt = 0; attempt < 5; attempt++) // mirrors SWAPVM_IO_MAX_ATTEMPTS
            {
                std::error_code ec;
                std::filesystem::rename(from, to, ec);
                if (!ec) { ok = true; break; }
                fprintf(stderr, "[SHADOW] commit rename failed (attempt %d/5) %s -> %s: %s\n",
                    attempt + 1, from.c_str(), to.c_str(), ec.message().c_str());
                fflush(stderr);
                if (attempt + 1 < 5) { std::this_thread::sleep_for(std::chrono::milliseconds(delayMs)); delayMs *= 2; }
            }
            if (!ok)
            {
                fprintf(stderr, "[SHADOW] FATAL: commit could not persist %s (disk failure) -> exit for restart from snapshot\n", to.c_str());
                fflush(stderr);
                _exit(1); // not exit(): skip atexit/global dtors that would deadlock under the held mtx + gRpcDispatchLock
            }
        }
        clearWindow();
    }

    // Quorum mismatch: drop diverted pages; real files were never touched.
    void discard()
    {
        std::lock_guard<std::mutex> g(mtx);
        if (gForkBench && !written.empty()) { fprintf(stderr, "[SHADOW] discard %zu diverted page(s)\n", written.size()); fflush(stderr); }
        for (const auto& [real, name] : written)
        {
            std::error_code ec;
            std::filesystem::remove(real + "/s/" + name, ec);
        }
        clearWindow();
    }

    // Promoted fork child: the inherited mtx may be held by a thread that did not survive the fork.
    // Reinit it (mirrors Overload::resetForChildPromote) so the following purgeOrphans cannot deadlock.
    void reinitForChildPromote()
    {
        new (&mtx) std::mutex();
    }

    // Defensive cleanup of leftover shadow subdirs (e.g. after a parent crash).
    void purgeOrphans()
    {
        std::lock_guard<std::mutex> g(mtx);
        if (gForkBench && !written.empty()) { fprintf(stderr, "[SHADOW] child purgeOrphans: drop %zu diverted page(s); real pristine\n", written.size()); fflush(stderr); }
        for (const auto& kv : shadowDir)
        {
            std::error_code ec;
            std::filesystem::remove_all(kv.first + "/s", ec);
        }
        written.clear();
        active.store(false, std::memory_order_release);
    }
};

inline DiskShadow gShadow;

// Hooks called from the VM disk choke-points in virtual_memory.h.
static inline CHAR16* liteShadowWriteDir(CHAR16* pageDir, const CHAR16* pageName)
{
    return gShadow.writeDir(pageDir, pageName);
}
static inline CHAR16* liteShadowReadDir(CHAR16* pageDir, const CHAR16* pageName)
{
    return gShadow.readDir(pageDir, pageName);
}

#else // !__linux__ : no fork rollback; the VM hooks pass through and the request park is a no-op.

static inline void liteForkRequestPark() {}
static inline CHAR16* liteShadowWriteDir(CHAR16* pageDir, const CHAR16*) { return pageDir; }
static inline CHAR16* liteShadowReadDir(CHAR16* pageDir, const CHAR16*) { return pageDir; }

#endif // __linux__
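The retry policy in `DiskShadow::commit()` (up to 5 attempts, a 100 ms initial delay that doubles after each failure, then a fatal exit) can be read in isolation as a bounded exponential backoff. The sketch below extracts just that control flow. `retryWithBackoff` and its injectable `sleepMs` callback are hypothetical names introduced here so the schedule can be observed without actually sleeping; the PR inlines the loop and calls `std::this_thread::sleep_for` directly.

```cpp
#include <vector>

// Hypothetical helper (not in the PR): run op() up to maxAttempts times. After each failed
// attempt except the last, call sleepMs(delay) and double the delay, as commit() does.
template <class Op, class Sleep>
bool retryWithBackoff(Op op, Sleep sleepMs, int maxAttempts = 5, unsigned initialDelayMs = 100)
{
    unsigned delayMs = initialDelayMs;
    for (int attempt = 0; attempt < maxAttempts; attempt++)
    {
        if (op()) return true;          // success: stop retrying
        if (attempt + 1 < maxAttempts)  // no pointless sleep after the final failure
        {
            sleepMs(delayMs);
            delayMs *= 2;
        }
    }
    return false;                       // caller decides what "fatal" means
}

// Example: record the delays that would be slept when every attempt fails.
inline std::vector<unsigned> backoffScheduleOnTotalFailure()
{
    std::vector<unsigned> sleeps;
    retryWithBackoff([] { return false; }, [&](unsigned ms) { sleeps.push_back(ms); });
    return sleeps; // four delays between five attempts
}
```

With the defaults, a total failure waits 100, 200, 400, and 800 ms (1.5 s in all) before giving up. In `commit()` that final failure leads to `_exit(1)` rather than a returned `false`, because a page whose only current copy sits in `/s/` cannot be safely abandoned.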
35 changes: 35 additions & 0 deletions src/extensions/fork_census.h
@@ -0,0 +1,35 @@
#pragma once

// Census-aware mutex wrappers (tick fork-rollback). A std::mutex held by a non-AP thread would be
// inherited locked by the forked child; declaring it SmartMutex/SmartSharedMutex funnels it through the
// same census as the ACQUIRE/RELEASE spin-locks so bspForkPoint's gate sees it too -> no hand lock list.

#include <mutex>
#include <shared_mutex>
#include "platform/concurrency.h" // forkCensusEnter/forkCensusLeave

struct SmartMutex
{
    std::mutex m; // SMARTMUTEX-EXEMPT: wrapper internal (this IS the census-aware wrapper)
    const char* nm;
    explicit SmartMutex(const char* name = "SmartMutex") : nm(name) {}

    void lock() { m.lock(); forkCensusEnter(nm); }
    bool try_lock() { if (!m.try_lock()) return false; forkCensusEnter(nm); return true; }
    void unlock() { forkCensusLeave(); m.unlock(); }
};

struct SmartSharedMutex
{
    std::shared_mutex m; // SMARTMUTEX-EXEMPT: wrapper internal (this IS the census-aware wrapper)
    const char* nm;
    explicit SmartSharedMutex(const char* name = "SmartSharedMutex") : nm(name) {}

    void lock() { m.lock(); forkCensusEnter(nm); }
    bool try_lock() { if (!m.try_lock()) return false; forkCensusEnter(nm); return true; }
    void unlock() { forkCensusLeave(); m.unlock(); }

    void lock_shared() { m.lock_shared(); forkCensusEnter(nm); }
    bool try_lock_shared() { if (!m.try_lock_shared()) return false; forkCensusEnter(nm); return true; }
    void unlock_shared() { forkCensusLeave(); m.unlock_shared(); }
};
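Because both wrappers expose `lock`/`try_lock`/`unlock`, they drop into `std::lock_guard` and `std::unique_lock` unchanged, and `SmartSharedMutex` also fits `std::shared_lock`. The real `forkCensusEnter`/`forkCensusLeave` live in `platform/concurrency.h`, which is not shown here, so the sketch below substitutes a hypothetical census (a single atomic counter of held locks) to show how a fork gate could consult it. Every `demo`-prefixed name is an assumption made for illustration.

```cpp
#include <atomic>
#include <mutex>

// Hypothetical census: number of census-aware locks currently held, process-wide.
static std::atomic<int> gDemoHeldLocks{ 0 };
static void demoCensusEnter(const char* /*name*/) { gDemoHeldLocks.fetch_add(1, std::memory_order_acq_rel); }
static void demoCensusLeave() { gDemoHeldLocks.fetch_sub(1, std::memory_order_acq_rel); }

// Same shape as SmartMutex above, wired to the demo census instead of the real one.
struct DemoSmartMutex
{
    std::mutex m;
    const char* nm;
    explicit DemoSmartMutex(const char* name = "DemoSmartMutex") : nm(name) {}

    void lock() { m.lock(); demoCensusEnter(nm); }   // count only after the lock is owned
    bool try_lock() { if (!m.try_lock()) return false; demoCensusEnter(nm); return true; }
    void unlock() { demoCensusLeave(); m.unlock(); } // uncount before releasing ownership
};

// Hypothetical gate: forking is only considered safe when no census-aware lock is held,
// since a lock held by another thread would be inherited locked by the child.
static bool demoForkEligible()
{
    return gDemoHeldLocks.load(std::memory_order_acquire) == 0;
}

// Usage: the standard RAII guard drives the census through lock()/unlock().
static bool demoEligibleWhileLocked(DemoSmartMutex& mu)
{
    std::lock_guard<DemoSmartMutex> g(mu);
    return demoForkEligible(); // false here: this thread holds mu
}
```

The ordering inside `lock()` and `unlock()` keeps the counter from ever under-reporting a held lock: a lock is counted for the whole time it is owned, and at worst slightly longer.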