
e2e-tests: Stage relay snapshot per-validator (fix bulletin_fetch flake) #3248

Closed
BigTava wants to merge 2 commits into main from tiago-prefetch-bulletin-snapshots


BigTava (Contributor) commented May 8, 2026

Summary

bulletin_fetch flakes: archive.unpack().unwrap() panics on UnexpectedEof mid-extraction inside zombienet-provider. I reproduced it in 3 of 5 reruns on a single commit. The failure did not require a fresh download: on a cache hit, with zombienet copying our pre-validated tarball, the test still failed, so a corrupted download does not explain it.
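
To illustrate why a half-written file surfaces as UnexpectedEof rather than as a checksum or format error, here is a minimal std-only sketch. It assumes an extractor that reads fixed-size 512-byte tar blocks with exact reads; it does not call zombienet-provider or the tar crate, and the read_blocks helper and sizes are hypothetical. std's Read::read_exact reports ErrorKind::UnexpectedEof when the source ends before the buffer is filled, which is what a reader sees if it opens the file while another process is still writing it.

```rust
use std::io::{self, Cursor, ErrorKind, Read};

/// Tar archives are laid out in 512-byte blocks.
const BLOCK: usize = 512;

/// Hypothetical block-oriented reader: reads exactly `n_blocks` blocks
/// from `src`, returning the bytes or the first I/O error hit.
fn read_blocks<R: Read>(src: &mut R, n_blocks: usize) -> io::Result<Vec<u8>> {
    let mut out = vec![0u8; n_blocks * BLOCK];
    for chunk in out.chunks_mut(BLOCK) {
        // read_exact yields ErrorKind::UnexpectedEof if the source
        // ends before this chunk is completely filled.
        src.read_exact(chunk)?;
    }
    Ok(out)
}

fn main() {
    // The file as the writer intends it: 4 full blocks.
    let complete = vec![0u8; 4 * BLOCK];
    // The same file observed mid-write: only 1.5 blocks are on disk yet.
    let partial = &complete[..BLOCK + BLOCK / 2];

    // A complete file reads fine.
    assert!(read_blocks(&mut Cursor::new(complete.as_slice()), 4).is_ok());

    // A truncated file fails with UnexpectedEof, not a format error.
    let err = read_blocks(&mut Cursor::new(partial), 4).unwrap_err();
    assert_eq!(err.kind(), ErrorKind::UnexpectedEof);
    println!("partial read failed with {:?}", err.kind());
}
```

A real tar or gzip reader may report truncation differently depending on where the cut lands, so treat this only as a model of the general mechanism.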

The cause: alice and bob both pass the same relay snapshot path to with_db_snapshot(...). zombienet-provider 0.4.11 keys its internal cache by sha256(path_string), so the two validators race to write and read the same intermediate <hash>.tgz in the namespace directory. One of them sees a partially written file and panics.
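
A minimal sketch of why keying the cache by path string collides. It models the intermediate file name as a hash of the snapshot path, standing in for zombienet-provider's sha256(path_string). To stay std-only it swaps in std's DefaultHasher for SHA-256; the cache_slot function and the namespace directory are hypothetical, not zombienet-provider's API. The only point is that equal path strings map to the same <hash>.tgz, while distinct staged copies map to distinct slots.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};
use std::path::{Path, PathBuf};

/// Hypothetical model of a path-keyed cache: the intermediate file name is
/// derived from the snapshot *path string*, not from the file's contents.
/// DefaultHasher stands in for SHA-256 here.
fn cache_slot(namespace_dir: &Path, snapshot_path: &str) -> PathBuf {
    let mut h = DefaultHasher::new();
    snapshot_path.hash(&mut h);
    namespace_dir.join(format!("{:016x}.tgz", h.finish()))
}

fn main() {
    let ns = Path::new("/tmp/zombie-ns");
    let shared = "/snapshots/relay.tgz";

    // alice and bob pass the same path: same intermediate file, so they race.
    assert_eq!(cache_slot(ns, shared), cache_slot(ns, shared));

    // Per-validator staged copies: distinct path strings, distinct slots.
    let alice = cache_slot(ns, "/snapshots/relay-alice.tgz");
    let bob = cache_slot(ns, "/snapshots/relay-bob.tgz");
    assert_ne!(alice, bob);
    println!("alice -> {}\nbob   -> {}", alice.display(), bob.display());
}
```

Because the key depends only on the path string, two byte-identical tarballs at different paths never share a slot, which is what the per-validator staging relies on.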

The smoke tests already document this race and work around it (stage_per_node_snapshots in e2e-tests/src/network.rs:171). This PR applies the same workaround to bulletin_fetch: it copies the relay tarball to two distinct paths (relay-alice.tgz, relay-bob.tgz) so that each validator gets its own cache slot. The bulletin collators were already unaffected: they use different tarballs (bulletin-full vs. bulletin-partial), so their paths were already distinct.
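
The staging step can be sketched roughly as follows. This is a hypothetical std-only helper, not the actual stage_per_node_snapshots in e2e-tests/src/network.rs (its signature is not shown in this PR). It copies one source tarball to <prefix>-<validator>.tgz in a target directory and returns the new paths, one of which would then be passed to each validator's with_db_snapshot(...). The file naming follows the relay-alice.tgz / relay-bob.tgz convention described above; the temp-directory location is an assumption for the demo.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical helper: give each validator its own copy of `src` so that a
/// path-keyed cache sees a distinct path string per validator.
/// Produces e.g. `<dir>/relay-alice.tgz` and `<dir>/relay-bob.tgz`.
fn stage_per_validator(
    src: &Path,
    dir: &Path,
    prefix: &str,
    validators: &[&str],
) -> io::Result<Vec<PathBuf>> {
    fs::create_dir_all(dir)?;
    validators
        .iter()
        .map(|name| -> io::Result<PathBuf> {
            let dst = dir.join(format!("{prefix}-{name}.tgz"));
            // Copy the bytes to a validator-specific path; `src` is untouched.
            fs::copy(src, &dst)?;
            Ok(dst)
        })
        .collect()
}

fn main() -> io::Result<()> {
    let dir = std::env::temp_dir().join("stage-per-validator-demo");
    fs::create_dir_all(&dir)?;
    let src = dir.join("relay.tgz");
    fs::write(&src, b"pretend tarball bytes")?;

    let staged = stage_per_validator(&src, &dir, "relay", &["alice", "bob"])?;
    for p in &staged {
        println!("{}", p.display());
    }
    Ok(())
}
```

Copying costs disk space proportional to the number of validators, but it needs no change to zombienet-provider itself, which is why it works as a test-side workaround.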

The earlier commit (prefetching via curl) stays in as defensive download hardening, but it is not load-bearing for this fix.

Test plan

  • cargo check clean
  • CI: 5 consecutive bulletin_fetch runs all pass

BigTava added 2 commits May 8, 2026 15:10
…hot cache race (same workaround the smoke tests use)
BigTava changed the title from "e2e-tests: Pre-fetch bulletin snapshots to dodge a zombienet-provider tar-extract flake" to "e2e-tests: Stage relay snapshot per-validator (fix bulletin_fetch flake)" on May 8, 2026
BigTava (Contributor, Author) commented May 8, 2026

Closing in favor of #3249. That PR has a single focused commit and the correct framing: the actual fix is per-validator staging of the relay snapshot (the same workaround the smoke tests use). The download-hardening rationale in this PR's original title was misleading; the GCS bug I diagnosed was not what was actually making the test flaky.

BigTava closed this May 8, 2026