e2e-tests: Stage relay snapshot per-validator (fix bulletin_fetch flake)#3248
Closed
BigTava wants to merge 2 commits into
… panic zombienet-provider's tar extraction
…hot cache race (same workaround the smoke tests use)
Closing in favor of #3249. That PR has a single focused commit and the correct framing: the actual fix is per-validator staging of the relay snapshot (same workaround the smoke tests use). The download-hardening reasoning in this PR's title was misleading; the GCS bug I diagnosed wasn't what was actually flaking the test.
Summary
`bulletin_fetch` flakes with `archive.unpack().unwrap()` panicking on `UnexpectedEof` mid-extract inside zombienet-provider. Empirically reproduced in 3/5 reruns on a single commit, and not on a fresh download path: cache hit, zombienet copying our pre-validated tarball, still failed.

The cause: alice and bob both pass the same relay snapshot path to `with_db_snapshot(...)`. zombienet-provider 0.4.11 keys its internal cache by `sha256(path_string)`, so the two validators race writing/reading the same intermediate `<hash>.tgz` in the namespace dir. One sees a partially written file and panics.

The smoke tests already documented this race and worked around it the same way (`stage_per_node_snapshots` in `e2e-tests/src/network.rs:171`). This PR applies the same workaround to `bulletin_fetch`: copy the relay tarball to two distinct paths (`relay-alice.tgz`, `relay-bob.tgz`) so each validator gets its own cache slot. Bulletin collators were already fine: they use different tarballs (`bulletin-full` vs `bulletin-partial`), so their paths are already distinct.

The previous commit (prefetch via curl) stays as defensive download hardening but isn't load-bearing for this fix.
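For reviewers who want the workaround in concrete terms, here is a minimal sketch of per-node staging using only the standard library. It is not the code from `e2e-tests/src/network.rs`: the function name `stage_snapshot_per_node`, its parameters, and the `<prefix>-<node>.tgz` naming are assumptions made for illustration. The idea is only that each validator ends up with its own path string, and therefore its own path-derived cache key.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Copy one shared snapshot tarball to a distinct file per node.
///
/// Hypothetical helper for illustration; the real workaround lives in
/// `stage_per_node_snapshots` and may differ in shape and naming.
pub fn stage_snapshot_per_node(
    shared: &Path,
    staging_dir: &Path,
    prefix: &str,
    nodes: &[&str],
) -> io::Result<Vec<PathBuf>> {
    fs::create_dir_all(staging_dir)?;
    let mut staged = Vec::with_capacity(nodes.len());
    for node in nodes {
        // A distinct path string per node means a path-keyed cache
        // no longer maps both validators to the same intermediate file.
        let dest = staging_dir.join(format!("{prefix}-{node}.tgz"));
        fs::copy(shared, &dest)?;
        staged.push(dest);
    }
    Ok(staged)
}
```

Called with prefix `"relay"` and nodes `["alice", "bob"]`, this yields `relay-alice.tgz` and `relay-bob.tgz`; each returned path would then be handed to the corresponding validator's `with_db_snapshot(...)` call. The cost is one extra copy of the tarball per validator, which is the trade-off the smoke tests already accept.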
Test plan
- `cargo check` clean
- `bulletin_fetch` runs all pass