ci: cut GCS egress for go-build image and master build cache #13037

Merged
fasaxc merged 5 commits into projectcalico:master from fasaxc:ci-cache-reduce-gcs-egress
Jun 24, 2026

@fasaxc fasaxc commented Jun 22, 2026

Member

Storing build caches in the GCS bucket
(calico-transient-build-artifacts-europe-west3) is driving high egress costs,
because Semaphore agents download them on nearly every job. This PR trims the
two biggest contributors, with no change to other branches.

  • Move master's branch-keyed caches to Semaphore's free built-in cache
    command.
    The ~1.8 GB Go build cache (build-cache-<group>) and the
    working-copy tarball now use cache store/cache restore on master. Other
    branches keep using GCS for now.
  • Use the cached calico/go-build image in GCS as a fallback only. Try a (free!)
    pull from Docker Hub first and only fall back to GCS on failure.
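
The pull-first, GCS-fallback order in the second bullet can be sketched roughly as below. This is a minimal sketch rather than the PR's actual prologue: the function name `load_go_build`, its two arguments, and the assumption that the GCS object is a `docker save` tarball are all illustrative.

```shell
#!/usr/bin/env bash
# Hypothetical helper: try Docker Hub first, use the GCS copy only on failure.
# Assumes the GCS object is an image tarball produced by `docker save`.
load_go_build() {
  local image="$1" fallback_uri="$2"
  if docker pull "${image}"; then
    return 0
  fi
  echo "docker pull failed; loading ${image} from ${fallback_uri}" >&2
  # Stream the tarball from GCS straight into the local image store.
  gcloud storage cat "${fallback_uri}" | docker load
}

# Example call (image tag and object path are placeholders):
# load_go_build "calico/go-build:<tag>" "gs://<bucket>/<path>/go-build.tar"
```

In the common case only the `docker pull` runs, so no bytes leave the bucket; the GCS read happens only when the pull fails.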

@fasaxc fasaxc requested a review from a team as a code owner June 22, 2026 13:22
Copilot AI review requested due to automatic review settings June 22, 2026 13:22
@fasaxc fasaxc added the docs-not-required (Docs not required for this change) and release-note-not-required (Change has no user-facing impact) labels Jun 22, 2026
@marvin-tigera marvin-tigera added this to the Calico v3.33.0 milestone Jun 22, 2026

Copilot AI left a comment

Contributor

Pull request overview

This PR updates Calico’s Semaphore CI pipelines to reduce GCS egress costs by eliminating the largest recurring downloads and shifting master’s cross-workflow caches to Semaphore’s built-in cache backend.

Changes:

  • Removed the GCS-backed calico/go-build image caching path (and its prerequisite job), relying on Docker Hub pulls when needed.
  • Switched master’s branch-keyed working-copy tarball and Go build-cache tarballs from GCS to cache restore/cache store (other branches continue using GCS).
  • Regenerated .semaphore/*.yml outputs from the .semaphore/semaphore.yml.d/ templates.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| .semaphore/semaphore.yml.d/blocks/10-prerequisites.yml | Updates prerequisites block(s): stores working copy via Semaphore cache on master; removes go-build image caching job. |
| .semaphore/semaphore.yml.d/02-global_job_config.yml | Updates prologue/epilogue cache restore/store logic to use Semaphore cache for master and removes go-build GCS load. |
| .semaphore/semaphore.yml | Regenerated pipeline reflecting the template changes. |
| .semaphore/semaphore-scheduled-builds.yml | Regenerated scheduled-build pipeline reflecting the template changes. |

Comment thread .semaphore/semaphore.yml.d/blocks/10-prerequisites.yml
Comment thread .semaphore/semaphore.yml
Comment thread .semaphore/semaphore-scheduled-builds.yml
Storing CI artifacts to the europe-west3 GCS bucket is driving high
egress costs because Semaphore agents download them on nearly every job.

- Stop caching the calico/go-build image in GCS entirely. Remove the
  "Pull: go-build image" prerequisite job and the prologue GCS-load step;
  jobs now pull calico/go-build fresh from Docker Hub on first use (we
  already docker login in the prologue for the authenticated rate limit).
- Move master's branch-keyed caches (the ~1.8GB Go build cache and the
  working-copy tarball) to Semaphore's free built-in `cache` toolbox
  command. Other branches keep using GCS for now. Keyed on
  SEMAPHORE_GIT_BRANCH == master, which covers both master branch builds
  (store) and PRs targeting master (restore).

The cache flows branch -> PR only: stores happen exclusively on master
branch builds (gated on empty SEMAPHORE_GIT_PR_NUMBER), so PRs never
write the shared master cache, they only restore it. Semaphore cache
keys are not overwritten, so branch builds delete-then-store to refresh.

The generated .semaphore/*.yml files were regenerated via `make gen-semaphore-yaml`.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@fasaxc fasaxc force-pushed the ci-cache-reduce-gcs-egress branch from 5bce3ed to abfd8a6 Compare June 22, 2026 13:43

fasaxc commented Jun 22, 2026

Member Author

Thanks @copilot — good catch. Fixed in the latest push.

The "Store working copy" job runs on every branch (when: "branch =~ '.*'"), and on a PR SEMAPHORE_GIT_BRANCH is the base branch (master), so the master/Semaphore-cache path was being taken on PRs — cache delete + cache store would have thrashed the shared working-copy-master key. (I can't simply drop the delete: Semaphore cache keys are not overwritten, so a refresh genuinely needs delete-then-store.)

The store on the master path is now gated on -z "${SEMAPHORE_GIT_PR_NUMBER}", matching how the build-cache store is already gated. So the cache flows branch → PR only: master branch builds populate the cache, PRs only restore it and never write it. Other branches' GCS path is unchanged.
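
A rough sketch of that one-way flow, assuming Semaphore's `cache` toolbox command is on the PATH. The function names are hypothetical and this is not the PR's actual YAML; only the `working-copy-master` key, the `SEMAPHORE_GIT_PR_NUMBER` gate, and the delete-then-store order come from the discussion above.

```shell
#!/usr/bin/env bash
# Hypothetical helpers around Semaphore's `cache` toolbox command.

# Refresh a key from a branch build only; PR builds must never write it.
refresh_master_cache() {
  local key="$1" path="$2"
  if [[ -n "${SEMAPHORE_GIT_PR_NUMBER:-}" ]]; then
    echo "PR build: leaving ${key} untouched"
    return 0
  fi
  # Existing keys are not overwritten, so delete before storing.
  cache delete "${key}"
  cache store "${key}" "${path}"
}

# Restore a key and report a hit only if the expected file appeared,
# because `cache restore` exits 0 even on a miss.
restore_master_cache() {
  local key="$1" expected_file="$2"
  cache restore "${key}"
  [[ -f "${expected_file}" ]]
}

# Example calls (the tarball path is a placeholder):
# refresh_master_cache working-copy-master /tmp/working-copy.tgz
# restore_master_cache working-copy-master /tmp/working-copy.tgz
```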

Also rebased onto current master and regenerated — the earlier "Check SemaphoreCI files" failure was stale generated YAML (new /lib/logrusr and /lib/std/log package triggers landed on master since the branch was cut), not the cache change.

Comment thread .semaphore/semaphore.yml.d/blocks/10-prerequisites.yml Outdated
# targeting master restore master's Semaphore cache.) Note: `cache
# restore` exits 0 even on a miss, so we test for the restored file.
use_sem_cache=false
[[ "${SEMAPHORE_GIT_BRANCH}" == "master" ]] && use_sem_cache=true

Member

This feels like a weird way to write

if [[ "${SEMAPHORE_GIT_BRANCH}" == "master" ]]; then use_sem_cache=true; fi

Any reason for the above form? Also I believe the above would fail if any set -e logic was in place.

Member Author

Good point — switched to the plain if [[ ... ]]; then use_sem_cache=true; else use_sem_cache=false; fi form. The old [[ ... ]] && use_sem_cache=true was just habit, no real reason — and you're right it returns non-zero when the test is false, which would bite under set -e. Fixed in e448e1e.
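
The difference between the two forms can be seen in a small sketch. The function names and the branch name passed in are illustrative only. One nuance worth hedging: by bash's documented errexit rules, a failing test on the left of `&&` does not by itself trigger `set -e`, but the list still returns non-zero, which can fail a step when it is the last command of a function or when each command's exit status is checked separately.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; names are illustrative, not from the PR.

# Short-circuit form: returns non-zero when the branch is not master.
pick_cache_short() {
  use_sem_cache=false
  [[ "$1" == "master" ]] && use_sem_cache=true
}

# if/else form: always returns zero, whichever branch is taken.
pick_cache_if() {
  if [[ "$1" == "master" ]]; then use_sem_cache=true; else use_sem_cache=false; fi
}

pick_cache_short "some-other-branch"; short_status=$?
pick_cache_if "some-other-branch"; if_status=$?
echo "short=${short_status} if=${if_status} use_sem_cache=${use_sem_cache}"
# prints: short=1 if=0 use_sem_cache=false
```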

Comment thread .semaphore/semaphore.yml.d/02-global_job_config.yml
fasaxc and others added 2 commits June 24, 2026 10:54
- Write use_sem_cache with a plain if/else instead of `[[ ]] && x=true`,
  which would short-circuit (return non-zero) under set -e (review: nelljerram).
- Clarify the working-copy store comment: the block's `when` already limits
  it to branch builds; the SEMAPHORE_GIT_PR_NUMBER guard is belt-and-suspenders.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Reinstate the GCS cache of calico/go-build, but invert the priority: jobs
now pull from Docker Hub first (free egress) and only fall back to loading
the image from GCS if the pull fails. We still see occasional Docker Hub
pull failures even with an authenticated session, so this keeps that
robustness while spending essentially nothing on GCS egress in the common
case.

- Global prologue: docker pull first, GCS load only on pull failure.
- Restore the "Pull: go-build image" prerequisite job (and its
  Prerequisites dependency) to keep the GCS fallback populated. Its
  uploads are free and it is a no-op `gcloud storage ls` on a cache hit.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
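
For illustration, the restored prerequisite job's "list first, upload only on a miss" behaviour might look like the sketch below. The function name, its arguments, and the use of a `docker save` tarball are assumptions, not taken from the PR's templates.

```shell
#!/usr/bin/env bash
# Hypothetical helper: keep the GCS fallback copy of the image populated.
ensure_go_build_cached() {
  local image="$1" uri="$2" tarball
  # Cache hit: a listing is the only request made.
  if gcloud storage ls "${uri}" >/dev/null 2>&1; then
    echo "hit"
    return 0
  fi
  # Cache miss: pull once, save to a temp tarball, and upload it.
  tarball="$(mktemp)"
  if docker pull "${image}" &&
     docker save -o "${tarball}" "${image}" &&
     gcloud storage cp "${tarball}" "${uri}"; then
    rm -f "${tarball}"
    echo "stored"
    return 0
  fi
  rm -f "${tarball}"
  return 1
}

# Example call (image tag and object path are placeholders):
# ensure_go_build_cached "calico/go-build:<tag>" "gs://<bucket>/<path>/go-build.tar"
```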
@github-actions

github-actions Bot commented Jun 24, 2026

Contributor

CI triage — Calico

Recommendation: CI pipeline aborted by fail-fast — re-run before investigating

Failed jobs (most likely killed by fail-fast, not root causes):

  • CNI Plugin / CNI Plugin: CI
  • CNI Plugin: Windows / CNI Plugin: Windows Containerd FV - l2bridge
  • CNI Plugin: Windows / CNI Plugin: Windows Containerd FV - overlay
  • E2E tests (KinD) / E2E tests: Conformance (cluster routing: BIRD)
  • E2E tests (KinD) / E2E tests: Conformance (cluster routing: Felix)
  • E2E tests (KinD) / E2E tests: ClusterNetworkPolicy (cluster routing: Felix)
  • Felix: Windows FV / Felix: Windows FV
  • Felix: FV / Felix: BPF tests on Ubuntu 22.04 (nftables)
  • Felix: FV / Felix: iptables tests on Ubuntu 22.04
  • Felix: FV / Felix: nftables tests on Ubuntu 22.04
  • Felix: FV / Felix: BPF tests on Ubuntu 24.04 (iptables)
  • Felix: FV / Felix: BPF tests on Ubuntu 25.10 with jitharden=2 (nftables)
  • Felix: FV / Felix: BPF tests on Ubuntu 25.10 with netkit (nftables)
  • KubeVirt live migration (KIND) / KubeVirt live migration (KIND)
  • libcalico-go / libcalico-go: CI (crd.projectcalico.org/v1)
  • Node: kind-cluster tests / Node: kind-cluster tests

workflow_id: c4c855d0-4366-4961-b1c8-057a5a8e4bae

The "Store working copy" block only runs on branch builds (its
`when: "branch =~ '.*'"`), so the inner `-z SEMAPHORE_GIT_PR_NUMBER`
check could never be false here. Rely on that single load-bearing
condition instead of the belt-and-suspenders guard (review: nelljerram).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@fasaxc fasaxc merged commit e26a88c into projectcalico:master Jun 24, 2026
4 of 5 checks passed

4 participants