ci: cut GCS egress for go-build image and master build cache #13037
Pull request overview
This PR updates Calico’s Semaphore CI pipelines to reduce GCS egress costs by eliminating the largest recurring downloads and shifting master’s cross-workflow caches to Semaphore’s built-in cache backend.
Changes:
- Removed the GCS-backed `calico/go-build` image caching path (and its prerequisite job), relying on Docker Hub pulls when needed.
- Switched master's branch-keyed working-copy tarball and Go build-cache tarballs from GCS to `cache restore`/`cache store` (other branches continue using GCS).
- Regenerated `.semaphore/*.yml` outputs from the `.semaphore/semaphore.yml.d/` templates.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| .semaphore/semaphore.yml.d/blocks/10-prerequisites.yml | Updates prerequisites block(s): stores working copy via Semaphore cache on master; removes go-build image caching job. |
| .semaphore/semaphore.yml.d/02-global_job_config.yml | Updates prologue/epilogue cache restore/store logic to use Semaphore cache for master and removes go-build GCS load. |
| .semaphore/semaphore.yml | Regenerated pipeline reflecting the template changes. |
| .semaphore/semaphore-scheduled-builds.yml | Regenerated scheduled-build pipeline reflecting the template changes. |
Storing CI artifacts to the europe-west3 GCS bucket is driving high egress costs because Semaphore agents download them on nearly every job.

- Stop caching the calico/go-build image in GCS entirely. Remove the "Pull: go-build image" prerequisite job and the prologue GCS-load step; jobs now pull calico/go-build fresh from Docker Hub on first use (we already `docker login` in the prologue for the authenticated rate limit).
- Move master's branch-keyed caches (the ~1.8GB Go build cache and the working-copy tarball) to Semaphore's free built-in `cache` toolbox command. Other branches keep using GCS for now. Keyed on SEMAPHORE_GIT_BRANCH == master, which covers both master branch builds (store) and PRs targeting master (restore).

The cache flows branch -> PR only: stores happen exclusively on master branch builds (gated on empty SEMAPHORE_GIT_PR_NUMBER), so PRs never write the shared master cache, they only restore it. Semaphore cache keys are not overwritten, so branch builds delete-then-store to refresh.

Generated .semaphore/*.yml regenerated via `make gen-semaphore-yaml`.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
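The gating described in this commit message can be sketched roughly as below. This is a minimal sketch, not the PR's actual prologue/epilogue code: the function names, the cache key, and the tarball path are hypothetical. Only `SEMAPHORE_GIT_BRANCH`, `SEMAPHORE_GIT_PR_NUMBER`, and the `cache` toolbox command come from the description above.

```shell
#!/usr/bin/env bash
# Sketch only: function names, key, and path are hypothetical.

# Master branch builds and PRs targeting master both see
# SEMAPHORE_GIT_BRANCH=master, so both use the Semaphore cache.
uses_sem_cache() {
  [[ "${SEMAPHORE_GIT_BRANCH:-}" == "master" ]]
}

# Only branch builds (empty PR number) may write the shared master cache.
may_store_sem_cache() {
  uses_sem_cache && [[ -z "${SEMAPHORE_GIT_PR_NUMBER:-}" ]]
}

# Existing keys are not overwritten, so refresh by deleting first.
refresh_sem_cache() {
  local key="$1" tarball="$2"
  if may_store_sem_cache; then
    # Assumption: tolerate a failed delete (e.g. the key does not exist yet).
    cache delete "${key}" || true
    cache store "${key}" "${tarball}"
  fi
}

# Usage (hypothetical key and path):
#   refresh_sem_cache "build-cache-example" "/tmp/build-cache.tgz"
```

On a PR build the function is a no-op, which matches the "branch -> PR only" flow: PRs restore but never store.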
5bce3ed to abfd8a6
Thanks @copilot — good catch. Fixed in the latest push. The "Store working copy" job runs on every branch ( The store on the master path is now gated on Also rebased onto current master and regenerated — the earlier "Check SemaphoreCI files" failure was stale generated YAML (new |
    # targeting master restore master's Semaphore cache.) Note: `cache
    # restore` exits 0 even on a miss, so we test for the restored file.
    use_sem_cache=false
    [[ "${SEMAPHORE_GIT_BRANCH}" == "master" ]] && use_sem_cache=true
This feels like a weird way to write
`if [[ "${SEMAPHORE_GIT_BRANCH}" == "master" ]]; then use_sem_cache=true; fi`
Any reason for the above form? Also, I believe the above would fail if any `set -e` logic was in place.
Good point: switched to the plain `if [[ ... ]]; then use_sem_cache=true; else use_sem_cache=false; fi` form. The old `[[ ... ]] && use_sem_cache=true` was just habit, no real reason, and you're right that it returns non-zero when the test is false, which would bite under `set -e`. Fixed in e448e1e.
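For reference, the exit-status difference between the two forms can be seen in a small sketch. The wrapper function names and the `release-example` branch value are hypothetical; only `SEMAPHORE_GIT_BRANCH` and `use_sem_cache` come from the diff.

```shell
#!/usr/bin/env bash
# Sketch only: wrapper function names and the branch value are hypothetical.

# Short-circuit form: when the test is false, the line's status is 1,
# and as the last command it becomes the function's return status.
set_flag_short_circuit() {
  local use_sem_cache=false
  [[ "${SEMAPHORE_GIT_BRANCH:-}" == "master" ]] && use_sem_cache=true
}

# Plain if/else: the status is 0 on both paths.
set_flag_if_else() {
  local use_sem_cache
  if [[ "${SEMAPHORE_GIT_BRANCH:-}" == "master" ]]; then
    use_sem_cache=true
  else
    use_sem_cache=false
  fi
}

SEMAPHORE_GIT_BRANCH="release-example"
if set_flag_short_circuit; then short_status=0; else short_status=$?; fi
if set_flag_if_else; then if_status=0; else if_status=$?; fi
echo "short-circuit: ${short_status}, if/else: ${if_status}"
# prints "short-circuit: 1, if/else: 0"
```

Note that bash's `set -e` does not exit on a failing non-final command of an `&&` list, so the short-circuit line does not abort the shell by itself. The non-zero status matters wherever that line's status is checked on its own, for example as the last command of a function or when a runner checks each command separately.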
- Write `use_sem_cache` with a plain if/else instead of `[[ ]] && x=true`, which would short-circuit (return non-zero) under `set -e` (review: nelljerram).
- Clarify the working-copy store comment: the block's `when` already limits it to branch builds; the SEMAPHORE_GIT_PR_NUMBER guard is belt-and-suspenders.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Reinstate the GCS cache of calico/go-build, but invert the priority: jobs now pull from Docker Hub first (free egress) and only fall back to loading the image from GCS if the pull fails. We still see occasional Docker Hub pull failures even with an authenticated session, so this keeps that robustness while spending essentially nothing on GCS egress in the common case.

- Global prologue: `docker pull` first, GCS load only on pull failure.
- Restore the "Pull: go-build image" prerequisite job (and its Prerequisites dependency) to keep the GCS fallback populated. Its uploads are free and it is a no-op `gcloud storage ls` on a cache hit.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
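The pull-first, GCS-fallback ordering could look roughly like the sketch below. This is not the PR's prologue code: the function names and the GCS object URL argument are hypothetical, and streaming the tarball with `gcloud storage cat` into `docker load` is an assumption about how the fallback image would be loaded.

```shell
#!/usr/bin/env bash
# Sketch only: function names and the GCS object layout are hypothetical.

# Primary path: pull from Docker Hub (free egress).
pull_image() {
  docker pull "$1"
}

# Fallback path (assumption): stream a saved image tarball from GCS into docker.
load_image_from_gcs() {
  local object_url="$1"
  ( set -o pipefail; gcloud storage cat "${object_url}" | docker load )
}

# Try Docker Hub first; touch GCS only if the pull fails.
fetch_go_build_image() {
  local image="$1" object_url="$2"
  if pull_image "${image}"; then
    echo "pulled ${image} from Docker Hub"
  elif load_image_from_gcs "${object_url}"; then
    echo "loaded ${image} from GCS fallback"
  else
    echo "could not obtain ${image}" >&2
    return 1
  fi
}

# Usage (hypothetical tag and object URL):
#   fetch_go_build_image "calico/go-build:<tag>" "gs://<bucket>/<object>.tar"
```

Because the GCS read sits behind the `elif`, the common case (a successful pull) never downloads from the bucket.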
CI triage: Calico
Recommendation: CI pipeline aborted by fail-fast; re-run before investigating.
Failed jobs (most likely killed by fail-fast, not root causes):
workflow_id: c4c855d0-4366-4961-b1c8-057a5a8e4bae
The "Store working copy" block only runs on branch builds (its `when: "branch =~ '.*'"`), so the inner `-z SEMAPHORE_GIT_PR_NUMBER` check could never be false here. Rely on that single load-bearing condition instead of the belt-and-suspenders guard (review: nelljerram).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Storing the build cache in the GCS bucket (`calico-transient-build-artifacts-europe-west3`) is driving high egress costs, because Semaphore agents download it on nearly every job. This trims the two biggest contributors with no change to other branches.

- Move master's caches to Semaphore's built-in `cache` command. The ~1.8 GB Go build cache (`build-cache-<group>`) and the working-copy tarball now use `cache store`/`cache restore` on master. Other branches keep using GCS for now.
- Keep the `calico/go-build` image in GCS as a fallback only. Try a (free!) pull from Docker Hub first and only fall back to GCS on failure.
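Since `cache restore` exits 0 even on a miss (as noted in the diff comment), the restore path has to check for the restored file itself. A minimal sketch, with a hypothetical function name, key, and file path:

```shell
#!/usr/bin/env bash
# Sketch only: the function name, key, and file path are hypothetical.

# Returns 0 only if the expected file exists after the restore attempt,
# because the exit status of `cache restore` does not distinguish hit from miss.
restore_from_sem_cache() {
  local key="$1" expected_file="$2"
  cache restore "${key}"
  [[ -f "${expected_file}" ]]
}

# Usage (hypothetical key and path):
#   if restore_from_sem_cache "build-cache-example" "/tmp/build-cache.tgz"; then
#     echo "cache hit"
#   else
#     echo "cache miss, building cold"
#   fi
```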