Merged
95 changes: 94 additions & 1 deletion docs/configuration.md
@@ -32,6 +32,12 @@ providers:
    project: my-gcp-project
    credentialsFile: my-gcp-creds.json

  - name: nomad-scw
    type: nomad
    address: https://nomad.internal:4646
    token: <nomad-acl-token>
    namespace: runners

trayTypes:
  - name: cattery-docker-local
    provider: docker-local
@@ -59,6 +65,18 @@ trayTypes:
        - europe-west1-d
      machineType: e2-standard-4
      instanceTemplate: global/instanceTemplates/cattery-default

  - name: cattery-nomad
    provider: nomad-scw
    githubOrg: my-org
    runnerGroupId: 3
    maxTrays: 5
    shutdown: true
    config:
      jobId: scw-cattery-runner-tray
      runnerFolder: /cattery
      script: |
        echo "extra setup for $TRAY_NAME"
```

### Config sections
@@ -98,7 +116,7 @@ Common fields for all providers:
| Key | Type | Required | Description |
|------|--------|----------|-----------------------------------------------------|
| name | string | yes | Provider name to reference from trayTypes. |
-| type | enum | yes | Provider type. Currently implemented: docker, google (GCE). |
+| type | enum | yes | Provider type. Currently implemented: docker, google (GCE), nomad. |

Provider-specific fields:

@@ -113,6 +131,19 @@ Provider-specific fields:
| project | string | yes | GCP project ID |
| credentialsFile | string | no | Path to GCP service account JSON credentials. If omitted, uses Application Default Credentials. |

- nomad

Cattery dispatches each tray as a child of a **parameterized parent job** that must already be registered in your Nomad cluster. The provider supplies `tray_name`, `bootstrap_token` and `cattery_url` as dispatch meta plus a generated bash payload that downloads and execs the cattery agent. Resources, driver and constraints come from the parent job spec — Nomad does not allow overriding them at dispatch time, so use distinct parameterized jobs for distinct resource shapes.

| Key | Type | Required | Description |
|-----------|--------|----------|---------------------------------------------------------------------------------------------------|
| address | string | yes | Nomad agent HTTP(S) address, e.g. `https://nomad.internal:4646`. |
| token | string | no | Nomad ACL token. Needs these capabilities on the parent job's namespace: `dispatch-job` (StartDeploy), `read-job` (WaitDeploy reads the evaluation), `list-jobs` (CleanTray's leaked-child recovery scan) and `submit-job` (CleanTray deregisters and purges the dispatched child). See [Nomad ACL policies](https://developer.hashicorp.com/nomad/docs/secure/acl/policies) for the capability names in your Nomad version. |
| namespace | string | no | Nomad namespace to dispatch into. Defaults to `default`. |
| region | string | no | Nomad region. Defaults to the agent's region. |
| tlsCaFile | string | no | Path to a PEM CA bundle for verifying the Nomad agent's TLS certificate. |
| insecure | bool | no | Skip TLS verification. Dev-only. |
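
The defaulting rules in this table can be sketched in Go. This is a minimal sketch under assumptions. The struct `nomadProviderSettings` and the function `withDefaults` are hypothetical names, not cattery's actual types. The sketch enforces only what the table states: `address` is required, `namespace` defaults to `default`, and `region` stays empty so the agent's region applies.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// nomadProviderSettings is a hypothetical mirror of the provider table;
// cattery's real config type may be named and shaped differently.
type nomadProviderSettings struct {
	Address   string // required, e.g. https://nomad.internal:4646
	Token     string // optional ACL token
	Namespace string // optional, defaults to "default"
	Region    string // optional, empty means "use the agent's region"
	TLSCaFile string // optional PEM CA bundle path
	Insecure  bool   // optional, dev-only TLS skip
}

// withDefaults rejects a missing address and fills the documented namespace
// default. Region is deliberately left empty so the agent's region applies.
func withDefaults(s nomadProviderSettings) (nomadProviderSettings, error) {
	if strings.TrimSpace(s.Address) == "" {
		return s, errors.New("nomad provider: address is required")
	}
	if s.Namespace == "" {
		s.Namespace = "default"
	}
	return s, nil
}

func main() {
	fmt.Println(withDefaults(nomadProviderSettings{Address: "https://nomad.internal:4646"}))
}
```

Leaving `Region` empty, rather than hard-coding a value, keeps the documented behaviour of deferring to the agent.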

#### trayTypes
Defines one or more tray "profiles" that the Tray Manager can maintain.

@@ -147,6 +178,68 @@ Provider-specific config under trayType.config:
| instanceTemplate | string | yes | Template to base instances on (e.g. `global/instanceTemplates/cattery-default`) |
| namePrefix | string | no | Prefix for VM names |

- nomad config

| Key | Type | Required | Description |
|--------------|--------|----------|------------------------------------------------------------------------------------------------------|
| jobId | string | yes | ID of a parameterized parent job already registered in Nomad. Cattery dispatches one child per tray. |
| runnerFolder | string | no | Path inside the guest where the GitHub Actions runner distribution lives. Passed as `--runner-folder` to `cattery agent`. Defaults to `/cattery`. |
| script | string | no | Inline bash, executed after the agent binary is downloaded and before the agent is exec'd. Use YAML's `\|` block scalar for multi-line. |

**Bootstrap composition.** The provider builds the dispatched payload from three pieces:

1. A fixed prelude that downloads the cattery agent binary from `$CATTERY_URL/agent/binary` to `/usr/local/bin/cattery`.
2. The optional `script` field, executed as a pre-agent hook.
3. A final `exec /usr/local/bin/cattery agent -i "$TRAY_NAME" -s "$CATTERY_URL" --runner-folder <runnerFolder>` line, where `<runnerFolder>` defaults to `/cattery`.

To take over the agent invocation entirely (e.g. when the image starts the agent itself via systemd), put your own `exec ...` at the end of `script` — the default exec emitted afterwards becomes unreachable.

**Parent-job contract.** The parameterized parent job must declare the `parameterized` stanza at the job level *and* materialize the dispatched payload at the task level via `dispatch_payload`. Without `dispatch_payload`, Nomad accepts the dispatch but never writes the payload bytes anywhere the task can read.

```hcl
job "my-runner-tray" {
type = "batch"

parameterized {
payload = "required"
meta_required = ["tray_name", "bootstrap_token", "cattery_url"]
}

group "g" {
task "t" {
// Nomad writes the dispatched payload to ${NOMAD_TASK_DIR}/bootstrap.sh
// before the task starts.
dispatch_payload {
file = "bootstrap.sh"
}

// ... driver, config, resources ...
}
}
}
```

The dispatched bytes land at `${NOMAD_TASK_DIR}/bootstrap.sh`. Your task is responsible for executing that file *with the dispatch meta values exported as env vars* — the script generated by cattery references `$CATTERY_URL`, `$TRAY_NAME` and `$BOOTSTRAP_TOKEN`. Two common ways to wire that up:

- For raw_exec / exec drivers running directly on the host: source a small env file and exec the payload, e.g.
  ```sh
  set -a; . /etc/cattery/bootstrap.env; set +a
  bash "$NOMAD_TASK_DIR/bootstrap.sh"
  ```
- For VM-style drivers (qemu, firecracker, custom `nomad-runner-vm` wrappers): render a cloud-init userdata template that uses `write_files` to drop the meta values into an env file (e.g. `/etc/cattery/bootstrap.env`) and a `runcmd` that sources it before exec'ing the dispatched payload. The wrapper bakes the rendered userdata into the guest's cidata seed ISO.

Either approach must produce an environment where `TRAY_NAME`, `BOOTSTRAP_TOKEN` and `CATTERY_URL` are exported when the payload script runs.
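
For the VM-style route, whatever renders the userdata has to turn dispatch meta into an env file that is safe to source. The sketch below assumes the wrapper already holds the meta values in a map, for instance read from the `NOMAD_META_*` variables Nomad exposes for job meta. `renderBootstrapEnv` and `metaToEnv` are hypothetical names. Values are single-quoted so that a token containing shell metacharacters cannot inject commands when the file is loaded with `set -a; . file; set +a`.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// metaToEnv maps the provider-supplied dispatch meta keys to the variable
// names the generated payload references.
var metaToEnv = map[string]string{
	"tray_name":       "TRAY_NAME",
	"bootstrap_token": "BOOTSTRAP_TOKEN",
	"cattery_url":     "CATTERY_URL",
}

// shellQuote wraps s in single quotes so sourcing the file is injection-safe.
func shellQuote(s string) string {
	return "'" + strings.ReplaceAll(s, "'", `'\''`) + "'"
}

// renderBootstrapEnv returns the body of an env file (for example
// /etc/cattery/bootstrap.env) meant to be loaded with `set -a; . file; set +a`.
// It fails fast if any required meta value is missing or empty.
func renderBootstrapEnv(meta map[string]string) (string, error) {
	keys := make([]string, 0, len(metaToEnv))
	for k := range metaToEnv {
		keys = append(keys, k)
	}
	sort.Strings(keys) // deterministic output order
	var b strings.Builder
	for _, k := range keys {
		v := meta[k]
		if v == "" {
			return "", fmt.Errorf("dispatch meta %q is missing or empty", k)
		}
		fmt.Fprintf(&b, "%s=%s\n", metaToEnv[k], shellQuote(v))
	}
	return b.String(), nil
}

func main() {
	fmt.Println(renderBootstrapEnv(map[string]string{
		"tray_name":       "tray-123",
		"bootstrap_token": "example-token",
		"cattery_url":     "https://cattery.example",
	}))
}
```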

**Lifecycle.**

- Cattery dispatches the parent job with `idPrefixTemplate = tray.Id` and `IdempotencyToken = tray.Id`, and stores `dispatchedJobId` + `evalId` + `parentJobId` in the tray's provider data. The provider stages `parentJobId` and `namespace` in memory before the dispatch call; the trayManager persists provider data once `StartDeploy` returns (success path) or right before cleanup (error path). This recovers the case where Dispatch creates the child but the HTTP response is lost — it does *not* recover a process crash mid-dispatch (parentJobId never reaches the database in that window).
- Cattery blocks until the dispatch evaluation leaves `pending`. `complete` → success; `blocked` → returned as `ErrCapacityBlocked` (Nomad has no capacity for this alloc); `failed`/`canceled` → error.
- On tray cleanup, the dispatched child job is deregistered with `purge=true`. If `dispatchedJobId` is missing (e.g., the dispatch response was lost in transit), cattery lists the parent's dispatched children with the prefix `<parentJobId>/dispatch-` and deregisters any whose ID starts with `<parentJobId>/dispatch-<trayId>-` (the shape Nomad assigns when `idPrefixTemplate = tray.Id`).

**Resource shapes.** Resources, driver, constraints and reschedule policy are baked into the parent job spec — they cannot be set per-dispatch. To run trays at different sizes, register multiple parameterized parent jobs and reference them by `jobId` from different trayTypes.

**`extraMetadata` and Nomad meta.** Any keys in the trayType's `extraMetadata` are forwarded as Nomad dispatch meta alongside `tray_name` / `bootstrap_token` / `cattery_url`. The provider-owned keys are written *last* and cannot be clobbered by `extraMetadata`. Nomad rejects dispatch meta keys that are not declared in the parent job's `meta_required` or `meta_optional`, so any keys you add via `extraMetadata` must also be declared `meta_optional` in the parameterized parent job.
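
A hedged Go sketch of that merge order follows. `buildDispatchMeta` is a hypothetical helper and assumes the caller knows the parent job's `meta_optional` list. Its undeclared-key check is an optional pre-flight that mirrors the Nomad rejection described above; it is not taken from cattery's code.

```go
package main

import (
	"fmt"
	"sort"
)

// buildDispatchMeta is a hypothetical sketch of the merge rule:
// extraMetadata first, provider-owned keys last so they cannot be clobbered.
// The undeclared-key check is an optional pre-flight mirroring what Nomad
// would reject at dispatch time; it is not taken from cattery's code.
func buildDispatchMeta(extra map[string]string, metaOptional []string, trayName, token, catteryURL string) (map[string]string, error) {
	provider := map[string]string{
		"tray_name":       trayName,
		"bootstrap_token": token,
		"cattery_url":     catteryURL,
	}
	declared := make(map[string]bool, len(metaOptional))
	for _, k := range metaOptional {
		declared[k] = true
	}

	meta := make(map[string]string, len(extra)+len(provider))
	var undeclared []string
	for k, v := range extra {
		if _, owned := provider[k]; !owned && !declared[k] {
			undeclared = append(undeclared, k)
			continue
		}
		meta[k] = v
	}
	if len(undeclared) > 0 {
		sort.Strings(undeclared)
		return nil, fmt.Errorf("extraMetadata keys %v must be declared in the parent job's meta_optional", undeclared)
	}

	// Provider-owned keys are written last and always win.
	for k, v := range provider {
		meta[k] = v
	}
	return meta, nil
}

func main() {
	fmt.Println(buildDispatchMeta(map[string]string{"team": "infra"}, []string{"team"}, "tray-1", "tok", "https://cattery.example"))
}
```

Writing the provider keys after the `extraMetadata` loop protects `tray_name`, `bootstrap_token` and `cattery_url` from a same-named `extraMetadata` entry.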


Notes:
- Ensure runnerGroupId corresponds to an existing Runner Group in your GitHub org and that your GitHub App has permission to register runners.
36 changes: 36 additions & 0 deletions examples/example-config.yaml
@@ -42,6 +42,19 @@ providers:
    project: my-gcp-project
    credentialsFile: path/to/credentials.json

  - name: nomad-scw
    type: nomad
    address: https://nomad.internal:4646
    # ACL token. Needs dispatch-job (StartDeploy), read-job (WaitDeploy reads
    # the evaluation), list-jobs (CleanTray's leaked-child recovery scan)
    # and submit-job (CleanTray deregisters and purges the dispatched child)
    # on the parent job's namespace.
    token: <nomad-acl-token>
    namespace: runners # optional
    region: global # optional
    tlsCaFile: path/to/nomad-ca.pem # optional
    insecure: false # optional, skip TLS verification (dev only)

trayTypes:
  - name: cattery-tiny
    provider: docker-local
@@ -68,3 +81,26 @@ trayTypes:
    provider: gce
    runnerGroupId: 3 # check in github org settings -> Runner groups
    shutdown: true

  - name: cattery-nomad
    provider: nomad-scw
    githubOrg: My-Github-Org
    runnerGroupId: 3
    maxTrays: 5
    shutdown: true
    config:
      # ID of a parameterized parent job already registered in Nomad. Resources,
      # driver and constraints come from that job spec; Nomad does not allow
      # overriding them at dispatch time. Use distinct parent jobs for distinct
      # resource shapes.
      jobId: scw-cattery-runner-tray
      # Path inside the guest where the GitHub Actions runner distribution
      # lives. Passed as --runner-folder to `cattery agent`. Defaults to
      # /cattery if omitted.
      runnerFolder: /cattery
      # Optional inline bash, executed after the agent binary is downloaded
      # and before the agent is exec'd. The parent job is expected to have
      # exported TRAY_NAME, BOOTSTRAP_TOKEN and CATTERY_URL from meta.
      script: |
        echo "extra setup for $TRAY_NAME"
        # mkfs / mount scratch volume, install build tools, etc.
7 changes: 7 additions & 0 deletions src/go.mod
@@ -10,6 +10,7 @@ require (
github.com/go-playground/validator/v10 v10.30.2
github.com/go-viper/mapstructure/v2 v2.5.0
github.com/google/go-github/v84 v84.0.0
github.com/hashicorp/nomad/api v0.0.0-20260507064547-505b8f595ce4
github.com/prometheus/client_golang v1.23.2
github.com/sirupsen/logrus v1.9.4
github.com/spf13/cobra v1.10.2
@@ -39,11 +40,17 @@ require (
github.com/google/uuid v1.6.0 // indirect
github.com/googleapis/enterprise-certificate-proxy v0.3.15 // indirect
github.com/googleapis/gax-go/v2 v2.22.0 // indirect
github.com/gorilla/websocket v1.5.3 // indirect
github.com/hashicorp/cronexpr v1.1.3 // indirect
github.com/hashicorp/errwrap v1.0.0 // indirect
github.com/hashicorp/go-cleanhttp v0.5.2 // indirect
github.com/hashicorp/go-multierror v1.1.1 // indirect
github.com/hashicorp/go-retryablehttp v0.7.8 // indirect
github.com/hashicorp/go-rootcerts v1.0.2 // indirect
github.com/inconshreveable/mousetrap v1.1.0 // indirect
github.com/klauspost/compress v1.18.5 // indirect
github.com/leodido/go-urn v1.4.0 // indirect
github.com/mitchellh/go-homedir v1.1.0 // indirect
github.com/munnerz/goautoneg v0.0.0-20191010083416-a7dc8b61c822 // indirect
github.com/pelletier/go-toml/v2 v2.3.0 // indirect
github.com/pmezard/go-difflib v1.0.1-0.20181226105442-5d4384ee4fb2 // indirect
18 changes: 18 additions & 0 deletions src/go.sum
@@ -19,6 +19,8 @@ github.com/cespare/xxhash/v2 v2.3.0/go.mod h1:VGX0DQ3Q6kWi7AoAeZDth3/j3BFtOZR5XL
github.com/cpuguy83/go-md2man/v2 v2.0.6/go.mod h1:oOW0eioCTA6cOiMLiUPZOpcVxMig6NIQQ7OS05n1F4g=
github.com/davecgh/go-spew v1.1.2-0.20180830191138-d8f796af33cc h1:U9qPSI2PIWSS1VwoXQT9A3Wy9MM3WgvqSxFWenqJduM=
github.com/davecgh/go-spew v1.1.2-0.20180830191138-d8f796af33cc/go.mod h1:J7Y8YcW2NihsgmVo/mv3lAwl/skON4iLHjSsI+c5H38=
github.com/docker/go-units v0.5.0 h1:69rxXcBk27SvSaaxTtLh/8llcHD8vYHT7WSdRZ/jvr4=
github.com/docker/go-units v0.5.0/go.mod h1:fgPhTUdO+D/Jk86RDLlptpiXQzgHJF7gydDDbaIK4Dk=
github.com/fatih/color v1.16.0 h1:zmkK9Ngbjj+K0yRhTVONQh1p/HknKYSlNT+vZCzyokM=
github.com/fatih/color v1.16.0/go.mod h1:fL2Sau1YI5c0pdGEVCbKQbLXB6edEj1ZgiY4NijnWvE=
github.com/felixge/httpsnoop v1.0.4 h1:NFTV2Zj1bL4mc9sqWACXbQFVBBg2W3GPvqp8/ESS2Wg=
@@ -63,12 +65,24 @@ github.com/googleapis/enterprise-certificate-proxy v0.3.15 h1:xolVQTEXusUcAA5Ugt
github.com/googleapis/enterprise-certificate-proxy v0.3.15/go.mod h1:vqVt9yG9480NtzREnTlmGSBmFrA+bzb0yl0TxoBQXOg=
github.com/googleapis/gax-go/v2 v2.22.0 h1:PjIWBpgGIVKGoCXuiCoP64altEJCj3/Ei+kSU5vlZD4=
github.com/googleapis/gax-go/v2 v2.22.0/go.mod h1:irWBbALSr0Sk3qlqb9SyJ1h68WjgeFuiOzI4Rqw5+aY=
github.com/gorilla/websocket v1.5.3 h1:saDtZ6Pbx/0u+bgYQ3q96pZgCzfhKXGPqt7kZ72aNNg=
github.com/gorilla/websocket v1.5.3/go.mod h1:YR8l580nyteQvAITg2hZ9XVh4b55+EU/adAjf1fMHhE=
github.com/hashicorp/cronexpr v1.1.3 h1:rl5IkxXN2m681EfivTlccqIryzYJSXRGRNa0xeG7NA4=
github.com/hashicorp/cronexpr v1.1.3/go.mod h1:P4wA0KBl9C5q2hABiMO7cp6jcIg96CDh1Efb3g1PWA4=
github.com/hashicorp/errwrap v1.0.0 h1:hLrqtEDnRye3+sgx6z4qVLNuviH3MR5aQ0ykNJa/UYA=
github.com/hashicorp/errwrap v1.0.0/go.mod h1:YH+1FKiLXxHSkmPseP+kNlulaMuP3n2brvKWEqk/Jc4=
github.com/hashicorp/go-cleanhttp v0.5.2 h1:035FKYIWjmULyFRBKPs8TBQoi0x6d9G4xc9neXJWAZQ=
github.com/hashicorp/go-cleanhttp v0.5.2/go.mod h1:kO/YDlP8L1346E6Sodw+PrpBSV4/SoxCXGY6BqNFT48=
github.com/hashicorp/go-hclog v1.6.3 h1:Qr2kF+eVWjTiYmU7Y31tYlP1h0q/X3Nl3tPGdaB11/k=
github.com/hashicorp/go-hclog v1.6.3/go.mod h1:W4Qnvbt70Wk/zYJryRzDRU/4r0kIg0PVHBcfoyhpF5M=
github.com/hashicorp/go-multierror v1.1.1 h1:H5DkEtf6CXdFp0N0Em5UCwQpXMWke8IA0+lD48awMYo=
github.com/hashicorp/go-multierror v1.1.1/go.mod h1:iw975J/qwKPdAO1clOe2L8331t/9/fmwbPZ6JB6eMoM=
github.com/hashicorp/go-retryablehttp v0.7.8 h1:ylXZWnqa7Lhqpk0L1P1LzDtGcCR0rPVUrx/c8Unxc48=
github.com/hashicorp/go-retryablehttp v0.7.8/go.mod h1:rjiScheydd+CxvumBsIrFKlx3iS0jrZ7LvzFGFmuKbw=
github.com/hashicorp/go-rootcerts v1.0.2 h1:jzhAVGtqPKbwpyCPELlgNWhE1znq+qwJtW5Oi2viEzc=
github.com/hashicorp/go-rootcerts v1.0.2/go.mod h1:pqUvnprVnM5bf7AOirdbb01K4ccR319Vf4pU3K5EGc8=
github.com/hashicorp/nomad/api v0.0.0-20260507064547-505b8f595ce4 h1:jRgobXGG/+ZsFRz8Iy0xB4OE7qBSw/8xR2kPF4AJz5s=
github.com/hashicorp/nomad/api v0.0.0-20260507064547-505b8f595ce4/go.mod h1:KkLNLU0Nyfh5jWsFoF/PsmMbKpRIAoIV4lmQoJWgKCk=
github.com/inconshreveable/mousetrap v1.1.0 h1:wN+x4NVGpMsO7ErUn/mUI3vEoE6Jt13X2s0bqwp9tc8=
github.com/inconshreveable/mousetrap v1.1.0/go.mod h1:vpF70FUmC8bwa3OWnCshd2FqLfsEA9PFc4w1p2J65bw=
github.com/klauspost/compress v1.18.5 h1:/h1gH5Ce+VWNLSWqPzOVn6XBO+vJbCNGvjoaGBFW2IE=
@@ -85,6 +99,8 @@ github.com/mattn/go-colorable v0.1.13 h1:fFA4WZxdEF4tXPZVKMLwD8oUnCTTo08duU7wxec
github.com/mattn/go-colorable v0.1.13/go.mod h1:7S9/ev0klgBDR4GtXTXX8a3vIGJpMovkB8vQcUbaXHg=
github.com/mattn/go-isatty v0.0.20 h1:xfD0iDuEKnDkl03q4limB+vH+GxLEtL/jb4xVJSWWEY=
github.com/mattn/go-isatty v0.0.20/go.mod h1:W+V8PltTTMOvKvAeJH7IuucS94S2C6jfK/D7dTCTo3Y=
github.com/mitchellh/go-homedir v1.1.0 h1:lukF9ziXFxDFPkA1vsr5zpc1XuPDn/wFntq5mG+4E0Y=
github.com/mitchellh/go-homedir v1.1.0/go.mod h1:SfyaCUpYCn1Vlf4IUYiD9fPX4A5wJrkLzIz1N1q0pr0=
github.com/munnerz/goautoneg v0.0.0-20191010083416-a7dc8b61c822 h1:C3w9PqII01/Oq1c1nUAm88MOHcQC9l5mIlSMApZMrHA=
github.com/munnerz/goautoneg v0.0.0-20191010083416-a7dc8b61c822/go.mod h1:+n7T8mK8HuQTcFwEeznm/DIxMOiR9yIdICNftLE1DvQ=
github.com/pelletier/go-toml/v2 v2.3.0 h1:k59bC/lIZREW0/iVaQR8nDHxVq8OVlIzYCOJf421CaM=
@@ -104,6 +120,8 @@ github.com/rogpeppe/go-internal v1.14.1/go.mod h1:MaRKkUm5W0goXpeCfT7UZI6fk/L7L7
github.com/russross/blackfriday/v2 v2.1.0/go.mod h1:+Rmxgy9KzJVeS9/2gXHxylqXiyQDYRxCVz55jmeOWTM=
github.com/sagikazarmark/locafero v0.12.0 h1:/NQhBAkUb4+fH1jivKHWusDYFjMOOKU88eegjfxfHb4=
github.com/sagikazarmark/locafero v0.12.0/go.mod h1:sZh36u/YSZ918v0Io+U9ogLYQJ9tLLBmM4eneO6WwsI=
github.com/shoenig/test v1.12.2 h1:ZVT8NeIUwGWpZcKaepPmFMoNQ3sVpxvqUh/MAqwFiJI=
github.com/shoenig/test v1.12.2/go.mod h1:UxJ6u/x2v/TNs/LoLxBNJRV9DiwBBKYxXSyczsBHFoI=
github.com/sirupsen/logrus v1.9.4 h1:TsZE7l11zFCLZnZ+teH4Umoq5BhEIfIzfRDZ1Uzql2w=
github.com/sirupsen/logrus v1.9.4/go.mod h1:ftWc9WdOfJ0a92nsE2jF5u5ZwH8Bv2zdeOC42RjbV2g=
github.com/spf13/afero v1.15.0 h1:b/YBCLWAJdFWJTN9cLhiXXcD7mzKn9Dm86dNnfyQw1I=
4 changes: 4 additions & 0 deletions src/lib/config/config.go
@@ -126,6 +126,10 @@ func LoadConfig(configPath *string) (*CatteryConfig, error) {
var dc DockerTrayConfig
decodeError = mapstructure.Decode(trayType.Config, &dc)
trayType.Config = dc
case "nomad":
var nc NomadTrayConfig
decodeError = mapstructure.Decode(trayType.Config, &nc)
trayType.Config = nc
//case "scaleway":
default:

26 changes: 26 additions & 0 deletions src/lib/config/trayTypeConfig.go
@@ -17,3 +17,29 @@ type DockerTrayConfig struct {
Image string `yaml:"image"`
NamePrefix string `yaml:"namePrefix"`
}

// NomadTrayConfig configures a Nomad-dispatched tray.
//
// JobId is the ID of a parameterized parent job already registered in Nomad.
// Resources, driver and constraints come from that job spec — Nomad does not
// allow overriding them at dispatch time. Use distinct parameterized jobs for
// distinct resource shapes.
//
// Script is an optional bash snippet inserted into the dispatched
// payload before the agent is exec'd. Use it for per-tray-type setup
// (mounting volumes, installing tools, etc.). Use YAML's `|` block scalar to
// embed multi-line scripts.
//
// RunnerFolder is the path inside the guest where the GitHub Actions runner
// distribution lives. The provider's default bootstrap passes it as the
// `--runner-folder` flag to `cattery agent` (which is required by the agent).
// Defaults to /cattery if empty. To take over the agent invocation entirely
// (e.g. when the image starts the agent itself via systemd), put your own
// `exec ...` at the end of Script — the default exec emitted afterwards
// becomes unreachable.
type NomadTrayConfig struct {
TrayConfig
JobId string `yaml:"jobId"`
Script string `yaml:"script"`
RunnerFolder string `yaml:"runnerFolder"`
}
6 changes: 6 additions & 0 deletions src/lib/trayManager/trayManager.go
@@ -111,6 +111,12 @@ func (tm *TrayManager) CreateTray(ctx context.Context, trayType *config.TrayType
if err := provider.StartDeploy(ctx, tray); err != nil {
log.Errorf("Failed start deploy for tray %s: %v", tray.Id, err)
metrics.TrayProviderErrors(tray.GitHubOrgName, tray.ProviderName, tray.TrayTypeName, "create")
// Persist any provider data the failed StartDeploy populated (e.g.,
// nomad's parentJobId for leaked-child recovery) before DeleteTray
// reloads the row and dispatches CleanTray on it.
if _, pErr := tm.trayRepository.SetProviderData(ctx, tray.Id, tray.ProviderData); pErr != nil {
log.Errorf("Failed to persist provider data after start deploy error for tray %s: %v", tray.Id, pErr)
}
if _, dErr := tm.DeleteTray(ctx, tray.Id); dErr != nil {
log.Errorf("Failed to delete tray %s after start deploy error: %v", tray.Id, dErr)
}