Merged
Changes from 9 commits
25 changes: 25 additions & 0 deletions CHANGELOG.rst
@@ -19,6 +19,31 @@ Unreleased
``isp-full-generic`` platform variant when nodes lack a native load
balancer (cloud VMs, bare metal).

Unreleased
==========

Bugfixes
--------

- Prepare playbooks now enable
``device_ownership_from_security_context`` on the containerd CRI
plugin (k3s drop-in
``config-v3.toml.d/10-cozystack-cri.toml``). KubeVirt's CDI importer
writes disk images into raw block volumes as a non-root pod, which
requires containerd to chown the block device to the pod's
SecurityContext; k3s disables this by default. Without it the
importer failed with ``blockdev: cannot open /dev/cdi-block-volume:
Permission denied``, the ``DataVolume`` hung in ``ImportInProgress``,
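and VMs referencing the disk stayed ``Pending``. Gated behind
``cozystack_enable_kubevirt``. The drop-in directory is overridable via
``cozystack_k3s_containerd_dropin_dir``, but the content is hardcoded
for containerd 2.x (config version 3), so containerd 1.x clusters need
a hand-written drop-in.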
Setting ``cozystack_enable_kubevirt`` to ``false`` removes a
previously written drop-in so the host state matches the toggle, and
the restart handler only restarts a k3s unit that is actually present
(a genuine restart failure now fails the play instead of being
silently ignored).


v1.4.0
======

1 change: 1 addition & 0 deletions CLAUDE.md
@@ -144,3 +144,4 @@ The host only needs the kernel modules and, for KVM, a working `/dev/kvm`.
- **`br_netfilter` missing**: `net.bridge.bridge-nf-call-*` sysctls fail
with "No such file or directory". Load the module before applying the
sysctl.
- **containerd `device_ownership_from_security_context` disabled**: k3s ships it off; without the `config-v3.toml.d/10-cozystack-cri.toml` drop-in, KubeVirt's non-root CDI importer cannot open a raw block volume (`blockdev: cannot open /dev/cdi-block-volume: Permission denied`), the DataVolume hangs in `ImportInProgress`, and VMs that reference the disk stay Pending. Apply when KubeVirt is enabled (gated on `cozystack_enable_kubevirt`).
79 changes: 32 additions & 47 deletions README.md
@@ -12,9 +12,7 @@ Supported targets:

Cloud-image users **must** set `cozystack_flush_iptables: true` for multi-master k3s to bootstrap — Ubuntu cloud images ship with `REJECT icmp-host-prohibited` in INPUT that blocks etcd peer port 2380 between nodes. See **Node Prerequisites → Known limitations** below.

Deploys the Cozystack operator and Platform Package using the
`kubernetes.core.helm` module with automatic Helm and helm-diff
installation.
Deploys the Cozystack operator and Platform Package using the `kubernetes.core.helm` module with automatic Helm and helm-diff installation.

## Prerequisites

@@ -30,9 +28,7 @@ ansible-galaxy collection install --requirements-file requirements.yml

- SSH access to the target nodes

The role automatically installs Helm and the
[helm-diff](https://github.com/databus23/helm-diff) plugin
on the control-plane node. No manual Helm installation is needed.
The role automatically installs Helm and the [helm-diff](https://github.com/databus23/helm-diff) plugin on the control-plane node. No manual Helm installation is needed.

### Node Prerequisites

@@ -168,11 +164,25 @@ tun
kvm_intel # or kvm_amd depending on the CPU
```

#### Enabled by default: containerd device ownership for CDI block imports

When KubeVirt is enabled, the prepare playbook drops a containerd CRI config that sets `device_ownership_from_security_context = true`. KubeVirt's CDI (Containerized Data Importer) writes VM disk images into raw **block** volumes from a non-root importer pod; containerd only chowns the block device to the pod's `SecurityContext` UID/GID when this option is on, and k3s ships it disabled. Without it the importer fails with `blockdev: cannot open /dev/cdi-block-volume: Permission denied`, the `DataVolume` is stuck in `ImportInProgress`, and every VM that references the disk stays `Pending` — one of the silent "VMs stuck in Pending" failure modes called out above.

Written as a drop-in that containerd merges on top of k3s's generated `config.toml`:

```text
/var/lib/rancher/k3s/agent/etc/containerd/config-v3.toml.d/10-cozystack-cri.toml
```

`config-v3.toml.d` and the `io.containerd.cri.v1.runtime` plugin table are the containerd 2.x (config version 3) paths shipped by current k3s (the example inventories pin `k3s_version: v1.36.1+k3s1`). The drop-in content is hardcoded for that layout: `version = 3` and the v3 plugin table. `cozystack_k3s_containerd_dropin_dir` only relocates the file; it does not rewrite the content. On a containerd 1.x cluster (older k3s) the drop-in therefore does not apply as-is; write your own under `config.toml.d/` with `version = 2` and the `io.containerd.grpc.v1.cri` table. In the full pipeline the drop-in is read when k3s first starts; on a re-run against a running cluster, a handler restarts k3s so the change takes effect.
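
For a containerd 1.x cluster, a hand-written equivalent might look like the sketch below. It is not part of the collection. The full directory path is an assumption (the default k3s agent path with `config.toml.d` substituted), and the `notify` line assumes the tasks are added to one of the example prepare playbooks, where that handler is defined. Verify both against your k3s release before relying on them.

```yaml
# Hypothetical tasks for containerd 1.x (config version 2); not shipped
# by this collection. The directory is an assumed path, check your k3s.
- name: Ensure the containerd 1.x drop-in directory exists (sketch)
  ansible.builtin.file:
    path: /var/lib/rancher/k3s/agent/etc/containerd/config.toml.d
    state: directory
    mode: "0755"

- name: Enable device_ownership_from_security_context on containerd 1.x (sketch)
  ansible.builtin.copy:
    dest: /var/lib/rancher/k3s/agent/etc/containerd/config.toml.d/10-cozystack-cri.toml
    mode: "0644"
    content: |
      version = 2

      [plugins."io.containerd.grpc.v1.cri"]
      device_ownership_from_security_context = true
  # Handler name taken from the example prepare playbooks.
  notify: Restart k3s to apply containerd config
```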

k3s also exposes a native `--nonroot-devices` flag (valid on both server and agent) that sets the same containerd option. This collection uses the config drop-in instead because it applies uniformly to every node in the `cluster` group — including agent/worker nodes, for which the example playbooks do not wire `extra_agent_args` — and because it can be applied to an already-running cluster, which an install-time k3s flag cannot.
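
For comparison, the flag-based alternative could be expressed in the k3s config file, where keys mirror CLI flags. This is a sketch only: it assumes your k3s release ships `--nonroot-devices`, and the file would have to be present on every server and agent node before k3s starts.

```yaml
# /etc/rancher/k3s/config.yaml (sketch; same effect as passing --nonroot-devices)
nonroot-devices: true
```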

The restart handler only fires when the drop-in is first created or its content changes; idempotent re-runs leave k3s untouched. When it does fire, `systemctl restart k3s` (or `k3s-agent`) briefly disrupts the control plane and the workloads on that node, so schedule such a change for a maintenance window.
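
To confirm the drop-in landed on every node, a small check play along these lines can help. It is a hypothetical sketch, not part of the collection; the play name and the `dropin_dir` and `cozystack_cri_dropin` names are illustrative. It only verifies the file on disk, not that containerd has loaded it.

```yaml
# Hypothetical verification play; not shipped by this collection.
- name: Verify the containerd device-ownership drop-in
  hosts: cluster
  become: true
  gather_facts: false
  vars:
    # Same default directory the example prepare playbooks use.
    dropin_dir: "{{ cozystack_k3s_containerd_dropin_dir | default('/var/lib/rancher/k3s/agent/etc/containerd/config-v3.toml.d') }}"
  tasks:
    - name: Read the drop-in (fails if the file is missing)
      ansible.builtin.slurp:
        src: "{{ dropin_dir }}/10-cozystack-cri.toml"
      register: cozystack_cri_dropin

    - name: Check that the option is enabled in the file
      ansible.builtin.assert:
        that:
          - "'device_ownership_from_security_context = true' in (cozystack_cri_dropin.content | b64decode)"
        fail_msg: "Drop-in exists but does not enable device_ownership_from_security_context"
```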

#### Known limitations

ZFS support depends on the OS ecosystem and kernel flavor. The prepare
playbooks skip ZFS automation gracefully in these cases and emit an
informational notice:
ZFS support depends on the OS ecosystem and kernel flavor. The prepare playbooks skip ZFS automation gracefully in these cases and emit an informational notice:

| OS / kernel | ZFS automation | Reason |
| --- | --- | --- |
@@ -213,9 +223,7 @@ Enable and start:

#### iptables (cloud providers)

Cloud providers (OCI, AWS, GCP) may ship images with restrictive iptables
INPUT rules that block inter-node Kubernetes traffic (API 6443, kubelet 10250,
etcd 2379-2380) even when security groups allow it.
Cloud providers (OCI, AWS, GCP) may ship images with restrictive iptables INPUT rules that block inter-node Kubernetes traffic (API 6443, kubelet 10250, etcd 2379-2380) even when security groups allow it.

Fix: flush the INPUT chain and set policy to ACCEPT before deploying k3s.

@@ -249,11 +257,7 @@ cluster-cidr: 10.42.0.0/16
service-cidr: 10.43.0.0/16
```

These CIDRs are the k3s defaults. The example prepare playbooks
(e.g., `examples/ubuntu/prepare-ubuntu.yml`) set them via the
`server_config_yaml` variable used by `k3s.orchestration`. The role
variables `cozystack_pod_cidr` and `cozystack_svc_cidr` must match —
they default to the same values.
These CIDRs are the k3s defaults. The example prepare playbooks (e.g., `examples/ubuntu/prepare-ubuntu.yml`) set them via the `server_config_yaml` variable used by `k3s.orchestration`. The role variables `cozystack_pod_cidr` and `cozystack_svc_cidr` must match — they default to the same values.

## Installation

@@ -273,8 +277,7 @@ collections:

## Quick start

1. Create your environment (pick your distro — see `examples/ubuntu/`,
`examples/rhel/`, or `examples/suse/`):
1. Create your environment (pick your distro — see `examples/ubuntu/`, `examples/rhel/`, or `examples/suse/`):

```text
my-env/
@@ -314,9 +317,7 @@ Both stages are handled automatically by the `cozystack` role.

## Role: cozystack.installer.cozystack

Installs Cozystack via the official `cozy-installer` Helm chart using
the `kubernetes.core.helm` module with automatic Helm and helm-diff
installation.
Installs Cozystack via the official `cozy-installer` Helm chart using the `kubernetes.core.helm` module with automatic Helm and helm-diff installation.

Runs on `server[0]` only.

@@ -353,14 +354,13 @@ Runs on `server[0]` only.

### Example playbook variables

These variables are consumed only by the example prepare playbooks in
`examples/*/`, not by the role itself. Set them as inventory host/group
vars to opt out of the corresponding prepare step:
These variables are consumed only by the example prepare playbooks in `examples/*/`, not by the role itself. Set them as inventory host/group vars to opt out of the corresponding prepare step:

| Variable | Default | Description |
| --- | --- | --- |
| `cozystack_enable_zfs` | `true` | Example playbooks: install ZFS userspace and load the module. Set `false` to skip. |
| `cozystack_enable_kubevirt` | `true` | Example playbooks: load KubeVirt kernel modules. Set `false` to skip. |
| `cozystack_enable_kubevirt` | `true` | Example playbooks: load KubeVirt kernel modules **and** install the containerd `device_ownership_from_security_context` drop-in for CDI block imports. Set `false` to skip both. |
| `cozystack_k3s_containerd_dropin_dir` | `/var/lib/rancher/k3s/agent/etc/containerd/config-v3.toml.d` | Example playbooks: directory for the containerd CRI drop-in (gated on `cozystack_enable_kubevirt`). Only relocates the file — the drop-in content is hardcoded for containerd 2.x (config v3); a containerd 1.x cluster needs a hand-written `config.toml.d` drop-in instead. |
| `cozystack_flush_iptables` | `false` | Example playbooks: flush the iptables INPUT chain before k3s installs. Set `true` on Ubuntu/Debian cloud images (OCI/AWS/GCP) where the default INPUT chain ends with `REJECT icmp-host-prohibited` and blocks k3s inter-node ports 2380/6443. |
| `cozystack_zfs_release_rpm_extra` | `{}` | `examples/rhel/` only: merged on top of the built-in `cozystack_zfs_release_rpm_by_major` dict, so you can add (or override) a single EL-major → OpenZFS release RPM entry from inventory without wiping the base dict. Example: `{"10": "https://zfsonlinux.org/epel/zfs-release-X-Y.el10.noarch.rpm"}` once upstream ships one. |
| `cozystack_enable_drbd_dkms` | `true` | `examples/ubuntu/` only: install `drbd-dkms` from the LINBIT PPA on Ubuntu LTS 22.04 / 24.04 hosts so DRBD's kernel module is signed via dkms+shim under Secure Boot. Set `false` on Talos hosts (Talos ships pre-signed DRBD modules in extensions) or where Secure Boot is disabled and the in-cluster compile path is preferred. The toggle stops *future* installs but does NOT undo a prior install — manually `apt purge drbd-dkms` and remove the LINBIT entry from `/etc/apt/sources.list.d/` if you flipped to `false` after a successful run. |
@@ -371,8 +371,7 @@ vars to opt out of the corresponding prepare step:

This collection is designed to work alongside [k3s.orchestration](https://github.com/k3s-io/k3s-ansible). The inventory structure (groups: `cluster`, `server`, `agent`) is fully compatible.

Example full pipeline (`site.yml`) — see `examples/ubuntu/`, `examples/rhel/`,
or `examples/suse/`:
Example full pipeline (`site.yml`) — see `examples/ubuntu/`, `examples/rhel/`, or `examples/suse/`:

```yaml
- name: Prepare nodes
@@ -393,12 +392,9 @@ On cloud providers with NAT (OCI, AWS, GCP), nodes have internal IPs different f

### Multi-master setup (kube-ovn RAFT)

Kube-ovn requires `MASTER_NODES` — a comma-separated list of all
control-plane node IPs for OVN RAFT consensus. By default, the role
auto-detects these IPs from the `server` inventory group host keys.
Kube-ovn requires `MASTER_NODES` — a comma-separated list of all control-plane node IPs for OVN RAFT consensus. By default, the role auto-detects these IPs from the `server` inventory group host keys.

This works when host keys are internal IPs (the recommended inventory
pattern):
This works when host keys are internal IPs (the recommended inventory pattern):

```yaml
server:
@@ -409,30 +405,19 @@ server:
ansible_host: 203.0.113.11
```

If your inventory uses hostnames or non-IP host keys, set
`cozystack_master_nodes` explicitly:
If your inventory uses hostnames or non-IP host keys, set `cozystack_master_nodes` explicitly:

```yaml
cozystack_master_nodes: "10.0.0.10,10.0.0.11,10.0.0.12"
```

### Automatic Helm installation

The role installs Helm and the
[helm-diff](https://github.com/databus23/helm-diff) plugin on the
target node automatically. The `helm-diff` plugin enables true
idempotency — repeated runs report no changes when the release is
already up to date.
The role installs Helm and the [helm-diff](https://github.com/databus23/helm-diff) plugin on the target node automatically. The `helm-diff` plugin enables true idempotency — repeated runs report no changes when the release is already up to date.

### Customizing variables

The example prepare playbooks define internal variables (like
`cozystack_k3s_server_args`) in the play `vars` section. User-facing
variables such as `cozystack_k3s_extra_args` and
`cozystack_flush_iptables` should be set **in the inventory**, not in
the playbook. Ansible play `vars` take precedence over inventory
variables, so defining them in both places causes the inventory values
to be silently ignored.
The example prepare playbooks define internal variables (like `cozystack_k3s_server_args`) in the play `vars` section. User-facing variables such as `cozystack_k3s_extra_args` and `cozystack_flush_iptables` should be set **in the inventory**, not in the playbook. Ansible play `vars` take precedence over inventory variables, so defining them in both places causes the inventory values to be silently ignored.
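
As an illustration, the user-facing toggles could be set as group vars in the inventory. This is a minimal sketch: the host keys and addresses are placeholders, and the `cluster` / `server` / `agent` group layout follows the structure described above.

```yaml
# Inventory sketch; hosts and addresses are placeholders.
cluster:
  children:
    server:
      hosts:
        10.0.0.10:
          ansible_host: 203.0.113.10
    agent:
      hosts:
        10.0.0.20:
          ansible_host: 203.0.113.20
  vars:
    # Set here, not in the play's vars section, so the values are not
    # overridden by the playbook.
    cozystack_flush_iptables: true
    cozystack_enable_kubevirt: true
```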

### Idempotency

71 changes: 71 additions & 0 deletions examples/rhel/prepare-rhel.yml
@@ -122,6 +122,29 @@
        state: restarted
      failed_when: false # tolerated: same reason as the enable task below

    # Refresh service facts on the same notify topic so the restart
    # handler below sees the current unit set. Defined first, so it runs
    # first (handlers fire in definition order, not notify order).
    - name: Refresh service facts before k3s restart
      ansible.builtin.service_facts:
      listen: Restart k3s to apply containerd config

    - name: Restart k3s to apply containerd config
      ansible.builtin.systemd:
        name: "{{ item }}"
        state: restarted
      loop:
        - k3s
        - k3s-agent
      # Restart only the unit that exists on this node: a server runs
      # k3s, an agent runs k3s-agent, and on a full-pipeline run neither
      # exists yet when prepare runs (the drop-in is read at first k3s
      # start instead). service_facts keys systemd units with the
      # .service suffix. A unit that IS present but fails to restart
      # still fails the play: a malformed drop-in or a k3s that will not
      # come back is surfaced, not masked by failed_when: false.
      when: (item ~ '.service') in ansible_facts.services

  tasks:
    - name: Create k3s_cluster group for k3s.orchestration
      ansible.builtin.group_by:
@@ -188,6 +211,54 @@
| map(attribute='item')
| list }}

    # CDI (Containerized Data Importer) streams VM disk images into raw
    # block volumes from a NON-root importer pod. containerd only chowns
    # the block device to the pod's SecurityContext UID/GID when
    # device_ownership_from_security_context is enabled on the CRI
    # plugin, and k3s ships it disabled. Without it the importer dies
    # with "blockdev: cannot open /dev/cdi-block-volume: Permission
    # denied", the DataVolume hangs in ImportInProgress, and every VM
    # that references the disk stays Pending.
    #
    # The drop-in is merged by containerd on top of k3s's generated
    # config.toml via the config-v3.toml.d import glob. It is read at
    # first k3s start (full pipeline) or applied by the handler on
    # re-runs against a running cluster. config-v3.toml.d and
    # io.containerd.cri.v1.runtime are the containerd 2.x (config
    # version 3) paths shipped by current k3s, and the content below is
    # hardcoded for them. cozystack_k3s_containerd_dropin_dir only
    # relocates the file, so a containerd 1.x cluster needs its own
    # config.toml.d drop-in (version = 2, io.containerd.grpc.v1.cri).
    - name: Ensure k3s containerd config drop-in directory exists
      ansible.builtin.file:
        path: "{{ cozystack_k3s_containerd_dropin_dir | default('/var/lib/rancher/k3s/agent/etc/containerd/config-v3.toml.d') }}"
        state: directory
        mode: "0755"
      when: cozystack_enable_kubevirt | default(true) | bool

    - name: Enable device_ownership_from_security_context for CDI block imports
      ansible.builtin.copy:
        dest: "{{ cozystack_k3s_containerd_dropin_dir | default('/var/lib/rancher/k3s/agent/etc/containerd/config-v3.toml.d') }}/10-cozystack-cri.toml"
        mode: "0644"
        content: |
          version = 3

          [plugins.'io.containerd.cri.v1.runtime']
          device_ownership_from_security_context = true
      when: cozystack_enable_kubevirt | default(true) | bool
      notify: Restart k3s to apply containerd config

    # Reverse the drop-in when KubeVirt is turned off: a host that
    # carried 10-cozystack-cri.toml from an earlier enabled run would
    # otherwise keep device_ownership_from_security_context on, so the
    # host state no longer matches the toggle. Removal notifies the
    # restart handler so a running cluster drops the setting too. (No-op
    # when the file was never written; state: absent then reports
    # unchanged.)
    - name: Remove containerd CDI drop-in when KubeVirt is disabled
      ansible.builtin.file:
        path: "{{ cozystack_k3s_containerd_dropin_dir | default('/var/lib/rancher/k3s/agent/etc/containerd/config-v3.toml.d') }}/10-cozystack-cri.toml"
        state: absent
      when: not (cozystack_enable_kubevirt | default(true) | bool)
      notify: Restart k3s to apply containerd config

    - name: Ensure multipath drop-in directory exists
      ansible.builtin.file:
        path: /etc/multipath/conf.d