diff --git a/CHANGELOG.rst b/CHANGELOG.rst index 6e4428d..a65571a 100644 --- a/CHANGELOG.rst +++ b/CHANGELOG.rst @@ -19,6 +19,34 @@ Unreleased ``isp-full-generic`` platform variant when nodes lack a native load balancer (cloud VMs, bare metal). +Unreleased +========== + +Bugfixes +-------- + +- Prepare playbooks now enable + ``device_ownership_from_security_context`` on the containerd CRI + plugin (k3s drop-in + ``config-v3.toml.d/10-cozystack-cri.toml``). KubeVirt's CDI importer + writes disk images into raw block volumes as a non-root pod, which + requires containerd to chown the block device to the pod's + SecurityContext; k3s disables this by default. Without it the + importer failed with ``blockdev: cannot open /dev/cdi-block-volume: + Permission denied``, the ``DataVolume`` hung in ``ImportInProgress``, + and VMs referencing the disk stayed ``Pending``. Gated behind + ``cozystack_enable_kubevirt``; drop-in directory overridable via + ``cozystack_k3s_containerd_dropin_dir`` (relocates the file only — the + content is hardcoded for containerd 2.x / config version 3 as shipped + by current k3s; a containerd 1.x cluster needs a hand-written + ``config.toml.d`` drop-in instead). + Setting ``cozystack_enable_kubevirt`` to ``false`` removes a + previously written drop-in so the host state matches the toggle, and + the restart handler only restarts a k3s unit that is actually present + (a genuine restart failure now fails the play instead of being + silently ignored). + + v1.4.0 ====== diff --git a/CLAUDE.md b/CLAUDE.md index 40647b6..1f9b1c6 100644 --- a/CLAUDE.md +++ b/CLAUDE.md @@ -144,3 +144,4 @@ The host only needs the kernel modules and, for KVM, a working `/dev/kvm`. - **`br_netfilter` missing**: `net.bridge.bridge-nf-call-*` sysctls fail with "No such file or directory". Load the module before applying the sysctl. 
+- **containerd `device_ownership_from_security_context` disabled**: k3s ships it off; without the `config-v3.toml.d/10-cozystack-cri.toml` drop-in, KubeVirt's non-root CDI importer cannot open a raw block volume (`blockdev: cannot open /dev/cdi-block-volume: Permission denied`), the DataVolume hangs in `ImportInProgress`, and VMs that reference the disk stay Pending. Apply when KubeVirt is enabled (gated on `cozystack_enable_kubevirt`). diff --git a/README.md b/README.md index 5bf6a19..d3449d4 100644 --- a/README.md +++ b/README.md @@ -12,9 +12,7 @@ Supported targets: Cloud-image users **must** set `cozystack_flush_iptables: true` for multi-master k3s to bootstrap — Ubuntu cloud images ship with `REJECT icmp-host-prohibited` in INPUT that blocks etcd peer port 2380 between nodes. See **Node Prerequisites → Known limitations** below. -Deploys the Cozystack operator and Platform Package using the -`kubernetes.core.helm` module with automatic Helm and helm-diff -installation. +Deploys the Cozystack operator and Platform Package using the `kubernetes.core.helm` module with automatic Helm and helm-diff installation. ## Prerequisites @@ -30,9 +28,7 @@ ansible-galaxy collection install --requirements-file requirements.yml - SSH access to the target nodes -The role automatically installs Helm and the -[helm-diff](https://github.com/databus23/helm-diff) plugin -on the control-plane node. No manual Helm installation is needed. +The role automatically installs Helm and the [helm-diff](https://github.com/databus23/helm-diff) plugin on the control-plane node. No manual Helm installation is needed. ### Node Prerequisites @@ -168,11 +164,25 @@ tun kvm_intel # or kvm_amd depending on the CPU ``` +#### Enabled by default: containerd device ownership for CDI block imports + +When KubeVirt is enabled, the prepare playbook drops a containerd CRI config that sets `device_ownership_from_security_context = true`. 
KubeVirt's CDI (Containerized Data Importer) writes VM disk images into raw **block** volumes from a non-root importer pod; containerd only chowns the block device to the pod's `SecurityContext` UID/GID when this option is on, and k3s ships it disabled. Without it the importer fails with `blockdev: cannot open /dev/cdi-block-volume: Permission denied`, the `DataVolume` is stuck in `ImportInProgress`, and every VM that references the disk stays `Pending` — one of the silent "VMs stuck in Pending" failure modes called out above. + +Written as a drop-in that containerd merges on top of k3s's generated `config.toml`: + +```text +/var/lib/rancher/k3s/agent/etc/containerd/config-v3.toml.d/10-cozystack-cri.toml +``` + +`config-v3.toml.d` and the `io.containerd.cri.v1.runtime` plugin table are the containerd 2.x (config version 3) paths shipped by current k3s (the example inventories pin `k3s_version: v1.36.1+k3s1`), and the drop-in content is hardcoded for that — `version = 3` and the v3 table. `cozystack_k3s_containerd_dropin_dir` only relocates the file; it does not rewrite the content. So on a containerd 1.x cluster (older k3s) this drop-in does not apply as-is — write your own under `config.toml.d/` with `version = 2` and the `io.containerd.grpc.v1.cri` table. The drop-in is read at first k3s start in the full pipeline; on a re-run against a running cluster a handler restarts k3s so the change takes effect. + +k3s also exposes a native `--nonroot-devices` flag (valid on both server and agent) that sets the same containerd option. This collection uses the config drop-in instead because it applies uniformly to every node in the `cluster` group — including agent/worker nodes, for which the example playbooks do not wire `extra_agent_args` — and because it can be applied to an already-running cluster, which an install-time k3s flag cannot. + +The restart handler only fires when the drop-in is first created or its content changes; idempotent re-runs leave k3s untouched. 
When it does fire, `systemctl restart k3s` (or `k3s-agent`) briefly disrupts the control plane and the node's workloads on that host, so apply such a change in a maintenance window rather than casually mid-day. + #### Known limitations -ZFS support depends on the OS ecosystem and kernel flavor. The prepare -playbooks skip ZFS automation gracefully in these cases and emit an -informational notice: +ZFS support depends on the OS ecosystem and kernel flavor. The prepare playbooks skip ZFS automation gracefully in these cases and emit an informational notice: | OS / kernel | ZFS automation | Reason | | --- | --- | --- | @@ -213,9 +223,7 @@ Enable and start: #### iptables (cloud providers) -Cloud providers (OCI, AWS, GCP) may ship images with restrictive iptables -INPUT rules that block inter-node Kubernetes traffic (API 6443, kubelet 10250, -etcd 2379-2380) even when security groups allow it. +Cloud providers (OCI, AWS, GCP) may ship images with restrictive iptables INPUT rules that block inter-node Kubernetes traffic (API 6443, kubelet 10250, etcd 2379-2380) even when security groups allow it. Fix: flush the INPUT chain and set policy to ACCEPT before deploying k3s. @@ -249,11 +257,7 @@ cluster-cidr: 10.42.0.0/16 service-cidr: 10.43.0.0/16 ``` -These CIDRs are the k3s defaults. The example prepare playbooks -(e.g., `examples/ubuntu/prepare-ubuntu.yml`) set them via the -`server_config_yaml` variable used by `k3s.orchestration`. The role -variables `cozystack_pod_cidr` and `cozystack_svc_cidr` must match — -they default to the same values. +These CIDRs are the k3s defaults. The example prepare playbooks (e.g., `examples/ubuntu/prepare-ubuntu.yml`) set them via the `server_config_yaml` variable used by `k3s.orchestration`. The role variables `cozystack_pod_cidr` and `cozystack_svc_cidr` must match — they default to the same values. ## Installation @@ -273,8 +277,7 @@ collections: ## Quick start -1. 
Create your environment (pick your distro — see `examples/ubuntu/`, - `examples/rhel/`, or `examples/suse/`): +1. Create your environment (pick your distro — see `examples/ubuntu/`, `examples/rhel/`, or `examples/suse/`): ```text my-env/ @@ -314,9 +317,7 @@ Both stages are handled automatically by the `cozystack` role. ## Role: cozystack.installer.cozystack -Installs Cozystack via the official `cozy-installer` Helm chart using -the `kubernetes.core.helm` module with automatic Helm and helm-diff -installation. +Installs Cozystack via the official `cozy-installer` Helm chart using the `kubernetes.core.helm` module with automatic Helm and helm-diff installation. Runs on `server[0]` only. @@ -353,14 +354,13 @@ Runs on `server[0]` only. ### Example playbook variables -These variables are consumed only by the example prepare playbooks in -`examples/*/`, not by the role itself. Set them as inventory host/group -vars to opt out of the corresponding prepare step: +These variables are consumed only by the example prepare playbooks in `examples/*/`, not by the role itself. Set them as inventory host/group vars to opt out of the corresponding prepare step: | Variable | Default | Description | | --- | --- | --- | | `cozystack_enable_zfs` | `true` | Example playbooks: install ZFS userspace and load the module. Set `false` to skip. | -| `cozystack_enable_kubevirt` | `true` | Example playbooks: load KubeVirt kernel modules. Set `false` to skip. | +| `cozystack_enable_kubevirt` | `true` | Example playbooks: load KubeVirt kernel modules **and** install the containerd `device_ownership_from_security_context` drop-in for CDI block imports. Set `false` to skip both. | +| `cozystack_k3s_containerd_dropin_dir` | `/var/lib/rancher/k3s/agent/etc/containerd/config-v3.toml.d` | Example playbooks: directory for the containerd CRI drop-in (gated on `cozystack_enable_kubevirt`). 
Only relocates the file — the drop-in content is hardcoded for containerd 2.x (config v3); a containerd 1.x cluster needs a hand-written `config.toml.d` drop-in instead. | | `cozystack_flush_iptables` | `false` | Example playbooks: flush the iptables INPUT chain before k3s installs. Set `true` on Ubuntu/Debian cloud images (OCI/AWS/GCP) where the default INPUT chain ends with `REJECT icmp-host-prohibited` and blocks k3s inter-node ports 2380/6443. | | `cozystack_zfs_release_rpm_extra` | `{}` | `examples/rhel/` only: merged on top of the built-in `cozystack_zfs_release_rpm_by_major` dict, so you can add (or override) a single EL-major → OpenZFS release RPM entry from inventory without wiping the base dict. Example: `{"10": "https://zfsonlinux.org/epel/zfs-release-X-Y.el10.noarch.rpm"}` once upstream ships one. | | `cozystack_enable_drbd_dkms` | `true` | `examples/ubuntu/` only: install `drbd-dkms` from the LINBIT PPA on Ubuntu LTS 22.04 / 24.04 hosts so DRBD's kernel module is signed via dkms+shim under Secure Boot. Set `false` on Talos hosts (Talos ships pre-signed DRBD modules in extensions) or where Secure Boot is disabled and the in-cluster compile path is preferred. The toggle stops *future* installs but does NOT undo a prior install — manually `apt purge drbd-dkms` and remove the LINBIT entry from `/etc/apt/sources.list.d/` if you flipped to `false` after a successful run. | @@ -371,8 +371,7 @@ vars to opt out of the corresponding prepare step: This collection is designed to work alongside [k3s.orchestration](https://github.com/k3s-io/k3s-ansible). The inventory structure (groups: `cluster`, `server`, `agent`) is fully compatible. 
-Example full pipeline (`site.yml`) — see `examples/ubuntu/`, `examples/rhel/`, -or `examples/suse/`: +Example full pipeline (`site.yml`) — see `examples/ubuntu/`, `examples/rhel/`, or `examples/suse/`: ```yaml - name: Prepare nodes @@ -393,12 +392,9 @@ On cloud providers with NAT (OCI, AWS, GCP), nodes have internal IPs different f ### Multi-master setup (kube-ovn RAFT) -Kube-ovn requires `MASTER_NODES` — a comma-separated list of all -control-plane node IPs for OVN RAFT consensus. By default, the role -auto-detects these IPs from the `server` inventory group host keys. +Kube-ovn requires `MASTER_NODES` — a comma-separated list of all control-plane node IPs for OVN RAFT consensus. By default, the role auto-detects these IPs from the `server` inventory group host keys. -This works when host keys are internal IPs (the recommended inventory -pattern): +This works when host keys are internal IPs (the recommended inventory pattern): ```yaml server: @@ -409,8 +405,7 @@ server: ansible_host: 203.0.113.11 ``` -If your inventory uses hostnames or non-IP host keys, set -`cozystack_master_nodes` explicitly: +If your inventory uses hostnames or non-IP host keys, set `cozystack_master_nodes` explicitly: ```yaml cozystack_master_nodes: "10.0.0.10,10.0.0.11,10.0.0.12" @@ -418,21 +413,11 @@ cozystack_master_nodes: "10.0.0.10,10.0.0.11,10.0.0.12" ### Automatic Helm installation -The role installs Helm and the -[helm-diff](https://github.com/databus23/helm-diff) plugin on the -target node automatically. The `helm-diff` plugin enables true -idempotency — repeated runs report no changes when the release is -already up to date. +The role installs Helm and the [helm-diff](https://github.com/databus23/helm-diff) plugin on the target node automatically. The `helm-diff` plugin enables true idempotency — repeated runs report no changes when the release is already up to date. 
### Customizing variables -The example prepare playbooks define internal variables (like -`cozystack_k3s_server_args`) in the play `vars` section. User-facing -variables such as `cozystack_k3s_extra_args` and -`cozystack_flush_iptables` should be set **in the inventory**, not in -the playbook. Ansible play `vars` take precedence over inventory -variables, so defining them in both places causes the inventory values -to be silently ignored. +The example prepare playbooks define internal variables (like `cozystack_k3s_server_args`) in the play `vars` section. User-facing variables such as `cozystack_k3s_extra_args` and `cozystack_flush_iptables` should be set **in the inventory**, not in the playbook. Ansible play `vars` take precedence over inventory variables, so defining them in both places causes the inventory values to be silently ignored. ### Idempotency diff --git a/examples/rhel/prepare-rhel.yml b/examples/rhel/prepare-rhel.yml index f37b01a..479d48f 100644 --- a/examples/rhel/prepare-rhel.yml +++ b/examples/rhel/prepare-rhel.yml @@ -122,6 +122,29 @@ state: restarted failed_when: false # tolerated: same reason as the enable task below + # Refresh service facts on the same notify topic so the restart + # handler below sees the current unit set. Defined first, so it runs + # first (handlers fire in definition order, not notify order). + - name: Refresh service facts before k3s restart + ansible.builtin.service_facts: + listen: Restart k3s to apply containerd config + + - name: Restart k3s to apply containerd config + ansible.builtin.systemd: + name: "{{ item }}" + state: restarted + loop: + - k3s + - k3s-agent + # Restart only the unit that exists on this node: a server runs + # k3s, an agent runs k3s-agent, and on a full-pipeline run neither + # exists yet when prepare runs (the drop-in is read at first k3s + # start instead). service_facts keys systemd units with the + # .service suffix. 
A unit that IS present but fails to restart + # still fails the play — a malformed drop-in or a k3s that will not + # come back is surfaced, not masked by failed_when: false. + when: (item ~ '.service') in ansible_facts.services + tasks: - name: Create k3s_cluster group for k3s.orchestration ansible.builtin.group_by: @@ -188,6 +211,58 @@ | map(attribute='item') | list }} + # CDI (Containerized Data Importer) streams VM disk images into raw + # block volumes from a NON-root importer pod. containerd only chowns + # the block device to the pod's SecurityContext UID/GID when + # device_ownership_from_security_context is enabled on the CRI + # plugin, and k3s ships it disabled. Without it the importer dies + # with "blockdev: cannot open /dev/cdi-block-volume: Permission + # denied", the DataVolume hangs in ImportInProgress, and every VM + # that references the disk stays Pending. + # + # The drop-in is merged by containerd on top of k3s's generated + # config.toml via the config-v3.toml.d import glob — read at first + # k3s start (full pipeline) or applied by the handler on re-runs + # against a running cluster. config-v3.toml.d and + # io.containerd.cri.v1.runtime are the containerd 2.x (config + # version 3) paths shipped by current k3s, and the content is + # hardcoded for that schema. cozystack_k3s_containerd_dropin_dir + # only relocates the file (e.g. a non-default k3s data-dir); it does + # not rewrite the content, so a containerd 1.x cluster needs a + # hand-written config.toml.d drop-in (version = 2, + # io.containerd.grpc.v1.cri) instead. 
+ - name: Ensure k3s containerd config drop-in directory exists + ansible.builtin.file: + path: "{{ cozystack_k3s_containerd_dropin_dir | default('/var/lib/rancher/k3s/agent/etc/containerd/config-v3.toml.d') }}" + state: directory + mode: "0755" + when: cozystack_enable_kubevirt | default(true) | bool + + - name: Enable device_ownership_from_security_context for CDI block imports + ansible.builtin.copy: + dest: "{{ cozystack_k3s_containerd_dropin_dir | default('/var/lib/rancher/k3s/agent/etc/containerd/config-v3.toml.d') }}/10-cozystack-cri.toml" + mode: "0644" + content: | + version = 3 + + [plugins.'io.containerd.cri.v1.runtime'] + device_ownership_from_security_context = true + when: cozystack_enable_kubevirt | default(true) | bool + notify: Restart k3s to apply containerd config + + # Reverse the drop-in when KubeVirt is turned off: a host that + # carried 10-cozystack-cri.toml from an earlier enabled run would + # otherwise keep device_ownership_from_security_context on, so the + # host state no longer matches the toggle. Removal notifies the + # restart handler so a running cluster drops the setting too. (No-op + # when the file was never written — file: absent reports unchanged.) 
+ - name: Remove containerd CDI drop-in when KubeVirt is disabled + ansible.builtin.file: + path: "{{ cozystack_k3s_containerd_dropin_dir | default('/var/lib/rancher/k3s/agent/etc/containerd/config-v3.toml.d') }}/10-cozystack-cri.toml" + state: absent + when: not (cozystack_enable_kubevirt | default(true) | bool) + notify: Restart k3s to apply containerd config + - name: Ensure multipath drop-in directory exists ansible.builtin.file: path: /etc/multipath/conf.d diff --git a/examples/suse/prepare-suse.yml b/examples/suse/prepare-suse.yml index a6e3b91..9d91722 100644 --- a/examples/suse/prepare-suse.yml +++ b/examples/suse/prepare-suse.yml @@ -117,6 +117,29 @@ state: restarted failed_when: false # tolerated: same reason as the enable task below + # Refresh service facts on the same notify topic so the restart + # handler below sees the current unit set. Defined first, so it runs + # first (handlers fire in definition order, not notify order). + - name: Refresh service facts before k3s restart + ansible.builtin.service_facts: + listen: Restart k3s to apply containerd config + + - name: Restart k3s to apply containerd config + ansible.builtin.systemd: + name: "{{ item }}" + state: restarted + loop: + - k3s + - k3s-agent + # Restart only the unit that exists on this node: a server runs + # k3s, an agent runs k3s-agent, and on a full-pipeline run neither + # exists yet when prepare runs (the drop-in is read at first k3s + # start instead). service_facts keys systemd units with the + # .service suffix. A unit that IS present but fails to restart + # still fails the play — a malformed drop-in or a k3s that will not + # come back is surfaced, not masked by failed_when: false. 
+ when: (item ~ '.service') in ansible_facts.services + tasks: - name: Create k3s_cluster group for k3s.orchestration ansible.builtin.group_by: @@ -183,6 +206,58 @@ | map(attribute='item') | list }} + # CDI (Containerized Data Importer) streams VM disk images into raw + # block volumes from a NON-root importer pod. containerd only chowns + # the block device to the pod's SecurityContext UID/GID when + # device_ownership_from_security_context is enabled on the CRI + # plugin, and k3s ships it disabled. Without it the importer dies + # with "blockdev: cannot open /dev/cdi-block-volume: Permission + # denied", the DataVolume hangs in ImportInProgress, and every VM + # that references the disk stays Pending. + # + # The drop-in is merged by containerd on top of k3s's generated + # config.toml via the config-v3.toml.d import glob — read at first + # k3s start (full pipeline) or applied by the handler on re-runs + # against a running cluster. config-v3.toml.d and + # io.containerd.cri.v1.runtime are the containerd 2.x (config + # version 3) paths shipped by current k3s, and the content is + # hardcoded for that schema. cozystack_k3s_containerd_dropin_dir + # only relocates the file (e.g. a non-default k3s data-dir); it does + # not rewrite the content, so a containerd 1.x cluster needs a + # hand-written config.toml.d drop-in (version = 2, + # io.containerd.grpc.v1.cri) instead. 
+ - name: Ensure k3s containerd config drop-in directory exists + ansible.builtin.file: + path: "{{ cozystack_k3s_containerd_dropin_dir | default('/var/lib/rancher/k3s/agent/etc/containerd/config-v3.toml.d') }}" + state: directory + mode: "0755" + when: cozystack_enable_kubevirt | default(true) | bool + + - name: Enable device_ownership_from_security_context for CDI block imports + ansible.builtin.copy: + dest: "{{ cozystack_k3s_containerd_dropin_dir | default('/var/lib/rancher/k3s/agent/etc/containerd/config-v3.toml.d') }}/10-cozystack-cri.toml" + mode: "0644" + content: | + version = 3 + + [plugins.'io.containerd.cri.v1.runtime'] + device_ownership_from_security_context = true + when: cozystack_enable_kubevirt | default(true) | bool + notify: Restart k3s to apply containerd config + + # Reverse the drop-in when KubeVirt is turned off: a host that + # carried 10-cozystack-cri.toml from an earlier enabled run would + # otherwise keep device_ownership_from_security_context on, so the + # host state no longer matches the toggle. Removal notifies the + # restart handler so a running cluster drops the setting too. (No-op + # when the file was never written — file: absent reports unchanged.) + - name: Remove containerd CDI drop-in when KubeVirt is disabled + ansible.builtin.file: + path: "{{ cozystack_k3s_containerd_dropin_dir | default('/var/lib/rancher/k3s/agent/etc/containerd/config-v3.toml.d') }}/10-cozystack-cri.toml" + state: absent + when: not (cozystack_enable_kubevirt | default(true) | bool) + notify: Restart k3s to apply containerd config + - name: Ensure multipath drop-in directory exists ansible.builtin.file: path: /etc/multipath/conf.d diff --git a/examples/ubuntu/prepare-ubuntu.yml b/examples/ubuntu/prepare-ubuntu.yml index dea9fbc..cb803db 100644 --- a/examples/ubuntu/prepare-ubuntu.yml +++ b/examples/ubuntu/prepare-ubuntu.yml @@ -138,6 +138,29 @@ # IS consulted downstream.) 
failed_when: false + # Refresh service facts on the same notify topic so the restart + # handler below sees the current unit set. Defined first, so it runs + # first (handlers fire in definition order, not notify order). + - name: Refresh service facts before k3s restart + ansible.builtin.service_facts: + listen: Restart k3s to apply containerd config + + - name: Restart k3s to apply containerd config + ansible.builtin.systemd: + name: "{{ item }}" + state: restarted + loop: + - k3s + - k3s-agent + # Restart only the unit that exists on this node: a server runs + # k3s, an agent runs k3s-agent, and on a full-pipeline run neither + # exists yet when prepare runs (the drop-in is read at first k3s + # start instead). service_facts keys systemd units with the + # .service suffix. A unit that IS present but fails to restart + # still fails the play — a malformed drop-in or a k3s that will not + # come back is surfaced, not masked by failed_when: false. + when: (item ~ '.service') in ansible_facts.services + tasks: - name: Create k3s_cluster group for k3s.orchestration ansible.builtin.group_by: @@ -229,6 +252,58 @@ | map(attribute='item') | list }} + # CDI (Containerized Data Importer) streams VM disk images into raw + # block volumes from a NON-root importer pod. containerd only chowns + # the block device to the pod's SecurityContext UID/GID when + # device_ownership_from_security_context is enabled on the CRI + # plugin, and k3s ships it disabled. Without it the importer dies + # with "blockdev: cannot open /dev/cdi-block-volume: Permission + # denied", the DataVolume hangs in ImportInProgress, and every VM + # that references the disk stays Pending. + # + # The drop-in is merged by containerd on top of k3s's generated + # config.toml via the config-v3.toml.d import glob — read at first + # k3s start (full pipeline) or applied by the handler on re-runs + # against a running cluster. 
config-v3.toml.d and + # io.containerd.cri.v1.runtime are the containerd 2.x (config + # version 3) paths shipped by current k3s, and the content is + # hardcoded for that schema. cozystack_k3s_containerd_dropin_dir + # only relocates the file (e.g. a non-default k3s data-dir); it does + # not rewrite the content, so a containerd 1.x cluster needs a + # hand-written config.toml.d drop-in (version = 2, + # io.containerd.grpc.v1.cri) instead. + - name: Ensure k3s containerd config drop-in directory exists + ansible.builtin.file: + path: "{{ cozystack_k3s_containerd_dropin_dir | default('/var/lib/rancher/k3s/agent/etc/containerd/config-v3.toml.d') }}" + state: directory + mode: "0755" + when: cozystack_enable_kubevirt | default(true) | bool + + - name: Enable device_ownership_from_security_context for CDI block imports + ansible.builtin.copy: + dest: "{{ cozystack_k3s_containerd_dropin_dir | default('/var/lib/rancher/k3s/agent/etc/containerd/config-v3.toml.d') }}/10-cozystack-cri.toml" + mode: "0644" + content: | + version = 3 + + [plugins.'io.containerd.cri.v1.runtime'] + device_ownership_from_security_context = true + when: cozystack_enable_kubevirt | default(true) | bool + notify: Restart k3s to apply containerd config + + # Reverse the drop-in when KubeVirt is turned off: a host that + # carried 10-cozystack-cri.toml from an earlier enabled run would + # otherwise keep device_ownership_from_security_context on, so the + # host state no longer matches the toggle. Removal notifies the + # restart handler so a running cluster drops the setting too. (No-op + # when the file was never written — file: absent reports unchanged.) 
+ - name: Remove containerd CDI drop-in when KubeVirt is disabled + ansible.builtin.file: + path: "{{ cozystack_k3s_containerd_dropin_dir | default('/var/lib/rancher/k3s/agent/etc/containerd/config-v3.toml.d') }}/10-cozystack-cri.toml" + state: absent + when: not (cozystack_enable_kubevirt | default(true) | bool) + notify: Restart k3s to apply containerd config + - name: Ensure multipath drop-in directory exists ansible.builtin.file: path: /etc/multipath/conf.d diff --git a/tests/unit/playbooks/test_ubuntu_examples.py b/tests/unit/playbooks/test_ubuntu_examples.py index db3ec6c..bede1f8 100644 --- a/tests/unit/playbooks/test_ubuntu_examples.py +++ b/tests/unit/playbooks/test_ubuntu_examples.py @@ -793,3 +793,339 @@ def test_claude_md_dkms_exception_documented(): "CLAUDE.md must document the drbd-dkms exception (Secure Boot " "hosts) so the 'do NOT install' rule is no longer absolute." ) + + +# containerd device_ownership_from_security_context drop-in (CDI block +# imports) — cross-distro invariant on BOTH the task and the handler. + + +_PREPARE_PLAYBOOKS = ( + "examples/ubuntu/prepare-ubuntu.yml", + "examples/rhel/prepare-rhel.yml", + "examples/suse/prepare-suse.yml", +) + + +def _find_handler(plays, name): + for play in plays: + for handler in play.get("handlers", []) or []: + if handler.get("name") == name: + return handler + raise AssertionError( + "handler %r not found in %r" + % (name, [p.get("name") for p in plays]) + ) + + +def test_device_ownership_dropin_enabled_for_cdi_on_all_distros(): + # KubeVirt's CDI importer is a non-root pod that streams VM disk + # images into raw block volumes; containerd only chowns the block + # device to the pod's SecurityContext when + # device_ownership_from_security_context is true on the CRI plugin, + # and k3s ships it disabled. Without it the importer dies with + # "Permission denied", the DataVolume hangs in ImportInProgress, and + # every VM referencing the disk stays Pending. 
The prepare playbooks
+    # drop in a CRI config that enables it. Pin the drop-in, its KubeVirt
+    # gate, and the restart handler across all three distros so the
+    # mechanism cannot silently regress or drift.
+    for relpath in _PREPARE_PLAYBOOKS:
+        plays = _load_playbook(relpath)
+
+        drop = _find_task(
+            plays,
+            "Enable device_ownership_from_security_context for CDI block imports",
+        )
+        copy = drop.get("ansible.builtin.copy", {}) or {}
+        dest = copy.get("dest", "")
+        assert dest.endswith("/10-cozystack-cri.toml"), (
+            "%s: drop-in must be written to a 10-cozystack-cri.toml file "
+            "under the containerd config-dir glob; got dest=%r"
+            % (relpath, dest)
+        )
+        content = copy.get("content", "")
+        assert "device_ownership_from_security_context = true" in content, (
+            "%s: drop-in content must set "
+            "device_ownership_from_security_context = true; got %r"
+            % (relpath, content)
+        )
+        # The containerd 2.x (config v3) CRI runtime table is the path
+        # shipped by the pinned k3s. Pin it so a regression to the v2
+        # io.containerd.grpc.v1.cri table (which current k3s ignores) is
+        # caught.
+        assert "io.containerd.cri.v1.runtime" in content, (
+            "%s: drop-in must target the containerd v3 CRI runtime table "
+            "io.containerd.cri.v1.runtime; got %r" % (relpath, content)
+        )
+
+        # Gated on the KubeVirt toggle — no virt, no drop-in.
+        assert "cozystack_enable_kubevirt" in str(drop.get("when", "")), (
+            "%s: drop-in task must gate on cozystack_enable_kubevirt so "
+            "non-virt clusters skip it; got when=%r"
+            % (relpath, drop.get("when"))
+        )
+
+        # Must notify the restart handler so a re-run against a running
+        # cluster actually applies the change (the drop-in alone is only
+        # read at containerd start otherwise).
+        notify = drop.get("notify")
+        notify_list = notify if isinstance(notify, list) else [notify]
+        assert "Restart k3s to apply containerd config" in notify_list, (
+            "%s: drop-in task must notify 'Restart k3s to apply containerd "
+            "config' so the setting takes effect on a running cluster; "
+            "got notify=%r" % (relpath, notify)
+        )
+
+        # The drop-in directory task shares the dest dir and the gate.
+        mkdir = _find_task(
+            plays, "Ensure k3s containerd config drop-in directory exists"
+        )
+        f = mkdir.get("ansible.builtin.file", {}) or {}
+        assert f.get("state") == "directory", (
+            "%s: drop-in dir task must create a directory; got state=%r"
+            % (relpath, f.get("state"))
+        )
+        assert "cozystack_k3s_containerd_dropin_dir" in str(f.get("path", "")), (
+            "%s: drop-in dir path must be overridable via "
+            "cozystack_k3s_containerd_dropin_dir (relocates the file, "
+            "e.g. a non-default k3s data-dir); got path=%r"
+            % (relpath, f.get("path"))
+        )
+        assert "cozystack_enable_kubevirt" in str(mkdir.get("when", "")), (
+            "%s: drop-in dir task must share the KubeVirt gate; got when=%r"
+            % (relpath, mkdir.get("when"))
+        )
+
+        # The restart handler applies the drop-in on running clusters and
+        # must tolerate a missing unit: only the server OR agent unit
+        # exists on a node, and on the full pipeline k3s is not installed
+        # yet when prepare runs. It must do so WITHOUT masking a genuine
+        # restart failure of a unit that IS present.
+        handler = _find_handler(
+            plays, "Restart k3s to apply containerd config"
+        )
+        systemd = handler.get("ansible.builtin.systemd", {}) or {}
+        assert systemd.get("state") == "restarted", (
+            "%s: restart handler must restart the unit; got state=%r"
+            % (relpath, systemd.get("state"))
+        )
+        loop = handler.get("loop") or []
+        assert "k3s" in loop and "k3s-agent" in loop, (
+            "%s: restart handler must cover both k3s and k3s-agent units "
+            "(server vs agent role); got loop=%r" % (relpath, loop)
+        )
+        # A missing unit is skipped via a service_facts existence gate,
+        # NOT blanket-suppressed with failed_when: false — a present unit
+        # that fails to restart (malformed drop-in, k3s that won't come
+        # back) must still fail the play.
+        assert "failed_when" not in handler, (
+            "%s: restart handler must NOT use failed_when: false — it "
+            "masks a genuine k3s restart failure, not just a missing "
+            "unit; gate on ansible_facts.services instead. Got "
+            "failed_when=%r" % (relpath, handler.get("failed_when"))
+        )
+        assert "ansible_facts.services" in str(handler.get("when", "")), (
+            "%s: restart handler must restart only units present in "
+            "ansible_facts.services; got when=%r"
+            % (relpath, handler.get("when"))
+        )
+        # ansible_facts.services is not populated by default fact
+        # gathering, so a service_facts handler must refresh it on the
+        # same notify topic, and must be defined before the restart so it
+        # runs first (handlers fire in definition order).
+        refresh = _find_handler(
+            plays, "Refresh service facts before k3s restart"
+        )
+        assert "ansible.builtin.service_facts" in refresh, (
+            "%s: a service_facts handler must populate "
+            "ansible_facts.services before the restart gate; got %r"
+            % (relpath, refresh)
+        )
+        assert (
+            refresh.get("listen") == "Restart k3s to apply containerd config"
+        ), (
+            "%s: the service_facts handler must listen on the restart "
+            "topic so the same notify triggers both; got listen=%r"
+            % (relpath, refresh.get("listen"))
+        )
+
+
+def test_device_ownership_dropin_removed_when_kubevirt_disabled():
+    # The drop-in toggle must be reversible (same symmetric-cleanup shape
+    # as the DRBD drop-ins): a host that carried 10-cozystack-cri.toml
+    # from an earlier KubeVirt-enabled run, then had
+    # cozystack_enable_kubevirt set to false, must have the file removed
+    # so containerd's device_ownership_from_security_context matches the
+    # toggle. Skipping the create task alone leaves stale host state.
+    # Removal notifies the restart handler so a running cluster drops it.
+    for relpath in _PREPARE_PLAYBOOKS:
+        plays = _load_playbook(relpath)
+        task = _find_task(
+            plays, "Remove containerd CDI drop-in when KubeVirt is disabled"
+        )
+        f = task.get("ansible.builtin.file", {}) or {}
+        assert f.get("state") == "absent", (
+            "%s: cleanup task must use state=absent; got %r"
+            % (relpath, f.get("state"))
+        )
+        path = str(f.get("path", ""))
+        assert path.endswith("/10-cozystack-cri.toml"), (
+            "%s: cleanup must remove the 10-cozystack-cri.toml drop-in; "
+            "got path=%r" % (relpath, f.get("path"))
+        )
+        assert "cozystack_k3s_containerd_dropin_dir" in path, (
+            "%s: cleanup path must honor the cozystack_k3s_containerd_"
+            "dropin_dir override so it removes the same file the create "
+            "task wrote; got path=%r" % (relpath, f.get("path"))
+        )
+        when_blob = str(task.get("when", ""))
+        assert "not" in when_blob and "cozystack_enable_kubevirt" in when_blob, (
+            "%s: cleanup must run only when KubeVirt is disabled "
+            "(not cozystack_enable_kubevirt); got when=%r"
+            % (relpath, task.get("when"))
+        )
+        notify = task.get("notify")
+        notify_list = notify if isinstance(notify, list) else [notify]
+        assert "Restart k3s to apply containerd config" in notify_list, (
+            "%s: cleanup must notify the restart handler so a running "
+            "cluster drops the setting; got notify=%r" % (relpath, notify)
+        )
+
+
+def test_readme_documents_dropin_rationale_and_restart():
+    # Two things a maintainer/operator must learn from the README section
+    # describing the drop-in, pinned against doc drift:
+    #   1. k3s has a native --nonroot-devices flag that sets the same
+    #      option; the section must acknowledge it and say why the drop-in
+    #      is used instead (uniform server+agent coverage, applies to a
+    #      running cluster) — otherwise a future maintainer rediscovers the
+    #      flag and assumes the drop-in was chosen out of ignorance.
+    #   2. the restart handler bounces k3s on a live re-run, which is
+    #      disruptive — operators must be warned to run it in a window.
+    readme_path = os.path.join(REPO_ROOT, "README.md")
+    with open(readme_path, "r", encoding="utf-8") as fh:
+        readme = fh.read()
+    marker = "#### Enabled by default: containerd device ownership for CDI"
+    assert marker in readme, (
+        "README must keep the '%s...' anchor so this test can scope to it"
+        % marker
+    )
+    start = readme.index(marker)
+    rest = readme[start + len(marker):]
+    nxt = rest.find("\n#### ")
+    nxt_h3 = rest.find("\n### ")
+    candidates = [i for i in (nxt, nxt_h3) if i != -1]
+    end = start + len(marker) + (min(candidates) if candidates else len(rest))
+    section = readme[start:end]
+
+    assert "--nonroot-devices" in section, (
+        "README drop-in section must mention the native k3s "
+        "--nonroot-devices flag as the alternative, and why the drop-in "
+        "is used instead, so the design choice is documented. Section: %r"
+        % section
+    )
+    lowered = section.lower()
+    assert "restart" in lowered and (
+        "maintenance window" in lowered or "mid-day" in lowered
+    ), (
+        "README drop-in section must warn that a re-run against a live "
+        "cluster restarts k3s and should be done in a maintenance window. "
+        "Section: %r" % section
+    )
+
+    # The containerd-1.x guidance must be accurate: the drop-in content is
+    # hardcoded v3 and cozystack_k3s_containerd_dropin_dir only relocates
+    # the file, so 1.x operators must write their own drop-in. The README
+    # must not imply a one-variable 1.x path the code cannot honor.
+    assert "write your own" in lowered, (
+        "README drop-in section must tell containerd-1.x operators to "
+        "write their own drop-in — the v3 content is hardcoded and the "
+        "directory variable does not rewrite it. Section: %r" % section
+    )
+    assert "only relocates" in lowered or "does not rewrite" in lowered, (
+        "README must state that cozystack_k3s_containerd_dropin_dir only "
+        "relocates the file and does not change its (v3) content, so the "
+        "containerd-1.x instruction is not misleading. Section: %r"
+        % section
+    )
+
+
+def test_claude_md_documents_cdi_device_ownership_trap():
+    # This change adds a fourth silent-failure trap of the same class as
+    # the ones CLAUDE.md already enumerates (multipath DRBD blacklist,
+    # vhost_net, br_netfilter). The canonical "Critical silent-failure
+    # traps" list must include the containerd device-ownership trap so
+    # the project guidance does not go stale and a future contributor
+    # does not reintroduce the gap. Mirrors
+    # test_claude_md_dkms_exception_documented.
+    claude_path = os.path.join(REPO_ROOT, "CLAUDE.md")
+    if not os.path.exists(claude_path):
+        return  # CLAUDE.md is optional; skip silently if absent
+    with open(claude_path, "r", encoding="utf-8") as fh:
+        claude = fh.read()
+    assert "device_ownership_from_security_context" in claude, (
+        "CLAUDE.md must list the containerd "
+        "device_ownership_from_security_context trap alongside the "
+        "multipath/vhost_net/br_netfilter traps so the CDI block-import "
+        "failure mode is part of the canonical silent-failure list."
+    )
+    # The entry must be actionable — name the symptom so a reader can
+    # match it to what they observe.
+    assert "ImportInProgress" in claude or "cdi-block-volume" in claude, (
+        "CLAUDE.md device-ownership trap entry must name the observable "
+        "symptom (CDI importer Permission denied / DataVolume "
+        "ImportInProgress) so it is actionable, not just a flag name."
+    )
+
+
+def test_readme_variables_table_documents_containerd_dropin():
+    # The "Example playbook variables" table is the canonical reference
+    # operators consult. Two things must be true after this change:
+    #   1. the new cozystack_k3s_containerd_dropin_dir tunable appears in
+    #      the table (every other examples/* tunable does), and
+    #   2. the cozystack_enable_kubevirt row reflects that the toggle now
+    #      ALSO gates the containerd device-ownership drop-in, not just
+    #      the kernel modules — otherwise an operator flipping it to false
+    #      won't realise they also disable the CDI block-import fix.
+    readme_path = os.path.join(REPO_ROOT, "README.md")
+    with open(readme_path, "r", encoding="utf-8") as fh:
+        readme = fh.read()
+    marker = "### Example playbook variables"
+    assert marker in readme, (
+        "README must keep the '%s' anchor so this test can scope to it"
+        % marker
+    )
+    section_start = readme.index(marker)
+    rest = readme[section_start + len(marker):]
+    next_h2 = rest.find("\n## ")
+    next_h3 = rest.find("\n### ")
+    candidates = [i for i in (next_h2, next_h3) if i != -1]
+    section_end = section_start + len(marker) + (
+        min(candidates) if candidates else len(rest)
+    )
+    section = readme[section_start:section_end]
+
+    assert "cozystack_k3s_containerd_dropin_dir" in section, (
+        "cozystack_k3s_containerd_dropin_dir must appear in the Example "
+        "playbook variables table — it is an overridable examples/* "
+        "tunable, and the table is where operators look for one."
+    )
+
+    # The kubevirt row specifically must mention the drop-in, so the
+    # expanded meaning of the toggle is visible at the row a reader scans.
+    kubevirt_rows = [
+        line for line in section.splitlines()
+        if "`cozystack_enable_kubevirt`" in line
+    ]
+    assert kubevirt_rows, (
+        "Example playbook variables table must have a "
+        "cozystack_enable_kubevirt row"
+    )
+    assert any(
+        "device_ownership" in row or "drop-in" in row or "CDI" in row
+        for row in kubevirt_rows
+    ), (
+        "the cozystack_enable_kubevirt row must note it also gates the "
+        "containerd device-ownership drop-in (CDI block imports), so "
+        "disabling KubeVirt prep is understood to disable that fix too. "
+        "Got rows: %r" % kubevirt_rows
+    )
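Both README tests above repeat the same slicing: find an anchor heading, then cut at the next heading of a chosen level. A minimal sketch of how that could be factored out follows. The helper name `markdown_section` and its `stop_prefixes` parameter are hypothetical and not part of this change; the sample text is invented for illustration.

```python
def markdown_section(text, marker, stop_prefixes=("\n## ", "\n### ")):
    """Return the slice of `text` from `marker` up to the next heading.

    `stop_prefixes` are newline-prefixed heading markers that end the
    section. If none follows the marker, the section runs to end of text.
    Hypothetical helper, mirroring the inline slicing in the tests above.
    """
    start = text.index(marker)  # raises ValueError if the anchor is gone
    body_start = start + len(marker)
    rest = text[body_start:]
    # Keep only the stop markers that actually occur after the anchor.
    hits = [i for i in (rest.find(p) for p in stop_prefixes) if i != -1]
    end = body_start + (min(hits) if hits else len(rest))
    return text[start:end]


# Invented sample, shaped like the README table the tests scope to.
sample = (
    "### Example playbook variables\n"
    "| `cozystack_enable_kubevirt` | gates the drop-in |\n"
    "### Next section\n"
    "unrelated\n"
)
section = markdown_section(sample, "### Example playbook variables")
```

The first README test would pass `stop_prefixes=("\n#### ", "\n### ")` and the second would use the default, so each keeps its current cut points.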