Skip to content

fix(prepare): enable containerd device_ownership_from_security_context for CDI block imports#48

Merged
Aleksei Sviridkin (lexfrei) merged 10 commits into
mainfrom
feat/k3s-nonroot-devices
Jun 3, 2026
Merged

fix(prepare): enable containerd device_ownership_from_security_context for CDI block imports#48
Aleksei Sviridkin (lexfrei) merged 10 commits into
mainfrom
feat/k3s-nonroot-devices

Conversation

@lexfrei

@lexfrei Aleksei Sviridkin (lexfrei) commented Jun 1, 2026

Copy link
Copy Markdown
Contributor

Summary

Enable containerd's device_ownership_from_security_context on the CRI plugin via a k3s config drop-in, so KubeVirt's CDI importer (a non-root pod) can write VM disk images into raw block volumes. k3s ships the setting disabled; without it the importer fails with blockdev: cannot open /dev/cdi-block-volume: Permission denied, the DataVolume hangs in ImportInProgress, and every VM referencing the disk stays Pending.

Builds on the containerd drop-in approach from #47, adding cross-distro test coverage and documentation.

Changes

  • Drop a containerd CRI config (config-v3.toml.d/10-cozystack-cri.toml) in all three example prepare playbooks (Ubuntu/Debian, RHEL-family, SUSE), gated behind cozystack_enable_kubevirt. A handler restarts k3s so the setting also applies on re-runs against a running cluster.
  • Drop-in directory overridable via cozystack_k3s_containerd_dropin_dir for containerd 1.x clusters.
  • Add cross-distro structural tests pinning the drop-in, its KubeVirt gate, the containerd v3 CRI runtime table, and the restart handler.
  • Document the setting and the CDI failure mode in the README.
  • Record the change under a new Unreleased CHANGELOG section (no collection version bump — the version tracks the upstream Cozystack chart release).
  • Reflow the remaining hard-wrapped README prose into single-line paragraphs (formatting only).

Test plan

  • ansible-lint passes (production profile)
  • ansible-test sanity passes (covered by CI)
  • Tested on a live cluster (describe environment)
  • Idempotency verified (second run: changed=0)

Verified locally: ansible-playbook --syntax-check on all three prepare playbooks, and pytest tests/unit/playbooks/ (30 passed).

Summary by CodeRabbit

  • Bug Fixes

    • Fixed KubeVirt CDI disk imports failing with permission errors when importing to raw block volumes.
  • New Features

    • Added automatic containerd configuration support for KubeVirt operations.
  • Documentation

    • Updated installation guides explaining containerd configuration requirements and important notes.
  • Tests

    • Added validation tests for the containerd configuration and documentation accuracy.

@coderabbitai

coderabbitai Bot commented Jun 1, 2026

Copy link
Copy Markdown

Review Change Stack

Warning

Review limit reached

@lexfrei, we couldn't start this review because you've reached your PR review rate limit.

More reviews will be available in 46 minutes and 46 seconds. Learn how PR review limits work.

Your organization has run out of usage credits. Purchase more in the billing tab.

⌛ How to resolve this issue?

After more reviews become available, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans include higher PR review limits than trial, open-source, and free plans. In all cases, reviews become available again over time. During sustained high-volume PR review activity, CodeRabbit may temporarily slow when the next review becomes available.

Please see our Fair Usage Limits Policy for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 8c5f0e16-694e-4170-8699-f4743ef4ed9d

📥 Commits

Reviewing files that changed from the base of the PR and between 5f8195d and 8213574.

📒 Files selected for processing (5)
  • CHANGELOG.rst
  • examples/rhel/prepare-rhel.yml
  • examples/suse/prepare-suse.yml
  • examples/ubuntu/prepare-ubuntu.yml
  • tests/unit/playbooks/test_ubuntu_examples.py
📝 Walkthrough

Walkthrough

Example prepare playbooks add a kubevirt-gated containerd CRI drop-in (10-cozystack-cri.toml) enabling device_ownership_from_security_context, a handler to restart k3s/k3s-agent to apply it, README and changelog documentation, and cross-distro unit tests verifying the drop-in and handler wiring.

Changes

KubeVirt CDI containerd drop-in and wiring

Layer / File(s) Summary
Release notes entry
CHANGELOG.rst
Adds Unreleased → Bugfixes note describing enabling containerd CRI device_ownership_from_security_context via a k3s drop-in and documents gating (cozystack_enable_kubevirt) and the drop-in dir override (cozystack_k3s_containerd_dropin_dir).
README / CLAUDE documentation updates
README.md, CLAUDE.md
Adds a subsection explaining containerd device ownership for CDI block imports, reflows and condenses several README sections, and adds a CLAUDE.md trap entry noting the ImportInProgress/VM Pending symptom and the required drop-in.
Example prepare playbooks: handlers + containerd drop-in
examples/ubuntu/prepare-ubuntu.yml, examples/rhel/prepare-rhel.yml, examples/suse/prepare-suse.yml
Adds a Refresh service facts before k3s restart handler and Restart k3s to apply containerd config handler (restarts k3s and k3s-agent only when present), and kubevirt-gated tasks to create/remove the containerd config-v3.toml.d drop-in directory and 10-cozystack-cri.toml enabling device_ownership_from_security_context, notifying the restart handler.
Cross-distro unit tests for drop-in and handler
tests/unit/playbooks/test_ubuntu_examples.py
Adds _PREPARE_PLAYBOOKS, helper _find_handler, and tests asserting the drop-in file destination/content, gating on cozystack_enable_kubevirt, directory override via cozystack_k3s_containerd_dropin_dir, notification to the restart handler, presence of a preceding service_facts refresh handler, and a cleanup task when disabled.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐰 I write a tiny TOML, a gentle, guarded tweak,
So virt-imports find their block with permissions they seek,
Playbooks hum, handlers restart k3s with care,
Tests nod in chorus — the drop-in's placed there,
Hopping onward, configs now play fair.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately describes the main change: enabling containerd's device_ownership_from_security_context for KubeVirt CDI block imports via a fix to the prepare playbooks.
Description check ✅ Passed The description follows the template with complete Summary and Changes sections; Test plan lists verification efforts with some items checked and notes local pytest passage (30 passed).
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch feat/k3s-nonroot-devices

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request enables the k3s --nonroot-devices flag by default across the example prepare playbooks (Ubuntu, RHEL, and SUSE) to allow non-root KubeVirt VMs to access required device nodes. It also introduces the cozystack_k3s_agent_args variable to apply this setting to agent nodes, and updates the documentation and unit tests accordingly. The review feedback highlights an issue with Ansible variable precedence: defining cozystack_k3s_agent_args in the play-level vars block prevents inventory overrides. It is recommended to remove this variable from play-level vars and instead use the default() filter within the tasks and update the corresponding tests.

Comment thread examples/ubuntu/prepare-ubuntu.yml Outdated
# its devices. Servers get the flag via cozystack_k3s_server_args
# (they run an embedded agent); this covers agent-only nodes, which
# k3s.orchestration wires through extra_agent_args.
cozystack_k3s_agent_args: "--nonroot-devices"

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

Defining cozystack_k3s_agent_args in the play's vars block prevents it from being overridden by inventory variables (since play vars have higher precedence in Ansible). This contradicts the README which states that cozystack_k3s_agent_args is a user-facing variable meant to be set in the inventory. We should remove it from play vars and define the default value directly in the task using the default() filter.

    # cozystack_k3s_agent_args is defined in the task below via default() to allow inventory overrides.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR was reworked after this review: the --nonroot-devices / cozystack_k3s_agent_args approach was replaced with a containerd CRI config drop-in (10-cozystack-cri.toml) that enables device_ownership_from_security_context. The drop-in applies uniformly to every node in the cluster group (including agents, for which the example playbooks do not wire extra_agent_args) and can be applied to an already-running cluster, which an install-time k3s flag cannot. cozystack_k3s_agent_args no longer exists in the playbooks, so this comment no longer applies.

Comment thread examples/ubuntu/prepare-ubuntu.yml Outdated
Comment on lines +174 to +177
- name: Set k3s agent args for Cozystack
ansible.builtin.set_fact:
extra_agent_args: "{{ cozystack_k3s_agent_args }}"
when: extra_agent_args is not defined

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

Update the task to use the default() filter so that cozystack_k3s_agent_args defaults to --nonroot-devices if not defined in the inventory.

    - name: Set k3s agent args for Cozystack
      ansible.builtin.set_fact:
        extra_agent_args: "{{ cozystack_k3s_agent_args | default('--nonroot-devices') }}"
      when: extra_agent_args is not defined

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR was reworked after this review: the --nonroot-devices / cozystack_k3s_agent_args approach was replaced with a containerd CRI config drop-in (10-cozystack-cri.toml) that enables device_ownership_from_security_context. The drop-in applies uniformly to every node in the cluster group (including agents, for which the example playbooks do not wire extra_agent_args) and can be applied to an already-running cluster, which an install-time k3s flag cannot. cozystack_k3s_agent_args no longer exists in the playbooks, so this comment no longer applies.

Comment thread examples/rhel/prepare-rhel.yml Outdated
# its devices. Servers get the flag via cozystack_k3s_server_args
# (they run an embedded agent); this covers agent-only nodes, which
# k3s.orchestration wires through extra_agent_args.
cozystack_k3s_agent_args: "--nonroot-devices"

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

Defining cozystack_k3s_agent_args in the play's vars block prevents it from being overridden by inventory variables (since play vars have higher precedence in Ansible). This contradicts the README which states that cozystack_k3s_agent_args is a user-facing variable meant to be set in the inventory. We should remove it from play vars and define the default value directly in the task using the default() filter.

    # cozystack_k3s_agent_args is defined in the task below via default() to allow inventory overrides.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR was reworked after this review: the --nonroot-devices / cozystack_k3s_agent_args approach was replaced with a containerd CRI config drop-in (10-cozystack-cri.toml) that enables device_ownership_from_security_context. The drop-in applies uniformly to every node in the cluster group (including agents, for which the example playbooks do not wire extra_agent_args) and can be applied to an already-running cluster, which an install-time k3s flag cannot. cozystack_k3s_agent_args no longer exists in the playbooks, so this comment no longer applies.

Comment thread examples/rhel/prepare-rhel.yml Outdated
Comment on lines +158 to +161
- name: Set k3s agent args for Cozystack
ansible.builtin.set_fact:
extra_agent_args: "{{ cozystack_k3s_agent_args }}"
when: extra_agent_args is not defined

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

Update the task to use the default() filter so that cozystack_k3s_agent_args defaults to --nonroot-devices if not defined in the inventory.

    - name: Set k3s agent args for Cozystack
      ansible.builtin.set_fact:
        extra_agent_args: "{{ cozystack_k3s_agent_args | default('--nonroot-devices') }}"
      when: extra_agent_args is not defined

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR was reworked after this review: the --nonroot-devices / cozystack_k3s_agent_args approach was replaced with a containerd CRI config drop-in (10-cozystack-cri.toml) that enables device_ownership_from_security_context. The drop-in applies uniformly to every node in the cluster group (including agents, for which the example playbooks do not wire extra_agent_args) and can be applied to an already-running cluster, which an install-time k3s flag cannot. cozystack_k3s_agent_args no longer exists in the playbooks, so this comment no longer applies.

Comment thread examples/suse/prepare-suse.yml Outdated
# its devices. Servers get the flag via cozystack_k3s_server_args
# (they run an embedded agent); this covers agent-only nodes, which
# k3s.orchestration wires through extra_agent_args.
cozystack_k3s_agent_args: "--nonroot-devices"

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

Defining cozystack_k3s_agent_args in the play's vars block prevents it from being overridden by inventory variables (since play vars have higher precedence in Ansible). This contradicts the README which states that cozystack_k3s_agent_args is a user-facing variable meant to be set in the inventory. We should remove it from play vars and define the default value directly in the task using the default() filter.

    # cozystack_k3s_agent_args is defined in the task below via default() to allow inventory overrides.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR was reworked after this review: the --nonroot-devices / cozystack_k3s_agent_args approach was replaced with a containerd CRI config drop-in (10-cozystack-cri.toml) that enables device_ownership_from_security_context. The drop-in applies uniformly to every node in the cluster group (including agents, for which the example playbooks do not wire extra_agent_args) and can be applied to an already-running cluster, which an install-time k3s flag cannot. cozystack_k3s_agent_args no longer exists in the playbooks, so this comment no longer applies.

Comment thread examples/suse/prepare-suse.yml Outdated
Comment on lines +153 to +156
- name: Set k3s agent args for Cozystack
ansible.builtin.set_fact:
extra_agent_args: "{{ cozystack_k3s_agent_args }}"
when: extra_agent_args is not defined

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

Update the task to use the default() filter so that cozystack_k3s_agent_args defaults to --nonroot-devices if not defined in the inventory.

    - name: Set k3s agent args for Cozystack
      ansible.builtin.set_fact:
        extra_agent_args: "{{ cozystack_k3s_agent_args | default('--nonroot-devices') }}"
      when: extra_agent_args is not defined

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR was reworked after this review: the --nonroot-devices / cozystack_k3s_agent_args approach was replaced with a containerd CRI config drop-in (10-cozystack-cri.toml) that enables device_ownership_from_security_context. The drop-in applies uniformly to every node in the cluster group (including agents, for which the example playbooks do not wire extra_agent_args) and can be applied to an already-running cluster, which an install-time k3s flag cannot. cozystack_k3s_agent_args no longer exists in the playbooks, so this comment no longer applies.

Comment on lines +834 to +840
agent_args = play_vars.get("cozystack_k3s_agent_args") or ""
assert "--nonroot-devices" in agent_args, (
"%s: cozystack_k3s_agent_args must include --nonroot-devices "
"so agent/worker nodes get the same containerd setting — VMs "
"scheduled there would otherwise fail to access devices. "
"Got: %r" % (relpath, agent_args)
)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

Update the test assertion to verify that cozystack_k3s_agent_args is NOT defined in play-level vars (to ensure it can be overridden from inventory).

        assert "cozystack_k3s_agent_args" not in play_vars, (
            "%s: cozystack_k3s_agent_args must not be defined in play-level vars "
            "so it can be overridden from inventory. Got: %r" % (relpath, play_vars)
        )

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR was reworked after this review: the --nonroot-devices / cozystack_k3s_agent_args approach was replaced with a containerd CRI config drop-in (10-cozystack-cri.toml) that enables device_ownership_from_security_context. The drop-in applies uniformly to every node in the cluster group (including agents, for which the example playbooks do not wire extra_agent_args) and can be applied to an already-running cluster, which an install-time k3s flag cannot. cozystack_k3s_agent_args no longer exists in the playbooks, so this comment no longer applies.

Comment on lines +852 to +856
assert "cozystack_k3s_agent_args" in set_fact["extra_agent_args"], (
"%s: extra_agent_args must be driven by cozystack_k3s_agent_args "
"so the default flows through. Got: %r"
% (relpath, set_fact["extra_agent_args"])
)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

Update the test assertion to verify that extra_agent_args is driven by cozystack_k3s_agent_args and defaults to --nonroot-devices.

        assert "cozystack_k3s_agent_args" in set_fact["extra_agent_args"], (
            "%s: extra_agent_args must be driven by cozystack_k3s_agent_args "
            "so the default flows through. Got: %r"
            % (relpath, set_fact["extra_agent_args"])
        )
        assert "default('--nonroot-devices')" in set_fact["extra_agent_args"], (
            "%s: extra_agent_args must default to --nonroot-devices when "
            "cozystack_k3s_agent_args is undefined. Got: %r"
            % (relpath, set_fact["extra_agent_args"])
        )

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR was reworked after this review: the --nonroot-devices / cozystack_k3s_agent_args approach was replaced with a containerd CRI config drop-in (10-cozystack-cri.toml) that enables device_ownership_from_security_context. The drop-in applies uniformly to every node in the cluster group (including agents, for which the example playbooks do not wire extra_agent_args) and can be applied to an already-running cluster, which an install-time k3s flag cannot. cozystack_k3s_agent_args no longer exists in the playbooks, so this comment no longer applies.

@lexfrei Aleksei Sviridkin (lexfrei) marked this pull request as ready for review June 1, 2026 17:59
@lexfrei Aleksei Sviridkin (lexfrei) changed the title feat(examples): enable k3s nonroot-devices for KubeVirt device access fix(prepare): enable containerd device_ownership_from_security_context for CDI block imports Jun 1, 2026

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@examples/rhel/prepare-rhel.yml`:
- Around line 227-237: Add a reversible cleanup task that removes the drop-in
file "10-cozystack-cri.toml" when cozystack_enable_kubevirt is false so the host
state matches the toggle: create a new ansible.builtin.file or
ansible.builtin.file state=absent task (matching the existing "Enable
device_ownership_from_security_context for CDI block imports" task) that targets
the same destination path and is conditioned on "when: not
(cozystack_enable_kubevirt | default(true) | bool)" and also notify the same
handler "Restart k3s to apply containerd config"; replicate the same cleanup
task in the Ubuntu and SUSE variant playbooks where the one-way drop-in creation
exists.

In `@examples/ubuntu/prepare-ubuntu.yml`:
- Around line 141-152: The task "Restart k3s to apply containerd config"
currently uses failed_when: false which masks real restart errors; instead
gather service facts first (via ansible.builtin.service_facts) and change the
ansible.builtin.systemd task looping over items (k3s, k3s-agent) to run only
when the unit exists (e.g. when: "item in ansible_facts.services") and remove
failed_when so genuine failures surface; apply the same change to the RHEL and
SUSE variants to ensure malformed drop-ins or restart failures fail the play
rather than being silently ignored.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 9ec08ded-70f1-4e84-8096-109efde4d21c

📥 Commits

Reviewing files that changed from the base of the PR and between ab07586 and 9efc32e.

📒 Files selected for processing (6)
  • CHANGELOG.rst
  • README.md
  • examples/rhel/prepare-rhel.yml
  • examples/suse/prepare-suse.yml
  • examples/ubuntu/prepare-ubuntu.yml
  • tests/unit/playbooks/test_ubuntu_examples.py
✅ Files skipped from review due to trivial changes (2)
  • CHANGELOG.rst
  • README.md

Comment thread examples/rhel/prepare-rhel.yml
Comment thread examples/ubuntu/prepare-ubuntu.yml Outdated
Andrei Kvapil (kvaps) and others added 9 commits June 3, 2026 11:07
…t for CDI block imports

KubeVirt's CDI importer writes VM disk images into raw block volumes
from a non-root pod. containerd only chowns the block device to the
pod's SecurityContext when device_ownership_from_security_context is
enabled on the CRI plugin, and k3s ships it disabled. Without it the
importer fails with 'cannot open /dev/cdi-block-volume: Permission
denied', the DataVolume hangs in ImportInProgress, and VMs referencing
the disk stay Pending.

Add a k3s containerd drop-in (config-v3.toml.d/10-cozystack-cri.toml)
to all three prepare playbooks, gated behind cozystack_enable_kubevirt
and overridable via cozystack_k3s_containerd_dropin_dir.

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Andrei Kvapil <andrei.kvapil@aenix.io>
Pin the device_ownership_from_security_context drop-in, its KubeVirt gate, the containerd v3 CRI runtime table, and the k3s restart handler across the Ubuntu/RHEL/SUSE prepare playbooks, so the mechanism cannot silently regress or drift between distros.

Assisted-By: Claude <noreply@anthropic.com>
Signed-off-by: Aleksei Sviridkin <f@lex.la>
…stream

The collection version tracks the upstream Cozystack chart release; a bugfix on its own does not force a version bump. Keep galaxy.yml at the inherited version and record the change under an Unreleased CHANGELOG section, to be renamed when upstream bumps.

Assisted-By: Claude <noreply@anthropic.com>
Signed-off-by: Aleksei Sviridkin <f@lex.la>
Reflow the remaining hard-wrapped prose paragraphs so each paragraph is one continuous line, letting the renderer wrap to viewer width. Formatting only; tables, code blocks and YAML are left untouched.

Assisted-By: Claude <noreply@anthropic.com>
Signed-off-by: Aleksei Sviridkin <f@lex.la>
… caveat

Explain why the collection uses a containerd config drop-in rather than k3s's native --nonroot-devices flag (uniform server+agent coverage without wiring agent args, and it applies to an already-running cluster), and warn that the restart handler bounces k3s when the drop-in is first added or changes on a live re-run. Add a doc-drift guard test pinning both points to the drop-in README section.

Assisted-By: Claude <noreply@anthropic.com>
Signed-off-by: Aleksei Sviridkin <f@lex.la>
Add the containerd device_ownership_from_security_context failure (CDI block import Permission denied -> DataVolume ImportInProgress -> VM Pending) to the project's canonical Critical silent-failure traps list, alongside the multipath/vhost_net/br_netfilter entries, with a guard test pinning it.

Assisted-By: Claude <noreply@anthropic.com>
Signed-off-by: Aleksei Sviridkin <f@lex.la>
The drop-in content (version = 3, io.containerd.cri.v1.runtime) is hardcoded for containerd 2.x; cozystack_k3s_containerd_dropin_dir only relocates the file, it does not rewrite the content. State plainly that containerd 1.x is not handled as-is and operators must write their own v2 drop-in, rather than implying a one-variable override that the code cannot honor. Guard the corrected wording with a test.

Assisted-By: Claude <noreply@anthropic.com>
Signed-off-by: Aleksei Sviridkin <f@lex.la>
…gle scope

Add cozystack_k3s_containerd_dropin_dir to the Example playbook variables table (the canonical override reference, where every other examples/* tunable is listed), and extend the cozystack_enable_kubevirt row to note it now also gates the containerd device-ownership drop-in for CDI block imports — so disabling KubeVirt prep is understood to disable that fix too. Also attribute the k3s version pin to the example inventories rather than the role. Guard both table facts with a test.

Assisted-By: Claude <noreply@anthropic.com>
Signed-off-by: Aleksei Sviridkin <f@lex.la>
The drop-in toggle was one-way: cozystack_enable_kubevirt: false only
skipped writing 10-cozystack-cri.toml, so a host that ran with KubeVirt
enabled kept device_ownership_from_security_context on and the host
state no longer matched the toggle. Add a symmetric cleanup task that
removes the drop-in when the toggle is off and notifies the restart
handler, mirroring the existing DRBD drop-in cleanup pattern.

The restart handler used failed_when: false, which masked every
failure, not just the intended missing-unit case (only one of
k3s/k3s-agent exists on a node, and on a full-pipeline run neither
exists yet when prepare runs). A malformed drop-in or a k3s that failed
to come back was silently reported as success. Refresh service facts on
the same notify topic and restart only units present in
ansible_facts.services, so a genuine restart failure fails the play
while a missing unit is still skipped.

Assisted-By: Claude <noreply@anthropic.com>
Signed-off-by: Aleksei Sviridkin <f@lex.la>

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (1)
tests/unit/playbooks/test_ubuntu_examples.py (1)

934-947: ⚡ Quick win

Assert handler order, not just handler presence.

This test explains that service_facts must be defined before the restart handler, but it never checks that ordering. A future reorder would still pass here and break the ansible_facts.services gate at runtime, so please lock the relative handler indices down too.

Proposed test tightening
 def _find_handler(plays, name):
     for play in plays:
         for handler in play.get("handlers", []) or []:
             if handler.get("name") == name:
                 return handler
     raise AssertionError(
         "handler %r not found in %r"
         % (name, [p.get("name") for p in plays])
     )
+
+
+def _handler_index(plays, name):
+    for play in plays:
+        handlers = play.get("handlers", []) or []
+        for idx, handler in enumerate(handlers):
+            if handler.get("name") == name:
+                return idx
+    raise AssertionError(
+        "handler %r not found in %r"
+        % (name, [p.get("name") for p in plays])
+    )
@@
         assert (
             refresh.get("listen") == "Restart k3s to apply containerd config"
         ), (
             "%s: the service_facts handler must listen on the restart "
             "topic so the same notify triggers both; got listen=%r"
             % (relpath, refresh.get("listen"))
         )
+        assert _handler_index(
+            plays, "Refresh service facts before k3s restart"
+        ) < _handler_index(
+            plays, "Restart k3s to apply containerd config"
+        ), (
+            "%s: service_facts must be defined before the restart handler "
+            "so ansible_facts.services is populated before the `when:` "
+            "gate runs" % relpath
+        )
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/unit/playbooks/test_ubuntu_examples.py` around lines 934 - 947, The
test currently only asserts the presence and listener of the "Refresh service
facts before k3s restart" handler (found via _find_handler and stored in
refresh) but does not assert ordering; update the test to also locate the
restart handler (the handler whose "listen" is "Restart k3s to apply containerd
config" or by name) and assert that the index/position of the service_facts
handler in plays is less than the index of the restart handler so the
service_facts handler runs before the restart gate; use the existing
_find_handler/plays structures and relpath in the assertion message to report
failure.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@examples/rhel/prepare-rhel.yml`:
- Around line 223-245: The drop-in content in the "Enable
device_ownership_from_security_context for CDI block imports" task is hardcoded
for containerd v2/schema version 3 and will not apply to containerd 1.x; update
the task to choose content based on the containerd major version (or fail fast)
— detect containerd major (e.g. via a new variable like
cozystack_containerd_major or an existing fact), and when major == 2 keep
version = 3 and plugins.'io.containerd.cri.v1.runtime', but when major == 1 emit
version = 2 and plugins.'io.containerd.grpc.v1.cri' (or explicitly error out),
and ensure the directory task ("Ensure k3s containerd config drop-in directory
exists") and cozystack_k3s_containerd_dropin_dir usage remain consistent with
the chosen schema.

In `@examples/suse/prepare-suse.yml`:
- Around line 221-240: The CDI drop-in currently always writes containerd v2/3
content (version = 3 and [plugins.'io.containerd.cri.v1.runtime']) which is
incompatible with containerd 1.x; update the tasks that reference
cozystack_k3s_containerd_dropin_dir and the "Enable
device_ownership_from_security_context for CDI block imports" copy to be
version-aware: add a condition (e.g., a variable like
cozystack_k3s_containerd_major_version or detect via existing override) and
create two alternative copy tasks or a single templated content block—for
containerd 1.x write version = 2 (or omit) and use
[plugins.'io.containerd.grpc.v1.cri'] with
device_ownership_from_security_context = true, and for containerd 2.x/3 keep
version = 3 and [plugins.'io.containerd.cri.v1.runtime']; ensure the when
clauses reference cozystack_enable_kubevirt and the chosen version variable so
the correct drop-in content is written.

---

Nitpick comments:
In `@tests/unit/playbooks/test_ubuntu_examples.py`:
- Around line 934-947: The test currently only asserts the presence and listener
of the "Refresh service facts before k3s restart" handler (found via
_find_handler and stored in refresh) but does not assert ordering; update the
test to also locate the restart handler (the handler whose "listen" is "Restart
k3s to apply containerd config" or by name) and assert that the index/position
of the service_facts handler in plays is less than the index of the restart
handler so the service_facts handler runs before the restart gate; use the
existing _find_handler/plays structures and relpath in the assertion message to
report failure.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 26a568bf-6a18-46ea-a999-cb3eb02ebd38

📥 Commits

Reviewing files that changed from the base of the PR and between 9efc32e and 5f8195d.

📒 Files selected for processing (7)
  • CHANGELOG.rst
  • CLAUDE.md
  • README.md
  • examples/rhel/prepare-rhel.yml
  • examples/suse/prepare-suse.yml
  • examples/ubuntu/prepare-ubuntu.yml
  • tests/unit/playbooks/test_ubuntu_examples.py
✅ Files skipped from review due to trivial changes (2)
  • CLAUDE.md
  • CHANGELOG.rst
🚧 Files skipped from review as they are similar to previous changes (2)
  • examples/ubuntu/prepare-ubuntu.yml
  • README.md

Comment thread examples/rhel/prepare-rhel.yml
Comment thread examples/suse/prepare-suse.yml
…changelog

The playbook comments and the changelog entry still claimed
cozystack_k3s_containerd_dropin_dir could target a containerd 1.x
cluster, but the drop-in content is hardcoded for containerd 2.x
(config version 3, [plugins.'io.containerd.cri.v1.runtime']). Pointing
the variable at a 1.x config-dir does not produce a working config:
containerd 1.x needs config version 2 and
[plugins.'io.containerd.grpc.v1.cri']. The override only relocates the
file (e.g. a non-default k3s data-dir); it does not rewrite the
content. The README already states this — align the playbook comments,
the changelog, and a stale test message with it.

Assisted-By: Claude <noreply@anthropic.com>
Signed-off-by: Aleksei Sviridkin <f@lex.la>
@lexfrei Aleksei Sviridkin (lexfrei) merged commit b32c277 into main Jun 3, 2026
7 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants