Fix kptr dtor deadlock by kernel-patches-daemon-bpf[bot] · Pull Request #12417 · kernel-patches/bpf

kernel-patches-daemon-bpf · 2026-06-09T20:39:56Z

Pull request for series with
subject: Fix kptr dtor deadlock
version: 3
url: https://patchwork.kernel.org/project/netdevbpf/list/?series=1108815

kernel-patches-daemon-bpf · 2026-06-09T20:39:58Z

Upstream branch: c15261b
series: https://patchwork.kernel.org/project/netdevbpf/list/?series=1108815
version: 3

kernel-patches-review-bot · 2026-06-09T20:58:13Z

commit 99dd649358244d70fab2b545944a19a59e9a169a
Author: Justin Suess <utilityemal77@gmail.com>

bpf: Reject bpf_obj_drop() from tracing progs

This patch adds NMI safety checks for bpf_obj_drop() and
bpf_percpu_obj_drop() when called from tracing programs. It introduces a
whitelist approach for NMI-safe BTF field types and rejects drops of
objects with NMI-unsafe fields from tracing contexts. The patch also
converts a test case from tracing to syscall to comply with the new
restriction.

> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> index 62bba7a4876f..0654d2ffadc1 100644
> --- a/include/linux/bpf.h
> +++ b/include/linux/bpf.h
> @@ -492,6 +492,35 @@ static inline bool btf_record_has_field(const struct btf_record *rec, enum btf_f
>  	return rec->field_mask & type;
>  }
>
> +static inline bool btf_field_is_nmi_safe(enum btf_field_type type)
> +{
> +	switch (type) {
> +	case BPF_SPIN_LOCK:
> +	case BPF_RES_SPIN_LOCK:
> +	case BPF_TIMER:
> +	case BPF_WORKQUEUE:
> +	case BPF_TASK_WORK:
> +	case BPF_KPTR_UNREF:
> +	case BPF_REFCOUNT:
> +		return true;
> +	default:
> +		return false;
> +	}
> +}

Is it safe to include BPF_TIMER and BPF_WORKQUEUE in this NMI-safe
whitelist? Looking at the cleanup paths, dropping an object with these
fields calls bpf_timer_cancel_and_free() or bpf_wq_cancel_and_free(),
which synchronously invoke bpf_prog_put() via
bpf_async_update_prog_callback():

kernel/bpf/helpers.c:bpf_async_update_prog_callback()
    bpf_prog_put()
        __bpf_prog_put()
            schedule_work()

If this drops the last reference to the BPF program while in NMI context,
__bpf_prog_put() falls back to schedule_work() since irqs are disabled.
Since schedule_work() takes a raw spin lock on the workqueue pool, couldn't
this hard deadlock if the NMI interrupted another thread that was already
holding the workqueue pool lock?

BPF_TASK_WORK safely defers via irq_work_queue(), but these fields do not
appear to have the same protection.

This concern was raised by sashiko-bot in v1:
https://lore.kernel.org/bpf/20260608154008.27E011F00893@smtp.kernel.org/

Would it be safer to exclude BPF_TIMER and BPF_WORKQUEUE from the
NMI-safe whitelist?
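
To make the concern concrete, here is a minimal userspace sketch of the release-path decision described above. It is a toy model, not kernel code: `struct ctx_model`, `enum put_path`, and all three functions are hypothetical names introduced for illustration, and the "NMI-aware" variant only mirrors the remark that BPF_TASK_WORK defers via irq_work_queue().

```c
#include <stdbool.h>

/* Hypothetical execution-context flags for this toy model (not kernel API). */
struct ctx_model {
	bool in_nmi;        /* stands in for in_nmi() */
	bool irqs_disabled; /* stands in for irqs_disabled() */
};

enum put_path {
	PUT_DIRECT,        /* release inline */
	PUT_SCHEDULE_WORK, /* defer via the workqueue; needs the pool lock */
	PUT_IRQ_WORK,      /* defer via irq_work; no pool lock involved */
};

/* Models the fallback described in the review: irqs off means schedule_work(). */
static enum put_path put_path_as_described(struct ctx_model c)
{
	/* Assumption: an NMI handler is treated like an irqs-disabled context. */
	if (c.in_nmi || c.irqs_disabled)
		return PUT_SCHEDULE_WORK;
	return PUT_DIRECT;
}

/* Hypothetical variant that routes NMI callers to an irq_work-style deferral. */
static enum put_path put_path_nmi_aware(struct ctx_model c)
{
	if (c.in_nmi)
		return PUT_IRQ_WORK;
	if (c.irqs_disabled)
		return PUT_SCHEDULE_WORK;
	return PUT_DIRECT;
}

/* The hazard in question: an NMI caller reaching a path that takes the pool lock. */
static bool may_deadlock(struct ctx_model c, enum put_path p)
{
	return c.in_nmi && p == PUT_SCHEDULE_WORK;
}
```

In this model, an NMI caller hits the hazard under `put_path_as_described()` but not under `put_path_nmi_aware()`. Whether the real code paths behave this way is exactly what the question above asks the author to confirm.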

> +
> +static inline bool btf_record_has_nmi_unsafe_fields(const struct btf_record *rec)
> +{
> +	int i;
> +
> +	if (IS_ERR_OR_NULL(rec))
> +		return false;
> +	for (i = 0; i < rec->cnt; i++) {
> +		if (!btf_field_is_nmi_safe(rec->fields[i].type))
> +			return true;
> +	}
> +	return false;
> +}

This isn't a bug, but an earlier review suggested moving these helper
functions from the header file into verifier.c, since they are only used there:
https://lore.kernel.org/bpf/6461e40c-d5a9-41a5-aaba-bf063de67d3f@gmail.com/

Is there a reason to keep them in the header?

[ ... ]

AI reviewed your patch. Please fix the bug or email reply why it's not a bug.
See: https://github.com/kernel-patches/vmtest/blob/master/ci/claude/README.md

In-Reply-To-Subject: bpf: Reject bpf_obj_drop() from tracing progs
CI run summary: https://github.com/kernel-patches/bpf/actions/runs/27234422852

kernel-patches-daemon-bpf · 2026-06-09T20:58:21Z

Forwarding comment 4663960142 via email
In-Reply-To: 20260609202548.3571690-2-memxor@gmail.com
Patch: https://patchwork.kernel.org/project/netdevbpf/patch/20260609202548.3571690-2-memxor@gmail.com/

kernel-patches-review-bot · 2026-06-09T21:01:46Z

commit 45e7d1059deddcbecdad88fa065d982fd50210a0
Author: Kumar Kartikeya Dwivedi <memxor@gmail.com>

selftests/bpf: Exercise unsafe obj drops from tracing progs

This adds task_kfunc failure cases for bpf_obj_drop() on local objects
with referenced kptr fields from tracing and NMI tracing programs, which
must be rejected because dropping the object would run full special-field
destruction synchronously in an unsafe context.

> diff --git a/tools/testing/selftests/bpf/progs/task_kfunc_failure.c b/tools/testing/selftests/bpf/progs/task_kfunc_failure.c
> index 8e947d445f8e..8942b5478129 100644
> --- a/tools/testing/selftests/bpf/progs/task_kfunc_failure.c
> +++ b/tools/testing/selftests/bpf/progs/task_kfunc_failure.c

[ ... ]

> @@ -233,6 +234,45 @@ int BPF_PROG(task_kfunc_release_unacquired, struct task_struct *task, u64 clone_
>  	return 0;
>  }
>
> +SEC("tp_btf/task_newtask")
> +__failure __msg("bpf_obj_drop cannot be used in tracing programs on types with NMI unsafe fields")
> +int BPF_PROG(task_kfunc_obj_drop_with_kptr, struct task_struct *task, u64 clone_flags)
> +{

[ ... ]

> +SEC("tp_btf/task_newtask")
> +__failure __msg("bpf_obj_drop cannot be used in tracing programs on types with NMI unsafe fields")
> +int BPF_PROG(task_kfunc_obj_drop_nmi_with_kptr, struct task_struct *task,
> +	     u64 clone_flags)
> +{

Does task_kfunc_obj_drop_nmi_with_kptr actually exercise the NMI tracing
context its name and the commit message refer to?

It uses SEC("tp_btf/task_newtask"), the same standard tracing section as
task_kfunc_obj_drop_with_kptr just above it, so it would fail at load time
for the same verifier reason rather than from an NMI-specific path.

Would an NMI context such as BPF_PROG_TYPE_PERF_EVENT / SEC("perf_event")
be needed here to actually cover the NMI tracing case?

As written the two tests appear to provide redundant coverage and leave the
intended NMI context untested.

For context, an earlier version used SEC("tp_btf/nmi_handler"), which was
x86-only; switching to SEC("tp_btf/task_newtask") fixed that portability
issue but no longer exercises any NMI context.  This was raised in review
on the list and appears unaddressed in the current version:

  https://lore.kernel.org/bpf/20260609202548.3571690-4-memxor@gmail.com

> +	struct __tasks_kfunc_map_value *local;
> +	struct task_struct *acquired, *old;
> +
> +	(void)clone_flags;
> +
> +	local = bpf_obj_new(typeof(*local));
> +	if (!local)
> +		return 0;
> +
> +	acquired = bpf_task_acquire(task);
> +	if (acquired) {
> +		old = bpf_kptr_xchg(&local->task, acquired);
> +		if (old)
> +			bpf_task_release(old);
> +	}
> +
> +	bpf_obj_drop(local);
> +	return 0;
> +}

In-Reply-To-Subject: selftests/bpf: Exercise unsafe obj drops from tracing progs
CI run summary: https://github.com/kernel-patches/bpf/actions/runs/27234422852

kernel-patches-review-bot · 2026-06-09T21:03:22Z

commit 863c40dec4ed733033e59d026643bab30ee49170
Author: Justin Suess <utilityemal77@gmail.com>

    bpf: Cancel special fields on map value recycle

    Map update and delete paths currently call bpf_obj_free_fields() when a
    value is being replaced or recycled. That makes field destruction depend
    on the context of the update/delete operation. For tracing programs this
    can include NMI context, where referenced kptr destructors, uptr
    unpinning, and graph root destruction are not generally safe.

    Introduce bpf_obj_cancel_fields() for the reusable-value path. It only
    performs NMI-safe cleanup for timer, workqueue, and task_work fields.
    Fields that need full destruction are left attached to the recycled value
    and are destroyed by the final cleanup path instead.

    Switch array and hashtab update/delete/recycle paths to this cancel
    helper. Keep bpf_obj_free_fields() for final map destruction and for
    bpf_mem_alloc destructors. Preallocated hashtabs do not have allocator
    destructors, so teardown continues to walk the normal and extra elements
    and fully destroy their fields.

    This deliberately relaxes the eager-free semantics of map update/delete
    for special fields. Programs that relied on a recycled map slot becoming
    empty immediately after update/delete were relying on behavior that
    cannot be implemented safely from every BPF execution context without
    offloading arbitrary destructors.

    There is a chance this change breaks programs making assumptions
    regarding the eager freeing of fields. If so, we can relax semantics to
    cancellation only when irqs_disabled() is true in the future. However,
    theoretically, map values that get reused eagerly already have weaker
    guarantees as parallel users can recreate freed fields before the new
    element becomes visible again.

    Fixes: 14a324f6a67e ("bpf: Wire up freeing of referenced kptr")
    Signed-off-by: Justin Suess <utilityemal77@gmail.com>
    Co-developed-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
    Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>

In the v1 discussion on lore, there was a suggestion to add a
WARN_ON_ONCE() on in_nmi() in bpf_obj_free_fields() to help detect
NMI-related bugs early:

> Would a WARN_ON_ONCE for in_nmi in bpf_obj_free_fields help spot these kind
> of bugs in the future?
>
> Usually calling bpf_obj_free_fields in_nmi won't result in a deadlock unless
> specific conditions are met (ie last reference to a refcounted object, using
> a workqueue, etc).
>
> Adding a check here would quickly surface bugs of this class.

Reference: https://lore.kernel.org/bpf/aicB9SZfVnUV1rz-@zenbox

The follow-up discussion explained the rationale against this approach
(panic-on-warn concerns and existing guardrails), and the suggestion was not
implemented. Would it be worth revisiting this as a future enhancement, perhaps
with a rate-limited warning or a config-gated check, to help catch this class of
bugs during development without impacting production systems?
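
As a rough illustration of the config-gated idea, the following userspace sketch reports the condition once and only when a build-time switch is on. Everything here is an assumption made for the example: `DEBUG_NMI_FREE_CHECK` is a hypothetical switch, `check_free_in_nmi()` is a hypothetical helper, and the `in_nmi` flag is passed in by the caller instead of being read from the real in_nmi().

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical build-time switch; stands in for a debug-only config option. */
#ifndef DEBUG_NMI_FREE_CHECK
#define DEBUG_NMI_FREE_CHECK 1
#endif

/* Counts emitted reports so a caller can inspect them. */
static int warnings_emitted;

/*
 * Returns true when the check is enabled and the unsafe condition was seen.
 * The report itself is printed at most once.
 */
static bool check_free_in_nmi(bool in_nmi)
{
	static bool warned; /* one-shot latch */

	if (!DEBUG_NMI_FREE_CHECK || !in_nmi)
		return false;
	if (!warned) {
		warned = true;
		warnings_emitted++;
		fprintf(stderr, "field destruction attempted in NMI context\n");
	}
	return true;
}
```

The sketch uses a plain message instead of a WARN-style macro, which is one way to sidestep the panic-on-warn concern; whether that trade-off suits the kernel is left to the discussion.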

In-Reply-To-Subject: bpf: Cancel special fields on map value recycle
CI run summary: https://github.com/kernel-patches/bpf/actions/runs/27234422852

kernel-patches-daemon-bpf · 2026-06-09T21:09:15Z

Forwarding comment 4663990194 via email
In-Reply-To: 20260609202548.3571690-4-memxor@gmail.com
Patch: https://patchwork.kernel.org/project/netdevbpf/patch/20260609202548.3571690-4-memxor@gmail.com/

kernel-patches-daemon-bpf · 2026-06-09T21:09:17Z

Forwarding comment 4664002608 via email
In-Reply-To: 20260609202548.3571690-3-memxor@gmail.com
Patch: https://patchwork.kernel.org/project/netdevbpf/patch/20260609202548.3571690-3-memxor@gmail.com/

bpf_obj_drop() runs bpf_obj_free_fields() synchronously for program-allocated objects. When such an object contains NMI unsafe fields, tracing programs that can run from arbitrary instrumented context can reach that destruction from unsafe contexts, including NMI. NMI is likely one instance of this problem, and other instances would include possible unsafe reentrancy.

Deferring bpf_obj_drop() is not appealing either: it would add delayed-free machinery to a release operation that otherwise has straightforward synchronous ownership semantics.

Reject bpf_obj_drop() and bpf_percpu_obj_drop() from tracing programs that may run from unsafe contexts unless every field in the object's BTF record is explicitly NMI safe. Do not reject sleepable BPF_PROG_TYPE_TRACING programs, since they are not the arbitrary/NMI contexts that motivate the restriction.

Note that while bpf_rb_root and bpf_list_head would be NMI safe on their own to free, the objects recursively held by them may not be; be conservative and just mark them as not NMI safe for now.

Use a whitelist for the NMI-safe field set instead of listing only known NMI unsafe fields. Locks, async fields, unreferenced kptrs, and refcounts are known to be NMI safe because their destruction is either a no-op, simple state reset, or async cancellation. Referenced kptrs, percpu referenced kptrs, uptrs, graph roots, graph nodes, and any future field type are rejected until audited for arbitrary tracing and NMI contexts. This is less susceptible to future changes in fields that were previously safe by exclusion, and to new fields being added without updating this check.

Convert the existing recursive local-object drop success case to a syscall program in the same commit, since this verifier change makes the old tracing program form invalid. The test still exercises bpf_obj_drop() releasing a referenced task kptr from a safe program type.

Fixes: ac9f060 ("bpf: Introduce bpf_obj_drop")
Signed-off-by: Justin Suess <utilityemal77@gmail.com>
Co-developed-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
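
The whitelist rationale in this commit message can be illustrated with a small userspace sketch. The enum values below are hypothetical stand-ins, not the kernel's btf_field_type constants, and both predicates are simplified for the example. The point is only how each style treats a field type that did not exist when the check was written.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for field types; names and values are illustrative. */
enum field_type {
	F_SPIN_LOCK,
	F_REFCOUNT,
	F_KPTR_REF,
	F_LIST_HEAD,
	F_FUTURE_TYPE, /* a type added after the check was written */
};

/* Whitelist style: only audited types pass; anything else is rejected. */
static bool safe_by_whitelist(enum field_type t)
{
	switch (t) {
	case F_SPIN_LOCK:
	case F_REFCOUNT:
		return true;
	default:
		return false;
	}
}

/* Exclusion style: only known-bad types are rejected; unknown ones pass. */
static bool safe_by_exclusion(enum field_type t)
{
	switch (t) {
	case F_KPTR_REF:
	case F_LIST_HEAD:
		return false;
	default:
		return true;
	}
}
```

Both predicates agree on the four known types, but only the exclusion style accepts `F_FUTURE_TYPE` without anyone having audited it.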

kernel-patches-daemon-bpf · 2026-06-10T04:31:30Z

Upstream branch: 140fa23
series: https://patchwork.kernel.org/project/netdevbpf/list/?series=1108815
version: 3

Map update and delete paths currently call bpf_obj_free_fields() when a value is being replaced or recycled. That makes field destruction depend on the context of the update/delete operation. For tracing programs this can include NMI context, where referenced kptr destructors, uptr unpinning, and graph root destruction are not generally safe.

Introduce bpf_obj_cancel_fields() for the reusable-value path. It only performs NMI-safe cleanup for timer, workqueue, and task_work fields. Fields that need full destruction are left attached to the recycled value and are destroyed by the final cleanup path instead.

Switch array and hashtab update/delete/recycle paths to this cancel helper. Keep bpf_obj_free_fields() for final map destruction and for bpf_mem_alloc destructors. Preallocated hashtabs do not have allocator destructors, so teardown continues to walk the normal and extra elements and fully destroy their fields.

This deliberately relaxes the eager-free semantics of map update/delete for special fields. Programs that relied on a recycled map slot becoming empty immediately after update/delete were relying on behavior that cannot be implemented safely from every BPF execution context without offloading arbitrary destructors.

There is a chance this change breaks programs making assumptions regarding the eager freeing of fields. If so, we can relax semantics to cancellation only when irqs_disabled() is true in the future. However, theoretically, map values that get reused eagerly already have weaker guarantees as parallel users can recreate freed fields before the new element becomes visible again.

Fixes: 14a324f ("bpf: Wire up freeing of referenced kptr")
Signed-off-by: Justin Suess <utilityemal77@gmail.com>
Co-developed-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
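
The split between cancelling and fully freeing fields can be sketched in userspace as follows. This is a simplified model under stated assumptions: `struct obj`, `struct map_value`, `value_cancel_fields()`, and `value_free_fields()` are hypothetical names, a boolean stands in for any timer, workqueue, or task_work field, and a decrement stands in for a kptr destructor.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical refcounted object standing in for a referenced kptr target. */
struct obj {
	int refcnt;
};

/* Hypothetical map value with one async field and one referenced kptr. */
struct map_value {
	bool timer_armed; /* models a timer/workqueue/task_work field */
	struct obj *kptr; /* models a referenced kptr field */
};

/* Recycle path: cancel async work only; the kptr stays attached. */
static void value_cancel_fields(struct map_value *v)
{
	v->timer_armed = false;
}

/* Final teardown: cancel async work and drop the kptr reference. */
static void value_free_fields(struct map_value *v)
{
	v->timer_armed = false;
	if (v->kptr) {
		v->kptr->refcnt--; /* stands in for the kptr destructor */
		v->kptr = NULL;
	}
}
```

In this model, a value that goes through `value_cancel_fields()` keeps its kptr and the target's reference count unchanged. Only `value_free_fields()` releases the reference, which corresponds to the relaxed eager-free semantics the message describes.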

Add task_kfunc failure cases for bpf_obj_drop() on local objects with referenced kptr fields from tracing and NMI tracing programs. These programs must be rejected because dropping the object would run full special-field destruction synchronously in an unsafe context.

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>

Add focused map_kptr coverage for BPF-side map updates that touch values containing referenced kptrs. The new syscall programs stash the testmod refcounted object in an array map, a preallocated hash map, and a no-prealloc hash map, then update the same map from BPF. The refcount must remain elevated after the update, while the userspace runner destroys the skeleton and reuses the existing refcount wait to confirm map teardown releases the kptr.

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>

kernel-patches-daemon-bpf Bot added new bpf-next V3 labels Jun 9, 2026

kernel-patches-review-bot Bot added the ai-review label Jun 9, 2026

kernel-patches-daemon-bpf Bot removed the ai-review label Jun 9, 2026

kernel-patches-daemon-bpf Bot added the V3-ci-fail label Jun 9, 2026

kernel-patches-daemon-bpf Bot force-pushed the bpf-next_base branch from 3a26044 to 818f7b1 on June 10, 2026 04:29

RazeLighter777 and others added 3 commits June 9, 2026 21:31

kernel-patches-daemon-bpf Bot force-pushed the series/1108815=>bpf-next branch from b73f487 to 0856da5 on June 10, 2026 04:31

kernel-patches-daemon-bpf Bot added V3-ci-pass and removed V3-ci-fail labels Jun 10, 2026

Fix kptr dtor deadlock #12417: kernel-patches-daemon-bpf[bot] wants to merge 4 commits into bpf-next_base from series/1108815=>bpf-next
