Skip to content

uprobes/x86: Fix red zone issue for optimized uprobes#12199

Open
kernel-patches-daemon-bpf[bot] wants to merge 14 commits into
bpf-next_basefrom
series/1101263=>bpf-next
Open

uprobes/x86: Fix red zone issue for optimized uprobes#12199
kernel-patches-daemon-bpf[bot] wants to merge 14 commits into
bpf-next_basefrom
series/1101263=>bpf-next

Conversation

@kernel-patches-daemon-bpf

Copy link
Copy Markdown

Pull request for series with
subject: uprobes/x86: Fix red zone issue for optimized uprobes
version: 1
url: https://patchwork.kernel.org/project/netdevbpf/list/?series=1101263

@kernel-patches-daemon-bpf

Copy link
Copy Markdown
Author

Upstream branch: 8496d90
series: https://patchwork.kernel.org/project/netdevbpf/list/?series=1101263
version: 1

@kernel-patches-review-bot

Copy link
Copy Markdown
commit 5ad96df841a0877eeca7d2cbe4b3f726b7fd0b72
Author: Jiri Olsa <jolsa@kernel.org>

libbpf: Change has_nop_combo to work on top of nop10

This commit updates has_nop_combo() to check for the new nop10 instruction
sequence instead of the previous nop5 sequence, aligning with recent kernel
changes to uprobe optimization.

> commit 5ad96df841a0877eeca7d2cbe4b3f726b7fd0b72
> Author: Jiri Olsa <jolsa@kernel.org>
> Date:   Tue May 26 22:58:33 2026 +0200
>
>     libbpf: Change has_nop_combo to work on top of nop10
>
>     We now expect nop combo with 10 bytes nop instead of 5 bytes nop,
>     fixing has_nop_combo to reflect that.
>
>     Fixes: 41a5c7df4466 ("libbpf: Add support to detect nop,nop5 instructions combo for usdt probe")
>     Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com>
>     Acked-by: Andrii Nakryiko <andrii@kernel.org>
>     Signed-off-by: Jiri Olsa <jolsa@kernel.org>

The Fixes: tag references commit 41a5c7df4466 which originally added
has_nop_combo() with correct nop5 support. However, the actual bug was
introduced by commit af734a7ddb4e which changed the kernel's uprobe
implementation from nop5 to nop10. Should the tag be:

  Fixes: af734a7ddb4e ("uprobes/x86: Move optimized uprobe from nop5 to nop10")

> diff --git a/tools/lib/bpf/usdt.c b/tools/lib/bpf/usdt.c
> index e3710933fd52..484a4354e82b 100644
> --- a/tools/lib/bpf/usdt.c
> +++ b/tools/lib/bpf/usdt.c
> @@ -305,7 +305,7 @@ struct usdt_manager *usdt_manager_new(struct bpf_object *obj)
>
>  	/*
>  	 * Detect kernel support for uprobe() syscall, it's presence means we can
> -	 * take advantage of faster nop5 uprobe handling.
> +	 * take advantage of faster nop10 uprobe handling.
>  	 * Added in: 56101b69c919 ("uprobes/x86: Add uprobe syscall to speed up uprobe")
>  	 */

The comment at the end of this block references commit 56101b69c919 which
added the uprobe syscall. Andrii Nakryiko noted in v3 review that it would
be good to also reference the commit that switches nop5 to nop10, though he
acknowledged the hash isn't stable until it lands. Now that af734a7ddb4e is
available, could this be updated to reference both commits?

> @@ -596,14 +596,14 @@ static int parse_usdt_spec(struct usdt_spec *spec, const struct usdt_note *note,
>  #if defined(__x86_64__)
>  static bool has_nop_combo(int fd, long off)
>  {
> -	unsigned char nop_combo[6] = {
> -		0x90, 0x0f, 0x1f, 0x44, 0x00, 0x00 /* nop,nop5 */
> +	unsigned char nop_combo[11] = {
> +		0x90, 0x66, 0x2e, 0x0f, 0x1f, 0x84, 0x00, 0x00, 0x00, 0x00, 0x00,
>  	};
> -	unsigned char buf[6];
> +	unsigned char buf[11];
>
> -	if (pread(fd, buf, 6, off) != 6)
> +	if (pread(fd, buf, 11, off) != 11)
>  		return false;
> -	return memcmp(buf, nop_combo, 6) == 0;
> +	return memcmp(buf, nop_combo, 11) == 0;
>  }

This isn't a bug, but Jakub Sitnicki suggested in v1 review using
ARRAY_SIZE(buf) instead of repeating the scalar value 11. The author
preferred keeping the literal number for simplicity, though the value
now appears in 5 locations (array declarations, pread size, pread check,
memcmp size).

[ ... ]



AI reviewed your patch. Please fix the bug or email reply why it's not a bug.
See: https://github.com/kernel-patches/vmtest/blob/master/ci/claude/README.md

In-Reply-To-Subject: libbpf: Change has_nop_combo to work on top of nop10
CI run summary: https://github.com/kernel-patches/bpf/actions/runs/26475828601

@kernel-patches-review-bot

Copy link
Copy Markdown
commit 31ca9a4034db01fad284fe0d3db79d8cc47e2ff8
Author: Jiri Olsa <jolsa@kernel.org>

selftests/bpf: Change uprobe syscall tests to use nop10

Optimized uprobes are now on top of 10-bytes nop instructions,
reflect that in existing tests.

> diff --git a/tools/testing/selftests/bpf/prog_tests/uprobe_syscall.c b/tools/testing/selftests/bpf/prog_tests/uprobe_syscall.c
> index c944136252c6..0868fb9793e0 100644
> --- a/tools/testing/selftests/bpf/prog_tests/uprobe_syscall.c
> +++ b/tools/testing/selftests/bpf/prog_tests/uprobe_syscall.c

[ ... ]

> @@ -388,14 +388,15 @@ static int find_uprobes_trampoline(void *tramp_addr)
>  	return ret;
>  }
>
> -static unsigned char nop5[5] = { 0x0f, 0x1f, 0x44, 0x00, 0x00 };
> +static unsigned char nop10[10]  = { 0x66, 0x2e, 0x0f, 0x1f, 0x84, 0x00, 0x00, 0x00, 0x00, 0x00 };
> +static unsigned char lea_rsp[5] = { 0x48, 0x8d, 0x64, 0x24, 0x80 };
>
> -static void *find_nop5(void *fn)
> +static void *find_nop10(void *fn)
>  {
>  	int i;
>
> -	for (i = 0; i < 10; i++) {
> -		if (!memcmp(nop5, fn + i, 5))
> +	for (i = 0; i < 128; i++) {
> +		if (!memcmp(nop10, fn + i, 10))
>  			return fn + i;
>  	}
>  	return NULL;

[ ... ]

> diff --git a/tools/testing/selftests/bpf/prog_tests/usdt.c b/tools/testing/selftests/bpf/prog_tests/usdt.c
> index 69759b27794d..fda3a298ccfc 100644
> --- a/tools/testing/selftests/bpf/prog_tests/usdt.c
> +++ b/tools/testing/selftests/bpf/prog_tests/usdt.c
> @@ -252,7 +252,7 @@ extern void usdt_1(void);
>  extern void usdt_2(void);
>
>  static unsigned char nop1[1] = { 0x90 };
> -static unsigned char nop1_nop5_combo[6] = { 0x90, 0x0f, 0x1f, 0x44, 0x00, 0x00 };
> +static unsigned char nop1_nop10_combo[11] = { 0x90, 0x66, 0x2e, 0x0f, 0x1f, 0x84, 0x00, 0x00, 0x00, 0x00, 0x00 };
>
>  static void *find_instr(void *fn, unsigned char *instr, size_t cnt)
>  {

Is the loop in find_instr() adequate to find the updated instruction
sequence? In uprobe_syscall.c, find_nop10() was updated to search up
to 128 bytes to account for compiler-generated prologues:

tools/testing/selftests/bpf/prog_tests/uprobe_syscall.c:find_nop10() {
	for (i = 0; i < 128; i++) {
		if (!memcmp(nop10, fn + i, 10))
			return fn + i;
	}
}

But find_instr() in usdt.c only searches the first 10 bytes:

tools/testing/selftests/bpf/prog_tests/usdt.c:find_instr() {
	for (i = 0; i < 10; i++) {
		if (!memcmp(instr, fn + i, cnt))
			return fn + i;
	}
}

If a modern compiler generates a prologue longer than 9 bytes for
usdt_2(), find_instr() will prematurely terminate and cause
subtest_optimized_attach() to fail. This concern was raised by
reviewers in v2 and v3 of the patch series:

  https://lore.kernel.org/bpf/20260518111604.CC731C2BCB7@smtp.kernel.org/

The author acknowledged this with "yea find_instr needs same update,
will fix" but the loop bound remains 10 in the current v4 patch.

> @@ -271,17 +271,17 @@ static void subtest_optimized_attach(void)
>  	__u8 *addr_1, *addr_2;
>
>  	/* usdt_1 USDT probe has single nop instruction */
> -	addr_1 = find_instr(usdt_1, nop1_nop5_combo, 6);
> -	if (!ASSERT_NULL(addr_1, "usdt_1_find_nop1_nop5_combo"))
> +	addr_1 = find_instr(usdt_1, nop1_nop10_combo, 11);
> +	if (!ASSERT_NULL(addr_1, "usdt_1_find_nop1_nop10_combo"))
>  		return;
>
>  	addr_1 = find_instr(usdt_1, nop1, 1);
>  	if (!ASSERT_OK_PTR(addr_1, "usdt_1_find_nop1"))
>  		return;
>
> -	/* usdt_2 USDT probe has nop,nop5 instructions combo */
> -	addr_2 = find_instr(usdt_2, nop1_nop5_combo, 6);
> -	if (!ASSERT_OK_PTR(addr_2, "usdt_2_find_nop1_nop5_combo"))
> +	/* usdt_2 USDT probe has nop,nop10 instructions combo */
> +	addr_2 = find_instr(usdt_2, nop1_nop10_combo, 11);
> +	if (!ASSERT_OK_PTR(addr_2, "usdt_2_find_nop1_nop10_combo"))
>  		return;

[ ... ]



AI reviewed your patch. Please fix the bug or email reply why it's not a bug.
See: https://github.com/kernel-patches/vmtest/blob/master/ci/claude/README.md

In-Reply-To-Subject: selftests/bpf: Change uprobe syscall tests to use nop10
CI run summary: https://github.com/kernel-patches/bpf/actions/runs/26475828601

@kernel-patches-review-bot

Copy link
Copy Markdown
commit 9f26ca33d7f4cf484820c4fc422b2b9de66018b4
Author: Jiri Olsa <jolsa@kernel.org>

uprobes/x86: Allow to copy uprobe trampolines on fork

This commit allows uprobe trampoline VMA objects to be copied during fork
or clone operations without CLONE_VM, preventing crashes when the child
process inherits optimized code that calls the trampoline.

> diff --git a/arch/x86/kernel/uprobes.c b/arch/x86/kernel/uprobes.c
> index 733655bc610e..c8af41ed681a 100644
> --- a/arch/x86/kernel/uprobes.c
> +++ b/arch/x86/kernel/uprobes.c
> @@ -697,7 +697,7 @@ static struct vm_area_struct *get_uprobe_trampoline(struct mm_struct *mm, unsign
>  		return ERR_PTR(vaddr);
>
>  	return _install_special_mapping(mm, vaddr, PAGE_SIZE,
> -				VM_READ|VM_EXEC|VM_MAYEXEC|VM_MAYREAD|VM_DONTCOPY|VM_IO,
> +				VM_READ|VM_EXEC|VM_MAYEXEC|VM_MAYREAD|VM_IO,
>  				&tramp_mapping);
>  }

When a process with optimized uprobes is forked, the child inherits the
optimized code and trampoline. This means sys_uprobe invocations in the
child will go into the uprobe trampoline, and the syscall will keep
returning -EPROTO since no handler is registered.

Is there a way to restore the original memory pages for clones?

This behavior seems suboptimal from a performance perspective. For
standard int3-based uprobes, breakpoints are removed via
UPROBE_HANDLER_REMOVE. Could something similar be implemented for
optimized uprobes, perhaps with proper filtering in uprobe_multi?

Reference: https://lore.kernel.org/bpf/CAEf4BzYo-8PAXFJt9MHoUn9ux1O2YVxJADC0tGSsacVu_R8Stw@mail.gmail.com/



AI reviewed your patch. Please fix the bug or email reply why it's not a bug.
See: https://github.com/kernel-patches/vmtest/blob/master/ci/claude/README.md

In-Reply-To-Subject: uprobes/x86: Allow to copy uprobe trampolines on fork
CI run summary: https://github.com/kernel-patches/bpf/actions/runs/26475828601

@kernel-patches-review-bot

Copy link
Copy Markdown
commit b133a280c60a5a731aa19e1b9f99f7f9e9d95afe
Author: Andrii Nakryiko <andrii@kernel.org>

selftests/bpf: Add tests for uprobe nop10 red zone clobbering

This commit adds tests to verify that the nop10 uprobe optimization does
not clobber the red zone. The tests include an assembly-level test that
stores values in the red zone and verifies they survive, and a USDT test
that validates arguments passed through the red zone.

> diff --git a/tools/testing/selftests/bpf/prog_tests/uprobe_syscall.c b/tools/testing/selftests/bpf/prog_tests/uprobe_syscall.c
> index 969f4deba9fd..efff0c515184 100644
> --- a/tools/testing/selftests/bpf/prog_tests/uprobe_syscall.c
> +++ b/tools/testing/selftests/bpf/prog_tests/uprobe_syscall.c

[ ... ]

> @@ -855,6 +897,37 @@ static void test_uprobe_race(void)
>  #define __NR_uprobe 336
>  #endif
>
> +static void test_uprobe_red_zone(void)
> +{
> +	struct uprobe_syscall_executed *skel;
> +	struct bpf_link *link;
> +	void *nop10_addr;
> +	size_t offset;
> +	int i;
> +
> +	nop10_addr = find_nop10(uprobe_red_zone_test);
> +	if (!ASSERT_NEQ(nop10_addr, NULL, "find_nop10"))
> +		return;
> +
> +	skel = uprobe_syscall_executed__open_and_load();
> +	if (!ASSERT_OK_PTR(skel, "open_and_load"))
> +		return;
> +
> +	offset = get_uprobe_offset(nop10_addr);
> +	link = bpf_program__attach_uprobe_opts(skel->progs.test_uprobe,
> +			0, "/proc/self/exe", offset, NULL);
> +	if (!ASSERT_OK_PTR(link, "attach_uprobe"))
> +		goto cleanup;
> +
> +	for (i = 0; i < 10; i++)
> +		ASSERT_EQ(uprobe_red_zone_test(), 0, "red_zone_intact");

Does this loop execute too quickly to guarantee the uprobe optimization has
completed?

Uprobe optimization is performed asynchronously by a worker thread. Since this
is a tight loop with no delays or blocking, the 10 iterations might finish
before the background worker patches the code.

If all hits execute on the unoptimized INT3 path, the test could falsely pass
on kernels where the CALL-based optimization bug is present.

(This concern was raised by sashiko-bot@kernel.org in v1 and remains
unaddressed:
https://lore.kernel.org/bpf/20260514212219.BF1F3C2BCB3@smtp.kernel.org/)

> +
> +	bpf_link__destroy(link);
> +
> +cleanup:
> +	uprobe_syscall_executed__destroy(skel);
> +}

[ ... ]

> diff --git a/tools/testing/selftests/bpf/usdt_2.c b/tools/testing/selftests/bpf/usdt_2.c
> index b359b389f6c0..5e38f8605b02 100644
> --- a/tools/testing/selftests/bpf/usdt_2.c
> +++ b/tools/testing/selftests/bpf/usdt_2.c
> @@ -13,4 +13,17 @@ void usdt_2(void)
>  	USDT(optimized_attach, usdt_2);
>  }
>
> +static volatile unsigned long usdt_red_zone_arg1 = 0xDEADBEEF;
> +static volatile unsigned long usdt_red_zone_arg2 = 0xCAFEBABE;
> +static volatile unsigned long usdt_red_zone_arg3 = 0xFEEDFACE;
> +
> +void __attribute__((noinline)) usdt_red_zone_trigger(void)
> +{
> +	unsigned long a1 = usdt_red_zone_arg1;
> +	unsigned long a2 = usdt_red_zone_arg2;
> +	unsigned long a3 = usdt_red_zone_arg3;
> +
> +	USDT(optimized_attach, usdt_red_zone, a1, a2, a3);
> +}

Is this test reliable across different compiler versions and optimizations?

The USDT macro uses the "nor" inline assembly constraint, which allows the
compiler to allocate these operands to CPU registers.

Because there is no register pressure in usdt_red_zone_trigger(), modern
compilers might allocate a1, a2, and a3 directly to registers rather than the
stack's red zone.

If the operands are placed in registers, the uprobe CALL optimization
clobbering [rsp-8] will not corrupt them. This could cause the test to pass
even on buggy kernels, creating a false positive.

Would it be safer to explicitly force these operands into the red zone using
inline assembly constraints, rather than depending on the compiler's register
allocator?

(This concern was raised by bot+bpf-ci@kernel.org in v1 and v3 and remains
unaddressed:
https://lore.kernel.org/bpf/e91d31e0c1378fe970db6385c7af419e97af0c2acd37b3b8775a2fac02ea2361@mail.kernel.org/)



AI reviewed your patch. Please fix the bug or email reply why it's not a bug.
See: https://github.com/kernel-patches/vmtest/blob/master/ci/claude/README.md

In-Reply-To-Subject: selftests/bpf: Add tests for uprobe nop10 red zone clobbering
CI run summary: https://github.com/kernel-patches/bpf/actions/runs/26475828601

@kernel-patches-review-bot

Copy link
Copy Markdown
commit a9fe6f54968c7539b76a7761ea0bf75190e24e37
Author: Jiri Olsa <jolsa@kernel.org>

uprobes/x86: Remove struct uprobe_trampoline object

This commit removes the uprobe_trampoline object and its tracking code,
replacing it with direct management via vm_area_struct objects. This
simplifies the code and allows easier propagation of the trampoline vma
into child processes.

Acked-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>

> diff --git a/arch/x86/kernel/uprobes.c b/arch/x86/kernel/uprobes.c
> index 2be6707e3320..733655bc610e 100644
> --- a/arch/x86/kernel/uprobes.c
> +++ b/arch/x86/kernel/uprobes.c

[ ... ]

> @@ -1111,21 +1051,18 @@ int set_orig_insn(struct arch_uprobe *auprobe, struct vm_area_struct *vma,
>  static int __arch_uprobe_optimize(struct arch_uprobe *auprobe, struct mm_struct *mm,
>  				  unsigned long vaddr)
>  {
> -	struct uprobe_trampoline *tramp;
> -	struct vm_area_struct *vma;
> -	bool new = false;
> -	int err = 0;
> +	struct pt_regs *regs = task_pt_regs(current);
> +	struct vm_area_struct *vma, *tramp;
>
> +	if (!user_64bit_mode(regs))
> +		return -EINVAL;
>  	vma = find_vma(mm, vaddr);
>  	if (!vma)
>  		return -EINVAL;
> -	tramp = get_uprobe_trampoline(vaddr, &new);
> -	if (!tramp)
> -		return -EINVAL;
> -	err = swbp_optimize(auprobe, vma, vaddr, tramp->vaddr);
> -	if (WARN_ON_ONCE(err) && new)
> -		destroy_uprobe_trampoline(tramp);
> -	return err;
> +	tramp = get_uprobe_trampoline(mm, vaddr);
> +	if (IS_ERR(tramp))
> +		return PTR_ERR(tramp);
> +	return WARN_ON_ONCE(swbp_optimize(auprobe, vma, vaddr, tramp->vm_start));
>  }

When swbp_optimize() fails in __arch_uprobe_optimize() and the preceding
get_uprobe_trampoline() call created a new trampoline VMA via
_install_special_mapping(), does the new VMA leak in the process address
space?

The pre-patch code tracked whether the trampoline was newly created and
called destroy_uprobe_trampoline() on the failure path:

  err = swbp_optimize(auprobe, vma, vaddr, tramp->vaddr);
  if (WARN_ON_ONCE(err) && new)
      destroy_uprobe_trampoline(tramp);
  return err;

The new code drops that cleanup entirely:

  return WARN_ON_ONCE(swbp_optimize(auprobe, vma, vaddr, tramp->vm_start));

Although the old destroy_uprobe_trampoline only freed the struct (not the
underlying VMA), the new code appears to introduce a VMA leak: the freshly
mapped PAGE_SIZE special mapping in the user's address space stays mapped
even though optimization failed. arch_uprobe_optimize() then sets
ARCH_UPROBE_FLAG_OPTIMIZE_FAIL so subsequent calls won't retry, leaving the
orphan trampoline mapping in the address space until exit_mmap() reaps it at
process teardown.

The commit message mentions: "Note the original code called
destroy_uprobe_trampoline if the optimiation failed, but it only freed the
struct uprobe_trampoline object, not the vma. The new vma leak is fixed in
following change."

Is the VMA leak addressed in the subsequent commit in this series?

A secondary behaviour change is that 'return WARN_ON_ONCE(swbp_optimize(...))'
now returns the boolean truth value of the error (0 or 1) instead of the
original errno. While the current caller (arch_uprobe_optimize) only treats
the value as boolean, could this surprise a future caller that propagates the
return code?

Commit message:

>     uprobes/x86: Remove struct uprobe_trampoline object
>
>     Removing struct uprobe_trampoline object and it's tracking code,
>     because it's not needed. We can do same thing directly on top of
>     struct vm_area_struct objects.
>
>     This makes the code simpler and allows easy propagation of the
>     trampoline vma object into child process in following change.
>
>     Note the original code called destroy_uprobe_trampoline if the
>     optimiation failed, but it only freed the struct uprobe_trampoline
>     object, not the vma. The new vma leak is fixed in following change.
>
>     Acked-by: Andrii Nakryiko <andrii@kernel.org>
>     Signed-off-by: Jiri Olsa <jolsa@kernel.org>

CI bot suggested adding a Fixes tag in v3: 'Fixes: ba2bfc97b462
("uprobes/x86: Add support to optimize uprobes")' because the commit message
notes a VMA resource leak in the original code.

Reference: https://lore.kernel.org/bpf/8861cc7af8e0dfff2674ca4ae6381ea2d82c242dec115774e8e11048bde358c4@mail.kernel.org/

The author responded that 'the logic stays' and a separate fix is needed, but
did not explain why the Fixes tag should not be included. If this patch is
documenting a pre-existing bug (even if the full fix comes later), should it
include a Fixes tag for backporting purposes?



AI reviewed your patch. Please fix the bug or email reply why it's not a bug.
See: https://github.com/kernel-patches/vmtest/blob/master/ci/claude/README.md

In-Reply-To-Subject: uprobes/x86: Remove struct uprobe_trampoline object
CI run summary: https://github.com/kernel-patches/bpf/actions/runs/26475828601

@kernel-patches-review-bot

Copy link
Copy Markdown
commit 4d4e8c1fc6fd397bb886030b1c28cc59e8bb91bc
Author: Jiri Olsa <jolsa@kernel.org>

libbpf: Detect uprobe syscall with new error

In the previous optimized uprobe fix we changed the syscall error used
for its detection from ENXIO to EPROTO. Changing related
probe_uprobe_syscall detection check.

> libbpf: Detect uprobe syscall with new error
>
> In the previous optimized uprobe fix we changed the syscall
> error used for its detection from ENXIO to EPROTO.
>
> Changing related probe_uprobe_syscall detection check.
>
> Acked-by: Andrii Nakryiko <andrii@kernel.org>
> Fixes: 05738da0efa1 ("libbpf: Add uprobe syscall feature detection")
> Signed-off-by: Jiri Olsa <jolsa@kernel.org>

The commit message states "In the previous optimized uprobe fix we
changed the syscall error used for its detection from ENXIO to EPROTO."

The Fixes tag points to commit 05738da0efa1 ("libbpf: Add uprobe syscall
feature detection"), but that commit originally added the ENXIO check
correctly. The bug was actually introduced by commit af734a7ddb4e
("uprobes/x86: Move optimized uprobe from nop5 to nop10"), which changed
the kernel to return EPROTO instead of ENXIO but did not update the
libbpf detection code to match.

Should the Fixes tag be:

  Fixes: af734a7ddb4e ("uprobes/x86: Move optimized uprobe from nop5 to nop10")



AI reviewed your patch. Please fix the bug or email reply why it's not a bug.
See: https://github.com/kernel-patches/vmtest/blob/master/ci/claude/README.md

In-Reply-To-Subject: libbpf: Detect uprobe syscall with new error
CI run summary: https://github.com/kernel-patches/bpf/actions/runs/26475828601

@kernel-patches-review-bot

Copy link
Copy Markdown
commit 4f3aeeef678a649d7887c5e71ad5ad751ce9d264
Author: Jiri Olsa <jolsa@kernel.org>

selftests/bpf: Emit nop,nop10 instructions combo for x86_64 arch

Syncing latest usdt.h change with the new nop10 optimization support
in the kernel. The change updates the USDT_NOP macro on x86_64 to emit
nop,nop10 (11 bytes) instead of nop,nop5 (6 bytes).

> diff --git a/tools/testing/selftests/bpf/usdt.h b/tools/testing/selftests/bpf/usdt.h
> index c71e21df38b3..75687f50f4e2 100644
> --- a/tools/testing/selftests/bpf/usdt.h
> +++ b/tools/testing/selftests/bpf/usdt.h
> @@ -313,7 +313,7 @@ struct usdt_sema { volatile unsigned short active; };
>  #if defined(__ia64__) || defined(__s390__) || defined(__s390x__)
>  #define USDT_NOP			nop 0
>  #elif defined(__x86_64__)
> -#define USDT_NOP                       .byte 0x90, 0x0f, 0x1f, 0x44, 0x00, 0x0 /* nop, nop5 */
> +#define USDT_NOP                       .byte 0x90, 0x66, 0x2e, 0x0f, 0x1f, 0x84, 0x00, 0x00, 0x00, 0x00, 0x00 /* nop, nop10 */
>  #else
>  #define USDT_NOP			nop
>  #endif

Does this change break the USDT selftest?  Multiple reviewers raised
concerns across v1 through v4 that subtest_optimized_attach() in
prog_tests/usdt.c searches for the old 6-byte nop1_nop5_combo sequence.
When usdt_2.c emits the new 11-byte nop,nop10 instead, find_instr() will
return NULL and fail the ASSERT_OK_PTR check.

You mentioned in v1/v2 that the test would be updated in a following commit,
preferring to keep changes separated for easier review.  However, this creates
a git bisect regression where the test breaks at this commit. Should the test
update be squashed in to avoid breaking bisectability?

The commit message contains a placeholder reference:

>     Syncing latest usdt.h change [1].
>
>     Now that we have nop10 optimization support in kernel, let's emit
>     nop,nop10 for usdt probe. We leave it up to the library to use
>     desirable nop instruction.
>
>     [1] TBD

This isn't a bug, but the "[1] TBD" should either be filled in or removed.



AI reviewed your patch. Please fix the bug or email reply why it's not a bug.
See: https://github.com/kernel-patches/vmtest/blob/master/ci/claude/README.md

In-Reply-To-Subject: selftests/bpf: Emit nop,nop10 instructions combo for x86_64 arch
CI run summary: https://github.com/kernel-patches/bpf/actions/runs/26475828601

@kernel-patches-review-bot

Copy link
Copy Markdown
commit 68883203f0583fb1a8d79536f5c8945542974abd
Author: Jiri Olsa <jolsa@kernel.org>

uprobes/x86: Unmap trampoline vma object in case it's unused

This commit fixes a VMA leak when uprobe optimization fails by
unmapping newly created trampoline mappings.

> commit 68883203f0583fb1a8d79536f5c8945542974abd
> Author: Jiri Olsa <jolsa@kernel.org>
> Date:   Tue May 26 22:58:31 2026 +0200
>
>     uprobes/x86: Unmap trampoline vma object in case it's unused
>
>     In case the optimization fails, we leak new-ly created trampoline
>     vma mapping (in case we just created it), let's unmap it.
>
>     Fixes: ba2bfc97b462 ("uprobes/x86: Add support to optimize uprobes")
>     Signed-off-by: Jiri Olsa <jolsa@kernel.org>

Should this Fixes: tag point to a9fe6f54968c instead? The vma leak bug
was actually introduced by commit a9fe6f54968c ("uprobes/x86: Remove
struct uprobe_trampoline object"), not ba2bfc97b462. Commit a9fe6f54968c
removed the destroy_uprobe_trampoline() cleanup call, and its commit
message explicitly states: "The new vma leak is fixed in following
change."

Suggested: Fixes: a9fe6f54968c ("uprobes/x86: Remove struct
uprobe_trampoline object")

> diff --git a/arch/x86/kernel/uprobes.c b/arch/x86/kernel/uprobes.c
> index c8af41ed681a..af5af7d67999 100644
> --- a/arch/x86/kernel/uprobes.c
> +++ b/arch/x86/kernel/uprobes.c
> @@ -677,11 +677,14 @@ static unsigned long find_nearest_trampoline(unsigned long vaddr)
>  	return high_tramp;
>  }
>
> -static struct vm_area_struct *get_uprobe_trampoline(struct mm_struct *mm, unsigned long vaddr)
> +static struct vm_area_struct *get_uprobe_trampoline(struct mm_struct *mm, unsigned long vaddr,
> +						    bool *new_mapping)
>  {
>  	VMA_ITERATOR(vmi, mm, 0);
>  	struct vm_area_struct *vma;
>
> +	*new_mapping = false;
> +
>  	if (vaddr > TASK_SIZE || vaddr < PAGE_SIZE)
>  		return ERR_PTR(-EINVAL);
>

Does this fix address the VMA leak scenario across fork chains? In v2,
sashiko-bot raised a concern about inherited-but-not-tracked VMAs:

https://lore.kernel.org/bpf/20260518114208.CE128C2BCF6@smtp.kernel.org/

When VM_DONTCOPY is removed, child processes inherit trampoline VMAs
but their tracking list (mm->uprobes_state.head_tramps) is empty. When
the child executes a uprobe, get_uprobe_trampoline() fails to find the
inherited VMA and creates a duplicate trampoline.

Child fork:
  dup_mmap()
    copies trampoline VMA to child
  mm_init_uprobes_state()
    initializes head_tramps as empty (parent's state not copied)

Child executes uprobe:
  arch_uprobe_optimize()
    get_uprobe_trampoline()
      head_tramps is empty, fails to find inherited VMA
      create_uprobe_trampoline()
        find_nearest_trampoline() finds new unmapped area
        installs second duplicate trampoline VMA

In a deep fork chain, each generation inherits all previous VMAs, has an
empty head_tramps list, and creates a new VMA. Can this cause linear
accumulation of redundant VMAs, eventually exhausting vm.max_map_count?

While this patch addresses unmapping failed optimization attempts, it
does not prevent the inherited-but-not-tracked VMA leak scenario in fork
chains.

> @@ -696,6 +699,7 @@ static struct vm_area_struct *get_uprobe_trampoline(struct mm_struct *mm, unsign
>  	if (IS_ERR_VALUE(vaddr))
>  		return ERR_PTR(vaddr);
>
> +	*new_mapping = true;
>  	return _install_special_mapping(mm, vaddr, PAGE_SIZE,
>  				VM_READ|VM_EXEC|VM_MAYEXEC|VM_MAYREAD|VM_IO,
>  				&tramp_mapping);

[ ... ]



AI reviewed your patch. Please fix the bug or email reply why it's not a bug.
See: https://github.com/kernel-patches/vmtest/blob/master/ci/claude/README.md

In-Reply-To-Subject: uprobes/x86: Unmap trampoline vma object in case it's unused
CI run summary: https://github.com/kernel-patches/bpf/actions/runs/26475828601

@kernel-patches-daemon-bpf

Copy link
Copy Markdown
Author

@kernel-patches-daemon-bpf

Copy link
Copy Markdown
Author

@kernel-patches-daemon-bpf

Copy link
Copy Markdown
Author

@kernel-patches-daemon-bpf

Copy link
Copy Markdown
Author

@kernel-patches-daemon-bpf

Copy link
Copy Markdown
Author

@kernel-patches-daemon-bpf

Copy link
Copy Markdown
Author

@kernel-patches-daemon-bpf

Copy link
Copy Markdown
Author

@kernel-patches-daemon-bpf

Copy link
Copy Markdown
Author

@kernel-patches-daemon-bpf

Copy link
Copy Markdown
Author

Upstream branch: 8496d90
series: https://patchwork.kernel.org/project/netdevbpf/list/?series=1101263
version: 1

@kernel-patches-daemon-bpf kernel-patches-daemon-bpf Bot force-pushed the series/1101263=>bpf-next branch from 5445400 to b14ffb3 Compare May 27, 2026 10:46
@kernel-patches-daemon-bpf

Copy link
Copy Markdown
Author

Upstream branch: 8496d90
series: https://patchwork.kernel.org/project/netdevbpf/list/?series=1101263
version: 1

@kernel-patches-daemon-bpf

Copy link
Copy Markdown
Author

Upstream branch: e42e53a
series: https://patchwork.kernel.org/project/netdevbpf/list/?series=1101263
version: 1

@kernel-patches-daemon-bpf kernel-patches-daemon-bpf Bot force-pushed the series/1101263=>bpf-next branch from cc27543 to 3404448 Compare May 27, 2026 21:10
@kernel-patches-daemon-bpf

Copy link
Copy Markdown
Author

Upstream branch: c49f336
series: https://patchwork.kernel.org/project/netdevbpf/list/?series=1101263
version: 1

@kernel-patches-daemon-bpf

Copy link
Copy Markdown
Author

Upstream branch: 1444ee8
series: https://patchwork.kernel.org/project/netdevbpf/list/?series=1101263
version: 1

@kernel-patches-daemon-bpf

Copy link
Copy Markdown
Author

Upstream branch: 63a6f3b
series: https://patchwork.kernel.org/project/netdevbpf/list/?series=1101263
version: 1

@kernel-patches-daemon-bpf

Copy link
Copy Markdown
Author

Upstream branch: 50dff00
series: https://patchwork.kernel.org/project/netdevbpf/list/?series=1101263
version: 1

@kernel-patches-daemon-bpf kernel-patches-daemon-bpf Bot force-pushed the series/1101263=>bpf-next branch from 47226ae to 4d849ac Compare June 8, 2026 12:05
@kernel-patches-daemon-bpf

Copy link
Copy Markdown
Author

Upstream branch: b9452b5
series: https://patchwork.kernel.org/project/netdevbpf/list/?series=1101263
version: 1

@kernel-patches-daemon-bpf

Copy link
Copy Markdown
Author

Upstream branch: dd0f968
series: https://patchwork.kernel.org/project/netdevbpf/list/?series=1101263
version: 1

@kernel-patches-daemon-bpf

Copy link
Copy Markdown
Author

Upstream branch: f1a660b
series: https://patchwork.kernel.org/project/netdevbpf/list/?series=1101263
version: 1

@kernel-patches-daemon-bpf

Copy link
Copy Markdown
Author

Upstream branch: 68f4e48
series: https://patchwork.kernel.org/project/netdevbpf/list/?series=1101263
version: 1

Kernel Patches Daemon and others added 12 commits June 9, 2026 12:50
In the unregister path we use __in_uprobe_trampoline check with
current->mm for the VMA lookup, which is wrong, because we are
in the tracer context, not the traced process.

Add mm_struct pointer argument to __in_uprobe_trampoline and
changing related callers to pass proper mm_struct pointer.

Fixes: ba2bfc9 ("uprobes/x86: Add support to optimize uprobes")
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Removing struct uprobe_trampoline object and it's tracking code,
because it's not needed. We can do same thing directly on top of
struct vm_area_struct objects.

This makes the code simpler and allows easy propagation of the
trampoline vma object into child process in following change.

Note the original code called destroy_uprobe_trampoline if the
optimiation failed, but it only freed the struct uprobe_trampoline
object, not the vma. The new vma leak is fixed in following change.

Acked-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
When we do fork or clone without CLONE_VM the new process won't
have uprobe trampoline vma objects and at the same time it will
have optimized code calling that trampoline and crash.

Fixing this by allowing vma uprobe trampoline objects to be copied
on fork to the new process.

Fixes: ba2bfc9 ("uprobes/x86: Add support to optimize uprobes")
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
In case the optimization fails, we leak new-ly created trampoline
vma mapping (in case we just created it), let's unmap it.

Fixes: ba2bfc9 ("uprobes/x86: Add support to optimize uprobes")
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Andrii reported an issue with optimized uprobes [1] that can clobber
redzone area with call instruction storing return address on stack
where user code may keep temporary data without adjusting rsp.

Fixing this by moving the optimized uprobes on top of 10-bytes nop
instruction, so we can squeeze another instruction to escape the
redzone area before doing the call, like:

  lea -0x80(%rsp), %rsp
  call tramp

Note the lea instruction is used to adjust the rsp register without
changing the flags.

We use nop10 and following transformation to optimized instructions
above and back as suggested by Peterz [2].

Optimize path (int3_update_optimize):

  1) Initial state after set_swbp() installed the uprobe:
      cc 2e 0f 1f 84 00 00 00 00 00

     From offset 0 this is INT3 followed by the tail of the original
     10-byte NOP.

     After a previous unoptimization bytes 5..9 may still contain the
     old call instruction, which remains valid for threads already there.

  2) Rewrite the LEA tail and call displacement:
      cc [8d 64 24 80 e8 d0 d1 d2 d3]

     From offset 0 this traps on the uprobe INT3.  Bytes 1..9 are not
     executable entry points while byte 0 is trapped.

  3) Publish the first LEA byte:
      [48] 8d 64 24 80 e8 d0 d1 d2 d3

     From offset 0 this is:
        lea -0x80(%rsp), %rsp
        call <uprobe-trampoline>

Unoptimize path (int3_update_unoptimize):

  1) Initial optimized state:
      48 8d 64 24 80 e8 d0 d1 d2 d3
     Same as 3) above.

  2) Trap new entries before restoring the NOP bytes:
      [cc] 8d 64 24 80 e8 d0 d1 d2 d3

     From offset 0 this traps. A thread that had already executed the
     LEA can still reach the intact CALL at offset 5.

  3) Restore bytes 1..4 of the original NOP while keeping byte 0 trapped
     and byte 5 as CALL.
      cc [2e 0f 1f 84] e8 d0 d1 d2 d3

     From offset 0 this still traps. Offset 5 is still the CALL for any
     thread that was already past the first LEA byte.

  4) Publish the first byte of the original NOP:
      [66] 2e 0f 1f 84 e8 d0 d1 d2 d3

     From offset 0 this is the restored 10-byte NOP; the CALL opcode and
     displacement are now only NOP operands.  Offset 5 still decodes as
     CALL for a thread that was already there.

     Tthere is only a single target uprobe-trampoline for the given nop10
     instruction address, so the CALL instruction will not be changed across
     unoptimization/optimization cycles.
     Therefore, any task that is preempted at the CALL instruction is guaranteed
     to observe that CALL and not anything else.

Note as explained in [2] we need to use following nop10:
       PF1   PF2   ESC   NOPL  MOD   SIB   DISP32
NOP10: 0x66, 0x2e, 0x0f, 0x1f, 0x84, 0x00, 0x00, 0x00, 0x00, 0x00 -- cs nopw 0x00000000(%rax,%rax,1)

which means we need to allow 0x2e prefix which maps to INAT_PFX_CS
attribute in is_prefix_bad function.

Also changing the uprobe syscall error when called out of uprobe
trampoline to -EPROTO, so we are able to detect the fixed kernel.

The optimized uprobe performance stays the same:

        uprobe-nop     :    3.129 ± 0.013M/s
        uprobe-push    :    3.045 ± 0.006M/s
        uprobe-ret     :    1.095 ± 0.004M/s
  -->   uprobe-nop10   :    7.170 ± 0.020M/s
        uretprobe-nop  :    2.143 ± 0.021M/s
        uretprobe-push :    2.090 ± 0.000M/s
        uretprobe-ret  :    0.942 ± 0.000M/s
  -->   uretprobe-nop10:    3.381 ± 0.003M/s
        usdt-nop       :    3.245 ± 0.004M/s
  -->   usdt-nop10     :    7.256 ± 0.023M/s

[1] https://lore.kernel.org/bpf/20260509003146.976844-1-andrii@kernel.org/
[2] https://lore.kernel.org/bpf/20260518104306.GU3102624@noisy.programming.kicks-ass.net/#t
Reported-by: Andrii Nakryiko <andrii@kernel.org>
Closes: https://lore.kernel.org/bpf/20260509003146.976844-1-andrii@kernel.org/
Fixes: ba2bfc9 ("uprobes/x86: Add support to optimize uprobes")
Assisted-by: Codex:GPT-5.5
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
We now expect nop combo with 10 bytes nop instead of 5 bytes nop,
fixing has_nop_combo to reflect that.

Fixes: 41a5c7d ("libbpf: Add support to detect nop,nop5 instructions combo for usdt probe")
Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
In the previous optimized uprobe fix we changed the syscall
error used for its detection from ENXIO to EPROTO.

Changing related probe_uprobe_syscall detection check.

Acked-by: Andrii Nakryiko <andrii@kernel.org>
Fixes: 05738da ("libbpf: Add uprobe syscall feature detection")
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Syncing latest usdt.h change [1].

Now that we have nop10 optimization support in kernel, let's emit
nop,nop10 for usdt probe. We leave it up to the library to use
desirable nop instruction.

[1] TBD
Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Optimized uprobes are now on top of 10-bytes nop instructions,
reflect that in existing tests.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com>
Changing uprobe/usdt trigger bench code to use nop10 instead
of nop5. Also changing run_bench_uprobes.sh to use nop10 triggers.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com>
Adding reattach tests for uprobe syscall tests to make sure
we can re-attach and optimize same uprobe multiple times.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
@kernel-patches-daemon-bpf

Copy link
Copy Markdown
Author

Upstream branch: c15261b
series: https://patchwork.kernel.org/project/netdevbpf/list/?series=1101263
version: 1

anakryiko and others added 2 commits June 9, 2026 12:53
The uprobe nop5 optimization used to replace a 5-byte NOP with a 5-byte
CALL to a trampoline. The CALL pushes a return address onto the stack at
[rsp-8], clobbering whatever was stored there.

On x86-64, the red zone is the 128 bytes below rsp that user code may use
for temporary storage without adjusting rsp. Compilers can place USDT
argument operands there, generating specs like "8@-8(%rbp)" when rbp ==
rsp. With the CALL-based optimization, the return address overwrites that
argument before the BPF-side USDT argument fetch runs.

Add two tests for this case. The uprobe_syscall subtest stores known values
at -8(%rsp), -16(%rsp), and -24(%rsp), executes an optimized nop10 uprobe,
and verifies the red-zone data is still intact. The USDT subtest triggers a
probe in a function where the compiler places three USDT operands in the
red zone and verifies that all 10 optimized invocations deliver the expected
argument values to BPF.

On an unfixed kernel, the first hit goes through the INT3 path and later
hits use the optimized CALL path, so the red-zone checks fail after
optimization.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
[ updates to use nop10 ]
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com>
Adding tests for forked/cloned optimized uprobes and make
sure the child can properly execute optimized probe for
both fork (dups mm) and clone with CLONE_VM.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants