Skip to content

fix(spinlock): bound OptionalSharedLockGuard spin in signal handler paths#572

Open
jbachorik wants to merge 3 commits into
mainfrom
jb/bounded-optional-shared-lock
Open

fix(spinlock): bound OptionalSharedLockGuard spin in signal handler paths#572
jbachorik wants to merge 3 commits into
mainfrom
jb/bounded-optional-shared-lock

Conversation

@jbachorik
Copy link
Copy Markdown
Collaborator

What does this PR do?:
OptionalSharedLockGuard previously used an unbounded tryLockShared spin loop that could stall indefinitely inside signal handlers under heavy reader CAS contention. This PR replaces the unbounded variant with a bounded one (default budget: 256 CAS-retry iterations) and makes it the sole implementation.

Motivation:
Signal handlers must return promptly. Under heavy concurrent sampling (many threads recording events simultaneously), CAS contention on _rec_lock could cause signal handlers reaching FlightRecorder::recordEvent* to spin for an unbounded number of iterations. When the budget is exhausted, the sample is dropped and a new JVMTI_STACKS_DROPPED_REC_LOCK counter is incremented for observability.

Additional Notes:

  • OptionalSharedLockGuard now takes an optional max_spins parameter (default SpinLock::DEFAULT_SHARED_SPIN_BUDGET = 256). The budget bounds CAS-retry iterations, not wall-clock latency.
  • unlock() and unlockShared() gain debug-mode asserts to catch mismatched-variant misuse as the class grows.
  • All 8 existing OptionalSharedLockGuard usages in flightRecorder.cpp pick up the bounded behaviour automatically — no call-site changes needed.
  • BoundedOptionalSharedLockGuard was an intermediate name used during development; it is not present in this PR.

How to test the change?:
New gtest suite: ddprof-lib/src/test/cpp/spinLock_ut.cpp

  • OptionalGuard_UncontendedAcquire — acquires and releases correctly
  • OptionalGuard_ExclusiveHeld_ImmediateReturn — exclusive lock causes immediate false return (no spin)
  • OptionalGuard_SpinBudgetBound — tiny budget under reader CAS contention; guard returns without hanging
  • OptionalGuard_BudgetEnforced_ExclusivePath — deterministic: exclusive lock always returns false for any budget
  • OptionalGuard_SharedReentrancy — multiple shared guards coexist; exclusive try fails
  • TryLockShared_ExclusiveHeld_ReturnsFalse / TryLockShared_Free_ReturnsTrue — unbounded overload sanity

Run with:

./gradlew :ddprof-lib:gtestDebug

For Datadog employees:

  • This PR doesn't touch any of that.

Replace OptionalSharedLockGuard's unbounded tryLockShared spin with a
CAS-retry budget (DEFAULT_SHARED_SPIN_BUDGET=256). Under heavy reader
contention the guard now returns false after exhausting the budget rather
than spinning indefinitely inside a signal handler. Adds debug-mode
assertions to unlock()/unlockShared(), a new JVMTI_STACKS_DROPPED_REC_LOCK
observability counter, and a gtest suite for the new behaviour.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
@jbachorik jbachorik added the AI label Jun 2, 2026
@datadog-official

This comment has been minimized.

@dd-octo-sts
Copy link
Copy Markdown
Contributor

dd-octo-sts Bot commented Jun 2, 2026

CI Test Results

Run: #26804864801 | Commit: 7244af5 | Duration: 14m 10s (longest job)

All 32 test jobs passed

Status Overview

JDK glibc-aarch64/debug glibc-amd64/debug musl-aarch64/debug musl-amd64/debug
8 - - -
8-ibm - - -
8-j9 - -
8-librca - -
8-orcl - - -
11 - - -
11-j9 - -
11-librca - -
17 - -
17-graal - -
17-j9 - -
17-librca - -
21 - -
21-graal - -
21-librca - -
25 - -
25-graal - -
25-librca - -

Legend: ✅ passed | ❌ failed | ⚪ skipped | 🚫 cancelled

Summary: Total: 32 | Passed: 32 | Failed: 0


Updated: 2026-06-02 07:38:14 UTC

Fixes TSAN data race: plain volatile reads in assert() raced with
concurrent atomic CAS writes from other threads. Use __atomic_load_n
(RELAXED) for assert invariant checks and __atomic_store_n for reset().

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@jbachorik jbachorik marked this pull request as ready for review June 2, 2026 06:32
@jbachorik jbachorik requested a review from a team as a code owner June 2, 2026 06:32
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: f3ad4d1b91

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread ddprof-lib/src/main/cpp/spinLock.h
Adds SAMPLES_DROPPED_REC_LOCK counter and increments it in the
recordEvent() else branch, mirroring what recordEventDelegated()
already does for JVMTI_STACKS_DROPPED_REC_LOCK.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant