fix(spinlock): bound OptionalSharedLockGuard spin in signal handler paths#572
Open
jbachorik wants to merge 3 commits into
Open
fix(spinlock): bound OptionalSharedLockGuard spin in signal handler paths#572jbachorik wants to merge 3 commits into
jbachorik wants to merge 3 commits into
Conversation
Replace OptionalSharedLockGuard's unbounded tryLockShared spin with a CAS-retry budget (DEFAULT_SHARED_SPIN_BUDGET=256). Under heavy reader contention the guard now returns false after exhausting the budget rather than spinning indefinitely inside a signal handler. Adds debug-mode assertions to unlock()/unlockShared(), a new JVMTI_STACKS_DROPPED_REC_LOCK observability counter, and a gtest suite for the new behaviour. Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
This comment has been minimized.
This comment has been minimized.
Contributor
CI Test ResultsRun: #26804864801 | Commit:
Status Overview
Legend: ✅ passed | ❌ failed | ⚪ skipped | 🚫 cancelled Summary: Total: 32 | Passed: 32 | Failed: 0 Updated: 2026-06-02 07:38:14 UTC |
Fixes TSAN data race: plain volatile reads in assert() raced with concurrent atomic CAS writes from other threads. Use __atomic_load_n (RELAXED) for assert invariant checks and __atomic_store_n for reset(). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: f3ad4d1b91
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
Adds SAMPLES_DROPPED_REC_LOCK counter and increments it in the recordEvent() else branch, mirroring what recordEventDelegated() already does for JVMTI_STACKS_DROPPED_REC_LOCK. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
What does this PR do?:
OptionalSharedLockGuardpreviously used an unboundedtryLockSharedspin loop that could stall indefinitely inside signal handlers under heavy reader CAS contention. This PR replaces the unbounded variant with a bounded one (default budget: 256 CAS-retry iterations) and makes it the sole implementation.Motivation:
Signal handlers must return promptly. Under heavy concurrent sampling (many threads recording events simultaneously), CAS contention on
_rec_lockcould cause signal handlers reachingFlightRecorder::recordEvent*to spin for an unbounded number of iterations. When the budget is exhausted, the sample is dropped and a newJVMTI_STACKS_DROPPED_REC_LOCKcounter is incremented for observability.Additional Notes:
OptionalSharedLockGuardnow takes an optionalmax_spinsparameter (defaultSpinLock::DEFAULT_SHARED_SPIN_BUDGET = 256). The budget bounds CAS-retry iterations, not wall-clock latency.unlock()andunlockShared()gain debug-modeasserts to catch mismatched-variant misuse as the class grows.OptionalSharedLockGuardusages inflightRecorder.cpppick up the bounded behaviour automatically — no call-site changes needed.BoundedOptionalSharedLockGuardwas an intermediate name used during development; it is not present in this PR.How to test the change?:
New gtest suite:
ddprof-lib/src/test/cpp/spinLock_ut.cppOptionalGuard_UncontendedAcquire— acquires and releases correctlyOptionalGuard_ExclusiveHeld_ImmediateReturn— exclusive lock causes immediate false return (no spin)OptionalGuard_SpinBudgetBound— tiny budget under reader CAS contention; guard returns without hangingOptionalGuard_BudgetEnforced_ExclusivePath— deterministic: exclusive lock always returns false for any budgetOptionalGuard_SharedReentrancy— multiple shared guards coexist; exclusive try failsTryLockShared_ExclusiveHeld_ReturnsFalse/TryLockShared_Free_ReturnsTrue— unbounded overload sanityRun with:
For Datadog employees: