[#11726] refactor(core): harden Capability method calls with correct TCCL#11755
[#11726] refactor(core): harden Capability method calls with correct TCCL#11755JoegenUSTC wants to merge 2 commits into
Conversation
|
Thanks for the work on this! I dug into it a bit and want to share one concern — I could be missing something, so please correct me if I'm wrong. I think the proxy here mainly switches the thread context classloader (TCCL) through Also, What seems to actually trigger the error: the catalog cache I wrote a small standalone test to check this: So in the closed-loader case, wrapping the call in Based on this, my gentle suggestions:
What do you think? Happy to share the small repro code if it helps. |
Thanks @yuqi1129 — this is a thorough and well-evidenced analysis. Let me address each point and ask for your input on the path forward. On TCCL and The synthetic On the closed-classloader scenario — this is the relevant failure path here.
On the root fix — a note on complexity. The reference-counted deferred-close approach (only calling Revised scope of this PR (#11755) — would appreciate your assessment. Given your findings, I'd like to narrow this PR to what it actually delivers: a TCCL hardening for Would you consider this narrowed scope acceptable as a standalone improvement? Or would you prefer it be folded into the lifecycle fix in #11726? For the Please do share the repro code — it would be very valuable for #11726. Thank you again. |
Code Coverage Report
Files
|
|
The CI failure in :plugins:idp-basic:test is unrelated to this PR's changes. Root cause: test state leakage between repeated Gradle invocations with different testMode/jdbcBackend combinations — user1 created in a previous run is not cleaned up, causing a duplicate key conflict (HTTP 409) in the next run. Could a maintainer re-run the failed job, or confirm this is a known flaky test? |
There was a problem hiding this comment.
Pull request overview
This PR hardens catalog Capability usage by wrapping the Capability delegate returned from CatalogManager.CatalogWrapper#capabilities() in a JDK dynamic proxy that executes each method invocation inside the catalog’s IsolatedClassLoader.withClassLoader(...) scope, ensuring the correct TCCL is set for every Capability call.
Changes:
- Add a
CapabilityHelpers.getCapability(...)proxy wrapper that switches/restores TCCL per method invocation. - Expose the catalog wrapper’s
IsolatedClassLoadervia a newCatalogWrapper#classLoader()accessor to support proxying. - Add unit tests validating proxy creation and TCCL restoration/exception behavior.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| core/src/main/java/org/apache/gravitino/catalog/CapabilityHelpers.java | Wrap returned Capability in a JDK proxy that runs each invocation within IsolatedClassLoader.withClassLoader(...). |
| core/src/main/java/org/apache/gravitino/catalog/CatalogManager.java | Add CatalogWrapper#classLoader() accessor to expose the isolated loader for proxy construction. |
| core/src/test/java/org/apache/gravitino/catalog/TestCapabilityHelpers.java | Add unit tests for proxy existence and TCCL switching/restoration semantics. |
|
I think the core issue is not just whether the TCCL is set correctly. The bigger boundary violation is that CatalogWrapper returns a Capability object created by the isolated classloader and lets callers invoke it outside the wrapper/classloader boundary. A safer fix would be to avoid letting plugin-loaded executable objects escape from CatalogWrapper. Instead of returning Capability, CatalogWrapper should expose a doWithCapability(...) method and execute all Capability method calls inside classLoader.withClassLoader(...). Callers should only receive safe values/results, not the plugin Capability instance itself. On the other hand, the format of this PR does not align with our style guidelines. Please update it accordingly. |
…JDK proxy Wrap the Capability delegate returned by CapabilityHelpers.getCapability() in a JDK dynamic proxy whose InvocationHandler executes each method call inside classLoader.withClassLoader(), ensuring the thread context classloader (TCCL) is always set correctly before any Capability method runs. Motivation: Some Capability implementations may use TCCL-dependent patterns such as ServiceLoader.load() or Class.forName(name) without an explicit classloader argument. Without this proxy, those calls would run with the server classloader as TCCL and fail to locate classes from the catalog plugin jars. Scope: This PR provides TCCL hardening for the above patterns only. The NoClassDefFoundError triggered by switch-on-enum synthetic classes is a separate closed-classloader lifecycle issue, tracked in apache#11726 and fixed at the catalog level in apache#11706. Changes: - CapabilityHelpers.getCapability() wraps delegate in JDK dynamic proxy; unwraps InvocationTargetException so callers see the original exception; returns delegate directly when classLoader is null (built-in catalogs). - CatalogManager.CatalogWrapper.classLoader() added (package-private) to expose the IsolatedClassLoader for proxy construction; placed after public methods per project member ordering convention. - TestCapabilityHelpers: proxy existence, TCCL switching/restoration, return value propagation, and exception TCCL restoration are all tested; @AfterEach closes IsolatedClassLoader to prevent resource leak. Fixes apache#11726
cad6324 to
1f08400
Compare
Thanks for the thorough review, @diqiu50. On the format: I've updated the PR description to follow the standard On the architectural concern: You're right — the idiomatic pattern in The proxy in this PR patches the symptom (any invocation on the escaped instance I see two paths:
I'm happy to go with option 1 if that's what you prefer — just want to align on One additional note from @yuqi1129's earlier analysis: the |
What changes were proposed in this pull request?
Wrap the
Capabilitydelegate returned byCapabilityHelpers.getCapability()in a JDK dynamic proxy whose
InvocationHandlerexecutes each method callinside
classLoader.withClassLoader(), ensuring the TCCL is always setcorrectly before any
Capabilitymethod runs.CapabilityHelpers.getCapability()now returns a classloader-aware proxy;falls back to the raw delegate when
classLoaderis null (built-in catalogs).InvocationTargetExceptionin the proxy so callers see the originalexception, consistent with the rest of the codebase.
CatalogWrapper.classLoader()(package-private) added to expose theIsolatedClassLoaderfor proxy construction; placed afterpublicmethodsper project member ordering convention.
Why are the changes needed?
CatalogWrapper.capabilities()returns aCapabilityobject loaded by thecatalog's
IsolatedClassLoader. This object is currently used outside anywithClassLoader()boundary (e.g., inTableNormalizeDispatcher,SchemaNormalizeDispatcher). AnyCapabilityimplementation that usesTCCL-dependent patterns — such as
ServiceLoader.load()orClass.forName(name)without an explicit classloader argument — would run with the server classloader
as TCCL and silently fail to locate classes from the catalog plugin jars.
This PR implements Expected Fix option 2 from #11726: proxy
Capabilitymethod calls through the isolated classloader automatically, so callers are
always within the correct classloader context regardless of where the invocation
occurs.
Note: this PR does not address the
NoClassDefFoundErrortriggered byswitch-on-enumsynthetic classes in existingCapabilityimplementations.That is a separate closed-classloader lifecycle issue handled by:
switch-on-enumwithif-elseinHiveCatalogCapability(workaround, already merged)
IsolatedClassLoader.close()until allCapabilityreferencesare released (root fix, tracked separately)
Related: #11726
Does this PR introduce any user-facing change?
No. This change is internal to the
coremodule. No public APIs orconfiguration properties are affected.
How was this patch tested?
Added
TestCapabilityHelperswith unit tests covering:testGetCapabilityReturnsProxy— returned object is a JDK dynamic proxytestProxyExecutesMethodWithinCatalogClassloader— TCCL is set to thecatalog's
IsolatedClassLoaderduring method executiontestProxyRestoresTcclAfterMethodCall— TCCL is restored after normal returntestProxyRestoresTcclAfterException— TCCL is restored even when thedelegate throws
testProxyPropagatesReturnValue— return value from delegate is unchanged