[Fix][Connector-V2] Fix Arrow memory leak by correcting close order in ArrowToSeatunnelRowReader #10958
Open
davidzollo wants to merge 1 commit into
…n ArrowToSeatunnelRowReader
Close arrowStreamReader before rootAllocator to prevent memory leak. ArrowStreamReader internally closes VectorSchemaRoot and releases all Arrow buffer allocations back to the allocator. The previous order closed rootAllocator while arrowStreamReader still held unreleased buffers, causing IllegalStateException: Memory was leaked by query (issue apache#9863).
Fix: apache#9863
DanielLeens (Contributor) reviewed on May 26, 2026:
Thanks for the contribution. I reviewed the full diff on the current head and retraced the close path instead of only reading the PR description.
What this PR fixes
- User pain: the Arrow reader can throw a leaked-memory exception during shutdown even after the read path itself succeeds.
- Fix approach: close `ArrowStreamReader` before closing the root allocator.
- One-line summary: this fixes the real Arrow resource-release order on the normal close path.
Runtime chain I checked
Arrow source read path
-> ArrowStreamReader owns VectorSchemaRoot buffers
-> ArrowToSeatunnelRowReader.close()
-> ArrowStreamReader.close()
-> VectorSchemaRoot buffers released back to allocator
-> RootAllocator.close()
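The ownership chain above can be illustrated with a small standalone model. This is only a sketch: `ToyAllocator`, `ToyReader`, and `CloseOrderDemo` are hypothetical stand-ins written for this example, not Apache Arrow or SeaTunnel classes, and the 64-byte figure is purely illustrative. The model assumes the one rule that matters here, namely that an allocator reports a leak if it is closed while allocations are still outstanding.

```java
// Toy model only: none of these classes exist in Apache Arrow or SeaTunnel.
// They mimic the rule that an allocator must not be closed while buffers are live.
final class ToyAllocator implements AutoCloseable {
    private long outstandingBytes;

    void allocate(long bytes) { outstandingBytes += bytes; }

    void release(long bytes) { outstandingBytes -= bytes; }

    @Override
    public void close() {
        // Closing with live allocations is reported as a leak.
        if (outstandingBytes != 0) {
            throw new IllegalStateException(
                "Memory was leaked: " + outstandingBytes + " bytes");
        }
    }
}

final class ToyReader implements AutoCloseable {
    private final ToyAllocator allocator;
    private long heldBytes;

    ToyReader(ToyAllocator allocator, long bytes) {
        this.allocator = allocator;
        allocator.allocate(bytes); // the reader owns this buffer until closed
        this.heldBytes = bytes;
    }

    @Override
    public void close() {
        // Hand the buffer back to the allocator; safe to call more than once.
        allocator.release(heldBytes);
        heldBytes = 0;
    }
}

final class CloseOrderDemo {
    /** Returns true if shutdown completed without a leak report. */
    static boolean shutdown(boolean readerFirst) {
        ToyAllocator allocator = new ToyAllocator();
        ToyReader reader = new ToyReader(allocator, 64);
        try {
            if (readerFirst) {
                reader.close();    // buffers go back to the allocator first
                allocator.close(); // nothing outstanding, so this succeeds
            } else {
                allocator.close(); // throws: the reader still holds 64 bytes
                reader.close();    // never reached in this branch
            }
            return true;
        } catch (IllegalStateException leak) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("reader first:    " + shutdown(true));
        System.out.println("allocator first: " + shutdown(false));
    }
}
```

In this model, closing the reader first lets the allocator shut down cleanly, while closing the allocator first triggers the leak report, which mirrors the failure mode the PR describes.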
Findings
- The normal close path definitely hits `ArrowToSeatunnelRowReader.close()`.
- The new order at `ArrowToSeatunnelRowReader.java:320-334` is the correct one for Arrow ownership semantics.
- I do not see a reopened source-level blocker on this revision.
Merge conclusion
Conclusion: can merge after fixes
- Blocking items
- No code blocker from my side.
- Please let the latest GitHub Build finish green before merging.
- Suggested non-blocking follow-up
- None from my side on the source path.
Overall, this is a focused and correct fix for the allocator-leak shutdown path.
Collaborator:
+1
Contributor:
Thanks for taking a look, @nzw921rx. The head is unchanged: there is no new code on the PR, so I am not starting another full review in this round. Happy to re-check if the branch changes again.
Purpose
Fix `IllegalStateException: Memory was leaked by query` when using the Doris connector (and any connector backed by `ArrowToSeatunnelRowReader`).
Closes #9863
Problem
In `ArrowToSeatunnelRowReader.close()`, the previous close order was:
1. `root.close()`: close `VectorSchemaRoot`
2. `rootAllocator.close()`: close `RootAllocator` ← wrong: `arrowStreamReader` still holds unreleased buffers
3. `arrowStreamReader.close()`: too late, allocator already closed
Apache Arrow's memory management requires that all child allocations are released before the `RootAllocator` is closed. The `ArrowStreamReader` internally allocates Arrow buffers via the `RootAllocator`. Closing the allocator while the reader still holds references causes the allocator to detect leaked memory (~64 bytes of internal metadata) and throw `IllegalStateException: Memory was leaked by query`.
This exception propagated up through `DorisValueReader.hasNext()` → `DorisSourceReader.pollNext()` → `SourceFlowLifeCycle.collect()`, causing the task to fail with state FAILED.
Fix
Corrected the close order to:
1. `arrowStreamReader.close()`: releases all Arrow buffers and internally closes `VectorSchemaRoot`
2. `rootAllocator.close()`: now safe, all allocations already released
The separate `root.close()` call is removed since `ArrowStreamReader.close()` already handles it.
Impact
ArrowToSeatunnelRowReaderArrowToSeatunnelRowReaderinstance
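As a side note on the corrected order under Fix: where both resources are scoped to a single block, Java's try-with-resources produces the same reader-before-allocator sequence automatically, because resources are closed in the reverse of their declaration order. The sketch below only demonstrates that language rule. `NamedResource` and `ReverseCloseDemo` are hypothetical names invented for this example, and this is not how the PR implements the fix (the PR reorders explicit `close()` calls on fields).

```java
// Hypothetical helper for illustration: records its name when closed.
final class NamedResource implements AutoCloseable {
    private final String name;
    private final StringBuilder log;

    NamedResource(String name, StringBuilder log) {
        this.name = name;
        this.log = log;
    }

    @Override
    public void close() {
        log.append(name).append(';');
    }
}

final class ReverseCloseDemo {
    /** Returns the order in which the two resources were closed. */
    static String closeOrder() {
        StringBuilder log = new StringBuilder();
        // Declare the allocator first and the reader second:
        // try-with-resources closes them in reverse, reader then allocator.
        try (NamedResource allocator = new NamedResource("allocator", log);
             NamedResource reader = new NamedResource("reader", log)) {
            // Reading would happen here.
        }
        return log.toString();
    }

    public static void main(String[] args) {
        System.out.println(closeOrder()); // prints "reader;allocator;"
    }
}
```

The design point is that declaring the owner (the allocator) before the dependent resource (the reader) makes the safe close order a property of the declaration, rather than something each `close()` method has to get right by hand.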