Wire cooperative cancellation into the reduce and QTF fetch native streams#22248
…reams

A cancelled coordinator-reduce or QTF fetch was either abort()-killed mid-send (skipping drop+drain, leaking the aggregate's GroupValues) or left with no cancel signal at all (stranding its stream_next task and DataFusion pool reservation). Make cancel cooperative.

- cross_rt_stream: optional CancellationToken on the producer loop; a cancel
  select-breaks the loop and falls through to drop(stream)+drain instead of a
  JoinSet abort. Existing call sites pass None (unchanged). Covered by 3 cargo
  tests.
- api.rs reduce paths: pass the token, do not register the abort handle.
- query_executor.rs wrap_stream_as_handle (QTF fetch-by-rowid): build with the
  cancellable variant and register the abort + CPU runtime handles, matching
  execute_query.
- AnalyticsSearchBackendPlugin.cancelByContext SPI (default no-op) + datafusion
  impl; AnalyticsSearchService registers a fetch task cancellation listener
  that fires it.

Signed-off-by: Marc Handalian <marc.handalian@gmail.com>
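The difference between an abort and a cooperative cancel in the producer loop can be sketched roughly as below. This is not the PR's cross_rt_stream code: it is a std-only analog that swaps tokio's CancellationToken and select for an `AtomicBool` flag polled around a `try_send`, and `Source`, `spawn_producer`, and the returned reason strings are hypothetical names invented for illustration. The point it tries to show is that every exit from the loop, cancel included, falls through to the same `drop(source)` cleanup, and that passing `None` leaves the loop behaving as it did before.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::mpsc::{sync_channel, Receiver, TrySendError};
use std::sync::Arc;
use std::thread::{self, JoinHandle};

/// Hypothetical stand-in for the native stream. Dropping it is what
/// releases its resources, so the flag records that Drop actually ran.
struct Source {
    counter: u32,
    released: Arc<AtomicBool>,
}

impl Iterator for Source {
    type Item = u32;
    fn next(&mut self) -> Option<u32> {
        self.counter += 1;
        Some(self.counter) // endless stream: 1, 2, 3, ...
    }
}

impl Drop for Source {
    fn drop(&mut self) {
        self.released.store(true, Ordering::SeqCst);
    }
}

/// Spawns the producer loop. `cancel` is optional, so a caller that passes
/// `None` gets a loop that only stops on exhaustion or a closed receiver.
fn spawn_producer(
    source: Source,
    cancel: Option<Arc<AtomicBool>>,
) -> (Receiver<u32>, JoinHandle<&'static str>) {
    let (tx, rx) = sync_channel::<u32>(1);
    let handle = thread::spawn(move || {
        let mut source = source;
        let is_cancelled = || cancel.as_ref().map_or(false, |c| c.load(Ordering::SeqCst));
        let reason = 'produce: loop {
            let Some(mut item) = source.next() else {
                break 'produce "exhausted";
            };
            // Analog of selecting between "send" and "cancelled": never
            // block on the send without re-checking the cancel flag.
            loop {
                if is_cancelled() {
                    break 'produce "cancelled";
                }
                match tx.try_send(item) {
                    Ok(()) => break,
                    Err(TrySendError::Full(back)) => {
                        item = back;
                        thread::yield_now();
                    }
                    Err(TrySendError::Disconnected(_)) => break 'produce "receiver gone",
                }
            }
        };
        // Every exit path reaches this point, so cleanup is never skipped.
        drop(source);
        reason
    });
    (rx, handle)
}
```

A caller would keep a clone of the flag, set it to `true` on cancel, and then join the handle; the join returns only after the source has been dropped.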
Codecov Report
✅ All modified and coverable lines are covered by tests.

@@            Coverage Diff             @@
##               main   #22248    +/-   ##
==========================================
  Coverage     73.35%   73.35%
- Complexity    75937    76005    +68
==========================================
  Files          6071     6075     +4
  Lines        344993   345282   +289
  Branches      49638    49697    +59
==========================================
+ Hits         253080   253295   +215
- Misses        71710    71757    +47
- Partials      20203    20230    +27
- cross_rt_stream: remove the post-drop `for _ in 0..2 { yield_now }` loop.
A fixed yield count is not sufficient across varying CPU cores; the deferred
child-task drops are reaped by the bounded flush_cpu_runtime in stream_close
(4 workers x 32 yields, 500ms-capped), and drop(stream) itself frees the
aggregate's GroupValues. The loop was a redundant best-effort nudge.
- query_tracker test: move test_cancel_query_flushes_deferred_drops to a unique
context id (70_001 -> 80_001) so it no longer collides with
test_top_n_picks_highest_current_bytes in the process-wide registry under
parallel test execution.
Signed-off-by: Marc Handalian <marc.handalian@gmail.com>
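The first bullet's argument is that a fixed `0..2` yield count is a guess, while a flush that is bounded both by a yield count and by a wall-clock cap can stop early when there is nothing left to reap and still cannot hang. A minimal single-threaded sketch of that shape follows. It is not the PR's flush_cpu_runtime, whose implementation is not shown here; `bounded_flush` and the `pending` counter are hypothetical, and the real version runs on the CPU runtime's workers rather than on one thread.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::thread;
use std::time::{Duration, Instant};

/// Yields until `pending` reaches zero, giving up after `max_yields`
/// yields or once `cap` has elapsed, whichever comes first.
/// Returns true only if the pending work was observed to be drained.
fn bounded_flush(pending: &AtomicUsize, max_yields: usize, cap: Duration) -> bool {
    let deadline = Instant::now() + cap;
    for _ in 0..max_yields {
        if pending.load(Ordering::SeqCst) == 0 {
            return true; // nothing left to reap, stop early
        }
        if Instant::now() >= deadline {
            break; // time cap reached before the yield budget ran out
        }
        thread::yield_now();
    }
    pending.load(Ordering::SeqCst) == 0
}
```

With the numbers quoted in the commit message, each worker would call something like `bounded_flush(&pending, 32, Duration::from_millis(500))`, which returns `false` instead of blocking if the deferred drops have not been reaped within the budget.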
…qtf-fetch Signed-off-by: Marc Handalian <marc.handalian@gmail.com>
❌ Gradle check result for 92b17a6: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?
Signed-off-by: Marc Handalian <marc.handalian@gmail.com>
PR Code Analyzer ❗AI-powered 'Code-Diff-Analyzer' found issues on commit 3296c94.
❌ Gradle check result for 3296c94: null Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?
…qtf-fetch Signed-off-by: Marc Handalian <marc.handalian@gmail.com>
❌ Gradle check result for 8368f14: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?
Description
A cancelled coordinator-reduce or QTF fetch was either abort()-killed mid-send (skipping drop+drain, leaking the aggregate's GroupValues) or left with no cancel signal at all (stranding its stream_next task and DataFusion pool reservation). Make cancel cooperative.
Related Issues
Resolves #[Issue number to be closed when this PR is merged]
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.