Cooperatively cancel indexed scan when query task is cancelled #22198
aravindsagar wants to merge 1 commit into
How are we ensuring that all data is dropped in this flow when a query is cancelled? Did we do any memory-leak analysis with this?
❌ Gradle check result for 8b87043: null Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?
The indexed scan had no cancellation checkpoints, so a cancelled query kept
running every remaining row group to completion. The per-row-group evaluator
runs inside tokio::task::spawn_blocking (non-abortable), so the scan held
and grew its native (Arrow/DataFusion) memory until natural completion.
Thread the per-query CancellationToken (from the global QUERY_REGISTRY) down
through IndexedTableConfig -> IndexedExec -> IndexReader and add cooperative
checkpoints:
- IndexReader::poll_next_row_group: bail before dispatching the next row group.
- IndexReader::fetch_row_group: check before the evaluator call so a queued
blocking job that starts after cancel skips its work.
- IndexedStream::poll_inner: stop draining so decoded batches are released.
Cancellation surfaces as a query-level DataFusionError ("query cancelled")
which propagates as a clean fragment failure (never partial results).
Signed-off-by: Aravind Sagar <sagarara@amazon.com>
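The checkpoint pattern described in the commit message can be sketched roughly as follows. This is only an illustration, not the code from this PR: it uses the standard library alone, CancelFlag is a hypothetical stand-in for the per-query CancellationToken, and RowGroupReader, next_row_group and QueryCancelled are made-up names rather than the real IndexReader API. The token is an Option so that a reader built with None behaves as if no checkpoints existed.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;

/// Hypothetical stand-in for the per-query CancellationToken (assumption:
/// the real token comes from the query registry; this one is a shared flag).
#[derive(Clone, Default)]
pub struct CancelFlag(Arc<AtomicBool>);

impl CancelFlag {
    pub fn cancel(&self) {
        self.0.store(true, Ordering::SeqCst);
    }
    pub fn is_cancelled(&self) -> bool {
        self.0.load(Ordering::SeqCst)
    }
}

/// Hypothetical error type standing in for the "query cancelled" error.
#[derive(Debug, PartialEq)]
pub struct QueryCancelled;

/// Hypothetical reader over `total` row groups. `token` is `None` for
/// untracked queries, which leaves behaviour unchanged.
pub struct RowGroupReader {
    token: Option<CancelFlag>,
    next: usize,
    total: usize,
    pub evaluated: usize,
}

impl RowGroupReader {
    pub fn new(total: usize, token: Option<CancelFlag>) -> Self {
        Self { token, next: 0, total, evaluated: 0 }
    }

    /// Cooperative checkpoint: fails only if a token exists and is cancelled.
    fn checkpoint(&self) -> Result<(), QueryCancelled> {
        match &self.token {
            Some(t) if t.is_cancelled() => Err(QueryCancelled),
            _ => Ok(()),
        }
    }

    /// Returns `Ok(Some(idx))` after evaluating a row group, `Ok(None)` when
    /// all row groups are done, or `Err(QueryCancelled)` at a checkpoint.
    pub fn next_row_group(&mut self) -> Result<Option<usize>, QueryCancelled> {
        // Checkpoint 1: bail before dispatching the next row group.
        self.checkpoint()?;
        if self.next == self.total {
            return Ok(None);
        }
        let idx = self.next;
        self.next += 1;
        // Checkpoint 2: check again right before the evaluator call, since a
        // queued blocking job may only start after the cancel has happened.
        self.checkpoint()?;
        self.evaluated += 1; // stand-in for the per-row-group evaluator work
        Ok(Some(idx))
    }
}
```

A caller would create one CancelFlag per query, hand a clone to the reader, and call cancel() from the task-cancellation path; the next call to next_row_group then returns the error instead of evaluating another row group.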
8b87043 to 751f50c
PR Code Analyzer ❗AI-powered 'Code-Diff-Analyzer' found issues on commit 751f50c.
751f50c to 5647683
Persistent review updated to latest commit 5647683
This change is not aimed at fixing the memory leak; it ensures that queries stop executing when cancelled. I did some test runs, and the memory leak was no worse with this change.
❌ Gradle check result for 5647683: null Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?
Description
The indexed scan had no cancellation checkpoints. When a query task was cancelled, IndexReader kept dispatching and evaluating every remaining row group to completion, holding its native (Arrow/DataFusion) memory the entire time. The per-row-group evaluator runs inside tokio::task::spawn_blocking, which cannot be aborted once started.
This change threads the per-query CancellationToken (from QUERY_REGISTRY) down through IndexedTableConfig → IndexedExec → IndexReader and adds three cooperative checkpoints:
- IndexReader::poll_next_row_group: bails before dispatching the next row group, so no new spawn_blocking jobs are admitted after cancellation.
- IndexReader::fetch_row_group: checks before the evaluator call, so a queued job that starts after cancellation skips its work.
- IndexedStream::poll_inner: stops draining, so decoded batches in the current segment are released immediately.
Cancellation surfaces as a DataFusionError("query cancelled") and propagates as a clean fragment failure, never as partial results.
Queries with context_id == 0 (untracked) and all unit tests pass None for the token, leaving existing behaviour unchanged.
Testing
Unit tests:
- cancel_stops_row_group_dispatch: verifies that after cancellation, at most one already-in-flight spawn_blocking job (non-abortable) completes, the reader terminates via the cancellation error path, and fewer than all 8 row groups are evaluated.
- cancellation_token: None.
End-to-end on a 100M-doc ClickBench cluster (EC2, r8g.2xlarge):
Manual cancellation via _tasks/_cancel:
Ran stats count() by UserID (baseline ~1.4s). Fired _tasks/_cancel at random offsets (166–675ms into the query). All 5 trials returned the query within 8–10ms of the cancel, with tasks_cancelled=3 (coordinator + shard + fragment) and HTTP 500 TaskCancelledException[query cancelled]. Never a partial result.
SBP auto-cancellation:
Configured search_backpressure.mode=enforced, elapsed_time_millis_threshold=500ms, and a non-zero node_duress.native_memory_limit so that native-memory duress can trip and populate SBP's candidate task list. Ran memory-heavy concurrent queries. Baseline (SBP disabled): ~8,500ms. Under enforced SBP: all 10 queries were cancelled and returned in ~743–1001ms, that is, they stopped within one SBP poll cycle of becoming eligible. Error: query cancelled on every trial.
Related Issues
Resolves #[Issue number to be closed when this PR is merged]
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.