planner, executor: support merge sort for IN conditions in IndexMerge partial paths#67771
Conversation
… partial paths When an IndexMerge query has a partial path with IN conditions (e.g., `b IN (1,2)`), the partial path could not keep order because IN conditions produce multiple disjoint ranges that are individually sorted but not globally sorted. This caused IndexMerge to fall back to a global TopN without Limit pushdown, resulting in much worse performance (e.g., 0.01s -> 1.53s). This PR leverages the merge sort infrastructure introduced in pingcap#62694 to support `PropMatchedNeedMergeSort` for IndexMerge partial paths. The IN condition ranges are split into groups, each producing a sorted stream, which are then merged via the existing `sortedSelectResults` mechanism. Changes: - planner: Accept `PropMatchedNeedMergeSort` from `matchProperty()` for IndexMerge partial paths, and propagate `GroupedRanges`/`GroupByColIdxs` to physical scans. - executor: Replace separate `keyRanges` and `partitionKeyRanges` fields with a unified `partialWorkerKVRanges` that handles partitions, grouped ranges, or both. Rebuild `GroupedRanges` for correlated columns. Pass grouped ranges to `TableReaderExecutor` for table scan partial paths. - executor: Update `memIndexMergeReader` (UnionScan) to use the unified field. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
Review Complete Findings: 0 issues ℹ️ Learn more details on Pantheon AI. |
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughIndexMerge execution now unifies per-partial-plan KV-range handling via ChangesIndexMerge Grouped KV-Ranges and Worker Execution Flow
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Possibly related PRs
Suggested labels
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Tip 💬 Introducing Slack Agent: The best way for teams to turn conversations into code.Slack Agent is built on CodeRabbit's deep understanding of your code, so your team can collaborate across the entire SDLC without losing context.
Built for teams:
One agent for your entire SDLC. Right inside Slack. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (1)
tests/integrationtest/t/index_merge.test (1)
241-285: Add oneprimary IN (...)/table-partial case here as well.These cases only drive secondary-index partials (
idx_a_c/idx_b_c). The planner change also added grouped-range propagation inconvertToPartialTableScan(), but nothing in this block makes IndexMerge choose a table/primary partial path that needs the same merge-sort treatment. A small case likeid in (...) or b in (...) order by id limit ...would cover that new branch and keep it from regressing silently.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/integrationtest/t/index_merge.test` around lines 241 - 285, Add a test that exercises a primary/table-partial path by including a case like "id IN (...) or b IN (...) ORDER BY id LIMIT ..." so IndexMerge must consider a primary-key partial and be merge-sorted; specifically insert a new block in index_merge.test similar to other cases that uses the /*+ use_index_merge(t_im_in) */ hint and queries "select ... from t_im_in where id in (1,2) or b in (1,2) order by id limit 5" (also include an EXPLAIN and a SELECT, and a PREPARE/EXECUTE variant if desired) to cover the convertToPartialTableScan() branch and prevent regressions in primary/table-partial handling by IndexMerge.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@pkg/executor/index_merge_reader.go`:
- Around line 210-216: The code only extracts GroupedRanges from
PhysicalIndexScan, missing the case where plan[0] is a
*physicalop.PhysicalTableScan; update the branch in index_merge_reader.go (the
block around plan[0] handling) to check for *physicalop.PhysicalTableScan as
well and set groupedRanges = x.GroupedRanges when that type matches (same as for
PhysicalIndexScan), so memIndexMergeReader/partialWorkerKVRanges will use
table-scan grouped ranges (consistent with rebuildRangeForCorCol behavior).
In `@pkg/planner/core/find_best_task.go`:
- Around line 1433-1435: The code currently reduces the detailed result from
matchProperty(...) to a boolean via .Matched(), losing the PropMatched vs
PropMatchedNeedMergeSort distinction and allowing stale
AccessPath.GroupedRanges/GroupByColIdxs to leak; update the logic in the loop
that iterates oneAlternative to either (a) preserve the full
PhysicalPropMatchResult returned by matchProperty(...) and use its Matched/Kind
to decide match, or (b) after calling matchProperty(...) and finding it is NOT
PropMatchedNeedMergeSort, explicitly clear the mutable grouping fields on the
AccessPath (AccessPath.GroupedRanges and AccessPath.GroupByColIdxs) so stale
grouping metadata cannot affect subsequent direct-match plans; apply the same
fix to the similar block around match checks at the other location mentioned
(the block around lines referenced as 1543-1545).
---
Nitpick comments:
In `@tests/integrationtest/t/index_merge.test`:
- Around line 241-285: Add a test that exercises a primary/table-partial path by
including a case like "id IN (...) or b IN (...) ORDER BY id LIMIT ..." so
IndexMerge must consider a primary-key partial and be merge-sorted; specifically
insert a new block in index_merge.test similar to other cases that uses the /*+
use_index_merge(t_im_in) */ hint and queries "select ... from t_im_in where id
in (1,2) or b in (1,2) order by id limit 5" (also include an EXPLAIN and a
SELECT, and a PREPARE/EXECUTE variant if desired) to cover the
convertToPartialTableScan() branch and prevent regressions in
primary/table-partial handling by IndexMerge.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: 681e8948-d1b3-46a2-b1d7-eb5ffbc906b8
📒 Files selected for processing (7)
pkg/executor/index_merge_reader.gopkg/executor/mem_reader.gopkg/executor/test/indexmergereadtest/index_merge_reader_test.gopkg/planner/core/find_best_task.gopkg/planner/core/operator/physicalop/physical_index_scan.gotests/integrationtest/r/index_merge.resulttests/integrationtest/t/index_merge.test
| for _, oneAccessPath := range oneAlternative { | ||
| // Satisfying the property by a merge sort is not supported for partial paths of index merge. | ||
| if !noSortItem && matchProperty(ds, oneAccessPath, prop) != property.PropMatched { | ||
| if !noSortItem && !matchProperty(ds, oneAccessPath, prop).Matched() { | ||
| match = false |
There was a problem hiding this comment.
Preserve the exact partial-path match result instead of collapsing it to .Matched().
This drops the distinction between PropMatched and PropMatchedNeedMergeSort, but the later partial-scan conversion recovers that distinction from AccessPath.GroupedRanges / GroupByColIdxs. Those fields are mutable planner scratch state, and matchProperty() only populates them on the merge-sort path; it does not clear them on plain matches in the code shown here. If the same AccessPath is reused across property explorations, stale grouping metadata can leak into a later direct-match plan and force unnecessary grouped cop tasks / merge-sort. Please either clear the grouping fields whenever the current match is not PropMatchedNeedMergeSort, or carry the per-partial PhysicalPropMatchResult through instead of reducing it to a bool.
Also applies to: 1543-1545
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@pkg/planner/core/find_best_task.go` around lines 1433 - 1435, The code
currently reduces the detailed result from matchProperty(...) to a boolean via
.Matched(), losing the PropMatched vs PropMatchedNeedMergeSort distinction and
allowing stale AccessPath.GroupedRanges/GroupByColIdxs to leak; update the logic
in the loop that iterates oneAlternative to either (a) preserve the full
PhysicalPropMatchResult returned by matchProperty(...) and use its Matched/Kind
to decide match, or (b) after calling matchProperty(...) and finding it is NOT
PropMatchedNeedMergeSort, explicitly clear the mutable grouping fields on the
AccessPath (AccessPath.GroupedRanges and AccessPath.GroupByColIdxs) so stale
grouping metadata cannot affect subsequent direct-match plans; apply the same
fix to the similar block around match checks at the other location mentioned
(the block around lines referenced as 1543-1545).
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## master #67771 +/- ##
================================================
- Coverage 77.7059% 77.0182% -0.6877%
================================================
Files 1991 1973 -18
Lines 552094 552883 +789
================================================
- Hits 429010 425821 -3189
- Misses 122164 127055 +4891
+ Partials 920 7 -913
Flags with carried forward coverage won't be shown. Click here to find out more.
🚀 New features to boost your workflow:
|
|
/retest |
|
/retest |
There was a problem hiding this comment.
🧹 Nitpick comments (1)
tests/integrationtest/t/planner/core/grouped_ranges_order_by.test (1)
320-408: 💤 Low valueVerify integration test recording workflow.
Confirm that the
.resultfile was generated using the recording command specified indocs/agents/testing-flow.md→Integration testssection, rather than using a-recordflag. As per coding guidelines, integration tests undertests/integrationtest/**must follow the documented recording workflow.The test coverage itself is comprehensive and well-structured, covering:
- Basic IndexMerge with
ORDER BY ... LIMIT(ASC/DESC)- Prepared statements with parameterized
INvalues andLIMIT- UnionScan behavior with uncommitted rows
- Correlated subqueries with
no_decorrelate()- Partitioned table support
As per coding guidelines, integration tests in
tests/integrationtest/**must use the recording command documented indocs/agents/testing-flow.md→Integration tests(not-record).🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/integrationtest/t/planner/core/grouped_ranges_order_by.test` around lines 320 - 408, The recorded .result for the integration test grouped_ranges_order_by.test was created with the deprecated -record flag instead of the documented recording command; re-run the test recording using the exact command from docs/agents/testing-flow.md → Integration tests (not -record), regenerate the .result artifact for the test that covers the statements around stmt_im and the IndexMerge queries on t_im_in/t_im_part (including the prepared statement 'stmt_im' and the correlated subquery using no_decorrelate()), replace the existing .result with the newly recorded file, and commit the updated artifact ensuring no -record usage remains.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@tests/integrationtest/t/planner/core/grouped_ranges_order_by.test`:
- Around line 320-408: The recorded .result for the integration test
grouped_ranges_order_by.test was created with the deprecated -record flag
instead of the documented recording command; re-run the test recording using the
exact command from docs/agents/testing-flow.md → Integration tests (not
-record), regenerate the .result artifact for the test that covers the
statements around stmt_im and the IndexMerge queries on t_im_in/t_im_part
(including the prepared statement 'stmt_im' and the correlated subquery using
no_decorrelate()), replace the existing .result with the newly recorded file,
and commit the updated artifact ensuring no -record usage remains.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: 0b01707e-548a-49e6-97c6-5a20c5b8d215
📒 Files selected for processing (2)
tests/integrationtest/r/planner/core/grouped_ranges_order_by.resulttests/integrationtest/t/planner/core/grouped_ranges_order_by.test
There was a problem hiding this comment.
Pull request overview
This PR improves IndexMerge planning/execution for ORDER BY ... LIMIT queries when some partial paths contain IN ranges that are only locally ordered (per-range) but not globally ordered, by enabling merge-sort across range groups for partial paths (using the existing PropMatchedNeedMergeSort infrastructure).
Changes:
- Planner: allow
PropMatchedNeedMergeSortfor IndexMerge partial paths; propagateGroupedRanges/GroupByColIdxsfromAccessPathinto partial physical scans. - Executor: unify partial-path kv-range construction into
partialWorkerKVRangesto support partitions and grouped ranges (including UnionScan/mem readers), and rebuild grouped ranges after correlated-range rebuilds. - Tests: add/extend integration tests to validate plans and correctness across plain queries, DESC, prepared statements, UnionScan, correlated subqueries, and partitioned tables.
Reviewed changes
Copilot reviewed 9 out of 9 changed files in this pull request and generated no comments.
Show a summary per file
| File | Description |
|---|---|
| tests/integrationtest/t/planner/core/grouped_ranges_order_by.test | Adds integration coverage for IndexMerge + IN + ORDER BY LIMIT across multiple scenarios (including partition/UnionScan/correlation). |
| tests/integrationtest/t/index_merge.test | Adds targeted IndexMerge regression tests for IN-driven merge sort behavior with ORDER BY LIMIT. |
| tests/integrationtest/r/planner/core/grouped_ranges_order_by.result | Updates expected plans/results to reflect limit pushdown + ordered partial paths. |
| tests/integrationtest/r/planner/core/casetest/physicalplantest/physical_plan.result | Updates expected physical plans to reflect embedded limits + keep order on partial paths. |
| tests/integrationtest/r/index_merge.result | Updates expected output/plans for new regression test cases. |
| pkg/planner/core/operator/physicalop/physical_index_scan.go | Allows partial index scans to carry GroupedRanges/GroupByColIdxs for merge sort within a partial path. |
| pkg/planner/core/find_best_task.go | Accepts .Matched() for IndexMerge partial paths, resets grouped-range fields per matchProperty() call, and propagates grouped-range metadata into partial scans. |
| pkg/executor/mem_reader.go | Updates mem IndexMerge reader to use unified partialWorkerKVRanges (including partition handle wrapping via physical table IDs). |
| pkg/executor/index_merge_reader.go | Builds unified partialWorkerKVRanges, rebuilds grouped ranges after correlated-range rebuild, and passes grouped-range metadata to table readers. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| physTblID int64 | ||
| isCommonHdl bool | ||
| } | ||
| var tables []tblInfo |
There was a problem hiding this comment.
Can we pre-allocate the memory for the slice?
| if err != nil { | ||
| return err | ||
| } | ||
| combined := append(firstKV.FirstPartitionRange(), secondKV.FirstPartitionRange()...) |
There was a problem hiding this comment.
pre-alloc for the combined too.
There was a problem hiding this comment.
I think the current implementation is optimal. The compiler should handle this well.
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: qw4990, winoros The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
|
In response to a cherrypick label: new pull request created to branch |
… partial paths (pingcap#67771) ref pingcap#65712, close pingcap#67775
What problem does this PR solve?
Issue Number: ref #65712 close #67775
Problem Summary:
For queries like
SELECT * FROM t WHERE (a = 1 OR b IN (1,2)) ORDER BY c LIMIT 5, where indexesidx_a_c(a, c)andidx_b_c(b, c)are available:a = 1partial path onidx_a_ccan directly satisfyORDER BY c(PropMatched).b IN (1,2)partial path onidx_b_ccannot directly satisfyORDER BY c, because the ranges[1,1]and[2,2]are individually ordered bycbut not globally ordered.Previously,
PropMatchedNeedMergeSort(introduced in #62694) was rejected for IndexMerge partial paths, causing the plan to fall back to a globalTopNwithoutLimitpushdown. This could scan all matching rows (e.g., ~750K) before sorting, resulting in much worse performance.This is Solution 3 described in the issue.
What changed and how does it work?
This PR leverages the merge sort infrastructure from #62694 to support
PropMatchedNeedMergeSortfor IndexMerge partial paths. The IN condition ranges are split into groups, each producing a sorted stream, which are then merged.After this fix, the plan becomes:
matchPropForIndexMergeAlternatives()andisMatchPropForIndexMerge(): AcceptPropMatchedNeedMergeSortfrom individual partial paths by using.Matched()instead of== PropMatched. The IndexMerge-level result remainsPropMatched(the IndexMerge itself satisfies the order via its own inter-path merge sort).ConvertToPartialIndexScan()andconvertToPartialTableScan(): Remove the assertion that blockedPropMatchedNeedMergeSort, and copyGroupedRanges/GroupByColIdxsfrom theAccessPathto the physical scan when the path needs merge sort.keyRangesandpartitionKeyRangesfields with a unifiedpartialWorkerKVRanges [][]*kvRangesWithPhysicalTblID. Each entry carries aPhysicalTableIDalongside its kv ranges, unifying partition mode, grouped ranges (from IN conditions), and the combination of both. This follows the same pattern asIndexLookUpExecutor.groupedKVRangesfrom planner, executor: support access pathkeep orderwithINconditions using merge sort #62694.buildPartialWorkerKVRanges()method inOpen()builds kv ranges for each partial plan considering both partitions and grouped ranges.rebuildRangeForCorCol(): After rebuilding ranges for correlated columns, also rebuildGroupedRangesusingGroupRangesByCols().startPartialTableWorker(): PassGroupedRanges/GroupByColIdxsto theTableReaderExecutor, which already has full support for grouped ranges.memIndexMergeReader(UnionScan): Updated to use the unifiedpartialWorkerKVRangesfield, replacing the previouspartitionKVRanges/partitionTablesand directkeyRangesaccess. ThePhysicalTableIDfrom each entry is used for partition handle conversion.Check List
Tests
Side effects
Documentation
Release note
Summary by CodeRabbit