
planner: support pushing Limit and TopN to individual partial paths of IndexMerge #68772

Merged
ti-chi-bot[bot] merged 16 commits into pingcap:master from time-and-fate:2604-65712-2 on Jun 2, 2026

@time-and-fate
Member

@time-and-fate time-and-fate commented May 29, 2026

What problem does this PR solve?

Issue Number: ref #65712 close #68773

Problem Summary:

For queries like SELECT * FROM t WHERE (a = 1 OR b > 5) ORDER BY c LIMIT 5, where indexes idx_ac(a, c) and idx_bc(b, c) are available:

  • The a = 1 partial path on idx_ac can satisfy ORDER BY c (single value on the leading column a).
  • The b > 5 partial path on idx_bc cannot satisfy ORDER BY c (range scan on the leading column b).
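The difference between the two partial paths can be illustrated with a simplified order check. This is a minimal sketch, not TiDB's matchProperty: the helper satisfiesOrderBy and its inputs are hypothetical, it assumes ascending order only, and it ignores prefix indexes, NULL handling, and other details. Leading index columns pinned to a single value by an equality condition are skipped before the ORDER BY columns are matched.

```go
package main

import "fmt"

// satisfiesOrderBy reports whether scanning an index with columns indexCols,
// under equality conditions on the columns in eqCols, yields rows already
// ordered by the ascending ORDER BY columns orderBy. Hypothetical helper.
func satisfiesOrderBy(indexCols []string, eqCols map[string]bool, orderBy []string) bool {
	i := 0
	for _, col := range orderBy {
		// Skip index columns pinned to a single value: they are constant
		// within the scan, so they do not disturb the order of later columns.
		for i < len(indexCols) && indexCols[i] != col && eqCols[indexCols[i]] {
			i++
		}
		// The next usable index column must be the ORDER BY column itself.
		if i >= len(indexCols) || indexCols[i] != col {
			return false
		}
		i++
	}
	return true
}

func main() {
	// a = 1 on idx_ac(a, c): a is constant, so the scan is ordered by c.
	fmt.Println(satisfiesOrderBy([]string{"a", "c"}, map[string]bool{"a": true}, []string{"c"}))
	// b > 5 on idx_bc(b, c): a range on b means c is not globally ordered.
	fmt.Println(satisfiesOrderBy([]string{"b", "c"}, nil, []string{"c"}))
}
```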

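Previously, if not all partial paths of an IndexMerge could satisfy the ORDER BY, no Limit or TopN was pushed into the partial index scans: every partial path read all of its matching index entries, and the rows were only cut down after the union (by the probe-side and root TopN in the plan below).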

This PR implements Solution 2 described in the issue.

What changed and how does it work?

create table t(a int, b int, c int, d int, index idx_ac(a,c), index idx_bc(b,c));
explain select * from t where a = 1 or b > 5 order by c limit 3;
-- Before (global TopN, no pushdown):
+----------------------------------+---------+-----------+-----------------------------+------------------------------------------------+
| id                               | estRows | task      | access object               | operator info                                  |
+----------------------------------+---------+-----------+-----------------------------+------------------------------------------------+
| TopN_9                           | 3.00    | root      |                             | test.t.c, offset:0, count:3                    |
| └─IndexMerge_21                  | 3.00    | root      |                             | type: union                                    |
|   ├─IndexRangeScan_17(Build)     | 10.00   | cop[tikv] | table:t, index:idx_ac(a, c) | range:[1,1], keep order:false, stats:pseudo    |
|   ├─IndexRangeScan_18(Build)     | 3333.33 | cop[tikv] | table:t, index:idx_bc(b, c) | range:(5,+inf], keep order:false, stats:pseudo |
|   └─TopN_20(Probe)               | 3.00    | cop[tikv] |                             | test.t.c, offset:0, count:3                    |
|     └─TableRowIDScan_19          | 3340.00 | cop[tikv] | table:t                     | keep order:false, stats:pseudo                 |
+----------------------------------+---------+-----------+-----------------------------+------------------------------------------------+

-- After (Limit to ordered paths, TopN to others):
+-------------------------------+---------+-----------+-----------------------------+------------------------------------------------+
| id                            | estRows | task      | access object               | operator info                                  |
+-------------------------------+---------+-----------+-----------------------------+------------------------------------------------+
| TopN_9                        | 3.00    | root      |                             | test.t.c, offset:0, count:3                    |
| └─IndexMerge_25               | 3.00    | root      |                             | type: union                                    |
|   ├─Limit_22(Build)           | 3.00    | cop[tikv] |                             | offset:0, count:3                              |
|   │ └─IndexRangeScan_19       | 10.00   | cop[tikv] | table:t, index:idx_ac(a, c) | range:[1,1], keep order:false, stats:pseudo    |
|   ├─TopN_23(Build)            | 3.00    | cop[tikv] |                             | test.t.c, offset:0, count:3                    |
|   │ └─IndexRangeScan_20       | 3333.33 | cop[tikv] | table:t, index:idx_bc(b, c) | range:(5,+inf], keep order:false, stats:pseudo |
|   └─TopN_24(Probe)            | 3.00    | cop[tikv] |                             | test.t.c, offset:0, count:3                    |
|     └─TableRowIDScan_21       | 3340.00 | cop[tikv] | table:t                     | keep order:false, stats:pseudo                 |
+-------------------------------+---------+-----------+-----------------------------+------------------------------------------------+
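Cutting each partial path down to 3 rows in the After plan does not change the result: if a row is in the global top 3 (ties broken by row ID), fewer than 3 rows sort before it in the whole table, so fewer than 3 sort before it in any partial path that produces it, and it survives that path's Limit or TopN. The sketch below checks this on made-up data; the row type, the helpers, and the data are hypothetical, and the probe-side TopN is omitted.

```go
package main

import (
	"fmt"
	"sort"
)

// row is a hypothetical table row: a handle (row ID) and the ORDER BY column c.
type row struct {
	id int
	c  int
}

// less orders rows by c, breaking ties by row ID.
func less(a, b row) bool {
	if a.c != b.c {
		return a.c < b.c
	}
	return a.id < b.id
}

// topN returns the k smallest rows without modifying the input.
func topN(rows []row, k int) []row {
	out := append([]row(nil), rows...)
	sort.Slice(out, func(i, j int) bool { return less(out[i], out[j]) })
	if len(out) > k {
		out = out[:k]
	}
	return out
}

// limit keeps the first k rows of a stream that is already ordered by less.
func limit(rows []row, k int) []row {
	if len(rows) > k {
		return rows[:k]
	}
	return rows
}

// indexMergeTopN models the After plan: Limit on the ordered partial path,
// TopN on the unordered one, a union that dedups by row ID, and a root TopN.
func indexMergeTopN(orderedPath, unorderedPath []row, k int) []row {
	seen := make(map[int]bool)
	var union []row
	for _, part := range [][]row{limit(orderedPath, k), topN(unorderedPath, k)} {
		for _, r := range part {
			if !seen[r.id] {
				seen[r.id] = true
				union = append(union, r)
			}
		}
	}
	return topN(union, k)
}

func main() {
	// Rows matching a = 1, already ordered by (c, id) as idx_ac returns them.
	pathA := []row{{1, 2}, {4, 5}, {7, 9}, {9, 12}}
	// Rows matching b > 5, in no particular c order; row 4 matches both.
	pathB := []row{{4, 5}, {2, 1}, {8, 30}, {3, 7}, {6, 3}}
	fmt.Println(indexMergeTopN(pathA, pathB, 3))
}
```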
  1. A new field SortItemsHints is added to PhysicalProperty. When TopN sits directly above a DataSource, the ORDER BY columns are passed as advisory sort items through the property.
  2. DataSource uses these advisory sort items when selecting alternatives for OR-type IndexMerge, preferring partial paths that can satisfy the sort order.
  3. When not all partial paths satisfy the advisory sort order, Limit is pushed to the matching ones while TopN is pushed to the rest.
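The per-path decision in steps 2 and 3 can be sketched as follows. The names planPartialPushdown and pushdownKind are hypothetical and this is not the PR's handleAdvisorySortItemsForIndexMerge; it only mirrors the rule from the description that the mixed strategy applies when some, but not all, partial paths match the advisory sort order.

```go
package main

import "fmt"

// pushdownKind names the operator pushed onto one partial path.
type pushdownKind string

const (
	pushLimit pushdownKind = "Limit" // path already returns rows in ORDER BY order
	pushTopN  pushdownKind = "TopN"  // path must sort before cutting to N rows
)

// planPartialPushdown picks an operator per partial path from its match
// result. ok is false when the mixed strategy does not apply: when every
// path matches (the existing Limit pushdown covers it) or when none does.
func planPartialPushdown(matches []bool) (kinds []pushdownKind, ok bool) {
	matched := 0
	for _, m := range matches {
		if m {
			matched++
		}
	}
	if matched == 0 || matched == len(matches) {
		return nil, false
	}
	kinds = make([]pushdownKind, len(matches))
	for i, m := range matches {
		if m {
			kinds[i] = pushLimit
		} else {
			kinds[i] = pushTopN
		}
	}
	return kinds, true
}

func main() {
	// idx_ac (a = 1) matches ORDER BY c; idx_bc (b > 5) does not.
	fmt.Println(planPartialPushdown([]bool{true, false}))
}
```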

Key changes:

  • PhysicalProperty: new SortItemsHints field for advisory sort preferences (pkg/planner/property/physical_property.go)
  • getPhysTopN: sets SortItemsHints when the child is a DataSource (pkg/planner/core/operator/physicalop/physical_topn.go)
  • candidatePath: new matchWithAdvisorySortItems flag (true when matching used SortItemsHints) and partialPathMatchResults field storing each partial path's matchProperty result, replacing the simpler boolean satisfaction slice (pkg/planner/core/find_best_task.go)
  • CopTask: new IdxMergeMatchWithAdvisorySortItems flag and IdxMergePartPlansMatchResults field storing each partial path's matchProperty result (pkg/planner/core/operator/physicalop/task_base.go)
  • matchPropForIndexMergeAlternatives / isMatchPropForIndexMerge: collect per-path matchProperty results inline during matching rather than recomputing afterwards (pkg/planner/core/find_best_task.go)
  • convertToIndexMergeScan: passes each partial path's own match result (and the effective prop — hints-based or original) when building partial scans (pkg/planner/core/find_best_task.go)
  • attach2Task4PhysicalTopN: new handleAdvisorySortItemsForIndexMerge that pushes Limit to satisfying partial paths and TopN to others. Only applies when some (but not all) paths satisfy, since the existing Limit pushdown handles the all-satisfying case. (pkg/planner/core/task.go)
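One reading of the getPhysTopN change is sketched below. The types are hypothetical stand-ins for TiDB's ByItems and PhysicalProperty, and the all-or-nothing rule (no hints at all if any ORDER BY item is not a plain column) is an assumption; the description only says the ORDER BY columns are passed as advisory sort items when TopN sits directly above a DataSource. Keeping hints in a separate field from the required sort items leaves the child free to ignore them.

```go
package main

import "fmt"

// byItem is a hypothetical ORDER BY item; column is empty when the item is
// a computed expression rather than a plain column reference.
type byItem struct {
	column string
	desc   bool
}

// sortItem is a hypothetical property sort item.
type sortItem struct {
	column string
	desc   bool
}

// physicalProperty separates hard requirements from advisory hints.
type physicalProperty struct {
	sortItems      []sortItem // the child must return rows in this order
	sortItemsHints []sortItem // the child may use this order but need not
}

// childPropForTopN builds the property a TopN requests from its child.
// Assumption: hints are all-or-nothing, so one expression item disables them.
func childPropForTopN(items []byItem, childIsDataSource bool) physicalProperty {
	var prop physicalProperty
	if !childIsDataSource {
		return prop
	}
	hints := make([]sortItem, 0, len(items))
	for _, it := range items {
		if it.column == "" {
			return prop // an expression cannot be matched against an index
		}
		hints = append(hints, sortItem{column: it.column, desc: it.desc})
	}
	prop.sortItemsHints = hints
	return prop
}

func main() {
	// ORDER BY c over a DataSource: no hard order, one advisory hint on c.
	fmt.Printf("%+v\n", childPropForTopN([]byItem{{column: "c"}}, true))
}
```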

Check List

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)
  • No need to test
    • I checked and no code files have been changed.

Side effects

  • Performance regression: Consumes more CPU
  • Performance regression: Consumes more Memory
  • Breaking backward compatibility

Documentation

  • Affects user behaviors
  • Contains syntax changes
  • Contains variable changes
  • Contains experimental features
  • Changes MySQL compatibility

Release note

None

…f IndexMerge using SortItemsHints

Introduce SortItemsHints on PhysicalProperty so that when TopN sits directly
above a DataSource, the ORDER BY columns can be passed as soft hints. DataSource
uses these hints when selecting alternatives for OR-type IndexMerge, preferring
partial paths that can satisfy the sort order. When not all partial paths satisfy
the hints, Limit is pushed to the ordered ones while TopN is pushed to the rest.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@ti-chi-bot ti-chi-bot Bot added the release-note-none Denotes a PR that doesn't merit a release note. label May 29, 2026
@pantheon-ai

pantheon-ai Bot commented May 29, 2026

@time-and-fate I've received your pull request and will start the review. I'll conduct a thorough review covering code quality, potential issues, and implementation details.

⏳ This process typically takes 10-30 minutes depending on the complexity of the changes.


@ti-chi-bot ti-chi-bot Bot added needs-cherry-pick-release-8.5 Should cherry pick this PR to release-8.5 branch. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. sig/planner SIG: Planner labels May 29, 2026
@coderabbitai

coderabbitai Bot commented May 29, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.


Walkthrough

Adds a soft sort-hints flow: TopN extracts simple-column sort hints into PhysicalProperty.SortItemsHints; the index-merge matcher records per-partial-path hint satisfaction on candidatePath and CopTask; attach2Task4PhysicalTopN uses that info to push Limit to hint-satisfying partial readers and TopN to others. Tests and expected outputs updated.

Changes

Index-Merge Soft Sort Hints

  • SortItemsHints field and property methods (pkg/planner/property/physical_property.go): PhysicalProperty gains SortItemsHints []SortItem; HashCode(), CloneEssentialFields(), and MemoryUsage() include hints.
  • TopN hint extraction and propagation (pkg/planner/core/operator/physicalop/physical_topn.go): getPhysTopN builds sortItemsHints from simple-column ByItems when the child is a DataSource and sets PhysicalProperty.SortItemsHints.
  • candidatePath hint satisfaction recording (pkg/planner/core/find_best_task.go): candidatePath adds sortItemsHintsSatisfied []bool to cache per-partial-path satisfaction.
  • Index-merge property matching and hint satisfaction (pkg/planner/core/find_best_task.go): matchPropForIndexMergeAlternatives now returns the chosen access path plus a per-branch hint-satisfaction []bool, and uses SortItemsHints as a soft criterion when SortItems is empty; results flow through convergeIndexMergeCandidate and getIndexMergeCandidate.
  • CopTask field and hint satisfaction recording (pkg/planner/core/operator/physicalop/task_base.go, pkg/planner/core/find_best_task.go): adds CopTask.IdxMergePartPlansSatisfySortHints []bool and sets it from candidate data in convertToIndexMergeScan.
  • Hint-driven Limit vs TopN planning (pkg/planner/core/task.go): attach2Task4PhysicalTopN detects partially satisfied hints for IndexMerge and delegates to handleSortItemsHintsForIndexMerge, which pushes PhysicalLimit to hint-matching partial readers and conditionally pushes TopN to the others.
  • Integration tests and expected outputs (tests/integrationtest/t/planner/core/indexmerge_path.test, tests/integrationtest/r/planner/core/indexmerge_path.result, pkg/planner/cardinality/testdata/cardinality_suite_out.json): adds TestIndexMergeSortItemsHints and updates expected explain-tree outputs to reflect Limit wrappers on hint-satisfying branches.
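The walkthrough notes that HashCode() includes the hints. A plausible reason, assumed here rather than taken from the PR, is that the optimizer caches the best task per requested property, so a request carrying hints must not share a key with one that carries none, or with one that requires the same columns as hard sort items. A minimal sketch with hypothetical types and an illustrative encoding (encoding/binary.AppendVarint needs Go 1.19 or later):

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// sortItem is a hypothetical sort item identified by a column ID.
type sortItem struct {
	colID int64
	desc  bool
}

// physicalProperty is a hypothetical stand-in for PhysicalProperty.
type physicalProperty struct {
	sortItems      []sortItem
	sortItemsHints []sortItem
}

// appendItems writes a length-prefixed list, so moving an item from one list
// to the other always changes the encoding.
func appendItems(buf []byte, items []sortItem) []byte {
	buf = binary.AppendVarint(buf, int64(len(items)))
	for _, it := range items {
		buf = binary.AppendVarint(buf, it.colID)
		if it.desc {
			buf = append(buf, 1)
		} else {
			buf = append(buf, 0)
		}
	}
	return buf
}

// hashCode encodes both the required sort items and the advisory hints.
func (p *physicalProperty) hashCode() []byte {
	return appendItems(appendItems(nil, p.sortItems), p.sortItemsHints)
}

func main() {
	required := physicalProperty{sortItems: []sortItem{{colID: 3}}}
	hinted := physicalProperty{sortItemsHints: []sortItem{{colID: 3}}}
	// Different keys, so a cached best task for one is not reused for the other.
	fmt.Println(string(required.hashCode()) != string(hinted.hashCode()))
}
```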

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

  • pingcap/tidb#67771: Modifies overlapping IndexMerge planning logic in find_best_task.go, including property-matching and partial-path ordering.

Suggested reviewers

  • qw4990
  • winoros
  • terry1purcell

Poem

🐰
I sniff the hints in TopN's trail,
Some paths leap forward, others pale,
Limits tuck tidy rows in tight,
TopN chases where hints take flight,
Merge them gentle—hop, perfect right.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage: ⚠️ Warning. Docstring coverage is 46.15%, which is insufficient; the required threshold is 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

  • Title check: ✅ Passed. The pull request title accurately describes the main change: adding support for pushing Limit and TopN to individual partial paths of IndexMerge, which is the core feature implemented across the changeset.
  • Linked Issues check: ✅ Passed. The PR implements Solution 2 from issue #65712 as specified, with all key objectives met: SortItemsHints propagation, IndexMerge alternative selection based on hints, mixed Limit/TopN pushdown for partial paths, and integration tests.
  • Out of Scope Changes check: ✅ Passed. All code changes directly support the core objective of enabling Limit and TopN pushdown to IndexMerge partial paths. Changes to cardinality test output are expected side effects of the optimization strategy.
  • Description check: ✅ Passed. The PR description is comprehensive and well-structured. It clearly outlines the problem, the solution approach, specific code changes, test coverage, and includes SQL examples demonstrating the improvement.


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
pkg/planner/core/operator/physicalop/task_base.go (1)

462-495: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Update MemoryUsage to account for the new IdxMergePartPlansSatisfySortHints field.

The MemoryUsage() method does not account for the newly added IdxMergePartPlansSatisfySortHints slice field. This causes the memory usage to be underreported.

📊 Proposed fix to update memory calculation
-	sum = size.SizeOfInterface*(2+int64(cap(t.IdxMergePartPlans)+cap(t.RootTaskConds))) + size.SizeOfBool*3 + size.SizeOfUint64 +
-		size.SizeOfPointer*(3+int64(cap(t.CommonHandleCols)+cap(t.TblCols))) + size.SizeOfSlice*4 + t.PhysPlanPartInfo.MemoryUsage()
+	sum = size.SizeOfInterface*(2+int64(cap(t.IdxMergePartPlans)+cap(t.RootTaskConds))) + size.SizeOfBool*3 + size.SizeOfUint64 +
+		size.SizeOfPointer*(3+int64(cap(t.CommonHandleCols)+cap(t.TblCols))) + size.SizeOfSlice*5 + 
+		size.SizeOfBool*int64(cap(t.IdxMergePartPlansSatisfySortHints)) + t.PhysPlanPartInfo.MemoryUsage()
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/planner/core/operator/physicalop/task_base.go` around lines 462 - 495,
The MemoryUsage method on CopTask omits the new
IdxMergePartPlansSatisfySortHints slice; update CopTask.MemoryUsage to include
the slice header and elements: add the slice header size (size.SizeOfSlice) and
account for its capacity in the initial capacity sum (similar to how
IdxMergePartPlans or RootTaskConds are counted), and then iterate over
t.IdxMergePartPlansSatisfySortHints to add each element's memory (call
element.MemoryUsage() if elements are structs with MemoryUsage, otherwise add
the appropriate primitive size such as size.SizeOfBool). Ensure the field name
IdxMergePartPlansSatisfySortHints and function CopTask.MemoryUsage are the
referenced locations to modify.
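The accounting rule behind the suggested MemoryUsage fix can be sketched in isolation: each slice field contributes one slice header, plus its backing array sized by capacity. The struct below is a hypothetical stand-in for CopTask, and unsafe.Sizeof stands in for TiDB's size.SizeOf* constants.

```go
package main

import (
	"fmt"
	"unsafe"
)

// copTaskLite is a hypothetical stand-in for CopTask with two slice fields.
type copTaskLite struct {
	partPlans    []any  // plays the role of IdxMergePartPlans
	satisfyHints []bool // plays the role of IdxMergePartPlansSatisfySortHints
}

// Stand-ins for TiDB's size.SizeOf* constants.
var (
	sizeOfSlice     = int64(unsafe.Sizeof([]bool(nil)))
	sizeOfInterface = int64(unsafe.Sizeof(any(nil)))
	sizeOfBool      = int64(unsafe.Sizeof(false))
)

// memoryUsage counts one header per slice field plus each backing array by
// capacity, since the whole allocation is retained, not just len elements.
func (t *copTaskLite) memoryUsage() int64 {
	return sizeOfSlice*2 +
		sizeOfInterface*int64(cap(t.partPlans)) +
		sizeOfBool*int64(cap(t.satisfyHints))
}

func main() {
	task := &copTaskLite{partPlans: make([]any, 2, 4), satisfyHints: make([]bool, 2, 4)}
	fmt.Println(task.memoryUsage())
}
```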
🧹 Nitpick comments (1)
pkg/planner/core/operator/physicalop/task_base.go (1)

415-418: 💤 Low value

Consider enforcing the documented length invariant.

The documentation states "Length equals len(IdxMergePartPlans)," but downstream code in task.go uses defensive bounds checking (i < len(copTask.IdxMergePartPlansSatisfySortHints)) before accessing elements. This suggests the invariant may not always hold or is not enforced at construction time.

Consider either:

  1. Adding a check when the field is set to ensure lengths match, or
  2. Updating the documentation to clarify that the lengths may differ in edge cases

Based on context snippet 3 from pkg/planner/core/task.go:1458-1494, which uses bounds-checked access patterns.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/planner/core/operator/physicalop/task_base.go` around lines 415 - 418,
The field IdxMergePartPlansSatisfySortHints claims its length equals
len(IdxMergePartPlans) but callers (e.g., task.go using
copTask.IdxMergePartPlansSatisfySortHints) defensively check bounds; enforce the
invariant at assignment: in convertToIndexMergeScan where
IdxMergePartPlansSatisfySortHints is set, ensure its slice is resized to exactly
len(IdxMergePartPlans) (truncate if longer, append false values if shorter) so
downstream code can assume equal length; alternatively, if a resize is
inappropriate, update the documentation comment on
IdxMergePartPlansSatisfySortHints to state lengths may differ and callers must
bounds-check.
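If the invariant were enforced at assignment time, as the first option suggests, the resize could look like the sketch below (normalizeLen is a hypothetical helper). Padding with false is the conservative default: false routes a partial path to TopN, which is correct regardless of order, while true would push a Limit that is only correct when the scan really is ordered.

```go
package main

import "fmt"

// normalizeLen returns a copy of flags resized to exactly n entries:
// extra entries are dropped and missing ones default to false.
func normalizeLen(flags []bool, n int) []bool {
	out := make([]bool, n)
	copy(out, flags) // copies min(len(out), len(flags)) elements
	return out
}

func main() {
	fmt.Println(normalizeLen([]bool{true}, 3))
	fmt.Println(normalizeLen([]bool{true, false, true}, 2))
}
```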
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@pkg/planner/core/task.go`:
- Around line 1459-1472: When creating the pushedDownLimit
(physicalop.PhysicalLimit) for hint-satisfied partial plans in the loop over
copTask.IdxMergePartPlans, preserve the partitioning by copying the PartitionBy
from the source partialPlan into pushedDownLimit (e.g. call the appropriate
setter or assign PartitionBy using
partialPlan.PartitionBy()/partialPlan.SchemaPartition info) before assigning it
into copTask.IdxMergePartPlans[i]; apply the same copy to the other similar
branch around the lines handling root TopN for partitioned flows (the other
pushedDownLimit creation at lines ~1488-1490) so partition semantics are not
lost.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: d923f04a-b5d2-4843-805b-26c9d1562ad3

📥 Commits

Reviewing files that changed from the base of the PR and between 1c3f9eb and e1e457f.

📒 Files selected for processing (7)
  • pkg/planner/core/find_best_task.go
  • pkg/planner/core/operator/physicalop/physical_topn.go
  • pkg/planner/core/operator/physicalop/task_base.go
  • pkg/planner/core/task.go
  • pkg/planner/property/physical_property.go
  • tests/integrationtest/r/planner/core/indexmerge_path.result
  • tests/integrationtest/t/planner/core/indexmerge_path.test

Comment thread pkg/planner/core/task.go
@codecov

codecov Bot commented May 29, 2026

Codecov Report

❌ Patch coverage is 88.61386% with 23 lines in your changes missing coverage. Please review.
✅ Project coverage is 75.3037%. Comparing base (a9add5c) to head (11c60c7).
⚠️ Report is 3 commits behind head on master.

Additional details and impacted files
@@               Coverage Diff                @@
##             master     #68772        +/-   ##
================================================
- Coverage   76.3085%   75.3037%   -1.0049%     
================================================
  Files          2041       2025        -16     
  Lines        563262     567579      +4317     
================================================
- Hits         429817     427408      -2409     
- Misses       132529     140138      +7609     
+ Partials        916         33       -883     
Flag          Coverage Δ
integration   41.3125% <88.6138%> (+1.5340%) ⬆️


Components    Coverage Δ
dumpling      60.4610% <ø> (ø)
parser        ∅ <ø> (∅)
br            49.8023% <ø> (-13.0287%) ⬇️

@ti-chi-bot ti-chi-bot Bot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels May 30, 2026
Comment thread pkg/planner/core/operator/physicalop/physical_topn.go Outdated
Comment thread pkg/planner/core/find_best_task.go
Member

@winoros winoros left a comment


No more comments, except for the mentioned naming issue.

@ti-chi-bot ti-chi-bot Bot added approved needs-1-more-lgtm Indicates a PR needs 1 more LGTM. labels Jun 1, 2026
@ti-chi-bot ti-chi-bot Bot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. and removed size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Jun 1, 2026
@qw4990
Contributor

qw4990 commented Jun 2, 2026

/retest

@ti-chi-bot

ti-chi-bot Bot commented Jun 2, 2026

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: qw4990, winoros


@ti-chi-bot ti-chi-bot Bot added lgtm and removed needs-1-more-lgtm Indicates a PR needs 1 more LGTM. labels Jun 2, 2026
@ti-chi-bot

ti-chi-bot Bot commented Jun 2, 2026

[LGTM Timeline notifier]

Timeline:

  • 2026-06-01 14:47:30.540844079 +0000 UTC m=+193751.611161469: ☑️ agreed by winoros.
  • 2026-06-02 01:36:22.245642089 +0000 UTC m=+232683.315959479: ☑️ agreed by qw4990.

@ti-chi-bot ti-chi-bot Bot merged commit 2310c3d into pingcap:master Jun 2, 2026
36 checks passed
@ti-chi-bot
Member

In response to a cherrypick label: new pull request created to branch release-8.5: #68858.
But this PR has conflicts, please resolve them!


Labels

approved lgtm needs-cherry-pick-release-8.5 Should cherry pick this PR to release-8.5 branch. release-note-none Denotes a PR that doesn't merit a release note. sig/planner SIG: Planner size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.


Development

Successfully merging this pull request may close these issues.

Support pushing Limit and TopN to individual partial paths of IndexMerge

4 participants