planner: support using nested IN to build IndexMerge path (#68962)#69003
planner: support using nested IN to build IndexMerge path (#68962)#69003ti-chi-bot wants to merge 1 commit into
IN to build IndexMerge path (#68962)#69003Conversation
Signed-off-by: ti-chi-bot <ti-community-prow-bot@tidb.io>
|
This cherry pick PR is for a release branch and has not yet been approved by triage owners. To merge this cherry pick:
DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
|
@time-and-fate This PR has conflicts, I have hold it. |
|
@ti-chi-bot: ## If you want to know how to resolve it, please read the guide in TiDB Dev Guide. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the ti-community-infra/tichi repository. |
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
📝 WalkthroughWalkthroughThis PR extends TiDB's IndexMerge planner to support ChangesIndexMerge IN Predicate Support
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Possibly related PRs
Suggested labels
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 inconclusive)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Warning There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure. 🔧 golangci-lint (2.12.2)Error: can't load config: unsupported version of the configuration: "" See https://golangci-lint.run/docs/product/migration-guide for migration instructions Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
@ti-chi-bot: The following test failed, say
Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In
`@tests/integrationtest/r/planner/core/casetest/physicalplantest/physical_plan.result`:
- Around line 3664-3684: The test result file physical_plan.result contains
unresolved Git conflict markers mixing 'brief' and 'plan_tree' outputs; fix by
removing the conflict markers around the two explain blocks and ensure the
inputs in physical_plan.test match the intended "explain format = 'plan_tree'
select ... order by c limit 2" and "explain format = 'plan_tree' select ...
order by b limit 2" statements, then regenerate the golden output so the
plan_tree outputs in physical_plan.result correspond exactly to those two
explain queries (verify the IndexMerge/IndexRangeScan/TopN blocks for each query
are present and consistent).
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: fabe6aa3-8ef1-49e0-a199-3213811217ed
📒 Files selected for processing (5)
pkg/planner/core/indexmerge_path.gopkg/planner/core/indexmerge_unfinished_path.gotests/integrationtest/r/planner/core/casetest/physicalplantest/physical_plan.resulttests/integrationtest/r/planner/core/indexmerge_path.resulttests/integrationtest/t/planner/core/indexmerge_path.test
| explain format = 'plan_tree' select * from t where (a = 1 or b = 2) and c in (1, 2, 3) order by c limit 2; | ||
| id task access object operator info | ||
| Limit root offset:0, count:2 | ||
| └─Projection root planner__core__casetest__physicalplantest__physical_plan.t.a, planner__core__casetest__physicalplantest__physical_plan.t.b, planner__core__casetest__physicalplantest__physical_plan.t.c | ||
| └─IndexMerge root type: union | ||
| ├─IndexRangeScan(Build) cop[tikv] table:t, index:idx(a, c) range:[1 1,1 1], [1 2,1 2], [1 3,1 3], keep order:true, stats:pseudo | ||
| ├─IndexRangeScan(Build) cop[tikv] table:t, index:idx2(b, c) range:[2 1,2 1], [2 2,2 2], [2 3,2 3], keep order:true, stats:pseudo | ||
| └─Selection(Probe) cop[tikv] in(planner__core__casetest__physicalplantest__physical_plan.t.c, 1, 2, 3) | ||
| └─TableRowIDScan cop[tikv] table:t keep order:false, stats:pseudo | ||
| show warnings; | ||
| Level Code Message | ||
| explain format = 'plan_tree' select * from t where (a = 1 or b = 2) and c in (1, 2, 3) order by b limit 2; | ||
| id task access object operator info | ||
| TopN root planner__core__casetest__physicalplantest__physical_plan.t.b, offset:0, count:2 | ||
| └─IndexMerge root type: union | ||
| ├─IndexRangeScan(Build) cop[tikv] table:t, index:idx(a, c) range:[1 1,1 1], [1 2,1 2], [1 3,1 3], keep order:false, stats:pseudo | ||
| ├─IndexRangeScan(Build) cop[tikv] table:t, index:idx2(b, c) range:[2 1,2 1], [2 2,2 2], [2 3,2 3], keep order:false, stats:pseudo | ||
| └─TopN(Probe) cop[tikv] planner__core__casetest__physicalplantest__physical_plan.t.b, offset:0, count:2 | ||
| └─Selection cop[tikv] in(planner__core__casetest__physicalplantest__physical_plan.t.c, 1, 2, 3) | ||
| └─TableRowIDScan cop[tikv] table:t keep order:false, stats:pseudo | ||
| >>>>>>> d568a8528e9 (planner: support using nested `IN` to build IndexMerge path (#68962)) |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# 1) Locate paired integration test files
fd -i 'physical_plan.test' tests/integrationtest/t
fd -i 'physical_plan.result' tests/integrationtest/r
# 2) Ensure these two queries are now plan_tree in test input
rg -nF "explain format = 'plan_tree' select * from t where (a = 1 or b = 2) and c in (1, 2, 3) order by c limit 2;" tests/integrationtest/t
rg -nF "explain format = 'plan_tree' select * from t where (a = 1 or b = 2) and c in (1, 2, 3) order by b limit 2;" tests/integrationtest/t
# 3) Sanity check old brief-format variants are not still present for the same two queries
if rg -nF "explain format = 'brief' select * from t where (a = 1 or b = 2) and c in (1, 2, 3) order by c limit 2;" tests/integrationtest/t; then
echo "Found old brief-format c-order query in .test; please confirm whether dual coverage is intentional."
fi
if rg -nF "explain format = 'brief' select * from t where (a = 1 or b = 2) and c in (1, 2, 3) order by b limit 2;" tests/integrationtest/t; then
echo "Found old brief-format b-order query in .test; please confirm whether dual coverage is intentional."
fiRepository: pingcap/tidb
Length of output: 336
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# 1) Show the exact input/output context around the reported result lines
echo "=== physicalplantest/physical_plan.result: lines 3630-3720 ==="
sed -n '3630,3720p' tests/integrationtest/r/planner/core/casetest/physicalplantest/physical_plan.result | nl -ba | sed -n '1,200p'
# 2) Search the corresponding .test file for the unique WHERE clause and order-by/limit substrings
echo "=== Search in tests/integrationtest/t/planner/core/casetest/physicalplantest/physical_plan.test ==="
rg -n --fixed-string "(a = 1 or b = 2) and c in (1, 2, 3)" tests/integrationtest/t/planner/core/casetest/physicalplantest/physical_plan.test || true
rg -n --fixed-string "order by c limit 2" tests/integrationtest/t/planner/core/casetest/physicalplantest/physical_plan.test || true
rg -n --fixed-string "order by b limit 2" tests/integrationtest/t/planner/core/casetest/physicalplantest/physical_plan.test || true
# 3) Check whether brief-format variants exist near those lines
echo "=== Search brief format occurrences in that .test file ==="
rg -n "explain format = 'brief'" tests/integrationtest/t/planner/core/casetest/physicalplantest/physical_plan.test | head -n 50 || true
# Narrow: search for brief format that also contains the WHERE clause
echo "=== brief-format occurrences with the WHERE clause ==="
rg -n "explain format = 'brief'.{0,200}\(a = 1 or b = 2\).*c in \(1, 2, 3\)" tests/integrationtest/t/planner/core/casetest/physicalplantest/physical_plan.test || trueRepository: pingcap/tidb
Length of output: 180
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Show context around the reported result lines with line numbers via awk
echo "=== physicalplantest/physical_plan.result: lines 3630-3720 ==="
sed -n '3630,3720p' tests/integrationtest/r/planner/core/casetest/physicalplantest/physical_plan.result \
| awk '{printf "%5d\t%s\n", (3630+NR-1), $0}'
echo "=== Search in tests/integrationtest/t/planner/core/casetest/physicalplantest/physical_plan.test ==="
rg -n --fixed-string "(a = 1 or b = 2) and c in (1, 2, 3)" tests/integrationtest/t/planner/core/casetest/physicalplantest/physical_plan.test || true
rg -n --fixed-string "order by c limit 2" tests/integrationtest/t/planner/core/casetest/physicalplantest/physical_plan.test || true
rg -n --fixed-string "order by b limit 2" tests/integrationtest/t/planner/core/casetest/physicalplantest/physical_plan.test || true
echo "=== brief format occurrences in that .test file ==="
rg -n "explain format = 'brief'" tests/integrationtest/t/planner/core/casetest/physicalplantest/physical_plan.test | head -n 200 || true
echo "=== brief occurrences with the WHERE clause (narrow) ==="
rg -n "explain format = 'brief'.{0,300}\(a = 1 or b = 2\).*c in \(1, 2, 3\)" tests/integrationtest/t/planner/core/casetest/physicalplantest/physical_plan.test || true
echo "=== plan_tree occurrences with the WHERE clause (narrow) ==="
rg -n "explain format = 'plan_tree'.{0,300}\(a = 1 or b = 2\).*c in \(1, 2, 3\)" tests/integrationtest/t/planner/core/casetest/physicalplantest/physical_plan.test || trueRepository: pingcap/tidb
Length of output: 36093
Fix unresolved conflict markers and align plan_tree inputs/golden outputs
tests/integrationtest/r/planner/core/casetest/physicalplantest/physical_plan.resultstill contains merge conflict markers (<<<<<<< HEAD/=======/>>>>>>> ...) around this block, mixingbriefandplan_treeoutputs (e.g.,plan_treequeries start atexplain format = 'plan_tree' ...).- The paired input
tests/integrationtest/t/planner/core/casetest/physicalplantest/physical_plan.teststill contains only theexplain format = 'brief' ... order by c/b limit 2statements for the same queries; noplan_treecounterparts were found.
Remove the conflict markers, update the .test statements to the intended plan_tree form (for both queries), and regenerate the corresponding .result.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In
`@tests/integrationtest/r/planner/core/casetest/physicalplantest/physical_plan.result`
around lines 3664 - 3684, The test result file physical_plan.result contains
unresolved Git conflict markers mixing 'brief' and 'plan_tree' outputs; fix by
removing the conflict markers around the two explain blocks and ensure the
inputs in physical_plan.test match the intended "explain format = 'plan_tree'
select ... order by c limit 2" and "explain format = 'plan_tree' select ...
order by b limit 2" statements, then regenerate the golden output so the
plan_tree outputs in physical_plan.result correspond exactly to those two
explain queries (verify the IndexMerge/IndexRangeScan/TopN blocks for each query
are present and consistent).
Source: Learnings
|
@ti-chi-bot: The following tests failed, say
Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
This is an automated cherry-pick of #68962
What problem does this PR solve?
Issue Number: ref #65822
Problem Summary:
For queries like
SELECT * FROM t1 WHERE e = 1 AND (a IN (1,2,3) OR b IN (2,3,4) OR c IN (3,4,5)), TiDB previously could not build an IndexMerge path when there are IN expressions in the nested OR list. The query would fall back to a plain IndexLookUp with a residual Selection, which is much less efficient.This is the first optimization described in the issue.
What changed and how does it work?
checkAccessFilter4IdxCol()(pkg/planner/core/indexmerge_path.go): Add support forast.Inexpressions in the non-virtual column branch. Previously onlyast.EQwas recognized, so IN expressions likea IN (1,2,3)could not be collected as partial access filters in the "gradual collection" path (case 3 ininitUnfinishedPathsFromExpr()). Now they are collected and later combined with top-level AND conditions (e.g.,e = 1) byhandleTopLevelANDList()to build valid ranges for composite indexes.eqOnNonMVColTptoeqOrInOnNonMVColTpto reflect that it now covers both EQ and IN expressions.After this fix, the plan becomes:
Check List
Tests
Side effects
Documentation
Release note
Summary by CodeRabbit
Performance Improvements
Tests