
planner: support using nested IN to build IndexMerge path (#68962) #69003

Open
ti-chi-bot wants to merge 1 commit into pingcap:release-8.5 from ti-chi-bot:cherry-pick-68962-to-release-8.5

Conversation

@ti-chi-bot
Member

@ti-chi-bot ti-chi-bot commented Jun 6, 2026

This is an automated cherry-pick of #68962

What problem does this PR solve?

Issue Number: ref #65822

Problem Summary:

For queries like SELECT * FROM t1 WHERE e = 1 AND (a IN (1,2,3) OR b IN (2,3,4) OR c IN (3,4,5)), TiDB previously could not build an IndexMerge path when there are IN expressions in the nested OR list. The query would fall back to a plain IndexLookUp with a residual Selection, which is much less efficient.

This is the first optimization described in the issue.
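The rewrite relies on distributing the top-level `e = 1` over the OR list, so each branch can be served by its own index and the row IDs unioned. The following is a minimal sketch of that equivalence only; the `row` type and the helper names `oneOf`, `matches`, and `unionOfPartialPaths` are hypothetical and are not part of TiDB.

```go
package main

import (
	"fmt"
	"sort"
)

// row is a hypothetical in-memory stand-in for a row of t1.
type row struct{ id, a, b, c, e int64 }

// oneOf reports whether v equals any value in set (the IN predicate).
func oneOf(v int64, set ...int64) bool {
	for _, s := range set {
		if v == s {
			return true
		}
	}
	return false
}

// matches evaluates the original predicate:
// e = 1 AND (a IN (1,2,3) OR b IN (2,3,4) OR c IN (3,4,5)).
func matches(r row) bool {
	return r.e == 1 && (oneOf(r.a, 1, 2, 3) || oneOf(r.b, 2, 3, 4) || oneOf(r.c, 3, 4, 5))
}

// unionOfPartialPaths evaluates one partial predicate per OR branch, each
// combined with e = 1, and returns the deduplicated, sorted union of row IDs.
func unionOfPartialPaths(rows []row) []int64 {
	partial := []func(row) bool{
		func(r row) bool { return r.e == 1 && oneOf(r.a, 1, 2, 3) },
		func(r row) bool { return r.e == 1 && oneOf(r.b, 2, 3, 4) },
		func(r row) bool { return r.e == 1 && oneOf(r.c, 3, 4, 5) },
	}
	seen := map[int64]bool{}
	var ids []int64
	for _, p := range partial {
		for _, r := range rows {
			if p(r) && !seen[r.id] {
				seen[r.id] = true
				ids = append(ids, r.id)
			}
		}
	}
	sort.Slice(ids, func(i, j int) bool { return ids[i] < ids[j] })
	return ids
}

func main() {
	rows := []row{
		{id: 1, e: 1, a: 1},
		{id: 2, e: 1, a: 9, b: 3, c: 3}, // hit by two branches, returned once
		{id: 3, e: 2, a: 1},             // fails e = 1
	}
	fmt.Println(unionOfPartialPaths(rows))
}
```

A row that satisfies several branches appears only once in the result, which is why the union step must deduplicate by row ID.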

What changed and how does it work?

  • planner
    • checkAccessFilter4IdxCol() (pkg/planner/core/indexmerge_path.go): Add support for ast.In expressions in the non-virtual column branch. Previously only ast.EQ was recognized, so IN expressions like a IN (1,2,3) could not be collected as partial access filters in the "gradual collection" path (case 3 in initUnfinishedPathsFromExpr()). Now they are collected and later combined with top-level AND conditions (e.g., e = 1) by handleTopLevelANDList() to build valid ranges for composite indexes.
    • Rename eqOnNonMVColTp to eqOrInOnNonMVColTp to reflect that it now covers both EQ and IN expressions.
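The shape check described above can be sketched roughly as follows. This is a simplified illustration, not the code in `indexmerge_path.go`: the `Expr`, `Column`, `Constant`, and `ScalarFunc` types and the function `isEqOrInOnColumn` are hypothetical stand-ins for the planner's expression types, and the sketch assumes the column must be the first argument and every remaining argument a constant.

```go
package main

import "fmt"

// Hypothetical, simplified expression model (not TiDB's expression package).
type Expr interface{}

type Column struct{ Name string }

type Constant struct{ Value int64 }

type ScalarFunc struct {
	Op   string // "eq" or "in" in this sketch
	Args []Expr
}

// isEqOrInOnColumn reports whether e has the shape `col = const` or
// `col IN (const, ...)` for the given column name.
func isEqOrInOnColumn(e Expr, col string) bool {
	f, ok := e.(*ScalarFunc)
	if !ok {
		return false
	}
	switch f.Op {
	case "eq":
		if len(f.Args) != 2 {
			return false
		}
	case "in":
		if len(f.Args) < 2 {
			return false
		}
	default:
		return false
	}
	// The first argument must be the target column.
	c, ok := f.Args[0].(*Column)
	if !ok || c.Name != col {
		return false
	}
	// Every remaining argument must be a constant.
	for _, arg := range f.Args[1:] {
		if _, ok := arg.(*Constant); !ok {
			return false
		}
	}
	return true
}

func main() {
	aIn := &ScalarFunc{Op: "in", Args: []Expr{
		&Column{Name: "a"}, &Constant{Value: 1}, &Constant{Value: 2}, &Constant{Value: 3},
	}}
	fmt.Println(isEqOrInOnColumn(aIn, "a")) // true
	fmt.Println(isEqOrInOnColumn(aIn, "b")) // false: different column
}
```

Treating both shapes as one category mirrors the rename to `eqOrInOnNonMVColTp`: downstream code only needs to know that the filter pins the column to a finite set of constants.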

After this fix, the plan becomes:

IndexMerge
├─IndexRangeScan  index:iea(e, a)  range:[1 1,1 1], [1 2,1 2], [1 3,1 3]
├─IndexRangeScan  index:ieb(e, b)  range:[1 2,1 2], [1 3,1 3], [1 4,1 4]
├─IndexRangeScan  index:iec(e, c)  range:[1 3,1 3], [1 4,1 4], [1 5,1 5]
└─TableRowIDScan(Probe)

Check List

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)
  • No need to test
    • I checked and no code files have been changed.

Side effects

  • Performance regression: Consumes more CPU
  • Performance regression: Consumes more Memory
  • Breaking backward compatibility

Documentation

  • Affects user behaviors
  • Contains syntax changes
  • Contains variable changes
  • Contains experimental features
  • Changes MySQL compatibility

Release note

None

Summary by CodeRabbit

  • Performance Improvements

    • Enhanced query optimization for statements using IN predicates with index merging, enabling more efficient index selection for complex query patterns.
  • Tests

    • Added comprehensive test coverage for advanced index merge scenarios with IN expressions.

Signed-off-by: ti-chi-bot <ti-community-prow-bot@tidb.io>
@ti-chi-bot ti-chi-bot added the following labels on Jun 6, 2026:

  • do-not-merge/hold: Indicates that a PR should not merge because someone has issued a /hold command.
  • release-note-none: Denotes a PR that doesn't merit a release note.
  • sig/planner: SIG: Planner
  • size/M: Denotes a PR that changes 30-99 lines, ignoring generated files.
  • type/cherry-pick-for-release-8.5: This PR is cherry-picked to release-8.5 from a source PR.
@ti-chi-bot

ti-chi-bot Bot commented Jun 6, 2026

This cherry pick PR is for a release branch and has not yet been approved by triage owners.
Adding the do-not-merge/cherry-pick-not-approved label.

To merge this cherry pick:

  1. It must first be LGTMed and approved by the reviewers.
  2. For pull requests to TiDB-x branches, it must have no failed tests.
  3. AFTER it has lgtm and approved labels, please wait for the cherry-pick merging approval from triage owners.
Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@ti-chi-bot
Member Author

@time-and-fate This PR has conflicts, so I have put it on hold.
Please resolve them or ask others to resolve them, then comment /unhold to remove the hold label.

@ti-chi-bot

ti-chi-bot Bot commented Jun 6, 2026

@ti-chi-bot: If you want to know how to resolve the conflicts, please read the guide in the TiDB Dev Guide.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the ti-community-infra/tichi repository.

@ti-chi-bot ti-chi-bot Bot added size/XL (Denotes a PR that changes 500-999 lines, ignoring generated files.) and removed size/M (Denotes a PR that changes 30-99 lines, ignoring generated files.) labels Jun 6, 2026
@ti-chi-bot

ti-chi-bot Bot commented Jun 6, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign qw4990 for approval. For more information see the Code Review Process.
Please ensure that each of them provides their approval before proceeding.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@coderabbitai

coderabbitai Bot commented Jun 6, 2026

Walkthrough

This PR extends TiDB's IndexMerge planner to support IN predicates with constant values as access filters on non-MV columns, treating them equivalently to equality predicates. The change introduces a new combined access-filter type, updates predicate validation logic, modifies filter collection in unfinished-path construction, and adds comprehensive test coverage.

Changes

IndexMerge IN Predicate Support

| Layer / File(s) | Summary |
| --- | --- |
| Combined eqOrIn access-filter type definition and validation (pkg/planner/core/indexmerge_path.go) | New eqOrInOnNonMVColTp constant represents access predicates that are either = or IN (...) on non-MV columns. checkAccessFilter4IdxCol now validates IN expressions where the left argument is the target column and all right arguments are constants, returning the combined type. Both EQ and IN cases return eqOrInOnNonMVColTp. CollectFilters4MVIndex propagates this combined type when overwriting accessTp. |
| Filter collection in unfinished-path builder (pkg/planner/core/indexmerge_unfinished_path.go) | initUnfinishedPathsFromExpr updated to accept eqOrInOnNonMVColTp predicates and route qualifying CNF items into usableFilters with idxColHasUsableFilter tracking instead of the prior accessFilters/idxColHasAccessFilter behavior. |
| Test cases and expected outputs (tests/integrationtest/t/planner/core/indexmerge_path.test, tests/integrationtest/r/planner/core/indexmerge_path.result, tests/integrationtest/r/planner/core/casetest/physicalplantest/physical_plan.result) | New test cases cover Issue #58361 (AND+OR filter combinations), MV-index OR with ORDER BY/LIMIT, sort-item hint interactions, and nested IN expressions within OR lists (Issue #68962). Expected outputs for c in (1, 2, 3) queries with ORDER BY and LIMIT updated to plan-tree format, showing detailed operator structure including IndexMerge, IndexRangeScan, Selection, and TopN nodes. |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

  • pingcap/tidb#68753: Directly aligned with this PR's IN predicate handling for IndexMerge merge-sort and order-by partial paths.

Suggested labels

size/M, cherry-pick-approved, ok-to-test

Suggested reviewers

  • terry1purcell
  • qw4990

Poem

🐰 Nested IN predicates now bloom,
IndexMerge plans light up the room,
Constants validate with care,
Filter types beyond compare,
Access paths take to the air!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 inconclusive)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Description check | ❓ Inconclusive | The PR description is mostly complete, covering the problem statement, solution details, and test coverage. However, it has merge-conflict markers present in the test result files mentioned in the changeset. | Resolve the merge-conflict markers found in test files (<<<<<<< HEAD, >>>>>>> markers in physical_plan.result and indexmerge_path.result/test) before merging to ensure clean test outputs. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The PR title clearly and specifically describes the main change: adding support for nested IN expressions in IndexMerge path planning, directly addressing issue #68962. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 golangci-lint (2.12.2)

Error: can't load config: unsupported version of the configuration: "" See https://golangci-lint.run/docs/product/migration-guide for migration instructions
The command is terminated due to an error: can't load config: unsupported version of the configuration: "" See https://golangci-lint.run/docs/product/migration-guide for migration instructions


@tiprow

tiprow Bot commented Jun 6, 2026

@ti-chi-bot: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| fast_test_tiprow_for_release | 05b393b | link | true | /test fast_test_tiprow_for_release |

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.


@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In
`@tests/integrationtest/r/planner/core/casetest/physicalplantest/physical_plan.result`:
- Around line 3664-3684: The test result file physical_plan.result contains
unresolved Git conflict markers mixing 'brief' and 'plan_tree' outputs; fix by
removing the conflict markers around the two explain blocks and ensure the
inputs in physical_plan.test match the intended "explain format = 'plan_tree'
select ... order by c limit 2" and "explain format = 'plan_tree' select ...
order by b limit 2" statements, then regenerate the golden output so the
plan_tree outputs in physical_plan.result correspond exactly to those two
explain queries (verify the IndexMerge/IndexRangeScan/TopN blocks for each query
are present and consistent).
ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: fabe6aa3-8ef1-49e0-a199-3213811217ed

📥 Commits

Reviewing files that changed from the base of the PR and between de0a5bc and 05b393b.

📒 Files selected for processing (5)
  • pkg/planner/core/indexmerge_path.go
  • pkg/planner/core/indexmerge_unfinished_path.go
  • tests/integrationtest/r/planner/core/casetest/physicalplantest/physical_plan.result
  • tests/integrationtest/r/planner/core/indexmerge_path.result
  • tests/integrationtest/t/planner/core/indexmerge_path.test

Comment on lines +3664 to +3684
explain format = 'plan_tree' select * from t where (a = 1 or b = 2) and c in (1, 2, 3) order by c limit 2;
id task access object operator info
Limit root offset:0, count:2
└─Projection root planner__core__casetest__physicalplantest__physical_plan.t.a, planner__core__casetest__physicalplantest__physical_plan.t.b, planner__core__casetest__physicalplantest__physical_plan.t.c
  └─IndexMerge root type: union
    ├─IndexRangeScan(Build) cop[tikv] table:t, index:idx(a, c) range:[1 1,1 1], [1 2,1 2], [1 3,1 3], keep order:true, stats:pseudo
    ├─IndexRangeScan(Build) cop[tikv] table:t, index:idx2(b, c) range:[2 1,2 1], [2 2,2 2], [2 3,2 3], keep order:true, stats:pseudo
    └─Selection(Probe) cop[tikv] in(planner__core__casetest__physicalplantest__physical_plan.t.c, 1, 2, 3)
      └─TableRowIDScan cop[tikv] table:t keep order:false, stats:pseudo
show warnings;
Level Code Message
explain format = 'plan_tree' select * from t where (a = 1 or b = 2) and c in (1, 2, 3) order by b limit 2;
id task access object operator info
TopN root planner__core__casetest__physicalplantest__physical_plan.t.b, offset:0, count:2
└─IndexMerge root type: union
  ├─IndexRangeScan(Build) cop[tikv] table:t, index:idx(a, c) range:[1 1,1 1], [1 2,1 2], [1 3,1 3], keep order:false, stats:pseudo
  ├─IndexRangeScan(Build) cop[tikv] table:t, index:idx2(b, c) range:[2 1,2 1], [2 2,2 2], [2 3,2 3], keep order:false, stats:pseudo
  └─TopN(Probe) cop[tikv] planner__core__casetest__physicalplantest__physical_plan.t.b, offset:0, count:2
    └─Selection cop[tikv] in(planner__core__casetest__physicalplantest__physical_plan.t.c, 1, 2, 3)
      └─TableRowIDScan cop[tikv] table:t keep order:false, stats:pseudo
>>>>>>> d568a8528e9 (planner: support using nested `IN` to build IndexMerge path (#68962))

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

# 1) Locate paired integration test files
fd -i 'physical_plan.test' tests/integrationtest/t
fd -i 'physical_plan.result' tests/integrationtest/r

# 2) Ensure these two queries are now plan_tree in test input
rg -nF "explain format = 'plan_tree' select * from t where (a = 1 or b = 2) and c in (1, 2, 3) order by c limit 2;" tests/integrationtest/t
rg -nF "explain format = 'plan_tree' select * from t where (a = 1 or b = 2) and c in (1, 2, 3) order by b limit 2;" tests/integrationtest/t

# 3) Sanity check old brief-format variants are not still present for the same two queries
if rg -nF "explain format = 'brief' select * from t where (a = 1 or b = 2) and c in (1, 2, 3) order by c limit 2;" tests/integrationtest/t; then
  echo "Found old brief-format c-order query in .test; please confirm whether dual coverage is intentional."
fi
if rg -nF "explain format = 'brief' select * from t where (a = 1 or b = 2) and c in (1, 2, 3) order by b limit 2;" tests/integrationtest/t; then
  echo "Found old brief-format b-order query in .test; please confirm whether dual coverage is intentional."
fi

Repository: pingcap/tidb

Length of output: 336


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# 1) Show the exact input/output context around the reported result lines
echo "=== physicalplantest/physical_plan.result: lines 3630-3720 ==="
sed -n '3630,3720p' tests/integrationtest/r/planner/core/casetest/physicalplantest/physical_plan.result | nl -ba | sed -n '1,200p'

# 2) Search the corresponding .test file for the unique WHERE clause and order-by/limit substrings
echo "=== Search in tests/integrationtest/t/planner/core/casetest/physicalplantest/physical_plan.test ==="
rg -n --fixed-string "(a = 1 or b = 2) and c in (1, 2, 3)" tests/integrationtest/t/planner/core/casetest/physicalplantest/physical_plan.test || true
rg -n --fixed-string "order by c limit 2" tests/integrationtest/t/planner/core/casetest/physicalplantest/physical_plan.test || true
rg -n --fixed-string "order by b limit 2" tests/integrationtest/t/planner/core/casetest/physicalplantest/physical_plan.test || true

# 3) Check whether brief-format variants exist near those lines
echo "=== Search brief format occurrences in that .test file ==="
rg -n "explain format = 'brief'" tests/integrationtest/t/planner/core/casetest/physicalplantest/physical_plan.test | head -n 50 || true

# Narrow: search for brief format that also contains the WHERE clause
echo "=== brief-format occurrences with the WHERE clause ==="
rg -n "explain format = 'brief'.{0,200}\(a = 1 or b = 2\).*c in \(1, 2, 3\)" tests/integrationtest/t/planner/core/casetest/physicalplantest/physical_plan.test || true

Repository: pingcap/tidb

Length of output: 180


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Show context around the reported result lines with line numbers via awk
echo "=== physicalplantest/physical_plan.result: lines 3630-3720 ==="
sed -n '3630,3720p' tests/integrationtest/r/planner/core/casetest/physicalplantest/physical_plan.result \
| awk '{printf "%5d\t%s\n", (3630+NR-1), $0}'

echo "=== Search in tests/integrationtest/t/planner/core/casetest/physicalplantest/physical_plan.test ==="
rg -n --fixed-string "(a = 1 or b = 2) and c in (1, 2, 3)" tests/integrationtest/t/planner/core/casetest/physicalplantest/physical_plan.test || true
rg -n --fixed-string "order by c limit 2" tests/integrationtest/t/planner/core/casetest/physicalplantest/physical_plan.test || true
rg -n --fixed-string "order by b limit 2" tests/integrationtest/t/planner/core/casetest/physicalplantest/physical_plan.test || true

echo "=== brief format occurrences in that .test file ==="
rg -n "explain format = 'brief'" tests/integrationtest/t/planner/core/casetest/physicalplantest/physical_plan.test | head -n 200 || true

echo "=== brief occurrences with the WHERE clause (narrow) ==="
rg -n "explain format = 'brief'.{0,300}\(a = 1 or b = 2\).*c in \(1, 2, 3\)" tests/integrationtest/t/planner/core/casetest/physicalplantest/physical_plan.test || true

echo "=== plan_tree occurrences with the WHERE clause (narrow) ==="
rg -n "explain format = 'plan_tree'.{0,300}\(a = 1 or b = 2\).*c in \(1, 2, 3\)" tests/integrationtest/t/planner/core/casetest/physicalplantest/physical_plan.test || true

Repository: pingcap/tidb

Length of output: 36093


Fix unresolved conflict markers and align plan_tree inputs/golden outputs

  • tests/integrationtest/r/planner/core/casetest/physicalplantest/physical_plan.result still contains merge conflict markers (<<<<<<< HEAD / ======= / >>>>>>> ...) around this block, mixing brief and plan_tree outputs (e.g., plan_tree queries start at explain format = 'plan_tree' ...).
  • The paired input tests/integrationtest/t/planner/core/casetest/physicalplantest/physical_plan.test still contains only the explain format = 'brief' ... order by c/b limit 2 statements for the same queries; no plan_tree counterparts were found.

Remove the conflict markers, update the .test statements to the intended plan_tree form (for both queries), and regenerate the corresponding .result.
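Before regenerating the golden output, it can help to confirm that no markers remain in the file. The following is a minimal sketch of such a check, not part of the TiDB tooling; `findConflictMarkers` is a hypothetical helper that works on file contents already read into a string.

```go
package main

import (
	"fmt"
	"strings"
)

// findConflictMarkers returns the 1-based numbers of lines that look like
// Git conflict markers: "<<<<<<< ...", "=======", or ">>>>>>> ...".
func findConflictMarkers(text string) []int {
	var hits []int
	for i, line := range strings.Split(text, "\n") {
		if strings.HasPrefix(line, "<<<<<<< ") ||
			line == "=======" ||
			strings.HasPrefix(line, ">>>>>>> ") {
			hits = append(hits, i+1)
		}
	}
	return hits
}

func main() {
	// Illustrative content only; a real check would read the .result file.
	sample := "select 1;\n<<<<<<< HEAD\nbrief output\n=======\nplan_tree output\n>>>>>>> other\n"
	fmt.Println(findConflictMarkers(sample))
}
```

An empty result means the file is free of markers; any reported line still needs manual resolution before the `.result` file is regenerated.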

Source: Learnings

@ti-chi-bot

ti-chi-bot Bot commented Jun 6, 2026

@ti-chi-bot: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| idc-jenkins-ci-tidb/check_dev_2 | 05b393b | link | true | /test check-dev2 |
| idc-jenkins-ci-tidb/check_dev | 05b393b | link | true | /test check-dev |
| idc-jenkins-ci-tidb/mysql-test | 05b393b | link | true | /test mysql-test |
| idc-jenkins-ci-tidb/unit-test | 05b393b | link | true | /test unit-test |
| idc-jenkins-ci-tidb/build | 05b393b | link | true | /test build |

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

Labels

  • do-not-merge/cherry-pick-not-approved
  • do-not-merge/hold: Indicates that a PR should not merge because someone has issued a /hold command.
  • release-note-none: Denotes a PR that doesn't merit a release note.
  • sig/planner: SIG: Planner
  • size/XL: Denotes a PR that changes 500-999 lines, ignoring generated files.
  • type/cherry-pick-for-release-8.5: This PR is cherry-picked to release-8.5 from a source PR.

2 participants