planner: correlate subquery rule #66206

Merged
ti-chi-bot[bot] merged 29 commits into pingcap:master from terry1purcell:correlate
Apr 21, 2026

Conversation

@terry1purcell
Contributor

@terry1purcell terry1purcell commented Feb 11, 2026

What problem does this PR solve?

Issue Number: close #66320

Problem Summary:

What changed and how does it work?

Check List

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)
  • No need to test
    • I checked and no code files have been changed.

Side effects

  • Performance regression: Consumes more CPU
  • Performance regression: Consumes more Memory
  • Breaking backward compatibility

Documentation

  • Affects user behaviors
  • Contains syntax changes
  • Contains variable changes
  • Contains experimental features
  • Changes MySQL compatibility

Release note

Please refer to Release Notes Language Style Guide to write a quality release note.

None

Summary by CodeRabbit

  • New Features

    • Introduces an alternative optimization round that can convert eligible semi-joins into correlated apply-style plans, improving planner choice between join vs apply and preserving correct NULL semantics for correlated subqueries.
  • Tests

    • Adds comprehensive test suites and fixtures covering correlated subquery behaviors (IN/EXISTS/NOT IN), null semantics, plan shapes (EXPLAIN), parallel apply, and cost-factor-driven plan selection across planner modes.
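
The NULL semantics mentioned in this summary are those of standard SQL three-valued logic for IN and NOT IN. As a rough illustration only, the sketch below evaluates `x IN (subquery result)` with a nil pointer standing in for SQL NULL. The names `Tri`, `inSubquery`, `not`, and `ptr` are hypothetical helpers for this example and are not taken from the PR or from TiDB.

```go
package main

import "fmt"

// Tri is a hypothetical three-valued SQL boolean; it is not a TiDB type.
type Tri int

const (
	False Tri = iota
	True
	Null
)

func (t Tri) String() string {
	return [...]string{"FALSE", "TRUE", "NULL"}[t]
}

// inSubquery evaluates `x IN (values)` under SQL three-valued logic.
// A nil pointer stands for SQL NULL.
func inSubquery(x *int, values []*int) Tri {
	if len(values) == 0 {
		return False // IN over an empty set is FALSE, even for a NULL probe
	}
	if x == nil {
		return Null
	}
	sawNull := false
	for _, v := range values {
		if v == nil {
			sawNull = true
			continue
		}
		if *v == *x {
			return True
		}
	}
	if sawNull {
		return Null // no match, but a NULL on the inner side makes the answer unknown
	}
	return False
}

// not negates a three-valued boolean; NOT NULL stays NULL.
func not(t Tri) Tri {
	switch t {
	case True:
		return False
	case False:
		return True
	default:
		return Null
	}
}

func ptr(i int) *int { return &i }

func main() {
	inner := []*int{ptr(1), nil}
	fmt.Println("1 IN inner:    ", inSubquery(ptr(1), inner))
	fmt.Println("2 IN inner:    ", inSubquery(ptr(2), inner))
	fmt.Println("2 NOT IN inner:", not(inSubquery(ptr(2), inner)))
}
```

A rewrite between a semi-join and an Apply has to keep these three outcomes distinct, which is presumably why the test suites cover IN, EXISTS, and NOT IN separately.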

@ti-chi-bot ti-chi-bot Bot added the do-not-merge/needs-linked-issue, release-note-none (Denotes a PR that doesn't merit a release note.), size/XL (Denotes a PR that changes 500-999 lines, ignoring generated files.), and sig/planner (SIG: Planner) labels Feb 11, 2026
@pantheon-ai

pantheon-ai Bot commented Feb 11, 2026

General (non-inline) review: systemic optimizer issues surfaced by the new correlate rule

This PR adds a new logical rule CorrelateSolver (semi-join -> correlated Apply) gated by tidb_opt_enable_correlate_subquery. While reviewing the optimizer plumbing around it, I verified two pre-existing issues in pkg/planner/core/optimizer.go that are not touched by this PR diff but become more important with another late-stage rewrite rule.

P1: Predicate push down cannot reliably participate in follow-up or interaction optimizations

  • Registration and order: pkg/planner/core/optimizer.go:97 (&PPDSolver{} runs before the new correlate rewrite)
  • Change signal bug: pkg/planner/core/rule_predicate_push_down.go:43 always returns planChanged=false even when PredicatePushDown mutates or rebuilds the plan
    Why it matters here: CorrelateSolver introduces new inner Selection predicates after PPD has already run, but the framework has no reliable way to trigger a second PPD (or any interaction) pass based on PPD changes. This risks leaving the inner side with avoidable full scans or missing access condition derivation when the correlate rule is enabled.

Recommendation:

  • Make PPDSolver.Optimize return an accurate planChanged (or conservatively true), or extend LogicalPlan.PredicatePushDown to return a change flag.
  • Consider a follow-up pass when FlagCorrelate is enabled (for example rerun PPD on the rewritten subtree) so the new correlated predicates can be used for access path selection.
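
To make the first recommendation concrete, here is a minimal sketch of a push-down pass that reports an accurate change flag. It assumes a drastically simplified single-child plan chain; `node` and `pushDown` are hypothetical names for illustration and do not reflect the real PPDSolver or LogicalPlan interfaces.

```go
package main

import "fmt"

// node is a hypothetical, simplified plan node in a single-child chain.
// It is not TiDB's LogicalPlan.
type node struct {
	name  string
	conds []string // predicates currently evaluated at this node
	child *node    // nil for the leaf (a scan)
}

// pushDown carries predicates down to the leaf and reports whether the
// plan actually changed, so a caller can decide to schedule another pass.
func pushDown(n *node, pending []string) (changed bool) {
	if n.child == nil {
		// Leaf: absorb whatever was carried down from above.
		n.conds = append(n.conds, pending...)
		return len(pending) > 0
	}
	pending = append(pending, n.conds...)
	n.conds = nil
	return pushDown(n.child, pending)
}

func main() {
	scan := &node{name: "scan"}
	sel := &node{name: "selection", conds: []string{"t.a = 1"}, child: scan}

	fmt.Println("first pass changed: ", pushDown(sel, nil))
	fmt.Println("second pass changed:", pushDown(sel, nil))

	// A later rewrite (such as a correlate-style rule) adds a new predicate.
	sel.conds = append(sel.conds, "t.b = outer.b")
	fmt.Println("after rewrite changed:", pushDown(sel, nil))
	fmt.Println("scan conditions:", scan.conds)
}
```

With an honest flag, the second pass is a no-op and the pass after the rewrite reports a change, which is the signal a follow-up PPD run would need.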

P2: FlagPruneColumnsAgain derivation is fragile and computed before conditional flags are added

  • pkg/planner/core/optimizer.go:347 computes FlagPruneColumnsAgain via flag-rule.FlagPruneColumns > rule.FlagPruneColumns early in adjustOptimizationFlags.
    Why it matters here: this PR adds another conditional flag (FlagCorrelate) inside adjustOptimizationFlags. Any future conditional flag added after the prune-again check can make the final flag set inconsistent with the intended "something after prune columns" rule, and the arithmetic check is hard to reason about when maintaining flag order.

Recommendation:

  • Recompute the prune-again condition at the end of adjustOptimizationFlags using explicit bit operations, after all flag |= ... and flag &= ... mutations.
  • Add a small unit test that asserts len(optRuleList) matches the number and order of rule.Flag* bits to prevent silent misalignment when adding new rules.
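
One possible shape for both recommendations, sketched with a made-up flag set: the flag names echo the ones discussed above, but their order, their values, the `flagEnd` sentinel, and the `ruleList`, `aligned`, and `adjustFlags` helpers are all assumptions for illustration, not the actual definitions in pkg/planner/core.

```go
package main

import "fmt"

// Hypothetical rule flags, one bit per rule, in execution order. The names
// echo the discussion above, but the order and values are illustrative only.
const (
	flagBuildKeyInfo uint64 = 1 << iota
	flagPruneColumns
	flagDecorrelate
	flagPredicatePushDown
	flagCorrelate
	flagPruneColumnsAgain
	flagEnd // sentinel: one past the last real flag
)

// ruleList must have exactly one entry per flag bit, in the same order.
var ruleList = []string{
	"buildKeyInfo", "pruneColumns", "decorrelate",
	"predicatePushDown", "correlate", "pruneColumnsAgain",
}

// aligned reports whether ruleList has as many entries as there are flag bits.
func aligned() bool {
	return uint64(1)<<uint(len(ruleList)) == flagEnd
}

// adjustFlags applies conditional flags first and derives the prune-again
// bit last, so a flag added late in the function cannot be missed.
func adjustFlags(flag uint64, enableCorrelate bool) uint64 {
	if enableCorrelate {
		flag |= flagCorrelate
	}
	// Bits strictly above flagPruneColumns, excluding the prune-again bit itself.
	afterPrune := ((flagEnd - 1) &^ (flagPruneColumns | (flagPruneColumns - 1))) &^ flagPruneColumnsAgain
	flag &^= flagPruneColumnsAgain
	if flag&flagPruneColumns != 0 && flag&afterPrune != 0 {
		flag |= flagPruneColumnsAgain
	}
	return flag
}

func main() {
	fmt.Printf("prune only, correlate on:  %06b\n", adjustFlags(flagPruneColumns, true))
	fmt.Printf("prune only, correlate off: %06b\n", adjustFlags(flagPruneColumns, false))
	fmt.Println("rule list aligned with flags:", aligned())
}
```

The mask makes the intent ("any rule that runs after column pruning") explicit, and the `aligned` check is the kind of assertion a unit test could make to catch a rule added without its flag.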

@tiprow

tiprow Bot commented Feb 11, 2026

Hi @terry1purcell. Thanks for your PR.

PRs from untrusted users cannot be marked as trusted with /ok-to-test in this repo, meaning untrusted PR authors can never trigger tests themselves. Collaborators can still trigger tests on the PR using /test all.

I understand the commands that are listed here.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@codecov

codecov Bot commented Feb 11, 2026

Codecov Report

❌ Patch coverage is 79.33884% with 100 lines in your changes missing coverage. Please review.
✅ Project coverage is 78.1728%. Comparing base (ca3c902) to head (be086e8).
⚠️ Report is 2 commits behind head on master.

Additional details and impacted files
@@               Coverage Diff                @@
##             master     #66206        +/-   ##
================================================
+ Coverage   77.7850%   78.1728%   +0.3877%     
================================================
  Files          1983       1976         -7     
  Lines        549122     550035       +913     
================================================
+ Hits         427135     429978      +2843     
+ Misses       121067     119627      -1440     
+ Partials        920        430       -490     
Flag Coverage Δ
integration 43.6751% <6.6666%> (+3.8779%) ⬆️
unit 76.7157% <79.3388%> (+0.3746%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Components Coverage Δ
dumpling 66.1193% <ø> (+4.6127%) ⬆️
parser ∅ <ø> (∅)
br 50.0681% <ø> (-13.0316%) ⬇️

@terry1purcell
Contributor Author

/ok-to-test

@ti-chi-bot ti-chi-bot Bot added the ok-to-test (Indicates a PR is ready to be tested.) label Feb 11, 2026

Copilot AI left a comment

Pull request overview

Adds an (optional) logical-optimizer rule to “re-correlate” certain semi-joins back into correlated Apply, gated by a new session/global system variable. This aims to enable correlated index lookups when they’re expected to outperform a hash semi-join.

Changes:

  • Introduce CorrelateSolver logical rule and wire it into the logical optimizer behind a new FlagCorrelate.
  • Add sysvar + session var plumbing for tidb_opt_enable_correlate_subquery.
  • Add casetests for the rule and a unit test to ensure optRuleList and Flag* alignment.

Reviewed changes

Copilot reviewed 14 out of 14 changed files in this pull request and generated 2 comments.

Summary per file (file: description):

  • pkg/sessionctx/variable/sysvar.go: Registers tidb_opt_enable_correlate_subquery sysvar and applies it to SessionVars.
  • pkg/sessionctx/variable/session.go: Adds EnableCorrelateSubquery to SessionVars and initializes it.
  • pkg/sessionctx/vardef/tidb_vars.go: Defines sysvar name constant and default value.
  • pkg/planner/core/rule_correlate.go: Implements the new semi-join → correlated Apply transformation rule.
  • pkg/planner/core/rule/logical_rules.go: Adds FlagCorrelate to the logical rule flag bitset.
  • pkg/planner/core/optimizer.go: Appends CorrelateSolver to optRuleList and conditionally enables FlagCorrelate; adjusts FlagPruneColumnsAgain computation.
  • pkg/planner/core/optimizer_test.go: Adds a test to detect misalignment between optRuleList and Flag* constants.
  • pkg/planner/core/casetest/rule/testdata/correlate_suite_in.json: Input SQLs for correlate rule casetests.
  • pkg/planner/core/casetest/rule/testdata/correlate_suite_out.json: Expected plans/results (non-xunit) for casetests.
  • pkg/planner/core/casetest/rule/testdata/correlate_suite_xut.json: Expected plans/results (xunit) for casetests.
  • pkg/planner/core/casetest/rule/rule_correlate_test.go: Adds correlate rule casetest runner and enables the sysvar during tests.
  • pkg/planner/core/casetest/rule/main_test.go: Loads the new correlate test suite data.
  • pkg/planner/core/casetest/rule/BUILD.bazel: Includes the new test file and updates shard count.
  • pkg/planner/core/BUILD.bazel: Adds rule_correlate.go to the core library build.

Comment thread pkg/planner/core/rule_correlate.go
Comment thread pkg/sessionctx/variable/sysvar.go Outdated
@ti-chi-bot ti-chi-bot Bot added the size/XXL (Denotes a PR that changes 1000+ lines, ignoring generated files.) label and removed the size/XL (Denotes a PR that changes 500-999 lines, ignoring generated files.) label Feb 11, 2026
@terry1purcell
Contributor Author

@pantheon-bot please review

@pantheon-ai

pantheon-ai Bot commented Feb 11, 2026

Re-review note: this PR has 3 new commits since my last pass. I rechecked the correlate-subquery rewrite path (gated by tidb_opt_enable_correlate_subquery / EnableCorrelateSubquery). The issues below are verified on the current diff.

Severity note: all findings are P2 mainly because the feature is default OFF. For the executor-side memory risk, the statement memory tracker (default 1GB, action CANCEL) provides a safety net, but the amplification can still be severe for sessions that enable the flag.


P2-1: OOM/DoS risk - Apply materializes full inner result per outer row

What happens
CorrelateSolver rewrites eligible semi-joins to LogicalApply and does not introduce a Limit or other short-circuit for EXISTS/SEMI semantics. At execution time, NestedLoopApplyExec can read and materialize the entire inner side for each outer row (or outer cache key) into a chunk.List.

Evidence

  • pkg/planner/core/rule_correlate.go:116 (builds LogicalApply with the inner plan; no Limit/short-circuit introduced)
  • pkg/executor/join/hash_join_v1.go:1295 (NestedLoopApplyExec.fetchAllInners loops exec.Next and InnerList.AppendRow for all selected inner rows)

Why it matters
For non-selective correlation keys, duplicate outer keys, or missing/ineffective indexes on the inner side, this becomes O(outer_rows * inner_rows) work and can amplify memory usage per outer row/key. Even with the statement memory tracker, this is a CPU and memory DoS vector for workloads that enable the flag.

Recommendations

  • For EXISTS/SEMI semantics, stop reading inner after the first match and avoid materializing all inner rows.
  • Consider inserting/propagating a logical Limit 1 (or equivalent physical short-circuit) for EXISTS-style inner plans.
  • Add guardrails: avoid Apply when the chosen inner access path is a full scan or when estimated inner rows per key is large; keep/fall back to the set-based semi join.
  • Add stress/regression tests that cover duplicated outer keys and non-selective correlations.
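
The difference between materializing the inner side and stopping at the first match can be sketched as follows. `rowSource`, `existsMaterialized`, and `existsShortCircuit` are hypothetical stand-ins for illustration; they are not NestedLoopApplyExec or any other executor in the repository.

```go
package main

import "fmt"

// rowSource is a hypothetical stand-in for an inner-side executor.
type rowSource struct {
	rows []int
	pos  int
	read int // number of rows pulled so far
}

// next returns the next row, or false when the source is exhausted.
func (s *rowSource) next() (int, bool) {
	if s.pos >= len(s.rows) {
		return 0, false
	}
	v := s.rows[s.pos]
	s.pos++
	s.read++
	return v, true
}

// existsMaterialized buffers every matching inner row before answering,
// which mirrors the amplification described above.
func existsMaterialized(s *rowSource, key int) bool {
	var buf []int
	for v, ok := s.next(); ok; v, ok = s.next() {
		if v == key {
			buf = append(buf, v)
		}
	}
	return len(buf) > 0
}

// existsShortCircuit stops at the first match and buffers nothing.
func existsShortCircuit(s *rowSource, key int) bool {
	for v, ok := s.next(); ok; v, ok = s.next() {
		if v == key {
			return true
		}
	}
	return false
}

func main() {
	inner := []int{7, 7, 7, 7, 7}

	full := &rowSource{rows: inner}
	fmt.Println("materialized:", existsMaterialized(full, 7), "rows read:", full.read)

	early := &rowSource{rows: inner}
	fmt.Println("short-circuit:", existsShortCircuit(early, 7), "rows read:", early.read)
}
```

Both variants return the same answer for EXISTS-style semantics. Only the number of inner rows read and buffered per outer row differs, and that is the cost that grows with duplicated or non-selective keys.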

P2-2: Unconditional rewrite without cost or access-path guard

What happens
Once the shape checks pass, the rule always returns the rewritten LogicalApply, without considering cost, selectivity, or whether the inner subtree has a viable access path driven by the correlated predicates.

Evidence

  • pkg/planner/core/rule_correlate.go:120 (returns ap, true unconditionally after checks)

Why it matters
This can regress decorrelatable EXISTS/NOT EXISTS from a set-based hash semi-join to per-row Apply, especially when the inner side cannot use the correlated predicate to pick an index/range access path. That is a large performance risk for any session that enables the knob.

Recommendations

  • Add a cost-based decision: only rewrite when estimated outer rows are small and/or the inner plan has an index/range access using the correlated columns.
  • If stats are missing/unknown, be conservative and keep the original semi-join.
  • Consider keeping both alternatives through logical/physical optimization and selecting by cost.
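
A conservative guard along these lines might look like the sketch below. The `estimate` struct, the thresholds, and `shouldCorrelate` are invented for illustration; the real decision would have to come from the planner's statistics and cost model, and this sketch requires all conditions at once, which is stricter than the "and/or" wording above.

```go
package main

import "fmt"

// estimate is a hypothetical summary of what the planner knows about one
// semi-join candidate. None of these fields are TiDB APIs.
type estimate struct {
	statsKnown       bool    // false when statistics are missing or pseudo
	outerRows        float64 // estimated outer row count
	innerRowsPerKey  float64 // estimated inner matches per correlated key
	innerIndexOnCorr bool    // inner side has an index usable by the correlated columns
}

// Illustrative thresholds, chosen arbitrarily for this sketch.
const (
	maxOuterRows       = 1000.0
	maxInnerRowsPerKey = 100.0
)

// shouldCorrelate keeps the original semi-join unless every signal favors Apply.
func shouldCorrelate(e estimate) bool {
	if !e.statsKnown {
		return false // unknown stats: stay with the set-based plan
	}
	if !e.innerIndexOnCorr {
		return false // Apply would rescan the inner side for each outer row
	}
	return e.outerRows <= maxOuterRows && e.innerRowsPerKey <= maxInnerRowsPerKey
}

func main() {
	good := estimate{statsKnown: true, outerRows: 50, innerRowsPerKey: 2, innerIndexOnCorr: true}
	noIndex := estimate{statsKnown: true, outerRows: 50, innerRowsPerKey: 2}
	noStats := estimate{outerRows: 50, innerRowsPerKey: 2, innerIndexOnCorr: true}

	fmt.Println("small outer, indexed inner:", shouldCorrelate(good))
	fmt.Println("no usable index:           ", shouldCorrelate(noIndex))
	fmt.Println("missing stats:             ", shouldCorrelate(noStats))
}
```

Keeping both alternatives alive and choosing by cost, as the last recommendation suggests, would replace these fixed thresholds with a comparison of the two estimated plan costs.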

P2-3: Join hint / preference metadata dropped during transform

What happens
The transform constructs a fresh LogicalApply and copies only JoinType/CorCols/children/schema/output names. Join hint state and join-preference metadata from the original semi-join are not propagated.

Evidence

  • pkg/planner/core/rule_correlate.go:113 (new LogicalApply{}.Init(...) without copying hint fields)

Why it matters
When the correlate-subquery rewrite is enabled, user-provided join hints or join ordering preferences can be silently ignored, leading to surprising plans and making performance tuning harder.

Recommendations

  • Copy join hint related fields from the original join node to LogicalApply (for example HintInfo, PreferJoinType, PreferJoinOrder), or ensure they are stored in shared state that survives transforms.
  • Add a planner test asserting hints remain effective after the rewrite.
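
As a sketch of the first recommendation, the transform could copy hint state explicitly when it builds the new node. The structs below are simplified stand-ins whose field names only loosely mirror the ones named above (HintInfo, PreferJoinType, PreferJoinOrder); they are assumptions for illustration, not the real LogicalJoin or LogicalApply definitions.

```go
package main

import "fmt"

// hintInfo is a hypothetical container for user-supplied join hints.
type hintInfo struct {
	text string
}

// logicalJoin is a simplified stand-in for the original semi-join node.
type logicalJoin struct {
	joinType        string
	hint            *hintInfo
	preferJoinType  uint
	preferJoinOrder bool
}

// logicalApply embeds the join fields and adds the correlated columns.
type logicalApply struct {
	logicalJoin
	corCols []string
}

// toApply builds the Apply node and carries hint and preference state over,
// instead of starting from a zero-valued node.
func toApply(j *logicalJoin, corCols []string) *logicalApply {
	ap := &logicalApply{corCols: corCols}
	ap.joinType = j.joinType
	ap.hint = j.hint
	ap.preferJoinType = j.preferJoinType
	ap.preferJoinOrder = j.preferJoinOrder
	return ap
}

func main() {
	join := &logicalJoin{
		joinType:        "semi",
		hint:            &hintInfo{text: "example hint"},
		preferJoinType:  1,
		preferJoinOrder: true,
	}
	ap := toApply(join, []string{"t1.a"})
	fmt.Println(ap.joinType, ap.hint.text, ap.preferJoinType, ap.preferJoinOrder, ap.corCols)
}
```

A planner test for this would then assert that a hinted query produces the same hint-driven choice with the rewrite enabled as with it disabled.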

P2-4: Missing relevant optimizer variable tracking

What happens
The session variable tidb_opt_enable_correlate_subquery / EnableCorrelateSubquery affects optimization flags, but it is not recorded via RecordRelevantOptVar.

Evidence

  • pkg/planner/core/optimizer.go:363 (reads EnableCorrelateSubquery to set rule.FlagCorrelate, but does not record it)

Why it matters
Plan replayer dumps all sysvars, so the debuggability impact is reduced, but the relevant optimizer vars reporting becomes incomplete, which can hinder triage when comparing plans.

Recommendations

  • Add RecordRelevantOptVar for this variable when it influences optimization flags, consistent with other optimizer knobs.
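
The pattern being asked for is to record the variable at the point where it is read. A minimal sketch, assuming a toy statement context: `stmtCtx`, `recordRelevantOptVar`, `relevantVars`, and `correlateEnabled` are hypothetical and only imitate the idea behind RecordRelevantOptVar, not its real signature.

```go
package main

import (
	"fmt"
	"sort"
)

// stmtCtx is a hypothetical per-statement context that remembers which
// optimizer variables influenced planning.
type stmtCtx struct {
	relevant map[string]struct{}
}

// recordRelevantOptVar notes that a variable was consulted by the optimizer.
func (s *stmtCtx) recordRelevantOptVar(name string) {
	if s.relevant == nil {
		s.relevant = make(map[string]struct{})
	}
	s.relevant[name] = struct{}{}
}

// relevantVars returns the recorded names in sorted order.
func (s *stmtCtx) relevantVars() []string {
	names := make([]string, 0, len(s.relevant))
	for n := range s.relevant {
		names = append(names, n)
	}
	sort.Strings(names)
	return names
}

const correlateVar = "tidb_opt_enable_correlate_subquery"

// correlateEnabled reads the knob and records it in the same step, so the
// read can never happen without the bookkeeping.
func correlateEnabled(s *stmtCtx, sessionVars map[string]bool) bool {
	s.recordRelevantOptVar(correlateVar)
	return sessionVars[correlateVar]
}

func main() {
	sc := &stmtCtx{}
	enabled := correlateEnabled(sc, map[string]bool{correlateVar: true})
	fmt.Println("enabled:", enabled)
	fmt.Println("relevant vars:", sc.relevantVars())
}
```

Recording on read, even when the value is false, keeps the report complete for plan comparisons where the knob was off in one session and on in another.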

If helpful, I can sketch a minimal fix set: (a) propagate hint fields in the rewrite, (b) add RecordRelevantOptVar, and (c) add a conservative cost/access-path check plus executor early-exit for EXISTS semantics to reduce worst-case amplification.

@terry1purcell terry1purcell changed the title from "planner: correlate subquery rule (wip)" to "planner: correlate subquery rule" Feb 21, 2026

Copilot AI left a comment

Pull request overview

Copilot reviewed 22 out of 22 changed files in this pull request and generated no new comments.


@yudongusa yudongusa left a comment

Please open a document PR on this

@ti-chi-bot

ti-chi-bot Bot commented Apr 21, 2026

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: qw4990, yudongusa

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@ti-chi-bot ti-chi-bot Bot added the approved and lgtm labels and removed the needs-1-more-lgtm (Indicates a PR needs 1 more LGTM.) label Apr 21, 2026
@ti-chi-bot

ti-chi-bot Bot commented Apr 21, 2026

[LGTM Timeline notifier]

Timeline:

  • 2026-04-20 08:41:58.826073601 +0000 UTC m=+1982524.031433658: ☑️ agreed by qw4990.
  • 2026-04-21 04:13:24.972335572 +0000 UTC m=+2052810.177695629: ☑️ agreed by yudongusa.

@terry1purcell
Contributor Author

/retest-required

@ti-chi-bot ti-chi-bot Bot merged commit 7357a2e into pingcap:master Apr 21, 2026
35 checks passed
@terry1purcell terry1purcell deleted the correlate branch April 22, 2026 00:03
premal pushed a commit to premal/tidb that referenced this pull request Apr 30, 2026
AilinKid pushed a commit to AilinKid/tidb that referenced this pull request May 14, 2026
AilinKid pushed a commit to AilinKid/tidb that referenced this pull request May 14, 2026
AilinKid pushed a commit to AilinKid/tidb that referenced this pull request May 15, 2026
AilinKid pushed a commit to AilinKid/tidb that referenced this pull request May 15, 2026
@ti-chi-bot ti-chi-bot Bot added the needs-cherry-pick-release-8.5 (Should cherry pick this PR to release-8.5 branch.) label May 29, 2026
@ti-chi-bot
Member

In response to a cherrypick label: new pull request created to branch release-8.5: #68752.
But this PR has conflicts, please resolve them!

Labels

  • approved
  • lgtm
  • needs-cherry-pick-release-8.5: Should cherry pick this PR to release-8.5 branch.
  • ok-to-test: Indicates a PR is ready to be tested.
  • release-note-none: Denotes a PR that doesn't merit a release note.
  • sig/planner: SIG: Planner
  • size/XXL: Denotes a PR that changes 1000+ lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Planner: correlate a non-correlated IN subquery

7 participants