planner: rewrite FTS predicates to LIKE for evaluation of non-TiCI query plan#65626
Conversation
|
Hi @terry1purcell. Thanks for your PR. PRs from untrusted users cannot be marked as trusted with I understand the commands that are listed here. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
|
/ok-to-test |
There was a problem hiding this comment.
Pull request overview
This pull request adds functionality to rewrite MATCH...AGAINST (fulltext search) predicates to LIKE predicates when no fulltext index is available. This is controlled by a new session variable tidb_opt_fulltext_search_fallback which can be set to either 'like' (convert to LIKE predicates, default) or 'error' (throw an error if no fulltext index exists).
Changes:
- Added new session variable
tidb_opt_fulltext_search_fallbackto control fulltext search fallback behavior - Implemented conversion logic from MATCH...AGAINST to LIKE predicates supporting both natural language mode and Boolean mode
- Added unit and integration tests for the conversion logic
Reviewed changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
| pkg/sessionctx/vardef/tidb_vars.go | Defines the new session variable constant and default value |
| pkg/sessionctx/variable/sysvar.go | Registers the new session variable with its configuration |
| pkg/sessionctx/variable/session.go | Adds the FulltextSearchFallback field to SessionVars struct |
| pkg/planner/core/fulltext_to_like.go | Implements the conversion logic from MATCH...AGAINST to LIKE predicates |
| pkg/planner/core/fulltext_to_like_test.go | Unit tests for parsing Boolean search strings |
| pkg/planner/core/expression_rewriter.go | Integrates MATCH...AGAINST handling into the expression rewriter |
| pkg/planner/core/BUILD.bazel | Adds new source and test files to the build configuration |
| tests/integrationtest/t/planner/core/fulltext_search.test | Integration test cases for the fulltext to LIKE conversion |
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## master #65626 +/- ##
================================================
- Coverage 77.7123% 76.3877% -1.3247%
================================================
Files 1991 1992 +1
Lines 552087 562593 +10506
================================================
+ Hits 429040 429752 +712
- Misses 122127 132315 +10188
+ Partials 920 526 -394
Flags with carried forward coverage won't be shown. Click here to find out more.
🚀 New features to boost your workflow:
|
| // perfectly enforce word-start boundaries. We use %term% which may produce false positives | ||
| // (matching mid-word like "reOptimizing"), but avoids false negatives. This is an acceptable | ||
| // limitation for a fallback implementation. | ||
| var pattern string | ||
| // Both prefix and general matches use %term% to find the term anywhere in text | ||
| pattern = "%" + escapedTerm + "%" |
There was a problem hiding this comment.
The comment explains that prefix matching (with * wildcard) uses the same %term% pattern as regular matching, making the isPrefixMatch parameter effectively unused. While this is documented as an acceptable limitation, the parameter serves no functional purpose in the current implementation and could be removed for clarity, or the implementation could be enhanced to use %term (without trailing %) for prefix matches to better approximate the intended behavior.
| // perfectly enforce word-start boundaries. We use %term% which may produce false positives | |
| // (matching mid-word like "reOptimizing"), but avoids false negatives. This is an acceptable | |
| // limitation for a fallback implementation. | |
| var pattern string | |
| // Both prefix and general matches use %term% to find the term anywhere in text | |
| pattern = "%" + escapedTerm + "%" | |
| // perfectly enforce word-start boundaries. We previously used %term% for both prefix and | |
| // general matches, which may produce false positives (matching mid-word like "reOptimizing"), | |
| // but avoids false negatives. Here we approximate prefix semantics by using a different | |
| // pattern when isPrefixMatch is true. | |
| var pattern string | |
| if isPrefixMatch { | |
| // For prefix matches, use %term (no trailing %) to better approximate the intended behavior. | |
| pattern = "%" + escapedTerm | |
| } else { | |
| // For general matches, keep using %term% to find the term anywhere in the text. | |
| pattern = "%" + escapedTerm + "%" | |
| } |
…hange The substitution comment said "AGAINST(NULL) and empty-string search produce a constant substitute (Constant 0)" but commit b0b04c4 changed BuildFTSToILikeExpressionFromBuiltin to emit Constant(NULL) for AGAINST(NULL) (empty-string search still uses Constant(0) via ftsZeroIntConst). Update the comment to spell out which shape comes from which case and to note that the constant-folding pass below handles both shapes equivalently via its IsNull and ToBool branches. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…token ValidateFTSSearchStringForLikeFallback indexed body[0] inside the boolean- mode operator-strip branch without first checking len(body). The indexing is safe today because strings.Fields never returns an empty token, but the guard makes the bound explicit at the call site and protects against any future change to the tokenization that could produce an empty substring. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
/hold I found a problem in the cost competition |
Round 1 already runs with the native FTSMysqlMatchAgainst builtin; the
fts-like-fallback alternative round previously fired only when round 1's
plan was non-executable. Queries where both plans were valid (alphanumeric
word search in a direct-boolean predicate position) had no competitor —
the native plan won by default even when an ILIKE plan would have been
cheaper given a selective non-FTS predicate.
Split the planBuilder signal into two:
- HasNonViableFTSMatch (existing): round 1's plan cannot execute;
driver discards it and forces FTSLikeFallback across all subsequent
rounds.
- HasPredicateMatch (new): round 1 saw a direct-boolean-context MATCH.
Driver propagates this into stmtctx as
AlternativeLogicalPlanHasPredicateContextMatch, enabling the
fts-like-fallback round for cost competition.
The fts-like-fallback round now:
- enables on (FTSLikeFallback || HasPredicateContextMatch),
- via setup/cleanup saves and forces AlternativeLogicalPlanFTSLikeFallback
true during its own build so the rewriter emits ILIKE,
- restores the flag after the round.
Behavior matrix:
| native viable | predicate MATCH | LIKE round | outcome |
| yes | yes | yes | strict-< cost compare |
| yes | no (scoring) | no | native wins |
| no | yes | yes | LIKE wins (discard) |
| no | yes, LIKE bad | yes(error) | surface LIKE error |
The existing fulltext_search integration suite (95 cases, all on a
TiKV-only / no-FTS-index environment) exercises only the discard path and
continues to pass unchanged. The new cost-competition branch requires a
TiFlash replica + public FTS index in test fixtures and is not exercised
by the current integration test environment.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
/retest-required |
|
/unhold |
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: AilinKid, Benjamin2037, qw4990, windtalker The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
|
In response to a cherrypick label: new pull request created to branch |
What problem does this PR solve?
Issue Number: close #68153
Problem Summary:
What changed and how does it work?
Summary
When tidb_opt_enable_alternative_logical_plans=ON, adds a fallback that rewrites MATCH ... AGAINST to case-insensitive ILIKE predicates so full-text-search queries can execute without an FTS index. The rewrite is intentionally conservative: it only fires for a strict subset of search strings in direct-boolean predicate positions, and reaches the plan only when the round-1 native path is not viable. Anything outside that envelope either keeps the native builtin (errors at execution without an FTS index) or is rejected at plan time, never silently producing wrong rows.
Architecture
Round 1 (default, matches Alt-disabled behavior) — emits the native FTSMysqlMatchAgainst builtin. The expression rewriter records nonViableFTSMatch on the PlanBuilder when a direct-boolean-context MATCH cannot be served natively (no FTS index on a TiFlash replica, or a non-pushdown-safe modifier).
Round 2: fts-like-fallback — fires only when round 1 reported a non-viable MATCH. The driver discards round 1's plan and re-runs the build with AlternativeLogicalPlanFTSLikeFallback=true, which switches the rewriter to ILIKE in direct-boolean positions. If this round also errors (e.g. unsupported search string with no FTS-index rescue), lastAltRoundErr propagates the message instead of the generic "failed to build logical plan" sentinel.
A single flag (AlternativeLogicalPlanFTSLikeFallback) drives the dispatch; viability state stays local to the build on PlanBuilder.nonViableFTSMatch.
Where the rewrite applies
inDirectMatchBooleanContext (modeled on the existing canTreatInSubqueryAsExistsForFilter) walks the AST ancestor stack and accepts only:
Everything else — MATCH ... > 0.5, MATCH ... = 0, MATCH ... IS NULL, MATCH inside CASE WHEN, arithmetic, scoring (SELECT field list, ORDER BY), etc. — keeps the native builtin, which preserves the float relevance score and errors at execution if no FTS index exists. Substituting a 0/1 integer in those positions would silently corrupt the comparison or sort.
Strict search-string subset
ValidateFTSSearchStringForLikeFallback rejects anything that would tokenize differently in MySQL FTS than a substring ILIKE match. Accepted by mode:
Rejected at plan time with error 1235 (ErrNotSupportedYet): phrases "...", prefix wildcard term*, relevance modifiers > < ~, grouping (...), mid-word punctuation like xx-yy, and any token containing %, _, , ,, ., :, etc. WITH QUERY EXPANSION is likewise rejected (no ILIKE approximation exists).
Modifier handling
The tipb pushdown protocol does not serialize the FTS modifier (see distsql_builtin.go), so a Boolean-mode or WITH QUERY EXPANSION MATCH pushed down to TiFlash would silently execute as natural-language mode. To prevent this, matchAgainstToBuiltin rejects non-default modifiers at plan time unless matchHasLikeFallbackRescue is true (alt enabled + direct-boolean context, where the alt-rounds driver will discard the native plan and rebuild via fts-like-fallback). In practice:
NULL search handling
MATCH(c) AGAINST(NULL) matches nothing in MySQL FTS semantics, but three-valued logic matters under NOT: native evalReal returns NULL, so NOT NULL = NULL filters the row. The rewrite emits Constant(NULL) rather than Constant(0) so the same semantics hold under NOT, IS NULL, etc. The plan-cache skip is set before the NULL fast-path, so a prepared statement bound to NULL first followed by a non-NULL bind re-plans and returns correct rows instead of reusing a cached constant-false plan.
Boolean-mode operator support (post-strict-subset)
Plan cache
Marks the plan non-cacheable when the AGAINST argument is mutable across executions (? parameter marker, user variable, deferred expression). Literal AGAINST values keep the plan cacheable. The skip runs before the NULL fast-path so a NULL first bind can't bake a constant plan that gets reused later.
Selectivity
BuildFTSToILikeExpressionFromBuiltin substitutes the equivalent ILIKE form for the opaque FTSMysqlMatchAgainst builtin so the round 1 native plan's cost reflects column histogram/TopN rather than the flat SelectivityFactor (0.8). Restricted to single-column MATCH because GetSelectivityByFilter declines multi-column expressions — a multi-column substitute would fall through to the same str-match default anyway.
Known semantic differences
These apply to ILIKE round queries only; the native path preserves full MySQL semantics:
Files changed
Test plan
parenthesized MATCH, scalar positions (IS NULL, > 0.5, = 0, CASE), non-default modifiers in scoring/scalar/alt-disabled contexts, prepared statements (literal cacheable,
parameter-marker non-cacheable, NULL-first-bind re-plans).
🤖 Generated with https://claude.com/claude-code
Check List
Tests
Side effects
Documentation
Release note
Please refer to Release Notes Language Style Guide to write a quality release note.
Summary by CodeRabbit
New Features
Tests
Chores