Skip to content

expression: fix isBinCollation to recognize utf8mb4_0900_bin#68846

Open
tiancaiamao wants to merge 2 commits into
masterfrom
fix-0900-bin-collation
Open

expression: fix isBinCollation to recognize utf8mb4_0900_bin#68846
tiancaiamao wants to merge 2 commits into
masterfrom
fix-0900-bin-collation

Conversation

@tiancaiamao
Copy link
Copy Markdown
Contributor

@tiancaiamao tiancaiamao commented Jun 1, 2026

What problem does this PR solve?

Issue Number: close #68845

Problem Summary:

CONCAT_WS over a utf8mb4_0900_bin column + a utf8mb4_unicode_ci column + a mediumblob column fails with ERROR 3854 (HY000): Cannot convert string from binary to utf8mb4. The root cause is that isBinCollation() in pkg/expression/collation.go does not recognize utf8mb4_0900_bin as a binary collation, causing inferCollation to degrade to CoercibilityNone instead of keeping the _bin collation. This leads to an incorrect from_binary() cast on the blob.

What changed and how does it work?

  1. Added CollationUTF8MB40900Bin = "utf8mb4_0900_bin" constant to pkg/parser/charset/charset.go.
  2. Added charset.CollationUTF8MB40900Bin to the isBinCollation() check in pkg/expression/collation.go.
  3. Added 3 regression test cases in pkg/expression/collation_test.go:
    • utf8mb4_0900_bin vs utf8mb4_unicode_ci (both orders) → _bin wins
    • Three-column case (utf8mb4_0900_bin + utf8mb4_unicode_ci + binary) → binary wins without from_binary() cast

Check List

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)
  • No need to test
    • I checked and no code files have been changed.

Side effects

  • Performance regression: Consumes more CPU
  • Performance regression: Consumes more Memory
  • Breaking backward compatibility

Documentation

  • Affects user behaviors
  • Contains syntax changes
  • Contains variable changes
  • Contains experimental features
  • Changes MySQL compatibility

Release note

Fix a bug where CONCAT_WS with utf8mb4_0900_bin and a blob column incorrectly returned ERROR 3854 instead of returning binary result

Summary by CodeRabbit

  • Bug Fixes

    • Improved collation handling to properly recognize utf8mb4_0900_bin as a binary collation, ensuring correct collation inference when mixed with other collations.
  • Tests

    • Added regression test coverage for utf8mb4_0900_bin collation handling scenarios.
  • Documentation

    • Clarified binary-collation recognition behavior to reduce ambiguity in collation conflict resolution.

…Collation

The isBinCollation function did not include utf8mb4_0900_bin (MySQL 8.0
binary collation, ID 309). When inferCollation aggregates a column using
utf8mb4_0900_bin with another non-bin utf8mb4 collation, the result
incorrectly degrades to CoercibilityNone instead of keeping the _bin
collation. This causes subsequent aggregation with a binary/blob column
to produce a from_binary() cast that fails on non-UTF-8 bytes (ERROR 3854).

Fix: add charset.CollationUTF8MB40900Bin to isBinCollation and add
regression tests covering the two-column and three-column cases.

Closes #68845
@ti-chi-bot ti-chi-bot Bot added release-note Denotes a PR that will be considered when it comes time to generate release notes. do-not-merge/needs-triage-completed labels Jun 1, 2026
@pantheon-ai
Copy link
Copy Markdown

pantheon-ai Bot commented Jun 1, 2026

@tiancaiamao I've received your pull request and will start the review. I'll conduct a thorough review covering code quality, potential issues, and implementation details.

⏳ This process typically takes 10-30 minutes depending on the complexity of the changes.

ℹ️ Learn more details on Pantheon AI.

@ti-chi-bot ti-chi-bot Bot added the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label Jun 1, 2026
@tiprow
Copy link
Copy Markdown

tiprow Bot commented Jun 1, 2026

Hi @tiancaiamao. Thanks for your PR.

PRs from untrusted users cannot be marked as trusted with /ok-to-test in this repo meaning untrusted PR authors can never trigger tests themselves. Collaborators can still trigger tests on the PR using /test all.

I understand the commands that are listed here.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Jun 1, 2026

Review Change Stack

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 6b4cdabf-7c30-4701-a18c-383d22a01a01

📥 Commits

Reviewing files that changed from the base of the PR and between 1ce5433 and 593e5ed.

📒 Files selected for processing (2)
  • pkg/expression/collation.go
  • pkg/util/collate/collate.go
🚧 Files skipped from review as they are similar to previous changes (1)
  • pkg/expression/collation.go

📝 Walkthrough

Walkthrough

Adds an exported constant for utf8mb4_0900_bin, treats that collation as binary in both utility and expression-level predicates, and adds regression tests ensuring correct collation inference when mixed with non-UTF8 blobs.

Changes

UTF8MB4 0900 Binary Collation Support

Layer / File(s) Summary
Collation constant definition
pkg/parser/charset/charset.go
New exported constant CollationUTF8MB40900Bin = "utf8mb4_0900_bin".
IsBinCollation update
pkg/util/collate/collate.go
IsBinCollation doc and predicate updated to use the charset constant and clarify included/excluded collations.
Expression-level binary check
pkg/expression/collation.go
isBinCollation extended to recognize CollationUTF8MB40900Bin, affecting collation derivation logic.
Collation inference test coverage
pkg/expression/collation_test.go
Three new cases in TestInferCollation validate selection of utf8mb4_0900_bin over non-binary collations and binary handling with blob values.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Suggested labels

size/S, ok-to-test

Suggested reviewers

  • qw4990
  • hawkingrei
  • AilinKid

Poem

🐰 I found a bin that hid from sight,
I added a name, gave it its right,
Now mixed blobs and 0900 sing,
No more errors when they bring,
A tidy, happy charset night.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 60.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly summarizes the primary change: adding utf8mb4_0900_bin recognition to the isBinCollation function to fix a specific bug.
Linked Issues check ✅ Passed The PR addresses all coding objectives from issue #68845: added CollationUTF8MB40900Bin constant, updated isBinCollation to recognize utf8mb4_0900_bin, and added comprehensive regression tests covering the bug scenario.
Out of Scope Changes check ✅ Passed All changes are directly scoped to fixing the isBinCollation recognition of utf8mb4_0900_bin; no unrelated modifications are present.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch fix-0900-bin-collation

Warning

Tools execution failed with the following error:

Failed to run tools: 13 INTERNAL: Received RST_STREAM with code 2 (Internal server error)


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@codecov
Copy link
Copy Markdown

codecov Bot commented Jun 1, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 75.3104%. Comparing base (5a38505) to head (593e5ed).
⚠️ Report is 6 commits behind head on master.

Additional details and impacted files
@@               Coverage Diff                @@
##             master     #68846        +/-   ##
================================================
- Coverage   76.3105%   75.3104%   -1.0001%     
================================================
  Files          2041       2025        -16     
  Lines        563393     568815      +5422     
================================================
- Hits         429928     428377      -1551     
- Misses       132549     140347      +7798     
+ Partials        916         91       -825     
Flag Coverage Δ
integration 41.2874% <100.0000%> (+1.5089%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Components Coverage Δ
dumpling 60.4610% <ø> (ø)
parser ∅ <ø> (∅)
br 49.8023% <ø> (-13.0287%) ⬇️
🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@tiancaiamao
Copy link
Copy Markdown
Contributor Author

/retest

@tiprow
Copy link
Copy Markdown

tiprow Bot commented Jun 1, 2026

@tiancaiamao: PRs from untrusted users cannot be marked as trusted with /ok-to-test in this repo meaning untrusted PR authors can never trigger tests themselves. Collaborators can still trigger tests on the PR using /test.

Details

In response to this:

/retest

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@tiancaiamao
Copy link
Copy Markdown
Contributor Author

/retest

@tiprow
Copy link
Copy Markdown

tiprow Bot commented Jun 1, 2026

@tiancaiamao: PRs from untrusted users cannot be marked as trusted with /ok-to-test in this repo meaning untrusted PR authors can never trigger tests themselves. Collaborators can still trigger tests on the PR using /test.

Details

In response to this:

/retest

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@ti-chi-bot
Copy link
Copy Markdown

ti-chi-bot Bot commented Jun 2, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign bb7133, zanmato1984 for approval. For more information see the Code Review Process.
Please ensure that each of them provides their approval before proceeding.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@tiancaiamao tiancaiamao requested review from YangKeao and bb7133 June 2, 2026 03:48
@tiancaiamao
Copy link
Copy Markdown
Contributor Author

/retest

@tiprow
Copy link
Copy Markdown

tiprow Bot commented Jun 2, 2026

@tiancaiamao: PRs from untrusted users cannot be marked as trusted with /ok-to-test in this repo meaning untrusted PR authors can never trigger tests themselves. Collaborators can still trigger tests on the PR using /test.

Details

In response to this:

/retest

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

release-note Denotes a PR that will be considered when it comes time to generate release notes. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

expression: CONCAT_WS with utf8mb4_0900_bin + utf8mb4_unicode_ci + blob returns ERROR 3854

1 participant