[#11590] fix(doris): fix partition parsing for Doris 3.0+ format#11732

Open
jiangxt2 wants to merge 3 commits into
apache:mainfrom
jiangxt2:feat/doris-partition

Conversation

@jiangxt2

What changes were proposed in this pull request?

Partition Regex Fix (DorisUtils.java)

  • Added \\s* to PARTITION_INFO_PATTERN to tolerate whitespace between LIST/RANGE and the opening parenthesis in Doris 3.0+ SHOW CREATE TABLE output ("PARTITION BY LIST (" vs "PARTITION BY LIST(")
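
For readers who have not seen the pattern, the whitespace tolerance described above can be sketched roughly as follows. This is an illustration only: the class name, the method `parseHeader`, and the regex itself are assumptions made for this example, not the actual `PARTITION_INFO_PATTERN` in DorisUtils.java.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class PartitionHeaderSketch {
  // Hypothetical pattern, NOT the real PARTITION_INFO_PATTERN from DorisUtils.
  // The \\s* between the keyword and "(" is the tolerance the description refers to.
  private static final Pattern HEADER =
      Pattern.compile("PARTITION\\s+BY\\s+(LIST|RANGE)\\s*\\(([^)]*)\\)");

  /** Returns {partitionType, rawColumnList} when a partition header is found. */
  static Optional<String[]> parseHeader(String createTableSql) {
    Matcher m = HEADER.matcher(createTableSql);
    if (!m.find()) {
      return Optional.empty();
    }
    return Optional.of(new String[] {m.group(1), m.group(2)});
  }

  public static void main(String[] args) {
    // One compact header and one with a space before the parenthesis.
    String[] samples = {"PARTITION BY LIST(`city`)", "PARTITION BY RANGE (`dt`)"};
    for (String s : samples) {
      parseHeader(s).ifPresent(h -> System.out.println(h[0] + " -> " + h[1]));
    }
  }
}
```

Both spellings of the header match under this sketch, and a table without a PARTITION BY clause yields an empty Optional.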

Partition Column Extraction Fix (Copilot C3)

  • Changed split(", ") to split(",") + trim() to handle commas with or without trailing space
  • Added backtick detection and stripping for column names
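
A minimal sketch of the split-then-trim approach, assuming the raw column list has already been captured from the header. The class and method names are hypothetical and the real code in DorisUtils may be structured differently.

```java
import java.util.ArrayList;
import java.util.List;

final class PartitionColumnsSketch {
  /** Splits a raw column list such as "`dt`,`city`" or "dt, city" into bare names. */
  static List<String> parseColumns(String rawColumnList) {
    List<String> columns = new ArrayList<>();
    // Split on the bare comma so "a,b" and "a, b" behave the same way.
    for (String part : rawColumnList.split(",")) {
      String name = part.trim();
      // Strip one pair of surrounding backticks if present.
      if (name.length() >= 2 && name.startsWith("`") && name.endsWith("`")) {
        name = name.substring(1, name.length() - 1);
      }
      if (!name.isEmpty()) {
        columns.add(name);
      }
    }
    return columns;
  }

  public static void main(String[] args) {
    System.out.println(parseColumns("`dt`,`city`"));
    System.out.println(parseColumns("dt , city"));
  }
}
```

One limitation of this sketch: a backtick-quoted column name that itself contains a comma would be split incorrectly.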

Multi-column LIST Partition Parsing

  • Replaced regex-based parsing with bracket-depth-aware manual scan to correctly handle nested parentheses in VALUES IN (("a", 1), ("b", 2))
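
The idea behind a bracket-depth-aware scan can be illustrated as below. This is a sketch under assumptions, not the PR's implementation: the helper names are made up for this example, and it only handles simple quoted literals (no escaped quotes).

```java
import java.util.ArrayList;
import java.util.List;

final class ParenScanSketch {
  /** Returns the index of the ")" matching the "(" at openIdx, or -1 if unbalanced. */
  static int findMatchingParen(String sql, int openIdx) {
    int depth = 0;
    char quote = 0; // 0 means "not inside a quoted literal"
    for (int i = openIdx; i < sql.length(); i++) {
      char c = sql.charAt(i);
      if (quote != 0) {
        if (c == quote) {
          quote = 0; // closing quote ends the literal
        }
        continue; // parentheses inside literals are ignored
      }
      if (c == '"' || c == '\'') {
        quote = c;
      } else if (c == '(') {
        depth++;
      } else if (c == ')') {
        depth--;
        if (depth == 0) {
          return i;
        }
      }
    }
    return -1;
  }

  /** Splits content on commas at depth 0, so nested tuples stay intact. */
  static List<String> splitTopLevel(String content) {
    List<String> parts = new ArrayList<>();
    int depth = 0;
    char quote = 0;
    int start = 0;
    for (int i = 0; i < content.length(); i++) {
      char c = content.charAt(i);
      if (quote != 0) {
        if (c == quote) {
          quote = 0;
        }
        continue;
      }
      if (c == '"' || c == '\'') {
        quote = c;
      } else if (c == '(') {
        depth++;
      } else if (c == ')') {
        depth--;
      } else if (c == ',' && depth == 0) {
        parts.add(content.substring(start, i).trim());
        start = i + 1;
      }
    }
    String tail = content.substring(start).trim();
    if (!tail.isEmpty()) {
      parts.add(tail);
    }
    return parts;
  }

  public static void main(String[] args) {
    String sql = "VALUES IN ((\"a\", 1), (\"b\", 2))";
    int open = sql.indexOf('(');
    int close = findMatchingParen(sql, open);
    System.out.println(splitTopLevel(sql.substring(open + 1, close)));
  }
}
```

A single regex with a non-nested group such as \\(([^)]*)\\) would stop at the first closing parenthesis, which is why counting depth is the safer choice for multi-column values.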

Backtick Partition Name Support (Copilot C4)

  • Updated regex from (\\w+) to (?:\\x60(\\w+)\\x60|(\\w+)) to match both backtick-quoted and bare partition names
  • Partition name extracted from group(1) (backtick) or group(2) (bare), whichever is non-null
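
The group(1)/group(2) selection can be sketched as follows. The alternation is the one quoted in the bullet above; the surrounding "PARTITION ... VALUES IN (" context, the class name, and the method name are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class PartitionNameSketch {
  // \\x60 is the regex hex escape for a backtick character.
  private static final Pattern NAME =
      Pattern.compile("PARTITION\\s+(?:\\x60(\\w+)\\x60|(\\w+))\\s+VALUES\\s+IN\\s*\\(");

  /** Collects partition names, whether they were backtick-quoted or bare. */
  static List<String> partitionNames(String sql) {
    List<String> names = new ArrayList<>();
    Matcher m = NAME.matcher(sql);
    while (m.find()) {
      String quoted = m.group(1); // non-null only for the backtick branch
      names.add(quoted != null ? quoted : m.group(2));
    }
    return names;
  }

  public static void main(String[] args) {
    String sql = "(PARTITION `p1` VALUES IN (\"a\"), PARTITION p2 VALUES IN (\"b\"))";
    System.out.println(partitionNames(sql));
  }
}
```

Because only one branch of the alternation can match, exactly one of the two groups is non-null for each match.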

Test Coverage (Copilot C5)

  • Added test case with backtick-quoted partition names (PARTITION \\x60p1\\x60 VALUES IN (...))

Does this PR introduce any user-facing change?

No. Partition parsing improvements are internal to Gravitino metadata loading.

How was this patch tested?

Unit tests in TestDorisUtils:

  • testExtractPartitionInfoFromSql: RANGE/LIST with/without spaces, multi-column LIST, backtick partition names, non-partitioned tables
  • testGeneratePartitionSqlFragment: RANGE MAXVALUE, LIST single/multi/multi-column values

Integration tests (Doris 4.0.6 and 3.0.6.2):

  • LIST partition with backtick names: Gravitino correctly reads partition info
  • RANGE partition: Gravitino correctly identifies range strategy
  • Partition column/value/name extraction verified

Related to #11590

@github-actions

Code Coverage Report

Overall Project 67.17% +0.13% 🟢
Files changed 85.27% 🟢

Module Coverage
aliyun 1.72% 🔴
api 46.82% 🟢
authorization-common 85.96% 🟢
aws 3.66% 🔴
azure 2.47% 🔴
catalog-common 10.4% 🔴
catalog-fileset 80.23% 🟢
catalog-glue 66.91% 🟢
catalog-hive 79.44% +2.43% 🟢
catalog-jdbc-clickhouse 80.02% 🟢
catalog-jdbc-common 44.22% 🟢
catalog-jdbc-doris 81.16% +2.91% 🟢
catalog-jdbc-hologres 54.03% 🟢
catalog-jdbc-mysql 79.23% 🟢
catalog-jdbc-oceanbase 78.38% 🟢
catalog-jdbc-postgresql 82.29% 🟢
catalog-jdbc-starrocks 78.51% 🟢
catalog-kafka 77.01% 🟢
catalog-lakehouse-generic 58.53% 🟢
catalog-lakehouse-hudi 79.1% 🟢
catalog-lakehouse-iceberg 85.86% 🟢
catalog-lakehouse-paimon 82.14% 🟢
catalog-model 77.72% 🟢
cli 44.51% 🟢
client-java 78.01% 🟢
common 50.17% 🟢
core 82.59% 🟢
filesystem-hadoop3 77.27% 🟢
flink 0.0% 🔴
flink-common 47.12% 🟢
flink-runtime 0.0% 🔴
gcp 14.12% 🔴
hadoop-common 10.88% 🔴
hive-metastore-common 53.77% 🟢
iceberg-common 58.15% 🟢
iceberg-rest-server 73.9% 🟢
idp-basic 86.2% 🟢
integration-test-common 0.0% 🔴
jobs 66.17% 🟢
lance-common 20.83% 🔴
lance-rest-server 60.13% 🟢
lineage 53.02% 🟢
optimizer 82.87% 🟢
optimizer-api 21.95% 🔴
server 85.96% 🟢
server-common 74.18% 🟢
spark 28.57% 🔴
spark-common 41.66% 🟢
trino-connector 40.13% 🟢
Files
Module File Coverage
catalog-hive HiveCatalogOperations.java 81.74% 🟢
catalog-jdbc-doris DorisUtils.java 94.38% 🟢

Copilot AI left a comment

Contributor

Pull request overview

Improves Apache Doris catalog compatibility (Doris 3.0+ SHOW CREATE TABLE output) by making partition parsing more tolerant to whitespace and more robust for LIST partition definitions, with corresponding unit test updates.

Changes:

  • Relaxed partition header regex to tolerate whitespace and updated partition column splitting/backtick handling.
  • Added LIST partition assignment parsing that can handle nested parentheses for multi-column LIST values.
  • Updated/expanded TestDorisUtils cases and migrated assertions to JUnit Jupiter Assertions.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.

File Description
catalogs/catalog-jdbc-doris/src/main/java/org/apache/gravitino/catalog/doris/utils/DorisUtils.java Updates partition regex and adds LIST partition assignment parsing logic.
catalogs/catalog-jdbc-doris/src/test/java/org/apache/gravitino/catalog/doris/utils/TestDorisUtils.java Expands partition parsing tests (including Doris 3.0+ spacing) and switches to JUnit Jupiter assertions.

} else if (RANGE_PARTITION.equals(partitionType)) {
return Optional.of(Transforms.range(new String[] {columns[0]}));
// Merge all lines to handle multi-line partition definitions
String mergedSql = createTableSql.replaceAll("\\n", " ");
Comment on lines +118 to +125
String[][] filedNames =
Arrays.stream(columns).map(s -> new String[] {s}).toArray(String[][]::new);
// Try to extract partition assignments
ListPartition[] assignments = extractListPartitionAssignments(mergedSql);
if (assignments.length > 0) {
return Optional.of(Transforms.list(filedNames, assignments));
}
return Optional.of(Transforms.list(filedNames));
// Locate "PARTITION <name> VALUES IN (" and extract the outer paren content manually
// to correctly handle multi-column partitions: VALUES IN (("a", 1), ("b", 2))
Pattern headerPattern =
Pattern.compile("PARTITION\\s+(?:`(\\w+)`|(\\w+))\\s+VALUES\\s+IN\\s*\\(");
Comment on lines +153 to 156
transform = DorisUtils.extractPartitionInfoFromSql(createTableSql);
Assertions.assertTrue(transform.isPresent());
Assertions.assertEquals("list", transform.get().name());
}
- Add whitespace tolerance in partition regex (PARTITION BY LIST ( vs LIST()
- Fix partition column extraction: split by comma + trim + backtick stripping
- Support backtick-quoted partition names (PARTITION `p1` VALUES IN ...)
- Add bracket-depth-aware parsing for multi-column LIST partitions

Related to apache#11590
Co-Authored-By: Chang-Tong <zdcheerful@hotmail.com>
Co-Authored-By: ArtificialIdoit <bill.sea@hotmail.com>
Co-Authored-By: cwq222 <15503804976@163.com>

Signed-off-by: jiangxt2 <jiangxt2@vip.qq.com>
@jiangxt2 jiangxt2 force-pushed the feat/doris-partition branch from a36570f to badba44 Compare June 22, 2026 09:50
- Replace regex-based replaceAll with char-level replace for newline
  merging
- Fix typo: filedNames -> fieldNames
- Allow backtick-quoted partition names with hyphens, dots and spaces
  by using [^`]+ instead of \w+ in headerPattern
- Add partition name extraction assertions and boundary case tests:
  empty assignment list, multi-line SQL, backtick-quoted column names
  with special characters, partition names with spaces

Signed-off-by: jiangxt2 <jiangxt2@vip.qq.com>
Co-Authored-By: Chang-Tong <zdcheerful@hotmail.com>
Co-Authored-By: ArtificialIdoit <bill.sea@hotmail.com>
Co-Authored-By: cwq222 <15503804976@163.com>
@jiangxt2 jiangxt2 force-pushed the feat/doris-partition branch from badba44 to b62a75e Compare June 22, 2026 13:16
@jiangxt2

Author

Thanks for the review. After checking each comment against the actual PR diff:

  1. The code uses createTableSql.replace('\n', ' ') (the char overload), not replaceAll("\\n", " "). No regex engine is involved.
  2. The variable is already named fieldNames in the PR — the typo from the original code was fixed.
  3. The headerPattern uses ([^`]+) for the backtick-quoted branch, which accepts any non-backtick character. The \w+ branch only applies to unquoted identifiers. Tests cover names like p-2024_07 and p-2024.08 inside backticks.
  4. The test does assert extracted partition names: listTransform.assignments()[0].name() is checked against "p1", "p-2024_07", "p beijing", etc.
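
To make points 1 and 3 concrete, here is a small sketch of the char-overload newline merge and the relaxed backtick branch. The class and method names are hypothetical and the pattern is written for illustration; it is not copied from the PR diff.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class RelaxedNameSketch {
  // Backtick branch accepts any run of non-backtick characters; bare branch stays \w+.
  private static final Pattern HEADER =
      Pattern.compile("PARTITION\\s+(?:`([^`]+)`|(\\w+))\\s+VALUES\\s+IN\\s*\\(");

  /** Merges lines with String.replace(char, char); no regex is compiled here. */
  static String mergeLines(String sql) {
    return sql.replace('\n', ' ');
  }

  /** Extracts partition names from the merged SQL text. */
  static List<String> partitionNames(String sql) {
    List<String> names = new ArrayList<>();
    Matcher m = HEADER.matcher(mergeLines(sql));
    while (m.find()) {
      names.add(m.group(1) != null ? m.group(1) : m.group(2));
    }
    return names;
  }

  public static void main(String[] args) {
    String sql =
        "PARTITION `p-2024.08`\nVALUES IN (\"a\"),\n"
            + "PARTITION `p beijing` VALUES IN (\"b\"),\n"
            + "PARTITION p3 VALUES IN (\"c\")";
    System.out.println(partitionNames(sql));
  }
}
```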


2 participants