[#6490] feat(fileset): Add Tencent COS support for fileset catalog #11713

Open: whua3 wants to merge 1 commit into apache:main from whua3:feat/6490-cos-hadoop-catalog

whua3 commented Jun 17, 2026

Contributor

What changes were proposed in this pull request?

Add Tencent Cloud Object Storage (COS) support for the fileset catalog. The layout follows the existing OSS / S3 / GCS / Azure modules.

  • bundles/tencent: source module containing
    • COSFileSystemProvider: builds a Hadoop CosNFileSystem for the cosn:// scheme.
    • COSCredentialsProvider: Hadoop-side provider that bridges Gravitino-vended COSSecretKey credentials to hadoop-cos.
    • COSSecretKeyProvider: server-side CredentialProvider that returns long-lived COS Secret Key credentials.
    • COSUtils: helper that maps Gravitino properties to Hadoop Configuration.
  • bundles/tencent-bundle: shaded fat-jar (gravitino-tencent-bundle-<version>.jar) that bundles hadoop-cos and cos_api-bundle with relocated packages.
  • api: new COSSecretKeyCredential type.
  • catalogs/catalog-common: COSProperties and COSCredentialConfig.
  • clients/filesystem-hadoop3-runtime: include bundles/tencent in the GVFS runtime jar.
  • docs: new fileset-catalog-with-cos.md and a cross-link from fileset-catalog.md.
  • Tests: TestCOSFileSystemProvider, TestCOSCredentialProvider, and FilesetCOSCatalogIT. The IT is skipped on CI when no real COS credentials are provided, the same way FilesetOSSCatalogIT is.
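
To make the static credential path in the list above more concrete, here is a minimal sketch of what a long-lived secret-key credential could look like. It is not the PR's `COSSecretKeyCredential`: the class name, the info keys, and the convention that an expiry of `0` means "does not expire" are assumptions made for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical, simplified stand-in for a static COS secret-key credential.
// Names and conventions here are assumptions, not the PR's actual API.
final class StaticCosCredentialSketch {
  // Credential type value taken from the PR description.
  static final String CREDENTIAL_TYPE = "cos-secret-key";
  // Assumed info keys, modeled on the catalog property names in the PR.
  static final String ACCESS_KEY_ID = "cos-access-key-id";
  static final String SECRET_ACCESS_KEY = "cos-secret-access-key";

  private final String accessKeyId;
  private final String secretAccessKey;

  StaticCosCredentialSketch(String accessKeyId, String secretAccessKey) {
    this.accessKeyId = requireNonBlank(accessKeyId, "COS access key ID should not be empty");
    this.secretAccessKey =
        requireNonBlank(secretAccessKey, "COS secret access key should not be empty");
  }

  // Rejects null, empty, and whitespace-only values.
  private static String requireNonBlank(String value, String message) {
    if (value == null || value.trim().isEmpty()) {
      throw new IllegalArgumentException(message);
    }
    return value;
  }

  String credentialType() {
    return CREDENTIAL_TYPE;
  }

  // Assumption: 0 signals a long-lived credential with no expiry.
  long expireTimeInMs() {
    return 0L;
  }

  // Returns the key/value pairs a client-side bridge would read.
  Map<String, String> credentialInfo() {
    Map<String, String> info = new LinkedHashMap<>();
    info.put(ACCESS_KEY_ID, accessKeyId);
    info.put(SECRET_ACCESS_KEY, secretAccessKey);
    return info;
  }
}
```

A Hadoop-side bridge such as the `COSCredentialsProvider` described above would then read the two values from `credentialInfo()` and hand them to the storage client.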

Why are the changes needed?

#6490 tracks adding Tencent COS support. Without it, users on Tencent Cloud have to either rely on S3-compatible workarounds, which do not work with Gravitino's credential vending, or maintain their own fork.

This PR is the first of a chain of PRs against main:

  • PR-A (this PR): Java side, with the static cos-secret-key credential.
  • PR-B: server-side STS credential vending (cos-token).
  • PR-C: Python GVFS support (COSStorageHandler and Python-side COSSecretKeyCredential).
  • PR-D: Python COSTokenCredential wiring. Can be folded into PR-C if reviewers prefer fewer PRs.

Part of #6490
Fix: #11748

Does this PR introduce any user-facing change?

Yes, additive only. Existing fileset / catalog behavior is unchanged. New surface:

  • New fileset location scheme: cosn://<bucket>/<path>.
  • New fileset catalog properties: cos-region, cos-endpoint (optional, defaults to cos.${region}.myqcloud.com), cos-access-key-id, cos-secret-access-key.
  • New credential provider value: credential-providers=cos-secret-key (static AK/SK; STS will follow in PR-B).
  • New deployable artifact: gravitino-tencent-bundle-<version>.jar.
  • New documentation page: Fileset Catalog with COS.
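
The property handling listed above can be sketched roughly as follows. The catalog property names and the default endpoint pattern come from the PR description. The target `fs.cosn.*` key names are assumptions about hadoop-cos and should be checked against the hadoop-cos version in use. The class and method names are hypothetical, and a plain `Map` stands in for a Hadoop `Configuration` so the sketch runs without Hadoop on the classpath.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: maps catalog properties to Hadoop-style keys.
final class CosPropertyMappingSketch {
  // Assumed hadoop-cos key names; verify before relying on them.
  static final String HADOOP_REGION = "fs.cosn.bucket.region";
  static final String HADOOP_ENDPOINT_SUFFIX = "fs.cosn.bucket.endpoint_suffix";
  static final String HADOOP_SECRET_ID = "fs.cosn.userinfo.secretId";
  static final String HADOOP_SECRET_KEY = "fs.cosn.userinfo.secretKey";

  private static boolean isBlank(String s) {
    return s == null || s.trim().isEmpty();
  }

  // cos-region is required; cos-endpoint falls back to cos.<region>.myqcloud.com.
  static String resolveEndpoint(Map<String, String> props) {
    String region = props.get("cos-region");
    if (isBlank(region)) {
      throw new IllegalArgumentException("cos-region is required");
    }
    String endpoint = props.get("cos-endpoint");
    return isBlank(endpoint) ? "cos." + region + ".myqcloud.com" : endpoint;
  }

  // Copies only the properties that are present; static keys are optional
  // because credentials may instead be vended at runtime.
  static Map<String, String> toHadoopStyleConf(Map<String, String> props) {
    Map<String, String> conf = new HashMap<>();
    conf.put(HADOOP_ENDPOINT_SUFFIX, resolveEndpoint(props));
    conf.put(HADOOP_REGION, props.get("cos-region"));
    if (!isBlank(props.get("cos-access-key-id"))) {
      conf.put(HADOOP_SECRET_ID, props.get("cos-access-key-id"));
    }
    if (!isBlank(props.get("cos-secret-access-key"))) {
      conf.put(HADOOP_SECRET_KEY, props.get("cos-secret-access-key"));
    }
    return conf;
  }
}
```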

How was this patch tested?

  • Unit tests: :bundles:tencent:test, :catalogs:catalog-common:test, :catalogs:catalog-fileset:test, :api:test all pass locally.
  • Build: :bundles:tencent-bundle:shadowJar, :clients:filesystem-hadoop3-runtime:shadowJar, and assemble for all touched modules pass locally.
  • Spotless and Checkstyle pass.
  • FilesetCOSCatalogIT passes locally against a real COS bucket in ap-guangzhou. It is skipped on CI when no credentials are provided.
  • Manual end-to-end verification on a local Gravitino server with the new bundle jars: created a fileset catalog backed by cosn://, then exercised hadoop fs -ls, -put, -get, -cat, -rm over gvfs://fileset/..., plus schema/fileset CRUD via the REST API and direct cosn://... access.
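
The CI skip behavior mentioned above relies on a condition that checks whether COS credentials are available. A minimal sketch of such a predicate is shown below. The environment variable names are hypothetical (the PR only indicates a `COS_*` prefix), and the environment is passed in as a `Map` so the logic can be exercised without JUnit.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;

// Hypothetical gate for a COS integration test.
final class CosItGateSketch {
  // Assumed variable names; the PR only says they follow the COS_* pattern.
  static final List<String> REQUIRED_VARS =
      Arrays.asList(
          "COS_ACCESS_KEY_ID", "COS_SECRET_ACCESS_KEY", "COS_REGION", "COS_BUCKET_NAME");

  // True only if every required variable is set to a non-blank value.
  static boolean cosIsConfigured(Map<String, String> env) {
    for (String name : REQUIRED_VARS) {
      String value = env.get(name);
      if (value == null || value.trim().isEmpty()) {
        return false;
      }
    }
    return true;
  }

  public static void main(String[] args) {
    // In a real test class, a condition method would call this with System.getenv().
    System.out.println("COS IT enabled: " + cosIsConfigured(System.getenv()));
  }
}
```

In a JUnit 5 test, a no-argument method wrapping this check could be referenced from a class-level `@EnabledIf`, so the whole class is skipped when the variables are absent.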

github-actions Bot commented Jun 17, 2026

Code Coverage Report

Overall Project 67.11% -0.04% 🟢
Files changed 38.94% 🔴

Module Coverage
aliyun 1.72% 🔴
api 46.52% -0.3% 🟢
authorization-common 85.96% 🟢
aws 3.66% 🔴
azure 2.47% 🔴
catalog-common 9.92% -0.47% 🔴
catalog-fileset 80.23% 🟢
catalog-glue 66.91% 🟢
catalog-hive 79.42% 🟢
catalog-jdbc-clickhouse 80.02% 🟢
catalog-jdbc-common 44.22% 🟢
catalog-jdbc-doris 80.28% 🟢
catalog-jdbc-hologres 54.03% 🟢
catalog-jdbc-mysql 79.23% 🟢
catalog-jdbc-oceanbase 78.38% 🟢
catalog-jdbc-postgresql 82.29% 🟢
catalog-jdbc-starrocks 78.51% 🟢
catalog-kafka 77.01% 🟢
catalog-lakehouse-generic 58.53% 🟢
catalog-lakehouse-hudi 79.1% 🟢
catalog-lakehouse-iceberg 85.94% 🟢
catalog-lakehouse-paimon 82.14% 🟢
catalog-model 77.72% 🟢
cli 44.51% 🟢
client-java 78.01% 🟢
common 50.17% 🟢
core 82.58% 🟢
filesystem-hadoop3 77.27% 🟢
flink 0.0% 🔴
flink-common 47.12% 🟢
flink-runtime 0.0% 🔴
gcp 14.12% 🔴
hadoop-common 10.88% -0.04% 🔴
hive-metastore-common 53.77% 🟢
iceberg-common 58.15% 🟢
iceberg-rest-server 73.9% 🟢
idp-basic 86.2% 🟢
integration-test-common 0.0% 🔴
jobs 66.17% 🟢
lance-common 20.83% 🔴
lance-rest-server 60.13% 🟢
lineage 53.02% 🟢
optimizer 82.95% 🟢
optimizer-api 21.95% 🔴
server 85.96% 🟢
server-common 74.18% 🟢
spark 28.57% 🔴
spark-common 41.66% 🟢
tencent 69.84% 🟢
trino-connector 40.13% 🟢
Files
Module | File | Coverage
api | COSSecretKeyCredential.java | 0.0% 🔴
catalog-common | COSCredentialConfig.java | 0.0% 🔴
catalog-common | COSProperties.java | 0.0% 🔴
common | ConfigConstants.java | 0.0% 🔴
hadoop-common | Constants.java | 0.0% 🔴
tencent | COSUtils.java | 100.0% 🟢
tencent | COSSecretKeyProvider.java | 88.89% 🟢
tencent | COSCredentialsProvider.java | 80.77% 🟢
tencent | COSFileSystemProvider.java | 45.83% 🔴

Copilot AI left a comment

Contributor

Pull request overview

Adds Tencent Cloud Object Storage (COS) support to the Fileset catalog/GVFS stack, following the existing cloud-provider bundle pattern (AWS/GCP/Aliyun/Azure). This introduces COS-specific filesystem + credential vending integration, a shaded “tencent-bundle” artifact, and accompanying docs/tests.

Changes:

  • Introduce new COS bundle modules (:bundles:tencent, :bundles:tencent-bundle) and wire them into build/runtime packaging.
  • Add COS credential type + server/client-side credential provider plumbing for static SecretId/SecretKey vending.
  • Document COS setup and add COS integration/unit tests.

Reviewed changes

Copilot reviewed 21 out of 21 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
settings.gradle.kts Includes new Tencent bundle modules in the Gradle build.
gradle/libs.versions.toml Adds hadoop-cos version and hadoop3-cos dependency coordinate.
docs/fileset-catalog.md Mentions COS support and bundle jar name.
docs/fileset-catalog-with-cos.md New end-to-end documentation for COS-backed Fileset catalogs (incl. credential vending).
clients/filesystem-hadoop3-runtime/build.gradle.kts Pulls :bundles:tencent into the Hadoop3 runtime client jar.
catalogs/hadoop-common/src/main/java/org/apache/gravitino/catalog/hadoop/fs/Constants.java Adds COS-specific Hadoop configuration keys.
catalogs/catalog-fileset/src/test/java/org/apache/gravitino/catalog/fileset/integration/test/FilesetCOSCatalogIT.java Adds COS integration test (gated by env vars).
catalogs/catalog-fileset/build.gradle.kts Adds tencent-bundle shadow jar to test classpath and task deps.
catalogs/catalog-common/src/main/java/org/apache/gravitino/storage/COSProperties.java Defines COS catalog property keys (region/endpoint/AK/SK).
catalogs/catalog-common/src/main/java/org/apache/gravitino/credential/config/COSCredentialConfig.java Adds COS credential config wrapper for static AK/SK.
bundles/tencent/src/test/java/org/apache/gravitino/cos/fs/TestCOSFileSystemProvider.java Unit tests for COS FS provider mapping and credential conf injection.
bundles/tencent/src/test/java/org/apache/gravitino/cos/credential/TestCOSCredentialProvider.java Unit tests for COS credential provider behavior.
bundles/tencent/src/main/resources/META-INF/services/org.apache.gravitino.credential.CredentialProvider Registers COS credential provider via ServiceLoader.
bundles/tencent/src/main/resources/META-INF/services/org.apache.gravitino.catalog.hadoop.fs.FileSystemProvider Registers COS filesystem provider via ServiceLoader.
bundles/tencent/src/main/java/org/apache/gravitino/cos/fs/COSUtils.java Helper for selecting a suitable COS credential from vended credentials.
bundles/tencent/src/main/java/org/apache/gravitino/cos/fs/COSFileSystemProvider.java Implements COS cosn:// filesystem provider + default tuning.
bundles/tencent/src/main/java/org/apache/gravitino/cos/fs/COSCredentialsProvider.java Bridges Gravitino vended COS credentials into hadoop-cos auth.
bundles/tencent/src/main/java/org/apache/gravitino/cos/credential/COSSecretKeyProvider.java Server-side credential provider for static COS AK/SK.
bundles/tencent/build.gradle.kts Build config for the thin COS integration module.
bundles/tencent-bundle/build.gradle.kts Shaded fat-jar bundling hadoop-cos + dependencies with relocations.
api/src/main/java/org/apache/gravitino/credential/COSSecretKeyCredential.java New credential type for COS static SecretId/SecretKey.

Comment thread docs/fileset-catalog.md Outdated
Comment thread docs/fileset-catalog-with-cos.md
Comment thread docs/fileset-catalog-with-cos.md Outdated

yuqi1129 left a comment

Contributor

Thanks for the COS support — the layout cleanly mirrors the existing OSS/S3/GCS bundles, and the ServiceLoader registration, relocation rules and credential-vending wiring all look correct. A few inline comments below.

One general note (not tied to a line): the PR description mentions "wire cosn:// into FilesetCatalogPropertiesMetadata" and "register the cosn scheme in hadoop-common", but neither file is actually changed that way — providers are auto-loaded via META-INF/services, and hadoop-common only got the two timeout/retry constants. The change itself is fine; just the description is slightly off.
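
To illustrate the auto-loading point above, here is a minimal sketch of scheme-based provider lookup on top of `java.util.ServiceLoader`. The interface below is a hypothetical, trimmed-down contract and not Gravitino's actual `FileSystemProvider`. The lookup accepts any `Iterable`, which is what `ServiceLoader.load(...)` returns.

```java
import java.util.Optional;
import java.util.ServiceLoader;

// Hypothetical minimal provider contract; the real interface has more methods.
interface SchemeProviderSketch {
  String scheme();
}

final class ProviderLookupSketch {
  // Returns the first provider whose scheme matches, ignoring case.
  static Optional<SchemeProviderSketch> findByScheme(
      Iterable<SchemeProviderSketch> providers, String scheme) {
    for (SchemeProviderSketch provider : providers) {
      if (provider.scheme().equalsIgnoreCase(scheme)) {
        return Optional.of(provider);
      }
    }
    return Optional.empty();
  }

  public static void main(String[] args) {
    // ServiceLoader reads META-INF/services/<interface name> entries from the
    // classpath; with no registration file present, nothing is discovered.
    ServiceLoader<SchemeProviderSketch> discovered =
        ServiceLoader.load(SchemeProviderSketch.class);
    System.out.println("cosn provider found: " + findByScheme(discovered, "cosn").isPresent());
  }
}
```

Under this pattern, adding a new scheme only requires shipping a jar with an implementation and a matching `META-INF/services` entry, which is consistent with no scheme registry file being edited in this PR.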

Comment thread docs/fileset-catalog-with-cos.md Outdated
Add Tencent Cloud Object Storage (COS) support for the fileset catalog,
following the same layout as the existing OSS / S3 / GCS / Azure modules.

This is the first PR in a chain that together addresses apache#6490. Subsequent
PRs will add server-side STS credential vending, Python GVFS support, and
Python-side STS wiring. The issue will remain open until the final PR in
the chain is merged.

Modules added or changed:

- bundles/tencent: FileSystemProvider for the cosn:// scheme and a
  CredentialProvider that returns long-lived COS Secret Key credentials.
- bundles/tencent-bundle: shaded fat-jar that bundles hadoop-cos and the
  COS Java SDK with relocated packages.
- api: COSSecretKeyCredential type.
- catalogs/catalog-common: COSProperties and COSCredentialConfig.
- catalogs/catalog-fileset: wire cosn:// into the fileset catalog
  properties metadata.
- catalogs/hadoop-common: register the cosn scheme alongside s3a, oss,
  abfss, and gs.
- clients/filesystem-hadoop3-runtime: include bundles/tencent in the GVFS
  runtime jar so cosn:// works out of the box.
- docs: new fileset-catalog-with-cos.md, plus a cross-link from
  fileset-catalog.md.

Tests:

- TestCOSFileSystemProvider and TestCOSCredentialProvider as unit tests.
- FilesetCOSCatalogIT as an integration test, skipped on CI when no
  credentials are provided (same pattern as FilesetOSSCatalogIT).

Part of apache#6490

whua3 force-pushed the feat/6490-cos-hadoop-catalog branch from 86af744 to 451a93d on June 18, 2026 16:56

whua3 commented Jun 18, 2026

Contributor Author

Pushed an updated commit addressing all review comments.

@jerryshao

Contributor

I suggest creating subtasks under the epic issue #6490. Each PR should have its own dedicated issue, rather than tracking all the PRs in one issue.

:::

:::note
`cos-region` is mandatory for hadoop-cos: signing requests, building the default endpoint and selecting the right CAM scope all require the region. Even if you also set `cos-endpoint`, please keep `cos-region` set.

Contributor

If possible, please just merge these two blocks


:::note
Since version 1.4.0, the `gravitino-tencent` JAR is no longer required separately, as it is included in the `gravitino-filesystem-hadoop3-runtime` JAR.
:::

Contributor

You can remove this block.

- [`gravitino-tencent-bundle-${gravitino-version}.jar`](https://mvnrepository.com/artifact/org.apache.gravitino/gravitino-tencent-bundle): A "fat" JAR that includes `gravitino-tencent` functionality and all necessary dependencies like `hadoop-cos` and the Tencent Cloud COS Java SDK. Use this if your Spark environment doesn't have a pre-existing Hadoop setup.
- [`gravitino-filesystem-hadoop3-runtime-${gravitino-version}.jar`](https://mvnrepository.com/artifact/org.apache.gravitino/gravitino-filesystem-hadoop3-runtime): A "fat" JAR that bundles Gravitino's virtual filesystem client and includes the functionality of `gravitino-tencent`. It is required for accessing Gravitino filesets.
- `hadoop-cos-3.3.0-8.3.23.jar` and `cos_api-bundle-5.6.227.jar`: The Tencent-Cloud-published HCFS adapter and Java SDK for COS. If you are running in an existing Hadoop environment, you need to provide these JARs yourself; they are not part of the standard Apache Hadoop distribution and must be downloaded from Maven Central or Tencent's release page.
- [`gravitino-tencent-${gravitino-version}.jar`](https://mvnrepository.com/artifact/org.apache.gravitino/gravitino-tencent): A "thin" JAR that only provides the COS integration code. Its functionality is already included in the `gravitino-tencent-bundle` and `gravitino-filesystem-hadoop3-runtime` JARs, so you do not need to add it as a direct dependency unless you want to manage all Hadoop and COS dependencies manually.

Contributor

Why do we need hadoop-cos-3.3.0-8.3.23.jar, gravitino-tencent-${gravitino-version}.jar here again?

gravitino-tencent-bundle-${gravitino-version}.jar and gravitino-filesystem-hadoop3-runtime-${gravitino-version}.jar should be enough.

Comment thread docs/fileset-catalog.md
the storage location of the fileset. It supports the local filesystem and HDFS. Since
0.7.0-incubating, Gravitino supports [S3](fileset-catalog-with-s3.md), [GCS](fileset-catalog-with-gcs.md),
[OSS](fileset-catalog-with-oss.md) and [Azure Blob Storage](fileset-catalog-with-adls.md) through Fileset catalog.
Since 1.3.0, Gravitino also supports [Tencent Cloud COS](fileset-catalog-with-cos.md).

Contributor

1.4.0?

* required environment variables (COS_*) are present, mirroring the OSS / AWS counterparts.
*/
@EnabledIf(value = "cosIsConfigured", disabledReason = "Tencent Cloud COS is not configured.")
public class FilesetCOSCatalogIT extends FilesetCatalogIT {

Contributor

Please also add GVFS-related ITs to cover the changes.


This document explains how to configure a Fileset catalog with Tencent Cloud COS (Cloud Object Storage) in Gravitino.

## Prerequisites

Contributor

Have you tested manually based on the document?

Contributor Author

I have manually tested the code functionality, but this document was generated by AI. My apologies for not reviewing it thoroughly. I will fix these issues later and attach screenshots of the manual tests.

Copilot AI left a comment

Contributor

Pull request overview

Copilot reviewed 23 out of 23 changed files in this pull request and generated 4 comments.

Comment thread docs/fileset-catalog.md
the storage location of the fileset. It supports the local filesystem and HDFS. Since
0.7.0-incubating, Gravitino supports [S3](fileset-catalog-with-s3.md), [GCS](fileset-catalog-with-gcs.md),
[OSS](fileset-catalog-with-oss.md) and [Azure Blob Storage](fileset-catalog-with-adls.md) through Fileset catalog.
Since 1.3.0, Gravitino also supports [Tencent Cloud COS](fileset-catalog-with-cos.md).
Comment on lines +113 to +116
    Preconditions.checkArgument(
        StringUtils.isNotBlank(accessKeyId), "COS access key Id should not empty");
    Preconditions.checkArgument(
        StringUtils.isNotBlank(secretAccessKey), "COS secret access key should not empty");
Comment thread gradle/libs.versions.toml
Comment on lines 44 to +48
hadoop3 = "3.3.6"
hadoop3-gcs = "1.9.4-hadoop3"
hadoop3-abs = "3.3.6"
hadoop3-aliyun = "3.3.6"
hadoop-cos = "3.3.0-8.3.23"
Comment on lines +94 to +106
  @AfterAll
  public void stop() throws IOException {
    Catalog catalog = metalake.loadCatalog(catalogName);
    catalog.asSchemas().dropSchema(schemaName, true);
    metalake.dropCatalog(catalogName, true);
    client.dropMetalake(metalakeName, true);

    try {
      closer.close();
    } catch (Exception e) {
      LOG.error("Failed to close CloseableGroup", e);
    }
  }

Development

Successfully merging this pull request may close these issues.

[Subtask] Support Tencent COS in fileset catalog

4 participants