[#6490] feat(fileset): Add Tencent COS support for fileset catalog #11713
whua3 wants to merge 1 commit into
Pull request overview
Adds Tencent Cloud Object Storage (COS) support to the Fileset catalog/GVFS stack, following the existing cloud-provider bundle pattern (AWS/GCP/Aliyun/Azure). This introduces COS-specific filesystem + credential vending integration, a shaded “tencent-bundle” artifact, and accompanying docs/tests.
Changes:
- Introduce new COS bundle modules (`:bundles:tencent`, `:bundles:tencent-bundle`) and wire them into build/runtime packaging.
- Add COS credential type + server/client-side credential provider plumbing for static SecretId/SecretKey vending.
- Document COS setup and add COS integration/unit tests.
Reviewed changes
Copilot reviewed 21 out of 21 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| settings.gradle.kts | Includes new Tencent bundle modules in the Gradle build. |
| gradle/libs.versions.toml | Adds hadoop-cos version and hadoop3-cos dependency coordinate. |
| docs/fileset-catalog.md | Mentions COS support and bundle jar name. |
| docs/fileset-catalog-with-cos.md | New end-to-end documentation for COS-backed Fileset catalogs (incl. credential vending). |
| clients/filesystem-hadoop3-runtime/build.gradle.kts | Pulls :bundles:tencent into the Hadoop3 runtime client jar. |
| catalogs/hadoop-common/src/main/java/org/apache/gravitino/catalog/hadoop/fs/Constants.java | Adds COS-specific Hadoop configuration keys. |
| catalogs/catalog-fileset/src/test/java/org/apache/gravitino/catalog/fileset/integration/test/FilesetCOSCatalogIT.java | Adds COS integration test (gated by env vars). |
| catalogs/catalog-fileset/build.gradle.kts | Adds tencent-bundle shadow jar to test classpath and task deps. |
| catalogs/catalog-common/src/main/java/org/apache/gravitino/storage/COSProperties.java | Defines COS catalog property keys (region/endpoint/AK/SK). |
| catalogs/catalog-common/src/main/java/org/apache/gravitino/credential/config/COSCredentialConfig.java | Adds COS credential config wrapper for static AK/SK. |
| bundles/tencent/src/test/java/org/apache/gravitino/cos/fs/TestCOSFileSystemProvider.java | Unit tests for COS FS provider mapping and credential conf injection. |
| bundles/tencent/src/test/java/org/apache/gravitino/cos/credential/TestCOSCredentialProvider.java | Unit tests for COS credential provider behavior. |
| bundles/tencent/src/main/resources/META-INF/services/org.apache.gravitino.credential.CredentialProvider | Registers COS credential provider via ServiceLoader. |
| bundles/tencent/src/main/resources/META-INF/services/org.apache.gravitino.catalog.hadoop.fs.FileSystemProvider | Registers COS filesystem provider via ServiceLoader. |
| bundles/tencent/src/main/java/org/apache/gravitino/cos/fs/COSUtils.java | Helper for selecting a suitable COS credential from vended credentials. |
| bundles/tencent/src/main/java/org/apache/gravitino/cos/fs/COSFileSystemProvider.java | Implements COS cosn:// filesystem provider + default tuning. |
| bundles/tencent/src/main/java/org/apache/gravitino/cos/fs/COSCredentialsProvider.java | Bridges Gravitino vended COS credentials into hadoop-cos auth. |
| bundles/tencent/src/main/java/org/apache/gravitino/cos/credential/COSSecretKeyProvider.java | Server-side credential provider for static COS AK/SK. |
| bundles/tencent/build.gradle.kts | Build config for the thin COS integration module. |
| bundles/tencent-bundle/build.gradle.kts | Shaded fat-jar bundling hadoop-cos + dependencies with relocations. |
| api/src/main/java/org/apache/gravitino/credential/COSSecretKeyCredential.java | New credential type for COS static SecretId/SecretKey. |
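The table describes `COSUtils` as a helper for selecting a suitable COS credential from the vended credentials. A minimal sketch of that idea follows. The `VendedCredential` and `CosCredentialSelector` classes are hypothetical stand-ins rather than the PR's actual types; only the `cos-secret-key` type string is taken from the PR description.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical stand-in for a vended credential; not Gravitino's Credential API.
final class VendedCredential {
  final String type;
  final String secretId;
  final String secretKey;

  VendedCredential(String type, String secretId, String secretKey) {
    this.type = type;
    this.secretId = secretId;
    this.secretKey = secretKey;
  }
}

// Hypothetical helper illustrating "pick the credential this filesystem can use".
final class CosCredentialSelector {
  // Type string taken from the PR description (credential-providers=cos-secret-key).
  static final String COS_SECRET_KEY_TYPE = "cos-secret-key";

  private CosCredentialSelector() {}

  // Returns the first credential whose type is the static secret-key type, if any.
  static Optional<VendedCredential> select(List<VendedCredential> vended) {
    return vended.stream()
        .filter(c -> COS_SECRET_KEY_TYPE.equals(c.type))
        .findFirst();
  }
}
```

Returning an `Optional` leaves the fallback to the caller, which could for instance use statically configured keys when nothing suitable was vended.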
yuqi1129 left a comment
Thanks for the COS support — the layout cleanly mirrors the existing OSS/S3/GCS bundles, and the ServiceLoader registration, relocation rules and credential-vending wiring all look correct. A few inline comments below.
One general note (not tied to a line): the PR description mentions "wire cosn:// into FilesetCatalogPropertiesMetadata" and "register the cosn scheme in hadoop-common", but neither file is actually changed that way — providers are auto-loaded via META-INF/services, and hadoop-common only got the two timeout/retry constants. The change itself is fine; just the description is slightly off.
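For readers unfamiliar with the `META-INF/services` mechanism mentioned in this comment, here is a minimal sketch of scheme-keyed discovery with `java.util.ServiceLoader`. The `SchemeAwareProvider` interface and `ProviderLookup` class are hypothetical illustrations, not Gravitino's `FileSystemProvider` API.

```java
import java.util.Optional;
import java.util.ServiceLoader;

// Hypothetical provider interface, standing in for a scheme-keyed filesystem provider.
interface SchemeAwareProvider {
  String scheme();
}

final class ProviderLookup {
  private ProviderLookup() {}

  // Scans any iterable of providers for the one that handles the given URI scheme.
  static Optional<SchemeAwareProvider> find(
      Iterable<SchemeAwareProvider> providers, String scheme) {
    for (SchemeAwareProvider provider : providers) {
      if (provider.scheme().equalsIgnoreCase(scheme)) {
        return Optional.of(provider);
      }
    }
    return Optional.empty();
  }

  // ServiceLoader reads implementation class names from a classpath resource named
  // META-INF/services/<fully qualified interface name> and instantiates them lazily.
  static Optional<SchemeAwareProvider> findRegistered(String scheme) {
    return find(ServiceLoader.load(SchemeAwareProvider.class), scheme);
  }
}
```

Because discovery is driven by the services file shipped inside the bundle jar, adding a new scheme in this style needs no edits to a central registry, which is consistent with the reviewer's observation.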
Add Tencent Cloud Object Storage (COS) support for the fileset catalog, following the same layout as the existing OSS / S3 / GCS / Azure modules.
This is the first PR in a chain that together addresses apache#6490. Subsequent PRs will add server-side STS credential vending, Python GVFS support, and Python-side STS wiring. The issue will remain open until the final PR in the chain is merged.
Modules added or changed:
- bundles/tencent: FileSystemProvider for the cosn:// scheme and a CredentialProvider that returns long-lived COS Secret Key credentials.
- bundles/tencent-bundle: shaded fat-jar that bundles hadoop-cos and the COS Java SDK with relocated packages.
- api: COSSecretKeyCredential type.
- catalogs/catalog-common: COSProperties and COSCredentialConfig.
- catalogs/catalog-fileset: wire cosn:// into the fileset catalog properties metadata.
- catalogs/hadoop-common: register the cosn scheme alongside s3a, oss, abfss, and gs.
- clients/filesystem-hadoop3-runtime: include bundles/tencent in the GVFS runtime jar so cosn:// works out of the box.
- docs: new fileset-catalog-with-cos.md, plus a cross-link from fileset-catalog.md.
Tests:
- TestCOSFileSystemProvider and TestCOSCredentialProvider as unit tests.
- FilesetCOSCatalogIT as an integration test, skipped on CI when no credentials are provided (same pattern as FilesetOSSCatalogIT).
Part of apache#6490
86af744 to 451a93d
Pushed an updated commit addressing all review comments.
I suggest creating subtasks for the epic issue #6490. Each PR should have its own related issue for its dedicated piece of work, rather than tracking all the PRs in one issue.
| ::: | ||
| :::note | ||
| `cos-region` is mandatory for hadoop-cos: signing requests, building the default endpoint and selecting the right CAM scope all require the region. Even if you also set `cos-endpoint`, please keep `cos-region` set. |
If possible, please just merge these two blocks
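The note under review says `cos-region` must stay set even when `cos-endpoint` is given, and the PR description states that the endpoint defaults to `cos.${region}.myqcloud.com`. A minimal sketch of that rule follows; `CosEndpointResolver` is a hypothetical class written for illustration, not code from the PR.

```java
import java.util.Map;

// Hypothetical helper; property keys and the default endpoint pattern come from the PR text.
final class CosEndpointResolver {
  static final String REGION_KEY = "cos-region";
  static final String ENDPOINT_KEY = "cos-endpoint";

  private CosEndpointResolver() {}

  // Requires a region, then returns the explicit endpoint if set,
  // otherwise derives the default endpoint from the region.
  static String resolveEndpoint(Map<String, String> props) {
    String region = props.get(REGION_KEY);
    if (region == null || region.trim().isEmpty()) {
      throw new IllegalArgumentException(REGION_KEY + " must be set");
    }
    String endpoint = props.get(ENDPOINT_KEY);
    if (endpoint != null && !endpoint.trim().isEmpty()) {
      return endpoint.trim();
    }
    return "cos." + region.trim() + ".myqcloud.com";
  }
}
```

Checking the region before looking at the endpoint mirrors the documented requirement: an explicit endpoint does not make the region optional.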
| :::note | ||
| Since version 1.4.0, the `gravitino-tencent` JAR is no longer required separately, as it is included in the `gravitino-filesystem-hadoop3-runtime` JAR. | ||
| ::: |
You can remove this block.
| - [`gravitino-tencent-bundle-${gravitino-version}.jar`](https://mvnrepository.com/artifact/org.apache.gravitino/gravitino-tencent-bundle): A "fat" JAR that includes `gravitino-tencent` functionality and all necessary dependencies like `hadoop-cos` and the Tencent Cloud COS Java SDK. Use this if your Spark environment doesn't have a pre-existing Hadoop setup. | ||
| - [`gravitino-filesystem-hadoop3-runtime-${gravitino-version}.jar`](https://mvnrepository.com/artifact/org.apache.gravitino/gravitino-filesystem-hadoop3-runtime): A "fat" JAR that bundles Gravitino's virtual filesystem client and includes the functionality of `gravitino-tencent`. It is required for accessing Gravitino filesets. | ||
| - `hadoop-cos-3.3.0-8.3.23.jar` and `cos_api-bundle-5.6.227.jar`: The Tencent-Cloud-published HCFS adapter and Java SDK for COS. If you are running in an existing Hadoop environment, you need to provide these JARs yourself; they are not part of the standard Apache Hadoop distribution and must be downloaded from Maven Central or Tencent's release page. | ||
| - [`gravitino-tencent-${gravitino-version}.jar`](https://mvnrepository.com/artifact/org.apache.gravitino/gravitino-tencent): A "thin" JAR that only provides the COS integration code. Its functionality is already included in the `gravitino-tencent-bundle` and `gravitino-filesystem-hadoop3-runtime` JARs, so you do not need to add it as a direct dependency unless you want to manage all Hadoop and COS dependencies manually. |
Why do we need `hadoop-cos-3.3.0-8.3.23.jar` and `gravitino-tencent-${gravitino-version}.jar` here again? `gravitino-tencent-bundle-${gravitino-version}.jar` and `gravitino-filesystem-hadoop3-runtime-${gravitino-version}.jar` should be enough.
| the storage location of the fileset. It supports the local filesystem and HDFS. Since | ||
| 0.7.0-incubating, Gravitino supports [S3](fileset-catalog-with-s3.md), [GCS](fileset-catalog-with-gcs.md), | ||
| [OSS](fileset-catalog-with-oss.md) and [Azure Blob Storage](fileset-catalog-with-adls.md) through Fileset catalog. | ||
| Since 1.3.0, Gravitino also supports [Tencent Cloud COS](fileset-catalog-with-cos.md). |
| * required environment variables (COS_*) are present, mirroring the OSS / AWS counterparts. | ||
| */ | ||
| @EnabledIf(value = "cosIsConfigured", disabledReason = "Tencent Cloud COS is not configured.") | ||
| public class FilesetCOSCatalogIT extends FilesetCatalogIT { |
Please also add GVFS-related ITs to cover the changes.
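Any GVFS integration test added in response would presumably be gated the same way as `FilesetCOSCatalogIT`, through a condition method such as `cosIsConfigured`. The sketch below shows one possible shape for that check. The class name and the four variable names are assumptions: the snippet above only says the required variables match `COS_*`.

```java
import java.util.List;
import java.util.Map;

// Hypothetical gate for credential-dependent tests.
final class CosTestGate {
  // Assumed variable names; the PR snippet only states that they match COS_*.
  static final List<String> REQUIRED =
      List.of("COS_REGION", "COS_BUCKET_NAME", "COS_ACCESS_KEY_ID", "COS_SECRET_ACCESS_KEY");

  private CosTestGate() {}

  // True only when every required variable is present and non-blank.
  static boolean cosIsConfigured(Map<String, String> env) {
    for (String name : REQUIRED) {
      String value = env.get(name);
      if (value == null || value.trim().isEmpty()) {
        return false;
      }
    }
    return true;
  }

  // No-argument boolean form, the shape a JUnit 5 @EnabledIf condition method takes.
  static boolean cosIsConfigured() {
    return cosIsConfigured(System.getenv());
  }
}
```

Passing the environment in as a map keeps the check testable without touching real process variables.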
| This document explains how to configure a Fileset catalog with Tencent Cloud COS (Cloud Object Storage) in Gravitino. | ||
| ## Prerequisites |
Have you tested manually based on the document?
I have manually tested the code functionality, but this document was generated by AI. My apologies for not reviewing it thoroughly. I will fix these issues later and attach screenshots of the manual tests.
| Preconditions.checkArgument( | ||
| StringUtils.isNotBlank(accessKeyId), "COS access key Id should not empty"); | ||
| Preconditions.checkArgument( | ||
| StringUtils.isNotBlank(secretAccessKey), "COS secret access key should not empty"); |
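The snippet above validates both halves of the static key pair with Guava's `Preconditions.checkArgument`, which throws `IllegalArgumentException` on failure. A dependency-free sketch of the same check is shown here; `StaticCosCredential` is a hypothetical class, not the PR's `COSSecretKeyCredential`, and the messages are reworded for illustration.

```java
// Hypothetical value class that rejects blank key material at construction time.
final class StaticCosCredential {
  final String accessKeyId;
  final String secretAccessKey;

  StaticCosCredential(String accessKeyId, String secretAccessKey) {
    this.accessKeyId = requireNotBlank(accessKeyId, "COS access key ID should not be empty");
    this.secretAccessKey =
        requireNotBlank(secretAccessKey, "COS secret access key should not be empty");
  }

  // Returns the value unchanged, or throws if it is null, empty, or whitespace only.
  private static String requireNotBlank(String value, String message) {
    if (value == null || value.trim().isEmpty()) {
      throw new IllegalArgumentException(message);
    }
    return value;
  }
}
```

Failing in the constructor means a credential object, once built, can be handed to the filesystem layer without further null or blank checks.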
| hadoop3 = "3.3.6" | ||
| hadoop3-gcs = "1.9.4-hadoop3" | ||
| hadoop3-abs = "3.3.6" | ||
| hadoop3-aliyun = "3.3.6" | ||
| hadoop-cos = "3.3.0-8.3.23" |
| @AfterAll | ||
| public void stop() throws IOException { | ||
| Catalog catalog = metalake.loadCatalog(catalogName); | ||
| catalog.asSchemas().dropSchema(schemaName, true); | ||
| metalake.dropCatalog(catalogName, true); | ||
| client.dropMetalake(metalakeName, true); | ||
| try { | ||
| closer.close(); | ||
| } catch (Exception e) { | ||
| LOG.error("Failed to close CloseableGroup", e); | ||
| } | ||
| } |
What changes were proposed in this pull request?
Add Tencent Cloud Object Storage (COS) support for the fileset catalog. The layout follows the existing OSS / S3 / GCS / Azure modules.
- `bundles/tencent`: source module containing
  - `COSFileSystemProvider`: builds a Hadoop `CosNFileSystem` for the `cosn://` scheme.
  - `COSCredentialsProvider`: Hadoop-side provider that bridges Gravitino-vended `COSSecretKey` credentials to `hadoop-cos`.
  - `COSSecretKeyProvider`: server-side `CredentialProvider` that returns long-lived COS Secret Key credentials.
  - `COSUtils`: helper that maps Gravitino properties to Hadoop `Configuration`.
- `bundles/tencent-bundle`: shaded fat-jar (`gravitino-tencent-bundle-<version>.jar`) that bundles `hadoop-cos` and `cos_api-bundle` with relocated packages.
- `api`: new `COSSecretKeyCredential` type.
- `catalogs/catalog-common`: `COSProperties` and `COSCredentialConfig`.
- `clients/filesystem-hadoop3-runtime`: include `bundles/tencent` in the GVFS runtime jar.
- `docs`: new `fileset-catalog-with-cos.md` and a cross-link from `fileset-catalog.md`.
- `TestCOSFileSystemProvider`, `TestCOSCredentialProvider`, and `FilesetCOSCatalogIT`. The IT is skipped on CI when no real COS credentials are provided, the same way `FilesetOSSCatalogIT` is.
Why are the changes needed?
#6490 tracks adding Tencent COS support. Without it, users on Tencent Cloud have to either rely on S3-compatible workarounds, which do not work with Gravitino's credential vending, or maintain their own fork.
This PR is the first of a chain of PRs against `main`:
- […] `cos-secret-key` credential.
- […] `cos-token`).
- […] `COSStorageHandler` and Python-side `COSSecretKeyCredential`).
- […] `COSTokenCredential` wiring. Can be folded into PR-C if reviewers prefer fewer PRs.
Part of #6490
Fix: #11748
Does this PR introduce any user-facing change?
Yes, additive only. Existing fileset / catalog behavior is unchanged. New surface:
- `cosn://<bucket>/<path>`.
- `cos-region`, `cos-endpoint` (optional, defaults to `cos.${region}.myqcloud.com`), `cos-access-key-id`, `cos-secret-access-key`.
- `credential-providers=cos-secret-key` (static AK/SK; STS will follow in PR-B).
- `gravitino-tencent-bundle-<version>.jar`.
How was this patch tested?
- `:bundles:tencent:test`, `:catalogs:catalog-common:test`, `:catalogs:catalog-fileset:test`, `:api:test` all pass locally.
- `:bundles:tencent-bundle:shadowJar`, `:clients:filesystem-hadoop3-runtime:shadowJar`, and `assemble` for all touched modules pass locally.
- `FilesetCOSCatalogIT` passes locally against a real COS bucket in `ap-guangzhou`. It is skipped on CI when no credentials are provided.
- […] `cosn://`, then exercised `hadoop fs -ls`, `-put`, `-get`, `-cat`, `-rm` over `gvfs://fileset/...`, plus schema/fileset CRUD via the REST API and direct `cosn://...` access.