Skip to content

[#11642] improvement(fileset): Hide credential properties from fileset catalog properties#11674

Open
yuqi1129 wants to merge 8 commits into
apache:mainfrom
yuqi1129:fileset-hide-credential-properties-11642
Open

[#11642] improvement(fileset): Hide credential properties from fileset catalog properties#11674
yuqi1129 wants to merge 8 commits into
apache:mainfrom
yuqi1129:fileset-hide-credential-properties-11642

Conversation

@yuqi1129

@yuqi1129 yuqi1129 commented Jun 16, 2026

Copy link
Copy Markdown
Contributor

What changes were proposed in this pull request?

Hide cloud storage credential properties for the fileset catalog, and make the credentials reachable to clients (e.g. GVFS/Spark) through credential vending so they are never returned in plaintext.

  1. Hide static credentialsFilesetCatalogPropertiesMetadata, FilesetSchemaPropertiesMetadata and FilesetPropertiesMetadata now reuse the shared hidden PropertyEntry definitions in core (S3PropertiesMetadata/OSSPropertiesMetadata/AzurePropertiesMetadata/GCSPropertiesMetadata), so S3/OSS/Azure credentials are filtered out of catalog/schema/fileset.properties(). Consistent with the JDBC catalog ([#11148] feat(authz): Support credential vending for the JDBC MYSQL connector #11149), Hive/Iceberg/Glue ([#11263] feat(authz): Extend credential vending to Hive/Iceberg/Glue/JDBC catalog types #11264) and Paimon ([Bug report] Paimon catalog exposes sensitive properties (jdbc-password, S3 secret key) in REST API responses #11644).

  2. Vend static credentials at the fileset level — Fileset access uses path-based credential vending, which previously only honored an explicit credential-providers setting and did not infer a provider from static credentials (unlike catalog-level vending). The detection logic is extracted into CredentialUtils.getStorageCredentialProviders and reused by BaseCatalog; CredentialUtils.getCredentialProvidersByOrder now falls back to inferring the provider (e.g. s3-secret-key) from static credentials when no explicit credential-providers is configured. This lets a catalog configured with only static credentials vend them at the fileset level, so a GVFS client that provides no credentials (with fs.gravitino.enableCredentialVending=true) obtains the server-side credentials.

  3. Actionable error — When a GVFS operation is denied because the client provides no credentials and credential vending is disabled, GVFS now surfaces an AccessDeniedException whose message points the user to set fs.gravitino.enableCredentialVending=true or provide the storage credentials in the client configuration.

Why are the changes needed?

For fileset catalogs, credential properties (S3/OSS/Azure keys) were not hidden, so catalog.properties() exposed them in plaintext. The JDBC catalog already hides jdbc-user/jdbc-password; the fileset catalog should follow the same approach. Hiding alone would break the "client without credentials relies on server-side credentials" flow, so the fileset-level vending is extended to infer providers from static credentials.

Fix: #11642

Does this PR introduce any user-facing change?

  • The listed credential properties are no longer returned by catalog/schema/fileset.properties().
  • A GVFS client that does not configure storage credentials must set fs.gravitino.enableCredentialVending=true to obtain the server-side credentials. Clients that configure their own credentials are unaffected. fs.gravitino.enableCredentialVending keeps its default of false.
  • For legacy connectors without vending support, the server flag gravitino.catalog.credential.backfillToProperties=true restores the previous behavior.

How was this patch tested?

  • Unit: TestFilesetCatalogPropertiesMetadata (credentials hidden across catalog/schema/fileset, non-sensitive props visible); TestFilesetCatalogCredential (properties() filters credentials while propertiesWithCredentialProviders() retains them); TestCredentialUtils (static credential inference + explicit providers take precedence).
  • Integration (LocalStack, gravitino-docker-test): FilesetS3CatalogIT (catalog.properties() hides credentials, getCredentials() returns S3SecretKeyCredential); GravitinoVirtualFileSystemS3IT.testS3CredentialVendingWithServerStaticCredentials (catalog with only static credentials + client with enableCredentialVending=true and no credentials reads/writes S3 via vended credentials), plus the existing 9 GVFS S3 tests still pass.

Copilot AI review requested due to automatic review settings June 16, 2026 08:31

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

This PR prevents sensitive cloud storage credentials from leaking through outward-facing properties() APIs by marking those keys as hidden in fileset-related property metadata, while keeping credential vending behavior intact via propertiesWithCredentialProviders().

Changes:

  • Introduces a shared hidden metadata map for storage credential keys and reuses it across catalog/schema/fileset metadata.
  • Adds unit tests verifying credential keys are hidden from properties() but still available for credential vending.
  • Confirms non-sensitive connection properties remain visible.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
catalogs/catalog-fileset/src/main/java/org/apache/gravitino/catalog/fileset/FilesetCatalogPropertiesMetadata.java Adds shared hidden property entries for storage credentials and wires them into catalog properties metadata.
catalogs/catalog-fileset/src/main/java/org/apache/gravitino/catalog/fileset/FilesetSchemaPropertiesMetadata.java Reuses the shared hidden storage-credential entries at the schema level.
catalogs/catalog-fileset/src/main/java/org/apache/gravitino/catalog/fileset/FilesetPropertiesMetadata.java Reuses the shared hidden storage-credential entries at the fileset level.
catalogs/catalog-fileset/src/test/java/org/apache/gravitino/catalog/fileset/TestFilesetCatalogPropertiesMetadata.java Adds tests asserting credential keys are hidden and non-credential keys remain visible.
catalogs/catalog-fileset/src/test/java/org/apache/gravitino/catalog/fileset/TestFilesetCatalogCredential.java Adds end-to-end tests for hidden properties() vs. vending-friendly propertiesWithCredentialProviders().

@yuqi1129 yuqi1129 force-pushed the fileset-hide-credential-properties-11642 branch 2 times, most recently from c98adec to c75ec86 Compare June 16, 2026 09:19
@yuqi1129 yuqi1129 self-assigned this Jun 16, 2026
@yuqi1129 yuqi1129 added the branch-1.3 Automatically cherry-pick commit to branch-1.3 label Jun 16, 2026
…fileset catalog properties

Mark cloud storage credential properties (S3/OSS access & secret keys, GCS
service account file, Azure storage account name/key and client secret) as
hidden in FilesetCatalogPropertiesMetadata, FilesetSchemaPropertiesMetadata
and FilesetPropertiesMetadata, so BaseCatalog.properties() filters them out,
consistent with how the JDBC catalog hides jdbc-user and jdbc-password.

Hiding does not break credential vending: the raw credentials remain available
to the server-side credential manager via propertiesWithCredentialProviders(),
and the matching credential provider is auto-injected so clients (e.g. GVFS)
can still obtain vended credentials.
@yuqi1129 yuqi1129 force-pushed the fileset-hide-credential-properties-11642 branch from c75ec86 to 36eb9d4 Compare June 16, 2026 09:57
@github-actions

github-actions Bot commented Jun 16, 2026

Copy link
Copy Markdown

Code Coverage Report

Overall Project 67.2% -0.01% 🟢
Files changed 66.3% 🟢

Module Coverage
aliyun 3.42% +1.5% 🔴
api 46.82% 🟢
authorization-common 85.96% 🟢
aws 4.96% +0.72% 🔴
azure 3.68% +0.48% 🔴
catalog-common 10.4% 🔴
catalog-fileset 82.47% +2.03% 🟢
catalog-glue 66.91% 🟢
catalog-hive 79.42% 🟢
catalog-jdbc-clickhouse 80.02% 🟢
catalog-jdbc-common 44.22% 🟢
catalog-jdbc-doris 80.28% 🟢
catalog-jdbc-hologres 54.03% 🟢
catalog-jdbc-mysql 79.23% 🟢
catalog-jdbc-oceanbase 78.38% 🟢
catalog-jdbc-postgresql 82.29% 🟢
catalog-jdbc-starrocks 78.51% 🟢
catalog-kafka 77.01% 🟢
catalog-lakehouse-generic 58.53% 🟢
catalog-lakehouse-hudi 79.1% 🟢
catalog-lakehouse-iceberg 85.94% 🟢
catalog-lakehouse-paimon 82.14% 🟢
catalog-model 77.72% 🟢
cli 44.51% 🟢
client-java 78.01% 🟢
common 50.17% 🟢
core 82.64% -0.1% 🟢
filesystem-hadoop3 76.64% -3.91% 🟢
flink 0.0% 🔴
flink-common 47.12% 🟢
flink-runtime 0.0% 🔴
gcp 16.29% +0.27% 🔴
hadoop-common 13.25% +2.32% 🔴
hive-metastore-common 53.77% 🟢
iceberg-common 58.15% 🟢
iceberg-rest-server 73.98% 🟢
idp-basic 85.71% 🟢
integration-test-common 0.0% 🔴
jobs 66.17% 🟢
lance-common 20.83% 🔴
lance-rest-server 60.13% 🟢
lineage 53.02% 🟢
optimizer 82.95% 🟢
optimizer-api 21.95% 🔴
server 85.96% 🟢
server-common 74.18% 🟢
spark 28.57% 🔴
spark-common 41.66% 🟢
trino-connector 40.25% 🟢
Files
Module File Coverage
aliyun OSSFileSystemProvider.java 16.0% 🔴
aws S3FileSystemProvider.java 9.43% 🔴
azure AzureFileSystemProvider.java 5.26% 🔴
catalog-fileset FilesetCatalogPropertiesMetadata.java 100.0% 🟢
FilesetPropertiesMetadata.java 100.0% 🟢
FilesetSchemaPropertiesMetadata.java 100.0% 🟢
core CredentialUtils.java 85.37% 🟢
BaseCatalog.java 67.8% 🟢
filesystem-hadoop3 BaseGVFSOperations.java 75.62% 🟢
GravitinoVirtualFileSystem.java 68.84% 🟢
gcp GCSFileSystemProvider.java 18.18% 🔴
hadoop-common SupportsCredentialVending.java 88.89% 🟢

mchades
mchades previously approved these changes Jun 16, 2026
@diqiu50

diqiu50 commented Jun 17, 2026

Copy link
Copy Markdown
Contributor

I don’t see any usage of client.credential in either the Java or Python client. Is it right?

@mchades mchades dismissed their stale review June 17, 2026 02:09

I don’t see any usage of client.credential in either the Java or Python client. Is it right?

…end them at the fileset level

Hide cloud storage credential properties (S3/OSS/Azure access & secret keys)
from the fileset catalog/schema/fileset properties() by reusing the shared
hidden PropertyEntry definitions in core, consistent with the JDBC catalog
(apache#11149) and Hive/Iceberg/Glue (apache#11264).

To keep the "client without credentials, using server-side credentials" flow
working after hiding, make fileset-level (path-based) credential vending infer
the storage credential provider from static credentials when no explicit
credential-providers is set. The detection is extracted into
CredentialUtils.getStorageCredentialProviders and reused by BaseCatalog, so it
mirrors catalog-level vending: a catalog configured with only static
credentials can now vend them at the fileset level.

GVFS clients that do not provide credentials must enable credential vending
(fs.gravitino.enableCredentialVending=true) to obtain the server-side
credentials. When credentials are missing and vending is disabled, GVFS now
surfaces an actionable AccessDenied message pointing to the right config.
@yuqi1129 yuqi1129 removed the branch-1.3 Automatically cherry-pick commit to branch-1.3 label Jun 17, 2026
yuqi1129 added 4 commits June 17, 2026 19:23
…issing credentials in GVFS

When a GVFS operation is denied because the client provides no credentials and
credential vending is disabled, the Python GVFS now wraps the PermissionError
with a message guiding the user to set enable_credential_vending=True or provide
the storage credentials in the client configuration. This mirrors the Java GVFS
behavior.
… over vended credentials

When credential vending is enabled, the vended credential provider was
merged after (and thus overrode) the client-configured static storage
credentials. Make the GVFS clients (Java and Python) skip credential
vending when the client already supplied its own storage credentials, so
client-provided credentials always win.

Detection is delegated to each storage provider/handler (S3/OSS/GCS/Azure)
via SupportsCredentialVending#containsClientCredentials (Java) and
StorageHandler.contains_client_credentials (Python), checking the
Gravitino-style credential keys.
…enabled

The client-credential precedence check resolved the storage provider/handler
unconditionally, which broke flows where credential vending is disabled and
the storage handler cannot be resolved for the path (e.g. the mocked local
filesystem in test_gvfs_with_hook). Gate the check behind enableCredentialVending
so the provider/handler is only resolved when vending is actually used; when
vending is off no credentials are fetched anyway.
@yuqi1129

Copy link
Copy Markdown
Contributor Author

For the fileset, we need to change the behaviour(by default, fs.gravitino.enableCredentialVending is false), we need to change the configuration to true if we intend to hide information like AKSK.

@yuqi1129

Copy link
Copy Markdown
Contributor Author

After this PR, users need to explicitly set 'fs.gravitino.enableCredentialVending' to true to utilize credentials provided by the Gravitino server if no credentials like AKSK were set in the Gravitino client.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Improvement] Fileset catalog should hide credential properties from catalog properties

4 participants