
Add GCS principal attribution to vended credentials via Workload Identity Federation#4707

Open
obelix74 wants to merge 6 commits into apache:main from obelix74:gcs-principal-attribution-wif


@obelix74

Contributor

For issue #4706.

GCP counterpart of AWS STS session tags. GCP downscoped credentials have no session-tag mechanism, and x-goog-custom-audit-* request headers only reach GCS audit logs if the client forwards them (arbitrary Iceberg clients do not), so GCS Data Access logs currently cannot be tied to the requesting Polaris principal.

This vends the attribution in the one channel that survives any client: the identity of the credential itself. When configured, credential vending chains a catalog-signed JWT (sub=<realm>/<principal>) -> GCP STS token exchange (via an IdentityPoolCredentials programmatic supplier) -> tenant service-account impersonation -> the existing Credential Access Boundary (CAB) downscoping, so every GCS Data Access audit entry carries the principal in serviceAccountDelegationInfo.principalSubject.
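For readers less familiar with the first hop, here is a minimal sketch of what the catalog-signed subject token could look like, built with only the JDK (the PR itself mints it with com.auth0:java-jwt). The class and method names are hypothetical, and the claim set (iss, aud, sub, iat, exp) is the standard JWT set rather than a confirmed copy of what GcpFederatedCredentialsExchanger emits. The kid header corresponds to GCS_PRINCIPAL_ATTRIBUTION_SIGNING_KEY_ID.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.Signature;
import java.security.interfaces.RSAPrivateKey;
import java.time.Instant;
import java.util.Base64;

/** Hypothetical helper: mints a compact RS256 JWT using only JDK classes. */
final class AttributionJwtSketch {
  private static final Base64.Encoder B64URL = Base64.getUrlEncoder().withoutPadding();

  static String mint(
      RSAPrivateKey key,
      String kid,
      String issuer,
      String audience,
      String subject,
      Instant now,
      long ttlSeconds) {
    // Header: RS256 plus the key ID, so a verifier can select the matching key from a JWKS.
    String header = "{\"alg\":\"RS256\",\"typ\":\"JWT\",\"kid\":\"" + json(kid) + "\"}";
    // Standard registered claims; sub carries <realm>/<principal>.
    long iat = now.getEpochSecond();
    String payload =
        "{\"iss\":\"" + json(issuer)
            + "\",\"aud\":\"" + json(audience)
            + "\",\"sub\":\"" + json(subject)
            + "\",\"iat\":" + iat
            + ",\"exp\":" + (iat + ttlSeconds) + "}";
    String signingInput = b64(header) + "." + b64(payload);
    try {
      // RS256 = RSASSA-PKCS1-v1_5 with SHA-256, which the JDK calls "SHA256withRSA".
      Signature rs256 = Signature.getInstance("SHA256withRSA");
      rs256.initSign(key);
      rs256.update(signingInput.getBytes(StandardCharsets.US_ASCII));
      return signingInput + "." + B64URL.encodeToString(rs256.sign());
    } catch (GeneralSecurityException e) {
      throw new IllegalStateException("RS256 signing failed", e);
    }
  }

  private static String b64(String s) {
    return B64URL.encodeToString(s.getBytes(StandardCharsets.UTF_8));
  }

  // Minimal JSON string escaping for the sketch: rejects control characters,
  // escapes backslashes and double quotes.
  private static String json(String s) {
    if (s.chars().anyMatch(c -> c < 0x20)) {
      throw new IllegalArgumentException("control character in claim value");
    }
    return s.replace("\\", "\\\\").replace("\"", "\\\"");
  }
}
```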

New FeatureConfiguration flags:

  • GCS_PRINCIPAL_ATTRIBUTION_WIF_AUDIENCE
  • GCS_PRINCIPAL_ATTRIBUTION_TOKEN_ISSUER
  • GCS_PRINCIPAL_ATTRIBUTION_SIGNING_KEY_FILE
  • GCS_PRINCIPAL_ATTRIBUTION_SIGNING_KEY_ID (kid for JWKS rotation)

There is no separate on/off flag: attribution activates once the audience, issuer, and signing key file are all set, and additionally requires a gcpServiceAccount on the storage config. When unconfigured, GCP vending is unchanged.
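The activation rule can be sketched as a single predicate. This is an illustration under assumptions (hypothetical class and parameter names; blank strings treated as unset), not the PR's code. The signing key ID is deliberately not part of the condition, matching the rule as described.

```java
/** Hypothetical sketch of the "no separate on/off flag" activation rule. */
final class AttributionActivation {
  // Parameters mirror the three FeatureConfiguration values plus the storage
  // config's gcpServiceAccount; GCS_PRINCIPAL_ATTRIBUTION_SIGNING_KEY_ID is not required.
  static boolean isActive(
      String wifAudience, String tokenIssuer, String signingKeyFile, String gcpServiceAccount) {
    return isSet(wifAudience)
        && isSet(tokenIssuer)
        && isSet(signingKeyFile)
        && isSet(gcpServiceAccount);
  }

  // Assumption: null and blank values both count as "not configured".
  private static boolean isSet(String value) {
    return value != null && !value.isBlank();
  }
}
```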

GcpStorageCredentialCacheKey gains a principalName data field, populated only when attribution is configured, so per-principal attributed tokens are never shared across principals (and cross-principal cache reuse is preserved when attribution is off). CredentialVendingContext already carries principalName, so no new plumbing into the catalog core is required.
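A simplified illustration of why populating principalName only when attribution is on preserves both properties: with a Java record, a null principalName makes keys for different principals compare equal, while a populated one separates them. CacheKeySketch and its storageConfigId field are hypothetical stand-ins for the real GcpStorageCredentialCacheKey, which has more fields.

```java
/** Hypothetical, reduced stand-in for GcpStorageCredentialCacheKey. */
record CacheKeySketch(String realmId, String storageConfigId, String principalName) {
  // principalName stays null unless attribution is active, so non-attributed
  // credentials remain shareable across principals within the same realm/config.
  static CacheKeySketch of(
      String realmId, String storageConfigId, String principalName, boolean attributionActive) {
    return new CacheKeySketch(realmId, storageConfigId, attributionActive ? principalName : null);
  }
}
```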

New classes: GcpAttributionSubjectBuilder (builds <realm>/<principal> within GCP's 127-char google.subject limit) and GcpFederatedCredentialsExchanger (mints the RS256 JWT via com.auth0:java-jwt, already in the version catalog; performs the STS exchange via google-auth IdentityPoolCredentials, so no new HTTP machinery; caches the parsed signing key JVM-wide).
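One possible shape for the subject builder, assuming a conservative character allowlist and a short hash suffix on truncation so that long subjects stay distinct. SubjectSketch is hypothetical, and the PR's actual sanitization and truncation rules may differ. Restricting the output to ASCII keeps the 127-character budget equal to a 127-byte budget.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HexFormat;

/** Hypothetical sketch of building <realm>/<principal> within a length budget. */
final class SubjectSketch {
  static final int MAX_SUBJECT_LENGTH = 127; // google.subject limit cited in the PR
  private static final int HASH_CHARS = 8;

  static String buildSubject(String realmId, String principalName) {
    String subject = sanitize(realmId) + "/" + sanitize(principalName);
    if (subject.length() <= MAX_SUBJECT_LENGTH) {
      return subject;
    }
    // Too long: keep a prefix and append a short digest of the full value, so two
    // long subjects sharing the same prefix are unlikely to collide.
    String suffix = "-" + sha256Hex(subject).substring(0, HASH_CHARS);
    return subject.substring(0, MAX_SUBJECT_LENGTH - suffix.length()) + suffix;
  }

  // Assumed allowlist: anything else (including '/') becomes '_', so the only
  // '/' in the result is the realm/principal separator.
  private static String sanitize(String s) {
    return s.replaceAll("[^A-Za-z0-9._@-]", "_");
  }

  private static String sha256Hex(String s) {
    try {
      byte[] digest =
          MessageDigest.getInstance("SHA-256").digest(s.getBytes(StandardCharsets.UTF_8));
      return HexFormat.of().formatHex(digest);
    } catch (NoSuchAlgorithmException e) {
      throw new IllegalStateException("SHA-256 unavailable", e);
    }
  }
}
```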

Tests cover the subject budget/sanitization, JWT claims + kid, signing-key caching, the IdentityPoolCredentials configuration, and per-principal cache-key identity. polaris-core compiles, tests pass, spotless clean.

Checklist

  • 🛡️ Don't disclose security issues! (contact security@apache.org)
  • 🔗 Clearly explained why the changes are needed, or linked related issues: Fixes #
  • 🧪 Added/updated tests with good coverage, or manually tested (and explained how)
  • 💡 Added comments for complex logic
  • 🧾 Updated CHANGELOG.md (if needed)
  • 📚 Updated documentation in site/content/in-dev/unreleased (if needed)

Anand Kumar Sankaran added 4 commits June 11, 2026 09:22
The generateLicenseReport task for :polaris-admin requires all
non-Apache bundled dependencies to have a full license mention in
runtime/admin/distribution/LICENSE. The GCS principal attribution
feature introduced java-jwt (MIT) as a polaris-core implementation
dependency; the server distribution already had the entry but the
admin distribution did not.

@dimas-b left a comment

Contributor


LGTM overall 👍 Nice feature! Thanks, @obelix74 !

GcpAttributionSubjectBuilder.buildSubject(key.realmId(), key.principalName());
GcpFederatedCredentialsExchanger exchanger =
new GcpFederatedCredentialsExchanger(
realmConfig.getConfig(FeatureConfiguration.GCS_PRINCIPAL_ATTRIBUTION_TOKEN_ISSUER),

@dimas-b Jun 11, 2026

Contributor


This will probably work correctly given that realm ID is part of the cache key, but conceptually, I think it would be nicer to evaluate these parameters at cache key build time (i.e. around line 185) and just use the values here.

The cache key is meant to be a "rich" object, so it should be fine to add the new data as another (immutable) sub-object if we want to avoid having too many properties at the same level (just an aesthetic concern).

Contributor Author


Addressed.

Addresses review feedback to evaluate attribution configuration values
(token issuer, WIF audience, signing key file/ID) at cache key build
time rather than inside compute(). The new GcpAttributionParams
immutable sub-object is stored as @Value.Auxiliary on the cache key;
baseCredentialsForVending() now reads from it directly instead of
calling realmConfig.getConfig() at cache miss time.
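A sketch of the reviewed design, assuming the key's identity is realm plus principal: the attribution parameters are resolved once when the key is built and, like an Immutables @Value.Auxiliary attribute, ride along for compute() while being excluded from equals, hashCode, and toString. AttributionParams and KeyWithAuxParams are hypothetical names standing in for GcpAttributionParams and GcpStorageCredentialCacheKey.

```java
import java.util.Objects;

/** Hypothetical value object: attribution config resolved at cache-key build time. */
record AttributionParams(
    String tokenIssuer, String wifAudience, String signingKeyFile, String signingKeyId) {}

/** Hypothetical cache key whose params are auxiliary (not part of key identity). */
final class KeyWithAuxParams {
  private final String realmId;
  private final String principalName; // null when attribution is off
  private final AttributionParams params; // null when attribution is off; auxiliary

  KeyWithAuxParams(String realmId, String principalName, AttributionParams params) {
    this.realmId = Objects.requireNonNull(realmId, "realmId");
    this.principalName = principalName;
    this.params = params;
  }

  // compute() on a cache miss reads this instead of calling realmConfig.getConfig().
  AttributionParams params() {
    return params;
  }

  @Override
  public boolean equals(Object o) {
    return o instanceof KeyWithAuxParams k
        && realmId.equals(k.realmId)
        && Objects.equals(principalName, k.principalName);
  }

  @Override
  public int hashCode() {
    return Objects.hash(realmId, principalName);
  }

  @Override
  public String toString() {
    // params omitted, mirroring @Value.Auxiliary (and keeping key paths out of logs).
    return "KeyWithAuxParams{realmId=" + realmId + ", principalName=" + principalName + "}";
  }
}
```

Excluding the parameters from equality relies on the point made in the review: keys that compare equal share a realm, so they would have resolved the same configuration, assuming realm config does not change while entries are cached.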
@obelix74 requested a review from dimas-b June 12, 2026 04:15
static RSAPrivateKey readPkcs8PrivateKey(Path pemPath) throws IOException {
String pem = Files.readString(pemPath);
String base64 =
pem.replaceAll("-----BEGIN [A-Z ]+-----", "")

Contributor


WDYT about reusing PemUtils? If it's not conveniently located, it might be worth moving or refactoring it to allow PEM-related code reuse 🤔
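For reference, parsing an unencrypted PKCS#8 PEM key needs only the JDK, and a helper along these lines is the kind of code a shared PEM utility could own. Pkcs8PemSketch is hypothetical and does not reflect PemUtils' actual API. A PKCS#1 file (BEGIN RSA PRIVATE KEY) is rejected, because PKCS8EncodedKeySpec expects PKCS#8 DER.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.GeneralSecurityException;
import java.security.KeyFactory;
import java.security.interfaces.RSAPrivateKey;
import java.security.spec.PKCS8EncodedKeySpec;
import java.util.Base64;

/** Hypothetical PKCS#8 PEM reader using only JDK classes. */
final class Pkcs8PemSketch {
  static RSAPrivateKey read(Path pemPath) throws IOException {
    return parse(Files.readString(pemPath));
  }

  static RSAPrivateKey parse(String pem) {
    // Drop the BEGIN/END armor lines and all whitespace, leaving the base64 DER body.
    String base64 =
        pem.replaceAll("-----(BEGIN|END) [A-Z ]+-----", "").replaceAll("\\s", "");
    byte[] der = Base64.getDecoder().decode(base64);
    try {
      return (RSAPrivateKey)
          KeyFactory.getInstance("RSA").generatePrivate(new PKCS8EncodedKeySpec(der));
    } catch (GeneralSecurityException e) {
      throw new IllegalArgumentException("not an unencrypted PKCS#8 RSA private key", e);
    }
  }
}
```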

return cached;
}
RSAPrivateKey key = readPkcs8PrivateKey(signingKeyPath);
SIGNING_KEY_CACHE.putIfAbsent(signingKeyPath, key);

Contributor


nit: return SIGNING_KEY_CACHE.compute(signingKeyPath, (k,v) -> key) might be more correct (returning the same key to all threads, while avoiding blocking others on local file I/O).
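A sketch of the behaviour this nit is after, using a hypothetical generic cache class: read the file outside the map, then publish with merge so that the first stored value wins and every caller returns that same instance. A remapping function that ignores its existing value, such as (k, v) -> key, replaces whatever is already cached; keeping the existing entry needs (k, v) -> v != null ? v : key, or the merge form below.

```java
import java.nio.file.Path;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Function;

/** Hypothetical JVM-wide signing-key cache keyed by file path. */
final class SigningKeyCacheSketch<K> {
  private final ConcurrentMap<Path, K> cache = new ConcurrentHashMap<>();
  private final Function<Path, K> loader; // e.g. a PEM reader; must not return null

  SigningKeyCacheSketch(Function<Path, K> loader) {
    this.loader = loader;
  }

  K get(Path path) {
    K cached = cache.get(path);
    if (cached != null) {
      return cached;
    }
    // Possibly slow file I/O happens outside any map lock, so other threads are not blocked.
    K loaded = loader.apply(path);
    // If another thread stored a key first, keep theirs; all callers get one instance.
    return cache.merge(path, loaded, (existing, ignored) -> existing);
  }
}
```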


This product bundles Auth0 Java JWT.

* Maven group:artifact IDs: com.auth0:java-jwt

Contributor


This PR does not introduce the dependency on java-jwt; it just makes it explicit... I believe java-jwt was a transitive dependency even before this PR... so why did we not have to mention it in the license before? 🤔

@snazy @jbonofre : WDYT?

@github-project-automation Bot moved this from PRs In Progress to Ready to merge in Basic Kanban Board Jun 12, 2026
