Add GCS principal attribution to vended credentials via Workload Identity Federation #4707
obelix74 wants to merge 6 commits into
…tity Federation
GCP counterpart of AWS STS session tags. GCP downscoped credentials have no session-tag mechanism, and x-goog-custom-audit-* request headers only reach GCS audit logs if the client forwards them (arbitrary Iceberg clients do not), so GCS Data Access logs cannot today be tied to the requesting Polaris principal.
This vends the attribution in the one channel that survives any client: the identity of the credential itself. When configured, credential vending chains catalog-signed JWT (sub=<realm>/<principal>) -> GCP STS token exchange (via IdentityPoolCredentials programmatic supplier) -> tenant service-account impersonation -> existing CAB downscoping, so every GCS Data Access audit entry carries the principal in serviceAccountDelegationInfo.principalSubject.
New FeatureConfiguration flags:
- GCS_PRINCIPAL_ATTRIBUTION_WIF_AUDIENCE
- GCS_PRINCIPAL_ATTRIBUTION_TOKEN_ISSUER
- GCS_PRINCIPAL_ATTRIBUTION_SIGNING_KEY_FILE
- GCS_PRINCIPAL_ATTRIBUTION_SIGNING_KEY_ID (kid for JWKS rotation)
There is no separate on/off flag: attribution activates once the audience, issuer, and signing key file are all set, and additionally requires a gcpServiceAccount on the storage config. When unconfigured, GCP vending is unchanged.
GcpStorageCredentialCacheKey gains a principalName data field, populated only when attribution is configured, so per-principal attributed tokens are never shared across principals (and cross-principal cache reuse is preserved when attribution is off). CredentialVendingContext already carries principalName, so no new plumbing into the catalog core is required.
New classes: GcpAttributionSubjectBuilder (builds <realm>/<principal> within GCP's 127-char google.subject limit) and GcpFederatedCredentialsExchanger (mints the RS256 JWT via com.auth0:java-jwt, already in the version catalog; performs the STS exchange via google-auth IdentityPoolCredentials, so no new HTTP machinery; caches the parsed signing key JVM-wide).
Tests cover the subject budget/sanitization, JWT claims + kid, signing-key caching, the IdentityPoolCredentials configuration, and per-principal cache-key identity. polaris-core compiles, tests pass, spotless clean.
The generateLicenseReport task for :polaris-admin requires all non-Apache bundled dependencies to have a full license mention in runtime/admin/distribution/LICENSE. The GCS principal attribution feature introduced java-jwt (MIT) as a polaris-core implementation dependency; the server distribution already had the entry but the admin distribution did not.
    GcpAttributionSubjectBuilder.buildSubject(key.realmId(), key.principalName());
GcpFederatedCredentialsExchanger exchanger =
    new GcpFederatedCredentialsExchanger(
        realmConfig.getConfig(FeatureConfiguration.GCS_PRINCIPAL_ATTRIBUTION_TOKEN_ISSUER),
This will probably work correctly given that realm ID is part of the cache key, but conceptually, I think it would be nicer to evaluate these parameters at cache key build time (i.e. around line 185) and just use the values here.
The cache key is meant to be a "rich" object, so it should be fine to add new data as another (immutable) sub-object, if we want to avoid having too many properties at the same level (just an aesthetic concern).
Addresses review feedback to evaluate attribution configuration values (token issuer, WIF audience, signing key file/ID) at cache key build time rather than inside compute(). The new GcpAttributionParams immutable sub-object is stored as @Value.Auxiliary on the cache key; baseCredentialsForVending() now reads from it directly instead of calling realmConfig.getConfig() at cache miss time.
static RSAPrivateKey readPkcs8PrivateKey(Path pemPath) throws IOException {
  String pem = Files.readString(pemPath);
  String base64 =
      pem.replaceAll("-----BEGIN [A-Z ]+-----", "")
WDYT about reusing PemUtils? If it's not conveniently located, it might be worth moving or refactoring it to allow PEM-related code reuse 🤔
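For reference, the parsing that any shared PEM helper would need to cover here is small. The following is a minimal JDK-only sketch, not the PR's code and not the existing PemUtils; the class and method names are hypothetical, and it assumes an unencrypted PKCS#8 RSA key.

```java
import java.security.GeneralSecurityException;
import java.security.KeyFactory;
import java.security.interfaces.RSAPrivateKey;
import java.security.spec.PKCS8EncodedKeySpec;
import java.util.Base64;

/** Hypothetical helper; assumes an unencrypted PKCS#8 ("BEGIN PRIVATE KEY") RSA key. */
final class Pkcs8PemSketch {
  private Pkcs8PemSketch() {}

  static RSAPrivateKey parsePkcs8Pem(String pem) throws GeneralSecurityException {
    // Drop the BEGIN/END armor lines, then all whitespace, leaving bare base64.
    String base64 =
        pem.replaceAll("-----(BEGIN|END) [A-Z ]+-----", "").replaceAll("\\s", "");
    byte[] der = Base64.getDecoder().decode(base64);
    // PKCS8EncodedKeySpec expects the DER-encoded PrivateKeyInfo structure.
    return (RSAPrivateKey)
        KeyFactory.getInstance("RSA").generatePrivate(new PKCS8EncodedKeySpec(der));
  }
}
```

Taking the PEM text rather than a Path keeps file I/O out of the parser, which would make it easier to share between callers.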
  return cached;
}
RSAPrivateKey key = readPkcs8PrivateKey(signingKeyPath);
SIGNING_KEY_CACHE.putIfAbsent(signingKeyPath, key);
nit: return SIGNING_KEY_CACHE.compute(signingKeyPath, (k,v) -> key) might be more correct (returning the same key to all threads, while avoiding blocking others on local file I/O).
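One possible way to get both properties mentioned here (a single shared instance for all threads, with file I/O kept outside any map lock) is to load first and then let the first writer win. The class below is a hypothetical, generic sketch rather than the PR's code. It relies on the return value of `putIfAbsent`, because `compute` with a remapping function that ignores the existing value would replace the entry on every call.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Function;

/** Hypothetical cache: racing threads may each load, but all return the same winner. */
final class FirstWriterWinsCache<K, V> {
  private final ConcurrentMap<K, V> cache = new ConcurrentHashMap<>();

  V get(K key, Function<K, V> loader) {
    V cached = cache.get(key);
    if (cached != null) {
      return cached;
    }
    // The load runs outside any map lock, so slow I/O never blocks other keys.
    V loaded = loader.apply(key);
    // putIfAbsent returns the previously stored value, or null if ours was stored.
    V existing = cache.putIfAbsent(key, loaded);
    return existing != null ? existing : loaded;
  }
}
```

The trade-off is that two threads missing at the same time may both read the file; only one result is kept.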
This product bundles Auth0 Java JWT.

* Maven group:artifact IDs: com.auth0:java-jwt
For issue #4706.
GCP counterpart of AWS STS session tags. GCP downscoped credentials have no session-tag mechanism, and x-goog-custom-audit-* request headers only reach GCS audit logs if the client forwards them (arbitrary Iceberg clients do not), so GCS Data Access logs cannot today be tied to the requesting Polaris principal.
This vends the attribution in the one channel that survives any client: the identity of the credential itself. When configured, credential vending chains catalog-signed JWT (sub=<realm>/<principal>) -> GCP STS token exchange (via IdentityPoolCredentials programmatic supplier) -> tenant service-account impersonation -> existing CAB downscoping, so every GCS Data Access audit entry carries the principal in serviceAccountDelegationInfo.principalSubject.
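To illustrate the first link of that chain, here is a rough sketch of what a catalog-signed RS256 token carrying the principal in `sub` could look like. The PR mints the token with com.auth0:java-jwt; this sketch swaps in plain JDK classes so it runs without dependencies. The class name, the claim set beyond `sub` (`iss`, `aud`, `iat`, `exp`), and the TTL parameter are assumptions for illustration only.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.Signature;
import java.security.interfaces.RSAPrivateKey;
import java.time.Instant;
import java.util.Base64;

/** Hypothetical JDK-only stand-in for the JWT minting step. */
final class AttributionJwtSketch {
  private static final Base64.Encoder B64URL = Base64.getUrlEncoder().withoutPadding();

  private AttributionJwtSketch() {}

  /** Assumes all string inputs are free of characters that need JSON escaping. */
  static String mint(
      RSAPrivateKey key, String kid, String issuer, String audience,
      String subject, Instant now, long ttlSeconds) throws GeneralSecurityException {
    String header = String.format("{\"alg\":\"RS256\",\"typ\":\"JWT\",\"kid\":\"%s\"}", kid);
    String claims = String.format(
        "{\"iss\":\"%s\",\"aud\":\"%s\",\"sub\":\"%s\",\"iat\":%d,\"exp\":%d}",
        issuer, audience, subject, now.getEpochSecond(), now.getEpochSecond() + ttlSeconds);
    String signingInput = encode(header) + "." + encode(claims);
    // RS256 is RSASSA-PKCS1-v1_5 with SHA-256, which the JDK names SHA256withRSA.
    Signature signer = Signature.getInstance("SHA256withRSA");
    signer.initSign(key);
    signer.update(signingInput.getBytes(StandardCharsets.US_ASCII));
    return signingInput + "." + B64URL.encodeToString(signer.sign());
  }

  private static String encode(String json) {
    return B64URL.encodeToString(json.getBytes(StandardCharsets.UTF_8));
  }
}
```

The STS exchange, impersonation, and downscoping steps depend on Google's auth library and live GCP endpoints, so they are left out of the sketch.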
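New FeatureConfiguration flags:
- GCS_PRINCIPAL_ATTRIBUTION_WIF_AUDIENCE
- GCS_PRINCIPAL_ATTRIBUTION_TOKEN_ISSUER
- GCS_PRINCIPAL_ATTRIBUTION_SIGNING_KEY_FILE
- GCS_PRINCIPAL_ATTRIBUTION_SIGNING_KEY_ID (kid for JWKS rotation)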
There is no separate on/off flag: attribution activates once the audience, issuer, and signing key file are all set, and additionally requires a gcpServiceAccount on the storage config. When unconfigured, GCP vending is unchanged.
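The activation rule can be read as a single predicate over four values. The record below is a hypothetical illustration, not a class from the PR, and treating blank strings as "unset" is an assumption.

```java
import java.util.Optional;

/** Hypothetical holder for the values that gate attribution. */
record AttributionSettings(
    Optional<String> wifAudience,
    Optional<String> tokenIssuer,
    Optional<String> signingKeyFile,
    Optional<String> gcpServiceAccount) {

  /** True only when audience, issuer, key file, and service account are all set. */
  boolean attributionEnabled() {
    return isSet(wifAudience)
        && isSet(tokenIssuer)
        && isSet(signingKeyFile)
        && isSet(gcpServiceAccount);
  }

  // Assumption: a blank value counts as not configured.
  private static boolean isSet(Optional<String> value) {
    return value.filter(s -> !s.isBlank()).isPresent();
  }
}
```

The signing key ID is deliberately absent from the predicate, since the description lists only the audience, issuer, and key file as activation inputs.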
GcpStorageCredentialCacheKey gains a principalName data field, populated only when attribution is configured, so per-principal attributed tokens are never shared across principals (and cross-principal cache reuse is preserved when attribution is off). CredentialVendingContext already carries principalName, so no new plumbing into the catalog core is required.
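The cache-key behaviour can be sketched with a plain record standing in for the real key type. `CacheKeySketch`, its fields, and the `of` factory are hypothetical; the actual GcpStorageCredentialCacheKey has more components.

```java
import java.util.Optional;

/** Hypothetical stand-in for the cache key; record equality covers all components. */
record CacheKeySketch(String realmId, String storageConfig, Optional<String> principalName) {

  static CacheKeySketch of(
      String realmId, String storageConfig, String principalName, boolean attributionEnabled) {
    // The principal takes part in key identity only when attribution is on,
    // so unattributed tokens stay shareable across principals.
    return new CacheKeySketch(
        realmId,
        storageConfig,
        attributionEnabled ? Optional.of(principalName) : Optional.empty());
  }
}
```

With attribution off, keys built for different principals compare equal and hit the same cache entry; with it on, they differ.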
New classes: GcpAttributionSubjectBuilder (builds <realm>/<principal> within GCP's 127-char google.subject limit) and GcpFederatedCredentialsExchanger (mints the RS256 JWT via com.auth0:java-jwt, already in the version catalog; performs the STS exchange via google-auth IdentityPoolCredentials, so no new HTTP machinery; caches the parsed signing key JVM-wide).
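As an illustration of the subject budget, the sketch below builds a realm/principal subject that never exceeds 127 characters. It is not the PR's GcpAttributionSubjectBuilder: the allowed character set, the `_` replacement, and the hash suffix used when truncating are all assumptions.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical subject builder; sanitization and truncation rules are assumptions. */
final class SubjectSketch {
  static final int MAX_SUBJECT_LENGTH = 127;
  private static final int HASH_HEX_CHARS = 8;

  private SubjectSketch() {}

  static String buildSubject(String realmId, String principalName) {
    String full = sanitize(realmId) + "/" + sanitize(principalName);
    if (full.length() <= MAX_SUBJECT_LENGTH) {
      return full;
    }
    // Over budget: keep a prefix and append a short hash of the full subject,
    // so long names that share a prefix remain distinguishable.
    String suffix = "~" + sha256Hex(full).substring(0, HASH_HEX_CHARS);
    return full.substring(0, MAX_SUBJECT_LENGTH - suffix.length()) + suffix;
  }

  // Assumption: anything outside this conservative set becomes '_'.
  private static String sanitize(String value) {
    return value.replaceAll("[^A-Za-z0-9._@-]", "_");
  }

  private static String sha256Hex(String value) {
    try {
      byte[] digest =
          MessageDigest.getInstance("SHA-256").digest(value.getBytes(StandardCharsets.UTF_8));
      return String.format("%064x", new BigInteger(1, digest));
    } catch (NoSuchAlgorithmException e) {
      throw new IllegalStateException("SHA-256 must be available on every Java platform", e);
    }
  }
}
```

Because `/` is replaced during sanitization, the single separator in the result always marks the realm boundary.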
Tests cover the subject budget/sanitization, JWT claims + kid, signing-key caching, the IdentityPoolCredentials configuration, and per-principal cache-key identity. polaris-core compiles, tests pass, spotless clean.
Checklist
- CHANGELOG.md (if needed)
- site/content/in-dev/unreleased (if needed)