Add GCS principal attribution to vended credentials via Workload Identity Federation #4707
obelix74 wants to merge 7 commits into
…tity Federation

GCP counterpart of AWS STS session tags. GCP downscoped credentials have no session-tag mechanism, and x-goog-custom-audit-* request headers only reach GCS audit logs if the client forwards them (arbitrary Iceberg clients do not), so GCS Data Access logs cannot today be tied to the requesting Polaris principal.

This vends the attribution in the one channel that survives any client: the identity of the credential itself. When configured, credential vending chains catalog-signed JWT (sub=<realm>/<principal>) -> GCP STS token exchange (via IdentityPoolCredentials programmatic supplier) -> tenant service-account impersonation -> existing CAB downscoping, so every GCS Data Access audit entry carries the principal in serviceAccountDelegationInfo.principalSubject.

New FeatureConfiguration flags:
- GCS_PRINCIPAL_ATTRIBUTION_WIF_AUDIENCE
- GCS_PRINCIPAL_ATTRIBUTION_TOKEN_ISSUER
- GCS_PRINCIPAL_ATTRIBUTION_SIGNING_KEY_FILE
- GCS_PRINCIPAL_ATTRIBUTION_SIGNING_KEY_ID (kid for JWKS rotation)

There is no separate on/off flag: attribution activates once the audience, issuer, and signing key file are all set, and additionally requires a gcpServiceAccount on the storage config. When unconfigured, GCP vending is unchanged.

GcpStorageCredentialCacheKey gains a principalName data field, populated only when attribution is configured, so per-principal attributed tokens are never shared across principals (and cross-principal cache reuse is preserved when attribution is off). CredentialVendingContext already carries principalName, so no new plumbing into the catalog core is required.

New classes: GcpAttributionSubjectBuilder (builds <realm>/<principal> within GCP's 127-char google.subject limit) and GcpFederatedCredentialsExchanger (mints the RS256 JWT via com.auth0:java-jwt, already in the version catalog; performs the STS exchange via google-auth IdentityPoolCredentials, so no new HTTP machinery; caches the parsed signing key JVM-wide).
Tests cover the subject budget/sanitization, JWT claims + kid, signing-key caching, the IdentityPoolCredentials configuration, and per-principal cache-key identity. polaris-core compiles, tests pass, spotless clean.
The generateLicenseReport task for :polaris-admin requires all non-Apache bundled dependencies to have a full license mention in runtime/admin/distribution/LICENSE. The GCS principal attribution feature introduced java-jwt (MIT) as a polaris-core implementation dependency; the server distribution already had the entry but the admin distribution did not.
    GcpAttributionSubjectBuilder.buildSubject(key.realmId(), key.principalName());
    GcpFederatedCredentialsExchanger exchanger =
        new GcpFederatedCredentialsExchanger(
            realmConfig.getConfig(FeatureConfiguration.GCS_PRINCIPAL_ATTRIBUTION_TOKEN_ISSUER),
This will probably work correctly given that realm ID is part of the cache key, but conceptually, I think it would be nicer to evaluate these parameters at cache key build time (i.e. around line 185) and just use the values here.
The cache key is meant to be a "rich" object, so it should be fine to add new data as another (immutable) sub-object, that is if we want to avoid having too many properties at the same level (just an aesthetic concern).
Addresses review feedback to evaluate attribution configuration values (token issuer, WIF audience, signing key file/ID) at cache key build time rather than inside compute(). The new GcpAttributionParams immutable sub-object is stored as @Value.Auxiliary on the cache key; baseCredentialsForVending() now reads from it directly instead of calling realmConfig.getConfig() at cache miss time.
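To illustrate what storing the parameters as an auxiliary sub-object means for cache lookups, here is a minimal hand-written sketch. It does not use the Immutables library; it only mimics the documented effect of `@Value.Auxiliary`, namely that the attribute is carried on the object but excluded from `equals` and `hashCode`. The class names `AttributionParamsSketch` and `CacheKeyWithAuxSketch` and their fields are hypothetical, not the PR's actual types.

```java
import java.util.Objects;

// Hypothetical stand-in for the immutable parameter sub-object described above.
record AttributionParamsSketch(
    String tokenIssuer, String wifAudience, String signingKeyFile, String signingKeyId) {}

// Hypothetical cache key: identity fields take part in equality, while the
// attribution parameters are "auxiliary" and are only read on a cache miss.
final class CacheKeyWithAuxSketch {
  private final String realmId;
  private final String principalName;
  private final AttributionParamsSketch attributionParams; // auxiliary

  CacheKeyWithAuxSketch(
      String realmId, String principalName, AttributionParamsSketch attributionParams) {
    this.realmId = realmId;
    this.principalName = principalName;
    this.attributionParams = attributionParams;
  }

  AttributionParamsSketch attributionParams() {
    return attributionParams;
  }

  @Override
  public boolean equals(Object o) {
    if (this == o) {
      return true;
    }
    if (!(o instanceof CacheKeyWithAuxSketch)) {
      return false;
    }
    CacheKeyWithAuxSketch other = (CacheKeyWithAuxSketch) o;
    // Deliberately ignores attributionParams, as an auxiliary attribute would be.
    return realmId.equals(other.realmId) && Objects.equals(principalName, other.principalName);
  }

  @Override
  public int hashCode() {
    return Objects.hash(realmId, principalName);
  }
}
```

Under this assumption, the values are resolved once when the key is built, and the loader reads `attributionParams()` from the key instead of consulting configuration during the cache miss.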
    static RSAPrivateKey readPkcs8PrivateKey(Path pemPath) throws IOException {
      String pem = Files.readString(pemPath);
      String base64 =
          pem.replaceAll("-----BEGIN [A-Z ]+-----", "")
WDYT about reusing PemUtils? If it's not conveniently located, it might be worth moving or refactoring it to allow PEM-related code reuse 🤔
PemUtils lives in runtime/service (org.apache.polaris.service.auth.internal.broker), which is a downstream module that depends on polaris-core. Using it from polaris-core would invert that dependency and create a cycle. Moving it to polaris-core could work in principle, but it is also package-private and has broader scope (key-pair generation, public-key reading, file writing) that does not belong in core storage. The readPkcs8PrivateKey here is a small, targeted helper with no deps beyond the JDK — I think keeping it local is the cleaner choice. Happy to file a follow-up to consolidate PEM utilities into a shared module if that is desirable.
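For context on how small a JDK-only helper of this kind can be, here is a sketch of PKCS#8 PEM parsing. It is not the PR's code (the diff excerpt is truncated); the class name `Pkcs8PemSketch`, the `parsePkcs8PrivateKey` method, and the choice to wrap security exceptions in `IllegalArgumentException` are assumptions made for illustration.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.GeneralSecurityException;
import java.security.KeyFactory;
import java.security.interfaces.RSAPrivateKey;
import java.security.spec.PKCS8EncodedKeySpec;
import java.util.Base64;

final class Pkcs8PemSketch {

  // Reads the file, then delegates to the string-based parser below.
  static RSAPrivateKey readPkcs8PrivateKey(Path pemPath) throws IOException {
    return parsePkcs8PrivateKey(Files.readString(pemPath));
  }

  // Strips the BEGIN/END armor lines and all whitespace, Base64-decodes the
  // remainder, and interprets the bytes as a PKCS#8-encoded RSA private key.
  static RSAPrivateKey parsePkcs8PrivateKey(String pem) {
    String base64 =
        pem.replaceAll("-----(BEGIN|END) [A-Z ]+-----", "").replaceAll("\\s", "");
    byte[] der = Base64.getDecoder().decode(base64);
    try {
      return (RSAPrivateKey)
          KeyFactory.getInstance("RSA").generatePrivate(new PKCS8EncodedKeySpec(der));
    } catch (GeneralSecurityException e) {
      // Assumption: surface malformed keys as an unchecked exception.
      throw new IllegalArgumentException("Not a PKCS#8 RSA private key", e);
    }
  }
}
```

This only handles unencrypted `PRIVATE KEY` (PKCS#8) files; a PKCS#1 `RSA PRIVATE KEY` file would fail in `generatePrivate`.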
      return cached;
    }
    RSAPrivateKey key = readPkcs8PrivateKey(signingKeyPath);
    SIGNING_KEY_CACHE.putIfAbsent(signingKeyPath, key);
nit: return SIGNING_KEY_CACHE.compute(signingKeyPath, (k,v) -> key) might be more correct (returning the same key to all threads, while avoiding blocking others on local file I/O).
Good call: switched to computeIfAbsent, which atomically ensures only one thread does the file I/O per path. I used UncheckedIOException to wrap the checked IOException inside the lambda and re-throw it at the call site.
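The pattern described in this reply can be sketched as follows. This is an illustration under assumptions, not the PR's code: the generic `SigningKeyCacheSketch` class and its `Loader` interface are hypothetical, and the real cache is a static map of parsed keys keyed by path.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Path;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

final class SigningKeyCacheSketch<K> {

  // Hypothetical loader abstraction so the sketch does not depend on real key files.
  interface Loader<K> {
    K load(Path path) throws IOException;
  }

  private final ConcurrentMap<Path, K> cache = new ConcurrentHashMap<>();
  private final Loader<K> loader;

  SigningKeyCacheSketch(Loader<K> loader) {
    this.loader = loader;
  }

  K get(Path path) throws IOException {
    try {
      // computeIfAbsent runs the mapping function at most once per absent key
      // and returns the same cached value to every caller.
      return cache.computeIfAbsent(
          path,
          p -> {
            try {
              return loader.load(p);
            } catch (IOException e) {
              // Lambdas passed to computeIfAbsent cannot throw checked exceptions.
              throw new UncheckedIOException(e);
            }
          });
    } catch (UncheckedIOException e) {
      // Unwrap so callers still see the original checked IOException.
      throw e.getCause();
    }
  }
}
```

When the loader throws, no mapping is recorded, so a later call retries the load. The `ConcurrentHashMap` documentation also notes that other updates to the map may be blocked while the mapping function runs, which is worth keeping in mind when the function performs file I/O.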
    This product bundles Auth0 Java JWT.

    * Maven group:artifact IDs: com.auth0:java-jwt
java-jwt was already a transitive dependency in the admin distribution before this PR — runtime/service has used it for internal JWT token signing since before this branch. Its absence from the LICENSE was a pre-existing omission rather than something introduced here. Since this PR adds polaris-core as a direct consumer it was a natural point to catch and correct that gap. The entry is needed: java-jwt is bundled in the admin distribution jar regardless of whether the dependency is direct or transitive.
For issue #4706.
GCP counterpart of AWS STS session tags. GCP downscoped credentials have no session-tag mechanism, and x-goog-custom-audit-* request headers only reach GCS audit logs if the client forwards them (arbitrary Iceberg clients do not), so GCS Data Access logs cannot today be tied to the requesting Polaris principal.
This vends the attribution in the one channel that survives any client: the identity of the credential itself. When configured, credential vending chains catalog-signed JWT (sub=<realm>/<principal>) -> GCP STS token exchange (via IdentityPoolCredentials programmatic supplier) -> tenant service-account impersonation -> existing CAB downscoping, so every GCS Data Access audit entry carries the principal in serviceAccountDelegationInfo.principalSubject.
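New FeatureConfiguration flags:
- GCS_PRINCIPAL_ATTRIBUTION_WIF_AUDIENCE
- GCS_PRINCIPAL_ATTRIBUTION_TOKEN_ISSUER
- GCS_PRINCIPAL_ATTRIBUTION_SIGNING_KEY_FILE
- GCS_PRINCIPAL_ATTRIBUTION_SIGNING_KEY_ID (kid for JWKS rotation)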
There is no separate on/off flag: attribution activates once the audience, issuer, and signing key file are all set, and additionally requires a gcpServiceAccount on the storage config. When unconfigured, GCP vending is unchanged.
GcpStorageCredentialCacheKey gains a principalName data field, populated only when attribution is configured, so per-principal attributed tokens are never shared across principals (and cross-principal cache reuse is preserved when attribution is off). CredentialVendingContext already carries principalName, so no new plumbing into the catalog core is required.
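The cache-key behavior described above can be sketched with a small record. The names `GcpCacheKeySketch`, `storageLocation`, and the `attributionConfigured` flag are hypothetical; the real GcpStorageCredentialCacheKey has other fields that are not shown in this conversation.

```java
import java.util.Optional;

// Hypothetical cache key: the principal is part of the key only when
// attribution is configured, so unattributed tokens stay shareable.
record GcpCacheKeySketch(String realmId, String storageLocation, Optional<String> principalName) {

  static GcpCacheKeySketch of(
      String realmId, String storageLocation, String principalName, boolean attributionConfigured) {
    return new GcpCacheKeySketch(
        realmId,
        storageLocation,
        attributionConfigured ? Optional.of(principalName) : Optional.empty());
  }
}
```

With attribution off, keys built for two different principals compare equal and hit the same cache entry; with attribution on, they differ, so an attributed token is never handed to another principal.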
New classes: GcpAttributionSubjectBuilder (builds <realm>/<principal> within GCP's 127-char google.subject limit) and GcpFederatedCredentialsExchanger (mints the RS256 JWT via com.auth0:java-jwt, already in the version catalog; performs the STS exchange via google-auth IdentityPoolCredentials, so no new HTTP machinery; caches the parsed signing key JVM-wide).
Tests cover the subject budget/sanitization, JWT claims + kid, signing-key caching, the IdentityPoolCredentials configuration, and per-principal cache-key identity. polaris-core compiles, tests pass, spotless clean.
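As a rough illustration of the subject budget and sanitization mentioned above, the sketch below builds a `<realm>/<principal>` string that never exceeds 127 characters. Only the 127-character limit and the `<realm>/<principal>` shape come from this PR; the allowed character set, the `_` replacement, and the truncate-plus-hash strategy are assumptions, since the actual rules of GcpAttributionSubjectBuilder are not shown here.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

final class AttributionSubjectSketch {
  static final int MAX_SUBJECT_LENGTH = 127; // limit stated in the PR description
  private static final int HASH_BYTES = 8; // assumption: 16 hex characters of SHA-256

  static String buildSubject(String realmId, String principalName) {
    String subject = sanitize(realmId) + "/" + sanitize(principalName);
    if (subject.length() <= MAX_SUBJECT_LENGTH) {
      return subject;
    }
    // Assumption: keep a readable prefix and append a short hash of the raw
    // inputs so that long names sharing a prefix still map to distinct subjects.
    String suffix = "~" + shortHash(realmId + "/" + principalName);
    return subject.substring(0, MAX_SUBJECT_LENGTH - suffix.length()) + suffix;
  }

  // Assumption: replace anything outside a conservative character set with '_'.
  private static String sanitize(String value) {
    return value.replaceAll("[^A-Za-z0-9._@-]", "_");
  }

  private static String shortHash(String value) {
    try {
      byte[] digest =
          MessageDigest.getInstance("SHA-256").digest(value.getBytes(StandardCharsets.UTF_8));
      StringBuilder hex = new StringBuilder();
      for (int i = 0; i < HASH_BYTES; i++) {
        hex.append(String.format("%02x", digest[i] & 0xff));
      }
      return hex.toString();
    } catch (NoSuchAlgorithmException e) {
      throw new IllegalStateException("SHA-256 is required on every Java platform", e);
    }
  }
}
```

One limitation of this particular sketch: short names that differ only in replaced characters (for example `bob smith` and `bob_smith`) sanitize to the same subject, so a real implementation would need a rule for that case.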
Checklist
- CHANGELOG.md (if needed)
- site/content/in-dev/unreleased (if needed)