CHANGELOG.md (1 addition, 0 deletions)
@@ -44,6 +44,7 @@ request adding CHANGELOG notes for breaking (!) changes and possibly other secti
- Names containing any of these characters: <code>/\:*?"<>|#+`</code>

### New Features
- Added GCS principal attribution for vended credentials (the GCP counterpart of AWS STS session tags). When the `GCS_PRINCIPAL_ATTRIBUTION_WIF_AUDIENCE`, `GCS_PRINCIPAL_ATTRIBUTION_TOKEN_ISSUER`, and `GCS_PRINCIPAL_ATTRIBUTION_SIGNING_KEY_FILE` feature flags are set (plus a `gcpServiceAccount` on the storage config), credential vending chains a catalog-signed JWT through a Workload Identity Federation token exchange and tenant service-account impersonation, so the Polaris principal appears in GCS Data Access audit logs (`serviceAccountDelegationInfo.principalSubject`) for any client. `GCS_PRINCIPAL_ATTRIBUTION_SIGNING_KEY_ID` sets the JWT `kid` for JWKS key rotation. Attribution activates automatically once configured and is keyed per-principal in the credential cache; when unconfigured, GCP vending behaviour is unchanged.
- Added `SESSION_NAME_FIELDS_IN_SUBSCOPED_CREDENTIAL` feature flag for AWS credential vending. Operators can now configure an ordered list of fields (`realm`, `catalog`, `namespace`, `table`, `principal`) to compose structured STS role session names (e.g. `p-acme-hr_catalog-employee-etl_writer`). Session names are sanitized and proportionally truncated to the AWS 64-character limit. When unset, existing `INCLUDE_PRINCIPAL_NAME_IN_SUBSCOPED_CREDENTIAL` behaviour is preserved.
- Added `hostUsers` support in Helm chart.
- Added documentation for BigQuery Metastore Catalog federation. Build with `-PNonRESTCatalogs=BIGQUERY` to include the BigQueryMetastoreCatalog federation extension. See `site/content/in-dev/unreleased/federation/bigquery-metastore-federation.md`.

polaris-core/build.gradle.kts (3 additions, 0 deletions)
@@ -70,6 +70,9 @@ dependencies {
implementation(platform(libs.google.cloud.storage.bom))
implementation("com.google.cloud:google-cloud-storage")
implementation(libs.google.cloud.iamcredentials)
// Signs short-lived attribution JWTs for GCS principal attribution via Workload Identity
// Federation (see GcpFederatedCredentialsExchanger).
implementation(libs.auth0.jwt)

testCompileOnly(project(":polaris-immutables"))
testAnnotationProcessor(project(":polaris-immutables", configuration = "processor"))
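The PR uses the `auth0` `java-jwt` dependency above to sign attribution JWTs. To show what such a token carries, here is a minimal JDK-only sketch. It does not use java-jwt. It assembles and RS256-signs a compact JWT with the `iss`, `sub`, and `aud` claims described by the `GCS_PRINCIPAL_ATTRIBUTION_*` feature flags, plus an optional `kid` header. The class name, the `iat`/`exp` claims, and the 300-second lifetime are assumptions. They are not taken from GcpFederatedCredentialsExchanger, whose source is not part of this diff.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.PrivateKey;
import java.security.Signature;
import java.time.Instant;
import java.util.Base64;

/** Hypothetical JDK-only sketch of an RS256 attribution JWT (the PR itself depends on java-jwt). */
public final class AttributionJwtSketch {

  private static final Base64.Encoder B64URL = Base64.getUrlEncoder().withoutPadding();

  /** Assumed token lifetime; the value Polaris actually uses is not shown in this diff. */
  static final long LIFETIME_SECONDS = 300;

  private AttributionJwtSketch() {}

  /** Minimal JSON string encoder: quotes, backslashes and control characters are escaped. */
  static String json(String s) {
    StringBuilder sb = new StringBuilder("\"");
    for (char c : s.toCharArray()) {
      if (c == '"' || c == '\\') {
        sb.append('\\').append(c);
      } else if (c < 0x20) {
        sb.append(String.format("\\u%04x", (int) c));
      } else {
        sb.append(c);
      }
    }
    return sb.append('"').toString();
  }

  /** Returns base64url(header) + "." + base64url(payload) + "." + base64url(signature). */
  static String sign(
      PrivateKey key, String kid, String issuer, String audience, String subject, Instant now)
      throws GeneralSecurityException {
    // An empty kid omits the header field, as the SIGNING_KEY_ID flag description says.
    String header =
        "{\"alg\":\"RS256\",\"typ\":\"JWT\"" + (kid.isEmpty() ? "" : ",\"kid\":" + json(kid)) + "}";
    long iat = now.getEpochSecond();
    String payload =
        "{\"iss\":" + json(issuer)
            + ",\"sub\":" + json(subject)
            + ",\"aud\":" + json(audience)
            + ",\"iat\":" + iat
            + ",\"exp\":" + (iat + LIFETIME_SECONDS)
            + "}";
    String signingInput =
        B64URL.encodeToString(header.getBytes(StandardCharsets.UTF_8))
            + "."
            + B64URL.encodeToString(payload.getBytes(StandardCharsets.UTF_8));
    Signature rsa = Signature.getInstance("SHA256withRSA"); // JWS algorithm "RS256"
    rsa.initSign(key);
    rsa.update(signingInput.getBytes(StandardCharsets.US_ASCII));
    return signingInput + "." + B64URL.encodeToString(rsa.sign());
  }
}
```

The hand-rolled JSON only keeps the sketch dependency-free. Production code is better served by a JWT library such as the one this hunk adds.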
@@ -209,6 +209,64 @@ public static void enforceFeatureEnabledOrThrow(
.defaultValue(List.<String>of())
.buildFeatureConfiguration();

// ---------------------------------------------------------------------------
// GCS principal attribution via Workload Identity Federation
//
// GCP downscoped credentials have no session-tag mechanism (unlike AWS STS), and custom audit
// headers only reach GCS audit logs if the client forwards them. To attribute GCS data access
// to the Polaris principal for ANY client, credential vending can chain
// catalog-signed JWT -> STS token exchange -> per-catalog service-account impersonation, so the
// principal appears in serviceAccountDelegationInfo of every GCS Data Access audit log entry.
//
// Attribution activates automatically once the audience, issuer, and signing key file are all
// set (no on/off flag); it additionally requires a gcpServiceAccount on the storage config.
// ---------------------------------------------------------------------------

public static final FeatureConfiguration<String> GCS_PRINCIPAL_ATTRIBUTION_WIF_AUDIENCE =
PolarisConfiguration.<String>builder()
.key("GCS_PRINCIPAL_ATTRIBUTION_WIF_AUDIENCE")
.description(
"Full resource name of the Workload Identity Pool provider used for GCS principal\n"
+ "attribution, e.g.\n"
+ "//iam.googleapis.com/projects/<num>/locations/global/workloadIdentityPools/<pool>/providers/<provider>.\n"
+ "Used as both the attribution JWT 'aud' claim and the STS token-exchange audience.\n"
+ "Empty (default) disables principal attribution.")
.defaultValue("")
.buildFeatureConfiguration();

public static final FeatureConfiguration<String> GCS_PRINCIPAL_ATTRIBUTION_TOKEN_ISSUER =
PolarisConfiguration.<String>builder()
.key("GCS_PRINCIPAL_ATTRIBUTION_TOKEN_ISSUER")
.description(
"Issuer (iss claim) of catalog-minted GCS attribution JWTs; must match the issuer\n"
+ "configured on the Workload Identity Pool OIDC provider. The provider verifies\n"
+ "signatures against its uploaded JWKS, so no public discovery endpoint is required.\n"
+ "Empty (default) disables principal attribution.")
.defaultValue("")
.buildFeatureConfiguration();

public static final FeatureConfiguration<String> GCS_PRINCIPAL_ATTRIBUTION_SIGNING_KEY_FILE =
PolarisConfiguration.<String>builder()
.key("GCS_PRINCIPAL_ATTRIBUTION_SIGNING_KEY_FILE")
.description(
"Filesystem path to the PKCS#8 PEM RSA private key used to sign GCS attribution JWTs\n"
+ "(RS256). The corresponding public key must be published in the Workload Identity\n"
+ "Pool provider's uploaded JWKS. Empty (default) disables principal attribution.")
.defaultValue("")
.buildFeatureConfiguration();

public static final FeatureConfiguration<String> GCS_PRINCIPAL_ATTRIBUTION_SIGNING_KEY_ID =
PolarisConfiguration.<String>builder()
.key("GCS_PRINCIPAL_ATTRIBUTION_SIGNING_KEY_ID")
.description(
"Key ID (kid) written into the header of GCS attribution JWTs so the Workload Identity\n"
+ "Pool provider can select the right public key from its JWKS during key rotation\n"
+ "(when the JWKS holds both the old and new keys). Must match the kid of the JWKS\n"
+ "entry for the configured signing key. Empty omits the header (only safe with a\n"
+ "single-key JWKS).")
.defaultValue("")
.buildFeatureConfiguration();

public static final FeatureConfiguration<Boolean> ALLOW_SETTING_S3_ENDPOINTS =
PolarisConfiguration.<Boolean>builder()
.key("ALLOW_SETTING_S3_ENDPOINTS")
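The signing-key flags above imply two operator tasks. The first is loading a PKCS#8 PEM RSA private key. The second is publishing its public half in the Workload Identity Pool provider's JWKS under the same `kid`. Below is a minimal JDK-only sketch of both. It assumes an unencrypted `-----BEGIN PRIVATE KEY-----` file. The class and method names are hypothetical, and Polaris does not necessarily read the file this way.

```java
import java.math.BigInteger;
import java.security.GeneralSecurityException;
import java.security.KeyFactory;
import java.security.interfaces.RSAPrivateKey;
import java.security.interfaces.RSAPublicKey;
import java.security.spec.PKCS8EncodedKeySpec;
import java.util.Arrays;
import java.util.Base64;

/** Hypothetical helpers for the signing-key flags; not Polaris code. */
public final class AttributionKeySketch {

  private AttributionKeySketch() {}

  /** Parses PKCS#8 PEM text ("BEGIN PRIVATE KEY") into an RSA private key. */
  static RSAPrivateKey loadPkcs8Pem(String pem) throws GeneralSecurityException {
    String base64 =
        pem.replace("-----BEGIN PRIVATE KEY-----", "")
            .replace("-----END PRIVATE KEY-----", "")
            .replaceAll("\\s", "");
    byte[] der = Base64.getDecoder().decode(base64);
    return (RSAPrivateKey)
        KeyFactory.getInstance("RSA").generatePrivate(new PKCS8EncodedKeySpec(der));
  }

  /** JWK integers are unsigned big-endian base64url, so BigInteger's sign byte is dropped. */
  static String unsignedB64Url(BigInteger value) {
    byte[] bytes = value.toByteArray();
    if (bytes.length > 1 && bytes[0] == 0) {
      bytes = Arrays.copyOfRange(bytes, 1, bytes.length);
    }
    return Base64.getUrlEncoder().withoutPadding().encodeToString(bytes);
  }

  /** One JWKS entry; kid must equal GCS_PRINCIPAL_ATTRIBUTION_SIGNING_KEY_ID (assumed JSON-safe). */
  static String toJwk(RSAPublicKey key, String kid) {
    return "{\"kty\":\"RSA\",\"alg\":\"RS256\",\"use\":\"sig\",\"kid\":\"" + kid
        + "\",\"n\":\"" + unsignedB64Url(key.getModulus())
        + "\",\"e\":\"" + unsignedB64Url(key.getPublicExponent()) + "\"}";
  }

  /** During rotation the uploaded JWKS holds both the old and the new key. */
  static String toJwks(String... jwks) {
    return "{\"keys\":[" + String.join(",", jwks) + "]}";
  }
}
```

A rotation under these assumptions has three steps. An operator would upload `toJwks(oldJwk, newJwk)`, then switch the signing key file and key ID together, then drop the old entry once tokens signed with it have expired.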
@@ -0,0 +1,89 @@
/*
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing,
* software distributed under the License is distributed on an
* "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
* KIND, either express or implied. See the License for the
* specific language governing permissions and limitations
* under the License.
*/
package org.apache.polaris.core.storage.gcp;

/**
* Builds the {@code sub} claim for GCS principal-attribution JWTs as {@code <realm>/<principal>},
* within GCP's 127-character {@code google.subject} limit.
*
* <p>The character budget mirrors the AWS session-name builder: one character is reserved for the
* separator, then each field receives an equal share of the remainder, and budget unused by a short
* field flows to the other. ISO control characters and the {@code /} separator are stripped from
* each field so the subject stays unambiguously parseable, and the {@code unknown} placeholder
* replaces null, empty, or fully stripped fields so the subject shape stays stable.
*/
public final class GcpAttributionSubjectBuilder {

/** GCP limit for the {@code google.subject} attribute of a federated identity. */
public static final int MAX_SUBJECT_LENGTH = 127;

static final String SEPARATOR = "/";

static final String VALUE_UNKNOWN = "unknown";

private GcpAttributionSubjectBuilder() {}

/**
* Builds the attribution subject {@code <realm>/<principal>}, guaranteed to be at most {@value
* #MAX_SUBJECT_LENGTH} characters.
*
* @param realm the realm identifier (gets first-half budget priority)
* @param principalName the Polaris principal name
* @return the subject string
*/
public static String buildSubject(String realm, String principalName) {
String cleanRealm = sanitize(realm);
String cleanPrincipal = sanitize(principalName);

int budget = MAX_SUBJECT_LENGTH - SEPARATOR.length();
int remaining = budget;

int realmAlloc = remaining / 2;
int realmUsed = Math.min(cleanRealm.length(), realmAlloc);
remaining -= realmUsed;

int principalUsed = Math.min(cleanPrincipal.length(), remaining);
remaining -= principalUsed;

// Carry-forward: if the principal left budget unused, the realm may take more than its
// initial half-share.
int realmFinal = Math.min(cleanRealm.length(), realmUsed + remaining);

return cleanRealm.substring(0, realmFinal)
+ SEPARATOR
+ cleanPrincipal.substring(0, principalUsed);
}

private static String sanitize(String value) {
if (value == null || value.isEmpty()) {
return VALUE_UNKNOWN;
}
StringBuilder cleaned = new StringBuilder(value.length());
for (int i = 0; i < value.length(); i++) {
char c = value.charAt(i);
// Drop control chars and the separator itself so the subject stays unambiguously
// <realm>/<principal>: a value containing '/' would otherwise let an audit-log consumer
// mis-split it (e.g. principal "a/b" read as realm "a", principal "b").
if (!Character.isISOControl(c) && c != '/') {
cleaned.append(c);
}
}
return cleaned.length() == 0 ? VALUE_UNKNOWN : cleaned.toString();
}
}
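The carry-forward is easier to see with numbers, so the budget arithmetic from `buildSubject` is restated below on field lengths alone. This standalone sketch (hypothetical class name) takes the sanitized realm and principal lengths. It returns how many characters of each survive.

```java
/**
 * Hypothetical restatement of the budget split in GcpAttributionSubjectBuilder.buildSubject,
 * returning {realmChars, principalChars} for already-sanitized field lengths.
 */
public final class SubjectBudgetSketch {

  static final int MAX_SUBJECT_LENGTH = 127;
  static final int SEPARATOR_LENGTH = 1;

  private SubjectBudgetSketch() {}

  static int[] allocate(int realmLength, int principalLength) {
    int remaining = MAX_SUBJECT_LENGTH - SEPARATOR_LENGTH; // 126 characters for the two fields
    int realmUsed = Math.min(realmLength, remaining / 2); // realm first takes up to 63
    remaining -= realmUsed;
    int principalUsed = Math.min(principalLength, remaining); // principal takes what is left
    remaining -= principalUsed;
    int realmFinal = Math.min(realmLength, realmUsed + remaining); // unused budget flows back
    return new int[] {realmFinal, principalUsed};
  }
}
```

Here is how the split plays out:
- A 10-character realm leaves 116 characters for a long principal.
- A 200-character realm next to a 10-character principal keeps 116 realm characters.
- Two long fields split evenly at 63 each.

Every truncated result is exactly 127 characters, including the separator.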
@@ -33,6 +33,7 @@
import com.google.protobuf.Duration;
import com.google.protobuf.Timestamp;
import java.io.IOException;
import java.nio.file.Path;
import java.time.Instant;
import java.util.ArrayList;
import java.util.Date;
@@ -46,6 +47,7 @@
import java.util.Set;
import java.util.stream.Collectors;
import java.util.stream.Stream;
import org.apache.polaris.core.config.FeatureConfiguration;
import org.apache.polaris.core.config.RealmConfig;
import org.apache.polaris.core.storage.CachingStorageIntegration;
import org.apache.polaris.core.storage.CredentialVendingContext;
@@ -176,19 +178,44 @@ private GcpStorageCredentialCacheKey buildCacheKey(
@NonNull Set<String> writeLocations,
@NonNull Optional<String> refreshEndpoint,
@NonNull CredentialVendingContext context) {
// Principal attribution makes the vended token per-principal, so the principal must
// participate in cache identity; otherwise it is left empty to preserve cross-principal cache
// reuse. Attribution requires a service account to impersonate and a principal to attribute.
String principalName = "";
if (principalAttributionConfigured(realmConfig())
&& storageConfig().getGcpServiceAccount() != null) {
principalName = context.principalName().orElse("");
}
return GcpStorageCredentialCacheKey.of(
context.realm().orElse(""),
storageConfig(),
readLocations,
listLocations,
writeLocations,
refreshEndpoint,
principalName,
sourceCredentials,
transportFactory,
realmConfig(),
credentialOps);
}

/**
* Returns true when GCS principal attribution is fully configured (WIF audience, token issuer,
* and signing key file all set). There is intentionally no separate on/off flag.
*/
private static boolean principalAttributionConfigured(RealmConfig realmConfig) {
return !realmConfig
.getConfig(FeatureConfiguration.GCS_PRINCIPAL_ATTRIBUTION_WIF_AUDIENCE)
.isEmpty()
&& !realmConfig
.getConfig(FeatureConfiguration.GCS_PRINCIPAL_ATTRIBUTION_TOKEN_ISSUER)
.isEmpty()
&& !realmConfig
.getConfig(FeatureConfiguration.GCS_PRINCIPAL_ATTRIBUTION_SIGNING_KEY_FILE)
.isEmpty();
}
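The cache-key trade-off described in `buildCacheKey` can be seen with a simplified stand-in key. This sketch uses a hypothetical record that is far smaller than GcpStorageCredentialCacheKey. With attribution off, two principals reading the same location share one cached credential. Once attribution applies, each principal gets its own.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical, heavily simplified stand-in for GcpStorageCredentialCacheKey. */
public final class CacheKeySketch {

  /** Records provide value-based equals/hashCode, which is what a cache key needs. */
  record Key(String realm, String location, String principalName) {}

  private CacheKeySketch() {}

  /** Mirrors buildCacheKey: the principal is part of the key only when attribution applies. */
  static Key keyFor(
      String realm,
      String location,
      String principal,
      boolean attributionConfigured,
      String gcpServiceAccount) {
    String principalPart = attributionConfigured && gcpServiceAccount != null ? principal : "";
    return new Key(realm, location, principalPart);
  }

  /** Counts how many credentials would be minted for two principals reading the same table. */
  static int mintedCredentials(boolean attributionConfigured) {
    Map<Key, String> cache = new HashMap<>();
    for (String principal : new String[] {"alice", "bob"}) {
      Key key =
          keyFor(
              "realm-1",
              "gs://bucket/ns/tbl",
              principal,
              attributionConfigured,
              "sa@example.iam.gserviceaccount.com");
      cache.computeIfAbsent(key, k -> "token for " + k.principalName());
    }
    return cache.size();
  }
}
```

`mintedCredentials(false)` returns 1 and `mintedCredentials(true)` returns 2. That is the cost of attribution: cross-principal cache reuse is given up so that each vended token names the right principal.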

/** Mint a fresh {@link StorageAccessConfig} for the given GCP cache key. */
static StorageAccessConfig compute(GcpStorageCredentialCacheKey key) {
GcpStorageConfigurationInfo gcpStorageConfig = key.storageConfig();
@@ -206,7 +233,7 @@ static StorageAccessConfig compute(GcpStorageCredentialCacheKey key) {
}

GoogleCredentials credentialsToDownscope =
- getBaseCredentials(gcpStorageConfig, sourceCredentials, credentialOps);
+ baseCredentialsForVending(key, gcpStorageConfig, sourceCredentials, credentialOps);

CredentialAccessBoundary accessBoundary =
generateAccessBoundaryRules(readLocations, listLocations, writeLocations);
@@ -246,6 +273,42 @@ static StorageAccessConfig compute(GcpStorageCredentialCacheKey key) {
return accessConfig.build();
}

/**
* Returns the credential to be used as the source for downscoping.
*
* <p>When GCS principal attribution is configured and a principal is present (so the cache key
* carries it), the impersonation source is a federated identity whose subject is {@code
* <realm>/<principal>}, which surfaces the principal in {@code serviceAccountDelegationInfo} of
* GCS Data Access audit logs. Otherwise this is the standard path: impersonate the configured
* service account from the ambient source credentials, or use those credentials directly.
*/
private static GoogleCredentials baseCredentialsForVending(
GcpStorageCredentialCacheKey key,
GcpStorageConfigurationInfo storageConfig,
GoogleCredentials sourceCredentials,
GcpCredentialOps credentialOps) {
RealmConfig realmConfig = key.realmConfig();
String serviceAccount = storageConfig.getGcpServiceAccount();
if (serviceAccount != null
&& !key.principalName().isEmpty()
&& principalAttributionConfigured(realmConfig)) {
String subject =
GcpAttributionSubjectBuilder.buildSubject(key.realmId(), key.principalName());
GcpFederatedCredentialsExchanger exchanger =
new GcpFederatedCredentialsExchanger(
realmConfig.getConfig(FeatureConfiguration.GCS_PRINCIPAL_ATTRIBUTION_TOKEN_ISSUER),

Review comment by dimas-b (Contributor), Jun 11, 2026:

This will probably work correctly given that realm ID is part of the cache key, but conceptually, I think it would be nicer to evaluate these parameters at cache key build time (i.e. around line 185) and just use the values here.

The cache key is meant to be a "rich" object, so it should be fine to add new data as another (immutable) sub-object, that is if we want to avoid having too many properties at the same level (just an aesthetic concern).

Reply by the Contributor Author:

Addressed.

realmConfig.getConfig(FeatureConfiguration.GCS_PRINCIPAL_ATTRIBUTION_WIF_AUDIENCE),
Path.of(
realmConfig.getConfig(
FeatureConfiguration.GCS_PRINCIPAL_ATTRIBUTION_SIGNING_KEY_FILE)),
realmConfig.getConfig(FeatureConfiguration.GCS_PRINCIPAL_ATTRIBUTION_SIGNING_KEY_ID),
key.transportFactory());
GoogleCredentials federated = exchanger.federatedCredentials(subject, key.realmId());
return createImpersonatedCredentials(federated, serviceAccount, credentialOps);
}
return getBaseCredentials(storageConfig, sourceCredentials, credentialOps);
}
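`GcpFederatedCredentialsExchanger` is not part of this diff, so the exact requests it sends are not shown. The sketch below assumes the chain follows the standard Google endpoints. It builds the Google STS token-exchange form body, which would be posted to `https://sts.googleapis.com/v1/token`. It also builds the IAM Credentials `generateAccessToken` URL used for service-account impersonation. It only builds strings and sends nothing, and the class name is hypothetical.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

/** Hypothetical sketch of the request shapes behind the federated path; nothing is sent. */
public final class FederationRequestSketch {

  static final String STS_TOKEN_URL = "https://sts.googleapis.com/v1/token";

  private FederationRequestSketch() {}

  /** Form body exchanging the catalog-signed JWT for a federated access token. */
  static String stsExchangeForm(String wifAudience, String attributionJwt) {
    Map<String, String> form = new LinkedHashMap<>();
    form.put("grant_type", "urn:ietf:params:oauth:grant-type:token-exchange");
    form.put("audience", wifAudience); // same value as GCS_PRINCIPAL_ATTRIBUTION_WIF_AUDIENCE
    form.put("scope", "https://www.googleapis.com/auth/cloud-platform");
    form.put("requested_token_type", "urn:ietf:params:oauth:token-type:access_token");
    form.put("subject_token", attributionJwt);
    form.put("subject_token_type", "urn:ietf:params:oauth:token-type:jwt");
    return form.entrySet().stream()
        .map(
            e ->
                URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8)
                    + "="
                    + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8))
        .collect(Collectors.joining("&"));
  }

  /** URL for impersonating the catalog's service account using the federated token. */
  static String impersonationUrl(String serviceAccountEmail) {
    return "https://iamcredentials.googleapis.com/v1/projects/-/serviceAccounts/"
        + serviceAccountEmail
        + ":generateAccessToken";
  }
}
```

In this flow, the federated access token returned by STS would be sent as a bearer token to the impersonation URL. The resulting service-account token is then downscoped, and the Polaris subject appears in `serviceAccountDelegationInfo`.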

/**
* Returns the credential to be used as the source for downscoping. If a specific service account
* is configured, it impersonates that account first.