Skip to content

(feat) Implement metrics rest api#4115

Open
obelix74 wants to merge 35 commits into
apache:mainfrom
obelix74:implement_metrics_rest_api
Open

(feat) Implement metrics rest api#4115
obelix74 wants to merge 35 commits into
apache:mainfrom
obelix74:implement_metrics_rest_api

Conversation

@obelix74

@obelix74 obelix74 commented Apr 2, 2026

Copy link
Copy Markdown
Contributor

This is an implementation of the proposal in #4010. This uses the stable envelope design for the REST API instead of a flattened structure.

Checklist

  • 🛡️ Don't disclose security issues! (contact security@apache.org)
  • 🔗 Clearly explained why the changes are needed, or linked related issues: Fixes #
  • 🧪 Added/updated tests with good coverage, or manually tested (and explained how)
  • 💡 Added comments for complex logic
  • 🧾 Updated CHANGELOG.md (if needed)
  • 📚 Updated documentation in site/content/in-dev/unreleased (if needed)

Anand Kumar Sankaran added 6 commits April 10, 2026 07:26
Adds read-only REST endpoints at /api/metrics-reports/v1/ to expose
persisted Iceberg scan and commit metrics without requiring direct
database access.

Changes:
- spec/metrics-reports-service.yml: OpenAPI 3.0.3 spec for the new API
- api/metrics-reports-service/: JAX-RS code generation module
- polaris-core: TABLE_READ_METRICS privilege (id=103), LIST_TABLE_METRICS
  authorizable operation, default no-op read methods on MetricsPersistence,
  MetricsReportToken for keyset cursor pagination
- persistence/relational-jdbc: JDBC implementations of listScanReports/
  listCommitReports using DESC keyset pagination; MetricsModelUtils shared
  ObjectMapper/parseMetadata utility
- runtime/service: MetricsReportsService — resolves catalog/namespace/table
  via PolarisResolutionManifest, authorizes with TABLE_READ_METRICS, delegates
  to MetricsPersistence; always bounded by LIMIT (default 100)
- Tests: MetricsReportsServiceTest (including 403/404 scenarios),
  MetricsReportTokenTest, metadata round-trip tests for model classes
Restructures each report in the response from a flat object into a
stable envelope with nested actor/request/object/payload sub-objects.
This decouples the API shape from the DB schema and allows payload
schemas to evolve independently without breaking clients.

- MetricsReportsService: replace baseRecordFields + flat maps with
  per-type envelope builders (actor, request, object, payload.data)
- spec/metrics-reports-service.yml: replace flat ScanMetricsReport /
  CommitMetricsReport with envelope schemas; add MetricsActor,
  MetricsRequest, ScanMetricsObject, CommitMetricsObject,
  ScanPayload/CommitPayload, ScanPayloadData/CommitPayloadData
- Tests: add scanReportHasEnvelopeStructure and
  commitReportHasEnvelopeStructure to assert the nested shape
- docs: add envelope structure example to telemetry.md
…tsService

- Remove flat inline errorResponse() helper; all 400s now throw IllegalArgumentException
  so IcebergExceptionMapper produces a consistent Iceberg error envelope instead of
  a divergent flat {message, type, code} body
- Replace BadRequestException (JAX-RS WebApplicationException) with
  IllegalArgumentException in decodeNamespace so the same mapper handles it
- Guard Integer.parseInt in ModelScanMetricsReport.toRecord() against malformed
  projected_field_ids tokens in the DB — silently skips non-numeric tokens
- Remove unused @nullable import from MetricsReportsService
- Add test for PATH_COULD_NOT_BE_FULLY_RESOLVED resolver status (namespace/table
  path not found); update 5 test expectations to match exception-throwing contract
…esponse

These are DB-internal surrogate IDs with no meaning to API clients, who already
know the catalog, namespace, and table from the request URL. The snapshotId
(an Iceberg-level concept) is retained in the object envelope as it identifies
which snapshot the metrics apply to.
…w feedback

- Replace executeSelectOverStream + AtomicReference with executeSelect,
  which returns List<T> directly and is simpler for single-threaded use
- Remove the unused `columns` parameter from buildMetricsQuery; use
  SELECT * since the parameter was always ALL_COLUMNS per model
@obelix74 obelix74 force-pushed the implement_metrics_rest_api branch from 1d37c50 to ac66c27 Compare April 10, 2026 14:26

@dimas-b dimas-b left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for pushing this feature forward, @obelix74 !

The PR LGTM in general. Posting some comments about code organization, subject to discussion, of course.

@PolarisImmutable
@JsonSerialize(as = ImmutableMetricsReportToken.class)
@JsonDeserialize(as = ImmutableMetricsReportToken.class)
public interface MetricsReportToken extends Token {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this class does not belong in polaris-core. The idea behind TokenType and related java service descriptors was to allow token types to be pluggable.

It looks like this class is currently used only by relational-jdbc, so let's move it there.

If it must be reused at some point, it will be easier to expose it later than take away from polaris-core.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • Moved MetricsReportToken.java to persistence/relational-jdbc/.../jdbc/pagination/ (the only consumer)
  • Updated JdbcBasePersistenceImpl.java import
  • Moved MetricsReportTokenTest.java to relational-jdbc
  • Removed MetricsReportToken$MetricsReportTokenType from polaris-core's service descriptor; created a new one in relational-jdbc

Comment thread spec/metrics-reports-service.yml Outdated
schema:
type: integer
minimum: 1
maximum: 1000

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding a max page size to the API spec looks like a overkill... ultimately, it's an impl. concern, not an API concern.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • Removed maximum: 1000 from the pageSize query parameter schema

Comment thread spec/metrics-reports-service.yml Outdated
Comment thread spec/metrics-reports-service.yml Outdated
$ref: '#/components/schemas/ErrorResponse'

components:
securitySchemes:

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TBH, I'm not sure it's worth specifying authentication schemes in each OpenAPI file... I believe Polaris will handle AuthN uniformly across all API and evolving AuthN will not (and should not) affect each API definition individually.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • Removed root-level security: block and the entire securitySchemes: section from components:

Comment thread spec/metrics-reports-service.yml Outdated
Comment thread runtime/service/build.gradle.kts Outdated
- spec: remove max page size (impl concern, not API concern)
- spec: clarify "epoch milliseconds" → "Unix epoch milliseconds"
- spec: remove securitySchemes (auth handled uniformly by Polaris)
- spec: update namespace description to reference Polaris Iceberg REST API convention
- Move MetricsReportToken from polaris-core to relational-jdbc, the sole consumer; register TokenType service descriptor there
- Move MetricsReportsService into new extensions/metrics-reports/impl module following the OPA Authorizer pattern; wire as runtimeOnly in runtime/server so it is elective

@dimas-b dimas-b left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM with one minor remaining comment 😅

@sungwy , @sneethiraj : FYI about the new AuthZ operation.

Comment thread gradle/projects.main.properties
Comment thread CHANGELOG.md Outdated
Comment thread spec/metrics-reports-service.yml Outdated
@obelix74 obelix74 requested a review from dimas-b April 16, 2026 16:08

@flyingImer flyingImer left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The direction looks right to me

Two structural observations:

  • With this PR, MetricsPersistence grows from 2 write methods to 4 (read + write). It's marked @beta and the javadoc calls it a "Service Provider Interface." But it lives on BasePersistence, which only local DB backends implement. NoSqlMetaStoreManager and RemotePolarisMetaStoreManager go through empty BasePersistence implementations, so these methods are permanently no-op for them. Meanwhile, the actual SPI interfaces (PolarisMetricsReporter, PolarisMetricsManager) have no annotation at all. The @beta signal is on the wrong layer IIUC

  • The write path enters through PolarisMetricsManager on MetaStoreManager, but this read path bypasses that layer and goes straight to BasePersistence via callContext.getMetaStore(). If we want the metrics read API to work for non-JDBC backends, it would need a MetaStoreManager-level entry point, same as writes.

Not blocking on this. I think the question of where metrics persistence should sit architecturally is worth a discussion on dev@.

…porter

Move the @beta signal to the actual SPI interfaces rather than only
on MetricsPersistence, which is a BasePersistence concern not visible
to callers of the service layer.
@obelix74

Copy link
Copy Markdown
Contributor Author

The direction looks right to me

Two structural observations:

  • With this PR, MetricsPersistence grows from 2 write methods to 4 (read + write). It's marked @beta and the javadoc calls it a "Service Provider Interface." But it lives on BasePersistence, which only local DB backends implement. NoSqlMetaStoreManager and RemotePolarisMetaStoreManager go through empty BasePersistence implementations, so these methods are permanently no-op for them. Meanwhile, the actual SPI interfaces (PolarisMetricsReporter, PolarisMetricsManager) have no annotation at all. The @beta signal is on the wrong layer IIUC
  • The write path enters through PolarisMetricsManager on MetaStoreManager, but this read path bypasses that layer and goes straight to BasePersistence via callContext.getMetaStore(). If we want the metrics read API to work for non-JDBC backends, it would need a MetaStoreManager-level entry point, same as writes.

Not blocking on this. I think the question of where metrics persistence should sit architecturally is worth a discussion on dev@.

Thank you. I have added @Beta annotation to PolarisMetricsManager and PolarisMetricsReporter.

About the second point, thank you. Would this mean a read method to PolarisMetricsManager and MetaStoreManager mirroring the write path? Should I do it in this PR or can this wait?

@flyingImer

Copy link
Copy Markdown
Contributor

The direction looks right to me
Two structural observations:

  • With this PR, MetricsPersistence grows from 2 write methods to 4 (read + write). It's marked @beta and the javadoc calls it a "Service Provider Interface." But it lives on BasePersistence, which only local DB backends implement. NoSqlMetaStoreManager and RemotePolarisMetaStoreManager go through empty BasePersistence implementations, so these methods are permanently no-op for them. Meanwhile, the actual SPI interfaces (PolarisMetricsReporter, PolarisMetricsManager) have no annotation at all. The @beta signal is on the wrong layer IIUC
  • The write path enters through PolarisMetricsManager on MetaStoreManager, but this read path bypasses that layer and goes straight to BasePersistence via callContext.getMetaStore(). If we want the metrics read API to work for non-JDBC backends, it would need a MetaStoreManager-level entry point, same as writes.

Not blocking on this. I think the question of where metrics persistence should sit architecturally is worth a discussion on dev@.

Thank you. I have added @Beta annotation to PolarisMetricsManager and PolarisMetricsReporter.

About the second point, thank you. Would this mean a read method to PolarisMetricsManager and MetaStoreManager mirroring the write path? Should I do it in this PR or can this wait?

Thanks for adding @beta.

Reads should go through MetaStoreManager too, same as writes. If reads stay on BasePersistence, non-JDBC backends can't implement the read API at all. I'd prefer fixing that in this PR so the read path ships with the same layering as writes.

Separately, the persistence schema discussion on dev@ is still open. A follow-up issue linking to that thread would help track it.

dimas-b
dimas-b previously approved these changes Jun 1, 2026

@flyingImer flyingImer left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for continuing to push this forward. Before this merges, I want to explicitly call out the latest alignment from the metrics architecture sync notes, because I think it changes how this PR should be scoped: https://docs.google.com/document/d/100h7c4damrUzVuquYbBHM0EvA4LSWuW2IT2dN_7nYVA/edit?pli=1&tab=t.k96s2xyqr5u1#heading=h.uvb454otvxc0

My main concern is that this PR defines the SPI at the wrong layer.

PolarisMetricsReporter is currently described as the metrics reporting SPI, but it lives in runtime/service and is shaped around CDI/runtime wiring. That makes it hard for it to serve as the stable Polaris-facing contract for metrics ingestion, which should be runtime/framework agnostic. I think this contract should instead live in polaris-core, where the rest of Polaris can depend on it without depending on the Quarkus/runtime wiring layer.

At the same time, this PR expands MetricsPersistence under polaris-core and wires it through BasePersistence. Although #4397 already covers decomposing it from BasePersistence, MetricsPersistence would still dangle under polaris-core. Since durable metrics persistence is now being treated as an extension implementation, I think this durable persistence contract/data model should move under extensions and be self-contained there.

So to me the layering should be:

  1. polaris-core: stable Iceberg metrics reporting SPI contract.
  2. runtime/service: REST ingestion, table resolution, authz, and implementation selection.
  3. extensions/...: concrete implementations, including the default no-op/log-only battery and the durable JDBC-backed implementation.
  4. durable metrics extension: owns its own persistence contract, data model, schema/bootstrap, retention, and read/query backing.

Could we split this PR into two scopes?

First scope: core-level metrics reporting SPI + default battery

Define the Iceberg metrics reporting SPI in polaris-core, not in runtime/service, and not as MetricsPersistence inherited by BasePersistence. The contract can stay small: a Polaris-resolved metrics context plus the Iceberg MetricsReport payload.

Runtime/service would still call the selected implementation after resolving and authorizing the table, but it would not define the SPI shape itself.

The built-in battery can stay simple and safe: no-op/log-only by default (2 cents: preferred no-op), provided as the default extension implementation like extensions/metrics-reports/no-op (bundled for release via BOM), out of the box experience.

Second scope: durable Iceberg metrics extension implementation

Reintroduce durable metrics storage into a self-contained extension implementation. That implementation should own the durable data model, schema/bootstrap concerns, retention/query behavior, and read API backing.
Concretely, I think the decomposed MetricsPersistence work from #4397 should move under this durable metrics extension path, like extensions/metrics-reports/persistence/relational-jdbc, rather than remaining under generic metrics-reports/impl. That keeps durable JDBC metrics as one implementation of the reporting SPI, instead of making metrics persistence part of the shared core metastore persistence shape.

@dimas-b I'm trying to keep the useful work here moving, but with the SPI boundary corrected before we merge a shape that downstream implementations may start depending on. WDYT of my proposed plan?

Anand, one point from the mailing-list thread that may be worth repeating here: I agree migration matters for durable metrics data. To me, that is another reason to keep the durable data model scoped to the durable metrics implementation, rather than making the current per-type JDBC schema and BasePersistence coupling part of the shared SPI shape.

* @param pageToken pagination token
* @return a page of scan metrics records
*/
default Page<ScanMetricsRecord> listScanReports(

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Per the latest metrics architecture sync notes, I think this is the main boundary we should avoid standardizing in this PR.

The alignment there was that the stable SPI should be Iceberg metrics reporting/emitting, not metrics persistence inherited by every BasePersistence implementation (which has already been covered by #4397). Durable metrics storage can still exist, but as a durable metrics reporter implementation detail under extensions/ module.

Could we avoid introducing this as the SPI in polaris-core and instead move the contract toward an IcebergMetricsReporter + context boundary in the metrics extension API layer?

}

@Override
public Page<ScanMetricsRecord> listScanReports(

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is another sign that the durable implementation is landing in the generic JDBC metastore backend rather than in a metrics-specific durable implementation.

I think the long-term shape should let a deployment use one backend for the metastore and another for metrics. Putting the metrics read/write/query logic directly in JdbcBasePersistenceImpl makes that separation harder. Could this move under the durable metrics implementation module instead, alongside its schema/bootstrap/read API support?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, for the read path specifically, this bakes the per-type query API into the low-level persistence surface.

Given the sync alignment around treating the current scan/commit split as transitional, I'd prefer not to add listScanReports/listCommitReports as core persistence methods. A typed query behind the durable metrics implementation would leave us more room to consolidate the schema/model without another SPI churn.

Comment on lines +26 to +27
implementation(project(":polaris-core"))
implementation(project(":polaris-api-metrics-reports-service"))

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Naming/scope question: this module is called an extension, but the durable implementation pieces are still spread across polaris-core, runtime/service, and persistence/relational-jdbc.

If this is meant to be the metrics extension, could we make it own the durable metrics implementation more fully? Otherwise the module name may imply a cleaner extension boundary than the code currently has.

polaris-extensions-federation-bigquery=extensions/federation/bigquery
polaris-extensions-auth-opa=extensions/auth/opa/impl
polaris-extensions-auth-opa-tests=extensions/auth/opa/tests
polaris-extensions-metrics-reports=extensions/metrics-reports/impl

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we split the scope, I think the module structure could make the intended boundary more obvious:

  • an API/SPI layer for Iceberg metrics reporting + default no-op/log-only battery
  • a durable JDBC metrics implementation module that owns its persistence/schema/read support

That would avoid the generic metrics-reports extension name becoming a catch-all for both API surface and durable storage concerns.

Comment on lines +160 to +162
oneOf:
- $ref: '#/components/schemas/ListScanMetricsResponse'
- $ref: '#/components/schemas/ListCommitMetricsResponse'

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The beta API shape here is probably okay to keep moving, but I think it should be clearly decoupled from the durable storage/SPI shape.

In other words, I'm less concerned about this response envelope than about what backs it. Could we keep the REST API work scoped as beta while moving the durable persistence backing out of the core metastore persistence path?

@github-project-automation github-project-automation Bot moved this from Ready to merge to PRs In Progress in Basic Kanban Board Jun 12, 2026
…polaris-core, remove MetricsPersistence from BasePersistence

Addresses architecture review feedback that the metrics reporting SPI was
defined at the wrong layer (CDI-coupled in runtime/service instead of
polaris-core), and that durable metrics persistence was bleeding through
BasePersistence into every metastore backend.

Scope 1 changes (this commit):
- Add stable CDI-agnostic IcebergMetricsReporter SPI to polaris-core
- Remove MetricsPersistence from BasePersistence extends clause
- Remove PolarisMetricsManager from PolarisMetaStoreManager extends clause
- Delete all durable-path code: JDBC models, converters, PersistingMetricsReporter
- Add no-op (default) and log-only reporters to extensions/metrics-reports/impl
- Stub REST read path in MetricsReportsService to return empty results
- Fix namespace decoding to use NamespaceUtils.splitNamespace (canonical)
- Add multi-level namespace test; fix CHANGELOG accuracy

Durable JDBC metrics persistence is deferred to a follow-up extension
module (extensions/metrics-reports/persistence/relational-jdbc) that
will back the read API without touching BasePersistence.
@obelix74

Copy link
Copy Markdown
Contributor Author

Thanks for continuing to push this forward. Before this merges, I want to explicitly call out the latest alignment from the metrics architecture sync notes, because I think it changes how this PR should be scoped: https://docs.google.com/document/d/100h7c4damrUzVuquYbBHM0EvA4LSWuW2IT2dN_7nYVA/edit?pli=1&tab=t.k96s2xyqr5u1#heading=h.uvb454otvxc0

My main concern is that this PR defines the SPI at the wrong layer.

PolarisMetricsReporter is currently described as the metrics reporting SPI, but it lives in runtime/service and is shaped around CDI/runtime wiring. That makes it hard for it to serve as the stable Polaris-facing contract for metrics ingestion, which should be runtime/framework agnostic. I think this contract should instead live in polaris-core, where the rest of Polaris can depend on it without depending on the Quarkus/runtime wiring layer.

At the same time, this PR expands MetricsPersistence under polaris-core and wires it through BasePersistence. Although #4397 already covers decomposing it from BasePersistence, MetricsPersistence would still dangle under polaris-core. Since durable metrics persistence is now being treated as an extension implementation, I think this durable persistence contract/data model should move under extensions and be self-contained there.

So to me the layering should be:

  1. polaris-core: stable Iceberg metrics reporting SPI contract.
  2. runtime/service: REST ingestion, table resolution, authz, and implementation selection.
  3. extensions/...: concrete implementations, including the default no-op/log-only battery and the durable JDBC-backed implementation.
  4. durable metrics extension: owns its own persistence contract, data model, schema/bootstrap, retention, and read/query backing.

Could we split this PR into two scopes?

First scope: core-level metrics reporting SPI + default battery

Define the Iceberg metrics reporting SPI in polaris-core, not in runtime/service, and not as MetricsPersistence inherited by BasePersistence. The contract can stay small: a Polaris-resolved metrics context plus the Iceberg MetricsReport payload.

Runtime/service would still call the selected implementation after resolving and authorizing the table, but it would not define the SPI shape itself.

The built-in battery can stay simple and safe: no-op/log-only by default (2 cents: preferred no-op), provided as the default extension implementation like extensions/metrics-reports/no-op (bundled for release via BOM), out of the box experience.

Second scope: durable Iceberg metrics extension implementation

Reintroduce durable metrics storage into a self-contained extension implementation. That implementation should own the durable data model, schema/bootstrap concerns, retention/query behavior, and read API backing. Concretely, I think the decomposed MetricsPersistence work from #4397 should move under this durable metrics extension path, like extensions/metrics-reports/persistence/relational-jdbc, rather than remaining under generic metrics-reports/impl. That keeps durable JDBC metrics as one implementation of the reporting SPI, instead of making metrics persistence part of the shared core metastore persistence shape.

@dimas-b I'm trying to keep the useful work here moving, but with the SPI boundary corrected before we merge a shape that downstream implementations may start depending on. WDYT of my proposed plan?

Anand, one point from the mailing-list thread that may be worth repeating here: I agree migration matters for durable metrics data. To me, that is another reason to keep the durable data model scoped to the durable metrics implementation, rather than making the current per-type JDBC schema and BasePersistence coupling part of the shared SPI shape.

@flyingImer I think I have got all your comments.

Scope 1 changes: SPI layering correction

What changed

  1. IcebergMetricsReporter moved to polaris-core — the stable reporting SPI is no longer inside runtime/service (CDI-coupled). PolarisMetricsReporter, DefaultMetricsReporter, and PersistingMetricsReporter are deleted from runtime/service. All references updated.
  2. MetricsPersistence removed from BasePersistenceBasePersistence no longer extends MetricsPersistence; PolarisMetaStoreManager no longer extends PolarisMetricsManager. The durable metrics write/read path is gone from JdbcBasePersistenceImpl. The data models (ModelScanMetricsReport, ModelCommitMetricsReport, MetricsReportToken) are deleted from persistence/relational-jdbc.
  3. Default battery in extensions/metrics-reports/implNoOpMetricsReporter (@Identifier("no-op"), default) and LoggingMetricsReporter (@Identifier("log")) live here. The default reporting type is now "no-op".
  4. REST read path stubbed — MetricsReportsService.listTableMetrics() resolves and authorizes the table (full privilege check via LIST_TABLE_METRICS / TABLE_READ_METRICS), then returns an empty result set. Namespace decoding uses NamespaceUtils.splitNamespace().
  5. CHANGELOG corrected — removed the inaccurate "exposes persisted metrics" language; clarified that durable storage is a follow-up.

What's deferred (Scope 2)

Durable JDBC metrics persistence in a new self-contained module extensions/metrics-reports/persistence/relational-jdbc. It will own the data models, DDL bootstrap, write path (@Identifier("persisting") reporter), and the backing for the REST read API — without touching BasePersistence or PolarisMetaStoreManager.

Should I create the Scope 2 PR based on this right now or should I wait?

…licts

Conflicts resolved by keeping HEAD: metrics SPI classes deleted in Scope 1
refactor (MetricsPersistence removed from BasePersistence, MetricsReportPersistenceTest
and PolarisMetricsManager removed) take precedence over main's edits to those files.
Main-branch merge brought in references to MetricsPersistence
(PolarisCallContext.getMetricsPersistence, MetaStoreManagerFactory
.getOrCreateMetricsPersistence, etc.) that Scope 1 had deleted.
Restore the six SPI types as standalone interfaces decoupled from
BasePersistence so all existing callers compile correctly.
@obelix74

Copy link
Copy Markdown
Contributor Author

Should I create the Scope 2 PR based on this right now or should I wait?

Raised #4756 for the second scope.

Anand Kumar Sankaran added 5 commits June 14, 2026 09:31
… fix factory

JdbcBasePersistenceImpl still declared implements MetricsPersistence after
the Scope 1 refactor removed the import and all metric method bodies,
causing a cannot-find-symbol compile error at the class declaration line.

JdbcMetaStoreManagerFactory still passed JdbcBasePersistenceImpl to the
two-arg PolarisCallContext constructor (which requires P extends both
BasePersistence and MetricsPersistence) and returned it from
getOrCreateMetricsPersistence(), causing no-suitable-constructor and
incompatible-types errors.

Fixes:
- Remove MetricsPersistence from JdbcBasePersistenceImpl implements clause
- Use PolarisCallContext(realmContext, metaStore, new MetricsPersistence(){})
  in bootstrap, purge, and bootstrap-check code paths
- Return a no-op MetricsPersistence from getOrCreateMetricsPersistence();
  the real JDBC implementation is provided by the extension module in Scope 2
… catalogName

Java prohibits lambda parameters from shadowing enclosing-method parameters
of the same name. The no-op metricsReporter lambda introduced in the Scope 1
refactor used catalogName as a parameter name inside a createHandler override
whose method parameter is also catalogName, causing a compile error.
1. Remove stale MetricsReportToken$MetricsReportTokenType entry from
   persistence/relational-jdbc service file — the class was deleted in
   Scope 1 but ServiceLoader still found the registration, causing
   ServiceConfigurationError during pagination in LocalIcebergCatalog tests.

2. Fix PolarisCallContext constructor calls in JDBC test classes — both
   AtomicMetastoreManagerWithJdbcBasePersistenceImplTest and
   JdbcGrantRecordsIdempotencyTest used the two-arg constructor, which
   requires the second arg to implement MetricsPersistence. Now that
   JdbcBasePersistenceImpl no longer implements MetricsPersistence, the
   calls must explicitly pass a no-op MetricsPersistence.

3. Add polaris-extensions-metrics-reports as a runtimeOnly dependency to
   runtime/service — IcebergCatalogHandlerFactory injects IcebergMetricsReporter
   via CDI; without this module the NoOpMetricsReporter and LoggingMetricsReporter
   beans are absent, causing CDI injection failures in reportMetrics tests and
   integration tests.
…match application.properties

application.properties sets polaris.iceberg-metrics.reporting.type=default, but
LoggingMetricsReporter was annotated @Identifier("log") causing
UnsatisfiedResolutionException at runtime. Rename identifier to "default" and
update MetricsReportingConfiguration.type() default value to match.

Also regenerate config doc for MetricsReportingConfiguration.
@dimas-b

dimas-b commented Jun 15, 2026

Copy link
Copy Markdown
Contributor

#4708 seems related too

@@ -16,38 +16,35 @@
* specific language governing permissions and limitations
* under the License.
*/
package org.apache.polaris.service.reporting;
package org.apache.polaris.core.metrics;

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IcebergMetricsReporter is NOT used in polaris-core.

From a general design perspective, I do not see a reason for defining an SPI class in polaris-core when polaris-core does not actually use that functionality.

polaris-core is a historical artifact and I do not think it was ever intended as a container for SPI classes.

IcebergMetricsReporter is actually called from IcebergCatalogHandler. So, it represents the "needs" of the runtime/services module in terms of metrics processing capabilities required by the REST API layer implemented in that module, which is exactly what SPI stands for.

If we want to establish / clarify our general guidelines for the location of SPI classes, I suggest starting a dedicated discussion on the dev ML.

As an incremental step, I believe adding a new module called extensions/metrics-reports/spi for this class could be a reasonable starting point. The polaris-core module will not depend on it. The runtime/service module and the impl. module will depend on it.

I would not attach too much meaning to the term "extensions" in this context. It does not mean that runtime cannot depend on "extensions", IMHO. I believe "extensions" in this repo is merely a container for plugins, offering different impl. options for some Polaris features. The naming of these modules is part of what, I believe, should be discussed on dev (in a non-blocking manner WRT this PR).

Keeping new SPI classes isolated from the start will simplify future refactoring after we reach consensus on dev about general SPI placement guidelines.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. I've extracted IcebergMetricsReporter into a new extensions/metrics-reports/spi module (polaris-extensions-metrics-reports-spi). polaris-core no longer has any dependency on or knowledge of this interface. runtime/service and the impl/jdbc extension modules each declare an explicit dep on the new spi module.

I also moved MetricsRecordConverter out of polaris-core at the same time — it was only ever called by PersistingMetricsReporter in the JDBC extension, so it now lives there instead.

Agreed on starting a dev ML thread to align on general SPI placement guidelines — I'll keep that scoped separately from this PR.

Comment thread runtime/service/build.gradle.kts Outdated
implementation(project(":polaris-api-catalog-service"))

runtimeOnly(project(":polaris-relational-jdbc"))
runtimeOnly(project(":polaris-extensions-metrics-reports"))

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I missed this before... Does runtime/service need this dep? My local build was ok without it 😅 (but I did not run tests).

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right to flag it — I've changed this from runtimeOnly to testRuntimeOnly. The reporters need to be on the classpath for the authz/integration tests that exercise the metrics reporting path, but there's no reason to push them onto the runtime classpath of every downstream server that depends on runtime/service. The runtime/server module already has its own explicit runtimeOnly dep for this.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correction to my previous comment: I've had to revert this back to runtimeOnly. The integration tests in this module (e.g. RestCatalogFileIT) are annotated with @QuarkusIntegrationTest, which packages and starts the application as a fully built artifact using the main runtimeClasspath. testRuntimeOnly excludes the dep from that packaged app, so LoggingMetricsReporter and NoOpMetricsReporter aren't discovered by CDI at startup, causing testSendMetricsReport() to fail with a non-204 response.

The runtimeOnly dep is needed here to support these integration tests. runtime/server already has an explicit runtimeOnly dep on the same module, so this is consistent — the duplicate is harmless.

@dimas-b dimas-b Jun 15, 2026

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, runtimeOnly is still sub-optimal... I believe it will leak into downstream build... let me explore this a bit deeper 🤔

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch — it was leaking. Fixed by making the CDI producer resilient instead: if no IcebergMetricsReporter is found for the configured type, it now logs a warning and falls back to a no-op rather than hard-failing with UnsatisfiedResolutionException. This lets runtime/service drop the runtimeOnly dep on the impl module entirely.

For real deployments runtime/server retains its own explicit runtimeOnly dep on polaris-extensions-metrics-reports, so LoggingMetricsReporter is still present in production builds. For integration tests in runtime/service (@QuarkusIntegrationTest packages the app from runtimeClasspath), the no-op fallback satisfies CDI and the metrics endpoint still returns 204.

runtimeOnly("io.quarkus:quarkus-jdbc-postgresql")
runtimeOnly(project(":polaris-extensions-federation-hadoop"))
runtimeOnly(project(":polaris-extensions-auth-opa"))
runtimeOnly(project(":polaris-extensions-metrics-reports"))

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just for clarify: this dep is perfectly fine from my POV as it makes the ASF Polaris Server build include the Metrics read endpoint(s), which is intended.

Downstream servers should not be required to include the Metrics endpoint (the concern about the runtime/service dep. change.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Noted, thanks for confirming. Left as-is.

Anand Kumar Sankaran added 3 commits June 15, 2026 11:57
…ics-reports/spi

IcebergMetricsReporter was never used inside polaris-core itself — it is
called from IcebergCatalogHandler in runtime/service, so polaris-core was
not the right home. Extract it into a new, minimal
extensions/metrics-reports/spi module so that:

* polaris-core has no dependency on (or knowledge of) the Iceberg metrics
  SPI
* downstream servers can opt out of the metrics extension without
  dragging in polaris-core machinery they do not need
* the SPI boundary is clear: runtime/service and the impl/jdbc extension
  modules all declare an explicit dep on polaris-extensions-metrics-reports-spi

MetricsRecordConverter (only used by the persisting JDBC reporter) is also
removed from polaris-core; it will live in the jdbc extension module on the
Scope 2 branch.

runtime/service: change runtimeOnly -> testRuntimeOnly for
polaris-extensions-metrics-reports so the impl is not silently pushed onto
the runtime classpath of every downstream server that depends on
runtime/service.
…/service

The @QuarkusIntegrationTest tests start a fully packaged application built
from runtimeClasspath. testRuntimeOnly excludes the dependency from that
packaged app, causing IcebergMetricsReporter CDI injection failures at
startup and testSendMetricsReport() to fail with a non-204 response.

runtimeOnly is required here so that LoggingMetricsReporter and
NoOpMetricsReporter are available when the Quarkus app is packaged and
started for integration tests.
The runtimeOnly dep on polaris-extensions-metrics-reports was leaking
into downstream consumers' runtime classpaths. Instead, make the
IcebergMetricsReporter CDI producer resilient: if no reporter is found
for the configured type, log a warning and fall back to a no-op.

For deployments, runtime/server declares its own runtimeOnly dep on the
impl module so LoggingMetricsReporter is always present there. For
runtime/service integration tests (@QuarkusIntegrationTest), the no-op
fallback satisfies CDI injection and the metrics endpoint returns 204.
dimas-b
dimas-b previously approved these changes Jun 15, 2026
LOGGER.warn(
"No IcebergMetricsReporter found for type '{}'; Iceberg metrics will be dropped",
config.type());
return (catalogName, catalogId, table, tableId, metricsReport, receivedTimestamp) -> {};

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it make sense to move NoOpMetricsReporter into the metrics SPI module? This way we'll have only one "no op" impl. 😉 and builds that do not need metrics reporters do not have to reinvent the "no op" reporter either.

As a side benefit, selected will always be resolvable (aside from config mistakes).

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done — moved NoOpMetricsReporter to the SPI module (extensions/metrics-reports/spi) with @Identifier("no-op"). The SPI module now also has Jandex and the CDI/SmallRye annotation deps so Quarkus discovers the bean automatically.

Changed the code default in MetricsReportingConfiguration from "default" to "no-op", so the lookup always resolves to the SPI bean when nothing else is configured. Production deployments still use type=default (→ LoggingMetricsReporter) via application.properties.

Added polaris.iceberg-metrics.reporting.type=no-op to application-test.properties and application-it.properties so unit and integration tests in runtime/service (which don't have the impl on the classpath) consistently use the SPI no-op. ServiceProducers.metricsReporter() is back to a simple .get().

testImplementation("org.junit.jupiter:junit-jupiter")
testImplementation(libs.assertj.core)
testImplementation(libs.mockito.core)
testImplementation(libs.jakarta.ws.rs.api)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IJ apparently suggests that this is redundant (given line 35).

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed.

* under the License.
*/
package org.apache.polaris.service.reporting;
package org.apache.polaris.core.metrics;

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now that this class is in extensions/metrics-reports/spi, should the package be org.apache.polaris.extension.metrics.reports.spi perhaps? WDYT?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The org.apache.polaris.core.metrics package was kept intentionally to avoid touching 7 import sites across the codebase. Happy to rename it in a follow-up PR once this lands — it's purely mechanical and easier to review in isolation.

# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.
#

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only comments in this file? Should we just delete it?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Deleted.

@github-project-automation github-project-automation Bot moved this from PRs In Progress to Ready to merge in Basic Kanban Board Jun 15, 2026
…mments

- Move NoOpMetricsReporter from extensions/metrics-reports/impl to the
  SPI module so any project with polaris-extensions-metrics-reports-spi
  on its classpath gets a built-in no-op without reinventing one
- Add jandex + CDI/SmallRye annotation deps to the SPI module so Quarkus
  discovers the bean via classpath scanning
- Change MetricsReportingConfiguration code default to 'no-op' (was
  'default') so it always resolves to the SPI bean when no explicit type
  is set; production deployments continue to use type=default via
  application.properties (LoggingMetricsReporter)
- Override type=no-op in application-test.properties and
  application-it.properties so unit and integration tests in
  runtime/service (which do not have the impl module on runtimeClasspath)
  always get the SPI no-op without needing the impl module
- Revert ServiceProducers.metricsReporter() to simple .get() — the SPI
  no-op guarantees the lookup always succeeds for the configured defaults
- Remove redundant testImplementation(jakarta.ws.rs.api) from impl build
  (already covered by implementation() at line 35)
- Delete comments-only META-INF/services registration left after
  MetricsReportToken was removed in Scope 1
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants