OpenLineage: add no-op POST /api/v1/lineage ingest endpoint #4667
adnanhemani wants to merge 1 commit
Adds the OpenLineage-compatible ingest endpoint defined in the Polaris OpenLineage proposal. This first PR mounts the route and accepts events; persistence, dataset resolution, and downstream forwarding are follow-ups.
The endpoint is mounted at the standard OpenLineage path (POST /api/v1/lineage) so any engine using the OpenLineage HTTP transport (Spark, Flink, Airflow, Trino, dbt) can target Polaris by URL change alone.
Body parsing follows the Marquez (OpenLineage reference server) pattern: a hand-written JAX-RS resource on top of io.openlineage:openlineage-java, with Jackson polymorphism keyed on the schemaURL field to dispatch between RunEvent / JobEvent / DatasetEvent. The OpenAPI Generator's Java template cannot translate the spec's oneOf faithfully: it collapses the variants into a single class with every variant's required fields marked @NotNull, rejecting every valid event with 400. Codegen is therefore intentionally skipped for this module.
The PolarisLineageEvent wrapper hierarchy holds the official OL types by composition (since OpenLineage.{Run,Job,Dataset}Event are final). Unknown or missing schemaURL falls back to RunEvent, matching Marquez behavior.
Verified against a running server with full RunEvent / JobEvent / DatasetEvent payloads (all return 201), unknown/missing schemaURL fallback (all parse as RunEvent), and unauthenticated requests (401). Standalone Jackson dispatch test confirms each schemaURL routes to the correct wrapper subclass.
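To make the "wrapper by composition" idea in the description concrete, here is a minimal sketch. It is not the PR's code: the three event classes are empty stand-ins for the final types in io.openlineage:openlineage-java, and every name (LineageEventSketch, OfRun, OfJob, OfDataset, EventKinds) is hypothetical. Modeling the base as a sealed interface with records is also an assumption made for brevity.

```java
// Stand-ins for the final OpenLineage event classes. The real types come from
// io.openlineage:openlineage-java and cannot be subclassed, hence composition.
final class RunEvent {}
final class JobEvent {}
final class DatasetEvent {}

// Hypothetical wrapper base: one permitted subtype per event variant.
sealed interface LineageEventSketch permits OfRun, OfJob, OfDataset {
    Object event();
}

// Each wrapper holds the wrapped event; the record accessor narrows the
// return type, so callers get the concrete event without casting.
record OfRun(RunEvent event) implements LineageEventSketch {}
record OfJob(JobEvent event) implements LineageEventSketch {}
record OfDataset(DatasetEvent event) implements LineageEventSketch {}

final class EventKinds {
    private EventKinds() {}

    // Names the variant a wrapper carries, based only on the wrapper type.
    static String describe(LineageEventSketch wrapper) {
        if (wrapper instanceof OfRun) {
            return "RunEvent";
        }
        if (wrapper instanceof OfJob) {
            return "JobEvent";
        }
        return "DatasetEvent";
    }
}
```

With this shape, `RunEvent e = new OfRun(new RunEvent()).event();` compiles without a cast, which is the property the wrapper is meant to provide to later code.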
flyingImer
left a comment
Thanks Adnan! I think this is a useful route bring-up PR, and I agree it should stay small: no query API, no persistence model, no dataset-resolution semantics, and no forwarding implementation need to land here.
The concern I want to clarify is the replacement point this PR creates for follow-up work.
Right now the PR adds the OpenLineage route and a no-op runtime implementation. The only seam behind the JAX-RS resource is PolarisOpenLineageApiService, whose Javadoc frames it as the runtime service interface behind the resource, and whose method returns JAX-RS Response while accepting request/security context. That looks appropriate for API/runtime delegation, but I don't think it should be the seam that future OpenLineage ingest implementations build behind.
The first seam behind a new endpoint tends to become the place where follow-up behavior accumulates. If follow-up PRs add persistence, forwarding, or deployment-specific behavior directly behind this runtime delegate, then replacing OpenLineage ingest behavior means depending on or patching the JAX-RS/runtime service shape. That raises the SPI cost for advanced integrators and makes Polaris less platform-like.
What I think we want instead is a small, explicit replacement point behind the runtime delegate. Downstream deployments may already have their own OpenLineage ingest backend, persistence path, or forwarding path. They should be able to plug that behavior into Polaris without changing the OSS resource or making the runtime service itself the extension contract.
I'm not asking this PR to define a general Polaris lineage SPI. With only OpenLineage in scope, that may be premature. I'd actually prefer this first seam be explicitly OpenLineage-specific and named that way, e.g.:
OpenLineageIngestProvider.ingest(OpenLineageIngestRequest) -> OpenLineageIngestResult
Bottom line, the important constraints (from my SPI overhaul exercise) are:
- it is an OpenLineage ingress provider seam, not the general Polaris lineage SPI;
- it does not return JAX-RS Response;
- it does not require provider implementations to accept SecurityContext or other runtime request objects;
- CDI/config/identifier/priority can still be used by the Polaris runtime to select the default or downstream implementation;
- the default implementation in this PR can still be no-op.
Conceptually, PR1 could stay as small as:
JAX-RS OpenLineage resource
  -> OpenLineage ingest adapter
  -> OpenLineageIngestProvider
       default: NoOpOpenLineageIngestProvider
       downstream: deployment-specific OpenLineage provider
  -> runtime maps provider result to 201
That keeps this PR concrete to OpenLineage, gives downstreams a replacement point for OL ingest behavior, and avoids prematurely defining the general Polaris lineage model/query/persistence semantics.
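A minimal sketch of what such a provider seam could look like, under stated assumptions. The type names come from the suggestion above; the record fields (realmId, eventType, rawEventJson) and the three result values are hypothetical and not taken from the PR. Note that nothing here references JAX-RS Response or SecurityContext.

```java
import java.util.Objects;

// Hypothetical request shape: plain data, no JAX-RS or runtime request objects.
record OpenLineageIngestRequest(String realmId, String eventType, String rawEventJson) {
    OpenLineageIngestRequest {
        Objects.requireNonNull(realmId, "realmId");
        Objects.requireNonNull(eventType, "eventType");
        Objects.requireNonNull(rawEventJson, "rawEventJson");
    }
}

// Hypothetical result shape: the provider reports an outcome, not an HTTP status.
enum OpenLineageIngestResult {
    ACCEPTED,
    REJECTED,
    UNAVAILABLE
}

// The OpenLineage-specific replacement point that downstream deployments implement.
interface OpenLineageIngestProvider {
    OpenLineageIngestResult ingest(OpenLineageIngestRequest request);
}

// Default provider for route bring-up: accepts every event and does nothing with it.
final class NoOpOpenLineageIngestProvider implements OpenLineageIngestProvider {
    @Override
    public OpenLineageIngestResult ingest(OpenLineageIngestRequest request) {
        return OpenLineageIngestResult.ACCEPTED;
    }
}
```

A downstream deployment would supply its own OpenLineageIngestProvider implementation, while the OSS resource and runtime service stay unchanged.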
     * Service interface implemented by the runtime to handle OpenLineage ingest. Mirrors the pattern
     * used by other Polaris API modules where the JAX-RS resource sits in the API module and
     * delegates to a CDI-scoped service implementation in {@code polaris-runtime-service}.
This Javadoc usefully frames the type as the runtime service interface behind the JAX-RS resource. I think that distinction is important to preserve.
My concern is not that this PR claims this is the lineage SPI; it does not. My concern is that this is currently the only seam behind the OpenLineage route. Since the method returns JAX-RS Response and accepts request/security context, I'd prefer to keep this as API/runtime delegation and introduce a separate OpenLineage-specific provider seam behind it for follow-up implementations.
     * @return the JAX-RS response. OpenLineage clients expect {@code 201 Created} with no body on
     *     success.
     */
    Response sendLineageEvent(
Could the replacement point behind this service use OpenLineage-specific provider request/result types rather than JAX-RS/runtime types?
For example:
OpenLineageIngestProvider.ingest(OpenLineageIngestRequest) -> OpenLineageIngestResult
This would let this API service stay responsible for mapping HTTP/runtime context to a provider call and mapping the provider result back to Response, while giving downstream deployments a provider contract they can implement without patching the JAX-RS resource or depending on Response/SecurityContext as the extension contract.
    description:
      OAuth2 client-credentials flow against the Polaris token endpoint.
      The same client-id/secret used for catalog access is used here; only
      an additional LINEAGE_INGEST privilege grant is required.
This seems ahead of the implementation in this PR.
The spec says the same client credentials need an additional LINEAGE_INGEST privilege, but the resource currently uses @RolesAllowed("**"). If privilege enforcement is not part of this PR, can we either remove this as current-contract language or mark it clearly as future work?
     * https://openlineage.io/spec/2-0-2/OpenLineage.json#/$defs/RunEvent}. The fragment name is the
     * stable discriminator across spec versions.
     *
     * <p>If {@code schemaURL} is missing or unrecognized, the body is parsed as a {@code RunEvent} —
The Marquez-compatible fallback is reasonable as an OpenLineage ingress behavior, iiuc.
I'd just like to keep this scoped to the OL adapter layer. The Polaris provider seam behind the adapter should receive a Polaris-owned request/result shape after this compatibility decision has been made, rather than requiring provider implementations to understand OL schemaURL fallback behavior.
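A minimal sketch of keeping that compatibility decision inside the adapter layer: the event kind is resolved once from schemaURL, with the RunEvent fallback applied, before anything is handed to a provider. The class and method names (SchemaUrlDispatch, resolve) are hypothetical, and this uses plain string handling rather than the PR's Jackson resolver.

```java
import java.util.Set;

final class SchemaUrlDispatch {
    private static final String FALLBACK = "RunEvent";
    private static final Set<String> KNOWN = Set.of("RunEvent", "JobEvent", "DatasetEvent");

    private SchemaUrlDispatch() {}

    // Returns the event kind named by the trailing segment of schemaURL,
    // or RunEvent when the URL is missing or names an unknown kind.
    static String resolve(String schemaUrl) {
        if (schemaUrl == null) {
            return FALLBACK;
        }
        // The kind is whatever follows the last '/' or '#'.
        int cut = Math.max(schemaUrl.lastIndexOf('/'), schemaUrl.lastIndexOf('#'));
        String name = schemaUrl.substring(cut + 1);
        return KNOWN.contains(name) ? name : FALLBACK;
    }
}
```

A provider behind the adapter would then receive the already-resolved kind and never need to know that a fallback happened.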
     * the OpenAPI generator's Java template does not faithfully translate the spec's {@code oneOf}
     * over event variants.
     */
    @Path("/api/v1/lineage")
I'm worried this makes a one-way-door API decision.
/api/v1/lineage reads like the generic Polaris lineage API, but this PR is specifically adding an OpenLineage-compatible ingest endpoint. Once this public route exists, it becomes hard to later introduce a Polaris-native lineage API or another lineage format without either overloading this path or adding a second, awkward namespace.
From recent community discussions, Polaris is trying to be a platform with replaceable capabilities, not just an opinionated OpenLineage receiver. If the first public lineage route is generic but semantically OpenLineage-specific, OpenLineage becomes the default interpretation of "Polaris lineage" at the API boundary. That is a hard thing to unwind later because REST paths are external contracts.
Could we either make the OL specificity visible in the route/API shape, or at least make it very explicit in naming/docs that this is the OpenLineage ingress endpoint and not the generic Polaris lineage API?
That keeps room for a future lineage surface, if one is ever needed, without making OpenLineage the default meaning of "Polaris lineage" at the API boundary.
    @Timed("polaris.OpenLineageApi.sendLineageEvent")
    @Timeout
    public Response sendLineageEvent(
        @NotNull @Valid PolarisLineageEvent event,
Could we avoid making the JAX-RS method's parsed body type the object that follow-up implementations build around?
Right now the first object passed past the REST resource is PolarisLineageEvent, which wraps the current OpenLineage Java event types. That is reasonable inside an OpenLineage parser/adapter, but I'd prefer the replacement seam to be an explicitly OpenLineage ingest request/result contract rather than this runtime method signature.
The first object passed across the runtime boundary tends to become the object that follow-up persistence, forwarding, and deployment-specific implementations accept in their APIs and tests. If that object is the JAX-RS method parameter, then the extension point is shaped by the resource binding and the current OpenLineage Java model, rather than by a small provider contract that Polaris controls. That makes it harder to evolve the HTTP adapter independently from provider implementations.
Concretely, could the runtime path look more like:
raw OpenLineage request
  -> parse/validate as OpenLineage
  -> build OpenLineageIngestRequest
  -> call OpenLineageIngestProvider
rather than having future provider implementations build directly around the JAX-RS method parameter?
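The four steps above can be sketched as a small pipeline. This is an illustration under assumptions, not the PR's code: all names are hypothetical, the nested types are simplified stand-ins for the suggested request/result/provider contract, and the "parse/validate" step is a placeholder check where a real adapter would use the OpenLineage Java model.

```java
import java.util.Optional;

final class IngestPipelineSketch {
    // Simplified stand-ins for the provider contract suggested in this review.
    record IngestRequest(String eventType, String rawJson) {}

    enum IngestResult { ACCEPTED, REJECTED }

    interface IngestProvider {
        IngestResult ingest(IngestRequest request);
    }

    private final IngestProvider provider;

    IngestPipelineSketch(IngestProvider provider) {
        this.provider = provider;
    }

    // Steps 1-3: take the raw body, validate it, and build the provider request.
    // Placeholder validation only: checks that the body looks like a JSON object.
    static Optional<IngestRequest> parse(String rawBody) {
        if (rawBody == null) {
            return Optional.empty();
        }
        String body = rawBody.strip();
        if (!body.startsWith("{") || !body.endsWith("}")) {
            return Optional.empty();
        }
        // Placeholder: the event kind is fixed here rather than read from schemaURL.
        return Optional.of(new IngestRequest("RunEvent", body));
    }

    // Step 4: call the provider; bodies that fail validation never reach it.
    IngestResult handle(String rawBody) {
        return parse(rawBody).map(provider::ingest).orElse(IngestResult.REJECTED);
    }
}
```

The point of the shape is that the provider only ever sees IngestRequest, so the HTTP binding and the parsed body type can change without touching provider implementations.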
     * land in follow-up PRs as described in the Polaris OpenLineage proposal.
     */
    @RequestScoped
    public class OpenLineageAdapter implements PolarisOpenLineageApiService {
The no-op scope is clear here, which I like.
Can we make the no-op implementation use the same replacement point that future real implementations will use?
Right now the no-op behavior lives directly in this runtime service implementation. That works for route bring-up, but it does not show where a future store/forward/custom OpenLineage implementation should plug in without modifying this runtime service.
Early no-op implementations often become the extension point by inertia. If follow-up PRs add persistence or forwarding directly to this runtime service, then replacing lineage behavior means replacing runtime wiring rather than implementing a clean provider contract. That is exactly the kind of boundary ambiguity the SPI work is trying to avoid.
For the initial PR, I think it would be enough to add a small OpenLineageIngestProvider contract plus a default NoOpOpenLineageIngestProvider, and have this runtime adapter call it. The default provider can still be selected/wired by CDI if that is the runtime mechanism; the key is that CDI selection remains runtime wiring, not the provider contract itself.
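To illustrate the split between selection and contract, here is a sketch of selection logic written in plain Java rather than CDI. Everything in it is an assumption for illustration: the names (ProviderCandidate, ProviderSelection), the rule that an explicitly configured identifier wins over priority, and the exception type. In Polaris the equivalent decision would be made by whatever wiring mechanism the runtime uses.

```java
import java.util.Comparator;
import java.util.List;

// Hypothetical descriptor for a registered provider.
record ProviderCandidate(String identifier, int priority) {}

final class ProviderSelection {
    private ProviderSelection() {}

    // Picks the candidate whose identifier matches the configured one; if none
    // matches (or nothing is configured), picks the highest-priority candidate.
    static ProviderCandidate select(List<ProviderCandidate> candidates, String configuredId) {
        return candidates.stream()
            .filter(c -> c.identifier().equals(configuredId))
            .findFirst()
            .or(() -> candidates.stream()
                .max(Comparator.comparingInt(ProviderCandidate::priority)))
            .orElseThrow(() ->
                new IllegalStateException("no OpenLineage ingest provider available"));
    }
}
```

None of this logic appears in the provider interface itself, which is the separation being asked for.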
    @Override
    public Response sendLineageEvent(
        PolarisLineageEvent event, RealmContext realmContext, SecurityContext securityContext) {
      return Response.status(Response.Status.CREATED).build();
Could this 201 come from a provider result instead of being the whole implementation?
For example, even if the only provider in PR1 is no-op:
    OpenLineageIngestResult result = provider.ingest(request);
    return toResponse(result);
and the no-op provider always returns accepted.
The HTTP status is runtime behavior, while accept/reject/unavailable is provider behavior. Keeping those separate lets future implementations change lineage handling without owning JAX-RS response construction. It also gives us a place to define provider semantics before persistence/query/forwarding land.
That small indirection would make the intended OpenLineage replacement point concrete without adding persistence or forwarding yet. Future implementations would change provider behavior, while this runtime class would stay responsible for HTTP response mapping.
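A sketch of the result-to-status mapping described above, kept free of JAX-RS so it stands alone. The names (IngestOutcome, HttpStatusMapping, toStatus) are hypothetical. Only the 201 for an accepted event comes from this PR; the 400 and 503 choices are assumptions for illustration.

```java
// Hypothetical provider outcomes, mirroring accept/reject/unavailable.
enum IngestOutcome {
    ACCEPTED,
    REJECTED,
    UNAVAILABLE
}

final class HttpStatusMapping {
    private HttpStatusMapping() {}

    // Maps a provider outcome to an HTTP status code. The runtime layer owns
    // this mapping; providers never construct responses themselves.
    static int toStatus(IngestOutcome outcome) {
        return switch (outcome) {
            case ACCEPTED -> 201;    // OpenLineage clients expect 201 Created on success
            case REJECTED -> 400;    // assumption: rejected events map to Bad Request
            case UNAVAILABLE -> 503; // assumption: backend outage maps to Service Unavailable
        };
    }
}
```

In the runtime class, a toResponse helper would wrap this status in a JAX-RS Response, so a provider swap never touches response construction.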
Summary
Adds the OpenLineage-compatible ingest endpoint defined in the Polaris OpenLineage proposal. This first PR mounts the route and accepts events; persistence, dataset resolution, and downstream forwarding are follow-up PRs.
The endpoint is mounted at the standard OpenLineage path (POST /api/v1/lineage) so any engine using the OpenLineage HTTP transport (Spark, Flink, Airflow, Trino, dbt) can target Polaris by URL change alone, with no client-side rewriting required.
Why hand-written instead of OpenAPI codegen
Body parsing follows the Marquez (OpenLineage reference server) pattern: a hand-written JAX-RS resource on top of io.openlineage:openlineage-java, with Jackson polymorphism keyed on the schemaURL field to dispatch between RunEvent / JobEvent / DatasetEvent.
The OpenAPI Generator's Java template cannot translate the OpenLineage spec's oneOf faithfully: it collapses the variants into a single class with every variant's required fields marked @NotNull, rejecting every valid event with a 400. Codegen is therefore intentionally skipped for this module; spec/openlineage-service.yaml is kept as documentation only.
Wrapper hierarchy
PolarisLineageEvent is a sealed base with three permitted subclasses (OfRunEvent, OfJobEvent, OfDatasetEvent). Each wrapper holds the official io.openlineage.server.OpenLineage.{Run,Job,Dataset}Event by composition (those classes are final). A custom JsonTypeIdResolver reads the trailing path segment of schemaURL (e.g. …/RunEvent) to dispatch. Unknown or missing schemaURL falls back to RunEvent, matching Marquez behavior.
The next PR (persistence/forwarding/query) receives PolarisLineageEvent.event() already typed as the correct OL event, so no JSON re-parsing is needed.
Files
- spec/openlineage-service.yaml: documentation-only spec; header explains why it isn't a codegen source.
- api/openlineage-service/: new Gradle module containing PolarisOpenLineageApi (JAX-RS resource), PolarisLineageEvent (sealed wrapper hierarchy), LineageEventTypeResolver (Jackson dispatch), PolarisOpenLineageApiService (service interface).
- runtime/service/.../lineage/OpenLineageAdapter.java: @RequestScoped no-op CDI bean returning 201.
- gradle/libs.versions.toml: adds openlineage-java = "1.48.0".
- gradle/projects.main.properties, runtime/service/build.gradle.kts: wire the new module into the build.
Manual verification
End-to-end against a running server:
- RunEvent with full body: 201
- JobEvent: 201
- DatasetEvent: 201
- schemaURL (RunEvent body): 201 (falls back to RunEvent)
- schemaURL: 201 (falls back to RunEvent)
- {} body: 201
- Authorization header: 401
Standalone Jackson dispatch test (proves polymorphism, not just HTTP success):
Checklist
- CHANGELOG.md (if needed): N/A for a no-op endpoint with no user-facing behavior; will update when persistence/forwarding lands
- site/content/in-dev/unreleased (if needed): N/A for the same reason