OpenLineage: add no-op POST /api/v1/lineage ingest endpoint #4667

Open
adnanhemani wants to merge 1 commit into apache:main from adnanhemani:openlineage-ingest-noop

@adnanhemani

Contributor

Summary

Adds the OpenLineage-compatible ingest endpoint defined in the Polaris OpenLineage proposal. This first PR mounts the route and accepts events; persistence, dataset resolution, and downstream forwarding are follow-up PRs.

The endpoint is mounted at the standard OpenLineage path (POST /api/v1/lineage) so any engine using the OpenLineage HTTP transport (Spark, Flink, Airflow, Trino, dbt) can target Polaris by URL change alone — no client-side rewriting required.

Why hand-written instead of OpenAPI codegen

Body parsing follows the Marquez (OpenLineage reference server) pattern: a hand-written JAX-RS resource on top of io.openlineage:openlineage-java, with Jackson polymorphism keyed on the schemaURL field to dispatch between RunEvent / JobEvent / DatasetEvent.

The OpenAPI Generator's Java template cannot translate the OpenLineage spec's oneOf faithfully — it collapses the variants into a single class with every variant's required fields marked @NotNull, rejecting every valid event with a 400. Codegen is therefore intentionally skipped for this module; spec/openlineage-service.yaml is kept as documentation only.

Wrapper hierarchy

PolarisLineageEvent is a sealed base with three permitted subclasses (OfRunEvent, OfJobEvent, OfDatasetEvent). Each wrapper holds the official io.openlineage.server.OpenLineage.{Run,Job,Dataset}Event by composition (those classes are final). A custom JsonTypeIdResolver reads the trailing path segment of schemaURL (e.g. …/RunEvent) to dispatch. Unknown or missing schemaURL falls back to RunEvent, matching Marquez behavior.
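
A minimal sketch of the dispatch rule described above, written in plain Java without Jackson so it stands alone. The class name `SchemaUrlDispatchSketch`, the `EventKind` enum, and `resolve` are illustrative assumptions and are not the PR's `LineageEventTypeResolver`; only the rule itself (read the segment after the last `/`, fall back to RunEvent) comes from the description.

```java
// Hypothetical illustration only; the real resolver plugs into Jackson's type-id machinery.
final class SchemaUrlDispatchSketch {
  enum EventKind { RUN_EVENT, JOB_EVENT, DATASET_EVENT }

  /** Maps a schemaURL to an event kind using the text after the last '/'. */
  static EventKind resolve(String schemaUrl) {
    if (schemaUrl == null) {
      return EventKind.RUN_EVENT; // missing schemaURL: fall back to RunEvent
    }
    // lastIndexOf returns -1 when there is no '/', so the whole string is used.
    String name = schemaUrl.substring(schemaUrl.lastIndexOf('/') + 1);
    switch (name) {
      case "JobEvent":
        return EventKind.JOB_EVENT;
      case "DatasetEvent":
        return EventKind.DATASET_EVENT;
      default:
        return EventKind.RUN_EVENT; // RunEvent itself, or anything unrecognized
    }
  }

  public static void main(String[] args) {
    System.out.println(
        resolve("https://openlineage.io/spec/2-0-2/OpenLineage.json#/$defs/JobEvent"));
    System.out.println(resolve("https://example.invalid/not-a-schema"));
    System.out.println(resolve(null));
  }
}
```

Keying on the fragment name rather than the full URL means the spec version embedded in the URL does not affect dispatch.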

The next PR (persistence/forwarding/query) receives PolarisLineageEvent.event() already typed as the correct OL event — no JSON re-parsing needed.

Files

  • spec/openlineage-service.yaml — documentation-only spec; header explains why it isn't a codegen source.
  • api/openlineage-service/ — new Gradle module: PolarisOpenLineageApi (JAX-RS resource), PolarisLineageEvent (sealed wrapper hierarchy), LineageEventTypeResolver (Jackson dispatch), PolarisOpenLineageApiService (service interface).
  • runtime/service/.../lineage/OpenLineageAdapter.java: @RequestScoped no-op CDI bean returning 201.
  • gradle/libs.versions.toml — adds openlineage-java = "1.48.0".
  • gradle/projects.main.properties, runtime/service/build.gradle.kts — wire the new module into the build.

Manual verification

End-to-end against a running server:

| Request | Result |
| --- | --- |
| RunEvent with full body | 201 |
| JobEvent | 201 |
| DatasetEvent | 201 |
| Unknown schemaURL (RunEvent body) | 201 (falls back to RunEvent) |
| Missing schemaURL | 201 (falls back to RunEvent) |
| Empty {} body | 201 |
| No Authorization header | 401 |

Standalone Jackson dispatch test (proves polymorphism, not just HTTP success):

RunEvent     -> wrapper=OfRunEvent,    event=RunEvent
JobEvent     -> wrapper=OfJobEvent,    event=JobEvent
DatasetEvent -> wrapper=OfDatasetEvent, event=DatasetEvent
unknown->Run -> wrapper=OfRunEvent,    event=RunEvent
missing->Run -> wrapper=OfRunEvent,    event=RunEvent

Checklist

  • 🛡️ Don't disclose security issues! (contact security@apache.org)
  • 🔗 Clearly explained why the changes are needed, or linked related issues: tracked in the Polaris OpenLineage proposal
  • 🧪 Added/updated tests with good coverage, or manually tested (and explained how) — manual end-to-end + Jackson dispatch test documented above; automated tests will land with the persistence PR
  • 💡 Added comments for complex logic — wrapper hierarchy and resolver carry rationale comments explaining the Marquez approach and the codegen avoidance
  • 🧾 Updated CHANGELOG.md (if needed) — N/A for a no-op endpoint with no user-facing behavior; will update when persistence/forwarding lands
  • 📚 Updated documentation in site/content/in-dev/unreleased (if needed) — N/A for the same reason

Adds the OpenLineage-compatible ingest endpoint defined in the Polaris
OpenLineage proposal. This first PR mounts the route and accepts events;
persistence, dataset resolution, and downstream forwarding are follow-ups.

The endpoint is mounted at the standard OpenLineage path (POST /api/v1/lineage)
so any engine using the OpenLineage HTTP transport (Spark, Flink, Airflow,
Trino, dbt) can target Polaris by URL change alone.

Body parsing follows the Marquez (OpenLineage reference server) pattern: a
hand-written JAX-RS resource on top of io.openlineage:openlineage-java, with
Jackson polymorphism keyed on the schemaURL field to dispatch between
RunEvent / JobEvent / DatasetEvent. The OpenAPI Generator's Java template
cannot translate the spec's oneOf faithfully -- it collapses the variants
into a single class with every variant's required fields marked @NotNull,
rejecting every valid event with 400 -- so codegen is intentionally skipped
for this module.

The PolarisLineageEvent wrapper hierarchy holds the official OL types by
composition (since OpenLineage.{Run,Job,Dataset}Event are final). Unknown or
missing schemaURL falls back to RunEvent, matching Marquez behavior.

Verified against a running server with full RunEvent / JobEvent / DatasetEvent
payloads (all return 201), unknown/missing schemaURL fallback (all parse as
RunEvent), and unauthenticated requests (401). Standalone Jackson dispatch
test confirms each schemaURL routes to the correct wrapper subclass.

@flyingImer flyingImer left a comment

Contributor

Thanks Adnan! I think this is a useful route bring-up PR, and I agree this PR should stay small: no query API, no persistence model, no dataset-resolution semantics, and no forwarding implementation need to land here.

The concern I want to clarify is the replacement point this PR creates for follow-up work.

Right now the PR adds the OpenLineage route and a no-op runtime implementation. The only seam behind the JAX-RS resource is PolarisOpenLineageApiService, whose Javadoc frames it as the runtime service interface behind the resource, and whose method returns JAX-RS Response while accepting request/security context. That looks appropriate for API/runtime delegation, but I don't think it should be the seam that future OpenLineage ingest implementations build behind.

The first seam behind a new endpoint tends to become the place where follow-up behavior accumulates. If follow-up PRs add persistence, forwarding, or deployment-specific behavior directly behind this runtime delegate, then replacing OpenLineage ingest behavior means depending on or patching the JAX-RS/runtime service shape. That raises the SPI cost for advanced integrators and makes Polaris less platform-like.

What I think we want instead is a small, explicit replacement point behind the runtime delegate. Downstream deployments may already have their own OpenLineage ingest backend, persistence path, or forwarding path. They should be able to plug that behavior into Polaris without changing the OSS resource or making the runtime service itself the extension contract.

I'm not asking this PR to define a general Polaris lineage SPI. With only OpenLineage in scope, that may be premature. I'd actually prefer this first seam be explicitly OpenLineage-specific and named that way, e.g.:

OpenLineageIngestProvider.ingest(OpenLineageIngestRequest) -> OpenLineageIngestResult

Bottom line, the important constraints (from my SPI overhaul exercise) are:

  • it is an OpenLineage ingress provider seam, not the general Polaris lineage SPI;
  • it does not return JAX-RS Response;
  • it does not require provider implementations to accept SecurityContext or other runtime request objects;
  • CDI/config/identifier/priority can still be used by the Polaris runtime to select the default or downstream implementation;
  • the default implementation in this PR can still be no-op.

Conceptually, PR1 could stay as small as:

JAX-RS OpenLineage resource
  -> OpenLineage ingest adapter
  -> OpenLineageIngestProvider
       default: NoOpOpenLineageIngestProvider
       downstream: deployment-specific OpenLineage provider
  -> runtime maps provider result to 201

That keeps this PR concrete to OpenLineage, gives downstreams a replacement point for OL ingest behavior, and avoids prematurely defining the general Polaris lineage model/query/persistence semantics.
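
To make the shape concrete, here is a minimal sketch of such a seam under stated assumptions. Only the names `OpenLineageIngestProvider`, `OpenLineageIngestRequest`, `OpenLineageIngestResult`, and `NoOpOpenLineageIngestProvider` come from this comment; the request fields, the result values, and the enclosing `ProviderSeamSketch` class are hypothetical and would need to be settled in review.

```java
// Hypothetical sketch of the provider seam; field names and result values are assumptions.
final class ProviderSeamSketch {
  /** Provider-facing request: carries no JAX-RS or SecurityContext types. */
  record OpenLineageIngestRequest(String realmId, String principalName, Object event) {}

  /** Provider outcome, expressed independently of HTTP status codes. */
  enum OpenLineageIngestResult { ACCEPTED, REJECTED, UNAVAILABLE }

  /** The OpenLineage-specific replacement point that downstreams would implement. */
  interface OpenLineageIngestProvider {
    OpenLineageIngestResult ingest(OpenLineageIngestRequest request);
  }

  /** Default for route bring-up: accepts every event and does nothing with it. */
  static final class NoOpOpenLineageIngestProvider implements OpenLineageIngestProvider {
    @Override
    public OpenLineageIngestResult ingest(OpenLineageIngestRequest request) {
      return OpenLineageIngestResult.ACCEPTED;
    }
  }

  public static void main(String[] args) {
    OpenLineageIngestProvider provider = new NoOpOpenLineageIngestProvider();
    OpenLineageIngestRequest request =
        new OpenLineageIngestRequest("realm-a", "etl-service", "parsed event placeholder");
    System.out.println(provider.ingest(request));
  }
}
```

How the runtime picks an implementation (CDI, config, identifier, priority) is deliberately left out of the sketch, since that is runtime wiring rather than part of the provider contract.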

Comment on lines +26 to +28
* Service interface implemented by the runtime to handle OpenLineage ingest. Mirrors the pattern
* used by other Polaris API modules where the JAX-RS resource sits in the API module and
* delegates to a CDI-scoped service implementation in {@code polaris-runtime-service}.

This Javadoc usefully frames the type as the runtime service interface behind the JAX-RS resource. I think that distinction is important to preserve.

My concern is not that this PR claims this is the lineage SPI; it does not. My concern is that this is currently the only seam behind the OpenLineage route. Since the method returns JAX-RS Response and accepts request/security context, I'd prefer to keep this as API/runtime delegation and introduce a separate OpenLineage-specific provider seam behind it for follow-up implementations.

* @return the JAX-RS response. OpenLineage clients expect {@code 201 Created} with no body on
* success.
*/
Response sendLineageEvent(

Could the replacement point behind this service use OpenLineage-specific provider request/result types rather than JAX-RS/runtime types?

For example:

OpenLineageIngestProvider.ingest(OpenLineageIngestRequest) -> OpenLineageIngestResult

This would let this API service stay responsible for mapping HTTP/runtime context to a provider call and mapping the provider result back to Response, while giving downstream deployments a provider contract they can implement without patching the JAX-RS resource or depending on Response/SecurityContext as the extension contract.

description:
OAuth2 client-credentials flow against the Polaris token endpoint.
The same client-id/secret used for catalog access is used here; only
an additional LINEAGE_INGEST privilege grant is required.

This seems ahead of the implementation in this PR.

The spec says the same client credentials need an additional LINEAGE_INGEST privilege, but the resource currently uses @RolesAllowed("**"). If privilege enforcement is not part of this PR, can we either remove this as current-contract language or mark it clearly as future work?

* https://openlineage.io/spec/2-0-2/OpenLineage.json#/$defs/RunEvent}. The fragment name is the
* stable discriminator across spec versions.
*
* <p>If {@code schemaURL} is missing or unrecognized, the body is parsed as a {@code RunEvent} —

The Marquez-compatible fallback is reasonable as an OpenLineage ingress behavior, iiuc.

I'd just like to keep this scoped to the OL adapter layer. The Polaris provider seam behind the adapter should receive a Polaris-owned request/result shape after this compatibility decision has been made, rather than requiring provider implementations to understand OL schemaURL fallback behavior.

* the OpenAPI generator's Java template does not faithfully translate the spec's {@code oneOf}
* over event variants.
*/
@Path("/api/v1/lineage")

I'm worried this makes a one-way-door API decision.

/api/v1/lineage reads like the generic Polaris lineage API, but this PR is specifically adding an OpenLineage-compatible ingest endpoint. Once this public route exists, it becomes hard to later introduce a Polaris-native lineage API or another lineage format without either overloading this path or adding a second, awkward namespace.

From recent community discussions, Polaris is trying to be a platform with replaceable capabilities, not just an opinionated OpenLineage receiver. If the first public lineage route is generic but semantically OpenLineage-specific, OpenLineage becomes the default interpretation of "Polaris lineage" at the API boundary. That is a hard thing to unwind later because REST paths are external contracts.

Could we either make the OL specificity visible in the route/API shape, or at least make it very explicit in naming/docs that this is the OpenLineage ingress endpoint and not the generic Polaris lineage API?

That keeps room for a future lineage surface, if one is ever needed, without making OpenLineage the default meaning of "Polaris lineage" at the API boundary.

@Timed("polaris.OpenLineageApi.sendLineageEvent")
@Timeout
public Response sendLineageEvent(
@NotNull @Valid PolarisLineageEvent event,

Could we avoid making the JAX-RS method's parsed body type the object that follow-up implementations build around?

Right now the first object passed past the REST resource is PolarisLineageEvent, which wraps the current OpenLineage Java event types. That is reasonable inside an OpenLineage parser/adapter, but I'd prefer the replacement seam to be an explicitly OpenLineage ingest request/result contract rather than this runtime method signature.

The first object passed across the runtime boundary tends to become the object that follow-up persistence, forwarding, and deployment-specific implementations accept in their APIs and tests. If that object is the JAX-RS method parameter, then the extension point is shaped by the resource binding and the current OpenLineage Java model, rather than by a small provider contract that Polaris controls. That makes it harder to evolve the HTTP adapter independently from provider implementations.

Concretely, could the runtime path look more like:

raw OpenLineage request
  -> parse/validate as OpenLineage
  -> build OpenLineageIngestRequest
  -> call OpenLineageIngestProvider

rather than having future provider implementations build directly around the JAX-RS method parameter?
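
As a rough illustration of that path, the sketch below treats the request body as an already-parsed JSON map and stops at the provider call. Everything in it is an assumption for discussion: the `IngestPipelineSketch` class, the `eventKind`/`payload` fields, and the helper names `toIngestRequest` and `handle` do not exist in the PR, and real validation would be richer than the schemaURL check shown.

```java
// Hypothetical sketch: the adapter decides the event kind, the provider only sees the result.
final class IngestPipelineSketch {
  /** Polaris-owned request handed to providers; the event kind is already decided. */
  record OpenLineageIngestRequest(String eventKind, java.util.Map<String, Object> payload) {}

  enum OpenLineageIngestResult { ACCEPTED, REJECTED }

  interface OpenLineageIngestProvider {
    OpenLineageIngestResult ingest(OpenLineageIngestRequest request);
  }

  /** Adapter step: applies the schemaURL fallback, then builds the provider request. */
  static OpenLineageIngestRequest toIngestRequest(java.util.Map<String, Object> body) {
    String kind = "RunEvent"; // fallback for a missing or unrecognized schemaURL
    if (body.get("schemaURL") instanceof String url) {
      String name = url.substring(url.lastIndexOf('/') + 1);
      if (name.equals("JobEvent") || name.equals("DatasetEvent")) {
        kind = name;
      }
    }
    // Defensive copy so later changes to the caller's map do not leak into the request.
    java.util.Map<String, Object> copy = new java.util.LinkedHashMap<>(body);
    return new OpenLineageIngestRequest(kind, java.util.Collections.unmodifiableMap(copy));
  }

  /** Runtime path: parsed body in, provider result out; no HTTP types cross this line. */
  static OpenLineageIngestResult handle(
      java.util.Map<String, Object> body, OpenLineageIngestProvider provider) {
    return provider.ingest(toIngestRequest(body));
  }
}
```

With this split, a provider implementation can be tested by constructing an `OpenLineageIngestRequest` directly, without standing up the JAX-RS resource.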

* land in follow-up PRs as described in the Polaris OpenLineage proposal.
*/
@RequestScoped
public class OpenLineageAdapter implements PolarisOpenLineageApiService {

The no-op scope is clear here, which I like.

Can we make the no-op implementation use the same replacement point that future real implementations will use?

Right now the no-op behavior lives directly in this runtime service implementation. That works for route bring-up, but it does not show where a future store/forward/custom OpenLineage implementation should plug in without modifying this runtime service.

Early no-op implementations often become the extension point by inertia. If follow-up PRs add persistence or forwarding directly to this runtime service, then replacing lineage behavior means replacing runtime wiring rather than implementing a clean provider contract. That is exactly the kind of boundary ambiguity the SPI work is trying to avoid.

For the initial PR, I think it would be enough to add a small OpenLineageIngestProvider contract plus a default NoOpOpenLineageIngestProvider, and have this runtime adapter call it. The default provider can still be selected/wired by CDI if that is the runtime mechanism; the key is that CDI selection remains runtime wiring, not the provider contract itself.

@Override
public Response sendLineageEvent(
PolarisLineageEvent event, RealmContext realmContext, SecurityContext securityContext) {
return Response.status(Response.Status.CREATED).build();

Could this 201 come from a provider result instead of being the whole implementation?

For example, even if the only provider in PR1 is no-op:

OpenLineageIngestResult result = provider.ingest(request);
return toResponse(result);

and the no-op provider always returns accepted.

The HTTP status is runtime behavior, while accept/reject/unavailable is provider behavior. Keeping those separate lets future implementations change lineage handling without owning JAX-RS response construction. It also gives us a place to define provider semantics before persistence/query/forwarding land.

That small indirection would make the intended OpenLineage replacement point concrete without adding persistence or forwarding yet. Future implementations would change provider behavior, while this runtime class would stay responsible for HTTP response mapping.
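
A possible shape for that mapping, sketched with a plain status code instead of a JAX-RS `Response` so it stays self-contained. The result values, the specific codes chosen for rejected and unavailable (400 and 503), and the names `ResultMappingSketch` and `toStatusCode` are assumptions; only the 201 for an accepted event reflects what this PR returns today.

```java
// Hypothetical sketch: the runtime owns HTTP status selection, the provider owns the outcome.
final class ResultMappingSketch {
  enum OpenLineageIngestResult { ACCEPTED, REJECTED, UNAVAILABLE }

  /** Translates a provider outcome into the HTTP status the resource would return. */
  static int toStatusCode(OpenLineageIngestResult result) {
    switch (result) {
      case ACCEPTED:
        return 201; // what OpenLineage clients expect on success
      case REJECTED:
        return 400; // assumed mapping for an event the provider refuses
      case UNAVAILABLE:
        return 503; // assumed mapping for a backend that cannot take events right now
      default:
        throw new IllegalArgumentException("unmapped result: " + result);
    }
  }

  public static void main(String[] args) {
    System.out.println(toStatusCode(OpenLineageIngestResult.ACCEPTED));
  }
}
```

In the real adapter the same switch would build a `Response` rather than return an integer, and the no-op provider would always take the accepted branch.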
