Skip to content

Feature/connector azure data explorer#10986

Draft
KaiDevrim wants to merge 5 commits into
apache:devfrom
KaiDevrim:feature/connector-azure-data-explorer
Draft

Feature/connector azure data explorer#10986
KaiDevrim wants to merge 5 commits into
apache:devfrom
KaiDevrim:feature/connector-azure-data-explorer

Conversation

@KaiDevrim
Copy link
Copy Markdown

Purpose of this pull request

Does this PR introduce any user-facing change?

How was this patch tested?

Check list

@github-actions github-actions Bot added document core SeaTunnel core module connectors-v2 labels May 30, 2026
Copy link
Copy Markdown
Contributor

@DanielLeens DanielLeens left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the contribution. I reviewed the current connector from the real source/sink call paths rather than only checking factory wiring.

What this PR solves

  • User pain: SeaTunnel does not currently ship an Azure Data Explorer source/sink connector.
  • Fix approach: add a new bounded source that runs a KQL query and a sink that ingests CSV batches into ADX.
  • One-line summary: the feature direction is good, but the current implementation still has correctness and packaging blockers on the main runtime path.

Runtime path I checked

bounded source
  -> AzureDataExplorerSplitEnumerator.run()
      -> create single split `adx-query`
  -> AzureDataExplorerSourceReader.pollNext()
      -> splitQueue.poll()
      -> executeQuery()
      -> emit all rows in one pass
  -> snapshotState()
      -> only persists remaining splitQueue

sink path
  -> AzureDataExplorerSinkWriter.write()
      -> serializer.toCsvLine(row)
      -> buffer rows
  -> flush()
      -> concatenate CSV lines
      -> ingestFromStream(...)

Findings

Problem 1: the bounded source drops its only split before the query finishes, so a failover can lose the unread part of the result set

  • Location:
    • seatunnel-connectors-v2/connector-azuredataexplorer/.../AzureDataExplorerSourceReader.java:71-77
    • seatunnel-connectors-v2/connector-azuredataexplorer/.../AzureDataExplorerSourceReader.java:179-186
  • Severity: High
  • Raised before by others: No
  • Why this is a real blocker:
    pollNext() removes the only split from splitQueue before executeQuery() starts reading. snapshotState() then persists only the remaining queue. If the reader fails after partially emitting rows but before a successful checkpoint finishes, restore sees an empty queue and never re-runs the query. That is a main-path bounded-source correctness bug.
  • Better fix:
    keep the split in state until the full query has been consumed and checkpointed, or add explicit progress/restart semantics instead of dropping the in-flight split immediately.

Problem 2: the connector is not registered in plugin-mapping.properties, so the dist layout still cannot discover its isolated dependency directory

  • Location:
    • missing update in plugin-mapping.properties
    • connector added in seatunnel-dist/pom.xml:381 and seatunnel-connectors-v2/pom.xml:95
  • Severity: High
  • Raised before by others: No
  • Why this is a real blocker:
    SeaTunnel connector loading relies on plugin-mapping.properties to resolve the connector subdirectory name under ${SEATUNNEL_HOME}/connectors. The PR adds the module and dist dependency, but not the plugin mapping entry, so the packaging story is still incomplete.
  • Better fix:
    add the new connector mapping and verify the final dist layout can actually load the connector.

Problem 3: CSV quoting is currently incorrect for embedded double quotes

  • Location: seatunnel-connectors-v2/connector-azuredataexplorer/.../AzureDataExplorerRowSerializer.java:71-78
  • Severity: High
  • Raised before by others: No
  • Why this is a real blocker:
    RFC-4180 escaping should double embedded quotes (" -> ""). The current code uses s.replace("\"", "\"\"\""), which turns one quote into three quotes. That corrupts string payloads containing " before they reach ADX.
  • Better fix:
    replace embedded quotes with exactly two quotes and keep the rest of the RFC-4180 quoting logic unchanged.

Problem 4: the new Java sources are missing the ASF license header

  • Location: for example AzureDataExplorerSourceReader.java, AzureDataExplorerSplitEnumerator.java, AzureDataExplorerSinkWriter.java, AzureDataExplorerSourceFactory.java, AzureDataExplorerSinkFactory.java
  • Severity: Medium
  • Raised before by others: No
  • Why this matters:
    new source files in SeaTunnel must carry the ASF header. This is part of the Apache contribution baseline, not optional cleanup.
  • Better fix:
    add the standard ASF header to all new source files before merge.

Tests / CI

  • I see factory/config tests, but there is still no runtime regression coverage for the failover state issue above.
  • The visible check metadata does not give me a meaningful failing runtime CI signal here; the blockers above are already source-level blockers.

Conclusion: can merge after fixes

  1. Blocking items
  • Problem 1
  • Problem 2
  • Problem 3
  1. Suggested non-blocking follow-up
  • Problem 4

@davidzollo davidzollo added the First-time contributor First-time contributor label May 31, 2026
@davidzollo
Copy link
Copy Markdown
Contributor

Please also enable CI check following by the instruction https://github.com/apache/seatunnel/pull/10986/checks?check_run_id=78684187561

image

@DanielLeens
Copy link
Copy Markdown
Contributor

Thanks for flagging the CI instruction here.

On Daniel's side, the current head 1d1397232afc2f41a40af6c5b347c633644f21af is still the exact same head I already reviewed, so I am keeping this as a reply-only follow-up rather than starting another full re-review. The CI enablement is worth doing, but the current source-level blockers from the latest Daniel review still stand on this unchanged revision as well.

Once there is a follow-up commit that addresses those runtime / packaging items and the required checks are enabled, please ping me again and I will re-review the new head from scratch.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

connectors-v2 core SeaTunnel core module document First-time contributor First-time contributor

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants