Feature/connector azure data explorer#10986
Conversation
…er, and exceptions
…ocumentation for Source and Sink
DanielLeens
left a comment
There was a problem hiding this comment.
Thanks for the contribution. I reviewed the current connector from the real source/sink call paths rather than only checking factory wiring.
What this PR solves
- User pain: SeaTunnel does not currently ship an Azure Data Explorer source/sink connector.
- Fix approach: add a new bounded source that runs a KQL query and a sink that ingests CSV batches into ADX.
- One-line summary: the feature direction is good, but the current implementation still has correctness and packaging blockers on the main runtime path.
Runtime path I checked
bounded source
-> AzureDataExplorerSplitEnumerator.run()
-> create single split `adx-query`
-> AzureDataExplorerSourceReader.pollNext()
-> splitQueue.poll()
-> executeQuery()
-> emit all rows in one pass
-> snapshotState()
-> only persists remaining splitQueue
sink path
-> AzureDataExplorerSinkWriter.write()
-> serializer.toCsvLine(row)
-> buffer rows
-> flush()
-> concatenate CSV lines
-> ingestFromStream(...)
Findings
Problem 1: the bounded source drops its only split before the query finishes, so a failover can lose the unread part of the result set
- Location:
seatunnel-connectors-v2/connector-azuredataexplorer/.../AzureDataExplorerSourceReader.java:71-77seatunnel-connectors-v2/connector-azuredataexplorer/.../AzureDataExplorerSourceReader.java:179-186
- Severity: High
- Raised before by others: No
- Why this is a real blocker:
pollNext()removes the only split fromsplitQueuebeforeexecuteQuery()starts reading.snapshotState()then persists only the remaining queue. If the reader fails after partially emitting rows but before a successful checkpoint finishes, restore sees an empty queue and never re-runs the query. That is a main-path bounded-source correctness bug. - Better fix:
keep the split in state until the full query has been consumed and checkpointed, or add explicit progress/restart semantics instead of dropping the in-flight split immediately.
Problem 2: the connector is not registered in plugin-mapping.properties, so the dist layout still cannot discover its isolated dependency directory
- Location:
- missing update in
plugin-mapping.properties - connector added in
seatunnel-dist/pom.xml:381andseatunnel-connectors-v2/pom.xml:95
- missing update in
- Severity: High
- Raised before by others: No
- Why this is a real blocker:
SeaTunnel connector loading relies onplugin-mapping.propertiesto resolve the connector subdirectory name under${SEATUNNEL_HOME}/connectors. The PR adds the module and dist dependency, but not the plugin mapping entry, so the packaging story is still incomplete. - Better fix:
add the new connector mapping and verify the final dist layout can actually load the connector.
Problem 3: CSV quoting is currently incorrect for embedded double quotes
- Location:
seatunnel-connectors-v2/connector-azuredataexplorer/.../AzureDataExplorerRowSerializer.java:71-78 - Severity: High
- Raised before by others: No
- Why this is a real blocker:
RFC-4180 escaping should double embedded quotes ("->""). The current code usess.replace("\"", "\"\"\""), which turns one quote into three quotes. That corrupts string payloads containing"before they reach ADX. - Better fix:
replace embedded quotes with exactly two quotes and keep the rest of the RFC-4180 quoting logic unchanged.
Problem 4: the new Java sources are missing the ASF license header
- Location: for example
AzureDataExplorerSourceReader.java,AzureDataExplorerSplitEnumerator.java,AzureDataExplorerSinkWriter.java,AzureDataExplorerSourceFactory.java,AzureDataExplorerSinkFactory.java - Severity: Medium
- Raised before by others: No
- Why this matters:
new source files in SeaTunnel must carry the ASF header. This is part of the Apache contribution baseline, not optional cleanup. - Better fix:
add the standard ASF header to all new source files before merge.
Tests / CI
- I see factory/config tests, but there is still no runtime regression coverage for the failover state issue above.
- The visible check metadata does not give me a meaningful failing runtime CI signal here; the blockers above are already source-level blockers.
Conclusion: can merge after fixes
- Blocking items
- Problem 1
- Problem 2
- Problem 3
- Suggested non-blocking follow-up
- Problem 4
|
Please also enable CI check following by the instruction https://github.com/apache/seatunnel/pull/10986/checks?check_run_id=78684187561
|
|
Thanks for flagging the CI instruction here. On Daniel's side, the current head Once there is a follow-up commit that addresses those runtime / packaging items and the required checks are enabled, please ping me again and I will re-review the new head from scratch. |

Purpose of this pull request
Does this PR introduce any user-facing change?
How was this patch tested?
Check list
New License Guide
incompatible-changes.mdto describe the incompatibility caused by this PR.