[Fix][E2E] Stabilize engine failover test and rebalance connector shards by DanielLeens · Pull Request #10949 · apache/seatunnel

DanielLeens · 2026-05-25T12:10:43Z

Why

This PR extracts the remaining actionable CI fixes from the previous CI follow-up line and keeps them separate from the DB2 work.

It addresses two concrete failures observed from the split CI PR line:

- engine-v2-it (11) could fail in ClusterFailureNoRestoreIT because the batch job sometimes finished before the test actually shut down the worker.
- all-connectors-it-2 (11) could lose the hosted runner heartbeat because the shard was too heavy, especially when connector-iceberg-e2e and connector-hbase-e2e ran together with the rest of part-2.

What is changed

Stabilize ClusterFailureNoRestoreIT by:
- keeping the batch source busy longer with a larger row count per parallelism
- waiting for the job to reach RUNNING
- waiting for observable output progress before shutting down the worker
Rebalance connector CI shards by:
- removing connector-iceberg-e2e and connector-hbase-e2e from all-connectors-it-2
- adding a new dedicated all-connectors-it-8 job for those two heavier suites
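
The ordering behind the ClusterFailureNoRestoreIT change (reach RUNNING, observe output progress, and only then shut down the worker) can be sketched with a small polling helper. This is a minimal, self-contained sketch and not the code in the test itself: `awaitTrue`, `shutDownWorkerWhileJobIsBusy`, the status supplier, and the row counter are hypothetical stand-ins for whatever the real test queries.

```java
import java.time.Duration;
import java.util.function.BooleanSupplier;
import java.util.function.LongSupplier;
import java.util.function.Supplier;

public class FailoverOrderingSketch {

    /** Polls the condition until it holds or the timeout elapses; returns whether it held. */
    static boolean awaitTrue(BooleanSupplier condition, Duration timeout, Duration pollInterval) {
        long deadline = System.nanoTime() + timeout.toNanos();
        while (true) {
            if (condition.getAsBoolean()) {
                return true;
            }
            if (System.nanoTime() - deadline >= 0) {
                return false;
            }
            try {
                Thread.sleep(pollInterval.toMillis());
            } catch (InterruptedException e) {
                // Restore the interrupt flag and give up waiting.
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }

    /**
     * Hypothetical helper: shuts the worker down only after the job reports RUNNING
     * and at least one row of output has been observed. Returns false (without
     * touching the worker) if either precondition is not met within the timeout.
     */
    static boolean shutDownWorkerWhileJobIsBusy(
            Supplier<String> jobStatus,   // assumed way to read the job status
            LongSupplier rowsWritten,     // assumed way to observe sink progress
            Runnable shutDownWorker,      // assumed hook that stops one worker node
            Duration timeout) {
        Duration poll = Duration.ofMillis(10);
        if (!awaitTrue(() -> "RUNNING".equals(jobStatus.get()), timeout, poll)) {
            return false;
        }
        if (!awaitTrue(() -> rowsWritten.getAsLong() > 0, timeout, poll)) {
            return false;
        }
        shutDownWorker.run();
        return true;
    }
}
```

Returning `false` instead of shutting down anyway keeps the failure mode explicit: if the job never becomes busy, the test can fail on the precondition rather than on a misleading final job state.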

Validation

Executed in the local checkout used for this PR:

./mvnw spotless:apply -nsu -Dmaven.gitcommitid.skip=true -T 3C
git diff --check -- .github/workflows/backend.yml seatunnel-e2e/seatunnel-engine-e2e/connector-seatunnel-e2e-base/src/test/java/org/apache/seatunnel/engine/e2e/ClusterFailureNoRestoreIT.java
Recomputed the workflow split and verified the new layout:
- all-connectors-it-2: :connector-assert-e2e,:connector-file-cos-e2e,:connector-rabbitmq-e2e,:connector-easysearch-e2e,:connector-qdrant-e2e,:connector-aerospike-e2e
- all-connectors-it-8: :connector-iceberg-e2e,:connector-hbase-e2e
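
The layout check above can be expressed as a small standalone program. This is only an illustrative sketch, assuming the two module lists exactly as quoted in this description; the class and method names (`ShardLayoutCheck`, `parseModules`, `sharedModules`) are hypothetical and are not part of the workflow.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class ShardLayoutCheck {

    // Module lists copied from the layout quoted in this PR description.
    static final String SHARD_2 =
            ":connector-assert-e2e,:connector-file-cos-e2e,:connector-rabbitmq-e2e,"
                    + ":connector-easysearch-e2e,:connector-qdrant-e2e,:connector-aerospike-e2e";
    static final String SHARD_8 = ":connector-iceberg-e2e,:connector-hbase-e2e";

    /** Splits a comma-separated module list, dropping empty entries. */
    static List<String> parseModules(String csv) {
        List<String> modules = new ArrayList<>();
        for (String part : csv.split(",")) {
            String module = part.trim();
            if (!module.isEmpty()) {
                modules.add(module);
            }
        }
        return modules;
    }

    /** Returns the modules that appear in both lists, in first-list order. */
    static Set<String> sharedModules(String csvA, String csvB) {
        Set<String> shared = new LinkedHashSet<>(parseModules(csvA));
        shared.retainAll(parseModules(csvB));
        return shared;
    }

    public static void main(String[] args) {
        System.out.println("modules in both shards: " + sharedModules(SHARD_2, SHARD_8));
        System.out.println("shard 2 module count: " + parseModules(SHARD_2).size());
        System.out.println("shard 8 module count: " + parseModules(SHARD_8).size());
    }
}
```

An empty intersection means no suite runs twice, and checking that the two heavy suites are absent from shard 2 confirms the rebalance took effect.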

Not fully revalidated in this run

I also attempted a focused local runtime validation for ClusterFailureNoRestoreIT, but the module is currently blocked by an unrelated upstream compile issue in LocalModeIT (SeaTunnelClient#getHealthMetrics(String) is missing from the current API surface in this branch line). That issue is outside the scope of this PR, so this PR keeps the fix focused on the two current actionable CI failures only.

… cleanup

- backend.yml: add first-position removal patterns for connector-iceberg-e2e and connector-hbase-e2e so they are correctly stripped even when sorted first in the shard module list (previous //,module/ pattern silently missed first-position modules that have no leading comma)
- KafkaIT.java: replace hardcoded jobId "18696753645413" in testRestoreKafkaToKafkaExactlyOnceOnStreaming with dynamic nanoTime value, consistent with how topic/group names are already dynamized
- KafkaIT.java: track dynamically-created topics in a CopyOnWriteArrayList and delete them in tearDown() to prevent Kafka broker metadata bloat from accumulated retention.ms=-1 topics across CI runs

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
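
The first-position problem described in the backend.yml note can be reproduced outside the workflow. The workflow itself uses shell pattern substitution; the following is a Java analog written only to illustrate the reasoning, and both method names are hypothetical. Replacing `,module` works for any module that follows another one, but leaves a module at the head of the list untouched because it has no leading comma.

```java
import java.util.Arrays;
import java.util.stream.Collectors;

public class ModuleListRemoval {

    /**
     * Analog of a ",module"-only substitution: removes the module only where
     * it is preceded by a comma, so a first-position module is left in place.
     */
    static String removeWithLeadingCommaOnly(String modules, String module) {
        return modules.replace("," + module, "");
    }

    /**
     * Position-independent removal: split the list, drop exact matches,
     * and join the remaining entries back together.
     */
    static String removeExact(String modules, String module) {
        return Arrays.stream(modules.split(","))
                .filter(m -> !m.isEmpty() && !m.equals(module))
                .collect(Collectors.joining(","));
    }

    public static void main(String[] args) {
        String modules = ":connector-iceberg-e2e,:connector-assert-e2e,:connector-hbase-e2e";
        // The comma-prefixed pattern strips hbase (middle/last) but not iceberg (first).
        System.out.println(removeWithLeadingCommaOnly(
                removeWithLeadingCommaOnly(modules, ":connector-iceberg-e2e"),
                ":connector-hbase-e2e"));
        // Exact-element removal strips both regardless of position.
        System.out.println(removeExact(
                removeExact(modules, ":connector-iceberg-e2e"), ":connector-hbase-e2e"));
    }
}
```

Splitting and filtering is one way to avoid the position dependence; the commit instead adds a dedicated first-position pattern, which achieves the same result within shell substitution.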

github-actions Bot added CI&CD e2e labels May 25, 2026

davidzollo marked this pull request as draft May 25, 2026 13:01

davidzollo marked this pull request as ready for review May 30, 2026 05:29

DanielLeens and others added 3 commits June 1, 2026 12:27

[Fix][E2E] Stabilize engine failover test and rebalance connector shards

06e4dcd

[Fix][E2E] Stabilize Kafka and engine CI flakes

751eb48

DanielLeens force-pushed the david_fix_pr10925_ci_local branch from 065b2b3 to 5202424 June 1, 2026 04:34

DanielLeens wants to merge 3 commits into apache:dev from DanielLeens:david_fix_pr10925_ci_local
