[Fix][Connector-V2][MySQL CDC] Use checkpoint offset for timestamp startup restore #10987

Open: goutamadwant wants to merge 1 commit into apache:dev from goutamadwant:fix/seatunnel-10899-timestamp-restore-offset

@goutamadwant goutamadwant commented May 31, 2026

Purpose of this pull request

Fixes #10899.

For startup.mode = timestamp, the initial incremental split stores a timestamp-only binlog offset. That offset should be resolved to a concrete MySQL binlog position before the reader starts.

After a checkpoint restore, the split already contains the concrete binlog offset saved by the reader. Reusing the configured timestamp at that point can move the recovery anchor back to the original startup timestamp, so this change uses the restored offset directly and skips the timestamp event filter for restored binlog offsets.
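
The decision described above can be sketched roughly as follows. This is a minimal illustration and not the SeaTunnel implementation: `StartupOffsetSketch`, `Offset`, `TimestampResolver`, and `initialOffset` are hypothetical names, and the resolver stands in for whatever lookup maps a timestamp to a binlog position.

```java
/** Hypothetical sketch; class and method names do not mirror the SeaTunnel sources. */
final class StartupOffsetSketch {

    /** Simplified offset: a timestamp-only placeholder or a concrete file/position. */
    static final class Offset {
        final String file;        // null for a timestamp-only placeholder
        final long position;
        final long timestampSec;  // 0 when no timestamp is recorded

        private Offset(String file, long position, long timestampSec) {
            this.file = file;
            this.position = position;
            this.timestampSec = timestampSec;
        }

        static Offset ofTimestamp(long timestampSec) {
            return new Offset(null, 0L, timestampSec);
        }

        static Offset ofPosition(String file, long position) {
            return new Offset(file, position, 0L);
        }

        boolean isTimestampOnly() {
            return file == null && timestampSec > 0;
        }

        @Override
        public String toString() {
            return isTimestampOnly() ? "timestamp=" + timestampSec : file + ":" + position;
        }
    }

    /** Stand-in (assumption) for the lookup that maps a timestamp to a binlog position. */
    interface TimestampResolver {
        Offset resolve(long timestampSec);
    }

    /** Resolve only a bootstrap placeholder; a restored offset is returned untouched. */
    static Offset initialOffset(Offset splitStartupOffset, TimestampResolver resolver) {
        if (splitStartupOffset.isTimestampOnly()) {
            return resolver.resolve(splitStartupOffset.timestampSec);
        }
        return splitStartupOffset;
    }

    public static void main(String[] args) {
        TimestampResolver resolver = ts -> Offset.ofPosition("mysql-bin.000003", 4L);
        // Fresh start: the placeholder is resolved to a concrete position.
        System.out.println(initialOffset(Offset.ofTimestamp(1700000000L), resolver));
        // Restore: the checkpointed position is reused and the resolver is not called.
        System.out.println(initialOffset(Offset.ofPosition("mysql-bin.000007", 1542L), resolver));
    }
}
```

In this sketch the restore case never touches the configured timestamp, which is the behavior the description asks for.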

Does this PR introduce any user-facing change?

Yes. MySQL CDC jobs configured with startup.mode = timestamp can recover from checkpoints using the saved binlog offset, instead of resolving the original startup timestamp again.

How was this patch tested?

Added unit coverage for the restore decisions and checkpoint-state path:

  • MySqlSourceFetchTaskContext only resolves the configured timestamp for timestamp-only bootstrap offsets.
  • MySqlBinlogFetchTask only applies the timestamp filter for timestamp-only bootstrap offsets.
  • IncrementalSplitState.toSourceSplit() preserves a checkpointed binlog offset after a timestamp bootstrap, so restore does not fall back to the original timestamp.
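
The third case can be pictured with a small model of a split and its mutable state. The classes below are hypothetical stand-ins written for illustration; the real `IncrementalSplitState` has more fields and may track progress differently. Offsets are plain strings here to keep the sketch short.

```java
/** Hypothetical model of a split and its checkpointable state; not the SeaTunnel classes. */
final class SplitStateSketch {

    /** Immutable split as it would be written into a checkpoint. */
    static final class Split {
        final String splitId;
        final String startupOffset;

        Split(String splitId, String startupOffset) {
            this.splitId = splitId;
            this.startupOffset = startupOffset;
        }
    }

    /** Mutable state the reader updates while it consumes the binlog. */
    static final class SplitState {
        private final Split split;
        private String currentOffset; // null until the reader reports progress

        SplitState(Split split) {
            this.split = split;
        }

        void setCurrentOffset(String offset) {
            this.currentOffset = offset;
        }

        /** The checkpointed split carries the latest known offset, not the bootstrap one. */
        Split toSourceSplit() {
            String offset = currentOffset != null ? currentOffset : split.startupOffset;
            return new Split(split.splitId, offset);
        }
    }

    public static void main(String[] args) {
        SplitState state = new SplitState(new Split("incremental-0", "timestamp=1700000000"));
        state.setCurrentOffset("mysql-bin.000007:1542");
        System.out.println(state.toSourceSplit().startupOffset);
    }
}
```

A test along these lines checks that the split written at checkpoint time no longer carries the timestamp placeholder once the reader has made progress.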

Verified with:

JAVA_HOME=$(/usr/libexec/java_home -v 11) PATH="$JAVA_HOME/bin:$PATH" ./mvnw -pl seatunnel-connectors-v2/connector-cdc/connector-cdc-mysql -DfailIfNoTests=false package

Contributor

@DanielLeens DanielLeens left a comment

Thanks for working on this. I reviewed the full current head locally and traced the restore path from split startup offset selection into the binlog fetch task.

What this PR solves

  • User pain: with startup.mode = TIMESTAMP, a job restored from checkpoint can incorrectly re-resolve the startup position from the configured timestamp instead of honoring the restored binlog offset.
  • Fix approach: distinguish a pure timestamp bootstrap offset from a real restored binlog offset, and only apply timestamp resolution/filtering to the former.
  • One-line summary: the source-level fix looks correct to me on the latest head, and I did not find a new code blocker.

Runtime path I checked

startup / restore
  -> MySqlSourceFetchTaskContext.getInitOffset() [302-315]
      -> only resolve timestamp when the split startup offset is still a pure timestamp placeholder
      -> otherwise reuse the restored split startup offset directly

binlog read
  -> MySqlBinlogFetchTask.execute() [73-101]
      -> shouldFilterByTimestamp(...) [143-147]
      -> only uses TimestampFilterMySqlStreamingChangeEventSource for the timestamp-bootstrap case
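
A rough sketch of the filtering half of that path is below. It assumes the timestamp filter drops events older than the configured startup timestamp, which this review does not spell out; `TimestampFilterSketch`, `Event`, and `emit` are hypothetical names and not part of the connector.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of conditional timestamp filtering; not the connector's code. */
final class TimestampFilterSketch {

    static final class Event {
        final long timestampMs;
        final String payload;

        Event(long timestampMs, String payload) {
            this.timestampMs = timestampMs;
            this.payload = payload;
        }
    }

    /**
     * Drops events older than the startup timestamp, but only for a timestamp bootstrap.
     * For a restored offset every event is passed through unchanged.
     */
    static List<Event> emit(List<Event> events, boolean timestampBootstrap, long startupTimestampMs) {
        List<Event> out = new ArrayList<>();
        for (Event event : events) {
            if (timestampBootstrap && event.timestampMs < startupTimestampMs) {
                continue; // assumed filter behavior: skip events before the startup timestamp
            }
            out.add(event);
        }
        return out;
    }

    public static void main(String[] args) {
        List<Event> events = new ArrayList<>();
        events.add(new Event(900L, "before"));
        events.add(new Event(1100L, "after"));
        System.out.println(emit(events, true, 1000L).size());
        System.out.println(emit(events, false, 1000L).size());
    }
}
```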

Re-review result

  • BinlogOffset.isTimestampOffset() (BinlogOffset.java:114-118) cleanly separates a timestamp-only bootstrap offset from a restored binlog/file-position offset.
  • That same guard is now used consistently in both startup-offset resolution and timestamp filtering.
  • I do not see a reopened GTID/file-position correctness issue from this change on the current head.
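
For readers who have not opened `BinlogOffset.java`, a guard of this kind could look roughly like the sketch below. The key names (`file`, `pos`, `gtids`, `ts_sec`) and the exact rule are assumptions made for illustration; the actual `isTimestampOffset()` body is not quoted in this review and may differ.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical map-based offset check; key names and rule are assumptions. */
final class OffsetKindSketch {

    static final String FILE = "file";
    static final String POS = "pos";
    static final String GTIDS = "gtids";
    static final String TS_SEC = "ts_sec";

    /** True only when a timestamp is present and no file or GTID set pins a position. */
    static boolean isTimestampOnly(Map<String, String> offset) {
        boolean hasTimestamp = isPresent(offset.get(TS_SEC));
        boolean hasFile = isPresent(offset.get(FILE));
        boolean hasGtids = isPresent(offset.get(GTIDS));
        return hasTimestamp && !hasFile && !hasGtids;
    }

    private static boolean isPresent(String value) {
        return value != null && !value.isEmpty();
    }

    public static void main(String[] args) {
        Map<String, String> bootstrap = new HashMap<>();
        bootstrap.put(TS_SEC, "1700000000");

        Map<String, String> restored = new HashMap<>();
        restored.put(FILE, "mysql-bin.000007");
        restored.put(POS, "1542");
        restored.put(TS_SEC, "1700000450"); // a concrete offset may still record a timestamp

        System.out.println(isTimestampOnly(bootstrap));
        System.out.println(isTimestampOnly(restored));
    }
}
```

The point of checking for a file name or GTID set, rather than only for a timestamp, is that a concrete offset can also carry a timestamp field and must still be treated as restored.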

Tests / CI

  • The new unit coverage around timestamp startup offset handling looks good to me.
  • I did not see a flaky-test pattern in the added tests.
  • The current GitHub Build failure does not look like a source regression from this diff. The check-run output is still the fork workflow detection failure, so it is not giving us a usable validation signal yet.

Conclusion: can merge after fixes

  1. Blocking items
  • No new source-level blocker from my side on the latest head.
  • Please get the current GitHub Build into a valid state first (enable/re-run the fork workflow or refresh from latest dev and push again) so the branch has a real CI signal before merge.
  2. Suggested non-blocking follow-up
  • None from my side on the code path itself.

From the code-review side this looks good now. The remaining blocker is the missing CI signal, not a logic bug in the latest restore-path fix.

@goutamadwant goutamadwant force-pushed the fix/seatunnel-10899-timestamp-restore-offset branch 3 times, most recently from 1da685e to 2933563, on June 2, 2026 03:34
@goutamadwant
Author

Thanks for your review, @DanielLeens. I synced the branch with the latest dev and pushed again. The Build check is running now.

@goutamadwant goutamadwant force-pushed the fix/seatunnel-10899-timestamp-restore-offset branch from 2933563 to c2e0e0c on June 2, 2026 03:43
Contributor

@DanielLeens DanielLeens left a comment

Thanks for working on this. I reviewed the full current head locally and traced the restore path from split startup offset selection into the binlog fetch task.

What this PR solves

  • User pain: with startup.mode = TIMESTAMP, a job restored from checkpoint can incorrectly re-resolve the startup position from the configured timestamp instead of honoring the restored binlog offset.
  • Fix approach: distinguish a pure timestamp bootstrap offset from a real restored binlog offset, and only apply timestamp resolution/filtering to the former.
  • One-line summary: the source-level fix looks correct to me on the latest head, and I did not find a new code blocker.

Runtime path I checked

startup / restore
  -> MySqlSourceFetchTaskContext.getInitOffset() [302-315]
      -> only resolve timestamp when the split startup offset is still a pure timestamp placeholder
      -> otherwise reuse the restored split startup offset directly

binlog read
  -> MySqlBinlogFetchTask.execute() [73-101]
      -> shouldFilterByTimestamp(...) [143-147]
      -> only uses TimestampFilterMySqlStreamingChangeEventSource for the timestamp-bootstrap case

Re-review result

  • BinlogOffset.isTimestampOffset() (BinlogOffset.java:114-118) cleanly separates a timestamp-only bootstrap offset from a restored binlog/file-position offset.
  • That same guard is now used consistently in both startup-offset resolution and timestamp filtering.
  • I do not see a reopened GTID/file-position correctness issue from this change on the current head.

Tests / CI

  • The new unit coverage around timestamp startup offset handling looks good to me.
  • I did not see a flaky-test pattern in the added tests.
  • The current GitHub Build failure does not look like a source regression from this diff. The check-run output is still not giving a clean, usable CI signal for the latest head.

Conclusion: can merge after fixes

  1. Blocking items
  • No new source-level blocker from my side on the latest head.
  • Please get the current GitHub Build into a valid state first (enable/re-run the fork workflow or refresh from latest dev and push again) so the branch has a real CI signal before merge.
  2. Suggested non-blocking follow-up
  • None from my side on the code path itself.

From the code-review side this looks good now. The remaining blocker is the missing CI signal, not a logic bug in the latest restore-path fix.

Development

Successfully merging this pull request may close these issues.

[Bug] [MySQL CDC] MySQL cdc start by time,TIMESTAMP startup mode cannot recover from checkpoints.

2 participants