[Fix][Connector-V2][MySQL CDC] Use checkpoint offset for timestamp startup restore#10987
Conversation
DanielLeens
left a comment
There was a problem hiding this comment.
Thanks for working on this. I reviewed the full current head locally and traced the restore path from split startup offset selection into the binlog fetch task.
What this PR solves
- User pain: with
startup.mode = TIMESTAMP, a job restored from checkpoint can incorrectly re-resolve the startup position from the configured timestamp instead of honoring the restored binlog offset. - Fix approach: distinguish a pure timestamp bootstrap offset from a real restored binlog offset, and only apply timestamp resolution/filtering to the former.
- One-line summary: the source-level fix looks correct to me on the latest head, and I did not find a new code blocker.
Runtime path I checked
startup / restore
-> MySqlSourceFetchTaskContext.getInitOffset() [302-315]
-> only resolve timestamp when the split startup offset is still a pure timestamp placeholder
-> otherwise reuse the restored split startup offset directly
binlog read
-> MySqlBinlogFetchTask.execute() [73-101]
-> shouldFilterByTimestamp(...) [143-147]
-> only uses TimestampFilterMySqlStreamingChangeEventSource for the timestamp-bootstrap case
Re-review result
BinlogOffset.isTimestampOffset()(BinlogOffset.java:114-118) cleanly separates a timestamp-only bootstrap offset from a restored binlog/file-position offset.- That same guard is now used consistently in both startup-offset resolution and timestamp filtering.
- I do not see a reopened GTID/file-position correctness issue from this change on the current head.
Tests / CI
- The new unit coverage around timestamp startup offset handling looks good to me.
- I did not see a flaky-test pattern in the added tests.
- The current GitHub
Buildfailure does not look like a source regression from this diff. The check-run output is still the fork workflow detection failure, so it is not giving us a usable validation signal yet.
Conclusion: can merge after fixes
- Blocking items
- No new source-level blocker from my side on the latest head.
- Please get the current GitHub
Buildinto a valid state first (enable/re-run the fork workflow or refresh from latestdevand push again) so the branch has a real CI signal before merge.
- Suggested non-blocking follow-up
- None from my side on the code path itself.
From the code-review side this looks good now. The remaining blocker is the missing CI signal, not a logic bug in the latest restore-path fix.
1da685e to
2933563
Compare
|
Thanks for your review @DanielLeens.. I synced the branch with latest dev and pushed again. The Build check is running now. |
2933563 to
c2e0e0c
Compare
DanielLeens
left a comment
There was a problem hiding this comment.
Thanks for working on this. I reviewed the full current head locally and traced the restore path from split startup offset selection into the binlog fetch task.
What this PR solves
- User pain: with
startup.mode = TIMESTAMP, a job restored from checkpoint can incorrectly re-resolve the startup position from the configured timestamp instead of honoring the restored binlog offset. - Fix approach: distinguish a pure timestamp bootstrap offset from a real restored binlog offset, and only apply timestamp resolution/filtering to the former.
- One-line summary: the source-level fix looks correct to me on the latest head, and I did not find a new code blocker.
Runtime path I checked
startup / restore
-> MySqlSourceFetchTaskContext.getInitOffset() [302-315]
-> only resolve timestamp when the split startup offset is still a pure timestamp placeholder
-> otherwise reuse the restored split startup offset directly
binlog read
-> MySqlBinlogFetchTask.execute() [73-101]
-> shouldFilterByTimestamp(...) [143-147]
-> only uses TimestampFilterMySqlStreamingChangeEventSource for the timestamp-bootstrap case
Re-review result
BinlogOffset.isTimestampOffset()(BinlogOffset.java:114-118) cleanly separates a timestamp-only bootstrap offset from a restored binlog/file-position offset.- That same guard is now used consistently in both startup-offset resolution and timestamp filtering.
- I do not see a reopened GTID/file-position correctness issue from this change on the current head.
Tests / CI
- The new unit coverage around timestamp startup offset handling looks good to me.
- I did not see a flaky-test pattern in the added tests.
- The current GitHub
Buildfailure does not look like a source regression from this diff. The check-run output is still not giving a clean, usable CI signal for the latest head.
Conclusion: can merge after fixes
- Blocking items
- No new source-level blocker from my side on the latest head.
- Please get the current GitHub
Buildinto a valid state first (enable/re-run the fork workflow or refresh from latestdevand push again) so the branch has a real CI signal before merge.
- Suggested non-blocking follow-up
- None from my side on the code path itself.
From the code-review side this looks good now. The remaining blocker is the missing CI signal, not a logic bug in the latest restore-path fix.
Purpose of this pull request
Fixes #10899.
For
startup.mode = timestamp, the initial incremental split stores a timestamp-only binlog offset. That offset should be resolved to a concrete MySQL binlog position before the reader starts.After a checkpoint restore, the split already contains the concrete binlog offset saved by the reader. Reusing the configured timestamp at that point can move the recovery anchor back to the original startup timestamp, so this change uses the restored offset directly and skips the timestamp event filter for restored binlog offsets.
Does this PR introduce any user-facing change?
Yes. MySQL CDC jobs configured with
startup.mode = timestampcan recover from checkpoints using the saved binlog offset, instead of resolving the original startup timestamp again.How was this patch tested?
Added unit coverage for the restore decisions and checkpoint-state path:
MySqlSourceFetchTaskContextonly resolves the configured timestamp for timestamp-only bootstrap offsets.MySqlBinlogFetchTaskonly applies the timestamp filter for timestamp-only bootstrap offsets.IncrementalSplitState.toSourceSplit()preserves a checkpointed binlog offset after a timestamp bootstrap, so restore does not fall back to the original timestamp.Verified with:
Check list
New License Guide
incompatible-changes.mdto describe the incompatibility caused by this PR.