dxf: resolve task store before task execution#68824
Conversation
|
@D3Hunter I've received your pull request and will start the review. I'll conduct a thorough review covering code quality, potential issues, and implementation details. ⏳ This process typically takes 10-30 minutes depending on the complexity of the changes. ℹ️ Learn more details on Pantheon AI. |
|
Hi @D3Hunter. Thanks for your PR. PRs from untrusted users cannot be marked as trusted with I understand the commands that are listed here. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughCentralize task-store resolution in Manager: rename Param.Store→Param.TaskStore, have Manager resolve per-task TaskStore by keyspace (WithNewSession/GetKSStore), and simplify schedulers/executors to use the provided TaskStore without per-executor keyspace switching. ChangesTask Store Resolution Refactoring
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Suggested labels
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## master #68824 +/- ##
================================================
+ Coverage 76.3105% 76.5485% +0.2380%
================================================
Files 2041 2054 +13
Lines 563465 582668 +19203
================================================
+ Hits 429983 446024 +16041
- Misses 132566 133914 +1348
- Partials 916 2730 +1814
Flags with carried forward coverage won't be shown. Click here to find out more.
🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
🧹 Nitpick comments (1)
pkg/ddl/backfilling_dist_executor.go (1)
137-143: ⚡ Quick winWrap the keyspace session-pool failure with task context.
return nil, errdrops which task/keyspace failed during cross-keyspace executor setup, which will make production triage harder.Proposed fix
if ddlObj.store.GetKeyspace() != taskKS { if err := s.GetTaskTable().WithNewSession(func(se sessionctx.Context) error { svr := se.GetSQLServer() sp, err := svr.GetKSSessPool(taskKS) sessPool = sess.NewSessionPool(sp) return err }); err != nil { - return nil, err + return nil, errors.Annotatef(err, "init backfill session pool for task %d in keyspace %v", s.task.ID, taskKS) } }As per coding guidelines, "Go code: Keep error handling actionable and contextual; avoid silently swallowing errors".
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@pkg/ddl/backfilling_dist_executor.go` around lines 137 - 143, The error returned from s.GetTaskTable().WithNewSession(...) is not contextualized with which task/keyspace failed (variables: GetTaskTable, WithNewSession, sessionctx.Context, GetSQLServer, GetKSSessPool, sess.NewSessionPool, sessPool), so change the error return to wrap the underlying err with the task/keyspace info (e.g., include taskKS or task identifier) using fmt.Errorf or errors.Wrapf before returning; ensure either the closure or the outer return returns a wrapped error like "failed to get session pool for keyspace %s: %w" so production logs show which keyspace/task caused the failure.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@pkg/ddl/backfilling_dist_executor.go`:
- Around line 137-143: The error returned from
s.GetTaskTable().WithNewSession(...) is not contextualized with which
task/keyspace failed (variables: GetTaskTable, WithNewSession,
sessionctx.Context, GetSQLServer, GetKSSessPool, sess.NewSessionPool, sessPool),
so change the error return to wrap the underlying err with the task/keyspace
info (e.g., include taskKS or task identifier) using fmt.Errorf or errors.Wrapf
before returning; ensure either the closure or the outer return returns a
wrapped error like "failed to get session pool for keyspace %s: %w" so
production logs show which keyspace/task caused the failure.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: 5009e9b5-16c9-4ab2-b3e9-779336fb5f57
📒 Files selected for processing (1)
pkg/ddl/backfilling_dist_executor.go
|
🔍 Starting code review for this PR... |
D3Hunter
left a comment
There was a problem hiding this comment.
Summary
- Total findings: 4
- Inline comments: 4
- Summary-only findings (no inline anchor): 0
Findings (highest risk first)
🟡 [Minor] (3)
- Helper name still promises a store return value (
pkg/ddl/backfilling_dist_scheduler.go:139 and pkg/ddl/backfilling_dist_scheduler.go:186) - Task-store resolution is duplicated between DXF managers (
pkg/dxf/framework/taskexecutor/manager.go:319 and pkg/dxf/framework/scheduler/scheduler_manager.go:354) - Executor goroutine is only unblocked after assertions (
pkg/dxf/framework/taskexecutor/manager_test.go:548)
🧹 [Nit] (1)
- TaskStore comment has unclear keyspace wording (
pkg/dxf/framework/taskexecutor/task_executor.go:77)
ingress-bot
left a comment
There was a problem hiding this comment.
This review was generated by AI and should be verified by a human reviewer.
Manual follow-up is recommended before merge.
Summary
- Total findings: 10
- Inline comments: 7
- Summary-only findings (no inline anchor): 0
Findings (highest risk first)
⚠️ [Major] (3)
- Cross-keyspace store resolution now blocks the serial task-manager loop (pkg/dxf/framework/taskexecutor/manager.go:320, pkg/dxf/framework/taskexecutor/manager.go:179, pkg/domain/crossks/cross_ks.go:91)
- Transient task-keyspace store lookup now fails the whole task instead of being retried (pkg/dxf/framework/taskexecutor/manager.go:327, pkg/dxf/framework/taskexecutor/manager.go:406, pkg/dxf/framework/scheduler/scheduler_manager.go:360)
- Param.TaskStore doc comment is grammatically broken and obscures the keyspace contract (pkg/dxf/framework/taskexecutor/task_executor.go:80)
🟡 [Minor] (4)
- Cross-keyspace gate is now split between the task-executor manager (store) and the backfill step executor (session pool) (pkg/ddl/backfilling_dist_executor.go:136, pkg/dxf/framework/taskexecutor/manager.go:320, pkg/dxf/framework/taskexecutor/task_executor.go:77)
- getUserStoreAndTable no longer returns a store; name is now misleading (pkg/ddl/backfilling_dist_scheduler.go:186)
- Cross-keyspace task store resolution has no rationale comment at the enforcement point (pkg/dxf/framework/taskexecutor/manager.go:320)
- Keyspace task-store resolution is duplicated across the scheduler and task-executor managers (pkg/dxf/framework/taskexecutor/manager.go:320, pkg/dxf/framework/scheduler/scheduler_manager.go:355)
ℹ️ [Info] (1)
- Executor-side task-store resolution lacks the keyspace-consistency guard the scheduler path enforces, so the two halves diverge on mismatch (pkg/dxf/framework/taskexecutor/manager.go:320, pkg/dxf/framework/scheduler/scheduler.go:124, pkg/dxf/importinto/task_executor_test.go:514)
🧹 [Nit] (2)
- err2 naming and idiom diverge from the WithNewSession pattern used elsewhere in this PR (pkg/dxf/framework/taskexecutor/manager.go:321)
- New manager tests register global task-executor types without cleanup, leaking shared mutable state across the suite (pkg/dxf/framework/taskexecutor/manager_test.go:539, pkg/dxf/framework/taskexecutor/register.go:39)
ingress-bot
left a comment
There was a problem hiding this comment.
This review was generated by AI and should be verified by a human reviewer.
Manual follow-up is recommended before merge.
Summary
- Total findings: 7
- Inline comments: 7
- Summary-only findings (no inline anchor): 0
Findings (highest risk first)
⚠️ [Major] (1)
- Transient cross-keyspace store resolution terminally fails the task instead of retrying (pkg/dxf/framework/taskexecutor/manager.go:326, pkg/dxf/framework/taskexecutor/manager.go:409)
🟡 [Minor] (3)
- Backfill executor resolves sessPool with a second keyspace predicate keyed off a different instance store than the manager used (pkg/ddl/backfilling_dist_executor.go:133, pkg/dxf/framework/taskexecutor/manager.go:320)
- TaskStore keyspace invariant is enforced only on the scheduler side, not the executor side (pkg/dxf/framework/taskexecutor/task_executor.go:77, pkg/dxf/framework/scheduler/scheduler.go:124, pkg/dxf/importinto/task_executor_test.go:88)
- Executor path resolves the task store but never re-checks it against task.Keyspace like the scheduler does (pkg/dxf/importinto/task_executor.go:912, pkg/dxf/framework/taskexecutor/manager.go:319, pkg/dxf/framework/scheduler/scheduler.go:124)
ℹ️ [Info] (1)
- New
Param.TaskStoredoc diverges from the parallelscheduler.Param.TaskStoredoc for the same concept (pkg/dxf/framework/taskexecutor/task_executor.go:77, pkg/dxf/framework/scheduler/interface.go:185)
🧹 [Nit] (2)
- Scheduler test injects TaskStore by hand-building BaseScheduler, bypassing the provided test seams (pkg/ddl/backfilling_dist_scheduler_test.go:49, pkg/ddl/backfilling_dist_scheduler_test.go:178)
- New manager test leaks a global task-type registration without cleanup (pkg/dxf/framework/taskexecutor/manager_test.go:536, pkg/dxf/framework/taskexecutor/register.go:39)
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
pkg/dxf/framework/taskexecutor/manager.go (1)
320-330: 🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick winAdd an explanatory comment and post-resolution keyspace validation.
The keyspace resolution block silently swaps the instance store for the task-keyspace store, which controls whose keyspace data the task reads and writes. Without an inline comment, future editors may not understand that this branch enforces next-gen multi-keyspace isolation. Additionally, there is no validation that
GetKSStorereturned a store matchingtask.Keyspace, so a misconfiguration could silently proceed with the wrong store.As per coding guidelines, comments should explain non-obvious intent and correctness constraints.
📝 Suggested addition
taskStore := m.store + // Next-gen multi-keyspace isolation: when the task belongs to a different + // keyspace, resolve and use the task's keyspace store for all data operations. if m.store.GetKeyspace() != task.Keyspace { if err2 := m.taskTable.WithNewSession(func(se sessionctx.Context) error { var err2 error taskStore, err2 = se.GetSQLServer().GetKSStore(task.Keyspace) return err2 }); err2 != nil { m.logger.Warn("get task store failed", zap.Int64("task-id", task.ID), zap.String("task-key", task.Key), zap.Error(err2)) return false } + // Defensive check: verify the resolved store matches the task keyspace. + if taskStore.GetKeyspace() != task.Keyspace { + m.logger.Error("resolved task store keyspace mismatch", + zap.Int64("task-id", task.ID), zap.String("task-keyspace", task.Keyspace), + zap.String("store-keyspace", taskStore.GetKeyspace())) + return false + } }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@pkg/dxf/framework/taskexecutor/manager.go` around lines 320 - 330, Add an inline comment above the branch that explains this code is intentionally swapping the executor's instance store to the task-specific keyspace to enforce next-gen multi-keyspace isolation and the correctness constraints this implies; then after calling m.taskTable.WithNewSession(... se.GetSQLServer().GetKSStore(task.Keyspace) ...) verify that the returned taskStore actually corresponds to task.Keyspace (e.g., check its GetKeyspace/GetName or other identity method) and if it does not match, log a warning including task.ID and task.Key and the resolved store identity and return false to avoid silently operating on the wrong keyspace.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Outside diff comments:
In `@pkg/dxf/framework/taskexecutor/manager.go`:
- Around line 320-330: Add an inline comment above the branch that explains this
code is intentionally swapping the executor's instance store to the
task-specific keyspace to enforce next-gen multi-keyspace isolation and the
correctness constraints this implies; then after calling
m.taskTable.WithNewSession(... se.GetSQLServer().GetKSStore(task.Keyspace) ...)
verify that the returned taskStore actually corresponds to task.Keyspace (e.g.,
check its GetKeyspace/GetName or other identity method) and if it does not
match, log a warning including task.ID and task.Key and the resolved store
identity and return false to avoid silently operating on the wrong keyspace.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: eb14813e-9cb8-4f45-b0e8-717eb75cbbac
📒 Files selected for processing (6)
pkg/dxf/framework/scheduler/interface.gopkg/dxf/framework/taskexecutor/BUILD.bazelpkg/dxf/framework/taskexecutor/manager.gopkg/dxf/framework/taskexecutor/manager_test.gopkg/dxf/framework/taskexecutor/task_executor.gopkg/dxf/framework/taskexecutor/task_executor_test.go
✅ Files skipped from review due to trivial changes (1)
- pkg/dxf/framework/scheduler/interface.go
🚧 Files skipped from review as they are similar to previous changes (2)
- pkg/dxf/framework/taskexecutor/task_executor.go
- pkg/dxf/framework/taskexecutor/manager_test.go
|
/retest |
|
@D3Hunter: PRs from untrusted users cannot be marked as trusted with DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
|
/retest |
|
@D3Hunter: PRs from untrusted users cannot be marked as trusted with DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
|
/retest |
|
@D3Hunter: PRs from untrusted users cannot be marked as trusted with DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
D3Hunter
left a comment
There was a problem hiding this comment.
Summary
- Total findings: 4
- Inline comments: 4
- Summary-only findings (no inline anchor): 0
Findings (highest risk first)
⚠️ [Major] (1)
- Empty-keyspace tasks can get stuck before executor startup (
pkg/dxf/framework/taskexecutor/manager.go:320)
🟡 [Minor] (3)
- Keyspace mismatch error omits both keyspace values (
pkg/dxf/framework/taskexecutor/task_executor.go:265 and pkg/dxf/framework/scheduler/scheduler.go:127) - Test name hides the deliberately mismatched task keyspace (
pkg/dxf/importinto/task_executor_test.go:73) - Task-store keyspace resolution is duplicated across scheduler and executor managers (
pkg/dxf/framework/taskexecutor/manager.go:319 and pkg/dxf/framework/scheduler/scheduler_manager.go:354)
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: joechenrh, wjhuang2016 The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
[LGTM Timeline notifier]Timeline:
|
|
/retest |
|
@D3Hunter: PRs from untrusted users cannot be marked as trusted with DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
|
/retest |
|
@D3Hunter: PRs from untrusted users cannot be marked as trusted with DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
|
/retest |
|
@D3Hunter: PRs from untrusted users cannot be marked as trusted with DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
|
/retest |
|
@D3Hunter: PRs from untrusted users cannot be marked as trusted with DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
|
/retest |
|
@D3Hunter: PRs from untrusted users cannot be marked as trusted with DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
What problem does this PR solve?
Issue Number: ref #61702
Problem Summary:
For NextGen DXF tasks, a task can belong to a different keyspace from the TiDB instance that is running the task executor manager. The downstream DDL backfill and import-into executors need to use the task keyspace store consistently, but the store lookup was duplicated in multiple executor paths.
What changed and how does it work?
This PR resolves the task keyspace store in the DXF task executor manager before creating the task executor, then passes the resolved store through task executor parameters.
The DDL backfill scheduler/executor and import-into executor now use the manager-provided
TaskStorefor table metadata, read-index planning, ingest, merge-temp-index, and import step execution instead of resolving the task keyspace store again.It also adds targeted unit coverage for successful cross-keyspace task store resolution, lookup failure handling, and import-into consuming the provided task store directly.
Check List
Tests
Commands run:
Side effects
Documentation
Release note
Please refer to Release Notes Language Style Guide to write a quality release note.
Summary by CodeRabbit
Bug Fixes
Refactor
Tests
Chores