ttl: stabilize TestCancelWhileScan runtime by zanmato1984 · Pull Request #67657 · pingcap/tidb

zanmato1984 · 2026-04-09T11:28:39Z

What problem does this PR solve?

Issue Number: ref #66982

Problem Summary:

TestCancelWhileScan still times out in CI under resource pressure. The previous fix in #67285 addressed statement-boundary cancellation correctness, but this test still spent too much time in long-mode stress setup/execution and could exceed shard timeout.

What changed and how does it work?

This PR keeps the same cancellation assertions and makes the stress path cheaper and more deterministic:

Batch test data inserts in TestCancelWhileScan instead of issuing 10k single-row inserts.
Use bounded rounds (10 default, 30 in -long) instead of time-based loops.
Add a small scan delay failpoint (sleepCoprRequest=200ms) so each round still exercises cancellation while avoiding heavy table size/time requirements.

This keeps the regression coverage focused on cancellation responsiveness while removing the long-tail runtime behavior that caused timeouts.

Check List

Tests

Unit test
Integration test
Manual test (add detailed scripts or steps below)
No need to test
- I checked and no code files have been changed.

Side effects

Performance regression: Consumes more CPU
Performance regression: Consumes more Memory
Breaking backward compatibility

Documentation

Release note

None

Summary by CodeRabbit

Tests
- Improved test efficiency by switching from many individual inserts to batched data setup.
- Reduced test runtime and resource use through fewer database calls.
- Made scan/cancel test runs more consistent by controlling iteration counts via test-mode flags.
- Continued focus on exercising cancellation behavior and fault-injection scenarios.

pantheon-ai · 2026-04-09T11:28:46Z

@zanmato1984 I've received your pull request and will start the review. I'll conduct a thorough review covering code quality, potential issues, and implementation details.

⏳ This process typically takes 10-30 minutes depending on the complexity of the changes.

_{ℹ️ Learn more details on Pantheon AI.}

tiprow · 2026-04-09T11:28:58Z

Hi @zanmato1984. Thanks for your PR.

PRs from untrusted users cannot be marked as trusted with /ok-to-test in this repo meaning untrusted PR authors can never trigger tests themselves. Collaborators can still trigger tests on the PR using /test all.

I understand the commands that are listed here.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

coderabbitai · 2026-04-09T11:29:02Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 970d5b20-7bfe-4113-b374-6badf4609936

📥 Commits

Reviewing files that changed from the base of the PR and between f140c74 and b6bfd3e.

📒 Files selected for processing (1)

pkg/ttl/ttlworker/scan_integration_test.go

🚧 Files skipped from review as they are similar to previous changes (1)

pkg/ttl/ttlworker/scan_integration_test.go

📝 Walkthrough

Walkthrough

Test TestCancelWhileScan changes single-row INSERTs to batched multi-row INSERTs: introduces totalRows and batchSize, conditionally scales totalRows for long tests, builds per-batch VALUES with fmt.Sprintf + strings.Join, and adds a strings import.

Changes

TTL Scan Integration Test

Layer / File(s)	Summary
Batched test data insertion `pkg/ttl/ttlworker/scan_integration_test.go`	Replaces many single-row INSERTs with batched multi-row INSERTs. Adds `totalRows` and `batchSize`, increases `totalRows` when `testflag.Long()` is true, constructs per-row tuples with `fmt.Sprintf` and joins them with `strings.Join` per batch, executes one `INSERT` per batch, and adds the `strings` import.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Suggested labels

ok-to-test

Suggested reviewers

YangKeao
wjhuang2016
lcwangchao

Poem

🐰 I stitch the rows in tidy stacks,
One query now where thousands were,
Batches leap where inserts lagged,
Quiet hops, a slimmer run,
The rabbit smiles — the test is done.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title clearly summarizes the main objective: stabilizing the test's runtime by optimizing the insert strategy and execution approach.
Description check	✅ Passed	The description includes all required sections with clear explanations of the problem, changes made, and justification. The issue number and problem summary are provided, along with detailed technical changes.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests

Tip

💬 Introducing Slack Agent: The best way for teams to turn conversations into code.

Slack Agent is built on CodeRabbit's deep understanding of your code, so your team can collaborate across the entire SDLC without losing context.

Generate code and open pull requests
Plan features and break down work
Investigate incidents and troubleshoot customer tickets together
Automate recurring tasks and respond to alerts with triggers
Summarize progress and report instantly

Built for teams:

Shared memory across your entire org—no repeating context
Per-thread sandboxes to safely plan and execute work
Governance built-in—scoped access, auditability, and budget controls

One agent for your entire SDLC. Right inside Slack.

👉 Get started

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

zanmato1984

Role: Reviewer-R1

LGTM (round 1).

Checks:

Original test intent is preserved: TestCancelWhileScan still asserts cancellation completes within 1s after cancel().
Change scope is test-only and minimal: only pkg/ttl/ttlworker/scan_integration_test.go is updated.
Recurrence analysis is consistent with evidence from the reopened flaky issue/build logs: current recurrence is timeout/runtime pressure rather than statement-boundary correctness.
No unnecessary timing-only precise repro is retained: this update simplifies the stress profile and keeps deterministic cancellation coverage without extra reproducer-only paths.

Validation:

./tools/check/failpoint-go-test.sh pkg/ttl/ttlworker -run '^TestCancelWhileScan$' -count=20 (pass)

codecov · 2026-04-09T11:46:08Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 76.2559%. Comparing base (997e75c) to head (b6bfd3e).
⚠️ Report is 168 commits behind head on master.

Additional details and impacted files

@@               Coverage Diff                @@
##             master     #67657        +/-   ##
================================================
- Coverage   77.5871%   76.2559%   -1.3313%     
================================================
  Files          1981       1995        +14     
  Lines        547950     576477     +28527     
================================================
+ Hits         425139     439598     +14459     
- Misses       122001     135372     +13371     
- Partials        810       1507       +697

Flag	Coverage Δ
integration	`41.6898% <ø> (+7.3501%)`	⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Components	Coverage Δ
dumpling	`61.5065% <ø> (+0.0901%)`	⬆️
parser	`∅ <ø> (∅)`
br	`48.3363% <ø> (-12.0949%)`	⬇️

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

YangKeao · 2026-04-14T02:22:02Z

-	testStart := time.Now()
-	testDuration := time.Second
+	rounds := 10
 	if testflag.Long() {


Long mode doesn't affect the CI, so I don't think this change will be helpful.

Good point. I removed the testflag.Long() branch so CI and local runs now execute the same loop count (rounds := 10), while keeping the original cancellation assertion and test intent unchanged.

Validation:

./tools/check/failpoint-go-test.sh pkg/ttl/ttlworker -run "^TestCancelWhileScan$" -count=1 (pass)

make lint (pass)

Included in commit 10aeb41daf.

yinsustart · 2026-04-27T02:53:04Z

/retest

tiprow · 2026-04-27T02:53:29Z

@yinsustart: PRs from untrusted users cannot be marked as trusted with /ok-to-test in this repo meaning untrusted PR authors can never trigger tests themselves. Collaborators can still trigger tests on the PR using /test.

Details

In response to this:

/retest

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@pkg/ttl/ttlworker/scan_integration_test.go`:
- Line 44: There's a typo: the identifier "tesflag" used in the condition
calling tesflag.Long() should be the imported package "testflag"; replace the
incorrect "tesflag" with "testflag" so the condition reads testflag.Long(),
matching the import and the other usage (e.g., the call at line using
testflag.Long()) to fix the compile error.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: b64994b8-a92e-4c17-bd12-e562470fc6e0

📥 Commits

Reviewing files that changed from the base of the PR and between 10aeb41 and f140c74.

📒 Files selected for processing (1)

pkg/ttl/ttlworker/scan_integration_test.go

Signed-off-by: Yang Keao <yangkeao@chunibyo.icu>

bb7133

LGTM

ti-chi-bot · 2026-05-15T05:27:16Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: bb7133, YangKeao

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details

Needs approval from an approver in each of these files:

~~OWNERS~~ [YangKeao,bb7133]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

ti-chi-bot · 2026-05-15T05:27:21Z

[LGTM Timeline notifier]

Timeline:

2026-05-15 03:12:53.938852092 +0000 UTC m=+408142.471631411: ☑️ agreed by YangKeao.
2026-05-15 05:27:20.569760703 +0000 UTC m=+416209.102540022: ☑️ agreed by bb7133.

tiprow · 2026-05-15T05:30:17Z

@zanmato1984: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name	Commit	Details	Required	Rerun command
tidb_parser_test	`b6bfd3e`	link	true	`/test tidb_parser_test`
fast_test_tiprow	`b6bfd3e`	link	true	`/test fast_test_tiprow`

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

ref pingcap#66982

ttl: stabilize TestCancelWhileScan runtime

fca6b36

ti-chi-bot Bot added the release-note-none Denotes a PR that doesn't merit a release note. label Apr 9, 2026

ti-chi-bot Bot added the size/S Denotes a PR that changes 10-29 lines, ignoring generated files. label Apr 9, 2026

zanmato1984 commented Apr 9, 2026

View reviewed changes

YangKeao reviewed Apr 14, 2026

View reviewed changes

ttl: remove long-mode branch from cancel scan test

10aeb41

yinsustart requested a review from YangKeao May 10, 2026 08:58

YangKeao approved these changes May 15, 2026

View reviewed changes

ti-chi-bot Bot added approved needs-1-more-lgtm Indicates a PR needs 1 more LGTM. labels May 15, 2026

coderabbitai Bot reviewed May 15, 2026

View reviewed changes

Comment thread pkg/ttl/ttlworker/scan_integration_test.go Outdated

only keep the batch insert modification

b6bfd3e

Signed-off-by: Yang Keao <yangkeao@chunibyo.icu>

YangKeao force-pushed the issue-66982-flaky-timeout branch from f140c74 to b6bfd3e Compare May 15, 2026 03:24

bb7133 approved these changes May 15, 2026

View reviewed changes

ti-chi-bot Bot added lgtm and removed needs-1-more-lgtm Indicates a PR needs 1 more LGTM. labels May 15, 2026

ti-chi-bot Bot merged commit 5ffa989 into pingcap:master May 15, 2026
33 of 36 checks passed

zanmato1984 mentioned this pull request May 15, 2026

Flaky test: TestCancelWhileScan in pkg/ttl/ttlworker #66982

Open

yongman pushed a commit to yongman/tidb that referenced this pull request Jun 3, 2026

ttl: stabilize TestCancelWhileScan runtime (pingcap#67657)

3531705

ref pingcap#66982

Conversation

zanmato1984 commented Apr 9, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What problem does this PR solve?

What changed and how does it work?

Check List

Release note

Summary by CodeRabbit

Uh oh!

pantheon-ai Bot commented Apr 9, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

tiprow Bot commented Apr 9, 2026

Uh oh!

coderabbitai Bot commented Apr 9, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Estimated code review effort

Suggested labels

Suggested reviewers

Poem

❌ Failed checks (1 warning)

Uh oh!

zanmato1984 left a comment

Choose a reason for hiding this comment

Uh oh!

codecov Bot commented Apr 9, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

YangKeao Apr 14, 2026

Choose a reason for hiding this comment

Uh oh!

zanmato1984 Apr 14, 2026

Choose a reason for hiding this comment

Uh oh!

yinsustart commented Apr 27, 2026

Uh oh!

tiprow Bot commented Apr 27, 2026

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

bb7133 left a comment

Choose a reason for hiding this comment

Uh oh!

ti-chi-bot Bot commented May 15, 2026

Uh oh!

ti-chi-bot Bot commented May 15, 2026

[LGTM Timeline notifier]

Uh oh!

tiprow Bot commented May 15, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

zanmato1984 commented Apr 9, 2026 •

edited by coderabbitai Bot

Loading

pantheon-ai Bot commented Apr 9, 2026 •

edited

Loading

coderabbitai Bot commented Apr 9, 2026 •

edited

Loading

codecov Bot commented Apr 9, 2026 •

edited

Loading