*: add keyspace observability labels#68836

Merged
ti-chi-bot[bot] merged 4 commits into
pingcap:masterfrom
zeminzhou:zz/cherry-pick-keyspace-observability-master
Jun 2, 2026

@zeminzhou zeminzhou commented Jun 1, 2026

What problem does this PR solve?

Issue Number: ref #67765
More details: #68405

Problem Summary:

Cherry-pick the keyspace observability changes from release-nextgen to master.

What changed and how does it work?

  • Add keyspace observability config mapping from keyspace metadata keys to metric labels, slow log fields, and statement summary log fields.
  • Resolve keyspace metadata during TiDB startup for NextGen TiKV Starter deployments.
  • Attach keyspace ID/name and configured keyspace metadata values to metrics, slow logs, and statement summary logs.
  • Enforce the Keyspace_meta_ prefix for slow log field names.
  • Update generated Bazel metadata for the master branch.
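
The mapping and prefix rule described above can be illustrated with a minimal sketch. The `Mapping` type, its field names, and the `Validate` function are hypothetical and are not taken from this PR's `pkg/config` code; only the `Keyspace_meta_` prefix requirement comes from the description.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// slowLogFieldPrefix is the prefix that slow log field names must carry,
// as stated in the PR description.
const slowLogFieldPrefix = "Keyspace_meta_"

// Mapping is a hypothetical shape for one config entry: a keyspace metadata
// key plus the names under which its value is exposed. An empty name means
// the value is not exposed in that output.
type Mapping struct {
	MetaKey      string
	MetricLabel  string
	SlowLogField string
	StmtLogField string
}

// Validate rejects entries with an empty metadata key or with a slow log
// field name that lacks the required prefix.
func Validate(mappings []Mapping) error {
	for _, m := range mappings {
		if m.MetaKey == "" {
			return errors.New("keyspace metadata key must not be empty")
		}
		if m.SlowLogField != "" && !strings.HasPrefix(m.SlowLogField, slowLogFieldPrefix) {
			return fmt.Errorf("slow log field %q must start with %q", m.SlowLogField, slowLogFieldPrefix)
		}
	}
	return nil
}

func main() {
	valid := []Mapping{{MetaKey: "tenant", MetricLabel: "tenant", SlowLogField: "Keyspace_meta_tenant"}}
	invalid := []Mapping{{MetaKey: "tenant", SlowLogField: "tenant"}}
	fmt.Println(Validate(valid))
	fmt.Println(Validate(invalid))
}
```

Validating at config load time, rather than when a log line is written, would let a misnamed field fail fast at startup; whether the PR does this at the same point is not stated here.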

Check List

Tests

  • Unit test
    • go test ./pkg/config -run '^(TestKeyspaceObservability|TestKeyspaceObservabilityInvalid|TestDeployModeConfig)$' -count=1
    • go test ./pkg/util/metricsutil -run '^TestRegisterMetricsWithKeyspaceObservabilityValues$' -count=1
    • go test ./pkg/standby -run '^(TestActivateRequestMetadata|TestActivateRequiresKeyspaceName)$' -count=1
    • go test ./pkg/util/stmtsummary/v2 -run '^TestStmtRecord$' -count=1
    • ./tools/check/failpoint-go-test.sh cmd/tidb-server -tags=intest,deadlock,nextgen -run '^TestSetupKeyspaceObservabilityForStarter' -count=1
    • ./tools/check/failpoint-go-test.sh pkg/store/driver -run '^TestSetDefaultAndOptions$' -count=1
    • go test -tags=intest,deadlock ./pkg/sessionctx/variable/tests -run '^TestSlowLogFormat$' -count=1
  • Integration test
  • Manual test (add detailed scripts or steps below)
  • No need to test
    • I checked and no code files have been changed.

Side effects

  • Performance regression: Consumes more CPU
  • Performance regression: Consumes more Memory
  • Breaking backward compatibility

Documentation

  • Affects user behaviors
  • Contains syntax changes
  • Contains variable changes
  • Contains experimental features
  • Changes MySQL compatibility

Release note

Please refer to Release Notes Language Style Guide to write a quality release note.

Add keyspace observability configuration for NextGen deployments.

Summary by CodeRabbit

  • New Features

    • Added keyspace observability: map keyspace metadata to Prometheus labels, slow-log and statement-log fields (Starter mode)
    • Emit configured keyspace fields into slow logs and statement summaries
    • Capture activation metadata during standby startup and apply it to observability when in Starter mode
    • Metrics registration now includes configured keyspace observability labels
  • Tests

    • Added extensive tests for config validation, resolution, startup behavior, slow-log and statement-log integration
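
As a rough illustration of how a statement summary record could carry extra keyspace fields, the sketch below embeds an optional `additional_fields` object when marshaling to JSON. The `Record` and `EnrichedRecord` types and the `MarshalRecord` helper are assumptions made for this example; only the `additional_fields` name appears in the review walkthrough, and the real record has many more fields.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Record is a hypothetical, heavily reduced statement summary record.
type Record struct {
	Digest    string `json:"digest"`
	ExecCount int64  `json:"exec_count"`
}

// EnrichedRecord wraps Record and adds the optional extra fields.
// With omitempty, a nil or empty map leaves the JSON output unchanged.
type EnrichedRecord struct {
	Record
	AdditionalFields map[string]string `json:"additional_fields,omitempty"`
}

// MarshalRecord serializes the record, embedding extra fields when present.
func MarshalRecord(r Record, extra map[string]string) ([]byte, error) {
	return json.Marshal(EnrichedRecord{Record: r, AdditionalFields: extra})
}

func main() {
	r := Record{Digest: "abc", ExecCount: 3}

	plain, err := MarshalRecord(r, nil)
	if err != nil {
		fmt.Println("marshal failed:", err)
		return
	}
	fmt.Println(string(plain))

	enriched, err := MarshalRecord(r, map[string]string{"tenant": "acme"})
	if err != nil {
		fmt.Println("marshal failed:", err)
		return
	}
	fmt.Println(string(enriched))
}
```

Keeping the extra fields in a nested object, instead of flattening them into the record, avoids name collisions with existing record keys.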

@ti-chi-bot ti-chi-bot Bot added do-not-merge/needs-triage-completed release-note Denotes a PR that will be considered when it comes time to generate release notes. labels Jun 1, 2026

pantheon-ai Bot commented Jun 1, 2026

@zeminzhou I've received your pull request and will start the review. I'll conduct a thorough review covering code quality, potential issues, and implementation details.

⏳ This process typically takes 10-30 minutes depending on the complexity of the changes.

ℹ️ Learn more details on Pantheon AI.

@ti-chi-bot ti-chi-bot Bot added the size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. label Jun 1, 2026

tiprow Bot commented Jun 1, 2026

Hi @zeminzhou. Thanks for your PR.

PRs from untrusted users cannot be marked as trusted with /ok-to-test in this repo meaning untrusted PR authors can never trigger tests themselves. Collaborators can still trigger tests on the PR using /test all.

I understand the commands that are listed here.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.


coderabbitai Bot commented Jun 1, 2026

Walkthrough

Adds keyspace-observability config, resolves activation metadata into metric labels and log fields at Starter startup (TiKV only), refactors metrics initialization to include those labels, and injects observability fields into slow logs and statement-summary JSON with accompanying tests and BUILD updates.
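
The resolution step in the walkthrough, turning activation metadata into metric label values and log field values, might look roughly like the sketch below. All names here (`Mapping`, `Values`, `Resolve`) are hypothetical, and skipping metadata keys that are absent is an assumption; the PR's `ResolveKeyspaceObservability()` may treat missing keys differently.

```go
package main

import "fmt"

// Mapping is a hypothetical config entry linking one metadata key to
// the output names it should appear under.
type Mapping struct {
	MetaKey      string
	MetricLabel  string
	SlowLogField string
}

// Values holds the resolved name-to-value pairs for each output.
type Values struct {
	MetricLabels  map[string]string
	SlowLogFields map[string]string
}

// Resolve looks up each configured metadata key and records its value
// under the configured names. Missing keys are skipped (an assumption).
func Resolve(mappings []Mapping, meta map[string]string) Values {
	v := Values{
		MetricLabels:  make(map[string]string),
		SlowLogFields: make(map[string]string),
	}
	for _, m := range mappings {
		val, ok := meta[m.MetaKey]
		if !ok {
			continue
		}
		if m.MetricLabel != "" {
			v.MetricLabels[m.MetricLabel] = val
		}
		if m.SlowLogField != "" {
			v.SlowLogFields[m.SlowLogField] = val
		}
	}
	return v
}

func main() {
	mappings := []Mapping{
		{MetaKey: "tenant", MetricLabel: "tenant", SlowLogField: "Keyspace_meta_tenant"},
		{MetaKey: "region", MetricLabel: "region"},
	}
	meta := map[string]string{"tenant": "acme"} // "region" is absent on purpose
	v := Resolve(mappings, meta)
	fmt.Println(v.MetricLabels, v.SlowLogFields)
}
```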

Changes

Keyspace Observability Configuration and Integration

| Layer | File(s) | Summary |
|---|---|---|
| Keyspace schema, validation, resolution, and tests | pkg/config/keyspace_observability.go, pkg/config/config.go, pkg/config/config_test.go, pkg/config/config.toml.nextgen.example, pkg/config/BUILD.bazel | Adds KeyspaceObservability types, Valid(), ResolveKeyspaceObservability(), cloning/getters, TOML example, and unit tests for valid/invalid configs and starter-mode gating. |
| Standby activation metadata transport and tests | pkg/standby/standby.go, pkg/standby/standby_test.go, pkg/standby/BUILD.bazel | Adds optional metadata on ActivateRequest, LoadKeyspaceController.ActivationMetadata() defensive copy, structured activation logging, and tests validating metadata propagation and activate endpoint validation. |
| PD client options and metrics-label propagation | pkg/store/driver/tikv_driver.go, pkg/store/driver/config_test.go, pkg/store/driver/BUILD.bazel | Extracts PD client option construction into pdClientOptions(), conditionally appends metrics label options when const labels exist, and adds tests/BUILD deps to assert MetricsLabels flow. |
| Metrics initialization refactor | pkg/util/metricsutil/common.go, pkg/util/metricsutil/common_test.go, pkg/util/metricsutil/BUILD.bazel | Refactors RegisterMetrics/RegisterMetricsForBR to delegate to a new registerMetrics() that clones and merges const labels (including keyspace observability labels), sets keyspace_id when available, and updates tests. |
| Starter-mode startup integration | cmd/tidb-server/main.go, cmd/tidb-server/main_test.go, cmd/tidb-server/BUILD.bazel | Extracts standby activation metadata during server startup and calls prepareKeyspaceObservabilityForStarter for TiKV stores to resolve and persist KeyspaceObservabilityValues (including keyspace_name); adds tests and adjusts server test shard_count. |
| Slow-log field injection | pkg/sessionctx/variable/slow_log.go, pkg/sessionctx/variable/tests/session_test.go | Injects configured slow-log fields into SlowLogFormat output and updates tests to assert presence of injected fields. |
| Statement-summary JSON enrichment | pkg/util/stmtsummary/v2/logger.go, pkg/util/stmtsummary/v2/record_test.go, pkg/util/stmtsummary/v2/BUILD.bazel | Marshals statement-summary records via helper that optionally embeds additional_fields from KeyspaceObservability, logs marshal errors, and adds tests to verify enrichment. |
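
The "clones and merges const labels" step in the metrics row can be pictured with the following sketch. It uses a plain `map[string]string` so that it stays self-contained; the function name `mergeConstLabels`, the rule that keyspace labels override base labels on conflict, and passing the keyspace ID as a string are all assumptions, not details confirmed by this PR.

```go
package main

import "fmt"

// mergeConstLabels returns a new map holding the base labels, then the
// keyspace observability labels, then keyspace_id when it is known.
// Neither input map is modified, so callers can keep reusing them.
func mergeConstLabels(base, keyspaceLabels map[string]string, keyspaceID string) map[string]string {
	merged := make(map[string]string, len(base)+len(keyspaceLabels)+1)
	for k, v := range base {
		merged[k] = v
	}
	// Assumption: on a name conflict the keyspace label wins.
	for k, v := range keyspaceLabels {
		merged[k] = v
	}
	if keyspaceID != "" {
		merged["keyspace_id"] = keyspaceID
	}
	return merged
}

func main() {
	base := map[string]string{"cluster": "c1"}
	extra := map[string]string{"tenant": "acme"}
	merged := mergeConstLabels(base, extra, "42")
	fmt.Println(len(base), len(merged), merged["keyspace_id"])
}
```

Copying into a fresh map mirrors the defensive-copy idea mentioned for ActivationMetadata(): later changes to the merged labels cannot leak back into shared configuration.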

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

  • pingcap/tidb#68751: Revert PR removing keyspace-observability types and related wiring that overlap this change.
  • pingcap/tidb#68384: Modifies Starter/standby activation startup paths overlapping activation metadata handling.
  • pingcap/tidb#68404: Related work on keyspace observability startup flow and config resolution.

Suggested labels

ok-to-test, approved, lgtm

Suggested reviewers

  • D3Hunter
  • yudongusa
  • XuHuaiyu
  • ChangRui-Ryan

🐰 A rabbit sniffs the activation air,
Nibbles metadata, seeds labels with care,
Metrics hum where keyspaces grow,
Logs whisper fields in tidy rows,
Starter springs to life — observability fair!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 9.68% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title '*: add keyspace observability labels' clearly summarizes the main change: adding keyspace observability labels across the codebase. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Description check | ✅ Passed | The PR description follows the required template with all critical sections completed: issue reference, problem summary, detailed change explanation, checked unit tests, affected documentation areas, and release note. |


codecov Bot commented Jun 1, 2026

Codecov Report

❌ Patch coverage is 11.23596% with 158 lines in your changes missing coverage. Please review.
✅ Project coverage is 75.2834%. Comparing base (a9add5c) to head (591a7e5).
⚠️ Report is 4 commits behind head on master.

Additional details and impacted files
@@               Coverage Diff                @@
##             master     #68836        +/-   ##
================================================
- Coverage   76.3085%   75.2834%   -1.0252%     
================================================
  Files          2041       2026        -15     
  Lines        563262     568400      +5138     
================================================
- Hits         429817     427911      -1906     
- Misses       132529     140420      +7891     
+ Partials        916         69       -847     
| Flag | Coverage Δ |
|---|---|
| integration | 41.2931% <11.2359%> (+1.5146%) ⬆️ |

Flags with carried forward coverage won't be shown. Click here to find out more.

| Components | Coverage Δ |
|---|---|
| dumpling | 60.4610% <ø> (ø) |
| parser | ∅ <ø> (∅) |
| br | 49.8023% <ø> (-13.0287%) ⬇️ |

@ti-chi-bot ti-chi-bot Bot added the needs-1-more-lgtm Indicates a PR needs 1 more LGTM. label Jun 1, 2026

ti-chi-bot Bot commented Jun 1, 2026

@ChangRui-Ryan: adding LGTM is restricted to approvers and reviewers in OWNERS files.

Details

In response to this:

@ChangRui-Ryan

/retest


tiprow Bot commented Jun 1, 2026

@ChangRui-Ryan: PRs from untrusted users cannot be marked as trusted with /ok-to-test in this repo meaning untrusted PR authors can never trigger tests themselves. Collaborators can still trigger tests on the PR using /test.

Details

In response to this:

/retest

…yspace-observability-master

# Conflicts:
#	cmd/tidb-server/main.go
#	pkg/config/BUILD.bazel
#	pkg/config/config.go
#	pkg/standby/BUILD.bazel
#	pkg/standby/standby_test.go

ti-chi-bot Bot commented Jun 2, 2026

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ChangRui-Ryan, D3Hunter, yudongusa

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@ti-chi-bot ti-chi-bot Bot added approved lgtm and removed needs-1-more-lgtm Indicates a PR needs 1 more LGTM. labels Jun 2, 2026

ti-chi-bot Bot commented Jun 2, 2026

[LGTM Timeline notifier]

Timeline:

  • 2026-06-01 07:16:05.83750248 +0000 UTC m=+166666.907819860: ☑️ agreed by D3Hunter.
  • 2026-06-02 03:41:59.292335459 +0000 UTC m=+240220.362652839: ☑️ agreed by yudongusa.

@ti-chi-bot ti-chi-bot Bot merged commit 1677c91 into pingcap:master Jun 2, 2026
34 of 36 checks passed

Labels

approved lgtm release-note Denotes a PR that will be considered when it comes time to generate release notes. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.

4 participants