*: introduce meta service group #68818

Open: ystaticy wants to merge 33 commits into pingcap:master from ystaticy:metaservice_group_introduce

Contributor

@ystaticy ystaticy commented May 31, 2026

What problem does this PR solve?

Issue Number: ref #68338

Problem Summary:

What changed and how does it work?

What changed:

  • Added metadata models and helpers to build a KeyspaceMetaServiceGroup and a top-level Info object from keyspace metadata plus global meta service addresses.
  • Added logic to read meta_service_group_id and meta_service_group_addrs from keyspace config, and fall back to the global meta service group when no per-keyspace group is configured.
  • Added an etcd/PD-backed ServiceClient implementation to discover PD service addresses, PD HTTP addresses, and the current leader address.
  • Added URL parsing helpers to normalize http, https, and unix endpoints, including default ports and IPv6 handling.
  • Added tests for keyspace group resolution, address sanitization, leader discovery, leader-not-found behavior, and URL parsing.

How it works:

  1. GetKeyspaceMetaServiceGroup inspects keyspace metadata and decides which meta service group the keyspace should use.
  2. If the keyspace carries both a group ID and group addresses, the code trims and filters the address list, then returns a dedicated KeyspaceMetaServiceGroup.
  3. If no keyspace-specific group is configured, the code falls back to the global group (GroupID = "0") and uses the provided global meta service addresses.
  4. GetMetaServiceInfo combines the selected keyspace group, global meta service addresses, and PD addresses into one Info structure for upper layers to consume.
  5. The etcd-based client uses PD GetAllMembers() to collect PD endpoints and uses etcd Status() on each endpoint to identify the current leader.
  6. URL parsing is centralized so callers get normalized host:port or HTTP-form addresses without duplicating protocol-specific handling.
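
Steps 1 to 3 can be sketched roughly as follows. This is a minimal illustration, not the code in the PR: it takes the keyspace config as a plain map instead of keyspacepb.KeyspaceMeta, and the names selectGroup, splitAddrs, group, and errGroupNotMatch (including its message) are hypothetical stand-ins. Only the two config keys and the global group ID "0" come from the description above.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// Config keys and the global group ID as named in the PR description.
const (
	globalGroupID = "0"
	groupIDKey    = "meta_service_group_id"
	groupAddrsKey = "meta_service_group_addrs"
)

// errGroupNotMatch is a hypothetical stand-in for the package's ErrGroupNotMatch.
var errGroupNotMatch = errors.New("meta service group id and addrs do not match")

// group is a simplified stand-in for KeyspaceMetaServiceGroup.
type group struct {
	GroupID string
	Addrs   []string
}

// splitAddrs splits a comma-separated list, trims each entry, and drops blanks.
func splitAddrs(s string) []string {
	var out []string
	for _, a := range strings.Split(s, ",") {
		if a = strings.TrimSpace(a); a != "" {
			out = append(out, a)
		}
	}
	return out
}

// selectGroup returns the keyspace-specific group when a group ID and at
// least one usable address are configured, and falls back to the global
// group when no group ID is present.
func selectGroup(cfg map[string]string, globalAddrs []string) (*group, error) {
	id, ok := cfg[groupIDKey]
	if !ok {
		return &group{GroupID: globalGroupID, Addrs: globalAddrs}, nil
	}
	addrs := splitAddrs(cfg[groupAddrsKey])
	if len(addrs) == 0 {
		return nil, errGroupNotMatch
	}
	return &group{GroupID: id, Addrs: addrs}, nil
}

func main() {
	g, err := selectGroup(map[string]string{
		groupIDKey:    "7",
		groupAddrsKey: " 10.0.0.1:2379, ,10.0.0.2:2379,",
	}, []string{"10.0.0.9:2379"})
	fmt.Println(g, err)
}
```

Filtering blank entries before the length check means a value such as " , " is reported as a mismatch rather than producing a group with empty addresses.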

Scope note:

This PR only adds the reusable metaservice package and its tests. It does not yet wire the new logic into broader TiDB runtime paths such as pkg/store, pkg/domain, or pkg/session.

Check List

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)
  • No need to test
    • I checked and no code files have been changed.

Side effects

  • Performance regression: Consumes more CPU
  • Performance regression: Consumes more Memory
  • Breaking backward compatibility

Documentation

  • Affects user behaviors
  • Contains syntax changes
  • Contains variable changes
  • Contains experimental features
  • Changes MySQL compatibility

Release note

Please refer to Release Notes Language Style Guide to write a quality release note.

None

Summary by CodeRabbit

  • New Features

    • Added a metaservice module to manage keyspace metadata, build per-keyspace/global service groups, discover service endpoints, detect leaders, and robustly parse connection URLs.
  • Tests

    • Added unit and integration tests for address parsing, leader discovery (including leader-not-found cases), and keyspace metadata handling and defaults.
  • Chores

    • Added build and test configuration to include the new metaservice package.

Signed-off-by: ystaticy <y_static_y@sina.com>
pantheon-ai Bot commented May 31, 2026

@ystaticy I've received your pull request and will start the review. I'll conduct a thorough review covering code quality, potential issues, and implementation details.

⏳ This process typically takes 10-30 minutes depending on the complexity of the changes.

ℹ️ Learn more details on Pantheon AI.

@ti-chi-bot ti-chi-bot Bot added the size/XXL label (denotes a PR that changes 1000+ lines, ignoring generated files) May 31, 2026
ti-chi-bot Bot commented May 31, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign djshow832 for approval. For more information see the Code Review Process.
Please ensure that each of them provides their approval before proceeding.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

coderabbitai Bot commented May 31, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Walkthrough

Adds a new pkg/metaservice package: configuration helpers, an EtcdMetaServiceClient implementation for PD/keyspace etcd discovery (with URL parsing and leader detection), Bazel build/test targets, and unit/integration tests.

Changes

MetaService package implementation

| Layer / File(s) | Summary |
| --- | --- |
| Package build configuration (pkg/metaservice/BUILD.bazel) | Defines go_library and go_test Bazel targets with public visibility, importpath, dependency declarations, and test settings including short timeout, flaky flag, and shard_count. |
| Metaservice configuration contracts (pkg/metaservice/metamanager.go) | Introduces configuration constants (GlobalGroupID, keyspace metadata keys) and exported errors (ErrGroupNotMatch, ErrNilKeyspaceMeta) plus basic types (Info, KeyspaceMetaServiceGroup). |
| GetKeyspaceMetaServiceGroup and GetMetaServiceInfo (pkg/metaservice/metamanager.go) | Implements group selection: comma-splitting and trimming addrs, erroring when group ID present but addrs missing, falling back to global group when absent, and assembling Info; defines the ServiceClient interface. |
| Metaservice configuration tests (pkg/metaservice/metamanager_test.go) | Unit tests for configuration parsing, validation of missing fields, fallback behavior, and Info construction with nil and populated keyspace metadata. |
| EtcdMetaServiceClient implementation (pkg/metaservice/etcd.go) | Implements EtcdMetaServiceClient (implements ServiceClient) with PD member discovery using backoff retry, ParseURL supporting unix://, http://, https://, IPv6 normalization, host:port extraction, and PD leader detection through etcd Status calls. |
| EtcdMetaServiceClient tests (pkg/metaservice/etcd_test.go) | Integration and unit tests with mock PD client and real etcd clusters (skipped when unix sockets unavailable), table-driven URL parsing tests, and assertions covering PD address discovery and leader detection including leader-not-found behavior. |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

🐇 A metaservice hops out, tidy and spry,
parsing URLs beneath an IPv6 sky,
it queries PD with patient, steady beats,
finds leaders or reports when no leader meets,
keyspace groups and globals lined up neat.

🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (2 warnings)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 78.57% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
| Description check | ⚠️ Warning | The PR description lacks the required Problem Summary section and the Issue Number reference is incomplete without proper closure/reference format. | Add a concise Problem Summary explaining the issue being solved and update Issue Number to use 'close' or 'ref' with the issue number (e.g., 'close #68338' or 'ref #68338'). The template requires explicit issue linking. |
✅ Passed checks (3 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The PR title 'introduce meta service group' is concise and clearly reflects the main objective, which is to add the metaservice package with group management functionality. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

tiprow Bot commented May 31, 2026

Hi @ystaticy. Thanks for your PR.

PRs from untrusted users cannot be marked as trusted with /ok-to-test in this repo meaning untrusted PR authors can never trigger tests themselves. Collaborators can still trigger tests on the PR using /test all.

I understand the commands that are listed here.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

codecov Bot commented May 31, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 75.0036%. Comparing base (0c98567) to head (5cb8ed4).
⚠️ Report is 27 commits behind head on master.

⚠️ Current head 5cb8ed4 differs from pull request most recent head 07777fa

Please upload reports for the commit 07777fa to get more accurate results.

Additional details and impacted files
@@               Coverage Diff                @@
##             master     #68818        +/-   ##
================================================
- Coverage   76.3025%   75.0036%   -1.2990%     
================================================
  Files          2041       2026        -15     
  Lines        563407     574020     +10613     
================================================
+ Hits         429894     430536       +642     
- Misses       132597     143055     +10458     
+ Partials        916        429       -487     
| Components | Coverage Δ |
| --- | --- |
| dumpling | 60.4679% <ø> (ø) |
| parser | ∅ <ø> (∅) |
| br | 49.5229% <ø> (-13.2996%) ⬇️ |

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 4

🧹 Nitpick comments (1)
pkg/metaservice/etcd.go (1)

56-62: ⚖️ Poor tradeoff

Add caller-scoped context to PD address fetching retry loop.
GetPDAddrs and GetPDHttpAddrs call GetPDHostPorts(context.Background(), ...), and the ServiceClient interface methods (GetPDAddrs(), GetPDHttpAddrs()) don’t accept a context.Context, so caller cancellation/deadlines can’t stop the PD request/retry. Thread ctx through the ServiceClient interface (e.g., GetPDAddrs(ctx) / GetPDHttpAddrs(ctx)) and pass it to GetPDHostPorts.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/metaservice/etcd.go` around lines 56 - 62, The current GetPDAddrs and
GetPDHttpAddrs use context.Background() so callers cannot cancel PD host-port
retries; update the ServiceClient interface to accept a context.Context (change
method signatures to GetPDAddrs(ctx context.Context) and GetPDHttpAddrs(ctx
context.Context)), then modify EtcdMetaServiceClient.GetPDAddrs and
EtcdMetaServiceClient.GetPDHttpAddrs to accept and forward the caller-provided
ctx into GetPDHostPorts(ctx, n.pdCli, ...), and update all callers to pass their
context through to these interface methods so cancellations/deadlines are
honored.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@pkg/metaservice/BUILD.bazel`:
- Around line 29-30: The BUILD rule currently marks tests as flaky (flaky =
True) and shards them (shard_count = 5); remove the flaky flag (delete or set
flaky = False) and revert shard_count to a single shard while fixing the
underlying non-determinism in the tests (ensure proper setup/teardown,
isolation, use of mocks for unit tests, or move true integration tests to a
separate suite), or if you must keep them separate, move the target out of the
main test suite into an explicitly labeled unstable/integration suite instead of
marking flaky = True.

In `@pkg/metaservice/etcd.go`:
- Around line 64-90: GetPDLeaderAddrs currently returns an empty leaderAddr and
zap.Skip() when no member is identified as leader, which makes callers unable to
distinguish "no leader found" from a normal success; update GetPDLeaderAddrs to
return a non-skip zap.Field when leaderAddr == "" (and there were no call
errors) that explicitly signals "no leader found" (for example a descriptive
zap.String or zap.Any field), and when there are collected errors keep them in
the field (combine errMsgMap with the "no leader found" message) so callers can
reliably detect and log the absence of a leader; refer to GetPDLeaderAddrs,
leaderAddr and errMsgMap to locate where to set the errMsgField.

In `@pkg/metaservice/metamanager.go`:
- Around line 64-75: The code currently splits KeyspaceMetaGroupAddrsKey into
addrs with strings.Split which leaves empty strings (e.g., "" or trailing
commas) and can produce an invalid KeyspaceMetaServiceGroup with an empty
address; update the logic in the block that reads keyspaceMeta.Config (where
groupID and addrs are set and KeyspaceMetaServiceGroup is constructed) to trim
whitespace from addrsStr, split on commas, filter out any empty/blank entries
(e.g., after strings.TrimSpace), and if the resulting slice is empty return
ErrGroupNotMatch instead of creating a group with empty addresses; ensure the
log call still logs the validated KeyspaceMetaServiceGroup only when there is at
least one valid address.
- Around line 121-122: Update the GetPDLeaderAddrs signature to return (string,
error) instead of (string, zap.Field) in the interface and all implementations
(e.g., the implementation in pkg/metaservice/etcd.go); ensure implementations
return a non-nil error when no leader is found (i.e., when leaderAddr is empty)
rather than returning zap.Skip(), and return the resolved address with nil error
on success; update any callers to check the returned error and handle it instead
of relying on empty-string checks.

---

Nitpick comments:
In `@pkg/metaservice/etcd.go`:
- Around line 56-62: The current GetPDAddrs and GetPDHttpAddrs use
context.Background() so callers cannot cancel PD host-port retries; update the
ServiceClient interface to accept a context.Context (change method signatures to
GetPDAddrs(ctx context.Context) and GetPDHttpAddrs(ctx context.Context)), then
modify EtcdMetaServiceClient.GetPDAddrs and EtcdMetaServiceClient.GetPDHttpAddrs
to accept and forward the caller-provided ctx into GetPDHostPorts(ctx, n.pdCli,
...), and update all callers to pass their context through to these interface
methods so cancellations/deadlines are honored.
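
The two signature changes requested above (a caller-supplied context, and an error instead of zap.Skip() when no leader is found) can be sketched together. This is a hedged illustration under stated assumptions: memberStatus, statusFunc, findLeader, demoLookup, and errLeaderNotFound are hypothetical names, and statusFunc only stands in for an etcd client's Status call. It keeps the Status-based comparison purely to show the return contract; a later review comment in this thread questions whether that comparison identifies the PD leader at all.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// memberStatus holds the two values the review compares: the ID of the
// responding member and the leader ID that member reports.
type memberStatus struct {
	MemberID uint64
	Leader   uint64
}

// statusFunc is a hypothetical stand-in for an etcd client's Status call.
type statusFunc func(ctx context.Context, endpoint string) (memberStatus, error)

var errLeaderNotFound = errors.New("no leader found among endpoints")

// findLeader returns the first endpoint that reports itself as leader.
// Per-endpoint failures are joined with errLeaderNotFound, so a caller can
// tell "no leader" apart from success without checking for an empty string.
func findLeader(ctx context.Context, endpoints []string, status statusFunc) (string, error) {
	var errs []error
	for _, ep := range endpoints {
		if err := ctx.Err(); err != nil {
			return "", err // honor caller cancellation and deadlines
		}
		st, err := status(ctx, ep)
		if err != nil {
			errs = append(errs, fmt.Errorf("status %s: %w", ep, err))
			continue
		}
		if st.Leader != 0 && st.Leader == st.MemberID {
			return ep, nil
		}
	}
	return "", errors.Join(append(errs, errLeaderNotFound)...)
}

// demoLookup runs findLeader against canned statuses with a one-second
// deadline; endpoints missing from the map behave as unreachable.
func demoLookup(statuses map[string]memberStatus, endpoints ...string) (string, error) {
	ctx, cancel := context.WithTimeout(context.Background(), time.Second)
	defer cancel()
	status := func(_ context.Context, ep string) (memberStatus, error) {
		st, ok := statuses[ep]
		if !ok {
			return memberStatus{}, fmt.Errorf("endpoint %s unreachable", ep)
		}
		return st, nil
	}
	return findLeader(ctx, endpoints, status)
}

func main() {
	statuses := map[string]memberStatus{
		"a:2379": {MemberID: 1, Leader: 2},
		"b:2379": {MemberID: 2, Leader: 2},
	}
	fmt.Println(demoLookup(statuses, "a:2379", "b:2379"))
	fmt.Println(demoLookup(statuses, "a:2379", "c:2379"))
}
```

Callers can then use errors.Is(err, errLeaderNotFound) to handle the missing-leader case explicitly.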

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: feee9952-6f70-44ca-8acb-2f36dfa4518d

📥 Commits

Reviewing files that changed from the base of the PR and between 50bad68 and 2f33c99.

📒 Files selected for processing (5)
  • pkg/metaservice/BUILD.bazel
  • pkg/metaservice/etcd.go
  • pkg/metaservice/etcd_test.go
  • pkg/metaservice/metamanager.go
  • pkg/metaservice/metamanager_test.go

Comment thread pkg/metaservice/BUILD.bazel Outdated
Comment thread pkg/metaservice/etcd.go Outdated
Comment thread pkg/metaservice/metamanager.go Outdated
Comment thread pkg/metaservice/metamanager.go Outdated
@ystaticy ystaticy changed the title from "introduce meta service group" to "*: introduce meta service group" Jun 1, 2026
ystaticy added 4 commits June 1, 2026 08:32
Signed-off-by: ystaticy <y_static_y@sina.com>
Signed-off-by: ystaticy <y_static_y@sina.com>
Signed-off-by: ystaticy <y_static_y@sina.com>
Signed-off-by: ystaticy <y_static_y@sina.com>
@coderabbitai coderabbitai Bot left a comment

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
pkg/metaservice/etcd.go (2)

109-125: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Fail fast when discovery yields no usable PD address.

If GetAllMembers succeeds but every member has an empty ClientUrls list, this returns nil error with len(pdAddrs) == 0. That makes discovery look successful and pushes the failure to downstream connection code instead of surfacing it here.

🔧 Suggested guard
 		for _, member := range members.GetMembers() {
 			if len(member.ClientUrls) > 0 {
 				prefix, host, port, err := ParseURL(member.ClientUrls[0])
 				if err != nil {
 					return nil, fmt.Errorf("parse client url from pd members %q: %w", member.ClientUrls[0], err)
 				}
 				var pdAddr string
 				if hasPrefix {
 					pdAddr = prefix + host + ":" + port // http://ip:port
 				} else {
 					pdAddr = host + ":" + port // ip:port
 				}
 
 				pdAddrs = append(pdAddrs, pdAddr)
 			}
 		}
+		if len(pdAddrs) == 0 {
+			return nil, errors.New("no PD client URLs returned by GetAllMembers")
+		}
 		return pdAddrs, nil
 	}
 }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/metaservice/etcd.go` around lines 109 - 125, The loop building pdAddrs
from members.GetMembers() can return an empty slice with a nil error when all
members have empty ClientUrls; after the loop in the function that calls
ParseURL and appends to pdAddrs (the block handling member.ClientUrls and
building pdAddr), add a guard that if len(pdAddrs) == 0 then return nil,
fmt.Errorf("no usable PD client URLs discovered from members") (or similar
descriptive error). Ensure this check is placed before returning pdAddrs so
discovery failures are surfaced immediately.

64-90: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Use PD’s leader discovery instead of etcd Status()
GetPDLeaderAddrs currently derives the “PD leader” by calling etcd Status() and checking whether status.Leader == status.Header.MemberId; that comparison only identifies the etcd raft leader for the queried endpoint (while Header.MemberId is just the responding etcd member), not the current PD leader. PD’s client already provides GetLeaderURL() for the PD leader (returns "" until synced), so GetPDLeaderAddrs should use n.pdCli.GetLeaderURL() and normalize it via ParseURL (return a helpful error when the leader URL is empty or can’t be parsed).

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/metaservice/etcd.go` around lines 64 - 90, GetPDLeaderAddrs currently
uses etcd.Status to infer PD leader which is incorrect; replace that logic to
call n.pdCli.GetLeaderURL(), check that the returned URL is non-empty, parse it
with ParseURL to derive the address, and return a clear error if the leader URL
is empty or ParseURL fails. Update GetPDLeaderAddrs to stop iterating
n.KeyspaceEtcdCli.Endpoints()/Status(), instead call n.pdCli.GetLeaderURL(),
handle the "" case with an informative error, normalize the leader URL via
ParseURL and return the parsed address or the parse error.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Outside diff comments:
In `@pkg/metaservice/etcd.go`:
- Around line 109-125: The loop building pdAddrs from members.GetMembers() can
return an empty slice with a nil error when all members have empty ClientUrls;
after the loop in the function that calls ParseURL and appends to pdAddrs (the
block handling member.ClientUrls and building pdAddr), add a guard that if
len(pdAddrs) == 0 then return nil, fmt.Errorf("no usable PD client URLs
discovered from members") (or similar descriptive error). Ensure this check is
placed before returning pdAddrs so discovery failures are surfaced immediately.
- Around line 64-90: GetPDLeaderAddrs currently uses etcd.Status to infer PD
leader which is incorrect; replace that logic to call n.pdCli.GetLeaderURL(),
check that the returned URL is non-empty, parse it with ParseURL to derive the
address, and return a clear error if the leader URL is empty or ParseURL fails.
Update GetPDLeaderAddrs to stop iterating
n.KeyspaceEtcdCli.Endpoints()/Status(), instead call n.pdCli.GetLeaderURL(),
handle the "" case with an informative error, normalize the leader URL via
ParseURL and return the parsed address or the parse error.
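
A rough sketch of the suggested direction follows. It is not the PR's implementation: leaderURLSource is a hypothetical one-method view of the PD client (the review above states that GetLeaderURL returns "" until synced), hostPort is an illustrative replacement for the package's ParseURL, and the default ports 80 and 443 are the standard HTTP ones, assumed here because the PR text does not say which defaults it applies.

```go
package main

import (
	"errors"
	"fmt"
	"net"
	"net/url"
)

// leaderURLSource is a hypothetical, minimal view of the PD client.
type leaderURLSource interface {
	GetLeaderURL() string
}

// hostPort normalizes an http or https URL to host:port, filling in the
// scheme's default port and keeping IPv6 literals bracketed.
func hostPort(raw string) (string, error) {
	u, err := url.Parse(raw)
	if err != nil {
		return "", fmt.Errorf("parse %q: %w", raw, err)
	}
	port := u.Port()
	switch {
	case u.Scheme != "http" && u.Scheme != "https":
		return "", fmt.Errorf("unsupported scheme in %q", raw)
	case u.Hostname() == "":
		return "", fmt.Errorf("missing host in %q", raw)
	case port == "" && u.Scheme == "http":
		port = "80" // assumed default
	case port == "":
		port = "443" // assumed default
	}
	return net.JoinHostPort(u.Hostname(), port), nil
}

// leaderAddr resolves the PD leader address and reports an unsynced client
// as an error instead of returning an empty string.
func leaderAddr(src leaderURLSource) (string, error) {
	raw := src.GetLeaderURL()
	if raw == "" {
		return "", errors.New("PD leader URL is empty; the client may not have synced yet")
	}
	return hostPort(raw)
}

// staticLeader is a test double that returns a fixed leader URL.
type staticLeader string

func (s staticLeader) GetLeaderURL() string { return string(s) }

func main() {
	for _, raw := range []string{"http://[::1]:2379", "https://pd.example.com", ""} {
		addr, err := leaderAddr(staticLeader(raw))
		fmt.Printf("%q -> %q, err=%v\n", raw, addr, err)
	}
}
```

Using net.JoinHostPort for the final step keeps IPv6 hosts bracketed without any special-casing in the caller.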

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: f1818bcf-5863-4a9e-8d87-a960af900afd

📥 Commits

Reviewing files that changed from the base of the PR and between d2b1e76 and aadbf10.

📒 Files selected for processing (1)
  • pkg/metaservice/etcd.go

Signed-off-by: ystaticy <y_static_y@sina.com>
Comment thread pkg/metaservice/etcd.go Outdated
Comment thread pkg/metaservice/etcd.go Outdated
Comment thread pkg/metaservice/etcd.go Outdated
Comment thread pkg/metaservice/etcd.go Outdated
Comment thread pkg/metaservice/etcd.go Outdated
Comment thread pkg/metaservice/metamanager.go Outdated
GlobalGroupID = "0"
// KeyspaceMetaGroupIDKey is a keyspace meta config key name,
// the value of this key is meta service group id for this keyspace.
KeyspaceMetaGroupIDKey = "meta_service_group_id"
Contributor

If a TSO group is added later, shouldn't we name this key keyspace......group_id?

Contributor Author

In my opinion, "Meta service group" and meta_service_group_id are clear concepts. We should keep this naming consistent rather than introducing new names.

Comment thread pkg/metaservice/metamanager.go Outdated
Comment thread pkg/metaservice/metamanager.go Outdated
Contributor

@ChangRui-Ryan ChangRui-Ryan left a comment

ignore me

Comment thread pkg/metaservice/etcd.go Outdated
Comment thread pkg/metaservice/etcd.go Outdated
Comment thread pkg/metaservice/etcd.go Outdated
Comment thread pkg/metaservice/etcd.go Outdated
Comment thread pkg/metaservice/etcd.go
Comment thread pkg/metaservice/etcd.go Outdated
Comment thread pkg/metaservice/metamanager.go Outdated
Comment thread pkg/metaservice/BUILD.bazel Outdated
Signed-off-by: ystaticy <y_static_y@sina.com>
ystaticy added 7 commits June 1, 2026 22:35
Signed-off-by: ystaticy <y_static_y@sina.com>
Signed-off-by: ystaticy <y_static_y@sina.com>
Signed-off-by: ystaticy <y_static_y@sina.com>
Signed-off-by: ystaticy <y_static_y@sina.com>
Signed-off-by: ystaticy <y_static_y@sina.com>
Signed-off-by: ystaticy <y_static_y@sina.com>
Signed-off-by: ystaticy <y_static_y@sina.com>
Contributor Author

ystaticy commented Jun 2, 2026

/retest-required

tiprow Bot commented Jun 2, 2026

@ystaticy: PRs from untrusted users cannot be marked as trusted with /ok-to-test in this repo meaning untrusted PR authors can never trigger tests themselves. Collaborators can still trigger tests on the PR using /test.

Details

In response to this:

/retest-required

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

ystaticy added 2 commits June 2, 2026 11:20
Signed-off-by: ystaticy <y_static_y@sina.com>
Signed-off-by: ystaticy <y_static_y@sina.com>
Comment thread pkg/metaservice/etcd_test.go Outdated
Comment thread pkg/metaservice/metamanager.go Outdated
Comment thread pkg/metaservice/metamanager.go Outdated
Comment thread pkg/metaservice/metamanager.go Outdated
Comment thread pkg/metaservice/metamanager.go Outdated
ystaticy added 7 commits June 2, 2026 15:52
Signed-off-by: ystaticy <y_static_y@sina.com>
Signed-off-by: ystaticy <y_static_y@sina.com>
Signed-off-by: ystaticy <y_static_y@sina.com>
Signed-off-by: ystaticy <y_static_y@sina.com>
Signed-off-by: ystaticy <y_static_y@sina.com>
Signed-off-by: ystaticy <y_static_y@sina.com>
Signed-off-by: ystaticy <y_static_y@sina.com>
Contributor Author

ystaticy commented Jun 2, 2026

/retest-required

tiprow Bot commented Jun 2, 2026

@ystaticy: PRs from untrusted users cannot be marked as trusted with /ok-to-test in this repo meaning untrusted PR authors can never trigger tests themselves. Collaborators can still trigger tests on the PR using /test.

Details

In response to this:

/retest-required

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

ystaticy added 4 commits June 2, 2026 17:59
Signed-off-by: ystaticy <y_static_y@sina.com>
Signed-off-by: ystaticy <y_static_y@sina.com>
Signed-off-by: ystaticy <y_static_y@sina.com>
Signed-off-by: ystaticy <y_static_y@sina.com>
Contributor

@D3Hunter D3Hunter left a comment

rest lgtm

Comment thread pkg/metaservice/etcd.go Outdated
Comment thread pkg/metaservice/etcd.go
return "", "", "", fmt.Errorf("invalid URL prefix")
}

host, port, err = parseHostPort(u.Host)
Contributor

If the caller always calls JoinHostPort afterwards, why split the host and port here?

Comment thread pkg/metaservice/metamanager.go Outdated
// Info includes the global meta service address and the TiDB meta service group info.
type Info struct {
PDAddrs []string
GlobalMetaServiceAddrs []string
Contributor

Suggested change
- GlobalMetaServiceAddrs []string
+ GlobalAddrs []string

}

// GetKeyspaceMetaServiceGroup return keyspace meta service group.
func GetKeyspaceMetaServiceGroup(keyspaceMeta *keyspacepb.KeyspaceMeta, globalMetaAddrs []string) (*Group, error) {
Contributor

Suggested change
- func GetKeyspaceMetaServiceGroup(keyspaceMeta *keyspacepb.KeyspaceMeta, globalMetaAddrs []string) (*Group, error) {
+ func GetGroup(keyspaceMeta *keyspacepb.KeyspaceMeta, globalMetaAddrs []string) (*Group, error) {

}

// GetMetaServiceInfo return meta service info.
func GetMetaServiceInfo(keyspaceMeta *keyspacepb.KeyspaceMeta, globalMetaAddrs []string, pdAddrs []string) (*Info, error) {
Contributor

Suggested change
- func GetMetaServiceInfo(keyspaceMeta *keyspacepb.KeyspaceMeta, globalMetaAddrs []string, pdAddrs []string) (*Info, error) {
+ func GetInfo(keyspaceMeta *keyspacepb.KeyspaceMeta, globalMetaAddrs []string, pdAddrs []string) (*Info, error) {

Signed-off-by: ystaticy <y_static_y@sina.com>
Contributor

@D3Hunter D3Hunter left a comment

Summary

  • Total findings: 9
  • Inline comments: 9
  • Summary-only findings (no inline anchor): 0
Findings (highest risk first)

⚠️ [Major] (2)

  1. Keyspace meta-service group metadata can become invalid routing state (pkg/metaservice/metamanager.go:65)
  2. PD member endpoint extraction now has two canonical implementations (pkg/metaservice/etcd.go:62 and pkg/store/driver/tikv_driver.go:300)

🟡 [Minor] (5)

  1. PD address discovery drops repeated client URLs (pkg/metaservice/etcd.go:78)
  2. Address helper name hides whether it returns host-ports or URLs (pkg/metaservice/etcd.go:63)
  3. Public metaservice surface is exported before production usage establishes the API (pkg/metaservice/metamanager.go:27, pkg/metaservice/metamanager.go:45, pkg/metaservice/metamanager.go:130, and pkg/metaservice/etcd.go:62)
  4. Test-only sentinel error leaks into the public contract with unclear semantics (pkg/metaservice/metamanager.go:39)
  5. Unit test starts etcd even though the exercised path does not use it (pkg/metaservice/etcd_test.go:58)

🧹 [Nit] (2)

  1. Fallback comment references the wrong config key (pkg/metaservice/metamanager.go:91)
  2. Meta-service storage TODO lacks an owner or removal condition (pkg/metaservice/metamanager.go:64)

}
var group *Group
// TODO: Refactor meta service group storage format by moving it from config to dedicated fields in keyspace meta.
if val, ok := keyspaceMeta.Config[MetaServiceGroupIDKey]; ok {
Contributor

⚠️ [Major] Keyspace meta-service group metadata can become invalid routing state

Why
GetKeyspaceMetaServiceGroup treats the presence of meta_service_group_id as enough to enter the keyspace-specific routing branch, copies the raw value into GroupID, and accepts any remaining non-empty address string after trimming blanks. It also does not reject the inverse partial config where meta_service_group_addrs exists without a group ID.

Scope
pkg/metaservice/metamanager.go:65

Risk if unchanged
Malformed persisted keyspace metadata can either route a keyspace with an empty or nonnumeric group ID, report corrupt address values as a successful group lookup, or silently fall back to the global group while ignoring configured meta-service addresses. That can send requests to the wrong meta service or defer the failure to later connection/setup code with less precise diagnostics.

Evidence
The function assigns groupID := val after checking only key presence, returns Group{GroupID: groupID, Addrs: addrs} once len(addrs) > 0, and otherwise falls through to GlobalGroupID when addresses are present without MetaServiceGroupIDKey. Existing tests cover valid numeric strings, blank addresses, missing addresses after an ID, and no keys, but not blank IDs, nonnumeric IDs, addresses-only metadata, or malformed address values. Adjacent PD keyspace group handling parses IDs with strconv.ParseUint before mutating group membership.

Change request
Please validate this metadata as an all-or-nothing pair at this boundary: trim and require a non-empty unsigned decimal group ID, reject addresses without a group ID, and either validate the accepted address format or document it explicitly. Add negative tests for blank ID, nonnumeric ID, addresses-only metadata, and malformed address values.
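
A minimal sketch of what the requested all-or-nothing validation could look like. The function name resolveGroup, the error values, the comma-separated address encoding, the host:port address check, and the 32-bit ID width are all assumptions made for illustration; only the two config key names and the GroupID/Addrs fields come from this PR.

```go
package main

import (
	"errors"
	"fmt"
	"net"
	"strconv"
	"strings"
)

// Group mirrors the two fields quoted in the review; the real type may differ.
type Group struct {
	GroupID string
	Addrs   []string
}

// Key names as given in the PR description.
const (
	groupIDKey    = "meta_service_group_id"
	groupAddrsKey = "meta_service_group_addrs"
)

// errPartialGroupConfig is a hypothetical error for a half-configured pair.
var errPartialGroupConfig = errors.New("meta service group ID and addresses must be set together")

// resolveGroup (hypothetical) returns (nil, nil) when neither key is present,
// so the caller can fall back to the global group. A partial or malformed
// pair is reported as an error instead of being routed.
func resolveGroup(cfg map[string]string) (*Group, error) {
	rawID, hasID := cfg[groupIDKey]
	rawAddrs, hasAddrs := cfg[groupAddrsKey]
	if !hasID && !hasAddrs {
		return nil, nil
	}
	if hasID != hasAddrs {
		return nil, errPartialGroupConfig
	}

	// Require a non-empty unsigned decimal ID (bit size 32 is an assumption).
	id := strings.TrimSpace(rawID)
	if _, err := strconv.ParseUint(id, 10, 32); err != nil {
		return nil, fmt.Errorf("invalid meta service group ID %q: %w", rawID, err)
	}

	// Assumption: addresses are comma-separated host:port values.
	var addrs []string
	for _, a := range strings.Split(rawAddrs, ",") {
		a = strings.TrimSpace(a)
		if a == "" {
			continue // blank entries are dropped, as in the existing tests
		}
		if _, _, err := net.SplitHostPort(a); err != nil {
			return nil, fmt.Errorf("invalid meta service address %q: %w", a, err)
		}
		addrs = append(addrs, a)
	}
	if len(addrs) == 0 {
		return nil, errors.New("meta service group has no usable addresses")
	}
	return &Group{GroupID: id, Addrs: addrs}, nil
}
```

Under these assumptions, a config with only one of the two keys, a blank or nonnumeric ID, or an entry that is not host:port fails at this boundary rather than in later connection code. Whether an ID with an empty address list should be an error or a fallback is a design decision for the PR author; the sketch picks the stricter option.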

Comment thread pkg/metaservice/etcd.go
return addrs, err
}

// GetPDAddrs returns the PD addresses from PD client.

⚠️ [Major] PD member endpoint extraction now has two canonical implementations

Why
The new metaservice package reimplements PD member retry and ClientUrls parsing instead of sharing the existing TiKV store path that already derives client endpoints from the same PD members.

Scope
pkg/metaservice/etcd.go:62 and pkg/store/driver/tikv_driver.go:300

Risk if unchanged
The two paths can drift on retry context, config override behavior, URL validation, scheme handling, and empty-member behavior. A future meta-service caller may get different endpoints than the existing store helper for the same PD members.

Evidence
GetPDHostPorts builds a tikv Backoffer, calls pdClient.GetAllMembers, parses member.ClientUrls[0], and appends host or host-with-scheme. tikvStore.EtcdAddrs already builds a Backoffer, calls s.GetPDClient().GetAllMembers, parses the same member.ClientUrls[0], and appends u.Host.

Change request
Can we keep one canonical helper for PD member endpoint extraction and have both call sites use it?
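
One possible shape for such a shared helper, sketched without the PD client dependency. The name firstClientURLs and the [][]string input (each inner slice standing in for one member's ClientUrls) are hypothetical; in real code the input would come from GetAllMembers, and the retry/Backoffer handling would stay with the caller or move into the same helper.

```go
package main

import (
	"fmt"
	"net/url"
)

// firstClientURLs (hypothetical) parses the first client URL of every PD
// member. Each inner slice stands in for one member's ClientUrls list.
// Returning parsed URLs lets each call site choose u.Host or scheme + host
// without repeating the parsing and validation rules.
func firstClientURLs(members [][]string) ([]*url.URL, error) {
	urls := make([]*url.URL, 0, len(members))
	for i, clientURLs := range members {
		// Treating an empty list as an error is a choice made for this
		// sketch; the existing helpers may behave differently.
		if len(clientURLs) == 0 {
			return nil, fmt.Errorf("PD member %d has no client URLs", i)
		}
		u, err := url.Parse(clientURLs[0])
		if err != nil {
			return nil, fmt.Errorf("PD member %d: parse %q: %w", i, clientURLs[0], err)
		}
		if u.Host == "" {
			return nil, fmt.Errorf("PD member %d: client URL %q has no host", i, clientURLs[0])
		}
		urls = append(urls, u)
	}
	return urls, nil
}
```

With a single parser like this, both the store path and the meta-service path would share the same empty-member and URL-validation behavior, which is the drift the comment is concerned about.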

Comment thread pkg/metaservice/etcd.go
}

// GetPDAddrs returns the PD addresses from PD client.
func GetPDAddrs(ctx context.Context, pdClient pd.Client, withSchema bool) ([]string, error) {

🟡 [Minor] Address helper name hides whether it returns host-ports or URLs

Why
The helper is named GetPDHostPorts, but the withSchema flag changes the result from bare host:port values into URL strings with http:// or https:// prefixes. The flag name also says schema, which is an overloaded TiDB domain term and is not the URL component being controlled.

Scope
pkg/metaservice/etcd.go:63

Risk if unchanged
Callers can pass the boolean with the wrong expectation and route a URL where a host-port is required, or the reverse, because the name does not make the contract visible at call sites.

Evidence
GetPDAddrs calls GetPDHostPorts(ctx, n.pdCli, false) for host:port, while GetPDHttpAddrs calls GetPDHostPorts(ctx, n.pdCli, true) and line 87 prepends the parsed URL prefix. Existing TiDB naming uses scheme for this concept, for example GetPDsAddrWithoutScheme in pkg/util/util.go:344.

Change request
Prefer splitting the helper by return shape, or rename the flag to includeScheme/withScheme and update the function/comment so the public contract clearly says whether it returns bare host-ports or URLs.
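
A rough sketch of the "split by return shape" option. The names pdHostPorts and pdURLs are hypothetical, and the input is simplified to a slice of client URL strings rather than PD members.

```go
package main

import (
	"fmt"
	"net/url"
)

// parseAll is a shared, unexported parser used by both public shapes.
func parseAll(clientURLs []string) ([]*url.URL, error) {
	out := make([]*url.URL, 0, len(clientURLs))
	for _, raw := range clientURLs {
		u, err := url.Parse(raw)
		if err != nil {
			return nil, fmt.Errorf("parse %q: %w", raw, err)
		}
		out = append(out, u)
	}
	return out, nil
}

// pdHostPorts (hypothetical) always returns bare host:port values.
func pdHostPorts(clientURLs []string) ([]string, error) {
	urls, err := parseAll(clientURLs)
	if err != nil {
		return nil, err
	}
	hostPorts := make([]string, 0, len(urls))
	for _, u := range urls {
		hostPorts = append(hostPorts, u.Host)
	}
	return hostPorts, nil
}

// pdURLs (hypothetical) always returns scheme://host:port strings.
func pdURLs(clientURLs []string) ([]string, error) {
	urls, err := parseAll(clientURLs)
	if err != nil {
		return nil, err
	}
	withScheme := make([]string, 0, len(urls))
	for _, u := range urls {
		withScheme = append(withScheme, u.Scheme+"://"+u.Host)
	}
	return withScheme, nil
}
```

With two functions, a call site reads as pdHostPorts(...) or pdURLs(...) instead of passing a bare true/false, so the return shape is visible without opening the helper.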

Comment thread pkg/metaservice/metamanager.go
Comment thread pkg/metaservice/etcd_test.go
return group, nil
}

// If keyspace don't have KeyspaceMetaGroupIDKey, then set keyspace meta service as global meta service.

🧹 [Nit] Fallback comment references the wrong config key

Scope
pkg/metaservice/metamanager.go:91

Change request
Please update this comment to refer to MetaServiceGroupIDKey, or describe the fallback condition without naming a nonexistent KeyspaceMetaGroupIDKey.

Comment thread pkg/metaservice/metamanager.go
Co-authored-by: D3Hunter <jujj603@gmail.com>
@ti-chi-bot

ti-chi-bot Bot commented Jun 2, 2026

@ystaticy: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| pull-build-next-gen | 07777fa | link | true | /test pull-build-next-gen |
| idc-jenkins-ci-tidb/check_dev | 07777fa | link | true | /test check-dev |
| idc-jenkins-ci-tidb/build | 07777fa | link | true | /test build |
| pull-unit-test-next-gen | 07777fa | link | true | /test pull-unit-test-next-gen |
| idc-jenkins-ci-tidb/unit-test | 07777fa | link | true | /test unit-test |


Labels

release-note-none: Denotes a PR that doesn't merit a release note.
size/XXL: Denotes a PR that changes 1000+ lines, ignoring generated files.
