*: introduce meta service group#68818
Conversation
Signed-off-by: ystaticy <y_static_y@sina.com>
|
@ystaticy I've received your pull request and will start the review. I'll conduct a thorough review covering code quality, potential issues, and implementation details. ⏳ This process typically takes 10-30 minutes depending on the complexity of the changes. ℹ️ Learn more details on Pantheon AI. |
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughAdds a new pkg/metaservice package: configuration helpers, an EtcdMetaServiceClient implementation for PD/keyspace etcd discovery (with URL parsing and leader detection), Bazel build/test targets, and unit/integration tests. ChangesMetaService package implementation
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Poem
🚥 Pre-merge checks | ✅ 3 | ❌ 2❌ Failed checks (2 warnings)
✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
Hi @ystaticy. Thanks for your PR. PRs from untrusted users cannot be marked as trusted with I understand the commands that are listed here. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
Codecov Report✅ All modified and coverable lines are covered by tests. Please upload reports for the commit 07777fa to get more accurate results. Additional details and impacted files@@ Coverage Diff @@
## master #68818 +/- ##
================================================
- Coverage 76.3025% 75.0036% -1.2990%
================================================
Files 2041 2026 -15
Lines 563407 574020 +10613
================================================
+ Hits 429894 430536 +642
- Misses 132597 143055 +10458
+ Partials 916 429 -487
🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
Actionable comments posted: 4
🧹 Nitpick comments (1)
pkg/metaservice/etcd.go (1)
56-62: ⚖️ Poor tradeoffAdd caller-scoped context to PD address fetching retry loop.
GetPDAddrsandGetPDHttpAddrscallGetPDHostPorts(context.Background(), ...), and theServiceClientinterface methods (GetPDAddrs(),GetPDHttpAddrs()) don’t accept acontext.Context, so caller cancellation/deadlines can’t stop the PD request/retry. Threadctxthrough theServiceClientinterface (e.g.,GetPDAddrs(ctx)/GetPDHttpAddrs(ctx)) and pass it toGetPDHostPorts.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@pkg/metaservice/etcd.go` around lines 56 - 62, The current GetPDAddrs and GetPDHttpAddrs use context.Background() so callers cannot cancel PD host-port retries; update the ServiceClient interface to accept a context.Context (change method signatures to GetPDAddrs(ctx context.Context) and GetPDHttpAddrs(ctx context.Context)), then modify EtcdMetaServiceClient.GetPDAddrs and EtcdMetaServiceClient.GetPDHttpAddrs to accept and forward the caller-provided ctx into GetPDHostPorts(ctx, n.pdCli, ...), and update all callers to pass their context through to these interface methods so cancellations/deadlines are honored.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@pkg/metaservice/BUILD.bazel`:
- Around line 29-30: The BUILD rule currently marks tests as flaky (flaky =
True) and shards them (shard_count = 5); remove the flaky flag (delete or set
flaky = False) and revert shard_count to a single shard while fixing the
underlying non-determinism in the tests (ensure proper setup/teardown,
isolation, use of mocks for unit tests, or move true integration tests to a
separate suite), or if you must keep them separate, move the target out of the
main test suite into an explicitly labeled unstable/integration suite instead of
marking flaky = True.
In `@pkg/metaservice/etcd.go`:
- Around line 64-90: GetPDLeaderAddrs currently returns an empty leaderAddr and
zap.Skip() when no member is identified as leader, which makes callers unable to
distinguish "no leader found" from a normal success; update GetPDLeaderAddrs to
return a non-skip zap.Field when leaderAddr == "" (and there were no call
errors) that explicitly signals "no leader found" (for example a descriptive
zap.String or zap.Any field), and when there are collected errors keep them in
the field (combine errMsgMap with the "no leader found" message) so callers can
reliably detect and log the absence of a leader; refer to GetPDLeaderAddrs,
leaderAddr and errMsgMap to locate where to set the errMsgField.
In `@pkg/metaservice/metamanager.go`:
- Around line 64-75: The code currently splits KeyspaceMetaGroupAddrsKey into
addrs with strings.Split which leaves empty strings (e.g., "" or trailing
commas) and can produce an invalid KeyspaceMetaServiceGroup with an empty
address; update the logic in the block that reads keyspaceMeta.Config (where
groupID and addrs are set and KeyspaceMetaServiceGroup is constructed) to trim
whitespace from addrsStr, split on commas, filter out any empty/blank entries
(e.g., after strings.TrimSpace), and if the resulting slice is empty return
ErrGroupNotMatch instead of creating a group with empty addresses; ensure the
log call still logs the validated KeyspaceMetaServiceGroup only when there is at
least one valid address.
- Around line 121-122: Update the GetPDLeaderAddrs signature to return (string,
error) instead of (string, zap.Field) in the interface and all implementations
(e.g., the implementation in pkg/metaservice/etcd.go); ensure implementations
return a non-nil error when no leader is found (i.e., when leaderAddr is empty)
rather than returning zap.Skip(), and return the resolved address with nil error
on success; update any callers to check the returned error and handle it instead
of relying on empty-string checks.
---
Nitpick comments:
In `@pkg/metaservice/etcd.go`:
- Around line 56-62: The current GetPDAddrs and GetPDHttpAddrs use
context.Background() so callers cannot cancel PD host-port retries; update the
ServiceClient interface to accept a context.Context (change method signatures to
GetPDAddrs(ctx context.Context) and GetPDHttpAddrs(ctx context.Context)), then
modify EtcdMetaServiceClient.GetPDAddrs and EtcdMetaServiceClient.GetPDHttpAddrs
to accept and forward the caller-provided ctx into GetPDHostPorts(ctx, n.pdCli,
...), and update all callers to pass their context through to these interface
methods so cancellations/deadlines are honored.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: feee9952-6f70-44ca-8acb-2f36dfa4518d
📒 Files selected for processing (5)
pkg/metaservice/BUILD.bazelpkg/metaservice/etcd.gopkg/metaservice/etcd_test.gopkg/metaservice/metamanager.gopkg/metaservice/metamanager_test.go
Signed-off-by: ystaticy <y_static_y@sina.com>
Signed-off-by: ystaticy <y_static_y@sina.com>
Signed-off-by: ystaticy <y_static_y@sina.com>
Signed-off-by: ystaticy <y_static_y@sina.com>
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
pkg/metaservice/etcd.go (2)
109-125:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winFail fast when discovery yields no usable PD address.
If
GetAllMemberssucceeds but every member has an emptyClientUrlslist, this returnsnilerror withlen(pdAddrs) == 0. That makes discovery look successful and pushes the failure to downstream connection code instead of surfacing it here.🔧 Suggested guard
for _, member := range members.GetMembers() { if len(member.ClientUrls) > 0 { prefix, host, port, err := ParseURL(member.ClientUrls[0]) if err != nil { return nil, fmt.Errorf("parse client url from pd members %q: %w", member.ClientUrls[0], err) } var pdAddr string if hasPrefix { pdAddr = prefix + host + ":" + port // http://ip:port } else { pdAddr = host + ":" + port // ip:port } pdAddrs = append(pdAddrs, pdAddr) } } + if len(pdAddrs) == 0 { + return nil, errors.New("no PD client URLs returned by GetAllMembers") + } return pdAddrs, nil } }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@pkg/metaservice/etcd.go` around lines 109 - 125, The loop building pdAddrs from members.GetMembers() can return an empty slice with a nil error when all members have empty ClientUrls; after the loop in the function that calls ParseURL and appends to pdAddrs (the block handling member.ClientUrls and building pdAddr), add a guard that if len(pdAddrs) == 0 then return nil, fmt.Errorf("no usable PD client URLs discovered from members") (or similar descriptive error). Ensure this check is placed before returning pdAddrs so discovery failures are surfaced immediately.
64-90:⚠️ Potential issue | 🟠 Major | ⚡ Quick winUse PD’s leader discovery instead of etcd
Status()
GetPDLeaderAddrscurrently derives the “PD leader” by calling etcdStatus()and checking whetherstatus.Leader == status.Header.MemberId; that comparison only identifies the etcd raft leader for the queried endpoint (whileHeader.MemberIdis just the responding etcd member), not the current PD leader. PD’s client already providesGetLeaderURL()for the PD leader (returns""until synced), soGetPDLeaderAddrsshould usen.pdCli.GetLeaderURL()and normalize it viaParseURL(return a helpful error when the leader URL is empty or can’t be parsed).🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@pkg/metaservice/etcd.go` around lines 64 - 90, GetPDLeaderAddrs currently uses etcd.Status to infer PD leader which is incorrect; replace that logic to call n.pdCli.GetLeaderURL(), check that the returned URL is non-empty, parse it with ParseURL to derive the address, and return a clear error if the leader URL is empty or ParseURL fails. Update GetPDLeaderAddrs to stop iterating n.KeyspaceEtcdCli.Endpoints()/Status(), instead call n.pdCli.GetLeaderURL(), handle the "" case with an informative error, normalize the leader URL via ParseURL and return the parsed address or the parse error.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Outside diff comments:
In `@pkg/metaservice/etcd.go`:
- Around line 109-125: The loop building pdAddrs from members.GetMembers() can
return an empty slice with a nil error when all members have empty ClientUrls;
after the loop in the function that calls ParseURL and appends to pdAddrs (the
block handling member.ClientUrls and building pdAddr), add a guard that if
len(pdAddrs) == 0 then return nil, fmt.Errorf("no usable PD client URLs
discovered from members") (or similar descriptive error). Ensure this check is
placed before returning pdAddrs so discovery failures are surfaced immediately.
- Around line 64-90: GetPDLeaderAddrs currently uses etcd.Status to infer PD
leader which is incorrect; replace that logic to call n.pdCli.GetLeaderURL(),
check that the returned URL is non-empty, parse it with ParseURL to derive the
address, and return a clear error if the leader URL is empty or ParseURL fails.
Update GetPDLeaderAddrs to stop iterating
n.KeyspaceEtcdCli.Endpoints()/Status(), instead call n.pdCli.GetLeaderURL(),
handle the "" case with an informative error, normalize the leader URL via
ParseURL and return the parsed address or the parse error.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: f1818bcf-5863-4a9e-8d87-a960af900afd
📒 Files selected for processing (1)
pkg/metaservice/etcd.go
Signed-off-by: ystaticy <y_static_y@sina.com>
| GlobalGroupID = "0" | ||
| // KeyspaceMetaGroupIDKey is a keyspace meta config key name, | ||
| // the value of this key is meta service group id for this keyspace. | ||
| KeyspaceMetaGroupIDKey = "meta_service_group_id" |
There was a problem hiding this comment.
if there are TSO group later, shouldn't we name this key as keyspace......group_id
There was a problem hiding this comment.
In my opinion, "Meta service group" and meta_service_group_id are clear concepts. We should keep this naming consistent rather than introducing new names.
Signed-off-by: ystaticy <y_static_y@sina.com>
Signed-off-by: ystaticy <y_static_y@sina.com>
Signed-off-by: ystaticy <y_static_y@sina.com>
Signed-off-by: ystaticy <y_static_y@sina.com>
Signed-off-by: ystaticy <y_static_y@sina.com>
Signed-off-by: ystaticy <y_static_y@sina.com>
|
/retest-required |
|
@ystaticy: PRs from untrusted users cannot be marked as trusted with DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
Signed-off-by: ystaticy <y_static_y@sina.com>
Signed-off-by: ystaticy <y_static_y@sina.com>
Signed-off-by: ystaticy <y_static_y@sina.com>
Signed-off-by: ystaticy <y_static_y@sina.com>
Signed-off-by: ystaticy <y_static_y@sina.com>
Signed-off-by: ystaticy <y_static_y@sina.com>
Signed-off-by: ystaticy <y_static_y@sina.com>
Signed-off-by: ystaticy <y_static_y@sina.com>
Signed-off-by: ystaticy <y_static_y@sina.com>
|
/retest-required |
|
@ystaticy: PRs from untrusted users cannot be marked as trusted with DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
Signed-off-by: ystaticy <y_static_y@sina.com>
Signed-off-by: ystaticy <y_static_y@sina.com>
Signed-off-by: ystaticy <y_static_y@sina.com>
Signed-off-by: ystaticy <y_static_y@sina.com>
| return "", "", "", fmt.Errorf("invalid URL prefix") | ||
| } | ||
|
|
||
| host, port, err = parseHostPort(u.Host) |
There was a problem hiding this comment.
if we will JoinHostPort in caller all the time, why split here?
| // Info includes the global meta service address and the TiDB meta service group info. | ||
| type Info struct { | ||
| PDAddrs []string | ||
| GlobalMetaServiceAddrs []string |
There was a problem hiding this comment.
| GlobalMetaServiceAddrs []string | |
| GlobalAddrs []string |
| } | ||
|
|
||
| // GetKeyspaceMetaServiceGroup return keyspace meta service group. | ||
| func GetKeyspaceMetaServiceGroup(keyspaceMeta *keyspacepb.KeyspaceMeta, globalMetaAddrs []string) (*Group, error) { |
There was a problem hiding this comment.
| func GetKeyspaceMetaServiceGroup(keyspaceMeta *keyspacepb.KeyspaceMeta, globalMetaAddrs []string) (*Group, error) { | |
| func GetGroup(keyspaceMeta *keyspacepb.KeyspaceMeta, globalMetaAddrs []string) (*Group, error) { |
| } | ||
|
|
||
| // GetMetaServiceInfo return meta service info. | ||
| func GetMetaServiceInfo(keyspaceMeta *keyspacepb.KeyspaceMeta, globalMetaAddrs []string, pdAddrs []string) (*Info, error) { |
There was a problem hiding this comment.
| func GetMetaServiceInfo(keyspaceMeta *keyspacepb.KeyspaceMeta, globalMetaAddrs []string, pdAddrs []string) (*Info, error) { | |
| func GetInfo(keyspaceMeta *keyspacepb.KeyspaceMeta, globalMetaAddrs []string, pdAddrs []string) (*Info, error) { |
Signed-off-by: ystaticy <y_static_y@sina.com>
D3Hunter
left a comment
There was a problem hiding this comment.
Summary
- Total findings: 9
- Inline comments: 9
- Summary-only findings (no inline anchor): 0
Findings (highest risk first)
⚠️ [Major] (2)
- Keyspace meta-service group metadata can become invalid routing state (
pkg/metaservice/metamanager.go:65) - PD member endpoint extraction now has two canonical implementations (
pkg/metaservice/etcd.go:62 and pkg/store/driver/tikv_driver.go:300)
🟡 [Minor] (5)
- PD address discovery drops repeated client URLs (
pkg/metaservice/etcd.go:78) - Address helper name hides whether it returns host-ports or URLs (
pkg/metaservice/etcd.go:63) - Public metaservice surface is exported before production usage establishes the API (
pkg/metaservice/metamanager.go:27, pkg/metaservice/metamanager.go:45, pkg/metaservice/metamanager.go:130, and pkg/metaservice/etcd.go:62) - Test-only sentinel error leaks into the public contract with unclear semantics (
pkg/metaservice/metamanager.go:39) - Unit test starts etcd even though the exercised path does not use it (
pkg/metaservice/etcd_test.go:58)
🧹 [Nit] (2)
- Fallback comment references the wrong config key (
pkg/metaservice/metamanager.go:91) - Meta-service storage TODO lacks an owner or removal condition (
pkg/metaservice/metamanager.go:64)
| } | ||
| var group *Group | ||
| // TODO: Refactor meta service group storage format by moving it from config to dedicated fields in keyspace meta. | ||
| if val, ok := keyspaceMeta.Config[MetaServiceGroupIDKey]; ok { |
There was a problem hiding this comment.
⚠️ [Major] Keyspace meta-service group metadata can become invalid routing state
Why
GetKeyspaceMetaServiceGroup treats the presence of meta_service_group_id as enough to enter the keyspace-specific routing branch, copies the raw value into GroupID, and accepts any remaining non-empty address string after trimming blanks. It also does not reject the inverse partial config where meta_service_group_addrs exists without a group ID.
Scope
pkg/metaservice/metamanager.go:65
Risk if unchanged
Malformed persisted keyspace metadata can either route a keyspace with an empty or nonnumeric group ID, report corrupt address values as a successful group lookup, or silently fall back to the global group while ignoring configured meta-service addresses. That can send requests to the wrong meta service or defer the failure to later connection/setup code with less precise diagnostics.
Evidence
The function assigns groupID := val after checking only key presence, returns Group{GroupID: groupID, Addrs: addrs} once len(addrs) > 0, and otherwise falls through to GlobalGroupID when addresses are present without MetaServiceGroupIDKey. Existing tests cover valid numeric strings, blank addresses, missing addresses after an ID, and no keys, but not blank IDs, nonnumeric IDs, addresses-only metadata, or malformed address values. Adjacent PD keyspace group handling parses IDs with strconv.ParseUint before mutating group membership.
Change request
Please validate this metadata as an all-or-nothing pair at this boundary: trim and require a non-empty unsigned decimal group ID, reject addresses without a group ID, and either validate the accepted address format or document it explicitly. Add negative tests for blank ID, nonnumeric ID, addresses-only metadata, and malformed address values.
| return addrs, err | ||
| } | ||
|
|
||
| // GetPDAddrs returns the PD addresses from PD client. |
There was a problem hiding this comment.
⚠️ [Major] PD member endpoint extraction now has two canonical implementations
Why
The new metaservice package reimplements PD member retry and ClientUrls parsing instead of sharing the existing TiKV store path that already derives client endpoints from the same PD members.
Scope
pkg/metaservice/etcd.go:62 and pkg/store/driver/tikv_driver.go:300
Risk if unchanged
The two paths can drift on retry context, config override behavior, URL validation, scheme handling, and empty-member behavior. A future meta-service caller may get different endpoints than the existing store helper for the same PD members.
Evidence
GetPDHostPorts builds a tikv Backoffer, calls pdClient.GetAllMembers, parses member.ClientUrls[0], and appends host or host-with-scheme. tikvStore.EtcdAddrs already builds a Backoffer, calls s.GetPDClient().GetAllMembers, parses the same member.ClientUrls[0], and appends u.Host.
Change request
Can we keep one canonical helper for PD member endpoint extraction and have both call sites use it?
| } | ||
|
|
||
| // GetPDAddrs returns the PD addresses from PD client. | ||
| func GetPDAddrs(ctx context.Context, pdClient pd.Client, withSchema bool) ([]string, error) { |
There was a problem hiding this comment.
🟡 [Minor] Address helper name hides whether it returns host-ports or URLs
Why
The helper is named GetPDHostPorts, but the withSchema flag changes the result from bare host:port values into URL strings with http:// or https:// prefixes. The flag name also says schema, which is an overloaded TiDB domain term and is not the URL component being controlled.
Scope
pkg/metaservice/etcd.go:63
Risk if unchanged
Callers can pass the boolean with the wrong expectation and route a URL where a host-port is required, or the reverse, because the name does not make the contract visible at call sites.
Evidence
GetPDAddrs calls GetPDHostPorts(ctx, n.pdCli, false) for host:port, while GetPDHttpAddrs calls GetPDHostPorts(ctx, n.pdCli, true) and line 87 prepends the parsed URL prefix. Existing TiDB naming uses scheme for this concept, for example GetPDsAddrWithoutScheme in pkg/util/util.go:344.
Change request
Prefer splitting the helper by return shape, or rename the flag to includeScheme/withScheme and update the function/comment so the public contract clearly says whether it returns bare host-ports or URLs.
| return group, nil | ||
| } | ||
|
|
||
| // If keyspace don't have KeyspaceMetaGroupIDKey, then set keyspace meta service as global meta service. |
There was a problem hiding this comment.
🧹 [Nit] Fallback comment references the wrong config key
Scope
pkg/metaservice/metamanager.go:91
Change request
Please update this comment to refer to MetaServiceGroupIDKey, or describe the fallback condition without naming a nonexistent KeyspaceMetaGroupIDKey.
Co-authored-by: D3Hunter <jujj603@gmail.com>
|
@ystaticy: The following tests failed, say
Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
What problem does this PR solve?
Issue Number: ref #68338
Problem Summary:
What changed and how does it work?
What changed:
KeyspaceMetaServiceGroupand a top-levelInfoobject from keyspace metadata plus global meta service addresses.meta_service_group_idandmeta_service_group_addrsfrom keyspace config, and fall back to the global meta service group when no per-keyspace group is configured.ServiceClientimplementation to discover PD service addresses, PD HTTP addresses, and the current leader address.http,https, andunixendpoints, including default ports and IPv6 handling.How it works:
GetKeyspaceMetaServiceGroupinspects keyspace metadata and decides which meta service group the keyspace should use.KeyspaceMetaServiceGroup.GroupID = "0") and uses the provided global meta service addresses.GetMetaServiceInfocombines the selected keyspace group, global meta service addresses, and PD addresses into oneInfostructure for upper layers to consume.GetAllMembers()to collect PD endpoints and uses etcdStatus()on each endpoint to identify the current leader.Scope note:
This PR only adds the reusable metaservice package and its tests. It does not yet wire the new logic into broader TiDB runtime paths such as
pkg/store,pkg/domain, orpkg/session.Check List
Tests
Side effects
Documentation
Release note
Please refer to Release Notes Language Style Guide to write a quality release note.
Summary by CodeRabbit
New Features
Tests
Chores