sessionctx: add cluster-wide read-only status variable#68853
Conversation
|
@bb7133 I've received your pull request and will start the review. I'll conduct a thorough review covering code quality, potential issues, and implementation details. ⏳ This process typically takes 10-30 minutes depending on the complexity of the changes. ℹ️ Learn more details on Pantheon AI. |
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
📝 WalkthroughWalkthroughAdds cluster-wide read-only plumbing and a read-only SQL-visible variable by registering vardef callbacks and reporter, extending ServerInfo with read-only fields, persisting per-instance status via Syncer, aggregating effective read-only across instances, and wiring tests and sysvar handlers. ChangesCluster-wide Read-Only Status Variable
Sequence Diagram(s)sequenceDiagram
participant User
participant SysVar as tidb_is_read_only
participant Vardef
participant Checker as getClusterReadOnlyStatus
participant Syncer
participant Instances as LiveTiDBInstances
User->>SysVar: SELECT @@global.tidb_is_read_only
SysVar->>Vardef: GetClusterReadOnlyStatus(ctx)
Vardef->>Checker: invoke checker(ctx)
Checker->>Syncer: GetAllServerInfo()
Syncer->>Instances: gather DynamicInfo per instance
Instances-->>Syncer: map[string]ServerInfo{DynamicInfo{TiDBEffectiveReadOnly}}
Syncer-->>Checker: serverInfoMap
Checker->>Checker: clusterReadOnlyStatusFromServerInfo()
alt all live instances EffectiveReadOnly==true
Checker-->>Vardef: true
else any missing/false
Checker-->>Vardef: false
end
Vardef-->>SysVar: ON/OFF
SysVar-->>User: ON/OFF
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Suggested labels
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (2)
pkg/domain/serverinfo/syncer.go (1)
241-241: 💤 Low valueGood optimization using cached info.
When etcd is unavailable, this now returns the cached
infodirectly instead of reconstructing viagetServerInfo(). This is more efficient and consistent with the already-loaded state.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@pkg/domain/serverinfo/syncer.go` at line 241, The current change should return the cached ServerInfo from the allInfo map instead of reconstructing it via getServerInfo() when etcd is unavailable; update the code path that currently calls getServerInfo() to fetch and return allInfo[info.ID] (the cached value) and ensure callers of the sync routine use that cached ServerInfo, referencing the allInfo map and the getServerInfo function to locate the logic to modify.pkg/domain/infosync/info.go (1)
351-366: 💤 Low valueSilent failure when info syncer unavailable is intentional.
updateServerReadOnlyStatusreturnsnilwhengetGlobalInfoSyncer()fails (line 358-360). This is correct: if the syncer isn't initialized (e.g., during startup or in tests), we can't publish status to etcd, but we shouldn't fail theSEToperation. The local variable state will still be updated via the direct atomic stores invardef.Optional: add debug logging
If you want observability for this path:
func updateServerReadOnlyStatus(ctx context.Context) error { is, err := getGlobalInfoSyncer() if err != nil { + logutil.BgLogger().Debug("info syncer not available, skipping read-only status publish", zap.Error(err)) return nil }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@pkg/domain/infosync/info.go` around lines 351 - 366, The current updateServerReadOnlyStatus function intentionally returns nil when getGlobalInfoSyncer() fails (silent on uninitialized syncer); preserve that behavior but improve observability by logging the error at debug/info level instead of dropping it: in updateServerReadOnlyStatus, after calling getGlobalInfoSyncer(), if err != nil call the package logger (or processLogger) to emit a contextual debug message including the error and the fact the info syncer is uninitialized, then return nil; leave the subsequent call to is.svrInfoSyncer.UpdateServerReadOnlyStatus and the use of vardef.RestrictedReadOnly.Load()/vardef.VarTiDBSuperReadOnly.Load() unchanged.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@pkg/sessionctx/variable/sysvar.go`:
- Around line 84-88: The build is failing because reportTiDBReadOnlyStatus calls
vardef.ReportReadOnlyStatus and vardef.GetClusterReadOnlyStatus which do not
exist in the current plumbing; update the call sites (e.g., in
reportTiDBReadOnlyStatus and the other occurrences around lines shown) to use
the actual exported API names from the plumbing layer or export these symbols
from vardef: either change vardef.ReportReadOnlyStatus -> the real function name
provided by the plumbing (and likewise for GetClusterReadOnlyStatus), or add
matching ReportReadOnlyStatus/GetClusterReadOnlyStatus wrappers in the vardef
package that forward to the plumbing implementation so the references in
reportTiDBReadOnlyStatus compile.
---
Nitpick comments:
In `@pkg/domain/infosync/info.go`:
- Around line 351-366: The current updateServerReadOnlyStatus function
intentionally returns nil when getGlobalInfoSyncer() fails (silent on
uninitialized syncer); preserve that behavior but improve observability by
logging the error at debug/info level instead of dropping it: in
updateServerReadOnlyStatus, after calling getGlobalInfoSyncer(), if err != nil
call the package logger (or processLogger) to emit a contextual debug message
including the error and the fact the info syncer is uninitialized, then return
nil; leave the subsequent call to is.svrInfoSyncer.UpdateServerReadOnlyStatus
and the use of
vardef.RestrictedReadOnly.Load()/vardef.VarTiDBSuperReadOnly.Load() unchanged.
In `@pkg/domain/serverinfo/syncer.go`:
- Line 241: The current change should return the cached ServerInfo from the
allInfo map instead of reconstructing it via getServerInfo() when etcd is
unavailable; update the code path that currently calls getServerInfo() to fetch
and return allInfo[info.ID] (the cached value) and ensure callers of the sync
routine use that cached ServerInfo, referencing the allInfo map and the
getServerInfo function to locate the logic to modify.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: ab2e1649-8775-4a17-8a63-b05967e64199
📒 Files selected for processing (8)
pkg/domain/infosync/info.gopkg/domain/infosync/info_test.gopkg/domain/serverinfo/info.gopkg/domain/serverinfo/syncer.gopkg/sessionctx/vardef/readonly.gopkg/sessionctx/vardef/tidb_vars.gopkg/sessionctx/variable/sysvar.gopkg/sessionctx/variable/sysvar_test.go
| func reportTiDBReadOnlyStatus(ctx context.Context) { | ||
| if err := vardef.ReportReadOnlyStatus(ctx); err != nil { | ||
| logutil.BgLogger().Warn("update TiDB read-only status failed", zap.Error(err)) | ||
| } | ||
| } |
There was a problem hiding this comment.
Fix the vardef API mismatch before merge.
Crossbuild is failing here because vardef.ReportReadOnlyStatus and vardef.GetClusterReadOnlyStatus are undefined. This PR will not compile until the plumbing layer exports these symbols or these call sites are updated to the actual API names.
Also applies to: 1014-1020
🧰 Tools
🪛 GitHub Check: Bazel Crossbuild (ubuntu-24.04-arm)
[failure] 85-85:
undefined: vardef.ReportReadOnlyStatus
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@pkg/sessionctx/variable/sysvar.go` around lines 84 - 88, The build is failing
because reportTiDBReadOnlyStatus calls vardef.ReportReadOnlyStatus and
vardef.GetClusterReadOnlyStatus which do not exist in the current plumbing;
update the call sites (e.g., in reportTiDBReadOnlyStatus and the other
occurrences around lines shown) to use the actual exported API names from the
plumbing layer or export these symbols from vardef: either change
vardef.ReportReadOnlyStatus -> the real function name provided by the plumbing
(and likewise for GetClusterReadOnlyStatus), or add matching
ReportReadOnlyStatus/GetClusterReadOnlyStatus wrappers in the vardef package
that forward to the plumbing implementation so the references in
reportTiDBReadOnlyStatus compile.
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## master #68853 +/- ##
================================================
+ Coverage 76.3085% 76.8546% +0.5461%
================================================
Files 2041 2054 +13
Lines 563262 571445 +8183
================================================
+ Hits 429817 439182 +9365
+ Misses 132529 130515 -2014
- Partials 916 1748 +832
Flags with carried forward coverage won't be shown. Click here to find out more.
🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
pkg/domain/serverinfo/syncer.go (1)
184-214:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winConfirm read-only and label updates can't race on
s.info.
UpdateServerReadOnlyStatusandUpdateServerLabelboth perform an unsynchronized read‑modify‑write on the atomics.infopointer (clone currentDynamicInfo→ mutate →setDynamicServerInfo/Clone+store).Syncerhas no mutex. If a label change and a read-only-status change run concurrently, one of them clones a snapshot taken before the other's store, so the later store can silently drop the other's update (lost update). The atomic pointer only guarantees a torn-free swap, not RMW atomicity.If these callers are already serialized upstream (e.g. both flow through a single config/sysvar apply path), this is moot. Otherwise consider guarding the RMW sections with a small
sync.Mutex.#!/bin/bash # Find call sites of the two RMW methods to check whether they can run concurrently. rg -nP -C3 '\b(UpdateServerReadOnlyStatus|UpdateServerLabel)\s*\(' --type=go -g '!**/serverinfo/syncer.go' # Check whether Syncer already has any lock field guarding s.info. ast-grep --pattern 'type Syncer struct { $$$ }'🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@pkg/domain/serverinfo/syncer.go` around lines 184 - 214, UpdateServerReadOnlyStatus and UpdateServerLabel perform unsynchronized read-modify-write on the atomic s.info (via cloneDynamicServerInfo, setDynamicServerInfo and Clone+store) which can cause lost updates; add a small sync.Mutex (e.g., infoMu) to the Syncer struct and use it to serialize the RMW critical sections in both UpdateServerReadOnlyStatus and UpdateServerLabel (lock before cloning/mutating and unlock after setDynamicServerInfo/store) so updates cannot race; ensure you only hold the mutex for the minimal span covering clone -> mutate -> set/store to avoid contention.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Outside diff comments:
In `@pkg/domain/serverinfo/syncer.go`:
- Around line 184-214: UpdateServerReadOnlyStatus and UpdateServerLabel perform
unsynchronized read-modify-write on the atomic s.info (via
cloneDynamicServerInfo, setDynamicServerInfo and Clone+store) which can cause
lost updates; add a small sync.Mutex (e.g., infoMu) to the Syncer struct and use
it to serialize the RMW critical sections in both UpdateServerReadOnlyStatus and
UpdateServerLabel (lock before cloning/mutating and unlock after
setDynamicServerInfo/store) so updates cannot race; ensure you only hold the
mutex for the minimal span covering clone -> mutate -> set/store to avoid
contention.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: 294e6206-c2fd-44ab-9ffb-01a52c65860c
📒 Files selected for processing (2)
pkg/domain/serverinfo/syncer.gopkg/domain/serverinfo/syncer_test.go
|
/retest |
1 similar comment
|
/retest |
| } | ||
|
|
||
| // UpdateServerReadOnlyStatus updates the local TiDB read-only status in the info syncer. | ||
| func UpdateServerReadOnlyStatus(ctx context.Context) error { |
There was a problem hiding this comment.
There is race here. UpdateServerLabel and UpdateServerReadOnlyStatus both clone DynamicInfo, change one field group, then write the whole struct back.
Bug case: label update clones read_only=false; read-only update writes read_only=true; label update then writes its old snapshot with new labels, restoring read_only=false. After that, tidb_is_read_only may return OFF even though this TiDB is already read-only.
What problem does this PR solve?
Issue Number: close #68852
Problem Summary:
TiDB has
tidb_restricted_read_onlyandtidb_super_read_only, but there is no SQL-visible status that tells operators whether every live TiDB instance has applied effective read-only state.What changed and how does it work?
This PR adds read-only global sysvar
tidb_is_read_only.tidb_is_read_onlyreturnsONonly when every live TiDB instance publishes effective read-only state through the existing server info sync path:If there is no live TiDB info, or any live TiDB reports
effective_read_only = false, the aggregate returnsOFF.Implementation details:
tidb_is_read_onlyas a read-only global sysvar.serverinfo.DynamicInfo.tidb_restricted_read_onlyortidb_super_read_onlychanges.Check List
Tests
Side effects
tidb_restricted_read_only/tidb_super_read_onlyenforcementRelease note
Summary by CodeRabbit
New Features
Tests
Chores