stats: add new two-levels symbol table by wbpcode · Pull Request #45359 · envoyproxy/envoy

wbpcode · 2026-05-30T12:51:44Z

Commit Message: stats: add new two-levels symbol table
Additional Description:

In Envoy stats system, we use encoded symbol to represent a string to avoid frequent string copy/memory allocation/de-allocation. Reference counter and global symbol maps are used to manage mapping of symbols and origin strings. This design provides great memory efficiency.

However, for every symbol’s encoding (string to symbol) and decoding (symbol to string), we need to access the global symbol map under guard of a global lock. In practice, we have lots of solutions to avoid frequent encoding.

But when we need to sink the stats out, we still need to decode all symbols of stats which make the sinking take lots of CPU when there are lots of stats.

If we observe all these stats, we will found lots of symbols/strings are well known like upstream_rq. If we use a const array/map to manage this type of symbols, then we needn’t to access the global symbol map when decoding/encoding this type of symbols. We call this type of symbol as well known symbols.

For the well known symbols:

no need to manage the reference of related string tokens because these strings are built in const value.
no need to take the global lock for these symbols because the mapping is const based on const array and map.

decoding of these symbols will be done by accessing array based on index.

Benchmark	Before	After	Latency Δ	Speedup	Throughput
`bmK8sClusterStatsToString` (decode → string)	104 ns	62.3 ns	−40.1%	1.67×	6.77 M/s → 11.31 M/s (+67.2%)
`bmK8sClusterStatsEncode` (string → StatName)	234 ns	184 ns	−21.4%	1.27×	2.97 M/s → 3.83 M/s (+29.0%)

Risk Level:
Testing:
Docs Changes:
Release Notes:
Platform Specific Features:
[Optional Runtime guard:]
[Optional Fixes #Issue]
[Optional Fixes commit #PR or SHA]
[Optional Deprecated:]
[Optional API Considerations:]

Signed-off-by: wbpcode/wangbaiping <wbphub@gmail.com>

nezdolik · 2026-06-02T09:17:18Z

cc @jmarantz (not sure how much are you involved with Envoy project nowadays but this could be smth interesting for you)

krinkinmu

I think gcc test failures might not be a random flake here, so it might make sense to bump the memory limits in this PR or before merging this PR. Other than that, I follow the implementation and reasoning behind it and it looks sensible to me.

krinkinmu · 2026-06-02T09:29:21Z

   * @return Symbol the encoded string.
   */
-  Symbol toSymbol(absl::string_view sv) ABSL_EXCLUSIVE_LOCKS_REQUIRED(lock_);
+  Symbol toUnWellKnownSymbol(absl::string_view sv, size_t hash)


nit: toNotWellKnownSymbol sounds a bit more natural to me, though I'm not a native speaker. Though, I would suggest to consider calling it toUnknownSymbol and drop the "well" part here all together.

agree: toUnknownSymbol would be more idiomatic.

jmarantz

before I review more deeply, can you provide a higher level description of the problem you are trying to fix?

My thinking during design was that typically we do not have to do named lookups in the hot-path.

Also note there are constructs like StatNameSet that might serve this kind of role without adding complexity to the infrrastructure.

jmarantz · 2026-06-02T12:19:54Z

+
+// The well-known token table lives in well_known_tokens.h. WellKnownTokensNum is
+// the number of reserved symbol slots (including the unused index-0 sentinel).
+static constexpr Symbol WellKnownTokensNum = sizeof(kWellKnownTokens) / sizeof(kWellKnownTokens[0]);


there's an ARRAYSIZE macro of some sort.

jmarantz · 2026-06-02T12:20:45Z

+// Lock-free encode-side lookup: token string -> symbol ID. Built once at first
+// access; safe for concurrent readers afterward.
+const absl::flat_hash_map<absl::string_view, Symbol>& wellKnownEncodeMap() {
+  static const auto* m = []() {


I don't think this pattern is needed; you can just construct this in the symbol table.

It is safe for concurrent readers but will require some sort of synchronization created by the compiler, so it's better just to make this part of the context, since we do have a context for this.

Sounds great.

jmarantz · 2026-06-02T12:23:45Z

   * @return Symbol the encoded string.
   */
-  Symbol toSymbol(absl::string_view sv) ABSL_EXCLUSIVE_LOCKS_REQUIRED(lock_);
+  Symbol toUnWellKnownSymbol(absl::string_view sv, size_t hash)


agree: toUnknownSymbol would be more idiomatic.

jmarantz · 2026-06-02T12:25:12Z

+      }
+      ++count;
+    }
+    return count;


I think this somewhat complicated function should be exposed, perhaps as a private static member funciton, so it can be unit-tested and fuzzed.

I will give it a try

wbpcode · 2026-06-02T16:29:16Z

before I review more deeply, can you provide a higher level description of the problem you are trying to fix?

My thinking during design was that typically we do not have to do named lookups in the hot-path.

Also note there are constructs like StatNameSet that might serve this kind of role without adding complexity to the infrrastructure.

Hello, Josh. Thanks for your review and i revised the description and hope that make sense for you.

In short sentences, this PR try to make the stats sinking more efficient. With this PR, when sinking stats, lots of symbols/StatNames’ decoding could be done by simple array accessing without lock. And the benchmark showed the improvement.

jmarantz · 2026-06-02T17:40:24Z

Oh this is to solve the stat sinking bottleneck. I don't think this is going to materially impact that, the bottleneck is on decrementing the refcount of the whole stat, not managing the StatNames.

PTAL @ the benchmarks in #43958 -- this is a full solution to this problem (though not mergeable yet due to possible races). Ping on slack to chat.

jmarantz · 2026-06-02T17:43:08Z

I think this is an interesting idea overall. I'm somewaht against adding the complexity and feel like this makes faster something we should try to do less of anyway.

But if we can prove a higher level benefit that isn't solved some other way then I think this sort of thing can be done. But StattNameSet is more general.

wbpcode · 2026-06-03T01:11:21Z

PTAL @ the benchmarks in #43958 -- this is a full solution to this problem (though not mergeable yet due to possible races). Ping on slack to chat.

#43958 is great change and I guess we should pick it up again? But not, there is no conflict between this PR and #43958.

In actual practice, there three phases for sinking: 1. create snapshot. 2. serialize/decode the symbols/StatNames to plain string. 3. send serialized stats out.

The #43958 is used to optimized the phase 1, and this PR is mainly try to optimize the phase 2. We should try to land both.

About the complexity, I think the complexity is trivial because this two-levels registry/map (static table + dynamic table) is pretty straightforward. IMO, It's CPU wastage to maintain the reference counter and take the lock for mapping the cluster/upstream_rq/... to symbols (or mapping symbols to cluster/upstram_rq/...). For these type of well known string tokens, assign a fixed symbol is simple and efficient.

jmarantz · 2026-06-03T02:09:02Z

In my work in the past, I don't think the stat name serializaiton was a bottleneck in practice. In sinks, for example, we typically only need to serialize stats with used()==true which removes most of them. If you are really sinking all the stats in a large system you are going to overwhelm your time-series DB.

Before moving forward with this PR I think we should get flame-graph of a sink in action. I have looked at some recently which is why I started on #43958 . Otherwise you are adding complexity without much benefit. IIUC from the comments, this PR also causes memory bloat requiring a regold, which is a breaking change for some applications.

We will pick that up again but there's some related PRs that need to be cleaned up and submitted first, to avoid a race exposed by that PR.

wbpcode · 2026-06-03T03:48:05Z

cc @jmarantz In my previous experience, for a long-running scale environment, the used()=true helps but there are still lots of stats to serialize. Although this part overhead is not dominant, I still think it should be good to optimize the serialization also with very trivial complexity (but we may have different thoughts on this complexity.) And I also want to avoid fragmented/unnecessary string allocation/copy and so on in following PRs.

But anyway, I won't stick with this for now. We can delay this to future, after you picked the lowest/valuable fruit first.

wbpcode added 2 commits May 30, 2026 12:36

new benchmark test

b6ba521

Signed-off-by: wbpcode/wangbaiping <wbphub@gmail.com>

stats: add new two-levels symbol table

198c86b

Signed-off-by: wbpcode/wangbaiping <wbphub@gmail.com>

nezdolik assigned krinkinmu Jun 2, 2026

krinkinmu approved these changes Jun 2, 2026

View reviewed changes

jmarantz reviewed Jun 2, 2026

View reviewed changes

wbpcode marked this pull request as draft June 3, 2026 03:34

Conversation

wbpcode commented May 30, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

nezdolik commented Jun 2, 2026

Uh oh!

krinkinmu left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

jmarantz left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

wbpcode commented Jun 2, 2026

Uh oh!

jmarantz commented Jun 2, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

jmarantz commented Jun 2, 2026

Uh oh!

wbpcode commented Jun 3, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

jmarantz commented Jun 3, 2026

Uh oh!

wbpcode commented Jun 3, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

wbpcode commented May 30, 2026 •

edited

Loading

jmarantz commented Jun 2, 2026 •

edited

Loading

wbpcode commented Jun 3, 2026 •

edited

Loading