Skip to content

stats: add new two-levels symbol table#45359

Draft
wbpcode wants to merge 2 commits into
envoyproxy:mainfrom
wbpcode:dev-two-level-sysmbol-table
Draft

stats: add new two-levels symbol table#45359
wbpcode wants to merge 2 commits into
envoyproxy:mainfrom
wbpcode:dev-two-level-sysmbol-table

Conversation

@wbpcode
Copy link
Copy Markdown
Member

@wbpcode wbpcode commented May 30, 2026

Commit Message: stats: add new two-levels symbol table
Additional Description:

In Envoy stats system, we use encoded symbol to represent a string to avoid frequent string copy/memory allocation/de-allocation. Reference counter and global symbol maps are used to manage mapping of symbols and origin strings. This design provides great memory efficiency.

However, for every symbol’s encoding (string to symbol) and decoding (symbol to string), we need to access the global symbol map under guard of a global lock. In practice, we have lots of solutions to avoid frequent encoding.

But when we need to sink the stats out, we still need to decode all symbols of stats which make the sinking take lots of CPU when there are lots of stats.

If we observe all these stats, we will found lots of symbols/strings are well known like upstream_rq. If we use a const array/map to manage this type of symbols, then we needn’t to access the global symbol map when decoding/encoding this type of symbols. We call this type of symbol as well known symbols.

For the well known symbols:

  • no need to manage the reference of related string tokens because these strings are built in const value.

  • no need to take the global lock for these symbols because the mapping is const based on const array and map.

  • decoding of these symbols will be done by accessing array based on index.

    Benchmark Before After Latency Δ Speedup Throughput
    bmK8sClusterStatsToString (decode → string) 104 ns 62.3 ns −40.1% 1.67× 6.77 M/s → 11.31 M/s (+67.2%)
    bmK8sClusterStatsEncode (string → StatName) 234 ns 184 ns −21.4% 1.27× 2.97 M/s → 3.83 M/s (+29.0%)

Risk Level:
Testing:
Docs Changes:
Release Notes:
Platform Specific Features:
[Optional Runtime guard:]
[Optional Fixes #Issue]
[Optional Fixes commit #PR or SHA]
[Optional Deprecated:]
[Optional API Considerations:]

wbpcode added 2 commits May 30, 2026 12:36
Signed-off-by: wbpcode/wangbaiping <wbphub@gmail.com>
Signed-off-by: wbpcode/wangbaiping <wbphub@gmail.com>
@nezdolik
Copy link
Copy Markdown
Member

nezdolik commented Jun 2, 2026

cc @jmarantz (not sure how much are you involved with Envoy project nowadays but this could be smth interesting for you)

Copy link
Copy Markdown
Contributor

@krinkinmu krinkinmu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think gcc test failures might not be a random flake here, so it might make sense to bump the memory limits in this PR or before merging this PR. Other than that, I follow the implementation and reasoning behind it and it looks sensible to me.

* @return Symbol the encoded string.
*/
Symbol toSymbol(absl::string_view sv) ABSL_EXCLUSIVE_LOCKS_REQUIRED(lock_);
Symbol toUnWellKnownSymbol(absl::string_view sv, size_t hash)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: toNotWellKnownSymbol sounds a bit more natural to me, though I'm not a native speaker. Though, I would suggest to consider calling it toUnknownSymbol and drop the "well" part here all together.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agree: toUnknownSymbol would be more idiomatic.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SGTM.

Copy link
Copy Markdown
Contributor

@jmarantz jmarantz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

before I review more deeply, can you provide a higher level description of the problem you are trying to fix?

My thinking during design was that typically we do not have to do named lookups in the hot-path.

Also note there are constructs like StatNameSet that might serve this kind of role without adding complexity to the infrrastructure.


// The well-known token table lives in well_known_tokens.h. WellKnownTokensNum is
// the number of reserved symbol slots (including the unused index-0 sentinel).
static constexpr Symbol WellKnownTokensNum = sizeof(kWellKnownTokens) / sizeof(kWellKnownTokens[0]);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there's an ARRAYSIZE macro of some sort.

// Lock-free encode-side lookup: token string -> symbol ID. Built once at first
// access; safe for concurrent readers afterward.
const absl::flat_hash_map<absl::string_view, Symbol>& wellKnownEncodeMap() {
static const auto* m = []() {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this pattern is needed; you can just construct this in the symbol table.

It is safe for concurrent readers but will require some sort of synchronization created by the compiler, so it's better just to make this part of the context, since we do have a context for this.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds great.

* @return Symbol the encoded string.
*/
Symbol toSymbol(absl::string_view sv) ABSL_EXCLUSIVE_LOCKS_REQUIRED(lock_);
Symbol toUnWellKnownSymbol(absl::string_view sv, size_t hash)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agree: toUnknownSymbol would be more idiomatic.

}
++count;
}
return count;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this somewhat complicated function should be exposed, perhaps as a private static member funciton, so it can be unit-tested and fuzzed.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will give it a try

@wbpcode
Copy link
Copy Markdown
Member Author

wbpcode commented Jun 2, 2026

before I review more deeply, can you provide a higher level description of the problem you are trying to fix?

My thinking during design was that typically we do not have to do named lookups in the hot-path.

Also note there are constructs like StatNameSet that might serve this kind of role without adding complexity to the infrrastructure.

Hello, Josh. Thanks for your review and i revised the description and hope that make sense for you.

In short sentences, this PR try to make the stats sinking more efficient. With this PR, when sinking stats, lots of symbols/StatNames’ decoding could be done by simple array accessing without lock. And the benchmark showed the improvement.

@jmarantz
Copy link
Copy Markdown
Contributor

jmarantz commented Jun 2, 2026

Oh this is to solve the stat sinking bottleneck. I don't think this is going to materially impact that, the bottleneck is on decrementing the refcount of the whole stat, not managing the StatNames.

PTAL @ the benchmarks in #43958 -- this is a full solution to this problem (though not mergeable yet due to possible races). Ping on slack to chat.

@jmarantz
Copy link
Copy Markdown
Contributor

jmarantz commented Jun 2, 2026

I think this is an interesting idea overall. I'm somewaht against adding the complexity and feel like this makes faster something we should try to do less of anyway.

But if we can prove a higher level benefit that isn't solved some other way then I think this sort of thing can be done. But StattNameSet is more general.

@wbpcode
Copy link
Copy Markdown
Member Author

wbpcode commented Jun 3, 2026

PTAL @ the benchmarks in #43958 -- this is a full solution to this problem (though not mergeable yet due to possible races). Ping on slack to chat.

#43958 is great change and I guess we should pick it up again? But not, there is no conflict between this PR and #43958.

In actual practice, there three phases for sinking: 1. create snapshot. 2. serialize/decode the symbols/StatNames to plain string. 3. send serialized stats out.

The #43958 is used to optimized the phase 1, and this PR is mainly try to optimize the phase 2. We should try to land both.


About the complexity, I think the complexity is trivial because this two-levels registry/map (static table + dynamic table) is pretty straightforward. IMO, It's CPU wastage to maintain the reference counter and take the lock for mapping the cluster/upstream_rq/... to symbols (or mapping symbols to cluster/upstram_rq/...). For these type of well known string tokens, assign a fixed symbol is simple and efficient.

@jmarantz
Copy link
Copy Markdown
Contributor

jmarantz commented Jun 3, 2026

In my work in the past, I don't think the stat name serializaiton was a bottleneck in practice. In sinks, for example, we typically only need to serialize stats with used()==true which removes most of them. If you are really sinking all the stats in a large system you are going to overwhelm your time-series DB.

Before moving forward with this PR I think we should get flame-graph of a sink in action. I have looked at some recently which is why I started on #43958 . Otherwise you are adding complexity without much benefit. IIUC from the comments, this PR also causes memory bloat requiring a regold, which is a breaking change for some applications.

We will pick that up again but there's some related PRs that need to be cleaned up and submitted first, to avoid a race exposed by that PR.

@wbpcode wbpcode marked this pull request as draft June 3, 2026 03:34
@wbpcode
Copy link
Copy Markdown
Member Author

wbpcode commented Jun 3, 2026

cc @jmarantz In my previous experience, for a long-running scale environment, the used()=true helps but there are still lots of stats to serialize. Although this part overhead is not dominant, I still think it should be good to optimize the serialization also with very trivial complexity (but we may have different thoughts on this complexity.) And I also want to avoid fragmented/unnecessary string allocation/copy and so on in following PRs.

But anyway, I won't stick with this for now. We can delay this to future, after you picked the lowest/valuable fruit first.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants