feat(electric-telemetry): add process_subtype attribute for supervisor/erlang/logger_olp granularity#4397
erik-the-implementer wants to merge 7 commits into
Conversation
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## main #4397 +/- ##
===========================================
+ Coverage 37.04% 69.37% +32.32%
===========================================
Files 217 15 -202
Lines 17094 591 -16503
Branches 5762 0 -5762
===========================================
- Hits 6333 410 -5923
+ Misses 10746 181 -10565
+ Partials 15 0 -15
Claude Code Review
Summary
Iteration 4. Retracting the critical issue raised in iteration 3; it was based on a commit (
What is Working Well
Issues Found
Critical (Must Fix): None. Retracting the iteration-3
Important (Should Fix): None.
Suggestions (Nice to Have): Mixed
Stale subtype on
Issue Conformance
The PR description accurately reflects the implementation: additive
Previous Review Status
Iteration 3 critical issue was incorrect. It claimed a commit
Addressed since iteration 1:
Still open (all non-blocking):
Review iteration: 4 | 2026-06-01
Claude Code Review
Summary
Iteration 5. One new commit since iteration 4 (
What's Working Well
Issues Found
Critical (Must Fix): None.
Important (Should Fix): None.
Suggestions (Nice to Have): Two
Adds a new low-cardinality `process_subtype` attribute alongside the
existing `process_type` on all telemetry events that today carry it
(`vm.monitor.long_{gc,schedule,message_queue}`, `process.memory`,
`process.bin_memory`).
For the three coarse `process_type` buckets that previously hid most
of the signal during overload, `process_subtype` is derived as:
* `:supervisor` -> registered name, else first atom in $ancestors
* `:erlang` -> registered name, else initial_call MFA string
* `:logger_olp` -> registered name (handler id)
For every other `process_type` value, `process_subtype` is `nil`.
The existing `process_type` taxonomy is unchanged, so Honeycomb boards
and alerts that group by it continue to work; `process_subtype` adds
a finer-grained drill-down without exploding cardinality.
Refs electric-sql/alco-agent-tasks#46.
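The derivation rules in this commit message can be sketched roughly as follows. This is a minimal illustration, not the PR's code: the module `SubtypeSketch` and its `subtype/2` function are hypothetical names, and the real `ElectricTelemetry.Processes` helpers may be structured differently. It assumes only the standard `Process.info/2` behaviour, where an unregistered process reports `[]` for `:registered_name`.

```elixir
defmodule SubtypeSketch do
  # Hypothetical module for illustration only; not the PR's implementation.
  @keys [:registered_name, :dictionary, :initial_call]

  # Only the three coarse buckets get a subtype.
  def subtype(type, pid) when type in [:supervisor, :erlang, :logger_olp] do
    case Process.info(pid, @keys) do
      # Process.info/2 returns nil for a process that has exited.
      nil -> nil
      info -> derive(type, info)
    end
  end

  # Every other process_type has no subtype.
  def subtype(_type, _pid), do: nil

  # A registered name wins for all three buckets.
  defp derive(type, info) do
    case info[:registered_name] do
      name when is_atom(name) and not is_nil(name) -> name
      # Unregistered processes report [] here.
      _ -> fallback(type, info)
    end
  end

  # :supervisor falls back to the first atom in $ancestors, else nil.
  defp fallback(:supervisor, info) do
    case List.keyfind(info[:dictionary], :"$ancestors", 0) do
      {_key, ancestors} -> Enum.find(ancestors, &is_atom/1)
      nil -> nil
    end
  end

  # :erlang falls back to the initial_call MFA rendered as a string.
  defp fallback(:erlang, info) do
    {m, f, a} = info[:initial_call]
    "#{inspect(m)}.#{f}/#{a}"
  end

  # :logger_olp has no fallback beyond the registered name.
  defp fallback(:logger_olp, _info), do: nil
end
```

Because every branch yields a registered name, an ancestor atom, an MFA string, or `nil`, the set of possible values stays bounded by the code that is loaded rather than by the number of live processes.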
term() is correct but uninformative — in practice the type is always atom() | binary() (atoms cover :dead, :unknown, module atoms, and atom labels; binaries cover the string-label case). Helps dialyzer downstream.
…ut nil
`is_atom(nil)` is true, so the previous clause order (`nil -> nil` before the atom guard) was load-bearing: dropping it would silently turn `nil` into `"nil"`. Rewrite the guard to match on `is_atom(name) and not is_nil(name)` so the clause stands on its own.
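The pitfall this commit describes can be reproduced in isolation. The sketch below uses a hypothetical module and function names (`NilGuardSketch`, `fragile/1`, `robust/1`) and assumes the helper stringifies atom names; the PR's actual clause bodies are not shown in this conversation.

```elixir
defmodule NilGuardSketch do
  # Hypothetical helpers for illustration; not the PR's code.

  # Fragile: nil is an atom, so it is caught by the atom clause
  # unless a separate nil clause is placed before it.
  def fragile(name) when is_atom(name), do: Atom.to_string(name)
  def fragile(other), do: other

  # Robust: the guard excludes nil itself, so clause order no longer matters.
  def robust(name) when is_atom(name) and not is_nil(name), do: Atom.to_string(name)
  def robust(other), do: other
end
```

With these definitions, `NilGuardSketch.fragile(nil)` returns the string `"nil"`, while `NilGuardSketch.robust(nil)` returns `nil`.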
…_subtype/1
Existing tests cover proc_type/1 returning :dead for an exited process, but proc_type_and_subtype/1 and proc_subtype/1 weren't exercised for that case. The implementation relies on Process.info/2 returning nil and Access on nil cascading nils through every helper; lock that contract down so a future refactor of info/1 doesn't silently change the answer.
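The contract this commit relies on can be checked with plain standard-library calls. The following is a small sketch under that assumption; `DeadProcessSketch` is a hypothetical name and does not reproduce the PR's `info/1` helper or its tests.

```elixir
defmodule DeadProcessSketch do
  # Hypothetical demo module; not part of electric-telemetry.
  def info_for_exited_process do
    pid = spawn(fn -> :ok end)
    ref = Process.monitor(pid)

    # Wait for the :DOWN message so the process has definitely exited.
    receive do
      {:DOWN, ^ref, :process, ^pid, _reason} -> :ok
    end

    # Process.info/2 returns nil for a process that is no longer alive.
    info = Process.info(pid, [:registered_name, :initial_call])

    # Access on nil returns nil, so each lookup cascades to nil.
    {info, info[:registered_name], info[:initial_call]}
  end
end
```

Calling `DeadProcessSketch.info_for_exited_process()` returns `{nil, nil, nil}`, which is the behaviour any helper built on `info[...]` lookups inherits for dead processes.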
1bd5c76 to fef04de
3c9935c to d780505
proc_type already comes from a variety of places (the process label or, failing that, initial_module), so it seems natural to extend that to check other places if the result we get back is not helpful. For example, I don't think it's ever useful to have erlang as the proc_type, and we could fall back to what you have as the subtype in that case. The same goes for supervisor.
The strong argument for a proc_sub_type would come from {"logger_olp", handler id} if there's an unbounded number of handler ids and if it's useful to group by logger_olp.
So I can potentially see the need for proc_subtype, but personally I wouldn't use it for erlang and supervisor, because it's not particularly useful to group by erlang or supervisor and there should only be a limited number of names we'd replace them with.
Or maybe we shouldn't have subtype at all. What are the handler ids? Are they limited and readable?
Summary
Adds a new low-cardinality `process_subtype` attribute alongside the existing `process_type` on all `electric-telemetry` events that today carry it: `vm.monitor.long_gc`, `vm.monitor.long_schedule`, `vm.monitor.long_message_queue`, `process.memory`, `process.bin_memory`.
For the three coarse `process_type` buckets that hide the most signal during overload (per recent investigations into long-mailbox spikes), `process_subtype` is derived from cheap process introspection:
* `process_type = "supervisor"` → registered name; else first atom in `$ancestors`; else `nil`.
* `process_type = "erlang"` → registered name (catches named VM helpers like `:erts_dirty_process_signal_handler`); else `initial_call` MFA string (e.g. `":erlang.apply/2"`).
* `process_type = "logger_olp"` → registered name (the handler id: `default`, `otel_log_handler`, `logger_proxy`, …).
For all other `process_type` values, `process_subtype` is `nil`.
The change is purely additive: `process_type` values are unchanged, so existing Honeycomb boards and alerts that group by `process_type` continue to work. `process_subtype` gives a drill-down dimension without exploding cardinality (registered names + MFAs only; no pids, no dynamic registry tuples).
Related issues
electric-sql/alco-agent-tasks#46, electric-sql/alco-agent-tasks#45: long-mailbox / overload investigations where the coarse `process_type` buckets (`supervisor`, `erlang`, `logger_olp`) hid the specific processes responsible. `process_subtype` adds the drill-down dimension those investigations needed.
Implementation notes
* `ElectricTelemetry.Processes.proc_type_and_subtype/1` returns `{type, subtype}` in a single `Process.info/2` call; `proc_subtype/1` is also exported for callers that only want the subtype.
* `Process.info/2` now also fetches `:registered_name` (one extra key per call).
* `sorted_groups/2` groups by `{type, subtype}` so the `process.memory` / `process.bin_memory` metrics break down by subtype as well.
* `:process_subtype` is added to the `tags:` lists of the affected `last_value`, `sum`, and `distribution` metric definitions.
* `proc_type/1` is kept unchanged for backward compatibility.
Test plan
* `packages/electric-telemetry/test/electric/telemetry/processes_test.exs`).
* `proc_type/1` tests unchanged and still pass (122/122 tests pass in `electric-telemetry`).
* `@core/electric-telemetry`: minor.
🤖 Generated with Claude Code
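
The implementation notes mention grouping by `{type, subtype}` so memory metrics break down by subtype. One way that grouping could look is sketched below. The module `GroupingSketch`, the function `memory_by_group/1`, and the input map shape (`:type`, `:subtype`, `:memory` keys) are all assumptions made for illustration; the PR's `sorted_groups/2` is not shown in this conversation and may differ.

```elixir
defmodule GroupingSketch do
  # Hypothetical stand-in for the grouping step; the input shape is assumed.
  def memory_by_group(procs) do
    procs
    # Group on the {type, subtype} pair, keeping only the memory figures.
    |> Enum.group_by(&{&1.type, &1.subtype}, & &1.memory)
    # Sum memory within each group.
    |> Enum.map(fn {{type, subtype}, memories} ->
      %{type: type, subtype: subtype, memory: Enum.sum(memories)}
    end)
    # Largest consumers first.
    |> Enum.sort_by(& &1.memory, :desc)
  end
end
```

Processes whose subtype is `nil` still form a single group per type, so a breakdown by `process_type` alone can be recovered by summing across subtypes.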