Skip to content

Add allocation-regression tests and optimize hot-path rendering#305

Merged
amir20 merged 2 commits into
masterfrom
claude/memory-allocation-optimization-bpWss
Jun 7, 2026
Merged

Add allocation-regression tests and optimize hot-path rendering#305
amir20 merged 2 commits into
masterfrom
claude/memory-allocation-optimization-bpWss

Conversation

@amir20

@amir20 amir20 commented Jun 7, 2026

Copy link
Copy Markdown
Owner

Summary

This PR adds comprehensive allocation-regression tests to measure and prevent heap allocation regressions in the rendering hot path, and optimizes the container list renderer to eliminate per-frame allocations where possible.

Key Changes

Allocation Regression Testing Infrastructure:

  • Added src/alloc_counter.rs: A test-only CountingAllocator that wraps the system allocator and counts heap allocations on the current thread using thread-local state
  • Added src/ui/alloc_tests.rs: Three allocation-regression tests that measure steady-state rendering costs:
    • Single-host render with 30 containers (≤1300 allocations)
    • Multi-host render with 30 containers (≤1400 allocations)
    • Per-container marginal allocation cost (≤42 allocations/container)
  • Registered the CountingAllocator as the global allocator in both src/lib.rs and src/main.rs for test execution

Hot-Path Rendering Optimizations:

  • Visible columns caching: Added visible_columns_cache to AppState that is refreshed in-place each frame (clear + extend), eliminating per-frame Vec allocation for the visible columns list
  • Allocation-free host detection: Implemented AppState::has_multiple_hosts() that iterates container keys with early return instead of collecting into a HashSet, performing zero heap allocations
  • Direct buffer formatting: Refactored src/ui/formatters.rs to provide write_bytes() that formats directly into an existing String buffer, avoiding intermediate allocations
  • Memory progress bar optimization: Updated create_memory_progress_bar() to format byte values directly into the result buffer instead of creating intermediate String objects

Code Organization:

  • Moved format_bytes() to test-only (marked #[cfg(test)]) since production rendering now uses write_bytes()
  • Updated src/ui/render.rs to use the new has_multiple_hosts() method instead of creating temporary HashSets
  • Updated src/ui/container_list.rs to use the cached visible columns and direct buffer formatting

Implementation Details

The allocation counter uses thread-local state (ENABLED and COUNT) to track allocations only when explicitly enabled, allowing parallel test execution without interference. The record_alloc() function is called from alloc(), alloc_zeroed(), and realloc() to capture all allocation events relevant to the hot path.

The visible columns cache reuses the same Vec across frames by clearing and extending it, which is more efficient than creating a new Vec each time. The has_multiple_hosts() method avoids allocations by using an early-return pattern instead of collecting unique hosts into a set.

These optimizations reduce the per-frame allocation count from structural overhead while maintaining code clarity and correctness.

https://claude.ai/code/session_01KdiZdKYcPugWwy2Bm8qdRo

Add a test-only allocation-counting global allocator and allocation
regression tests for the rendering hot path, then drive the per-frame
constant allocation overhead to zero.

- Replace 3x per-frame HashSet allocations (host-column detection) with a
  non-allocating AppState::has_multiple_hosts() early-return scan.
- Cache the visible-columns list in a reusable buffer (clear + extend), so
  refreshing it each frame no longer allocates a Vec.
- Format memory progress-bar byte values directly into the bar buffer via a
  new write_bytes helper, removing two intermediate String allocations per
  row.

Steady-state render (30 containers): 1334 -> 1271 allocs/frame. The
remaining ~41 allocs/container are structural to ratatui's immediate-mode
Table/Row/Cell/Text widgets. Snapshot output is unchanged.
@claude

claude Bot commented Jun 7, 2026

Copy link
Copy Markdown

Code Review

Overview: This PR adds allocation-counting test infrastructure and hot-path rendering optimizations. The intent is solid and the specific optimizations (eliminating per-frame HashSet and Vec creation) are real wins.

Issues:

  1. count_allocations leaks ENABLED=true on panic (medium)

If f() panics, ENABLED is never reset to false. Allocations during unwinding continue to be counted. Fix with a drop guard - a struct whose Drop impl sets ENABLED back to false, placed before the f() call.

  1. visible_columns_cache is pub but is an implementation detail (medium)

Callers can read the cache before refresh_visible_columns() is called this frame with no enforcement the data is fresh. Restricting to pub(crate) or combining into a single accessor that calls refresh then returns the slice would eliminate the temporal coupling.

  1. Redundant cfg(test) nesting in alloc_tests.rs (low)

src/ui/mod.rs already gates the module with cfg(test). The inner mod tests wrapper does not need its own cfg(test) attribute.

  1. Allocation thresholds may be fragile across environments (low)

The fixed bounds (<=1300, <=1400, <=42/container) were measured on a specific platform and Rust/ratatui version. They can differ on macOS vs. Linux or after ratatui updates. A comment noting the bounds are intentionally generous (~2x measured) and the platform they were established on would help future maintainers.

  1. build_state panics on empty hosts slice (low)

hosts[i % hosts.len()] divides by zero if hosts is empty. All call sites pass non-empty slices so it is not a real bug, but an assert at the top of the function would document the invariant.

Positives:

  • has_multiple_hosts() is clean and correct: early-return iteration with zero allocation is exactly the right implementation.
  • write_byte_value / write_bytes refactoring is well done. Gating format_bytes behind cfg(test) prevents accidental hot-path use going forward.
  • Dual global_allocator registration across lib.rs and main.rs is architecturally sound. cfg(test) in a dependency crate is not active when compiled as a dep, so there is no duplicate symbol issue.
  • Warm-up render before measurement is good practice that avoids counting first-frame terminal buffer allocations.
  • The PR description gives future maintainers all the context they need to understand the allocation budget.

Minor: The let _ = if ... { write!(...) } else { write!(...) }; pattern in write_byte_value is unusual. Both branches produce fmt::Result and a let _ on the outer if discards it. Restructuring as explicit if/else if chains with let _ = write!(...) per branch would be clearer to readers unfamiliar with the idiom.

Overall this is a well-scoped optimization PR. Addressing the panic-safety issue in count_allocations and narrowing the visibility of visible_columns_cache would make it ready to merge.

@github-actions

github-actions Bot commented Jun 7, 2026

Copy link
Copy Markdown
Contributor

Docker Image Built Successfully

docker pull ghcr.io/amir20/dtop:pr-305

- count_allocations: restore ENABLED via a drop guard so a panicking
  measurement can't leave counting active on a reused harness thread
- AppState::visible_columns_cache and refresh_visible_columns are now
  pub(crate): the cache is render-internal scratch space, not public API
- alloc_tests: drop the redundant inner #[cfg(test)] (module is already
  test-gated in ui/mod.rs); assert non-empty hosts in build_state
- alloc_tests: document the measured allocation counts, the platform/
  toolchain they were taken on, and that the bounds are deliberately tight
- formatters::write_byte_value: use per-branch `let _ = write!(...)`
  instead of `let _ = if { write! } else { write! }`

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@claude

claude Bot commented Jun 7, 2026

Copy link
Copy Markdown

Code Review

Overview

This PR adds a CountingAllocator for allocation-regression tests and optimizes the container-list render hot path: caching the visible-columns list in AppState, replacing HashSet host-detection with an allocation-free early-return iterator, and writing byte values directly into reused buffers.

The direction is good — measuring allocation counts in the hot path is a valuable regression guard, and the optimizations are clean. A few issues need attention before merge.


Issues

1. CI stability: allocation bounds measured only on aarch64-darwin

The comment documents measurements from aarch64-darwin (1271 / 1365 / 41.5) but CI almost certainly runs on Linux x86_64. String formatting internals, Vec growth thresholds, and ratatui's widget tree can allocate differently across platforms and Rust versions. The bounds are tight (≤1300 / ≤1400 / ≤42) relative to the headroom.

Risk: These tests may fail spuriously on the first Linux CI run if the platform numbers exceed the darwin-derived bounds.

Suggestion: Re-baseline against Linux CI numbers before merge and document both, or gate the tests with #[cfg_attr(not(target_arch = "aarch64"), ignore)] until cross-platform numbers are confirmed.

2. format_bytes made #[cfg(test)] without a deprecation path

format_bytes was a pub fn and is now only compiled under #[cfg(test)]. The container-list code has been updated to use write_bytes, so there is no internal breakage — but marking a formerly-public function as test-only is a silent API break for any code linking the library. If dtop is binary-only the impact is zero, but the intent should be documented. At minimum, drop the pub if it is truly test-internal.

3. visible_columns_cache in AppState — silent precondition

The cache lives in the state layer but must be refreshed by the caller (refresh_visible_columns) before it is valid. Nothing prevents stale reads from non-render code paths. The pub(crate) visibility and docstring help, but consider returning the slice directly to make the precondition impossible to violate:

pub(crate) fn refresh_visible_columns(&mut self) -> &[Column] {
    self.visible_columns_cache.clear();
    self.visible_columns_cache.extend(...);
    &self.visible_columns_cache
}

This also removes the separate let visible_columns = &app_state.visible_columns_cache; line in container_list.rs.

4. #[path] duplication in main.rs

#[cfg(test)]
#[path = "alloc_counter.rs"]
mod alloc_counter;

This creates two independent copies of the module (lib::alloc_counter and main::alloc_counter), which is correct but surprising. A one-line comment explaining the reason — that main.rs is a separate crate root so it cannot reuse the lib.rs module declaration — would save future maintainers from confusion.


Minor Notes

  • let _ on write! results in write_byte_value: idiomatic and correct since write! to a String is infallible. No change needed, but .expect("write to String is infallible") would be marginally more self-documenting.
  • realloc counted as allocation: correct for catching hot-path growth events (e.g. a Vec outgrowing capacity mid-render). The comment explains the rationale well.
  • DisableOnDrop guard in count_allocations: good practice — prevents a leaked ENABLED = true if the measured closure panics.
  • has_multiple_hosts(): clean and correct. The early-return iterator pattern is the right approach.

Summary

The optimization work is solid and the testing infrastructure is well thought out. The primary blocker is CI stability — the allocation bounds need Linux x86_64 numbers before they can reliably protect against regressions in the actual CI environment. The other items are low severity but worth addressing.

@amir20 amir20 merged commit 3679e38 into master Jun 7, 2026
11 checks passed
@amir20 amir20 deleted the claude/memory-allocation-optimization-bpWss branch June 7, 2026 20:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants