Add allocation-regression tests and optimize hot-path rendering #305
Add a test-only allocation-counting global allocator and allocation regression tests for the rendering hot path, then drive the per-frame constant allocation overhead to zero.

- Replace 3x per-frame HashSet allocations (host-column detection) with a non-allocating AppState::has_multiple_hosts() early-return scan.
- Cache the visible-columns list in a reusable buffer (clear + extend), so refreshing it each frame no longer allocates a Vec.
- Format memory progress-bar byte values directly into the bar buffer via a new write_bytes helper, removing two intermediate String allocations per row.

Steady-state render (30 containers): 1334 -> 1271 allocs/frame. The remaining ~41 allocs/container are structural to ratatui's immediate-mode Table/Row/Cell/Text widgets. Snapshot output is unchanged.
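The early-return scan described in the first bullet can be sketched as follows. This is a minimal illustration, not the PR's code: `ContainerKey`, its fields, the `containers` map, and the `state_with_hosts` helper are hypothetical stand-ins, since the real `AppState` layout is not shown here.

```rust
use std::collections::HashMap;

/// Hypothetical container key; the real key type is not shown in the PR.
#[derive(Clone, PartialEq, Eq, Hash)]
struct ContainerKey {
    host: String,
    id: String,
}

/// Hypothetical, stripped-down AppState holding only the container map.
struct AppState {
    containers: HashMap<ContainerKey, ()>,
}

impl AppState {
    /// Returns true as soon as a key with a different host than the first
    /// one is seen. No HashSet is built, so the scan does not touch the heap.
    fn has_multiple_hosts(&self) -> bool {
        let mut keys = self.containers.keys();
        let Some(first) = keys.next() else {
            return false; // no containers, so no hosts at all
        };
        keys.any(|k| k.host != first.host)
    }
}

/// Helper for the example only: one container per entry in `hosts`.
fn state_with_hosts(hosts: &[&str]) -> AppState {
    let containers = hosts
        .iter()
        .enumerate()
        .map(|(i, h)| {
            let key = ContainerKey { host: h.to_string(), id: format!("c{i}") };
            (key, ())
        })
        .collect();
    AppState { containers }
}
```

The scan is O(n) in the worst case (all containers on one host) but stops at the first mismatch otherwise, which is the common case when several hosts are present.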
Code Review

Overview: This PR adds allocation-counting test infrastructure and hot-path rendering optimizations. The intent is solid and the specific optimizations (eliminating per-frame HashSet and Vec creation) are real wins.

Issues:
In count_allocations, if f() panics, ENABLED is never reset to false, so allocations during unwinding (and any later work on the same thread) continue to be counted. Fix with a drop guard: a struct whose Drop impl sets ENABLED back to false, created before the f() call.
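A drop guard of the kind suggested above might look like the sketch below. The `ENABLED`, `COUNT`, `record_alloc`, and `count_allocations` names come from the PR discussion, but the signature of `count_allocations`, the `DisableOnDrop` type, and the `counting_enabled` helper are assumptions made for illustration. No allocator is installed here; `record_alloc` is called by hand to stand in for the allocator hooks.

```rust
use std::cell::Cell;

thread_local! {
    // Per-thread switches, mirroring the ENABLED/COUNT names used in the PR.
    static ENABLED: Cell<bool> = const { Cell::new(false) };
    static COUNT: Cell<usize> = const { Cell::new(0) };
}

/// Stand-in for the hook the allocator would call on every allocation.
fn record_alloc() {
    if ENABLED.with(|e| e.get()) {
        COUNT.with(|c| c.set(c.get() + 1));
    }
}

/// Hypothetical guard: switches counting off when it goes out of scope,
/// whether by normal return or by unwinding from a panic.
struct DisableOnDrop;

impl Drop for DisableOnDrop {
    fn drop(&mut self) {
        ENABLED.with(|e| e.set(false));
    }
}

/// Assumed signature; the PR's actual one is not shown.
fn count_allocations<R>(f: impl FnOnce() -> R) -> (usize, R) {
    COUNT.with(|c| c.set(0));
    ENABLED.with(|e| e.set(true));
    let guard = DisableOnDrop; // created before f() so a panic still resets ENABLED
    let result = f();
    drop(guard); // stop counting before reading the total
    (COUNT.with(|c| c.get()), result)
}

/// Example-only helper to observe the flag.
fn counting_enabled() -> bool {
    ENABLED.with(|e| e.get())
}
```

Because the guard is a local variable, the reset runs on every exit path without any explicit cleanup code after `f()`.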
Callers can read visible_columns_cache before refresh_visible_columns() has been called for the current frame, and nothing enforces that the data is fresh. Restricting both to pub(crate), or combining them into a single accessor that refreshes and then returns the slice, would eliminate the temporal coupling.
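The single-accessor alternative mentioned above could be sketched like this. The `Column` enum, the `show_host` flag, and the `visible_columns` method name are hypothetical; only `visible_columns_cache` and the clear + extend approach come from the PR.

```rust
/// Hypothetical column set; the real enum is not shown in the PR.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Column {
    Name,
    Cpu,
    Memory,
    Host,
}

/// Hypothetical, stripped-down AppState.
struct AppState {
    show_host: bool,
    visible_columns_cache: Vec<Column>,
}

impl AppState {
    fn new() -> Self {
        // Reserve room for every column once, up front.
        AppState { show_host: false, visible_columns_cache: Vec::with_capacity(4) }
    }

    /// Refreshes the cache in place and returns it, so a caller can never
    /// observe a stale list. clear() keeps the existing capacity, and
    /// extend() reuses it, so steady-state calls do not allocate.
    fn visible_columns(&mut self) -> &[Column] {
        let show_host = self.show_host;
        self.visible_columns_cache.clear();
        self.visible_columns_cache.extend(
            [Column::Name, Column::Cpu, Column::Memory, Column::Host]
                .into_iter()
                .filter(|c| show_host || *c != Column::Host),
        );
        &self.visible_columns_cache
    }
}
```

The trade-off is that the accessor needs `&mut self`, which may not fit call sites that only hold a shared reference during rendering; in that case narrowing visibility is the simpler fix.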
src/ui/mod.rs already gates the alloc_tests module with #[cfg(test)], so the inner mod tests wrapper does not need its own #[cfg(test)] attribute.
The fixed allocation bounds in alloc_tests (<=1300, <=1400, <=42/container) were measured on a specific platform and Rust/ratatui version. They can differ on macOS vs. Linux or after ratatui updates. A comment noting that the bounds are intentionally generous (~2x measured) and naming the platform they were established on would help future maintainers.
In build_state, hosts[i % hosts.len()] panics with a remainder-by-zero error if hosts is empty. All call sites pass non-empty slices, so it is not a real bug, but an assert at the top of the function would document the invariant.
Minor: The let _ = if ... { write!(...) } else { write!(...) }; pattern in write_byte_value is unusual. Both branches produce fmt::Result and a let _ on the outer if discards it. Restructuring as explicit if/else if chains with let _ = write!(...) per branch would be clearer to readers unfamiliar with the idiom.

Overall this is a well-scoped optimization PR. Addressing the panic-safety issue in count_allocations and narrowing the visibility of visible_columns_cache would make it ready to merge.
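For reference, the per-branch form suggested for write_byte_value might look like the sketch below. The unit thresholds, suffixes, and precision are illustrative assumptions; the PR's actual output format is not shown.

```rust
use std::fmt::Write;

/// Appends a human-readable size to `buf` without building a temporary String.
/// Units and precision here are assumptions made for the example.
fn write_byte_value(buf: &mut String, bytes: u64) {
    const KIB: u64 = 1024;
    const MIB: u64 = 1024 * KIB;
    const GIB: u64 = 1024 * MIB;

    // Writing into a String cannot fail, so each result is discarded
    // right where it is produced.
    if bytes >= GIB {
        let _ = write!(buf, "{:.1}G", bytes as f64 / GIB as f64);
    } else if bytes >= MIB {
        let _ = write!(buf, "{:.1}M", bytes as f64 / MIB as f64);
    } else if bytes >= KIB {
        let _ = write!(buf, "{:.1}K", bytes as f64 / KIB as f64);
    } else {
        let _ = write!(buf, "{}B", bytes);
    }
}
```

Since the function appends, a caller can build a whole progress-bar label in one buffer by calling it between `push_str` calls.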
Docker Image Built Successfully
- count_allocations: restore ENABLED via a drop guard so a panicking
measurement can't leave counting active on a reused harness thread
- AppState::visible_columns_cache and refresh_visible_columns are now
pub(crate): the cache is render-internal scratch space, not public API
- alloc_tests: drop the redundant inner #[cfg(test)] (module is already
test-gated in ui/mod.rs); assert non-empty hosts in build_state
- alloc_tests: document the measured allocation counts, the platform/
toolchain they were taken on, and that the bounds are deliberately tight
- formatters::write_byte_value: use per-branch `let _ = write!(...)`
instead of `let _ = if { write! } else { write! }`
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Code Review

Overview

This PR adds a [...] The direction is good: measuring allocation counts in the hot path is a valuable regression guard, and the optimizations are clean. A few issues need attention before merge.

Issues

1. CI stability: allocation bounds measured only on aarch64-darwin

The comment documents measurements from aarch64-darwin (1271 / 1365 / 41.5) but CI almost certainly runs on Linux x86_64. String formatting internals, [...]

Risk: These tests may fail spuriously on the first Linux CI run if the platform numbers exceed the darwin-derived bounds.

Suggestion: Re-baseline against Linux CI numbers before merge and document both, or gate the tests with [...]

2. [...]
Summary
This PR adds comprehensive allocation-regression tests to measure and prevent heap allocation regressions in the rendering hot path, and optimizes the container list renderer to eliminate per-frame allocations where possible.
Key Changes
Allocation Regression Testing Infrastructure:
- src/alloc_counter.rs: A test-only CountingAllocator that wraps the system allocator and counts heap allocations on the current thread using thread-local state
- src/ui/alloc_tests.rs: Three allocation-regression tests that measure steady-state rendering costs
- Registered CountingAllocator as the global allocator in both src/lib.rs and src/main.rs for test execution

Hot-Path Rendering Optimizations:
- Added visible_columns_cache to AppState that is refreshed in-place each frame (clear + extend), eliminating per-frame Vec allocation for the visible columns list
- Added AppState::has_multiple_hosts() that iterates container keys with early return instead of collecting into a HashSet, performing zero heap allocations
- Updated src/ui/formatters.rs to provide write_bytes() that formats directly into an existing String buffer, avoiding intermediate allocations
- Updated create_memory_progress_bar() to format byte values directly into the result buffer instead of creating intermediate String objects

Code Organization:
- Restricted format_bytes() to test-only (marked #[cfg(test)]) since production rendering now uses write_bytes()
- Updated src/ui/render.rs to use the new has_multiple_hosts() method instead of creating temporary HashSets
- Updated src/ui/container_list.rs to use the cached visible columns and direct buffer formatting

Implementation Details

The allocation counter uses thread-local state (ENABLED and COUNT) to track allocations only when explicitly enabled, allowing parallel test execution without interference. The record_alloc() function is called from alloc(), alloc_zeroed(), and realloc() to capture all allocation events relevant to the hot path.

The visible columns cache reuses the same Vec across frames by clearing and extending it, which avoids creating a new Vec each time. The has_multiple_hosts() method avoids allocations by using an early-return pattern instead of collecting unique hosts into a set.

These optimizations remove the constant per-frame allocation overhead, leaving only the per-container structural cost, while maintaining code clarity and correctness.
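The thread-local counting scheme described under Implementation Details can be sketched roughly as follows. `CountingAllocator`, `ENABLED`, `COUNT`, and `record_alloc` are names from the summary; everything else (the `GLOBAL` static, the `DisableOnDrop` guard, and the signature of `count_allocations`) is an assumption made for illustration, and the real code registers the allocator only for test builds.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::cell::Cell;

thread_local! {
    // const-initialized Cells need no lazy setup, so reading them from
    // inside the allocator does not itself allocate.
    static ENABLED: Cell<bool> = const { Cell::new(false) };
    static COUNT: Cell<usize> = const { Cell::new(0) };
}

fn record_alloc() {
    // try_with tolerates calls made while the thread's TLS is being torn down.
    let _ = ENABLED.try_with(|enabled| {
        if enabled.get() {
            let _ = COUNT.try_with(|c| c.set(c.get() + 1));
        }
    });
}

struct CountingAllocator;

// Every call is forwarded to the system allocator; only the three
// allocating entry points are counted, not dealloc.
unsafe impl GlobalAlloc for CountingAllocator {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        record_alloc();
        unsafe { System.alloc(layout) }
    }
    unsafe fn alloc_zeroed(&self, layout: Layout) -> *mut u8 {
        record_alloc();
        unsafe { System.alloc_zeroed(layout) }
    }
    unsafe fn realloc(&self, ptr: *mut u8, layout: Layout, new_size: usize) -> *mut u8 {
        record_alloc();
        unsafe { System.realloc(ptr, layout, new_size) }
    }
    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        unsafe { System.dealloc(ptr, layout) }
    }
}

// The PR gates this registration to test builds; it is unconditional here.
#[global_allocator]
static GLOBAL: CountingAllocator = CountingAllocator;

/// Hypothetical guard that switches counting off on every exit path.
struct DisableOnDrop;

impl Drop for DisableOnDrop {
    fn drop(&mut self) {
        let _ = ENABLED.try_with(|e| e.set(false));
    }
}

/// Assumed signature: counts allocations made by `f` on the current thread.
fn count_allocations<R>(f: impl FnOnce() -> R) -> (usize, R) {
    COUNT.with(|c| c.set(0));
    ENABLED.with(|e| e.set(true));
    let guard = DisableOnDrop;
    let result = f();
    drop(guard);
    (COUNT.with(|c| c.get()), result)
}
```

Because both the flag and the counter are per-thread, allocations made by other test threads running in parallel never show up in a measurement.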
https://claude.ai/code/session_01KdiZdKYcPugWwy2Bm8qdRo