diff --git a/bench/README.md b/bench/README.md new file mode 100644 index 0000000..f8ef905 --- /dev/null +++ b/bench/README.md @@ -0,0 +1,163 @@ +# Argon scaling benchmarks + +These benchmarks stress the Argon compiler along the axes raised in review — +**number of shapes**, **number of (coupled) constraints**, **number of cell +instances**, and **depth of hierarchy** — and record how compile time and peak +memory scale with each. They exist to answer questions of the form: + +> *How does the framework scale to layouts with substantially more hierarchy, +> more constraints, and a larger number of editable objects?* + +The Argon sources that are swept live in [`../examples/`](../examples): + +| Example | Cell(s) | Axis stressed | +| -------------------------------- | ----------------------- | ------------- | +| `examples/stress_shapes` | `shapes(n)` | `n` independent rectangles in one cell (generated by recursion) | +| `examples/stress_shapes` | `shapes_loop(n)` | the same geometry generated with a `for` loop over `std::range` (also stresses the functional list representation) | +| `examples/stress_constraints` | `constraints(n)` | a ring of `n+1` rectangles whose edges are mutually coupled, forcing the general (dense) linear solver | +| `examples/stress_instances` | `instances(n)` | `n` instances of a single cached leaf cell | +| `examples/stress_hierarchy` | `h0 .. h8` | a chain of cells `h{k}` each instantiating `h{k-1}`; compiling `h{k}` exercises `k` levels of hierarchy | + +The benchmark *drivers* are the `bench_*` tests in +[`../core/compiler/src/lib.rs`](../core/compiler/src/lib.rs). For the +hierarchy axis the driver generates `h0..h{depth}` workspaces on the fly (a +single `.ar` file cannot express a runtime-variable depth because Argon cells +cannot be recursive or forward-referenced). + +## Running + +The `bench_*` tests are marked `#[ignore]` because the larger sizes take well +over 6 s in a debug build. Run them in **release**, **serially** (peak-memory +tracking uses a process-global allocator, so concurrent tests would corrupt the +measurements): + +```bash +cargo test -p compiler --release -- --ignored --test-threads=1 --nocapture bench_ +``` + +Each test writes a CSV to `bench/results/.csv` with columns +`size,time_s,peak_bytes,n_objects`. Then render the figure (and print a summary +table of fitted scaling models): + +```bash +python3 bench/plot_scaling.py # writes bench/argon_scaling.{png,pdf} +``` + +`plot_scaling.py` needs only the standard library to print the summary table; +`matplotlib` is required to draw the figure. + +To run a single axis, e.g. just the instance sweep: + +```bash +cargo test -p compiler --release -- --ignored --test-threads=1 --nocapture bench_instances +``` + +The fast `stress_*_smoke` tests (which just check that each example still +compiles) run in the normal `cargo test` suite and are **not** ignored. + +### Configuring the sweeps + +Every axis reads its list of sizes from an environment variable, falling back +to a default. This keeps the benchmarks general-purpose: the same test can be +re-run at a different scale — for example after a compiler optimization changes +how an axis scales — without editing any source. Pass a comma-separated list: + +| Env var | Axis | Default | +| ------- | ---- | ------- | +| `ARGON_BENCH_SHAPES` | shapes (recursion) | `500,1000,2000,4000,8000,16000,32000` | +| `ARGON_BENCH_SHAPES_LOOP` | shapes (`for` loop) | `250,500,1000,2000` | +| `ARGON_BENCH_INSTANCES` | instances | `500,…,64000` | +| `ARGON_BENCH_CONSTRAINTS` | coupled constraints | `32,64,128,256,512,1024` | +| `ARGON_BENCH_HIER_SINGLE` | hierarchy (1 ref) | `4,8,16,32,48,64,96,128` | +| `ARGON_BENCH_HIER_DOUBLE` | hierarchy (2 refs) | `2,4,6,8,10,12,14,16,18` | + +```bash +# e.g. sweep the for-loop variant out to the same sizes as bench_shapes +ARGON_BENCH_SHAPES_LOOP=500,1000,2000,4000,8000,16000,32000 \ + cargo test -p compiler --release -- --ignored --test-threads=1 --nocapture bench_shapes_loop +``` + +The defaults are sized so the suite runs in a few minutes within a few GiB on +the current build; they are not claims about how any axis "should" scale. + +## Methodology + +- **Time**: minimum wall-clock time over a few repetitions (`min` is robust to + noise on a shared machine). Parsing/static analysis is done once per size and + excluded from the hierarchy timings; everything else is end-to-end `compile()`. +- **Memory**: a `#[global_allocator]` compiled only into the test binary + (`bench_alloc::Tracking` in `lib.rs`) tracks live and peak heap bytes. We + report the peak heap *growth* during a single `compile()`. +- **Build**: release profile. Numbers below were collected on a Linux machine; + absolute values are machine-dependent but the *scaling* is not. + +## Results + +The numbers below are a **snapshot** from one release build on the development +machine; they are produced by the commands above and meant to be regenerated +(absolute values are machine- and build-dependent). `n` is the per-axis size +parameter; "peak" is peak heap allocated during compilation. + +| Axis | largest `n` | time @ largest | peak mem @ largest | empirical scaling | +| ---- | ----------- | -------------- | ------------------ | ----------------- | +| Shapes (recursion) | 32 000 rects | 1.53 s | 0.94 GiB | **~linear** (time `∝ n^1.2`, mem `∝ n^1.0`) | +| Instances | 64 000 insts | 3.14 s | 1.29 GiB | **~linear** (time `∝ n^1.2`, mem `∝ n^1.0`) | +| Hierarchy, 1 child ref | depth 128 | 0.09 s | 0.12 GiB | **polynomial** (`∝ depth^1.3–1.4`) | +| Coupled constraints | 1 024 rects | 21.7 s | 0.13 GiB | **super-cubic in time** (see below) | +| Shapes (`for`-loop) | 2 000 rects | 0.59 s | 4.1 GiB | **quadratic** (mem `∝ n^2`) | +| Hierarchy, 2 child refs | depth 18 | 11.5 s | 3.6 GiB | **exponential** (`×1.9` per level) | + +### Interpretation + +- **Geometry and instances scale linearly.** Compiling a single flat cell with + tens of thousands of fully-constrained rectangles, or with tens of thousands + of instances of a cached cell, is linear in both time and memory. Each shape + contributes 4 solver variables and each instance 2, and because their + constraints pin one variable at a time the solver resolves them by + back-substitution without ever forming a matrix. This is the common case for + real parametric cells and it scales comfortably to "thousands of rectangles". + +- **Coupled constraints are the expensive axis.** When constraints form one + large connected component that *cannot* be back-substituted (here, a ring of + mutually-coupled edges), Argon falls back to its general linear solver, which + builds a dense matrix and takes an SVD. The per-doubling cost climbs from ~4× + at `n=64→128` to ~15× at `n=512→1024`, i.e. it steepens toward the `O(n^3)` + of dense factorization (and worse, because `solve()` is re-run as the system + is assembled). This is the "general linear constraint solving (slow)" caveat + in the top-level README, quantified: ~1 000 coupled editable variables take + ~20 s. Layouts whose constraints decompose into many small independent groups + (the typical case) avoid this entirely. + +- **Hierarchy depth is limited by the type representation.** A cell's static + type (`CellTy`) stores the full structural type of every field, including + instantiated sub-cells. If a cell references its child **once** (e.g. + `let i = inst(child());`), depth scales polynomially (`~depth^1.4`) and is + fine to ~128 levels. If it references the child **twice** (e.g. the + `let c = child(); let i = inst(c);` idiom from the tutorial), the type of + `h{k}` contains two copies of the type of `h{k-1}`, so the representation — + and hence compile time and memory — **doubles with every level** (`×1.9` per + level measured). Beyond ~depth 20 this exhausts memory (depth 20 alone needs + ~14.5 GiB / 50 s; depth 18 is ~3.6 GiB / 11.5 s, which is where this sweep is + capped). Very deep hierarchies additionally hit a native-recursion stack + limit in the compiler at a few hundred levels. + +- **Recursion vs. iteration measures the list/iteration machinery.** `shapes` + and `shapes_loop` emit identical geometry; the only difference is that + `shapes_loop` builds and iterates a `std::range` list. On the build measured + here that list path is markedly heavier (≈4 GiB to emit 2 000 rectangles via + a `for` loop, vs. 32 000 by recursion in under 1 GiB), so the gap between the + two series is a direct measure of the cost of the list representation rather + than of the geometry or solver. Re-running both series (e.g. with + `ARGON_BENCH_SHAPES_LOOP` set to the same sizes as `bench_shapes`) is the way + to see that cost change as the iteration/list machinery is optimized. + +The takeaways for the paper: editable-object count and instance count scale +linearly; the practically-relevant limits are the dense general constraint +solver on large *coupled* systems and structural type expansion on deep +hierarchies — both of which line up with the future-work items already listed +in the project README (faster linear constraint solving; incremental +compilation). The bullets above describe the build at the time of measurement; +because every axis is re-runnable (and size-configurable), the same harness can +be used to confirm improvements from compiler optimizations. + +![Argon scaling](argon_scaling.png) diff --git a/bench/argon_scaling.pdf b/bench/argon_scaling.pdf new file mode 100644 index 0000000..aa37899 Binary files /dev/null and b/bench/argon_scaling.pdf differ diff --git a/bench/argon_scaling.png b/bench/argon_scaling.png new file mode 100644 index 0000000..7984513 Binary files /dev/null and b/bench/argon_scaling.png differ diff --git a/bench/plot_scaling.py b/bench/plot_scaling.py new file mode 100644 index 0000000..68c29b7 --- /dev/null +++ b/bench/plot_scaling.py @@ -0,0 +1,156 @@ +#!/usr/bin/env python3 +"""Plot Argon compile-time and memory scaling from the benchmark CSVs. + +The CSVs are produced by the `bench_*` tests in `core/compiler/src/lib.rs` +(see ../bench/README.md for how to run them). Each CSV has the columns + + size,time_s,peak_bytes,n_objects + +where `size` is the swept parameter for that axis (number of shapes, number of +coupled constraints, number of instances, or hierarchy depth). + +Usage: + python3 bench/plot_scaling.py # reads bench/results/*.csv + python3 bench/plot_scaling.py --results DIR --out FILE +""" +import argparse +import csv +import math +import os +import sys + +# Series in the order we want them drawn. Each entry is +# (csv_basename, display_label, size_unit, model) +# where `model` is "poly" (fit a power law y ~ n^k) or "exp" (fit y ~ b^n, +# appropriate for the exponentially-scaling hierarchy variant). +SERIES = [ + ("shapes", "Shapes (recursion)", "# rectangles", "poly"), + ("shapes_loop", "Shapes (for-loop / cons list)", "# rectangles", "poly"), + ("instances", "Instances", "# instances", "poly"), + ("constraints", "Coupled constraints", "# coupled rects", "poly"), + ("hierarchy_single_ref", "Hierarchy (1 child ref)", "depth", "poly"), + ("hierarchy_double_ref", "Hierarchy (2 child refs)", "depth", "exp"), +] + + +def load(path): + xs, ts, ms = [], [], [] + with open(path, newline="") as f: + for row in csv.DictReader(f): + xs.append(float(row["size"])) + ts.append(float(row["time_s"])) + ms.append(float(row["peak_bytes"])) + return xs, ts, ms + + +def _slope(pairs): + """Least-squares slope of a list of (x, y) points.""" + n = len(pairs) + if n < 2: + return float("nan") + sx = sum(p[0] for p in pairs) + sy = sum(p[1] for p in pairs) + sxx = sum(p[0] * p[0] for p in pairs) + sxy = sum(p[0] * p[1] for p in pairs) + denom = n * sxx - sx * sx + if abs(denom) < 1e-12: + return float("nan") + return (n * sxy - sx * sy) / denom + + +def fit_exponent(xs, ys): + """Power-law exponent: slope of log(y) vs log(x).""" + return _slope([(math.log(x), math.log(y)) for x, y in zip(xs, ys) if x > 0 and y > 0]) + + +def fit_base(xs, ys): + """Exponential base b for y ~ b^x: from the slope of log(y) vs x.""" + s = _slope([(x, math.log(y)) for x, y in zip(xs, ys) if y > 0]) + return math.exp(s) + + +def describe(model, xs, ys): + """Return (legend_suffix, summary_string) for the fitted scaling model.""" + if model == "exp": + b = fit_base(xs, ys) + return f"exp., $\\times{b:.1f}$/step", f"exponential (x{b:.2f} per unit)" + k = fit_exponent(xs, ys) + return f"$\\propto n^{{{k:.1f}}}$", f"~n^{k:.2f}" + + +def main(): + here = os.path.dirname(os.path.abspath(__file__)) + ap = argparse.ArgumentParser() + ap.add_argument("--results", default=os.path.join(here, "results")) + ap.add_argument("--out", default=os.path.join(here, "argon_scaling")) + args = ap.parse_args() + + data = {} + for key, label, unit, model in SERIES: + path = os.path.join(args.results, f"{key}.csv") + if os.path.exists(path): + xs, ts, ms = load(path) + if xs: + data[key] = (label, unit, model, xs, ts, ms) + + if not data: + sys.exit( + f"No benchmark CSVs found in {args.results}.\n" + "Run the benchmarks first (see bench/README.md)." + ) + + # Print a summary table of fitted scaling models. + print(f"{'series':<30}{'points':>7} {'time scaling':<22}{'mem scaling':<22}max(time, mem)") + for key, _, _, _ in SERIES: + if key not in data: + continue + label, unit, model, xs, ts, ms = data[key] + _, t_desc = describe(model, xs, ts) + _, m_desc = describe(model, xs, ms) + print( + f"{label:<30}{len(xs):>7} {t_desc:<22}{m_desc:<22}" + f"{max(ts):.3f} s / {max(ms) / 2**20:.0f} MiB" + ) + + try: + import matplotlib + + matplotlib.use("Agg") + import matplotlib.pyplot as plt + except ImportError: + sys.exit("\nmatplotlib not installed; printed summary only. `pip install matplotlib` to draw.") + + fig, (ax_t, ax_m) = plt.subplots(1, 2, figsize=(13, 5.2)) + markers = ["o", "s", "^", "D", "v", "P"] + for (key, _, _, _), marker in zip(SERIES, markers): + if key not in data: + continue + label, unit, model, xs, ts, ms = data[key] + t_suffix, _ = describe(model, xs, ts) + m_suffix, _ = describe(model, xs, ms) + ax_t.plot(xs, ts, marker=marker, label=f"{label} ({t_suffix})") + ax_m.plot(xs, [m / 2**20 for m in ms], marker=marker, + label=f"{label} ({m_suffix})") + + for ax in (ax_t, ax_m): + ax.set_xscale("log") + ax.set_yscale("log") + ax.set_xlabel("problem size $n$ (rectangles / constraints / instances / depth)") + ax.grid(True, which="both", ls=":", alpha=0.4) + + ax_t.set_ylabel("compile time (s)") + ax_t.set_title("Argon compile-time scaling") + ax_m.set_ylabel("peak heap allocated (MiB)") + ax_m.set_title("Argon memory scaling") + ax_t.legend(fontsize=8, loc="upper left") + ax_m.legend(fontsize=8, loc="upper left") + fig.tight_layout() + + for ext in ("png", "pdf"): + out = f"{args.out}.{ext}" + fig.savefig(out, dpi=150, bbox_inches="tight") + print(f"wrote {out}") + + +if __name__ == "__main__": + main() diff --git a/bench/results/constraints.csv b/bench/results/constraints.csv new file mode 100644 index 0000000..98cc1d8 --- /dev/null +++ b/bench/results/constraints.csv @@ -0,0 +1,7 @@ +size,time_s,peak_bytes,n_objects +32,0.00383075,2585876,33 +64,0.00726206,3829492,65 +128,0.029626408,6749992,129 +256,0.21347691,15396760,257 +512,1.4369801039999999,42127416,513 +1024,21.745505947,133337720,1025 diff --git a/bench/results/hierarchy_double_ref.csv b/bench/results/hierarchy_double_ref.csv new file mode 100644 index 0000000..723e66f --- /dev/null +++ b/bench/results/hierarchy_double_ref.csv @@ -0,0 +1,10 @@ +size,time_s,peak_bytes,n_objects +2,0.000971778,1325410,5 +4,0.001176136,1653373,9 +6,0.001882165,2474335,13 +8,0.004620015,5364013,17 +10,0.015455478,16602635,21 +12,0.088864192,61484713,25 +14,0.481928844,240102311,29 +16,2.506498165,954179013,33 +18,11.538940627,3810189475,37 diff --git a/bench/results/hierarchy_single_ref.csv b/bench/results/hierarchy_single_ref.csv new file mode 100644 index 0000000..e2fd0f5 --- /dev/null +++ b/bench/results/hierarchy_single_ref.csv @@ -0,0 +1,9 @@ +size,time_s,peak_bytes,n_objects +4,0.000845819,1556216,9 +8,0.001178198,2140495,17 +16,0.002346772,4034443,33 +32,0.005942098,10387267,65 +48,0.012042876,19826107,97 +64,0.022953693,33362240,129 +96,0.052964538,69348808,193 +128,0.090173076,120382868,257 diff --git a/bench/results/instances.csv b/bench/results/instances.csv new file mode 100644 index 0000000..b5d5cdd --- /dev/null +++ b/bench/results/instances.csv @@ -0,0 +1,9 @@ +size,time_s,peak_bytes,n_objects +500,0.012283447,11806766,501 +1000,0.02516376,22390998,1001 +2000,0.056507946,43559462,2001 +4000,0.147355775,85896390,4001 +8000,0.310525996,170570230,8001 +16000,0.689046789,339917926,16001 +32000,1.457187325,678613350,32001 +64000,3.140683519,1356004118,64001 diff --git a/bench/results/shapes.csv b/bench/results/shapes.csv new file mode 100644 index 0000000..b9305b0 --- /dev/null +++ b/bench/results/shapes.csv @@ -0,0 +1,8 @@ +size,time_s,peak_bytes,n_objects +500,0.012574383,16159584,500 +1000,0.028867058,31116160,1000 +2000,0.071651056,61029296,2000 +4000,0.158150608,120855616,4000 +8000,0.329667899,240508160,8000 +16000,0.70516885,479813376,16000 +32000,1.530754693,958423696,32000 diff --git a/bench/results/shapes_loop.csv b/bench/results/shapes_loop.csv new file mode 100644 index 0000000..90cde88 --- /dev/null +++ b/bench/results/shapes_loop.csv @@ -0,0 +1,5 @@ +size,time_s,peak_bytes,n_objects +250,0.026441618,73227389,250 +500,0.091560351,274248679,500 +1000,0.269871454,1063292012,1000 +2000,0.589680644,4189379419,2000 diff --git a/core/compiler/src/lib.rs b/core/compiler/src/lib.rs index 389a608..76c16f7 100644 --- a/core/compiler/src/lib.rs +++ b/core/compiler/src/lib.rs @@ -7,6 +7,85 @@ pub mod layer; pub mod parse; pub mod solver; +/// A global allocator that tracks live and peak heap usage so that the scaling +/// benchmarks in the test module can report memory consumption alongside +/// runtime. It forwards every request to the system allocator and only adds +/// atomic byte counters, so behavior is otherwise unchanged. +/// +/// This allocator is only compiled into the test binary (`cfg(test)`); release +/// and library builds use the default allocator. The counters are process-wide, +/// so the benchmarks that read them must be run serially +/// (`--test-threads=1`); see `bench/README.md`. +#[cfg(test)] +mod bench_alloc { + use std::alloc::{GlobalAlloc, Layout, System}; + use std::sync::atomic::{AtomicUsize, Ordering}; + + pub static LIVE: AtomicUsize = AtomicUsize::new(0); + pub static PEAK: AtomicUsize = AtomicUsize::new(0); + + pub struct Tracking; + + #[inline] + fn record_growth(delta: usize) { + let live = LIVE.fetch_add(delta, Ordering::Relaxed) + delta; + PEAK.fetch_max(live, Ordering::Relaxed); + } + + unsafe impl GlobalAlloc for Tracking { + unsafe fn alloc(&self, layout: Layout) -> *mut u8 { + let ptr = unsafe { System.alloc(layout) }; + if !ptr.is_null() { + record_growth(layout.size()); + } + ptr + } + + unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) { + unsafe { System.dealloc(ptr, layout) }; + LIVE.fetch_sub(layout.size(), Ordering::Relaxed); + } + + unsafe fn alloc_zeroed(&self, layout: Layout) -> *mut u8 { + let ptr = unsafe { System.alloc_zeroed(layout) }; + if !ptr.is_null() { + record_growth(layout.size()); + } + ptr + } + + unsafe fn realloc(&self, ptr: *mut u8, layout: Layout, new_size: usize) -> *mut u8 { + let new_ptr = unsafe { System.realloc(ptr, layout, new_size) }; + if !new_ptr.is_null() { + if new_size >= layout.size() { + record_growth(new_size - layout.size()); + } else { + LIVE.fetch_sub(layout.size() - new_size, Ordering::Relaxed); + } + } + new_ptr + } + } + + /// Resets the peak counter to the current live usage. Call this immediately + /// before the region of interest, then read [`peak`] afterwards. + pub fn reset_peak() { + PEAK.store(LIVE.load(Ordering::Relaxed), Ordering::Relaxed); + } + + pub fn live() -> usize { + LIVE.load(Ordering::Relaxed) + } + + pub fn peak() -> usize { + PEAK.load(Ordering::Relaxed) + } +} + +#[cfg(test)] +#[global_allocator] +static BENCH_ALLOC: bench_alloc::Tracking = bench_alloc::Tracking; + #[cfg(test)] mod tests { @@ -74,6 +153,448 @@ mod tests { const ARGON_SSE_BASIC: &str = concatcp!(EXAMPLES_DIR, "/sse_basic/lib.ar"); const ARGON_PRECEDENCE: &str = concatcp!(EXAMPLES_DIR, "/precedence/lib.ar"); + // --------------------------------------------------------------------- + // Scaling / stress benchmarks. + // + // These exercise Argon along the axes raised in review: number of shapes, + // number of (coupled) constraints, number of cell instances, and depth of + // hierarchy. Each `bench_*` test sweeps a size parameter, records compile + // time and peak heap usage, and writes a CSV to `bench/results/` that + // `bench/plot_scaling.py` turns into the scaling figure. + // + // The `bench_*` tests are `#[ignore]`d because the larger sizes take well + // over 6 s in a debug build. Run them in release, serially (peak-memory + // tracking is process-global), e.g.: + // + // RUSTFLAGS=... cargo test -p compiler --release -- \ + // --ignored --test-threads=1 bench_ + // + // The `stress_*_smoke` tests below run in the normal (debug) test suite and + // just check that each example still compiles. + // --------------------------------------------------------------------- + const ARGON_STRESS_SHAPES: &str = concatcp!(EXAMPLES_DIR, "/stress_shapes/lib.ar"); + const ARGON_STRESS_CONSTRAINTS: &str = concatcp!(EXAMPLES_DIR, "/stress_constraints/lib.ar"); + const ARGON_STRESS_INSTANCES: &str = concatcp!(EXAMPLES_DIR, "/stress_instances/lib.ar"); + const ARGON_STRESS_HIERARCHY: &str = concatcp!(EXAMPLES_DIR, "/stress_hierarchy/lib.ar"); + + use crate::compile::CompileOutput; + + /// Serializes the memory/timing-sensitive benchmarks. Even when the test + /// runner is given multiple threads, holding this lock ensures only one + /// `bench_*` body runs at a time, so the process-global allocator counters + /// and wall-clock timings are not perturbed by a concurrent benchmark. + static BENCH_LOCK: std::sync::Mutex<()> = std::sync::Mutex::new(()); + + fn bench_guard() -> std::sync::MutexGuard<'static, ()> { + // Recover from poisoning: a panic in one benchmark should not wedge the + // others, and the lock guards only measurement isolation. + BENCH_LOCK.lock().unwrap_or_else(|e| e.into_inner()) + } + + /// Runs `f` `reps` times, returning the minimum wall-clock time (robust to + /// noise on a shared machine), the maximum peak heap growth observed during + /// a run, and the result of the final run. + fn measure(reps: u32, f: impl Fn() -> R) -> (std::time::Duration, usize, R) { + assert!(reps >= 1); + let mut best = std::time::Duration::MAX; + let mut peak = 0usize; + let mut result = None; + for _ in 0..reps { + // Free the previous run's result so each measurement starts from + // the same baseline. + drop(result.take()); + crate::bench_alloc::reset_peak(); + let base = crate::bench_alloc::live(); + let start = std::time::Instant::now(); + let r = f(); + best = best.min(start.elapsed()); + peak = peak.max(crate::bench_alloc::peak().saturating_sub(base)); + result = Some(r); + } + (best, peak, result.unwrap()) + } + + fn count_objects(o: &CompileOutput) -> usize { + let data = match o { + CompileOutput::Valid(d) => Some(d), + CompileOutput::ExecErrors(e) => e.output.as_ref(), + _ => None, + }; + data.map(|d| d.cells.values().map(|c| c.objects.len()).sum()) + .unwrap_or(0) + } + + fn count_cells(o: &CompileOutput) -> usize { + match o { + CompileOutput::Valid(d) => d.cells.len(), + CompileOutput::ExecErrors(e) => e.output.as_ref().map(|d| d.cells.len()).unwrap_or(0), + _ => 0, + } + } + + /// Sweep sizes for a benchmark axis. Returns `default` unless the named + /// environment variable is set to a comma-separated list of sizes, in which + /// case that list is used. This keeps the benchmarks general-purpose: the + /// same test can be re-run at a larger (or smaller) scale without editing + /// the source, e.g. after a compiler optimization changes how an axis + /// scales: + /// + /// ARGON_BENCH_SHAPES_LOOP=500,1000,2000,4000,8000,16000,32000 \ + /// cargo test -p compiler --release -- --ignored --test-threads=1 \ + /// --nocapture bench_shapes_loop + /// + /// The defaults are chosen so the whole suite runs in a few minutes and + /// stays within a few GiB on the current build; they are not assumptions + /// about how any axis "should" scale. + fn bench_sizes(env_var: &str, default: &[i64]) -> Vec { + match std::env::var(env_var) { + Ok(s) if !s.trim().is_empty() => s + .split(',') + .filter_map(|x| x.trim().parse::().ok()) + .collect(), + _ => default.to_vec(), + } + } + + fn write_bench_csv(name: &str, rows: &[(f64, f64, usize, usize)]) { + use std::fmt::Write; + let dir = PathBuf::from(env!("CARGO_MANIFEST_DIR")).join("../../bench/results"); + std::fs::create_dir_all(&dir).unwrap(); + let mut s = String::from("size,time_s,peak_bytes,n_objects\n"); + for (size, t, mem, nobj) in rows { + writeln!(s, "{size},{t},{mem},{nobj}").unwrap(); + } + let path = dir.join(format!("{name}.csv")); + std::fs::write(&path, s).unwrap(); + eprintln!("wrote {}", path.display()); + } + + /// Generates a workspace of `depth + 1` cells `h0..h{depth}` where each + /// `h{k}` instantiates `h{k-1}`. With `double_ref = false` the child is + /// referenced by a single (instance) binding; with `double_ref = true` the + /// child cell is also bound to a `let`, which makes the structural cell type + /// of `h{k}` contain two copies of the type of `h{k-1}`. + fn gen_hier(depth: usize, double_ref: bool) -> String { + let mut s = + String::from("cell h0() {\n rect(\"met1\", x0=0., y0=0., x1=10., y1=10.);\n}\n"); + for k in 1..=depth { + let body = if double_ref { + format!(" let child = h{}();\n let i = inst(child);\n", k - 1) + } else { + format!(" let i = inst(h{}());\n", k - 1) + }; + s.push_str(&format!( + "cell h{k}() {{\n rect(\"met1\", x0=0., y0=0., x1=10., y1=10.);\n{body} eq(i.x, 0.);\n eq(i.y, 10.);\n}}\n", + )); + } + s + } + + /// Axis 1: number of independent shapes in a single cell. + #[test] + #[ignore = "scaling benchmark; run in release, serially: cargo test -p compiler --release -- --ignored --test-threads=1 bench_"] + fn bench_shapes() { + let _g = bench_guard(); + let o = parse_workspace_with_std(ARGON_STRESS_SHAPES); + assert!(o.static_errors().is_empty(), "{:?}", o.static_errors()); + let ast = o.ast(); + let mut rows = Vec::new(); + for &n in &bench_sizes( + "ARGON_BENCH_SHAPES", + &[500, 1000, 2000, 4000, 8000, 16000, 32000], + ) { + let (dt, mem, out) = measure(3, || { + compile( + &ast, + CompileInput { + cell: &["shapes"], + args: vec![CellArg::Int(n)], + lyp_file: &PathBuf::from(BASIC_LYP), + }, + ) + }); + assert!(out.is_valid(), "shapes(n={n}) invalid"); + let nobj = count_objects(&out); + eprintln!( + "shapes n={n:>6} objects={nobj:>6} time={dt:>11.3?} peak={:>8.2} MiB", + mem as f64 / (1usize << 20) as f64 + ); + rows.push((n as f64, dt.as_secs_f64(), mem, nobj)); + } + write_bench_csv("shapes", &rows); + } + + /// Axis 1b: the same geometry generated with an idiomatic `for` loop over + /// `std::range`, which additionally exercises Argon's functional list + /// representation (`cons`). Capped at a smaller size because list + /// construction is super-linear. + #[test] + #[ignore = "scaling benchmark; run in release, serially: cargo test -p compiler --release -- --ignored --test-threads=1 bench_"] + fn bench_shapes_loop() { + let _g = bench_guard(); + let o = parse_workspace_with_std(ARGON_STRESS_SHAPES); + assert!(o.static_errors().is_empty(), "{:?}", o.static_errors()); + let ast = o.ast(); + // This variant generates the same geometry as `bench_shapes` but with a + // `for` loop over `std::range`, so its cost also includes building and + // iterating the list. The default sweep is kept smaller than + // `bench_shapes` only so the default run stays bounded in memory on the + // current build; override `ARGON_BENCH_SHAPES_LOOP` to sweep to the same + // sizes as `bench_shapes` (e.g. to compare the two after changes to the + // list representation). + let mut rows = Vec::new(); + for &n in &bench_sizes("ARGON_BENCH_SHAPES_LOOP", &[250, 500, 1000, 2000]) { + let (dt, mem, out) = measure(2, || { + compile( + &ast, + CompileInput { + cell: &["shapes_loop"], + args: vec![CellArg::Int(n)], + lyp_file: &PathBuf::from(BASIC_LYP), + }, + ) + }); + assert!(out.is_valid(), "shapes_loop(n={n}) invalid"); + let nobj = count_objects(&out); + eprintln!( + "shapes_loop n={n:>6} objects={nobj:>6} time={dt:>11.3?} peak={:>8.2} MiB", + mem as f64 / (1usize << 20) as f64 + ); + rows.push((n as f64, dt.as_secs_f64(), mem, nobj)); + } + write_bench_csv("shapes_loop", &rows); + } + + /// Axis 2: number of mutually-coupled constraints solved by the general + /// (dense) linear-constraint solver. + #[test] + #[ignore = "scaling benchmark; run in release, serially: cargo test -p compiler --release -- --ignored --test-threads=1 bench_"] + fn bench_constraints() { + let _g = bench_guard(); + let o = parse_workspace_with_std(ARGON_STRESS_CONSTRAINTS); + assert!(o.static_errors().is_empty(), "{:?}", o.static_errors()); + let ast = o.ast(); + let mut rows = Vec::new(); + for &n in &bench_sizes("ARGON_BENCH_CONSTRAINTS", &[32, 64, 128, 256, 512, 1024]) { + let (dt, mem, out) = measure(1, || { + compile( + &ast, + CompileInput { + cell: &["constraints"], + args: vec![CellArg::Int(n)], + lyp_file: &PathBuf::from(BASIC_LYP), + }, + ) + }); + assert!(out.is_valid(), "constraints(n={n}) invalid"); + let nobj = count_objects(&out); + eprintln!( + "constraints n={n:>6} objects={nobj:>6} time={dt:>11.3?} peak={:>8.2} MiB", + mem as f64 / (1usize << 20) as f64 + ); + rows.push((n as f64, dt.as_secs_f64(), mem, nobj)); + } + write_bench_csv("constraints", &rows); + } + + /// Axis 3: number of instances of a single (cached) leaf cell. + #[test] + #[ignore = "scaling benchmark; run in release, serially: cargo test -p compiler --release -- --ignored --test-threads=1 bench_"] + fn bench_instances() { + let _g = bench_guard(); + let o = parse_workspace_with_std(ARGON_STRESS_INSTANCES); + assert!(o.static_errors().is_empty(), "{:?}", o.static_errors()); + let ast = o.ast(); + let mut rows = Vec::new(); + for &n in &bench_sizes( + "ARGON_BENCH_INSTANCES", + &[500, 1000, 2000, 4000, 8000, 16000, 32000, 64000], + ) { + let (dt, mem, out) = measure(3, || { + compile( + &ast, + CompileInput { + cell: &["instances"], + args: vec![CellArg::Int(n)], + lyp_file: &PathBuf::from(BASIC_LYP), + }, + ) + }); + assert!(out.is_valid(), "instances(n={n}) invalid"); + let nobj = count_objects(&out); + eprintln!( + "instances n={n:>6} objects={nobj:>6} time={dt:>11.3?} peak={:>8.2} MiB", + mem as f64 / (1usize << 20) as f64 + ); + rows.push((n as f64, dt.as_secs_f64(), mem, nobj)); + } + write_bench_csv("instances", &rows); + } + + /// Axis 4: depth of cell hierarchy. Two series are produced: `single_ref` + /// references each child once (polynomial), and `double_ref` references it + /// twice, which triggers exponential structural-type expansion. + #[test] + #[ignore = "scaling benchmark; run in release, serially: cargo test -p compiler --release -- --ignored --test-threads=1 bench_"] + fn bench_hierarchy() { + let _g = bench_guard(); + let dir = PathBuf::from(env!("CARGO_MANIFEST_DIR")).join("build/bench_hier"); + std::fs::create_dir_all(&dir).unwrap(); + let lib = dir.join("lib.ar"); + + let mut rows = Vec::new(); + for depth in bench_sizes("ARGON_BENCH_HIER_SINGLE", &[4, 8, 16, 32, 48, 64, 96, 128]) + .into_iter() + .map(|d| d as usize) + { + std::fs::write(&lib, gen_hier(depth, false)).unwrap(); + let o = parse_workspace_with_std(&lib); + assert!(o.static_errors().is_empty(), "{:?}", o.static_errors()); + let ast = o.ast(); + let cellname = format!("h{depth}"); + let (dt, mem, out) = measure(2, || { + compile( + &ast, + CompileInput { + cell: &[&cellname], + args: vec![], + lyp_file: &PathBuf::from(BASIC_LYP), + }, + ) + }); + assert!(out.is_valid(), "hierarchy single-ref depth={depth} invalid"); + let nobj = count_objects(&out); + eprintln!( + "hier(1 ref) depth={depth:>4} cells={:>4} time={dt:>11.3?} peak={:>8.2} MiB", + count_cells(&out), + mem as f64 / (1usize << 20) as f64 + ); + rows.push((depth as f64, dt.as_secs_f64(), mem, nobj)); + } + write_bench_csv("hierarchy_single_ref", &rows); + + // `double_ref` binds the child cell twice, which (on the current build) + // makes the structural cell type grow quickly with depth, so the + // default sweep is kept shallow to stay within a few GiB. Override + // `ARGON_BENCH_HIER_DOUBLE` to push deeper. + let mut rows = Vec::new(); + for depth in bench_sizes("ARGON_BENCH_HIER_DOUBLE", &[2, 4, 6, 8, 10, 12, 14, 16, 18]) + .into_iter() + .map(|d| d as usize) + { + std::fs::write(&lib, gen_hier(depth, true)).unwrap(); + let o = parse_workspace_with_std(&lib); + assert!(o.static_errors().is_empty(), "{:?}", o.static_errors()); + let ast = o.ast(); + let cellname = format!("h{depth}"); + let (dt, mem, out) = measure(1, || { + compile( + &ast, + CompileInput { + cell: &[&cellname], + args: vec![], + lyp_file: &PathBuf::from(BASIC_LYP), + }, + ) + }); + assert!(out.is_valid(), "hierarchy double-ref depth={depth} invalid"); + let nobj = count_objects(&out); + eprintln!( + "hier(2 refs) depth={depth:>4} time={dt:>11.3?} peak={:>8.2} MiB", + mem as f64 / (1usize << 20) as f64 + ); + rows.push((depth as f64, dt.as_secs_f64(), mem, nobj)); + } + write_bench_csv("hierarchy_double_ref", &rows); + } + + // --- Smoke tests (run in the normal suite; keep these fast) --- + + #[test] + fn stress_shapes_smoke() { + let o = parse_workspace_with_std(ARGON_STRESS_SHAPES); + assert!(o.static_errors().is_empty(), "{:?}", o.static_errors()); + let ast = o.ast(); + for cell in ["shapes", "shapes_loop"] { + let out = compile( + &ast, + CompileInput { + cell: &[cell], + args: vec![CellArg::Int(64)], + lyp_file: &PathBuf::from(BASIC_LYP), + }, + ); + let d = out.unwrap_valid(); + let nrects = d + .cells + .values() + .flat_map(|c| c.objects.values()) + .filter(|o| matches!(o, SolvedValue::Rect(r) if !r.construction)) + .count(); + assert_eq!(nrects, 64, "{cell} should emit 64 rectangles"); + } + } + + #[test] + fn stress_constraints_smoke() { + let o = parse_workspace_with_std(ARGON_STRESS_CONSTRAINTS); + assert!(o.static_errors().is_empty(), "{:?}", o.static_errors()); + let ast = o.ast(); + let out = compile( + &ast, + CompileInput { + cell: &["constraints"], + args: vec![CellArg::Int(32)], + lyp_file: &PathBuf::from(BASIC_LYP), + }, + ); + assert!( + out.is_valid(), + "constraints ring should be fully determined: {out:?}" + ); + } + + #[test] + fn stress_instances_smoke() { + let o = parse_workspace_with_std(ARGON_STRESS_INSTANCES); + assert!(o.static_errors().is_empty(), "{:?}", o.static_errors()); + let ast = o.ast(); + let out = compile( + &ast, + CompileInput { + cell: &["instances"], + args: vec![CellArg::Int(64)], + lyp_file: &PathBuf::from(BASIC_LYP), + }, + ); + let d = out.unwrap_valid(); + let ninsts = d + .cells + .values() + .flat_map(|c| c.objects.values()) + .filter(|o| matches!(o, SolvedValue::Instance(_))) + .count(); + assert_eq!(ninsts, 64, "instances(64) should place 64 instances"); + } + + #[test] + fn stress_hierarchy_smoke() { + let o = parse_workspace_with_std(ARGON_STRESS_HIERARCHY); + assert!(o.static_errors().is_empty(), "{:?}", o.static_errors()); + let ast = o.ast(); + let out = compile( + &ast, + CompileInput { + cell: &["h8"], + args: vec![], + lyp_file: &PathBuf::from(BASIC_LYP), + }, + ); + let d = out.unwrap_valid(); + // h0..h8 = 9 cells of hierarchy. + assert_eq!(d.cells.len(), 9, "h8 should instantiate 9 cells deep"); + } + #[test] fn argon_scopes() { let o = parse_workspace_with_std(ARGON_SCOPES); diff --git a/examples/stress_constraints/lib.ar b/examples/stress_constraints/lib.ar new file mode 100644 index 0000000..d8c677e --- /dev/null +++ b/examples/stress_constraints/lib.ar @@ -0,0 +1,32 @@ +fn coupled_ring(prev: Rect, first: Rect, n: Int) { + // Build a *ring* of rectangles whose left edges are mutually coupled. + // Each step adds a rectangle `cur` and a two-variable difference + // constraint `prev.x0 - cur.x0 = 5` relating it to the previous one. + // Because every constraint involves two unknowns, none can be resolved + // by back-substitution: the whole ring must be solved simultaneously by + // the general linear-constraint solver (a dense SVD over the coupled + // component). The base case closes the ring with a single *sum* + // constraint, which makes the otherwise-underconstrained chain fully + // determined regardless of `n`. + #scope0 if n <= 0 { + eq(prev.x0 + first.x0, 100.); + } else { + let cur = crect(); + eq(cur.y0, 0.); + eq(cur.y1, 10.); + eq(cur.x1, cur.x0 + 10.); + eq(prev.x0 - cur.x0, 5.); + #scope1 coupled_ring(cur, first, n - 1); + } +} + +// Stress axis: number of *coupled* constraints. `constraints(n)` produces a +// single connected constraint component spanning `n + 1` rectangles, forcing +// the general solver to factor an O(n) x O(n) system. +cell constraints(n: Int) { + let first = crect(); + eq(first.y0, 0.); + eq(first.y1, 10.); + eq(first.x1, first.x0 + 10.); + #scope0 coupled_ring(first, first, n); +} diff --git a/examples/stress_hierarchy/lib.ar b/examples/stress_hierarchy/lib.ar new file mode 100644 index 0000000..6784905 --- /dev/null +++ b/examples/stress_hierarchy/lib.ar @@ -0,0 +1,57 @@ +cell h0() { + rect("met1", x0=0., y0=0., x1=10., y1=10.); +} + +// Each level `h{k}` instantiates the level below it (`h{k-1}`) and adds one +// rectangle, building a layout that is `k` cells deep. Compiling `h{k}` therefore +// exercises `k` levels of hierarchy. The instance is bound to a single variable +// (`i`); binding the child cell to an additional variable would cause the +// structural cell type to expand exponentially with depth (see bench/README.md). +cell h1() { + rect("met1", x0=0., y0=0., x1=10., y1=10.); + let i = inst(h0()); + eq(i.x, 0.); + eq(i.y, 10.); +} +cell h2() { + rect("met1", x0=0., y0=0., x1=10., y1=10.); + let i = inst(h1()); + eq(i.x, 0.); + eq(i.y, 10.); +} +cell h3() { + rect("met1", x0=0., y0=0., x1=10., y1=10.); + let i = inst(h2()); + eq(i.x, 0.); + eq(i.y, 10.); +} +cell h4() { + rect("met1", x0=0., y0=0., x1=10., y1=10.); + let i = inst(h3()); + eq(i.x, 0.); + eq(i.y, 10.); +} +cell h5() { + rect("met1", x0=0., y0=0., x1=10., y1=10.); + let i = inst(h4()); + eq(i.x, 0.); + eq(i.y, 10.); +} +cell h6() { + rect("met1", x0=0., y0=0., x1=10., y1=10.); + let i = inst(h5()); + eq(i.x, 0.); + eq(i.y, 10.); +} +cell h7() { + rect("met1", x0=0., y0=0., x1=10., y1=10.); + let i = inst(h6()); + eq(i.x, 0.); + eq(i.y, 10.); +} +cell h8() { + rect("met1", x0=0., y0=0., x1=10., y1=10.); + let i = inst(h7()); + eq(i.x, 0.); + eq(i.y, 10.); +} diff --git a/examples/stress_instances/lib.ar b/examples/stress_instances/lib.ar new file mode 100644 index 0000000..0fbbcaf --- /dev/null +++ b/examples/stress_instances/lib.ar @@ -0,0 +1,26 @@ +cell leaf() { + // A small leaf cell that is instantiated many times below. It is compiled + // exactly once and cached; every instance simply references it. + rect("met1", x0=0., y0=0., x1=10., y1=10.); +} + +fn place(c: Any, n: Int) { + // Recursively create `n` instances of cell `c`, each fully constrained to + // an absolute location. Every instance adds two solver variables (its x/y + // origin) resolved by back-substitution. + #scope0 if n <= 0 { + } else { + let it = inst(c); + eq(it.x, (n as Float) * 20.); + eq(it.y, 0.); + #scope1 place(c, n - 1); + } +} + +// Stress axis: number of cell instances. `instances(n)` places `n` copies of +// the (cached) `leaf` cell, stressing instance bookkeeping and GDS hierarchy +// emission while keeping per-leaf compilation cost constant. +cell instances(n: Int) { + let c = leaf(); + #scope0 place(c, n); +} diff --git a/examples/stress_shapes/lib.ar b/examples/stress_shapes/lib.ar new file mode 100644 index 0000000..0fc644e --- /dev/null +++ b/examples/stress_shapes/lib.ar @@ -0,0 +1,34 @@ +fn emit_shapes(n: Int) { + // Recursively emit `n` independent, fully-constrained rectangles. + // Each rectangle introduces 4 solver variables that are pinned directly + // by their kwargs, so the solver resolves them by back-substitution + // (no dense linear-algebra step is required). + #scope0 if n <= 0 { + } else { + let x = (n as Float) * 10.; + rect("met1", x0=x, y0=0., x1=x + 8., y1=8.); + #scope1 emit_shapes(n - 1); + } +} + +// Stress axis: number of shapes / editable objects in a single cell. +// `shapes(n)` produces a flat cell containing `n` rectangles. +cell shapes(n: Int) { + #scope0 emit_shapes(n); +} + +fn emit_shapes_loop(lst: [Int]) { + // Same geometry as `shapes`, but generated by iterating over a list + // produced by `std::range`. This exercises Argon's functional list + // representation (`cons`) in addition to geometry emission. + for i in lst { + let x = (i as Float) * 10.; + rect("met1", x0=x, y0=0., x1=x + 8., y1=8.); + } +} + +// Loop-based variant used to contrast iteration strategies in the benchmark. +cell shapes_loop(n: Int) { + let lst = #scope0 std::range(n); + #scope1 emit_shapes_loop(lst); +}