ucb-substrate · rahulk29 · Jun 13, 2026 · Jun 13, 2026
diff --git a/bench/README.md b/bench/README.md
@@ -0,0 +1,163 @@
+# Argon scaling benchmarks
+
+These benchmarks stress the Argon compiler along the axes raised in review —
+**number of shapes**, **number of (coupled) constraints**, **number of cell
+instances**, and **depth of hierarchy** — and record how compile time and peak
+memory scale with each. They exist to answer questions of the form:
+
+> *How does the framework scale to layouts with substantially more hierarchy,
+> more constraints, and a larger number of editable objects?*
+
+The Argon sources that are swept live in [`../examples/`](../examples):
+
+| Example                          | Cell(s)                 | Axis stressed |
+| -------------------------------- | ----------------------- | ------------- |
+| `examples/stress_shapes`         | `shapes(n)`             | `n` independent rectangles in one cell (generated by recursion) |
+| `examples/stress_shapes`         | `shapes_loop(n)`        | the same geometry generated with a `for` loop over `std::range` (also stresses the functional list representation) |
+| `examples/stress_constraints`    | `constraints(n)`        | a ring of `n+1` rectangles whose edges are mutually coupled, forcing the general (dense) linear solver |
+| `examples/stress_instances`      | `instances(n)`          | `n` instances of a single cached leaf cell |
+| `examples/stress_hierarchy`      | `h0 .. h8`              | a chain of cells `h{k}` each instantiating `h{k-1}`; compiling `h{k}` exercises `k` levels of hierarchy |
+
+The benchmark *drivers* are the `bench_*` tests in
+[`../core/compiler/src/lib.rs`](../core/compiler/src/lib.rs). For the
+hierarchy axis the driver generates `h0..h{depth}` workspaces on the fly (a
+single `.ar` file cannot express a runtime-variable depth because Argon cells
+cannot be recursive or forward-referenced).
+
+## Running
+
+The `bench_*` tests are marked `#[ignore]` because the larger sizes take well
+over 6 s in a debug build. Run them in **release**, **serially** (peak-memory
+tracking uses a process-global allocator, so concurrent tests would corrupt the
+measurements):
+
+```bash
+cargo test -p compiler --release -- --ignored --test-threads=1 --nocapture bench_
+```
+
+Each test writes a CSV to `bench/results/<axis>.csv` with columns
+`size,time_s,peak_bytes,n_objects`. Then render the figure (and print a summary
+table of fitted scaling models):
+
+```bash
+python3 bench/plot_scaling.py     # writes bench/argon_scaling.{png,pdf}
+```
+
+`plot_scaling.py` needs only the standard library to print the summary table;
+`matplotlib` is required to draw the figure.
+
+To run a single axis, e.g. just the instance sweep:
+
+```bash
+cargo test -p compiler --release -- --ignored --test-threads=1 --nocapture bench_instances
+```
+
+The fast `stress_*_smoke` tests (which just check that each example still
+compiles) run in the normal `cargo test` suite and are **not** ignored.
+
+### Configuring the sweeps
+
+Every axis reads its list of sizes from an environment variable, falling back
+to a default. This keeps the benchmarks general-purpose: the same test can be
+re-run at a different scale — for example after a compiler optimization changes
+how an axis scales — without editing any source. Pass a comma-separated list:
+
+| Env var | Axis | Default |
+| ------- | ---- | ------- |
+| `ARGON_BENCH_SHAPES`        | shapes (recursion)   | `500,1000,2000,4000,8000,16000,32000` |
+| `ARGON_BENCH_SHAPES_LOOP`   | shapes (`for` loop)  | `250,500,1000,2000` |
+| `ARGON_BENCH_INSTANCES`     | instances            | `500,…,64000` |
+| `ARGON_BENCH_CONSTRAINTS`   | coupled constraints  | `32,64,128,256,512,1024` |
+| `ARGON_BENCH_HIER_SINGLE`   | hierarchy (1 ref)    | `4,8,16,32,48,64,96,128` |
+| `ARGON_BENCH_HIER_DOUBLE`   | hierarchy (2 refs)   | `2,4,6,8,10,12,14,16,18` |
+
+```bash
+# e.g. sweep the for-loop variant out to the same sizes as bench_shapes
+ARGON_BENCH_SHAPES_LOOP=500,1000,2000,4000,8000,16000,32000 \
+  cargo test -p compiler --release -- --ignored --test-threads=1 --nocapture bench_shapes_loop
+```
+
+The defaults are sized so the suite runs in a few minutes within a few GiB on
+the current build; they are not claims about how any axis "should" scale.
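+
+The drivers read these variables in Rust (in `lib.rs`). The minimal Python sketch
+below illustrates the parse-or-fall-back rule in a language-neutral way. The
+function name `sweep_sizes`, and its tolerance of spaces and a trailing comma,
+are assumptions for illustration. They do not describe the actual driver code.
+
+```python
+import os
+
+
+def sweep_sizes(var, default):
+    """Sizes from a comma-separated env var, or `default` if it is unset/empty."""
+    raw = os.environ.get(var, "").strip()
+    if not raw:
+        return list(default)
+    # int() accepts surrounding spaces; skip empty items such as a trailing comma.
+    return [int(tok) for tok in raw.split(",") if tok.strip()]
+
+
+os.environ["ARGON_BENCH_SHAPES_LOOP"] = "500, 1000,2000,"
+print(sweep_sizes("ARGON_BENCH_SHAPES_LOOP", [250, 500, 1000, 2000]))  # [500, 1000, 2000]
+```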
+
+## Methodology
+
+- **Time**: minimum wall-clock time over a few repetitions (`min` is robust to
+  noise on a shared machine). For the hierarchy axes, parsing and static
+  analysis run once per size and are excluded from the timing. The other axes
+  time the end-to-end `compile()` call.
+- **Memory**: a `#[global_allocator]` compiled only into the test binary
+  (`bench_alloc::Tracking` in `lib.rs`) tracks live and peak heap bytes. We
+  report the peak heap *growth* during a single `compile()`.
+- **Build**: release profile. The numbers below were collected on a Linux
+  machine. Absolute values are machine-dependent; the *scaling trends* should
+  be much less so.
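+
+The Python sketch below applies the same min-of-repetitions / peak-growth protocol
+and is meant for readers who want to reuse it elsewhere. It is an analogue rather than
+the harness itself. `measure` is a hypothetical name. The standard-library
+`tracemalloc` module (Python 3.9+ for `reset_peak`) stands in for the
+`bench_alloc::Tracking` allocator and only sees allocations made through Python's
+allocator.
+
+```python
+import time
+import tracemalloc
+
+
+def measure(fn, reps=3):
+    """Return (min wall-clock seconds over `reps` runs, peak heap growth in bytes)."""
+    best = float("inf")
+    for _ in range(reps):
+        start = time.perf_counter()
+        fn()
+        best = min(best, time.perf_counter() - start)
+
+    # Peak growth of one extra run: highest traced total minus the total before it.
+    tracemalloc.start()
+    try:
+        baseline, _ = tracemalloc.get_traced_memory()
+        tracemalloc.reset_peak()
+        fn()
+        _, peak = tracemalloc.get_traced_memory()
+    finally:
+        tracemalloc.stop()
+    return best, peak - baseline
+
+
+seconds, growth = measure(lambda: [0] * 1_000_000)
+print(f"{seconds:.4f} s, {growth / 2**20:.1f} MiB")
+```
+
+The timed runs are untraced because tracing slows every allocation. Memory is
+therefore measured on a separate run.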
+
+## Results
+
+The numbers below are a **snapshot** from one release build on the development
+machine. They are produced by the commands above and meant to be regenerated
+(absolute values are machine- and build-dependent). `n` is the per-axis size
+parameter; "peak mem" is the peak heap growth during a single `compile()`, in
+GiB (2^30 bytes).
+
+| Axis | largest `n` | time @ largest | peak mem @ largest | empirical scaling |
+| ---- | ----------- | -------------- | ------------------ | ----------------- |
+| Shapes (recursion)           | 32 000 rects   | 1.53 s  | 0.89 GiB | **~linear** (time `∝ n^1.2`, mem `∝ n^1.0`) |
+| Instances                    | 64 000 insts   | 3.14 s  | 1.26 GiB | **~linear** (time `∝ n^1.2`, mem `∝ n^1.0`) |
+| Hierarchy, 1 child ref       | depth 128      | 0.09 s  | 0.11 GiB | **polynomial** (time `∝ depth^1.4`, mem `∝ depth^1.3`) |
+| Coupled constraints          | 1 024 rects    | 21.7 s  | 0.12 GiB | **super-cubic in time at the top end** (fitted `∝ n^2.5` overall, locally `∝ n^3.9` at 512→1 024; see below) |
+| Shapes (`for`-loop)          | 2 000 rects    | 0.59 s  | 3.9 GiB  | **quadratic in memory** (time `∝ n^1.5`, mem `∝ n^2`) |
+| Hierarchy, 2 child refs      | depth 18       | 11.5 s  | 3.5 GiB  | **exponential** (`×1.9` per level) |
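+
+The "@ largest" columns can be read straight off the CSVs. A per-step exponent
+shows where a series steepens, which a single fitted exponent (as printed by
+`plot_scaling.py`) averages away. The stand-alone sketch below is not part of
+the harness, and its helper names are hypothetical. It assumes the
+`size,time_s,peak_bytes,n_objects` layout and embeds the committed
+`constraints.csv`, so it runs without the results directory.
+
+```python
+import csv
+import io
+import math
+
+GIB = 2**30  # the table reports binary GiB
+
+
+def load_rows(csv_text):
+    """Parse a results CSV into (size, time_s, peak_bytes) tuples sorted by size."""
+    return sorted(
+        (float(r["size"]), float(r["time_s"]), float(r["peak_bytes"]))
+        for r in csv.DictReader(io.StringIO(csv_text))
+    )
+
+
+def largest(rows):
+    """(size, time in s, peak in GiB) at the largest swept size."""
+    n, t, peak = rows[-1]
+    return n, t, peak / GIB
+
+
+def local_exponents(rows):
+    """Power-law exponent between consecutive points: log(t2/t1) / log(n2/n1)."""
+    return [
+        math.log(t2 / t1) / math.log(n2 / n1)
+        for (n1, t1, _), (n2, t2, _) in zip(rows, rows[1:])
+    ]
+
+
+# Copied from bench/results/constraints.csv
+CONSTRAINTS = """size,time_s,peak_bytes,n_objects
+32,0.00383075,2585876,33
+64,0.00726206,3829492,65
+128,0.029626408,6749992,129
+256,0.21347691,15396760,257
+512,1.4369801039999999,42127416,513
+1024,21.745505947,133337720,1025
+"""
+
+rows = load_rows(CONSTRAINTS)
+print(largest(rows))                                 # size 1024, ~21.7 s, ~0.12 GiB
+print([round(k, 2) for k in local_exponents(rows)])  # [0.92, 2.03, 2.85, 2.75, 3.92]
+```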
+
+### Interpretation
+
+- **Geometry and instances scale near-linearly.** Compiling a single flat cell
+  with tens of thousands of fully-constrained rectangles, or with tens of
+  thousands of instances of a cached cell, is close to linear in both time
+  (fitted exponent ≈1.2) and memory (≈1.0). Each shape contributes 4 solver
+  variables and each instance 2. Because their constraints pin one variable at a
+  time, the solver resolves them by back-substitution without ever forming a
+  matrix. This is the common case for real parametric cells, and it scales
+  comfortably to tens of thousands of rectangles.
+
+- **Coupled constraints are the expensive axis.** When constraints form one
+  large connected component that *cannot* be back-substituted (here, a ring of
+  mutually-coupled edges), Argon falls back to its general linear solver, which
+  builds a dense matrix and takes an SVD. The cost per doubling of `n` climbs
+  from ~4× at `n=64→128` to ~15× at `n=512→1024`. Dense factorization alone is
+  `O(n^3)`, i.e. 8× per doubling. The last doubling is steeper still (local
+  exponent ≈3.9) because `solve()` is re-run as the system is assembled. This
+  is the "general linear constraint solving (slow)" caveat in the top-level
+  README, quantified: ~1 000 coupled rectangles take ~20 s. Layouts whose
+  constraints decompose into many small independent groups (the typical case)
+  avoid this entirely.
+
+- **Hierarchy depth is limited by the type representation.** A cell's static
+  type (`CellTy`) stores the full structural type of every field, including
+  instantiated sub-cells. If a cell references its child **once** (e.g.
+  `let i = inst(child());`), depth scales polynomially (fitted `~depth^1.4` in
+  time, steepening to roughly `depth^2` between depth 64 and 128) and is fine to
+  ~128 levels. If it references the child **twice** (e.g. the
+  `let c = child(); let i = inst(c);` idiom from the tutorial), the type of
+  `h{k}` contains two copies of the type of `h{k-1}`. The representation, and
+  hence compile time and memory, therefore **doubles with every level**. A fit
+  over the whole sweep gives `×1.9` per level, and from depth 14 to 18 peak
+  memory grows almost exactly 2× per level. Beyond ~depth 20 this exhausts
+  memory (depth 20 alone needs ~14.5 GiB / 50 s; depth 18 is ~3.5 GiB / 11.5 s,
+  which is where this sweep is capped). Very deep hierarchies additionally hit
+  a native-recursion stack limit in the compiler at a few hundred levels.
+
+- **Recursion vs. iteration measures the list/iteration machinery.** `shapes`
+  and `shapes_loop` emit identical geometry; the only difference is that
+  `shapes_loop` builds and iterates a `std::range` list. On the build measured
+  here that list path is markedly heavier (≈4 GiB to emit 2 000 rectangles via
+  a `for` loop, vs. 32 000 by recursion in under 1 GiB), so the gap between the
+  two series is a direct measure of the cost of the list representation rather
+  than of the geometry or solver. Re-running both series (e.g. with
+  `ARGON_BENCH_SHAPES_LOOP` set to the same sizes as `bench_shapes`) is the way
+  to see that cost change as the iteration/list machinery is optimized.
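+
+A toy model shows the contrast between the first two bullets. The sketch below is
+hypothetical and is *not* Argon's solver. It resolves any linear equality that has
+exactly one unknown left and propagates each solved value through a
+variable-to-constraint index, so its cost is linear in the total number of
+constraint terms. A rectangle pinned edge by edge resolves completely. A ring in
+the spirit of `stress_constraints` leaves every equation with two unknowns, and
+those equations would have to go to a general solver.
+
+```python
+from collections import defaultdict, deque
+from fractions import Fraction
+
+
+def propagate(constraints):
+    """Each constraint is (coeffs, rhs): sum(coeffs[v] * v) == rhs.
+
+    Returns (solved values, constraints that still have 2+ unknowns).
+    Redundant constraints (0 unknowns left) are not checked for consistency.
+    """
+    values = {}
+    remaining = [set(coeffs) for coeffs, _ in constraints]  # unknowns per constraint
+    users = defaultdict(list)  # variable -> indices of constraints mentioning it
+    for i, (coeffs, _) in enumerate(constraints):
+        for v in coeffs:
+            users[v].append(i)
+
+    ready = deque(i for i, r in enumerate(remaining) if len(r) == 1)
+    while ready:
+        i = ready.popleft()
+        if len(remaining[i]) != 1:
+            continue  # its last unknown was already solved by another constraint
+        (v,) = remaining[i]
+        coeffs, rhs = constraints[i]
+        known = sum(c * values[u] for u, c in coeffs.items() if u != v)
+        values[v] = Fraction(rhs - known) / coeffs[v]
+        for j in users[v]:
+            remaining[j].discard(v)
+            if len(remaining[j]) == 1:
+                ready.append(j)
+
+    coupled = [constraints[i] for i, r in enumerate(remaining) if len(r) > 1]
+    return values, coupled
+
+
+chain = [
+    ({"x0": 1}, 0),
+    ({"x1": 1, "x0": -1}, 10),  # width 10
+    ({"y0": 1}, 5),
+    ({"y1": 1, "y0": -1}, 20),  # height 20
+]
+# Unique solution a=1, b=2, c=3, but no equation ever has a single unknown.
+ring = [({"a": 1, "b": 1}, 3), ({"b": 1, "c": 1}, 5), ({"c": 1, "a": 1}, 4)]
+
+print(propagate(chain))  # all four edges solved, nothing left coupled
+print(propagate(ring))   # nothing solved, all three equations still coupled
+```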
+
+The takeaways for the paper: compile time and memory grow near-linearly with
+editable-object count and instance count; the practically relevant limits are
+the dense general constraint
+solver on large *coupled* systems and structural type expansion on deep
+hierarchies — both of which line up with the future-work items already listed
+in the project README (faster linear constraint solving; incremental
+compilation). The bullets above describe the build at the time of measurement;
+because every axis is re-runnable (and size-configurable), the same harness can
+be used to confirm improvements from compiler optimizations.
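+
+A toy structural type shows why a second reference to the child doubles the
+type. In this model, each cell's type embeds its child's type once per reference
+and is counted as a tree, with no sharing. The model is hypothetical and is not
+`CellTy`. Its node count obeys S(k) = 1 + r·S(k-1), which gives k+1 nodes for
+r = 1 and 2^(k+1) - 1 for r = 2. The measured one-reference series grows faster
+than this model's linear count, so the sketch only illustrates the doubling
+mechanism.
+
+```python
+def cell_type(depth, refs):
+    """Toy type of h{depth}: ("cell", child_type, ...) with `refs` child copies."""
+    ty = ("leaf",)
+    for _ in range(depth):
+        ty = ("cell",) + (ty,) * refs
+    return ty
+
+
+def structural_size(ty):
+    """Node count when the type is expanded as a tree (copies are not shared)."""
+    return 1 + sum(structural_size(child) for child in ty[1:])
+
+
+# Python shares the child tuples in memory; the count is what a representation
+# that copies them (or walks them without memoization) would pay.
+for refs in (1, 2):
+    print(refs, [structural_size(cell_type(k, refs)) for k in range(0, 17, 4)])
+# 1 [1, 5, 9, 13, 17]
+# 2 [1, 31, 511, 8191, 131071]
+```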
+
+![Argon scaling](argon_scaling.png)
diff --git a/bench/argon_scaling.pdf b/bench/argon_scaling.pdf
diff --git a/bench/argon_scaling.png b/bench/argon_scaling.png
diff --git a/bench/plot_scaling.py b/bench/plot_scaling.py
@@ -0,0 +1,156 @@
+#!/usr/bin/env python3
+"""Plot Argon compile-time and memory scaling from the benchmark CSVs.
+
+The CSVs are produced by the `bench_*` tests in `core/compiler/src/lib.rs`
+(see bench/README.md for how to run them). Each CSV has the columns
+
+    size,time_s,peak_bytes,n_objects
+
+where `size` is the swept parameter for that axis (number of shapes, number of
+coupled constraints, number of instances, or hierarchy depth).
+
+Usage:
+    python3 bench/plot_scaling.py                 # reads bench/results/*.csv
+    python3 bench/plot_scaling.py --results DIR --out FILE
+"""
+import argparse
+import csv
+import math
+import os
+import sys
+
+# Series in the order we want them drawn. Each entry is
+#   (csv_basename, display_label, size_unit, model)
+# where `model` is "poly" (fit a power law y ~ n^k) or "exp" (fit y ~ b^n,
+# appropriate for the exponentially-scaling hierarchy variant).
+SERIES = [
+    ("shapes", "Shapes (recursion)", "# rectangles", "poly"),
+    ("shapes_loop", "Shapes (for-loop / cons list)", "# rectangles", "poly"),
+    ("instances", "Instances", "# instances", "poly"),
+    ("constraints", "Coupled constraints", "# coupled rects", "poly"),
+    ("hierarchy_single_ref", "Hierarchy (1 child ref)", "depth", "poly"),
+    ("hierarchy_double_ref", "Hierarchy (2 child refs)", "depth", "exp"),
+]
+
+
+def load(path):
+    xs, ts, ms = [], [], []
+    with open(path, newline="") as f:
+        for row in csv.DictReader(f):
+            xs.append(float(row["size"]))
+            ts.append(float(row["time_s"]))
+            ms.append(float(row["peak_bytes"]))
+    return xs, ts, ms
+
+
+def _slope(pairs):
+    """Least-squares slope of a list of (x, y) points."""
+    n = len(pairs)
+    if n < 2:
+        return float("nan")
+    sx = sum(p[0] for p in pairs)
+    sy = sum(p[1] for p in pairs)
+    sxx = sum(p[0] * p[0] for p in pairs)
+    sxy = sum(p[0] * p[1] for p in pairs)
+    denom = n * sxx - sx * sx
+    if abs(denom) < 1e-12:
+        return float("nan")
+    return (n * sxy - sx * sy) / denom
+
+
+def fit_exponent(xs, ys):
+    """Power-law exponent: slope of log(y) vs log(x)."""
+    return _slope([(math.log(x), math.log(y)) for x, y in zip(xs, ys) if x > 0 and y > 0])
+
+
+def fit_base(xs, ys):
+    """Exponential base b for y ~ b^x: from the slope of log(y) vs x."""
+    s = _slope([(x, math.log(y)) for x, y in zip(xs, ys) if y > 0])
+    return math.exp(s)
+
+
+def describe(model, xs, ys):
+    """Return (legend_suffix, summary_string) for the fitted scaling model."""
+    if model == "exp":
+        b = fit_base(xs, ys)
+        return f"exp., $\\times{b:.1f}$/step", f"exponential (x{b:.2f} per unit)"
+    k = fit_exponent(xs, ys)
+    return f"$\\propto n^{{{k:.1f}}}$", f"~n^{k:.2f}"
+
+
+def main():
+    here = os.path.dirname(os.path.abspath(__file__))
+    ap = argparse.ArgumentParser()
+    ap.add_argument("--results", default=os.path.join(here, "results"))
+    ap.add_argument("--out", default=os.path.join(here, "argon_scaling"))
+    args = ap.parse_args()
+
+    data = {}
+    for key, label, unit, model in SERIES:
+        path = os.path.join(args.results, f"{key}.csv")
+        if os.path.exists(path):
+            xs, ts, ms = load(path)
+            if xs:
+                data[key] = (label, unit, model, xs, ts, ms)
+
+    if not data:
+        sys.exit(
+            f"No benchmark CSVs found in {args.results}.\n"
+            "Run the benchmarks first (see bench/README.md)."
+        )
+
+    # Print a summary table of fitted scaling models.
+    print(f"{'series':<30}{'points':>7}  {'time scaling':<22}{'mem scaling':<22}max time / max mem")
+    for key, _, _, _ in SERIES:
+        if key not in data:
+            continue
+        label, unit, model, xs, ts, ms = data[key]
+        _, t_desc = describe(model, xs, ts)
+        _, m_desc = describe(model, xs, ms)
+        print(
+            f"{label:<30}{len(xs):>7}  {t_desc:<22}{m_desc:<22}"
+            f"{max(ts):.3f} s / {max(ms) / 2**20:.0f} MiB"
+        )
+
+    try:
+        import matplotlib
+
+        matplotlib.use("Agg")
+        import matplotlib.pyplot as plt
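+    except ImportError:
+        # The summary above is complete without matplotlib, so this is not an error.
+        print("\nmatplotlib not installed; printed summary only. "
+              "`pip install matplotlib` to draw.", file=sys.stderr)
+        return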
+
+    fig, (ax_t, ax_m) = plt.subplots(1, 2, figsize=(13, 5.2))
+    markers = ["o", "s", "^", "D", "v", "P"]
+    for (key, _, _, _), marker in zip(SERIES, markers):
+        if key not in data:
+            continue
+        label, unit, model, xs, ts, ms = data[key]
+        t_suffix, _ = describe(model, xs, ts)
+        m_suffix, _ = describe(model, xs, ms)
+        ax_t.plot(xs, ts, marker=marker, label=f"{label}  ({t_suffix})")
+        ax_m.plot(xs, [m / 2**20 for m in ms], marker=marker,
+                  label=f"{label}  ({m_suffix})")
+
+    for ax in (ax_t, ax_m):
+        ax.set_xscale("log")
+        ax.set_yscale("log")
+        ax.set_xlabel("problem size $n$ (rectangles / constraints / instances / depth)")
+        ax.grid(True, which="both", ls=":", alpha=0.4)
+
+    ax_t.set_ylabel("compile time (s)")
+    ax_t.set_title("Argon compile-time scaling")
+    ax_m.set_ylabel("peak heap growth (MiB)")
+    ax_m.set_title("Argon memory scaling")
+    ax_t.legend(fontsize=8, loc="upper left")
+    ax_m.legend(fontsize=8, loc="upper left")
+    fig.tight_layout()
+
+    for ext in ("png", "pdf"):
+        out = f"{args.out}.{ext}"
+        fig.savefig(out, dpi=150, bbox_inches="tight")
+        print(f"wrote {out}")
+
+
+if __name__ == "__main__":
+    main()
diff --git a/bench/results/constraints.csv b/bench/results/constraints.csv
@@ -0,0 +1,7 @@
+size,time_s,peak_bytes,n_objects
+32,0.00383075,2585876,33
+64,0.00726206,3829492,65
+128,0.029626408,6749992,129
+256,0.21347691,15396760,257
+512,1.4369801039999999,42127416,513
+1024,21.745505947,133337720,1025
diff --git a/bench/results/hierarchy_double_ref.csv b/bench/results/hierarchy_double_ref.csv
@@ -0,0 +1,10 @@
+size,time_s,peak_bytes,n_objects
+2,0.000971778,1325410,5
+4,0.001176136,1653373,9
+6,0.001882165,2474335,13
+8,0.004620015,5364013,17
+10,0.015455478,16602635,21
+12,0.088864192,61484713,25
+14,0.481928844,240102311,29
+16,2.506498165,954179013,33
+18,11.538940627,3810189475,37
diff --git a/bench/results/hierarchy_single_ref.csv b/bench/results/hierarchy_single_ref.csv
@@ -0,0 +1,9 @@
+size,time_s,peak_bytes,n_objects
+4,0.000845819,1556216,9
+8,0.001178198,2140495,17
+16,0.002346772,4034443,33
+32,0.005942098,10387267,65
+48,0.012042876,19826107,97
+64,0.022953693,33362240,129
+96,0.052964538,69348808,193
+128,0.090173076,120382868,257
diff --git a/bench/results/instances.csv b/bench/results/instances.csv
@@ -0,0 +1,9 @@
+size,time_s,peak_bytes,n_objects
+500,0.012283447,11806766,501
+1000,0.02516376,22390998,1001
+2000,0.056507946,43559462,2001
+4000,0.147355775,85896390,4001
+8000,0.310525996,170570230,8001
+16000,0.689046789,339917926,16001
+32000,1.457187325,678613350,32001
+64000,3.140683519,1356004118,64001
diff --git a/bench/results/shapes.csv b/bench/results/shapes.csv
@@ -0,0 +1,8 @@
+size,time_s,peak_bytes,n_objects
+500,0.012574383,16159584,500
+1000,0.028867058,31116160,1000
+2000,0.071651056,61029296,2000
+4000,0.158150608,120855616,4000
+8000,0.329667899,240508160,8000
+16000,0.70516885,479813376,16000
+32000,1.530754693,958423696,32000
diff --git a/bench/results/shapes_loop.csv b/bench/results/shapes_loop.csv
@@ -0,0 +1,5 @@
+size,time_s,peak_bytes,n_objects
+250,0.026441618,73227389,250
+500,0.091560351,274248679,500
+1000,0.269871454,1063292012,1000
+2000,0.589680644,4189379419,2000