diff --git a/bench/README.md b/bench/README.md
new file mode 100644
index 0000000..f8ef905
--- /dev/null
+++ b/bench/README.md
@@ -0,0 +1,163 @@
+# Argon scaling benchmarks
+
+These benchmarks stress the Argon compiler along the axes raised in review —
+**number of shapes**, **number of (coupled) constraints**, **number of cell
+instances**, and **depth of hierarchy** — and record how compile time and peak
+memory scale with each. They exist to answer questions of the form:
+
+> *How does the framework scale to layouts with substantially more hierarchy,
+> more constraints, and a larger number of editable objects?*
+
+The Argon sources that are swept live in [`../examples/`](../examples):
+
+| Example                          | Cell(s)                 | Axis stressed |
+| -------------------------------- | ----------------------- | ------------- |
+| `examples/stress_shapes`         | `shapes(n)`             | `n` independent rectangles in one cell (generated by recursion) |
+| `examples/stress_shapes`         | `shapes_loop(n)`        | the same geometry generated with a `for` loop over `std::range` (also stresses the functional list representation) |
+| `examples/stress_constraints`    | `constraints(n)`        | a ring of `n+1` rectangles whose edges are mutually coupled, forcing the general (dense) linear solver |
+| `examples/stress_instances`      | `instances(n)`          | `n` instances of a single cached leaf cell |
+| `examples/stress_hierarchy`      | `h0 .. h8`              | a chain of cells `h{k}` each instantiating `h{k-1}`; compiling `h{k}` exercises `k` levels of hierarchy |
+
+The benchmark *drivers* are the `bench_*` tests in
+[`../core/compiler/src/lib.rs`](../core/compiler/src/lib.rs). For the
+hierarchy axis the driver generates `h0..h{depth}` workspaces on the fly (a
+single `.ar` file cannot express a runtime-variable depth because Argon cells
+cannot be recursive or forward-referenced).
+
+## Running
+
+The `bench_*` tests are marked `#[ignore]` because the larger sizes take well
+over 6 s in a debug build. Run them in **release**, **serially** (peak-memory
+tracking uses a process-global allocator, so concurrent tests would corrupt the
+measurements):
+
+```bash
+cargo test -p compiler --release -- --ignored --test-threads=1 --nocapture bench_
+```
+
+Each test writes a CSV to `bench/results/<axis>.csv` with columns
+`size,time_s,peak_bytes,n_objects`. Then render the figure (and print a summary
+table of fitted scaling models):
+
+```bash
+python3 bench/plot_scaling.py     # writes bench/argon_scaling.{png,pdf}
+```
+
+`plot_scaling.py` needs only the standard library to print the summary table;
+`matplotlib` is required to draw the figure.
+
+To run a single axis, e.g. just the instance sweep:
+
+```bash
+cargo test -p compiler --release -- --ignored --test-threads=1 --nocapture bench_instances
+```
+
+The fast `stress_*_smoke` tests (which just check that each example still
+compiles) run in the normal `cargo test` suite and are **not** ignored.
+
+### Configuring the sweeps
+
+Every axis reads its list of sizes from an environment variable, falling back
+to a default. This keeps the benchmarks general-purpose: the same test can be
+re-run at a different scale — for example after a compiler optimization changes
+how an axis scales — without editing any source. Pass a comma-separated list:
+
+| Env var | Axis | Default |
+| ------- | ---- | ------- |
+| `ARGON_BENCH_SHAPES`        | shapes (recursion)   | `500,1000,2000,4000,8000,16000,32000` |
+| `ARGON_BENCH_SHAPES_LOOP`   | shapes (`for` loop)  | `250,500,1000,2000` |
+| `ARGON_BENCH_INSTANCES`     | instances            | `500,…,64000` |
+| `ARGON_BENCH_CONSTRAINTS`   | coupled constraints  | `32,64,128,256,512,1024` |
+| `ARGON_BENCH_HIER_SINGLE`   | hierarchy (1 ref)    | `4,8,16,32,48,64,96,128` |
+| `ARGON_BENCH_HIER_DOUBLE`   | hierarchy (2 refs)   | `2,4,6,8,10,12,14,16,18` |
+
+```bash
+# e.g. sweep the for-loop variant out to the same sizes as bench_shapes
+ARGON_BENCH_SHAPES_LOOP=500,1000,2000,4000,8000,16000,32000 \
+  cargo test -p compiler --release -- --ignored --test-threads=1 --nocapture bench_shapes_loop
+```
+
+The defaults are sized so the suite runs in a few minutes within a few GiB on
+the current build; they are not claims about how any axis "should" scale.
+
+## Methodology
+
+- **Time**: minimum wall-clock time over a few repetitions (`min` is robust to
+  noise on a shared machine). Parsing and static analysis run outside the timed
+  region on every axis, so each timing covers only the `compile()` call.
+- **Memory**: a `#[global_allocator]` compiled only into the test binary
+  (`bench_alloc::Tracking` in `lib.rs`) tracks live and peak heap bytes. We
+  report the peak heap *growth* during a single `compile()`.
+- **Build**: release profile. Numbers below were collected on a Linux machine;
+  absolute values are machine-dependent; the *scaling* trends should carry over.
+
+## Results
+
+The numbers below are a **snapshot** from one release build on the development
+machine; they are produced by the commands above and meant to be regenerated
+(absolute values are machine- and build-dependent). `n` is the per-axis size
+parameter; "peak" is the peak heap growth during a single `compile()` call.
+
+| Axis | largest `n` | time @ largest | peak mem @ largest | empirical scaling |
+| ---- | ----------- | -------------- | ------------------ | ----------------- |
+| Shapes (recursion)           | 32 000 rects   | 1.53 s  | 0.89 GiB | **~linear** (time `∝ n^1.2`, mem `∝ n^1.0`) |
+| Instances                    | 64 000 insts   | 3.14 s  | 1.26 GiB | **~linear** (time `∝ n^1.2`, mem `∝ n^1.0`) |
+| Hierarchy, 1 child ref       | depth 128      | 0.09 s  | 0.11 GiB | **polynomial** (`∝ depth^1.3–1.4`) |
+| Coupled constraints          | 1 024 rects    | 21.7 s  | 0.12 GiB | **super-cubic in time** (see below) |
+| Shapes (`for`-loop)          | 2 000 rects    | 0.59 s  | 3.90 GiB | **quadratic** (mem `∝ n^2`) |
+| Hierarchy, 2 child refs      | depth 18       | 11.5 s  | 3.55 GiB | **exponential** (`×1.9` per level) |
+
+### Interpretation
+
+- **Geometry and instances scale near-linearly.** Compiling a single flat cell
+  with tens of thousands of fully-constrained rectangles, or of instances of a
+  cached cell, is linear in memory and mildly super-linear in time (`∝ n^1.2`).
+  Each shape contributes 4 solver variables and each instance 2, and because
+  their constraints pin one variable at a time the solver resolves them by
+  back-substitution without ever forming a matrix. This is the common case for
+  real parametric cells and it scales comfortably to "thousands of rectangles".
+
+- **Coupled constraints are the expensive axis.** When constraints form one
+  large connected component that *cannot* be back-substituted (here, a ring of
+  mutually-coupled edges), Argon falls back to its general linear solver, which
+  builds a dense matrix and takes an SVD. The per-doubling cost climbs from ~4×
+  at `n=64→128` to ~15× at `n=512→1024`, i.e. it steepens past the 8× per
+  doubling of an `O(n^3)` dense factorization (because `solve()` is re-run as the
+  system is assembled). This is the "general linear constraint solving (slow)"
+  caveat in the top-level README, quantified: a ring of ~1 000 coupled rectangles
+  takes ~22 s. Layouts whose constraints decompose into many small independent
+  groups (the typical case) avoid this entirely.
+
+- **Hierarchy depth is limited by the type representation.** A cell's static
+  type (`CellTy`) stores the full structural type of every field, including
+  instantiated sub-cells. If a cell references its child **once** (e.g.
+  `let i = inst(child());`), depth scales polynomially (`~depth^1.4`) and is
+  fine to ~128 levels. If it references the child **twice** (e.g. the
+  `let c = child(); let i = inst(c);` idiom from the tutorial), the type of
+  `h{k}` contains two copies of the type of `h{k-1}`, so the representation —
+  and hence compile time and memory — **doubles with every level** (`×1.9` per
+  level measured). Beyond ~depth 20 this exhausts memory (depth 20 alone needs
+  ~14.5 GiB / 50 s; depth 18 is ~3.6 GiB / 11.5 s, which is where this sweep is
+  capped). Very deep hierarchies additionally hit a native-recursion stack
+  limit in the compiler at a few hundred levels.
+
+- **Recursion vs. iteration measures the list/iteration machinery.** `shapes`
+  and `shapes_loop` emit identical geometry; the only difference is that
+  `shapes_loop` builds and iterates a `std::range` list. On the build measured
+  here that list path is markedly heavier (≈4 GiB to emit 2 000 rectangles via
+  a `for` loop, vs. 32 000 by recursion in under 1 GiB), so the gap between the
+  two series is a direct measure of the cost of the list representation rather
+  than of the geometry or solver. Re-running both series (e.g. with
+  `ARGON_BENCH_SHAPES_LOOP` set to the same sizes as `bench_shapes`) is the way
+  to see that cost change as the iteration/list machinery is optimized.
+
+The takeaways for the paper: editable-object count and instance count scale
+linearly; the practically-relevant limits are the dense general constraint
+solver on large *coupled* systems and structural type expansion on deep
+hierarchies — both of which line up with the future-work items already listed
+in the project README (faster linear constraint solving; incremental
+compilation). The bullets above describe the build at the time of measurement;
+because every axis is re-runnable (and size-configurable), the same harness can
+be used to confirm improvements from compiler optimizations.
+
+![Argon scaling](argon_scaling.png)
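The per-doubling ratios quoted in the README's interpretation of the coupled-constraints axis can be rechecked from the committed CSV. The sketch below is illustrative only: the `local_exponents` helper and the inlined `ROWS` list are not part of the patch (the values are copied from `bench/results/constraints.csv`), and it assumes compile time behaves like `n^k` between neighboring sizes.

```python
import math

# (size, time_s) pairs copied from bench/results/constraints.csv
ROWS = [
    (32, 0.00383075),
    (64, 0.00726206),
    (128, 0.029626408),
    (256, 0.21347691),
    (512, 1.4369801039999999),
    (1024, 21.745505947),
]


def local_exponents(rows):
    """Return (n_from, n_to, time_ratio, local_exponent) per consecutive pair."""
    out = []
    for (n0, t0), (n1, t1) in zip(rows, rows[1:]):
        ratio = t1 / t0
        # If t ~ n^k between the two sizes, then k = log(t1/t0) / log(n1/n0).
        k = math.log(ratio) / math.log(n1 / n0)
        out.append((n0, n1, ratio, k))
    return out


if __name__ == "__main__":
    for n0, n1, ratio, k in local_exponents(ROWS):
        print(f"{n0:>5} -> {n1:<5}  x{ratio:5.1f}  local exponent {k:.2f}")
```

A local exponent of 3 corresponds to the 8× per doubling of a cubic algorithm, so any pair whose exponent exceeds 3 is growing faster than a single dense factorization would.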
diff --git a/bench/argon_scaling.pdf b/bench/argon_scaling.pdf
new file mode 100644
index 0000000..aa37899
Binary files /dev/null and b/bench/argon_scaling.pdf differ
diff --git a/bench/argon_scaling.png b/bench/argon_scaling.png
new file mode 100644
index 0000000..7984513
Binary files /dev/null and b/bench/argon_scaling.png differ
diff --git a/bench/plot_scaling.py b/bench/plot_scaling.py
new file mode 100644
index 0000000..68c29b7
--- /dev/null
+++ b/bench/plot_scaling.py
@@ -0,0 +1,156 @@
+#!/usr/bin/env python3
+"""Plot Argon compile-time and memory scaling from the benchmark CSVs.
+
+The CSVs are produced by the `bench_*` tests in `core/compiler/src/lib.rs`
+(see bench/README.md for how to run them). Each CSV has the columns
+
+    size,time_s,peak_bytes,n_objects
+
+where `size` is the swept parameter for that axis (number of shapes, number of
+coupled constraints, number of instances, or hierarchy depth).
+
+Usage:
+    python3 bench/plot_scaling.py                 # reads bench/results/*.csv
+    python3 bench/plot_scaling.py --results DIR --out FILE
+"""
+import argparse
+import csv
+import math
+import os
+import sys
+
+# Series in the order we want them drawn. Each entry is
+#   (csv_basename, display_label, size_unit, model)
+# where `model` is "poly" (fit a power law y ~ n^k) or "exp" (fit y ~ b^n,
+# appropriate for the exponentially-scaling hierarchy variant).
+SERIES = [
+    ("shapes", "Shapes (recursion)", "# rectangles", "poly"),
+    ("shapes_loop", "Shapes (for-loop / cons list)", "# rectangles", "poly"),
+    ("instances", "Instances", "# instances", "poly"),
+    ("constraints", "Coupled constraints", "# coupled rects", "poly"),
+    ("hierarchy_single_ref", "Hierarchy (1 child ref)", "depth", "poly"),
+    ("hierarchy_double_ref", "Hierarchy (2 child refs)", "depth", "exp"),
+]
+
+
+def load(path):
+    xs, ts, ms = [], [], []
+    with open(path, newline="") as f:
+        for row in csv.DictReader(f):
+            xs.append(float(row["size"]))
+            ts.append(float(row["time_s"]))
+            ms.append(float(row["peak_bytes"]))
+    return xs, ts, ms
+
+
+def _slope(pairs):
+    """Least-squares slope of a list of (x, y) points."""
+    n = len(pairs)
+    if n < 2:
+        return float("nan")
+    sx = sum(p[0] for p in pairs)
+    sy = sum(p[1] for p in pairs)
+    sxx = sum(p[0] * p[0] for p in pairs)
+    sxy = sum(p[0] * p[1] for p in pairs)
+    denom = n * sxx - sx * sx
+    if abs(denom) < 1e-12:
+        return float("nan")
+    return (n * sxy - sx * sy) / denom
+
+
+def fit_exponent(xs, ys):
+    """Power-law exponent: slope of log(y) vs log(x)."""
+    return _slope([(math.log(x), math.log(y)) for x, y in zip(xs, ys) if x > 0 and y > 0])
+
+
+def fit_base(xs, ys):
+    """Exponential base b for y ~ b^x: from the slope of log(y) vs x."""
+    s = _slope([(x, math.log(y)) for x, y in zip(xs, ys) if y > 0])
+    return math.exp(s)
+
+
+def describe(model, xs, ys):
+    """Return (legend_suffix, summary_string) for the fitted scaling model."""
+    if model == "exp":
+        b = fit_base(xs, ys)
+        return f"exp., $\\times{b:.1f}$/step", f"exponential (x{b:.2f} per unit)"
+    k = fit_exponent(xs, ys)
+    return f"$\\propto n^{{{k:.1f}}}$", f"~n^{k:.2f}"
+
+
+def main():
+    here = os.path.dirname(os.path.abspath(__file__))
+    ap = argparse.ArgumentParser()
+    ap.add_argument("--results", default=os.path.join(here, "results"))
+    ap.add_argument("--out", default=os.path.join(here, "argon_scaling"))
+    args = ap.parse_args()
+
+    data = {}
+    for key, label, unit, model in SERIES:
+        path = os.path.join(args.results, f"{key}.csv")
+        if os.path.exists(path):
+            xs, ts, ms = load(path)
+            if xs:
+                data[key] = (label, unit, model, xs, ts, ms)
+
+    if not data:
+        sys.exit(
+            f"No benchmark CSVs found in {args.results}.\n"
+            "Run the benchmarks first (see bench/README.md)."
+        )
+
+    # Print a summary table of fitted scaling models.
+    print(f"{'series':<30}{'points':>7}  {'time scaling':<22}{'mem scaling':<22}max(time, mem)")
+    for key, _, _, _ in SERIES:
+        if key not in data:
+            continue
+        label, unit, model, xs, ts, ms = data[key]
+        _, t_desc = describe(model, xs, ts)
+        _, m_desc = describe(model, xs, ms)
+        print(
+            f"{label:<30}{len(xs):>7}  {t_desc:<22}{m_desc:<22}"
+            f"{max(ts):.3f} s / {max(ms) / 2**20:.0f} MiB"
+        )
+
+    try:
+        import matplotlib
+
+        matplotlib.use("Agg")
+        import matplotlib.pyplot as plt
+    except ImportError:
+        sys.exit("\nmatplotlib not installed; printed summary only. `pip install matplotlib` to draw.")
+
+    fig, (ax_t, ax_m) = plt.subplots(1, 2, figsize=(13, 5.2))
+    markers = ["o", "s", "^", "D", "v", "P"]
+    for (key, _, _, _), marker in zip(SERIES, markers):
+        if key not in data:
+            continue
+        label, unit, model, xs, ts, ms = data[key]
+        t_suffix, _ = describe(model, xs, ts)
+        m_suffix, _ = describe(model, xs, ms)
+        ax_t.plot(xs, ts, marker=marker, label=f"{label}  ({t_suffix})")
+        ax_m.plot(xs, [m / 2**20 for m in ms], marker=marker,
+                  label=f"{label}  ({m_suffix})")
+
+    for ax in (ax_t, ax_m):
+        ax.set_xscale("log")
+        ax.set_yscale("log")
+        ax.set_xlabel("problem size $n$ (rectangles / constraints / instances / depth)")
+        ax.grid(True, which="both", ls=":", alpha=0.4)
+
+    ax_t.set_ylabel("compile time (s)")
+    ax_t.set_title("Argon compile-time scaling")
+    ax_m.set_ylabel("peak heap allocated (MiB)")
+    ax_m.set_title("Argon memory scaling")
+    ax_t.legend(fontsize=8, loc="upper left")
+    ax_m.legend(fontsize=8, loc="upper left")
+    fig.tight_layout()
+
+    for ext in ("png", "pdf"):
+        out = f"{args.out}.{ext}"
+        fig.savefig(out, dpi=150, bbox_inches="tight")
+        print(f"wrote {out}")
+
+
+if __name__ == "__main__":
+    main()
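As a standalone cross-check of the exponential model that `plot_scaling.py` applies to the two-reference hierarchy sweep, the following sketch refits the per-level growth factor of compile time with an ordinary least-squares line through `log(time)` versus depth. The `per_level_factor` helper and the inlined `ROWS` list are hypothetical and not part of the patch; the values are copied from `bench/results/hierarchy_double_ref.csv`.

```python
import math

# (depth, time_s) pairs copied from bench/results/hierarchy_double_ref.csv
ROWS = [
    (2, 0.000971778),
    (4, 0.001176136),
    (6, 0.001882165),
    (8, 0.004620015),
    (10, 0.015455478),
    (12, 0.088864192),
    (14, 0.481928844),
    (16, 2.506498165),
    (18, 11.538940627),
]


def per_level_factor(rows):
    """Fit time ~ a * b**depth and return b, the growth factor per level."""
    xs = [float(d) for d, _ in rows]
    ys = [math.log(t) for _, t in rows]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Ordinary least-squares slope of log(time) against depth.
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    return math.exp(num / den)


if __name__ == "__main__":
    print(f"fitted growth per level: x{per_level_factor(ROWS):.2f}")
```

The fit averages over the whole sweep. The shallow depths are dominated by fixed overhead, so the factor measured between the deepest sizes alone is higher than the fitted one.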
diff --git a/bench/results/constraints.csv b/bench/results/constraints.csv
new file mode 100644
index 0000000..98cc1d8
--- /dev/null
+++ b/bench/results/constraints.csv
@@ -0,0 +1,7 @@
+size,time_s,peak_bytes,n_objects
+32,0.00383075,2585876,33
+64,0.00726206,3829492,65
+128,0.029626408,6749992,129
+256,0.21347691,15396760,257
+512,1.4369801039999999,42127416,513
+1024,21.745505947,133337720,1025
diff --git a/bench/results/hierarchy_double_ref.csv b/bench/results/hierarchy_double_ref.csv
new file mode 100644
index 0000000..723e66f
--- /dev/null
+++ b/bench/results/hierarchy_double_ref.csv
@@ -0,0 +1,10 @@
+size,time_s,peak_bytes,n_objects
+2,0.000971778,1325410,5
+4,0.001176136,1653373,9
+6,0.001882165,2474335,13
+8,0.004620015,5364013,17
+10,0.015455478,16602635,21
+12,0.088864192,61484713,25
+14,0.481928844,240102311,29
+16,2.506498165,954179013,33
+18,11.538940627,3810189475,37
diff --git a/bench/results/hierarchy_single_ref.csv b/bench/results/hierarchy_single_ref.csv
new file mode 100644
index 0000000..e2fd0f5
--- /dev/null
+++ b/bench/results/hierarchy_single_ref.csv
@@ -0,0 +1,9 @@
+size,time_s,peak_bytes,n_objects
+4,0.000845819,1556216,9
+8,0.001178198,2140495,17
+16,0.002346772,4034443,33
+32,0.005942098,10387267,65
+48,0.012042876,19826107,97
+64,0.022953693,33362240,129
+96,0.052964538,69348808,193
+128,0.090173076,120382868,257
diff --git a/bench/results/instances.csv b/bench/results/instances.csv
new file mode 100644
index 0000000..b5d5cdd
--- /dev/null
+++ b/bench/results/instances.csv
@@ -0,0 +1,9 @@
+size,time_s,peak_bytes,n_objects
+500,0.012283447,11806766,501
+1000,0.02516376,22390998,1001
+2000,0.056507946,43559462,2001
+4000,0.147355775,85896390,4001
+8000,0.310525996,170570230,8001
+16000,0.689046789,339917926,16001
+32000,1.457187325,678613350,32001
+64000,3.140683519,1356004118,64001
diff --git a/bench/results/shapes.csv b/bench/results/shapes.csv
new file mode 100644
index 0000000..b9305b0
--- /dev/null
+++ b/bench/results/shapes.csv
@@ -0,0 +1,8 @@
+size,time_s,peak_bytes,n_objects
+500,0.012574383,16159584,500
+1000,0.028867058,31116160,1000
+2000,0.071651056,61029296,2000
+4000,0.158150608,120855616,4000
+8000,0.329667899,240508160,8000
+16000,0.70516885,479813376,16000
+32000,1.530754693,958423696,32000
diff --git a/bench/results/shapes_loop.csv b/bench/results/shapes_loop.csv
new file mode 100644
index 0000000..90cde88
--- /dev/null
+++ b/bench/results/shapes_loop.csv
@@ -0,0 +1,5 @@
+size,time_s,peak_bytes,n_objects
+250,0.026441618,73227389,250
+500,0.091560351,274248679,500
+1000,0.269871454,1063292012,1000
+2000,0.589680644,4189379419,2000
diff --git a/core/compiler/src/lib.rs b/core/compiler/src/lib.rs
index 389a608..76c16f7 100644
--- a/core/compiler/src/lib.rs
+++ b/core/compiler/src/lib.rs
@@ -7,6 +7,85 @@ pub mod layer;
 pub mod parse;
 pub mod solver;
 
+/// A global allocator that tracks live and peak heap usage so that the scaling
+/// benchmarks in the test module can report memory consumption alongside
+/// runtime. It forwards every request to the system allocator and only adds
+/// atomic byte counters, so behavior is otherwise unchanged.
+///
+/// This allocator is only compiled into the test binary (`cfg(test)`), whether
+/// debug or release; non-test builds of the library use the default allocator.
+/// The counters are process-wide, so the benchmarks that read them must be run
+/// serially (`--test-threads=1`); see `bench/README.md`.
+#[cfg(test)]
+mod bench_alloc {
+    use std::alloc::{GlobalAlloc, Layout, System};
+    use std::sync::atomic::{AtomicUsize, Ordering};
+
+    pub static LIVE: AtomicUsize = AtomicUsize::new(0);
+    pub static PEAK: AtomicUsize = AtomicUsize::new(0);
+
+    pub struct Tracking;
+
+    #[inline]
+    fn record_growth(delta: usize) {
+        let live = LIVE.fetch_add(delta, Ordering::Relaxed) + delta;
+        PEAK.fetch_max(live, Ordering::Relaxed);
+    }
+
+    unsafe impl GlobalAlloc for Tracking {
+        unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
+            let ptr = unsafe { System.alloc(layout) };
+            if !ptr.is_null() {
+                record_growth(layout.size());
+            }
+            ptr
+        }
+
+        unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
+            unsafe { System.dealloc(ptr, layout) };
+            LIVE.fetch_sub(layout.size(), Ordering::Relaxed);
+        }
+
+        unsafe fn alloc_zeroed(&self, layout: Layout) -> *mut u8 {
+            let ptr = unsafe { System.alloc_zeroed(layout) };
+            if !ptr.is_null() {
+                record_growth(layout.size());
+            }
+            ptr
+        }
+
+        unsafe fn realloc(&self, ptr: *mut u8, layout: Layout, new_size: usize) -> *mut u8 {
+            let new_ptr = unsafe { System.realloc(ptr, layout, new_size) };
+            if !new_ptr.is_null() {
+                if new_size >= layout.size() {
+                    record_growth(new_size - layout.size());
+                } else {
+                    LIVE.fetch_sub(layout.size() - new_size, Ordering::Relaxed);
+                }
+            }
+            new_ptr
+        }
+    }
+
+    /// Resets the peak counter to the current live usage. Call this immediately
+    /// before the region of interest, then read [`peak`] afterwards.
+    pub fn reset_peak() {
+        PEAK.store(LIVE.load(Ordering::Relaxed), Ordering::Relaxed);
+    }
+
+    pub fn live() -> usize {
+        LIVE.load(Ordering::Relaxed)
+    }
+
+    pub fn peak() -> usize {
+        PEAK.load(Ordering::Relaxed)
+    }
+}
+
+#[cfg(test)]
+#[global_allocator]
+static BENCH_ALLOC: bench_alloc::Tracking = bench_alloc::Tracking;
+
 #[cfg(test)]
 mod tests {
 
@@ -74,6 +153,448 @@ mod tests {
     const ARGON_SSE_BASIC: &str = concatcp!(EXAMPLES_DIR, "/sse_basic/lib.ar");
     const ARGON_PRECEDENCE: &str = concatcp!(EXAMPLES_DIR, "/precedence/lib.ar");
 
+    // ---------------------------------------------------------------------
+    // Scaling / stress benchmarks.
+    //
+    // These exercise Argon along the axes raised in review: number of shapes,
+    // number of (coupled) constraints, number of cell instances, and depth of
+    // hierarchy. Each `bench_*` test sweeps a size parameter, records compile
+    // time and peak heap usage, and writes a CSV to `bench/results/` that
+    // `bench/plot_scaling.py` turns into the scaling figure.
+    //
+    // The `bench_*` tests are `#[ignore]`d because the larger sizes take well
+    // over 6 s in a debug build. Run them in release, serially (peak-memory
+    // tracking is process-global), e.g.:
+    //
+    //     RUSTFLAGS=... cargo test -p compiler --release -- \
+    //         --ignored --test-threads=1 bench_
+    //
+    // The `stress_*_smoke` tests below run in the normal (debug) test suite and
+    // just check that each example still compiles.
+    // ---------------------------------------------------------------------
+    const ARGON_STRESS_SHAPES: &str = concatcp!(EXAMPLES_DIR, "/stress_shapes/lib.ar");
+    const ARGON_STRESS_CONSTRAINTS: &str = concatcp!(EXAMPLES_DIR, "/stress_constraints/lib.ar");
+    const ARGON_STRESS_INSTANCES: &str = concatcp!(EXAMPLES_DIR, "/stress_instances/lib.ar");
+    const ARGON_STRESS_HIERARCHY: &str = concatcp!(EXAMPLES_DIR, "/stress_hierarchy/lib.ar");
+
+    use crate::compile::CompileOutput;
+
+    /// Serializes the memory/timing-sensitive benchmarks. Even when the test
+    /// runner is given multiple threads, holding this lock ensures only one
+    /// `bench_*` body runs at a time, so the process-global allocator counters
+    /// and wall-clock timings are not perturbed by a concurrent benchmark.
+    static BENCH_LOCK: std::sync::Mutex<()> = std::sync::Mutex::new(());
+
+    fn bench_guard() -> std::sync::MutexGuard<'static, ()> {
+        // Recover from poisoning: a panic in one benchmark should not wedge the
+        // others, and the lock guards only measurement isolation.
+        BENCH_LOCK.lock().unwrap_or_else(|e| e.into_inner())
+    }
+
+    /// Runs `f` `reps` times, returning the minimum wall-clock time (robust to
+    /// noise on a shared machine), the maximum peak heap growth observed during
+    /// a run, and the result of the final run.
+    fn measure<R>(reps: u32, f: impl Fn() -> R) -> (std::time::Duration, usize, R) {
+        assert!(reps >= 1);
+        let mut best = std::time::Duration::MAX;
+        let mut peak = 0usize;
+        let mut result = None;
+        for _ in 0..reps {
+            // Free the previous run's result so each measurement starts from
+            // the same baseline.
+            drop(result.take());
+            crate::bench_alloc::reset_peak();
+            let base = crate::bench_alloc::live();
+            let start = std::time::Instant::now();
+            let r = f();
+            best = best.min(start.elapsed());
+            peak = peak.max(crate::bench_alloc::peak().saturating_sub(base));
+            result = Some(r);
+        }
+        (best, peak, result.unwrap())
+    }
+
+    fn count_objects(o: &CompileOutput) -> usize {
+        let data = match o {
+            CompileOutput::Valid(d) => Some(d),
+            CompileOutput::ExecErrors(e) => e.output.as_ref(),
+            _ => None,
+        };
+        data.map(|d| d.cells.values().map(|c| c.objects.len()).sum())
+            .unwrap_or(0)
+    }
+
+    fn count_cells(o: &CompileOutput) -> usize {
+        match o {
+            CompileOutput::Valid(d) => d.cells.len(),
+            CompileOutput::ExecErrors(e) => e.output.as_ref().map(|d| d.cells.len()).unwrap_or(0),
+            _ => 0,
+        }
+    }
+
+    /// Sweep sizes for a benchmark axis. Returns `default` unless the named
+    /// environment variable is set to a comma-separated list of sizes, in which
+    /// case that list is used. This keeps the benchmarks general-purpose: the
+    /// same test can be re-run at a larger (or smaller) scale without editing
+    /// the source, e.g. after a compiler optimization changes how an axis
+    /// scales:
+    ///
+    ///     ARGON_BENCH_SHAPES_LOOP=500,1000,2000,4000,8000,16000,32000 \
+    ///         cargo test -p compiler --release -- --ignored --test-threads=1 \
+    ///         --nocapture bench_shapes_loop
+    ///
+    /// The defaults are chosen so the whole suite runs in a few minutes and
+    /// stays within a few GiB on the current build; they are not assumptions
+    /// about how any axis "should" scale.
+    fn bench_sizes(env_var: &str, default: &[i64]) -> Vec<i64> {
+        match std::env::var(env_var) {
+            Ok(s) if !s.trim().is_empty() => s
+                .split(',')
+                .filter_map(|x| x.trim().parse::<i64>().ok())
+                .collect(),
+            _ => default.to_vec(),
+        }
+    }
+
+    fn write_bench_csv(name: &str, rows: &[(f64, f64, usize, usize)]) {
+        use std::fmt::Write;
+        let dir = PathBuf::from(env!("CARGO_MANIFEST_DIR")).join("../../bench/results");
+        std::fs::create_dir_all(&dir).unwrap();
+        let mut s = String::from("size,time_s,peak_bytes,n_objects\n");
+        for (size, t, mem, nobj) in rows {
+            writeln!(s, "{size},{t},{mem},{nobj}").unwrap();
+        }
+        let path = dir.join(format!("{name}.csv"));
+        std::fs::write(&path, s).unwrap();
+        eprintln!("wrote {}", path.display());
+    }
+
+    /// Generates a workspace of `depth + 1` cells `h0..h{depth}` where each
+    /// `h{k}` instantiates `h{k-1}`. With `double_ref = false` the child is
+    /// referenced by a single (instance) binding; with `double_ref = true` the
+    /// child cell is also bound to a `let`, which makes the structural cell type
+    /// of `h{k}` contain two copies of the type of `h{k-1}`.
+    fn gen_hier(depth: usize, double_ref: bool) -> String {
+        let mut s =
+            String::from("cell h0() {\n    rect(\"met1\", x0=0., y0=0., x1=10., y1=10.);\n}\n");
+        for k in 1..=depth {
+            let body = if double_ref {
+                format!("    let child = h{}();\n    let i = inst(child);\n", k - 1)
+            } else {
+                format!("    let i = inst(h{}());\n", k - 1)
+            };
+            s.push_str(&format!(
+                "cell h{k}() {{\n    rect(\"met1\", x0=0., y0=0., x1=10., y1=10.);\n{body}    eq(i.x, 0.);\n    eq(i.y, 10.);\n}}\n",
+            ));
+        }
+        s
+    }
+
+    /// Axis 1: number of independent shapes in a single cell.
+    #[test]
+    #[ignore = "scaling benchmark; run in release, serially: cargo test -p compiler --release -- --ignored --test-threads=1 bench_"]
+    fn bench_shapes() {
+        let _g = bench_guard();
+        let o = parse_workspace_with_std(ARGON_STRESS_SHAPES);
+        assert!(o.static_errors().is_empty(), "{:?}", o.static_errors());
+        let ast = o.ast();
+        let mut rows = Vec::new();
+        for &n in &bench_sizes(
+            "ARGON_BENCH_SHAPES",
+            &[500, 1000, 2000, 4000, 8000, 16000, 32000],
+        ) {
+            let (dt, mem, out) = measure(3, || {
+                compile(
+                    &ast,
+                    CompileInput {
+                        cell: &["shapes"],
+                        args: vec![CellArg::Int(n)],
+                        lyp_file: &PathBuf::from(BASIC_LYP),
+                    },
+                )
+            });
+            assert!(out.is_valid(), "shapes(n={n}) invalid");
+            let nobj = count_objects(&out);
+            eprintln!(
+                "shapes        n={n:>6} objects={nobj:>6} time={dt:>11.3?} peak={:>8.2} MiB",
+                mem as f64 / (1usize << 20) as f64
+            );
+            rows.push((n as f64, dt.as_secs_f64(), mem, nobj));
+        }
+        write_bench_csv("shapes", &rows);
+    }
+
+    /// Axis 1b: the same geometry generated with an idiomatic `for` loop over
+    /// `std::range`, which additionally exercises Argon's functional list
+    /// representation (`cons`). Capped at a smaller size because list
+    /// construction is super-linear.
+    #[test]
+    #[ignore = "scaling benchmark; run in release, serially: cargo test -p compiler --release -- --ignored --test-threads=1 bench_"]
+    fn bench_shapes_loop() {
+        let _g = bench_guard();
+        let o = parse_workspace_with_std(ARGON_STRESS_SHAPES);
+        assert!(o.static_errors().is_empty(), "{:?}", o.static_errors());
+        let ast = o.ast();
+        // This variant generates the same geometry as `bench_shapes` but with a
+        // `for` loop over `std::range`, so its cost also includes building and
+        // iterating the list. The default sweep is kept smaller than
+        // `bench_shapes` only so the default run stays bounded in memory on the
+        // current build; override `ARGON_BENCH_SHAPES_LOOP` to sweep to the same
+        // sizes as `bench_shapes` (e.g. to compare the two after changes to the
+        // list representation).
+        let mut rows = Vec::new();
+        for &n in &bench_sizes("ARGON_BENCH_SHAPES_LOOP", &[250, 500, 1000, 2000]) {
+            let (dt, mem, out) = measure(2, || {
+                compile(
+                    &ast,
+                    CompileInput {
+                        cell: &["shapes_loop"],
+                        args: vec![CellArg::Int(n)],
+                        lyp_file: &PathBuf::from(BASIC_LYP),
+                    },
+                )
+            });
+            assert!(out.is_valid(), "shapes_loop(n={n}) invalid");
+            let nobj = count_objects(&out);
+            eprintln!(
+                "shapes_loop   n={n:>6} objects={nobj:>6} time={dt:>11.3?} peak={:>8.2} MiB",
+                mem as f64 / (1usize << 20) as f64
+            );
+            rows.push((n as f64, dt.as_secs_f64(), mem, nobj));
+        }
+        write_bench_csv("shapes_loop", &rows);
+    }
+
+    /// Axis 2: number of mutually-coupled constraints solved by the general
+    /// (dense) linear-constraint solver.
+    #[test]
+    #[ignore = "scaling benchmark; run in release, serially: cargo test -p compiler --release -- --ignored --test-threads=1 bench_"]
+    fn bench_constraints() {
+        let _g = bench_guard();
+        let o = parse_workspace_with_std(ARGON_STRESS_CONSTRAINTS);
+        assert!(o.static_errors().is_empty(), "{:?}", o.static_errors());
+        let ast = o.ast();
+        let mut rows = Vec::new();
+        for &n in &bench_sizes("ARGON_BENCH_CONSTRAINTS", &[32, 64, 128, 256, 512, 1024]) {
+            let (dt, mem, out) = measure(1, || {
+                compile(
+                    &ast,
+                    CompileInput {
+                        cell: &["constraints"],
+                        args: vec![CellArg::Int(n)],
+                        lyp_file: &PathBuf::from(BASIC_LYP),
+                    },
+                )
+            });
+            assert!(out.is_valid(), "constraints(n={n}) invalid");
+            let nobj = count_objects(&out);
+            eprintln!(
+                "constraints   n={n:>6} objects={nobj:>6} time={dt:>11.3?} peak={:>8.2} MiB",
+                mem as f64 / (1usize << 20) as f64
+            );
+            rows.push((n as f64, dt.as_secs_f64(), mem, nobj));
+        }
+        write_bench_csv("constraints", &rows);
+    }
+
+    /// Axis 3: number of instances of a single (cached) leaf cell.
+    #[test]
+    #[ignore = "scaling benchmark; run in release, serially: cargo test -p compiler --release -- --ignored --test-threads=1 bench_"]
+    fn bench_instances() {
+        let _g = bench_guard();
+        let o = parse_workspace_with_std(ARGON_STRESS_INSTANCES);
+        assert!(o.static_errors().is_empty(), "{:?}", o.static_errors());
+        let ast = o.ast();
+        let mut rows = Vec::new();
+        for &n in &bench_sizes(
+            "ARGON_BENCH_INSTANCES",
+            &[500, 1000, 2000, 4000, 8000, 16000, 32000, 64000],
+        ) {
+            let (dt, mem, out) = measure(3, || {
+                compile(
+                    &ast,
+                    CompileInput {
+                        cell: &["instances"],
+                        args: vec![CellArg::Int(n)],
+                        lyp_file: &PathBuf::from(BASIC_LYP),
+                    },
+                )
+            });
+            assert!(out.is_valid(), "instances(n={n}) invalid");
+            let nobj = count_objects(&out);
+            eprintln!(
+                "instances     n={n:>6} objects={nobj:>6} time={dt:>11.3?} peak={:>8.2} MiB",
+                mem as f64 / (1usize << 20) as f64
+            );
+            rows.push((n as f64, dt.as_secs_f64(), mem, nobj));
+        }
+        write_bench_csv("instances", &rows);
+    }
+
+    /// Axis 4: depth of cell hierarchy. Two series are produced: `single_ref`
+    /// binds each child once (cost polynomial in depth), and `double_ref` binds
+    /// it twice, which triggers exponential structural-type expansion.
+    #[test]
+    #[ignore = "scaling benchmark; run in release, serially: cargo test -p compiler --release -- --ignored --test-threads=1 bench_"]
+    fn bench_hierarchy() {
+        let _g = bench_guard();
+        let dir = PathBuf::from(env!("CARGO_MANIFEST_DIR")).join("build/bench_hier");
+        std::fs::create_dir_all(&dir).unwrap();
+        let lib = dir.join("lib.ar");
+
+        let mut rows = Vec::new();
+        for depth in bench_sizes("ARGON_BENCH_HIER_SINGLE", &[4, 8, 16, 32, 48, 64, 96, 128])
+            .into_iter()
+            .map(|d| d as usize)
+        {
+            std::fs::write(&lib, gen_hier(depth, false)).unwrap();
+            let o = parse_workspace_with_std(&lib);
+            assert!(o.static_errors().is_empty(), "{:?}", o.static_errors());
+            let ast = o.ast();
+            let cellname = format!("h{depth}");
+            let (dt, mem, out) = measure(2, || {
+                compile(
+                    &ast,
+                    CompileInput {
+                        cell: &[&cellname],
+                        args: vec![],
+                        lyp_file: &PathBuf::from(BASIC_LYP),
+                    },
+                )
+            });
+            assert!(out.is_valid(), "hierarchy single-ref depth={depth} invalid");
+            let nobj = count_objects(&out);
+            eprintln!(
+                "hier(1 ref)   depth={depth:>4} cells={:>4} time={dt:>11.3?} peak={:>8.2} MiB",
+                count_cells(&out),
+                mem as f64 / (1usize << 20) as f64
+            );
+            rows.push((depth as f64, dt.as_secs_f64(), mem, nobj));
+        }
+        write_bench_csv("hierarchy_single_ref", &rows);
+
+        // `double_ref` binds the child cell twice, which (on the current build)
+        // makes the structural cell type grow quickly with depth, so the
+        // default sweep is kept shallow to stay within a few GiB. Override
+        // `ARGON_BENCH_HIER_DOUBLE` to push deeper.
+        let mut rows = Vec::new();
+        for depth in bench_sizes("ARGON_BENCH_HIER_DOUBLE", &[2, 4, 6, 8, 10, 12, 14, 16, 18])
+            .into_iter()
+            .map(|d| d as usize)
+        {
+            std::fs::write(&lib, gen_hier(depth, true)).unwrap();
+            let o = parse_workspace_with_std(&lib);
+            assert!(o.static_errors().is_empty(), "{:?}", o.static_errors());
+            let ast = o.ast();
+            let cellname = format!("h{depth}");
+            let (dt, mem, out) = measure(1, || {
+                compile(
+                    &ast,
+                    CompileInput {
+                        cell: &[&cellname],
+                        args: vec![],
+                        lyp_file: &PathBuf::from(BASIC_LYP),
+                    },
+                )
+            });
+            assert!(out.is_valid(), "hierarchy double-ref depth={depth} invalid");
+            let nobj = count_objects(&out);
+            eprintln!(
+                "hier(2 refs)  depth={depth:>4} time={dt:>11.3?} peak={:>8.2} MiB",
+                mem as f64 / (1usize << 20) as f64
+            );
+            rows.push((depth as f64, dt.as_secs_f64(), mem, nobj));
+        }
+        write_bench_csv("hierarchy_double_ref", &rows);
+    }
+
+    // --- Smoke tests (run in the normal suite; keep these fast) ---
+
+    #[test]
+    fn stress_shapes_smoke() {
+        let o = parse_workspace_with_std(ARGON_STRESS_SHAPES);
+        assert!(o.static_errors().is_empty(), "{:?}", o.static_errors());
+        let ast = o.ast();
+        for cell in ["shapes", "shapes_loop"] {
+            let out = compile(
+                &ast,
+                CompileInput {
+                    cell: &[cell],
+                    args: vec![CellArg::Int(64)],
+                    lyp_file: &PathBuf::from(BASIC_LYP),
+                },
+            );
+            let d = out.unwrap_valid();
+            let nrects = d
+                .cells
+                .values()
+                .flat_map(|c| c.objects.values())
+                .filter(|o| matches!(o, SolvedValue::Rect(r) if !r.construction))
+                .count();
+            assert_eq!(nrects, 64, "{cell} should emit 64 rectangles");
+        }
+    }
+
+    #[test]
+    fn stress_constraints_smoke() {
+        let o = parse_workspace_with_std(ARGON_STRESS_CONSTRAINTS);
+        assert!(o.static_errors().is_empty(), "{:?}", o.static_errors());
+        let ast = o.ast();
+        let out = compile(
+            &ast,
+            CompileInput {
+                cell: &["constraints"],
+                args: vec![CellArg::Int(32)],
+                lyp_file: &PathBuf::from(BASIC_LYP),
+            },
+        );
+        assert!(
+            out.is_valid(),
+            "constraints ring should be fully determined: {out:?}"
+        );
+    }
+
+    #[test]
+    fn stress_instances_smoke() {
+        let o = parse_workspace_with_std(ARGON_STRESS_INSTANCES);
+        assert!(o.static_errors().is_empty(), "{:?}", o.static_errors());
+        let ast = o.ast();
+        let out = compile(
+            &ast,
+            CompileInput {
+                cell: &["instances"],
+                args: vec![CellArg::Int(64)],
+                lyp_file: &PathBuf::from(BASIC_LYP),
+            },
+        );
+        let d = out.unwrap_valid();
+        let ninsts = d
+            .cells
+            .values()
+            .flat_map(|c| c.objects.values())
+            .filter(|o| matches!(o, SolvedValue::Instance(_)))
+            .count();
+        assert_eq!(ninsts, 64, "instances(64) should place 64 instances");
+    }
+
+    #[test]
+    fn stress_hierarchy_smoke() {
+        let o = parse_workspace_with_std(ARGON_STRESS_HIERARCHY);
+        assert!(o.static_errors().is_empty(), "{:?}", o.static_errors());
+        let ast = o.ast();
+        let out = compile(
+            &ast,
+            CompileInput {
+                cell: &["h8"],
+                args: vec![],
+                lyp_file: &PathBuf::from(BASIC_LYP),
+            },
+        );
+        let d = out.unwrap_valid();
+        // h0..h8 = 9 distinct cells (8 levels of instantiation).
+        assert_eq!(d.cells.len(), 9, "h8 should compile to 9 cells (h0..h8)");
+    }
+
     #[test]
     fn argon_scopes() {
         let o = parse_workspace_with_std(ARGON_SCOPES);
diff --git a/examples/stress_constraints/lib.ar b/examples/stress_constraints/lib.ar
new file mode 100644
index 0000000..d8c677e
--- /dev/null
+++ b/examples/stress_constraints/lib.ar
@@ -0,0 +1,32 @@
+fn coupled_ring(prev: Rect, first: Rect, n: Int) {
+    // Build a *ring* of rectangles whose left edges are mutually coupled.
+    // Each step adds a rectangle `cur` and a two-variable difference
+    // constraint `prev.x0 - cur.x0 = 5` relating it to the previous one.
+    // Because every `x0` constraint involves two unknowns, none can be resolved
+    // by back-substitution: the whole ring must be solved simultaneously by
+    // the general linear-constraint solver (a dense SVD over the coupled
+    // component). The base case closes the ring with a single *sum*
+    // constraint, which makes the otherwise-underconstrained chain fully
+    // determined regardless of `n`.
+    #scope0 if n <= 0 {
+        eq(prev.x0 + first.x0, 100.);
+    } else {
+        let cur = crect();
+        eq(cur.y0, 0.);
+        eq(cur.y1, 10.);
+        eq(cur.x1, cur.x0 + 10.);
+        eq(prev.x0 - cur.x0, 5.);
+        #scope1 coupled_ring(cur, first, n - 1);
+    }
+}
+
+// Stress axis: number of *coupled* constraints. `constraints(n)` produces a
+// single connected constraint component spanning `n + 1` rectangles, forcing
+// the general solver to factor an O(n) x O(n) system.
+cell constraints(n: Int) {
+    let first = crect();
+    eq(first.y0, 0.);
+    eq(first.y1, 10.);
+    eq(first.x1, first.x0 + 10.);
+    #scope0 coupled_ring(first, first, n);
+}
diff --git a/examples/stress_hierarchy/lib.ar b/examples/stress_hierarchy/lib.ar
new file mode 100644
index 0000000..6784905
--- /dev/null
+++ b/examples/stress_hierarchy/lib.ar
@@ -0,0 +1,57 @@
+cell h0() {
+    rect("met1", x0=0., y0=0., x1=10., y1=10.);
+}
+
+// Each level `h{k}` instantiates the level below it (`h{k-1}`) and adds one
+// rectangle, so `h{k}` is a chain of `k + 1` cells (`h0..h{k}`) and compiling it
+// exercises `k` levels of hierarchy. The instance is bound to a single variable
+// (`i`); binding the child cell to an additional variable would cause the
+// structural cell type to expand exponentially with depth (see bench/README.md).
+cell h1() {
+    rect("met1", x0=0., y0=0., x1=10., y1=10.);
+    let i = inst(h0());
+    eq(i.x, 0.);
+    eq(i.y, 10.);
+}
+cell h2() {
+    rect("met1", x0=0., y0=0., x1=10., y1=10.);
+    let i = inst(h1());
+    eq(i.x, 0.);
+    eq(i.y, 10.);
+}
+cell h3() {
+    rect("met1", x0=0., y0=0., x1=10., y1=10.);
+    let i = inst(h2());
+    eq(i.x, 0.);
+    eq(i.y, 10.);
+}
+cell h4() {
+    rect("met1", x0=0., y0=0., x1=10., y1=10.);
+    let i = inst(h3());
+    eq(i.x, 0.);
+    eq(i.y, 10.);
+}
+cell h5() {
+    rect("met1", x0=0., y0=0., x1=10., y1=10.);
+    let i = inst(h4());
+    eq(i.x, 0.);
+    eq(i.y, 10.);
+}
+cell h6() {
+    rect("met1", x0=0., y0=0., x1=10., y1=10.);
+    let i = inst(h5());
+    eq(i.x, 0.);
+    eq(i.y, 10.);
+}
+cell h7() {
+    rect("met1", x0=0., y0=0., x1=10., y1=10.);
+    let i = inst(h6());
+    eq(i.x, 0.);
+    eq(i.y, 10.);
+}
+cell h8() {
+    rect("met1", x0=0., y0=0., x1=10., y1=10.);
+    let i = inst(h7());
+    eq(i.x, 0.);
+    eq(i.y, 10.);
+}
diff --git a/examples/stress_instances/lib.ar b/examples/stress_instances/lib.ar
new file mode 100644
index 0000000..0fbbcaf
--- /dev/null
+++ b/examples/stress_instances/lib.ar
@@ -0,0 +1,26 @@
+cell leaf() {
+    // A small leaf cell that is instantiated many times below. It is compiled
+    // exactly once and cached; every instance simply references it.
+    rect("met1", x0=0., y0=0., x1=10., y1=10.);
+}
+
+fn place(c: Any, n: Int) {
+    // Recursively create `n` instances of cell `c`, each fully constrained to
+    // an absolute location. Every instance adds two solver variables (its x/y
+    // origin) resolved by back-substitution.
+    #scope0 if n <= 0 {
+    } else {
+        let it = inst(c);
+        eq(it.x, (n as Float) * 20.);
+        eq(it.y, 0.);
+        #scope1 place(c, n - 1);
+    }
+}
+
+// Stress axis: number of cell instances. `instances(n)` places `n` copies of
+// the (cached) `leaf` cell, stressing instance bookkeeping and GDS hierarchy
+// emission while keeping per-leaf compilation cost constant.
+cell instances(n: Int) {
+    let c = leaf();
+    #scope0 place(c, n);
+}
diff --git a/examples/stress_shapes/lib.ar b/examples/stress_shapes/lib.ar
new file mode 100644
index 0000000..0fc644e
--- /dev/null
+++ b/examples/stress_shapes/lib.ar
@@ -0,0 +1,34 @@
+fn emit_shapes(n: Int) {
+    // Recursively emit `n` independent, fully constrained rectangles.
+    // Each rectangle introduces 4 solver variables that are pinned directly
+    // by their kwargs, so the solver resolves them by back-substitution
+    // (no dense linear-algebra step is required).
+    #scope0 if n <= 0 {
+    } else {
+        let x = (n as Float) * 10.;
+        rect("met1", x0=x, y0=0., x1=x + 8., y1=8.);
+        #scope1 emit_shapes(n - 1);
+    }
+}
+
+// Stress axis: number of shapes / editable objects in a single cell.
+// `shapes(n)` produces a flat cell containing `n` rectangles.
+cell shapes(n: Int) {
+    #scope0 emit_shapes(n);
+}
+
+fn emit_shapes_loop(lst: [Int]) {
+    // Same geometry as `shapes`, but generated by iterating over a list
+    // produced by `std::range`. This exercises Argon's functional list
+    // representation (`cons`) in addition to geometry emission.
+    for i in lst {
+        let x = (i as Float) * 10.;
+        rect("met1", x0=x, y0=0., x1=x + 8., y1=8.);
+    }
+}
+
+// Loop-based variant used to contrast iteration strategies in the benchmark.
+cell shapes_loop(n: Int) {
+    let lst = #scope0 std::range(n);
+    #scope1 emit_shapes_loop(lst);
+}