163 changes: 163 additions & 0 deletions bench/README.md
@@ -0,0 +1,163 @@
# Argon scaling benchmarks

These benchmarks stress the Argon compiler along the axes raised in review —
**number of shapes**, **number of (coupled) constraints**, **number of cell
instances**, and **depth of hierarchy** — and record how compile time and peak
memory scale with each. They exist to answer questions of the form:

> *How does the framework scale to layouts with substantially more hierarchy,
> more constraints, and a larger number of editable objects?*

The Argon sources that are swept live in [`../examples/`](../examples):

| Example | Cell(s) | Axis stressed |
| -------------------------------- | ----------------------- | ------------- |
| `examples/stress_shapes` | `shapes(n)` | `n` independent rectangles in one cell (generated by recursion) |
| `examples/stress_shapes` | `shapes_loop(n)` | the same geometry generated with a `for` loop over `std::range` (also stresses the functional list representation) |
| `examples/stress_constraints` | `constraints(n)` | a ring of `n+1` rectangles whose edges are mutually coupled, forcing the general (dense) linear solver |
| `examples/stress_instances` | `instances(n)` | `n` instances of a single cached leaf cell |
| `examples/stress_hierarchy` | `h0 .. h8` | a chain of cells `h{k}` each instantiating `h{k-1}`; compiling `h{k}` exercises `k` levels of hierarchy |

The benchmark *drivers* are the `bench_*` tests in
[`../core/compiler/src/lib.rs`](../core/compiler/src/lib.rs). For the
hierarchy axis the driver generates `h0..h{depth}` workspaces on the fly (a
single `.ar` file cannot express a runtime-variable depth because Argon cells
cannot be recursive or forward-referenced).

## Running

The `bench_*` tests are marked `#[ignore]` because the larger sizes take well
over 6 s in a debug build. Run them in **release**, **serially** (peak-memory
tracking uses a process-global allocator, so concurrent tests would corrupt the
measurements):

```bash
cargo test -p compiler --release -- --ignored --test-threads=1 --nocapture bench_
```

Each test writes a CSV to `bench/results/<axis>.csv` with columns
`size,time_s,peak_bytes,n_objects`. Then render the figure (and print a summary
table of fitted scaling models):

```bash
python3 bench/plot_scaling.py # writes bench/argon_scaling.{png,pdf}
```

`plot_scaling.py` needs only the standard library to print the summary table;
`matplotlib` is required to draw the figure.
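
For ad-hoc inspection without the plotting script, the CSV layout described above can be read with the standard library alone. The following is a minimal sketch, not part of the harness: `read_results` is a hypothetical helper name, and the two sample rows are copied from `bench/results/shapes.csv`.

```python
import csv
import io

# Two rows copied from bench/results/shapes.csv (smallest and largest size).
SAMPLE = """size,time_s,peak_bytes,n_objects
500,0.012574383,16159584,500
32000,1.530754693,958423696,32000
"""


def read_results(f):
    """Yield (size, seconds, peak_mib) for each row of a benchmark CSV."""
    for row in csv.DictReader(f):
        yield int(row["size"]), float(row["time_s"]), int(row["peak_bytes"]) / 2**20


rows = list(read_results(io.StringIO(SAMPLE)))
for size, seconds, peak_mib in rows:
    print(f"n={size:>6}  {seconds:8.3f} s  {peak_mib:8.1f} MiB")
```

To read a real file, pass `open("bench/results/shapes.csv", newline="")` instead of the in-memory sample.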

To run a single axis, e.g. just the instance sweep:

```bash
cargo test -p compiler --release -- --ignored --test-threads=1 --nocapture bench_instances
```

The fast `stress_*_smoke` tests (which just check that each example still
compiles) run in the normal `cargo test` suite and are **not** ignored.

### Configuring the sweeps

Every axis reads its list of sizes from an environment variable, falling back
to a default. This keeps the benchmarks general-purpose: the same test can be
re-run at a different scale — for example after a compiler optimization changes
how an axis scales — without editing any source. Pass a comma-separated list:

| Env var | Axis | Default |
| ------- | ---- | ------- |
| `ARGON_BENCH_SHAPES` | shapes (recursion) | `500,1000,2000,4000,8000,16000,32000` |
| `ARGON_BENCH_SHAPES_LOOP` | shapes (`for` loop) | `250,500,1000,2000` |
| `ARGON_BENCH_INSTANCES` | instances | `500,…,64000` |
| `ARGON_BENCH_CONSTRAINTS` | coupled constraints | `32,64,128,256,512,1024` |
| `ARGON_BENCH_HIER_SINGLE` | hierarchy (1 ref) | `4,8,16,32,48,64,96,128` |
| `ARGON_BENCH_HIER_DOUBLE` | hierarchy (2 refs) | `2,4,6,8,10,12,14,16,18` |

```bash
# e.g. sweep the for-loop variant out to the same sizes as bench_shapes
ARGON_BENCH_SHAPES_LOOP=500,1000,2000,4000,8000,16000,32000 \
cargo test -p compiler --release -- --ignored --test-threads=1 --nocapture bench_shapes_loop
```

The defaults are sized so the suite runs in a few minutes within a few GiB on
the current build; they are not claims about how any axis "should" scale.
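
The "environment variable with a default" behaviour described above can be illustrated as follows. This is a Python sketch of the semantics only: the real parsing lives in the Rust `bench_*` drivers, `sweep_sizes` is a hypothetical name, and how the drivers treat malformed input is not documented here, so the handling of empty tokens below is an assumption.

```python
import os


def sweep_sizes(var, default, environ=os.environ):
    """Return sweep sizes from a comma-separated env var, or the default list."""
    raw = environ.get(var, "").strip()
    if not raw:
        # Variable unset or empty: fall back to the built-in default.
        return list(default)
    # Assumption: empty tokens (e.g. a trailing comma) are skipped.
    return [int(tok) for tok in raw.split(",") if tok.strip()]


DEFAULT_SHAPES_LOOP = [250, 500, 1000, 2000]
print(sweep_sizes("ARGON_BENCH_SHAPES_LOOP", DEFAULT_SHAPES_LOOP))
```

The `environ` parameter exists only so the function can be exercised with a plain dictionary instead of the process environment.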

## Methodology

- **Time**: minimum wall-clock time over a few repetitions (`min` is robust to
noise on a shared machine). Parsing/static analysis is done once per size and
excluded from the hierarchy timings; everything else is end-to-end `compile()`.
- **Memory**: a `#[global_allocator]` compiled only into the test binary
(`bench_alloc::Tracking` in `lib.rs`) tracks live and peak heap bytes. We
report the peak heap *growth* during a single `compile()`.
- **Build**: release profile. Numbers below were collected on a Linux machine;
absolute values are machine-dependent, but the *scaling* trends should carry
over to other machines.
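
The two measurements above can be sketched in Python as an analogue of what the Rust harness does. This is an illustration under assumptions, not the harness itself: `measure` is a hypothetical name, `tracemalloc` stands in for the `#[global_allocator]` wrapper, and measuring memory in a separate run from the timed repetitions is a choice made here because tracing slows Python code down.

```python
import time
import tracemalloc


def measure(fn, repeats=3):
    """Return (min wall-clock seconds over `repeats` runs,
    peak heap growth in bytes during one additional traced run)."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        best = min(best, time.perf_counter() - start)

    # Peak growth: peak traced bytes minus the bytes live before the call.
    tracemalloc.start()
    baseline, _ = tracemalloc.get_traced_memory()
    fn()
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return best, peak - baseline


# Stand-in workload: allocate a list of one million references.
seconds, growth = measure(lambda: [0] * 1_000_000)
print(f"{seconds:.6f} s, {growth / 2**20:.1f} MiB peak growth")
```

Taking the minimum rather than the mean discards repetitions that were slowed down by unrelated activity on the machine.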

## Results

The numbers below are a **snapshot** from one release build on the development
machine. They are produced by the commands above and are meant to be
regenerated, since absolute values are machine- and build-dependent. `n` is the
per-axis size parameter; "peak" is the peak heap growth during a single
`compile()`, as described under Methodology.

| Axis | largest `n` | time @ largest | peak mem @ largest | empirical scaling |
| ---- | ----------- | -------------- | ------------------ | ----------------- |
| Shapes (recursion) | 32 000 rects | 1.53 s | 0.94 GiB | **~linear** (time `∝ n^1.2`, mem `∝ n^1.0`) |
| Instances | 64 000 insts | 3.14 s | 1.29 GiB | **~linear** (time `∝ n^1.2`, mem `∝ n^1.0`) |
| Hierarchy, 1 child ref | depth 128 | 0.09 s | 0.12 GiB | **polynomial** (`∝ depth^1.3–1.4`) |
| Coupled constraints | 1 024 rects | 21.7 s | 0.13 GiB | **super-cubic in time** (see below) |
| Shapes (`for`-loop) | 2 000 rects | 0.59 s | 4.1 GiB | **quadratic** (mem `∝ n^2`) |
| Hierarchy, 2 child refs | depth 18 | 11.5 s | 3.6 GiB | **exponential** (`×1.9` per level) |
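
One way to sanity-check the "empirical scaling" column is to compute the local exponent between successive sizes, `log(t2/t1) / log(n2/n1)`. The sketch below does this for the time column of `bench/results/shapes.csv` and `bench/results/constraints.csv`. `local_exponents` is a hypothetical helper, and these point-to-point values will differ somewhat from the least-squares fits that `plot_scaling.py` reports.

```python
import math

# (size, time_s) pairs copied from bench/results/shapes.csv.
SHAPES = [
    (500, 0.012574383), (1000, 0.028867058), (2000, 0.071651056),
    (4000, 0.158150608), (8000, 0.329667899), (16000, 0.70516885),
    (32000, 1.530754693),
]
# (size, time_s) pairs copied from bench/results/constraints.csv.
CONSTRAINTS = [
    (32, 0.00383075), (64, 0.00726206), (128, 0.029626408),
    (256, 0.21347691), (512, 1.4369801039999999), (1024, 21.745505947),
]


def local_exponents(points):
    """Exponent k of y ~ n^k between each pair of consecutive points."""
    return [
        math.log(y2 / y1) / math.log(n2 / n1)
        for (n1, y1), (n2, y2) in zip(points, points[1:])
    ]


shape_exps = local_exponents(SHAPES)
constraint_exps = local_exponents(CONSTRAINTS)
print("shapes:     ", [f"{k:.2f}" for k in shape_exps])
print("constraints:", [f"{k:.2f}" for k in constraint_exps])
```

For the shapes series every local exponent stays a little above 1, while for the constraints series it climbs past 3 at the last doubling, which is what the "super-cubic in time" entry refers to.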

### Interpretation

- **Geometry and instances scale close to linearly.** Compiling a single flat
cell with tens of thousands of fully-constrained rectangles, or with tens of
thousands of instances of a cached cell, is linear in memory and only slightly
super-linear in time (`∝ n^1.2`). Each shape contributes 4 solver variables and
each instance 2, and because their constraints pin one variable at a time the
solver resolves them by back-substitution without ever forming a matrix. This
is the common case for real parametric cells, and it scales comfortably to
"thousands of rectangles".

- **Coupled constraints are the expensive axis.** When constraints form one
large connected component that *cannot* be back-substituted (here, a ring of
mutually-coupled edges), Argon falls back to its general linear solver, which
builds a dense matrix and takes an SVD. The per-doubling cost climbs from ~4×
at `n=64→128` to ~15× at `n=512→1024`, i.e. it steepens toward the `O(n^3)`
of dense factorization (and worse, because `solve()` is re-run as the system
is assembled). This is the "general linear constraint solving (slow)" caveat
in the top-level README, quantified: ~1 000 coupled editable variables take
~20 s. Layouts whose constraints decompose into many small independent groups
(the typical case) avoid this entirely.

- **Hierarchy depth is limited by the type representation.** A cell's static
type (`CellTy`) stores the full structural type of every field, including
instantiated sub-cells. If a cell references its child **once** (e.g.
`let i = inst(child());`), depth scales polynomially (`~depth^1.4`) and is
fine to ~128 levels. If it references the child **twice** (e.g. the
`let c = child(); let i = inst(c);` idiom from the tutorial), the type of
`h{k}` contains two copies of the type of `h{k-1}`, so the representation —
and hence compile time and memory — **doubles with every level** (`×1.9` per
level measured). Beyond ~depth 20 this exhausts memory (depth 20 alone needs
~14.5 GiB / 50 s; depth 18 is ~3.6 GiB / 11.5 s, which is where this sweep is
capped). Very deep hierarchies additionally hit a native-recursion stack
limit in the compiler at a few hundred levels.

- **Recursion vs. iteration measures the list/iteration machinery.** `shapes`
and `shapes_loop` emit identical geometry; the only difference is that
`shapes_loop` builds and iterates a `std::range` list. On the build measured
here that list path is markedly heavier (≈4 GiB to emit 2 000 rectangles via
a `for` loop, vs. 32 000 by recursion in under 1 GiB), so the gap between the
two series is a direct measure of the cost of the list representation rather
than of the geometry or solver. Re-running both series (e.g. with
`ARGON_BENCH_SHAPES_LOOP` set to the same sizes as `bench_shapes`) is the way
to see that cost change as the iteration/list machinery is optimized.
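
The difference between the two hierarchy variants can be seen in a toy model of the type representation. The sketch below is not `CellTy`: it only counts nodes, assuming each cell's type is one node plus an embedded copy of the child's type per reference, and `type_nodes` is a hypothetical name.

```python
def type_nodes(depth, refs_per_level):
    """Count nodes in a toy structural type: one node for the cell itself
    plus `refs_per_level` embedded copies of the child's type."""
    nodes = 1  # h0 has no child
    for _ in range(depth):
        nodes = 1 + refs_per_level * nodes
    return nodes


for depth in (4, 8, 16, 18):
    print(depth, type_nodes(depth, 1), type_nodes(depth, 2))
```

With one reference the count grows as `depth + 1`; with two it grows as `2^(depth + 1) - 1`, so it roughly doubles per level. The model captures only the growth in structure size, so it does not explain the measured `depth^1.3–1.4` cost of the single-reference variant.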

The takeaways for the paper: editable-object count and instance count scale
approximately linearly. The practically relevant limits are the dense general
constraint solver on large *coupled* systems and structural type expansion on
deep hierarchies. Both line up with the future-work items already listed in the
project README (faster linear constraint solving; incremental compilation). The
bullets above describe the build at the time of measurement; because every axis
is re-runnable and size-configurable, the same harness can be used to confirm
improvements from compiler optimizations.

![Argon scaling](argon_scaling.png)
Binary file added bench/argon_scaling.pdf
Binary file not shown.
Binary file added bench/argon_scaling.png
156 changes: 156 additions & 0 deletions bench/plot_scaling.py
@@ -0,0 +1,156 @@
#!/usr/bin/env python3
"""Plot Argon compile-time and memory scaling from the benchmark CSVs.

The CSVs are produced by the `bench_*` tests in `core/compiler/src/lib.rs`
(see ../bench/README.md for how to run them). Each CSV has the columns

    size,time_s,peak_bytes,n_objects

where `size` is the swept parameter for that axis (number of shapes, number of
coupled constraints, number of instances, or hierarchy depth).

Usage:
    python3 bench/plot_scaling.py                      # reads bench/results/*.csv
    python3 bench/plot_scaling.py --results DIR --out FILE
"""
import argparse
import csv
import math
import os
import sys

# Series in the order we want them drawn. Each entry is
# (csv_basename, display_label, size_unit, model)
# where `model` is "poly" (fit a power law y ~ n^k) or "exp" (fit y ~ b^n,
# appropriate for the exponentially-scaling hierarchy variant).
SERIES = [
    ("shapes", "Shapes (recursion)", "# rectangles", "poly"),
    ("shapes_loop", "Shapes (for-loop / cons list)", "# rectangles", "poly"),
    ("instances", "Instances", "# instances", "poly"),
    ("constraints", "Coupled constraints", "# coupled rects", "poly"),
    ("hierarchy_single_ref", "Hierarchy (1 child ref)", "depth", "poly"),
    ("hierarchy_double_ref", "Hierarchy (2 child refs)", "depth", "exp"),
]


def load(path):
    xs, ts, ms = [], [], []
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            xs.append(float(row["size"]))
            ts.append(float(row["time_s"]))
            ms.append(float(row["peak_bytes"]))
    return xs, ts, ms


def _slope(pairs):
    """Least-squares slope of a list of (x, y) points."""
    n = len(pairs)
    if n < 2:
        return float("nan")
    sx = sum(p[0] for p in pairs)
    sy = sum(p[1] for p in pairs)
    sxx = sum(p[0] * p[0] for p in pairs)
    sxy = sum(p[0] * p[1] for p in pairs)
    denom = n * sxx - sx * sx
    if abs(denom) < 1e-12:
        return float("nan")
    return (n * sxy - sx * sy) / denom


def fit_exponent(xs, ys):
    """Power-law exponent: slope of log(y) vs log(x)."""
    return _slope([(math.log(x), math.log(y)) for x, y in zip(xs, ys) if x > 0 and y > 0])


def fit_base(xs, ys):
    """Exponential base b for y ~ b^x: from the slope of log(y) vs x."""
    s = _slope([(x, math.log(y)) for x, y in zip(xs, ys) if y > 0])
    return math.exp(s)


def describe(model, xs, ys):
    """Return (legend_suffix, summary_string) for the fitted scaling model."""
    if model == "exp":
        b = fit_base(xs, ys)
        return f"exp., $\\times{b:.1f}$/step", f"exponential (x{b:.2f} per unit)"
    k = fit_exponent(xs, ys)
    return f"$\\propto n^{{{k:.1f}}}$", f"~n^{k:.2f}"


def main():
    here = os.path.dirname(os.path.abspath(__file__))
    ap = argparse.ArgumentParser()
    ap.add_argument("--results", default=os.path.join(here, "results"))
    ap.add_argument("--out", default=os.path.join(here, "argon_scaling"))
    args = ap.parse_args()

    data = {}
    for key, label, unit, model in SERIES:
        path = os.path.join(args.results, f"{key}.csv")
        if os.path.exists(path):
            xs, ts, ms = load(path)
            if xs:
                data[key] = (label, unit, model, xs, ts, ms)

    if not data:
        sys.exit(
            f"No benchmark CSVs found in {args.results}.\n"
            "Run the benchmarks first (see bench/README.md)."
        )

    # Print a summary table of fitted scaling models.
    print(f"{'series':<30}{'points':>7} {'time scaling':<22}{'mem scaling':<22}max(time, mem)")
    for key, _, _, _ in SERIES:
        if key not in data:
            continue
        label, unit, model, xs, ts, ms = data[key]
        _, t_desc = describe(model, xs, ts)
        _, m_desc = describe(model, xs, ms)
        print(
            f"{label:<30}{len(xs):>7} {t_desc:<22}{m_desc:<22}"
            f"{max(ts):.3f} s / {max(ms) / 2**20:.0f} MiB"
        )

    try:
        import matplotlib

        matplotlib.use("Agg")
        import matplotlib.pyplot as plt
    except ImportError:
        sys.exit("\nmatplotlib not installed; printed summary only. `pip install matplotlib` to draw.")

    fig, (ax_t, ax_m) = plt.subplots(1, 2, figsize=(13, 5.2))
    markers = ["o", "s", "^", "D", "v", "P"]
    for (key, _, _, _), marker in zip(SERIES, markers):
        if key not in data:
            continue
        label, unit, model, xs, ts, ms = data[key]
        t_suffix, _ = describe(model, xs, ts)
        m_suffix, _ = describe(model, xs, ms)
        ax_t.plot(xs, ts, marker=marker, label=f"{label} ({t_suffix})")
        ax_m.plot(xs, [m / 2**20 for m in ms], marker=marker,
                  label=f"{label} ({m_suffix})")

    for ax in (ax_t, ax_m):
        ax.set_xscale("log")
        ax.set_yscale("log")
        ax.set_xlabel("problem size $n$ (rectangles / constraints / instances / depth)")
        ax.grid(True, which="both", ls=":", alpha=0.4)

    ax_t.set_ylabel("compile time (s)")
    ax_t.set_title("Argon compile-time scaling")
    ax_m.set_ylabel("peak heap allocated (MiB)")
    ax_m.set_title("Argon memory scaling")
    ax_t.legend(fontsize=8, loc="upper left")
    ax_m.legend(fontsize=8, loc="upper left")
    fig.tight_layout()

    for ext in ("png", "pdf"):
        out = f"{args.out}.{ext}"
        fig.savefig(out, dpi=150, bbox_inches="tight")
        print(f"wrote {out}")


if __name__ == "__main__":
    main()
7 changes: 7 additions & 0 deletions bench/results/constraints.csv
@@ -0,0 +1,7 @@
size,time_s,peak_bytes,n_objects
32,0.00383075,2585876,33
64,0.00726206,3829492,65
128,0.029626408,6749992,129
256,0.21347691,15396760,257
512,1.4369801039999999,42127416,513
1024,21.745505947,133337720,1025
10 changes: 10 additions & 0 deletions bench/results/hierarchy_double_ref.csv
@@ -0,0 +1,10 @@
size,time_s,peak_bytes,n_objects
2,0.000971778,1325410,5
4,0.001176136,1653373,9
6,0.001882165,2474335,13
8,0.004620015,5364013,17
10,0.015455478,16602635,21
12,0.088864192,61484713,25
14,0.481928844,240102311,29
16,2.506498165,954179013,33
18,11.538940627,3810189475,37
9 changes: 9 additions & 0 deletions bench/results/hierarchy_single_ref.csv
@@ -0,0 +1,9 @@
size,time_s,peak_bytes,n_objects
4,0.000845819,1556216,9
8,0.001178198,2140495,17
16,0.002346772,4034443,33
32,0.005942098,10387267,65
48,0.012042876,19826107,97
64,0.022953693,33362240,129
96,0.052964538,69348808,193
128,0.090173076,120382868,257
9 changes: 9 additions & 0 deletions bench/results/instances.csv
@@ -0,0 +1,9 @@
size,time_s,peak_bytes,n_objects
500,0.012283447,11806766,501
1000,0.02516376,22390998,1001
2000,0.056507946,43559462,2001
4000,0.147355775,85896390,4001
8000,0.310525996,170570230,8001
16000,0.689046789,339917926,16001
32000,1.457187325,678613350,32001
64000,3.140683519,1356004118,64001
8 changes: 8 additions & 0 deletions bench/results/shapes.csv
@@ -0,0 +1,8 @@
size,time_s,peak_bytes,n_objects
500,0.012574383,16159584,500
1000,0.028867058,31116160,1000
2000,0.071651056,61029296,2000
4000,0.158150608,120855616,4000
8000,0.329667899,240508160,8000
16000,0.70516885,479813376,16000
32000,1.530754693,958423696,32000
5 changes: 5 additions & 0 deletions bench/results/shapes_loop.csv
@@ -0,0 +1,5 @@
size,time_s,peak_bytes,n_objects
250,0.026441618,73227389,250
500,0.091560351,274248679,500
1000,0.269871454,1063292012,1000
2000,0.589680644,4189379419,2000