DRAFT: multitime subquery index #4426

Draft
robacourt wants to merge 40 commits into main from rob/multitime-subquery-index

robacourt commented May 27, 2026

Basic memory use has halved, from 2 KB to 1 KB per row per subquery:
[Figure: unique_subquery_movein_memory_1_6_1_vs_new_subquery_index_settled]

With 1.6.1, memory use grew with each outer shape that used the subquery. With this version, the overhead of additional outer shapes using the same subquery is minimal:
[Figure: shared_subquery_movein_memory_1_6_1_vs_new_subquery_index_settled]

PR stats: rob/multitime-subquery-index

Local measurements on macOS / Apple M3 Pro / OTP 28 / Elixir 1.19.5.
Drivers under scripts/; see Reproducing at the bottom.

Filter ETS memory

N outer shapes, each with one positive sublink seeded with M values.
"Shared" reuses the same dep handle across all shapes; "unique" gives
every shape its own. Total bytes across every ETS table the Filter
owns.

| Scenario | main | branch | Reduction |
|---|---|---|---|
| 1 shape × 1 000 values, unique | 468.3 KiB | 216.9 KiB | 54 % |
| 10 shapes × 1 000 values, unique | 4.42 MiB | 1.84 MiB | 58 % |
| 100 shapes × 1 000 values, unique | 44.03 MiB | 18.21 MiB | 59 % |
| 100 shapes × 10 000 values, unique | 435.58 MiB | 176.15 MiB | 60 % |
| 10 shapes × 1 000 values, shared | 4.42 MiB | 260.2 KiB | 94 % |
| 100 shapes × 1 000 values, shared | 44.03 MiB | 712.8 KiB | 98 % |
| 100 shapes × 10 000 values, shared | 435.59 MiB | 2.27 MiB | 99 % |

On main shared and unique sizes are identical — the old SubqueryIndex
stores a per-shape view regardless of dep-handle sharing.
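
The difference between the two layouts can be reproduced in miniature. The following is a minimal sketch, not the Filter's real table layout: `MemorySketch` and its row shapes are hypothetical, and only `:ets.info/2` and `:erlang.system_info/1` are real APIs. It measures one table that copies every value per shape against one that stores the values once per dependency.

```elixir
defmodule MemorySketch do
  @moduledoc "Hypothetical illustration of per-shape vs shared ETS storage."

  # :ets.info/2 reports memory in machine words, so convert to bytes.
  def table_bytes(table) do
    :ets.info(table, :memory) * :erlang.system_info(:wordsize)
  end

  # Per-shape layout: every shape stores its own copy of each value.
  def per_shape_bytes(shape_count, values) do
    table = :ets.new(:per_shape_sketch, [:set, :public])

    for shape <- 1..shape_count, value <- values do
      :ets.insert(table, {{shape, value}, true})
    end

    measure_and_drop(table)
  end

  # Shared layout: values are stored once under the dependency,
  # and each shape only holds a small reference row.
  def shared_bytes(shape_count, values) do
    table = :ets.new(:shared_sketch, [:set, :public])
    Enum.each(values, fn value -> :ets.insert(table, {{:dep, value}, true}) end)
    Enum.each(1..shape_count, fn shape -> :ets.insert(table, {{:shape, shape}, :dep}) end)
    measure_and_drop(table)
  end

  defp measure_and_drop(table) do
    bytes = table_bytes(table)
    :ets.delete(table)
    bytes
  end
end
```

Calling `MemorySketch.per_shape_bytes(100, 1..1_000)` and `MemorySketch.shared_bytes(100, 1..1_000)` should show the per-shape table growing with the shape count while the shared one stays close to the single-shape size. Absolute numbers will differ from the table above.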

Filter API latency

Median latency in µs over a populated filter. A positive Δ means the branch is faster.

Routing — Filter.affected_shapes/2

Single outer shape, axis is values_in_subquery:

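| values_in_subquery | main | branch | Δ |
|---|---|---|---|
| **positive, value matches** | | | |
| 10 | 81 | 71 | +12 % |
| 100 | 68 | 63 | +7 % |
| 1 000 | 66 | 60 | +9 % |
| 10 000 | 60 | 59 | +2 % |
| **positive, value misses** | | | |
| 10 | 30 | 59 | −97 % |
| 100 | 30 | 65 | −117 % |
| 1 000 | 29 | 59 | −103 % |
| 10 000 | 30 | 59 | −97 % |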

The miss path is the only consistent regression: the conservative
MultiTimeView.member_at_some_time? adds ~30 µs of fixed cost per WAL
record that misses every subquery.
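
To make the "conservative" check concrete, here is a minimal sketch of a multi-time membership view. It is an assumption about the general idea only: the `MultiTimeSketch` module, its map-of-event-lists representation, and the function arities are hypothetical and do not mirror the real `MultiTimeView`. An exact check answers for one logical time, while the conservative check answers "could this value be a member at any retained time?", which is what routing needs before it knows each consumer's pinned time.

```elixir
defmodule MultiTimeSketch do
  @moduledoc "Hypothetical multi-time membership view; not the real MultiTimeView."

  # The view maps each value to its events, oldest first:
  # %{value => [{logical_time, :in | :out}, ...]}
  def new, do: %{}

  # Record that `value` entered (:in) or left (:out) the subquery at `time`.
  def mark(view, value, time, state) when state in [:in, :out] do
    Map.update(view, value, [{time, state}], fn events -> events ++ [{time, state}] end)
  end

  # Exact check: the latest event at or before `time` decides membership.
  def member?(view, value, time) do
    view
    |> Map.get(value, [])
    |> Enum.take_while(fn {event_time, _state} -> event_time <= time end)
    |> List.last()
    |> case do
      {_time, :in} -> true
      _ -> false
    end
  end

  # Conservative check: true if the value was a member at any retained time.
  # A false answer proves no consumer can need the record.
  def member_at_some_time?(view, value) do
    view
    |> Map.get(value, [])
    |> Enum.any?(fn {_time, state} -> state == :in end)
  end
end
```

In this sketch a value that was marked in at time 1 and out at time 3 is still routed by `member_at_some_time?/2`, because a consumer pinned at time 2 would still see it as a member.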

Many shapes through one node:

| Scenario | main | branch | Δ |
|---|---|---|---|
| 100 shapes × 1 000 values, shared subquery (match) | 3 003 | 90 | +97 % |
| 100 shapes × 10 000 values, unique subqueries | 3 004 | 3 079 | −3 % |
| 1 000 negated shapes in one group | 482 | 368 | +24 % |

Shape lifecycle — Filter.add_shape/3 / Filter.remove_shape/2

Same axis (values_in_subquery).

| Operation, N values_in_subquery | main | branch | Δ |
|---|---|---|---|
| **add_shape, sharing existing child** | | | |
| N=1 000 | 6 | 5 | +17 % |
| N=10 000 | 9 | 7 | +22 % |
| **add_shape, new subquery †** | | | |
| N=1 000 | 7 | 242 | regressed |
| N=10 000 | 17 | 2 668 | regressed |
| **remove_shape, others remain on child** | | | |
| N=1 000 | 219 | 3 | +99 % |
| N=10 000 | 2 863 | 3 | +99.9 % |
| **remove_shape, last shape on child** | | | |
| N=1 000 | 222 | 88 | +60 % |
| N=10 000 | 2 778 | 1 106 | +60 % |

† add_shape, new subquery is not a real regression: the per-value
seeding work moved into add_shape on this branch (it walks the
already-populated MTV and seeds positive routes). On main the same work
happens in a separate SubqueryIndex.seed_membership/5 call after
add_shape. End-to-end "set up a routable shape" is O(N) either way.

The headline shape-lifecycle result is "remove_shape, others remain":
it stays flat at ~3 µs on this branch versus O(values) on main, which
is ~950× faster at N = 10 000.
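
The shape of that result follows from sharing one child per dependency. The sketch below is a hypothetical model, not the branch's data structures: `SharedChildSketch`, its `children` and `routes` maps, and the function signatures are all assumptions. Removing a shape while others remain only touches the attachment set, and the per-value teardown runs only when the last shape leaves.

```elixir
defmodule SharedChildSketch do
  @moduledoc "Hypothetical model of shapes sharing one child per dependency."

  def new, do: %{children: %{}, routes: %{}}

  # Attaching to an existing child is O(1); a new child seeds one route per value.
  def add_shape(state, shape, dep, values) do
    case state.children do
      %{^dep => shapes} ->
        put_in(state.children[dep], MapSet.put(shapes, shape))

      _ ->
        routes = Enum.reduce(values, state.routes, fn v, acc -> Map.put(acc, {dep, v}, true) end)
        %{state | children: Map.put(state.children, dep, MapSet.new([shape])), routes: routes}
    end
  end

  # Detaching is O(1) while other shapes remain; only the last removal
  # walks the routes and pays the O(values) cost.
  def remove_shape(state, shape, dep) do
    shapes = state.children |> Map.fetch!(dep) |> MapSet.delete(shape)

    if MapSet.size(shapes) == 0 do
      routes = Map.reject(state.routes, fn {{d, _value}, _} -> d == dep end)
      %{state | children: Map.delete(state.children, dep), routes: routes}
    else
      put_in(state.children[dep], shapes)
    end
  end
end
```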

Branch-only operations

These are new in this branch — no main equivalent. Numbers verify the
RFC's expected scaling claims (Benchee medians from
bench/subquery_index_bench.exs).

| Operation, varying its size axis | N=10 | N=100 | N=1 000 | N=10 000 |
|---|---|---|---|---|
| MultiTimeView.member?/4 (history length) | 83 ns | 125 ns | 458 ns | 3.92 µs |
| MultiTimeView.mark_in/4 (history length) | 1.1 µs | 3.4 µs | 27 µs | |
| add_positive_route/3 (positive children) | 0.6 µs | 3.5 µs | 38 µs | 355 µs |
| remove_positive_route/3 (positive children) | 0.5 µs | 2.5 µs | 28 µs | 306 µs |
| remove_subquery/2 (values_in_subquery) | 6 µs | 23 µs | 222 µs | 2.37 ms |
| Compactor pass (dirty values) | 8 µs | 78 µs | 786 µs | 8.76 ms |

| Operation, fixed size | Median |
|---|---|
| notify_processed_up_to/4 (single consumer) | 2.95 µs |
| mark_ready/2 (single subquery) | 2.9 µs |

All operations scale as the RFC predicts.

robacourt and others added 30 commits May 27, 2026 13:27
Materializer populates `MultiTimeView` during initial materialization,
emits dependency move events with `from_time`/`to_time`/`changed_values`,
and writes positive routing rows to `SubqueryIndex` as values enter the
shared view. New `Materializer.register_subquery_consumer/3` blocks on
readiness, registers with `SubqueryProgressMonitor`, and returns the
current logical time.

`EventHandlerBuilder` now hands consumers compact
`subquery_refs: %{ref => %{subquery_id, time}}` references in addition
to the existing `views` map (kept as a derived cache; deleted in 2b).
`SetupEffects` calls `SubqueryIndex.set_shape_subquery` and
`mark_ready/2` so the routing path can use logical times.

`SubqueryProgressMonitor` is supervised under `ShapeLogCollector`'s
supervisor so tests that use `with_shape_log_collector` get it for free.

Tests that drive moves via the deleted per-shape `add_value` /
`seed_membership` API are re-tagged `:uses_legacy_subquery_api` and
will be rewritten or replaced in 2b.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The consumer event handler no longer carries a per-instance `views`
MapSet cache for each dependency. `Steady` and `Buffering` derive every
view they need from the shared `MultiTimeView` indexed by the
consumer's currently-pinned logical time (held in `subquery_refs`),
and the `Views` module is gone.

To make this work correctly when a move-in is buffered:
- `ActiveMove` now stores `subquery_id`, `from_time`, `to_time`,
  `dep_index`, `dep_move_kind`, `subquery_ref`, and `values` directly,
  with no embedded view snapshots.
- `MoveQueue` tracks the latest payload `to_time` per dep, so a
  queued batch drained after a splice carries its materializer
  logical time forward into the next move-in query.
- `Buffering.splice` advances `SubqueryIndex.set_shape_subquery` /
  `ProgressMonitor.notify_processed_up_to` and updates
  `subquery_refs[ref].time` so subsequent routing reads at the
  right time.
- When a move-out is drained ahead of a same-dep queued move-in,
  the time advance is deferred to the upcoming Buffering splice so
  that the move-in query computes `views_before - views_after`
  correctly.

`StartMoveInQuery` and `SplicePlan` carry `from_time`/`to_time` and
read views via MTV at splice time rather than receiving pre-built
MapSets. `Effects.query_move_in_async` builds the same views map by
reading MTV at the consumer's pinned time.

The legacy in-process subquery test file is renamed to `.legacy`;
its replacement lives in `consumer_test.exs` (router-level move-in
tests) and the per-module unit tests.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds Electric.Shapes.Filter.Indexes.SubqueryIndex.Compactor, a per-stack
GenServer that periodically (every 10s by default) walks every subquery
tracked by `MultiTimeView`, advances its `min_required_time` to the
minimum required by any registered consumer (via
`SubqueryProgressMonitor.min_required_time/2`), and removes the
positive-routing rows for any value whose history compacts to empty.

To support cascading GC, `MultiTimeView.set_min_required_time/3` now
returns the list of values whose history was deleted, and a new
`MultiTimeView.subquery_ids/1` enumerates the subqueries currently
tracked.

The Compactor is supervised under `ShapeLogCollector.Supervisor`
alongside `ProgressMonitor` so every stack gets one automatically.

The resolver-pattern refactor for `Querying.move_in_where_clause/5`
and `SplicePlan` outlined in the plan is deferred — the current call
sites already pass MapSets built from MTV at splice time, so the
refactor is cosmetic. The materializer's `link_values_table` ETS cache
is similarly left in place; it still has external callers
(`Materializer.get_all_as_refs/2`) and untangling them is its own
patch.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
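
The compaction step described in this commit can be sketched as two pure functions. This is a hedged illustration under stated assumptions: `CompactorSketch`, the `%{consumer => pinned_time}` map, and the event-list history are hypothetical, and the periodic GenServer scheduling is left out. It shows how the oldest time any consumer still needs bounds what can be dropped, and how values whose history compacts away are returned so their routing rows can be removed.

```elixir
defmodule CompactorSketch do
  @moduledoc "Hypothetical compaction pass; not the real Compactor."

  # The oldest pinned time among consumers is the oldest time still needed.
  # With no consumers, nothing older than the current time is needed.
  def min_required_time(consumers, current_time) do
    consumers |> Map.values() |> Enum.min(fn -> current_time end)
  end

  # histories: %{value => [{time, :in | :out}, ...]}, oldest first.
  # Events at or before `min_time` collapse into the latest one. A value
  # left with no events, or with a lone :out, can never be visible again
  # and is dropped. Returns {kept_histories, deleted_values}.
  def compact(histories, min_time) do
    Enum.reduce(histories, {%{}, []}, fn {value, events}, {kept, deleted} ->
      {old, recent} = Enum.split_while(events, fn {time, _state} -> time <= min_time end)
      compacted = Enum.take(old, -1) ++ recent

      case compacted do
        [] -> {kept, [value | deleted]}
        [{_time, :out}] -> {kept, [value | deleted]}
        _ -> {Map.put(kept, value, compacted), deleted}
      end
    end)
  end
end
```

A caller would delete the positive-routing rows for every value in the second element of the returned tuple, which is the cascading step the commit message describes.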
The per-stack `link_values_table` ETS cache and its
`Materializer.{init_link_values_table, get_link_values, get_all_as_refs,
delete_link_values}` API are no longer needed: every consumer that used
to read it now reads `MultiTimeView` directly at its own pinned logical
time, so the cache was just a stale copy.

Removes:
- `Materializer.init_link_values_table/1` (and its call from
  `ConsumerRegistry.new/2`)
- `Materializer.get_link_values/1` + `:get_link_values` handle_call
- `Materializer.get_all_as_refs/2` (was unused outside this module)
- `Materializer.delete_link_values/2` (was unused)
- internal `write_link_values/1`, `link_values_table_name/1`,
  `link_values_from_counts/1` helpers

The materializer test file gains a local `link_values/1` helper that
reads `MultiTimeView.values/3` at the materializer's current logical
time, with all 73 assertion sites updated to use it.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two changes that swap pre-built MapSet view copies for closures that
read MultiTimeView on demand.

1. `Querying.move_in_where_clause/4` now takes a single
   `values_for.(subquery_ref, :before | :after)` resolver instead of
   the old `views_before_move` and `views_after_move` map arguments.
   `position_to_sql/4` and `build_disjuncts_sql/5` take a per-call
   resolver closure, and `metadata_sql`'s `:views` opt is renamed to
   `:values_for_ref` (a 1-arg closure). `query_move_in_async` builds
   the resolver once, capturing `mtv`, `from_time`, `to_time`, and
   `subquery_refs`, so SQL build can read membership lazily.

2. `Shape.convert_change`'s `extra_refs` now accepts
   `{old_subquery_member?, new_subquery_member?}` — two 2-arity
   callbacks of shape `(subquery_ref, value) -> boolean`. The DNF
   evaluator routes them through `Runner.execute(..., subquery_member?:
   callback)`, which has supported the closure path all along. The
   pre-RFC `{old_views_map, new_views_map}` map form is still accepted
   via an internal `normalise_extra_refs/1` adapter so existing tests
   and external callers keep working.

   `SplicePlan.build/3` and `Steady.append_txn_effects/2` now build
   `WhereClause.subquery_member_from_mtv/3` callbacks instead of
   materialising full view maps. The new helper looks up the consumer's
   pinned time per ref and supports a single-ref `time_override` so
   splice-plan can read the trigger ref at `from_time`/`to_time` while
   every other ref stays at its pinned time.

`Shape.get_row_metadata/6` keeps a backward-compatible head for the
map form to preserve the `shape_test.exs` direct-call entry point.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
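
The resolver idea in this commit can be illustrated with a small closure. This is only a sketch under assumptions: `ResolverSketch`, the snapshot-map stand-in for the multi-time view, and the argument order are hypothetical. The `subquery_refs` shape of `%{ref => %{subquery_id, time}}` follows the earlier commit message. The closure reads the trigger ref at `from_time` or `to_time` and every other ref at its pinned time, and nothing is materialised until it is called.

```elixir
defmodule ResolverSketch do
  @moduledoc "Hypothetical values_for resolver; the real one reads MultiTimeView."

  # mtv: %{subquery_id => %{logical_time => MapSet}} (stand-in snapshot store)
  # subquery_refs: %{ref => %{subquery_id: id, time: pinned_time}}
  def build_values_for(mtv, subquery_refs, trigger_ref, from_time, to_time) do
    fn ref, phase when phase in [:before, :after] ->
      %{subquery_id: id, time: pinned_time} = Map.fetch!(subquery_refs, ref)

      time =
        cond do
          # Refs that did not trigger the move stay at their pinned time.
          ref != trigger_ref -> pinned_time
          phase == :before -> from_time
          phase == :after -> to_time
        end

      mtv |> Map.fetch!(id) |> Map.fetch!(time)
    end
  end
end
```

Callers then pass the single closure around instead of two pre-built view maps, for example `values_for.(ref, :before)` when building the "before" side of a where clause.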
Drops the `:uses_legacy_subquery_api` tag entirely now that every test
under it has been rewritten or replaced.

- `test/electric/shapes/consumer/event_handler/subqueries_test.exs.legacy`
  deleted (963 lines). It pokes at the pre-RFC `Steady.views` /
  `ActiveMove.views_*` MapSet fields that no longer exist. Its
  behavioural coverage is now provided by the consumer- and
  router-level integration tests (`consumer_test.exs` move-in/buffering
  tests + `router_test.exs` subqueries describe block).

- `consumer_test.exs`: the two tagged tests are rewritten against the
  new API. The startup test now asserts `has_positions?` + `not
  fallback?` + `get_shape_subquery/3` instead of the removed
  `positions_for_shape/2`. The "move_in adds value to the index" test
  is updated for the new logical-time semantics — `member?/4` reads at
  the shape's stored time, so the new value only becomes visible
  through `member?` after the splice advances the shape's subquery
  time (the old version asserted visibility mid-buffering, which the
  current implementation correctly defers).

- `filter_test.exs` "subquery shapes routing in filter" describe
  block: the legacy `SubqueryIndex.seed_membership/5` and
  `SubqueryIndex.add_value/5` calls are replaced with two local test
  helpers (`seed_membership/5`, `add_value/5`) that look up the
  per-shape subquery_id `Filter.add_shape` stored, seed `MultiTimeView`
  with the values, bind the shape's `subquery_ref` via
  `set_shape_subquery/5`, and wire positive routing via
  `add_positive_route/3`. The `remove_shape` test's `:ets.tab2list/1`
  state-comparison is replaced with behavioural assertions
  (`Filter.affected_shapes/2`, `SubqueryIndex.has_positions?/2`,
  `SubqueryIndex.get_shape_subquery/3`).

- Standalone `remove_shape/2 removes seeded subquery index state` test
  at the top of `filter_test.exs` is deleted as redundant with the
  in-describe-block `remove_shape cleans up subquery index metadata
  and values` test.

- `test_helper.exs`: `:uses_legacy_subquery_api` is no longer in the
  default-exclude list.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
robacourt and others added 8 commits May 27, 2026 13:27
Multi-valued routes were stored as `{:positive, group_id, value, cnid}`
in a `:set` ETS table and read with partial `:ets.match`, forcing a
full-table scan on the hot routing path. Switch the table to `:bag`,
move the discriminator (cnid, shape, branch) into the value position
for the six multi-value row types, and read with exact `:ets.lookup`.
Targeted deletes now use `:ets.delete_object` since `:ets.delete/2` on
a bag wipes every row with the key. `lookup_child_for_shape` walks the
shape's own attachments instead of the whole group. Matches the RFC's
`{:positive, group_id, value} -> child_node_id` cost model.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
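
The `:bag` layout from this commit can be sketched with plain ETS calls. The module name `BagRouteSketch` and its helper functions are hypothetical, while `:ets.lookup/2` and `:ets.delete_object/2` are the real calls the commit message names. Keeping the discriminator in the value position means a full key is always known at read time, so no partial match is needed.

```elixir
defmodule BagRouteSketch do
  @moduledoc "Hypothetical positive-route table using a :bag ETS table."

  def new, do: :ets.new(:bag_route_sketch, [:bag, :public])

  # Key is {:positive, group_id, value}; the child node id is the row's value.
  def add_route(table, group_id, value, child_id) do
    :ets.insert(table, {{:positive, group_id, value}, child_id})
  end

  # Exact-key lookup returns every child attached to this value.
  def children(table, group_id, value) do
    for {_key, child_id} <- :ets.lookup(table, {:positive, group_id, value}), do: child_id
  end

  # :ets.delete/2 would remove every row under the key, so delete the
  # one exact object instead.
  def remove_route(table, group_id, value, child_id) do
    :ets.delete_object(table, {{:positive, group_id, value}, child_id})
  end
end
```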
Move `MultiTimeView.values/3` materialisation and
`Querying.move_in_where_clause/4` construction inside
`Task.Supervisor.start_child` in `query_move_in_async/4` so the
dependency-view arrays live on the short-lived task heap rather than
on the long-lived consumer process. Also extract just the fields the
task needs so the closure no longer captures the whole `consumer_state`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
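
A minimal sketch of the heap-placement idea follows. `MoveInTaskSketch`, the `load_values` field, and the reply message are hypothetical stand-ins; only `Task.Supervisor.start_child/2` is the real call. The large list is built inside the task function, and the closure captures two extracted fields rather than the whole state map.

```elixir
defmodule MoveInTaskSketch do
  @moduledoc "Hypothetical version of building large views inside a task."

  def query_move_in_async(task_supervisor, consumer_state, reply_to) do
    # Extract only what the task needs so the closure does not copy
    # the whole consumer state into the task process.
    %{shape_handle: shape_handle, load_values: load_values} = consumer_state

    Task.Supervisor.start_child(task_supervisor, fn ->
      # The large list is allocated here, on the short-lived task heap,
      # and is reclaimed when the task exits.
      values = load_values.()
      send(reply_to, {:move_in_values, shape_handle, length(values)})
    end)
  end
end
```

Only the small summary message crosses back to the long-lived process, so its heap never holds the materialised values.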
The reduce path only used the dep_view to answer "is this value in the
base view?". Passing a `(value) -> boolean()` callback lets the consumer
skip materialising the full dependency view on its heap — production
now closes over `MultiTimeView.member?/4` per value. Steady and
Buffering event handlers no longer build a transient MapSet per
materializer_changes event.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
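
As an illustration of why a callback is enough, here is a sketch of a reduce step that only ever asks "is this value in the base view?". The module `ReduceSketch`, the event tuples, and the net-change rule are assumptions for illustration and not the actual reduce path. Any 1-arity membership function can be passed, so the caller does not need to build a full set.

```elixir
defmodule ReduceSketch do
  @moduledoc "Hypothetical reduce step driven by a membership callback."

  # events: [{:move_in | :move_out, value}], applied in order.
  # in_base?: (value -> boolean) answers membership in the base view.
  # Returns only the changes that differ from the base view.
  def net_changes(events, in_base?) do
    events
    # The last event per value wins.
    |> Enum.reduce(%{}, fn {kind, value}, acc -> Map.put(acc, value, kind) end)
    # A move-in of a present value, or a move-out of an absent one, is a no-op.
    |> Enum.filter(fn
      {value, :move_in} -> not in_base?.(value)
      {value, :move_out} -> in_base?.(value)
    end)
    |> Map.new()
  end
end
```

In tests the callback can wrap a `MapSet`, for example `&MapSet.member?(base, &1)`, while production code could close over a per-value lookup instead.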
ActiveMove.view_before_move/2 and view_after_move/2 were defined but
had no callers (production or tests). Splice and TransactionConverter
already read MultiTimeView directly when they need membership at
from_time / to_time. Drop the helpers and the now-unused MultiTimeView
alias.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@@ -1,957 +0,0 @@
defmodule Electric.Shapes.Consumer.EventHandler.SubqueriesTest do
What replaces these tests?

Comment thread packages/sync-service/lib/electric/shapes/filter/where_condition.ex Outdated
`other_shape_matches?` previously used `subquery_member_unknown/0`, which
raised on any sublink in the residual `and_where`. The runner errored out
and the caller treated it as "include" — every record routed through any
shape with a residual sublink.

Use the same conservative MTV-based check the routing path uses
(RFC §Routing): positive sublinks consult
`MultiTimeView.member_at_some_time?/3`, negated sublinks consult
`MultiTimeView.member_at_all_times?/3`. Values the MTV can prove are
absent at every retained time (positive) or present at every retained
time (negated) get pruned here instead of over-routing. Ambiguous values
still over-route and the consumer's exact check refines.

SubqueryIndex now stores `{dep_handle, polarity}` per
`(shape_handle, dep_index)` so the new callback can pick the right MTV
predicate per sublink. New public `SubqueryIndex.lookup_dep!/3` is the
read path.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
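
The pruning rule in this commit can be sketched as follows. This is a hypothetical model: `ResidualSketch`, the `%{value => [state per retained time]}` representation, and the `may_match?/3` helper are assumptions, and the real predicates live on `MultiTimeView`. A positive sublink is pruned only when the value is provably absent at every retained time, and a negated sublink only when the value is provably present at every retained time.

```elixir
defmodule ResidualSketch do
  @moduledoc "Hypothetical conservative check for residual sublinks."

  # retained: %{value => [:in | :out, ...]}, one state per retained logical time.
  def member_at_some_time?(retained, value) do
    retained |> Map.get(value, []) |> Enum.member?(:in)
  end

  # A value with no retained history is absent, so it is never
  # "present at all times" (Enum.all?/2 on [] would wrongly say true).
  def member_at_all_times?(retained, value) do
    case Map.get(retained, value, []) do
      [] -> false
      states -> Enum.all?(states, fn state -> state == :in end)
    end
  end

  # Positive sublink (IN): may match unless absent at every retained time.
  def may_match?(retained, :positive, value), do: member_at_some_time?(retained, value)

  # Negated sublink (NOT IN): may match unless present at every retained time.
  def may_match?(retained, :negated, value), do: not member_at_all_times?(retained, value)
end
```

Ambiguous values, such as one that was a member at some retained times but not others, return `true` for both polarities, which corresponds to the over-routing that the consumer's exact check later refines.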

codecov Bot commented May 27, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 59.38%. Comparing base (319e405) to head (008612f).
⚠️ Report is 2 commits behind head on main.
✅ All tests successful. No failed tests found.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #4426      +/-   ##
==========================================
+ Coverage   56.08%   59.38%   +3.30%     
==========================================
  Files         263      308      +45     
  Lines       28601    32332    +3731     
  Branches     8003     8893     +890     
==========================================
+ Hits        16040    19200    +3160     
- Misses      12546    13114     +568     
- Partials       15       18       +3     
| Flag | Coverage Δ |
|---|---|
| packages/agents | 68.47% <ø> (ø) |
| packages/agents-mcp | 77.54% <ø> (?) |
| packages/agents-runtime | 81.82% <ø> (+0.01%) ⬆️ |
| packages/agents-server | 74.67% <ø> (-0.66%) ⬇️ |
| packages/agents-server-ui | 5.71% <ø> (ø) |
| packages/electric-ax | 43.81% <ø> (ø) |
| packages/experimental | 87.73% <ø> (?) |
| packages/react-hooks | 86.48% <ø> (?) |
| packages/start | 82.83% <ø> (?) |
| packages/typescript-client | 94.39% <ø> (?) |
| packages/y-electric | 56.05% <ø> (?) |
| typescript | 59.38% <ø> (+3.30%) ⬆️ |
| unit-tests | 59.38% <ø> (+3.30%) ⬆️ |
