DRAFT: multitime subquery index #4426
robacourt wants to merge 40 commits into
Materializer populates `MultiTimeView` during initial materialization,
emits dependency move events with `from_time`/`to_time`/`changed_values`,
and writes positive routing rows to `SubqueryIndex` as values enter the
shared view. New `Materializer.register_subquery_consumer/3` blocks on
readiness, registers with `SubqueryProgressMonitor`, and returns the
current logical time.
`EventHandlerBuilder` now hands consumers compact
`subquery_refs: %{ref => %{subquery_id, time}}` references in addition
to the existing `views` map (kept as a derived cache; deleted in 2b).
`SetupEffects` calls `SubqueryIndex.set_shape_subquery` and
`mark_ready/2` so the routing path can use logical times.
`SubqueryProgressMonitor` is supervised under `ShapeLogCollector`'s
supervisor so tests that use `with_shape_log_collector` get it for free.
Tests that drive moves via the deleted per-shape `add_value` /
`seed_membership` API are re-tagged `:uses_legacy_subquery_api` and
will be rewritten or replaced in 2b.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The consumer event handler no longer carries a per-instance `views` MapSet cache for each dependency. `Steady` and `Buffering` derive every view they need from the shared `MultiTimeView` indexed by the consumer's currently-pinned logical time (held in `subquery_refs`), and the `Views` module is gone.
To make this work correctly when a move-in is buffered:
- `ActiveMove` now stores `subquery_id`, `from_time`, `to_time`, `dep_index`, `dep_move_kind`, `subquery_ref`, and `values` directly, with no embedded view snapshots.
- `MoveQueue` tracks the latest payload `to_time` per dep, so a queued batch drained after a splice carries its materializer logical time forward into the next move-in query.
- `Buffering.splice` advances `SubqueryIndex.set_shape_subquery` / `ProgressMonitor.notify_processed_up_to` and updates `subquery_refs[ref].time` so subsequent routing reads at the right time.
- When a move-out is drained ahead of a same-dep queued move-in, the time advance is deferred to the upcoming Buffering splice so that the move-in query computes `views_before - views_after` correctly.
`StartMoveInQuery` and `SplicePlan` carry `from_time`/`to_time` and read views via MTV at splice time rather than receiving pre-built MapSets. `Effects.query_move_in_async` builds the same views map by reading MTV at the consumer's pinned time.
The legacy in-process subquery test file is renamed to `.legacy`; its replacement lives in `consumer_test.exs` (router-level move-in tests) and the per-module unit tests.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
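For readers new to the idea, reading a shared view "at a pinned logical time" can be sketched roughly as below. This is a minimal illustration only: `MTVSketch`, its map-based storage and its function names are hypothetical and are not the `MultiTimeView` implementation in this PR. It assumes events for a value are recorded in increasing time order.

```elixir
defmodule MTVSketch do
  # Hypothetical stand-in for a multi-time membership view.
  # Storage: %{value => [{time, :in | :out}, ...]}, newest event first.
  def new, do: %{}

  # Record that `value` entered (:in) or left (:out) the view at `time`.
  def mark(view, value, time, state) when state in [:in, :out] do
    Map.update(view, value, [{time, state}], &[{time, state} | &1])
  end

  # A value is a member at `time` if its most recent event at or
  # before `time` is :in.
  def member?(view, value, time) do
    latest =
      view
      |> Map.get(value, [])
      |> Enum.find(fn {t, _state} -> t <= time end)

    match?({_, :in}, latest)
  end

  # Materialise the full membership set as seen at `time`.
  def values(view, time) do
    for {value, _history} <- view, member?(view, value, time), into: MapSet.new(), do: value
  end
end

view =
  MTVSketch.new()
  |> MTVSketch.mark("a", 1, :in)
  |> MTVSketch.mark("b", 2, :in)
  |> MTVSketch.mark("a", 3, :out)
```

Two consumers pinned at times 2 and 3 would read different membership for `"a"` from the same shared structure, which is what lets per-consumer snapshot copies go away.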
Adds `Electric.Shapes.Filter.Indexes.SubqueryIndex.Compactor`, a per-stack GenServer that periodically (every 10s by default) walks every subquery tracked by `MultiTimeView`, advances its `min_required_time` to the minimum required by any registered consumer (via `SubqueryProgressMonitor.min_required_time/2`), and removes the positive-routing rows for any value whose history compacts to empty.
To support cascading GC, `MultiTimeView.set_min_required_time/3` now returns the list of values whose history was deleted, and a new `MultiTimeView.subquery_ids/1` enumerates the subqueries currently tracked.
The Compactor is supervised under `ShapeLogCollector.Supervisor` alongside `ProgressMonitor` so every stack gets one automatically.
The resolver-pattern refactor for `Querying.move_in_where_clause/5` and `SplicePlan` outlined in the plan is deferred: the current call sites already pass MapSets built from MTV at splice time, so the refactor is cosmetic. The materializer's `link_values_table` ETS cache is similarly left in place; it still has external callers (`Materializer.get_all_as_refs/2`) and untangling them is its own patch.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
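The compaction step described above could look roughly like the following sketch. `CompactionSketch`, the history layout (`%{value => [{time, :in | :out}]}`, newest first) and the return shape are assumptions made for illustration; the real `set_min_required_time/3` may differ. It also assumes at least one consumer is registered, since `Enum.min/1` raises on an empty list.

```elixir
defmodule CompactionSketch do
  # Returns {compacted_view, removed_values}. `removed_values` are the
  # values whose history compacted to empty, i.e. the ones whose
  # positive-routing rows could then be garbage collected.
  def compact(view, consumer_times) do
    # No consumer will ever read below this time again.
    min_required = Enum.min(consumer_times)

    Enum.reduce(view, {%{}, []}, fn {value, history}, {kept, removed} ->
      case compact_history(history, min_required) do
        [] -> {kept, [value | removed]}
        compacted -> {Map.put(kept, value, compacted), removed}
      end
    end)
  end

  defp compact_history(history, min_required) do
    # Events newer than min_required must all be retained.
    {newer, older} = Enum.split_while(history, fn {time, _state} -> time > min_required end)

    # Of the older events, only the one deciding membership at
    # min_required matters, and only if it says :in.
    base =
      case older do
        [{_time, :in} = entry | _] -> [entry]
        _ -> []
      end

    newer ++ base
  end
end
```

Returning the removed values from the same pass is what makes the cascading cleanup cheap: the caller does not need a second scan to find which routing rows are now dead.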
The per-stack `link_values_table` ETS cache and its
`Materializer.{init_link_values_table, get_link_values, get_all_as_refs,
delete_link_values}` API are no longer needed: every consumer that used
to read it now reads `MultiTimeView` directly at its own pinned logical
time, so the cache was just a stale copy.
Removes:
- `Materializer.init_link_values_table/1` (and its call from
`ConsumerRegistry.new/2`)
- `Materializer.get_link_values/1` + `:get_link_values` handle_call
- `Materializer.get_all_as_refs/2` (was unused outside this module)
- `Materializer.delete_link_values/2` (was unused)
- internal `write_link_values/1`, `link_values_table_name/1`,
`link_values_from_counts/1` helpers
The materializer test file gains a local `link_values/1` helper that
reads `MultiTimeView.values/3` at the materializer's current logical
time, with all 73 assertion sites updated to use it.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two changes that swap pre-built MapSet view copies for closures that
read MultiTimeView on demand.
1. `Querying.move_in_where_clause/4` now takes a single
`values_for.(subquery_ref, :before | :after)` resolver instead of
the old `views_before_move` and `views_after_move` map arguments.
`position_to_sql/4` and `build_disjuncts_sql/5` take a per-call
resolver closure, and `metadata_sql`'s `:views` opt is renamed to
`:values_for_ref` (a 1-arg closure). `query_move_in_async` builds
the resolver once, capturing `mtv`, `from_time`, `to_time`, and
`subquery_refs`, so SQL build can read membership lazily.
2. `Shape.convert_change`'s `extra_refs` now accepts
`{old_subquery_member?, new_subquery_member?}` — two 2-arity
callbacks of shape `(subquery_ref, value) -> boolean`. The DNF
evaluator routes them through `Runner.execute(..., subquery_member?:
callback)`, which has supported the closure path all along. The
pre-RFC `{old_views_map, new_views_map}` map form is still accepted
via an internal `normalise_extra_refs/1` adapter so existing tests
and external callers keep working.
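As an illustration of the adapter idea in item 2, a map-to-callback normaliser might look like this. `ExtraRefsSketch` is hypothetical, and treating a missing ref as "not a member" is an assumption of this sketch, not something the PR states about `normalise_extra_refs/1`.

```elixir
defmodule ExtraRefsSketch do
  # Callback form: pass through unchanged.
  def normalise({old, new}) when is_function(old, 2) and is_function(new, 2) do
    {old, new}
  end

  # Legacy map form (%{subquery_ref => MapSet}): wrap each map in a
  # (subquery_ref, value) -> boolean callback.
  def normalise({old, new}) when is_map(old) and is_map(new) do
    {from_map(old), from_map(new)}
  end

  defp from_map(views) do
    fn ref, value ->
      case Map.fetch(views, ref) do
        {:ok, set} -> MapSet.member?(set, value)
        # Assumption: an unknown ref has no members.
        :error -> false
      end
    end
  end
end

{old_member?, new_member?} =
  ExtraRefsSketch.normalise({%{r: MapSet.new([1])}, %{r: MapSet.new([1, 2])}})
```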
`SplicePlan.build/3` and `Steady.append_txn_effects/2` now build
`WhereClause.subquery_member_from_mtv/3` callbacks instead of
materialising full view maps. The new helper looks up the consumer's
pinned time per ref and supports a single-ref `time_override` so
splice-plan can read the trigger ref at `from_time`/`to_time` while
every other ref stays at its pinned time.
`Shape.get_row_metadata/6` keeps a backward-compatible head for the
map form to preserve the `shape_test.exs` direct-call entry point.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
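A rough sketch of the resolver closure with a single-ref time override follows. Everything here is hypothetical: `ResolverSketch`, the `read` function standing in for an MTV read, and the argument order are illustrative and do not mirror the signatures of `query_move_in_async` or `subquery_member_from_mtv/3`.

```elixir
defmodule ResolverSketch do
  # `read` is a hypothetical (subquery_id, time) -> MapSet function.
  # `subquery_refs` has the shape %{ref => %{subquery_id: id, time: pinned}}.
  def build(read, subquery_refs, trigger_ref, from_time, to_time) do
    fn ref, phase when phase in [:before, :after] ->
      %{subquery_id: id, time: pinned} = Map.fetch!(subquery_refs, ref)

      time =
        cond do
          # Every non-trigger ref stays at the consumer's pinned time.
          ref != trigger_ref -> pinned
          phase == :before -> from_time
          phase == :after -> to_time
        end

      read.(id, time)
    end
  end
end

# Fake reader that just echoes what it was asked for.
read = fn id, time -> MapSet.new([{id, time}]) end
refs = %{r0: %{subquery_id: :s0, time: 7}, r1: %{subquery_id: :s1, time: 4}}
values_for = ResolverSketch.build(read, refs, :r0, 7, 8)
```

Because nothing is read until `values_for` is called, the SQL builder only pays for the refs and phases it actually touches.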
Drops the `:uses_legacy_subquery_api` tag entirely now that every test under it has been rewritten or replaced.
- `test/electric/shapes/consumer/event_handler/subqueries_test.exs.legacy` deleted (963 lines). It pokes at the pre-RFC `Steady.views` / `ActiveMove.views_*` MapSet fields that no longer exist. Its behavioural coverage is now provided by the consumer- and router-level integration tests (`consumer_test.exs` move-in/buffering tests + `router_test.exs` subqueries describe block).
- `consumer_test.exs`: the two tagged tests are rewritten against the new API. The startup test now asserts `has_positions?` + `not fallback?` + `get_shape_subquery/3` instead of the removed `positions_for_shape/2`. The "move_in adds value to the index" test is updated for the new logical-time semantics: `member?/4` reads at the shape's stored time, so the new value only becomes visible through `member?` after the splice advances the shape's subquery time (the old version asserted visibility mid-buffering, which the current implementation correctly defers).
- `filter_test.exs` "subquery shapes routing in filter" describe block: the legacy `SubqueryIndex.seed_membership/5` and `SubqueryIndex.add_value/5` calls are replaced with two local test helpers (`seed_membership/5`, `add_value/5`) that look up the per-shape subquery_id `Filter.add_shape` stored, seed `MultiTimeView` with the values, bind the shape's `subquery_ref` via `set_shape_subquery/5`, and wire positive routing via `add_positive_route/3`. The `remove_shape` test's `:ets.tab2list/1` state-comparison is replaced with behavioural assertions (`Filter.affected_shapes/2`, `SubqueryIndex.has_positions?/2`, `SubqueryIndex.get_shape_subquery/3`).
- Standalone `remove_shape/2 removes seeded subquery index state` test at the top of `filter_test.exs` is deleted as redundant with the in-describe-block `remove_shape cleans up subquery index metadata and values` test.
- `test_helper.exs`: `:uses_legacy_subquery_api` is no longer in the default-exclude list.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Multi-valued routes were stored as `{:positive, group_id, value, cnid}`
in a `:set` ETS table and read with partial `:ets.match`, forcing a
full-table scan on the hot routing path. Switch the table to `:bag`,
move the discriminator (cnid, shape, branch) into the value position
for the six multi-value row types, and read with exact `:ets.lookup`.
Targeted deletes now use `:ets.delete_object` since `:ets.delete/2` on
a bag wipes every row with the key. `lookup_child_for_shape` walks the
shape's own attachments instead of the whole group. Matches the RFC's
`{:positive, group_id, value} -> child_node_id` cost model.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
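The `:set` to `:bag` switch can be illustrated with plain `:ets` calls. The table name, key layout and child ids below are made up for the example; only the `:ets` functions themselves are real.

```elixir
# A :bag table allows several rows under the same key, so the
# discriminator (here a child node id) can live in the value position.
table = :ets.new(:routes_sketch, [:bag, :public])

:ets.insert(table, {{:positive, 1, "a"}, :child_10})
:ets.insert(table, {{:positive, 1, "a"}, :child_11})
:ets.insert(table, {{:positive, 1, "b"}, :child_10})

# Exact-key lookup: no partial match pattern, no table scan.
children = for {_key, child} <- :ets.lookup(table, {:positive, 1, "a"}), do: child

# Targeted delete: removes only this one row. :ets.delete/2 with the
# key would remove both children registered under {:positive, 1, "a"}.
:ets.delete_object(table, {{:positive, 1, "a"}, :child_10})

remaining = for {_key, child} <- :ets.lookup(table, {:positive, 1, "a"}), do: child
```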
Move `MultiTimeView.values/3` materialisation and `Querying.move_in_where_clause/4` construction inside `Task.Supervisor.start_child` in `query_move_in_async/4` so the dependency-view arrays live on the short-lived task heap rather than on the long-lived consumer process. Also extract just the fields the task needs so the closure no longer captures the whole `consumer_state`.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
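The pattern is roughly the following. The state fields, `read_values` and the message shape are invented for the sketch; only `Task.Supervisor.start_link/1` and `Task.Supervisor.start_child/2` are real API.

```elixir
{:ok, sup} = Task.Supervisor.start_link()
parent = self()

# Hypothetical long-lived process state with a large unrelated field.
consumer_state = %{subquery_id: :s0, time: 3, big_buffer: List.duplicate(0, 10_000)}

# Extract only what the task needs. A closure that referenced
# `consumer_state` directly would copy the whole map, including
# `big_buffer`, onto the task's heap.
%{subquery_id: id, time: time} = consumer_state

# Hypothetical stand-in for an expensive view materialisation.
read_values = fn _id, t -> Enum.to_list(1..t) end

{:ok, _pid} =
  Task.Supervisor.start_child(sup, fn ->
    # The materialised list lives and dies with this short-lived task.
    values = read_values.(id, time)
    send(parent, {:move_in_query, length(values)})
  end)

result =
  receive do
    {:move_in_query, count} -> count
  after
    1_000 -> :timeout
  end
```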
The reduce path only used the dep_view to answer "is this value in the base view?". Passing a `(value) -> boolean()` callback lets the consumer skip materialising the full dependency view on its heap; production now closes over `MultiTimeView.member?/4` per value. Steady and Buffering event handlers no longer build a transient MapSet per materializer_changes event.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
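A minimal sketch of the callback shape is below. `ReduceSketch`, the `{:added, v}` / `{:removed, v}` change tuples and the result map are hypothetical and only illustrate passing a membership predicate where a full set used to be passed; they are not the actual reduce path.

```elixir
defmodule ReduceSketch do
  # `in_base?` answers "is this value in the base view?" per value,
  # so the caller never has to hand over a materialised set.
  def net_changes(changes, in_base?) when is_function(in_base?, 1) do
    Enum.reduce(changes, %{move_in: [], move_out: []}, fn
      {:added, value}, acc ->
        # Adding a value already in the base view changes nothing.
        if in_base?.(value), do: acc, else: Map.update!(acc, :move_in, &[value | &1])

      {:removed, value}, acc ->
        # Removing a value not in the base view changes nothing.
        if in_base?.(value), do: Map.update!(acc, :move_out, &[value | &1]), else: acc
    end)
  end
end

# In a test a MapSet-backed predicate is enough; production code could
# close over a per-value lookup instead.
base = MapSet.new([1, 2])
in_base? = fn value -> MapSet.member?(base, value) end

result =
  ReduceSketch.net_changes(
    [{:added, 1}, {:added, 3}, {:removed, 2}, {:removed, 9}],
    in_base?
  )
```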
`ActiveMove.view_before_move/2` and `view_after_move/2` were defined but had no callers (production or tests). Splice and TransactionConverter already read MultiTimeView directly when they need membership at from_time / to_time. Drop the helpers and the now-unused MultiTimeView alias.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
robacourt (Contributor, Author) commented May 27, 2026
@@ -1,957 +0,0 @@
defmodule Electric.Shapes.Consumer.EventHandler.SubqueriesTest do
What replaces these tests?
`other_shape_matches?` previously used `subquery_member_unknown/0`, which
raised on any sublink in the residual `and_where`. The runner errored out
and the caller treated the error as "include", so every record was routed
through any shape with a residual sublink.
Use the same conservative MTV-based check the routing path uses
(RFC §Routing): positive sublinks consult
`MultiTimeView.member_at_some_time?/3`, negated sublinks consult
`MultiTimeView.member_at_all_times?/3`. Values the MTV can prove are
absent at every retained time (positive) or present at every retained
time (negated) get pruned here instead of over-routing. Ambiguous values
still over-route and the consumer's exact check refines.
SubqueryIndex now stores `{dep_handle, polarity}` per
`(shape_handle, dep_index)` so the new callback can pick the right MTV
predicate per sublink. New public `SubqueryIndex.lookup_dep!/3` is the
read path.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
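The conservative rule can be sketched as follows. `ConservativeRoutingSketch` and its `%{value => [boolean]}` representation (one membership flag per retained logical time) are assumptions for illustration; the real predicates are `MultiTimeView.member_at_some_time?/3` and `member_at_all_times?/3`, whose internals are not shown here.

```elixir
defmodule ConservativeRoutingSketch do
  # retained: %{value => [boolean]}, one flag per retained logical time.
  def member_at_some_time?(retained, value) do
    retained |> Map.get(value, []) |> Enum.any?()
  end

  def member_at_all_times?(retained, value) do
    case Map.get(retained, value, []) do
      # A value with no history was never a member.
      [] -> false
      states -> Enum.all?(states)
    end
  end

  # Positive sublink: prune only if the value is provably absent
  # at every retained time.
  def route?(retained, value, :positive), do: member_at_some_time?(retained, value)

  # Negated sublink: prune only if the value is provably present
  # at every retained time.
  def route?(retained, value, :negated), do: not member_at_all_times?(retained, value)
end

retained = %{"a" => [true, true], "b" => [false, true], "c" => [false, false]}
```

Here `"b"` is ambiguous, so it is routed under both polarities and left to the consumer's exact check, while `"c"` (positive) and `"a"` (negated) are pruned.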
Codecov Report
✅ All modified and coverable lines are covered by tests.
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #4426      +/-   ##
==========================================
+ Coverage   56.08%   59.38%   +3.30%
==========================================
  Files         263      308      +45
  Lines       28601    32332    +3731
  Branches     8003     8893     +890
==========================================
+ Hits        16040    19200    +3160
- Misses      12546    13114     +568
- Partials       15       18       +3
Basic memory use has halved from 2KB per row per subquery to 1KB per row per subquery:

With 1.6.1, memory use increased with each outer shape that used the subquery; with this new version, the overhead of additional outer shapes using the same subquery is minimal:

PR stats: `rob/multitime-subquery-index`
Local measurements on macOS / Apple M3 Pro / OTP 28 / Elixir 1.19.5.
Drivers under `scripts/`; see Reproducing at the bottom.
Filter ETS memory
N outer shapes, each with one positive sublink seeded with M values.
"Shared" reuses the same dep handle across all shapes; "unique" gives
every shape its own. Total bytes across every ETS table the
`Filter` owns.
On `main`, shared and unique sizes are identical: the old `SubqueryIndex`
stores a per-shape view regardless of dep-handle sharing.
Filter API latency
Median µs over a populated filter. `+` = branch faster.
Routing: `Filter.affected_shapes/2`
Single outer shape, axis is `values_in_subquery`:
The miss path is the only consistent regression: the conservative
`MultiTimeView.member_at_some_time?` adds ~30 µs of fixed cost per WAL
record that misses every subquery.
Many shapes through one node:
Shape lifecycle: `Filter.add_shape/3` / `Filter.remove_shape/2`
Same axis (`values_in_subquery`). Operations measured:
- `add_shape`, sharing existing child
- `add_shape`, new subquery †
- `remove_shape`, others remain on child
- `remove_shape`, last shape on child
† `add_shape, new subquery` is not a real regression: the per-value
seeding work moved into `add_shape` on this branch (it walks the
already-populated MTV and seeds positive routes). On main the same work
happens in a separate `SubqueryIndex.seed_membership/5` call after
`add_shape`. End-to-end "set up a routable shape" is O(N) either way.
The headline shape-lifecycle result is `remove_shape, others remain`:
flat at ~3 µs on this branch vs O(values) on main, ~950× faster at
N = 10 000.
Branch-only operations
These are new in this branch, with no main equivalent. Numbers verify the
RFC's expected scaling claims (Benchee medians from
`bench/subquery_index_bench.exs`). Operations measured, with the scaling axis in parentheses:
- `MultiTimeView.member?/4` (history length)
- `MultiTimeView.mark_in/4` (history length)
- `add_positive_route/3` (positive children)
- `remove_positive_route/3` (positive children)
- `remove_subquery/2` (values_in_subquery)
- `notify_processed_up_to/4` (single consumer)
- `mark_ready/2` (single subquery)
All operations scale as the RFC predicts.