Skip to content

Improved sliding windows#1116

Open
schrf wants to merge 17 commits into
scverse:mainfrom
schrf:improved_sliding_windows
Open

Improved sliding windows#1116
schrf wants to merge 17 commits into
scverse:mainfrom
schrf:improved_sliding_windows

Conversation

@schrf

@schrf schrf commented Feb 10, 2026

Copy link
Copy Markdown

Description

replaced the drop_partial_windows argument with partial_windows, adding two new options:

  1. "adaptive": adjust window sizes such that all windows have almost the exact same size
  2. "split": recursively split a slide into windows such that they do not exceed a maximum number of cells defined by max_nr_cells.

The old arguments are None and drop now.

How has this been tested?

Tested using imc pancreas and visium datasets, running sliding_windows with different argument combinations and visually inspecting the windows by plotting them.

Closes

closes #908

@codecov

codecov Bot commented Feb 10, 2026

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 82.85714% with 12 lines in your changes missing coverage. Please review.
✅ Project coverage is 76.63%. Comparing base (55572c4) to head (2f32109).

Files with missing lines Patch % Lines
src/squidpy/tl/_sliding_window.py 82.85% 6 Missing and 6 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1116      +/-   ##
==========================================
+ Coverage   76.55%   76.63%   +0.08%     
==========================================
  Files          63       63              
  Lines        9067     9122      +55     
  Branches     1521     1534      +13     
==========================================
+ Hits         6941     6991      +50     
- Misses       1541     1543       +2     
- Partials      585      588       +3     
Files with missing lines Coverage Δ
src/squidpy/tl/_sliding_window.py 82.97% <82.85%> (+5.07%) ⬆️
🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@timtreis

Copy link
Copy Markdown
Member

Hey @schrf, thank you for your contribution! Quite a lot of the changes are inconsequential (removing comments, renaming things, etc) which I'd guess are introduced by some LLM. While that's totally acceptable, I'd be great if you could clean up this PR to only touch things that are actually relevant to the new functionality you're implementing.

Thank you!

@selmanozleyen

Copy link
Copy Markdown
Member

yes also if you could make the notebook changes in a separate PR for the notebook repo I would appreciate it

@schrf

schrf commented Feb 23, 2026

Copy link
Copy Markdown
Author

Hey, thanks for the feedback. I'll simplify the changes as soon as I have time. @selmanozleyen I made a pull request for the notebook, I just didn't mention it here. It is #156

@schrf

schrf commented Mar 8, 2026

Copy link
Copy Markdown
Author

@timtreis I cleaned up the code. Let me know what you think

@selmanozleyen selmanozleyen self-requested a review June 9, 2026 14:23
If True, drop windows that are smaller than the window size at the borders.
partial_windows: Literal["adaptive", "drop", "split"] | None
If None, possibly small windows at the edges are kept.
If `adaptive`, all windows might be shrunken a bit to avoid small windows at the edges.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

About adaptive mode. This mode still yields incomplete windows right? See this example:

Example (window_size=40, extent 0–100): number_x_windows = ceil(100/40) = 3, x_window_size = ceil(100/3) = 34 → starts [0, 34, 68], ends [34, 68, 100]. The last window is 32 wide, the others 34.

If all window sizes aren't guaranteed to be the same this should be noted here. We also need better explanation how the new window size is going to be calculated. Also we need to point that the main goal here is to make all window sizes closer to each other. This can be useful in your workflow but I'd like to see how this can be useful for researchers in general. Could you explain the use case with a concrete example to me?

indices = ((x_start, x_middle, y_start, y_end), (x_middle, x_end, y_start, y_end))
else:
# horizontal split
y_middle = sub_coord_y_sorted.loc[middle_pos, y_col]

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please be consistent with these and choose one of them

x_middle = sub_coord_x_sorted[x_col].iloc[middle_pos]
y_middle = sub_coord_y_sorted.loc[middle_pos, y_col]

@@ -45,8 +47,14 @@ def sliding_window(
overlap: int

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

overlap and window_size should be optional if they are not required by partial_windows=split . It doesn't make sense to expect window_size from the user if aren't going to even use it right?

so it should be

window_size: int | None = None,

also the same for overlap.

"""
x_col, y_col = coord_columns

mask = (

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The inclusive boundaries here can cause an infinite recursion. I made claude create a counterexample

The counterexample
Trigger: any window where more than max_nr_cells cells share the same coordinate on the split axis — i.e. a dense cluster of tied/duplicate coordinates. With quantized or integer-binned spatial data this is realistic, not just pathological.

Minimal case: 10 cells all at (0,0), max_nr_cells=5 → RecursionError.
Realistic case: a normal spread of points plus 8 cells piled at integer (7,7), max_nr_cells=5 → RecursionError.
Why it never terminates
Walk the recursion on a window that contains the tied pile (say all cells share x = m, width > height so it's a vertical split):

middle_pos = len // 2, x_middle = ...iloc[middle_pos] → x_middle = m (line 423).
Children are (x_start, m, …) and (m, x_end, …) (line 425).
The mask is inclusive on both ends (>= and <=, _get_window_mask:243-248), so every cell at x = m falls into the first child (x_start, m).
That child has the same cell count as the parent → step 1 again, with no progress. → infinite recursion until the stack limit.
The base case at line 407 (n_cells <= max_cells) can never be reached because the split fails to reduce the count. The same happens on the horizontal branch (line 428).

Root cause
The median value is used as a shared, inclusive boundary, so cells equal to the median can't be partitioned. A robust fix splits by rank/index rather than value (and makes the boundary half-open), or bails out when a window can't be reduced. Rough options:

Half-open boundaries: make sub-windows [start, mid) and [mid, end) so each cell lands in exactly one side. (Needs care so the overall coverage still includes max_x/max_y.)
Terminate on no-progress: if a child ends up with the same n_cells as its parent (all cells tied on both axes), stop and emit it as a single window even though it exceeds max_cells — and ideally log() a warning that the cap couldn't be honored.
Guard up front: detect that the split made no progress (child count == parent count) and break.

@selmanozleyen

Copy link
Copy Markdown
Member

One thing I noticed in the notebooks is the Nans here:
grafik

Is this expected? Previous notebook didn't have these nans. Also @FrancescaDr , do you ever use this function at large scale? I feel like this dense representation might not be very scalable.

Also another point I wan't to make is for partial_windows=split, if this function is expected to run at larger scales, I am not sure how well it would perform. But I am also not sure for this function in general. Therefore I'd like to know if the current arrangement already works well or not. Otherwise we can consider doing benchmarking and stuff for this

partial_windows: Literal["adaptive", "drop", "split"] | None
If None, possibly small windows at the edges are kept.
If `adaptive`, all windows might be shrunken a bit to avoid small windows at the edges.
If `drop`, possibly small windows at the edges are removed.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

drop argument shouldn't be so easily removed. We should first deprecate it and warn users if it's being used. Then in another release we can remove it. Because this is a breaking change.

@selmanozleyen selmanozleyen left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@schrf could you apply the changes I mentioned? Also please don't include the notebook changes yet in this PR. Also the "drop, adaptive, split" names aren't very clear.

@FrancescaDr @timtreis @mossishahi , this function doesn't seem scalable but I am curious of your experience. If you had problems, there is a chance that you will have worse problems when trying to split by the number of cells evenly.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add parameter to sliding_window to end up with windows of (approximately) same number of cells

3 participants