Sentinel scan iterator by harshrai654 · Pull Request #3141 · redis/node-redis

harshrai654 · 2025-11-26T13:58:27Z

Description

This PR implements the scanIterator method for the RedisSentinel client.
Previously, scanIterator was not exposed on the Sentinel client, limiting the ability to iterate over keys on the master node directly through the Sentinel interface. This implementation adds that capability with handling for failovers.

Issue: #2999

Changes:

Automatic Failover Handling: If a master failover occurs during iteration, the iterator detects the MASTER_CHANGE topology event and automatically restarts the scan from the beginning on the new master. This ensures all keys are eventually covered, even if the topology changes mid-scan.
Reset-on-Failover Strategy: Since Redis cursors are stateful and specific to a node instance, the iterator deliberately restarts from cursor 0 (the beginning) upon detecting a master failover. This guarantees data completeness but may result in duplicate keys being yielded, as documented.

Checklist

Does npm test pass with this change (including linting)?
Is the new or changed code fully tested?
Is a documentation update included (if this change modifies existing APIs, or introduces new ones)?

Note

Medium Risk
New public API on the Sentinel master path; failover restart can yield duplicate or briefly inconsistent keys during iteration, though behavior is documented and tested.

Overview
Adds scanIterator on the Redis Sentinel client so callers can async-iterate master keys with the same ScanOptions as standalone clients.

On MASTER_CHANGE, the iterator resets the cursor to 0 and restarts (and discards a batch if failover happened mid-SCAN). That can duplicate keys and briefly skew the keyset vs replication; docs/sentinel.md documents dedup and pool behavior.

Each SCAN acquires/releases a master lease only around the command so loop bodies can run other commands (e.g. mGet) without deadlocking when masterPoolSize is 1. Integration tests cover normal scan, MATCH, in-loop commands, and failover during iteration.

^{Reviewed by Cursor Bugbot for commit 27f3d53. Bugbot is set up for automated code reviews on this repo. Configure here.}

jit-ci · 2025-11-26T13:58:35Z

Hi, I’m Jit, a friendly security platform designed to help developers build secure applications from day zero with an MVS (Minimal viable security) mindset.

In case there are security findings, they will be communicated to you as a comment inside the PR.

Hope you’ll enjoy using Jit.

Questions? Comments? Want to learn more? Get in touch with us.

jit-ci

❌ The following Jit checks failed to run:

secret-detection-trufflehog

#jit_bypass_commit in this PR to bypass, Jit Admin privileges required.

More info in the Jit platform.

nkaradzhov · 2025-11-28T09:49:01Z

Hi @harshrai654, thanks for the contribution! On first glance, this looks good, gut it looks like there are some failing tests. Can you please resolve those, so we can have more thorough look on the final state of the code

jit-ci

❌ The following Jit checks failed to run:

secret-detection-trufflehog
static-code-analysis-semgrep-pro

#jit_bypass_commit in this PR to bypass, Jit Admin privileges required.

More info in the Jit platform.

harshrai654 · 2025-11-29T10:18:00Z

Hey @nkaradzhov I have fixed the tests. A stopped node in previous test of the describe block was causing issues in the next test which requires healthy nodes in the sentinel. Added cleaning up of nodes of frame between these two tests.

elimelt · 2025-12-02T01:16:22Z

+}
+```
+
+If a failover occurs during the scan, the iterator will automatically restart from the beginning on the new master to ensure all keys are covered. This may result in duplicate keys being yielded. If your application requires processing each key exactly once, you should implement a deduplication mechanism (like a `Set` or Bloom filter).


nit: Bloom filter is not really a sound way to deduplicate on its own, so I'm not sure about the phrasing "Set or Bloom filter".

I'm not an expert on Sentinel or Redis replication, but is there any chance the two values differ between old and new master?

I agree, A Bloom filter isn’t a perfect deduplication mechanism.
The reason I mentioned it is because it provides a very low-memory way to reduce duplicates for large keyspaces, even though it can still produce false positives and skip some unseen keys.
I’ll update the documentation to clarify this tradeoff: using a Set gives strict, 100% deduplication, while a Bloom filter is only suitable when occasional false positives are acceptable in exchange for significantly lower memory usage.

Yes, the results may differ between the old and new master.
Redis replication is asynchronous, so a replica promoted to master may not contain exactly the same
keyspace as the previous master at the moment of failover. Some writes may not yet have replicated,
and new writes may occur on the promoted master after the failover.
Because the iterator restarts from cursor 0 on the new master, any additions or deletions that
happened around the failover will be reflected in the new scan.
Reference: https://redis.io/docs/latest/commands/scan/#scan-guarantees

Yeah I only called it out because of the implication you made, i.e. "application requires processing each key exactly once" -> "use set or bloom filter"

Difference in semantics between this and a regular scan is probably worth noting in the docs as well

nkaradzhov · 2025-12-02T12:41:13Z

@harshrai654 thank you so much! I will need to take a thorough look into this before passing it, and I am extremely busy on a big change that needs to happen very soon. So, apologies, but I will have to delay my full review here.

harshrai654 · 2025-12-02T18:16:56Z

@harshrai654 thank you so much! I will need to take a thorough look into this before passing it, and I am extremely busy on a big change that needs to happen very soon. So, apologies, but I will have to delay my full review here.

No worries, Thanks for the update and for letting me know!

TheIndra55 · 2025-12-08T19:25:37Z

This seems to cause a deadlock when the consumer tries to do another operation during the scan and there's not enough master clients available.

for await (const keys of sentinel.scanIterator()) {
    console.log(keys)

    // Hangs forever
    const values = await sentinel.mGet(keys)
    console.log(values)
}

harshrai654 · 2025-12-09T09:45:52Z

Hey @TheIndra55, thanks for catching this! You are absolutely correct.

Acquiring the master connection for the entire lifespan of the iterator causes a resource starvation deadlock. Since acquire reserves a client from the pool and doesn't release it until the iterator finishes, any operation inside the consumer's loop (like your mGet example) that requires a connection will block indefinitely if the pool is exhausted (e.g., a single-connection pool).

Even with a larger pool, this implementation unnecessarily holds a connection idle while the user's code executes, reducing overall throughput.

The fix can be to do an "Acquire->Scan->Release" loop: we should acquire the connection only for the scan operation itself and release it immediately before yielding to the user. This ensures the pool is free for other operations during the consumer's processing phase.

Here is the proposed implementation:

async *scanIterator(
  this: RedisSentinelType<M, F, S, RESP, TYPE_MAPPING>,
  options?: ScanOptions & ScanIteratorOptions
) {
  let cursor = options?.cursor ?? "0";
  let shouldRestart = false;

  // This listener persists for the entire iterator life to track failovers
  const handleTopologyChange = (event: RedisSentinelEvent) => {
    if (event.type === "MASTER_CHANGE") {
      shouldRestart = true;
    }
  };
  this.on("topology-change", handleTopologyChange);

  try {
    do {
      // 1. Acquire lease JUST for the scan command
      const masterClient = await this.acquire();
      
      let reply;
      try {
        // If master changed since last loop, reset cursor to 0
        if (shouldRestart) {
          cursor = "0";
          shouldRestart = false;
        }
        
        reply = await masterClient.scan(cursor, options);
        
      } finally {
        // 2. Release IMMEDIATELY so the pool is available for the user
        masterClient.release();
      }

      // Handle edge case: Topology changed *during* the scan
      if (shouldRestart) {
        // Data might be from the old master, so we discard this batch 
        // and restart from cursor 0 on the new master.
        cursor = "0"; 
        continue;
      }

      // 3. Yield to user
      // The connection is released, so `await sentinel.mGet()` will succeed
      cursor = reply.cursor;
      yield reply.keys;

    } while (cursor !== "0");

  } finally {
    this.removeListener("topology-change", handleTopologyChange);
  }
}

This ensures we respect the pool limits and handle topology changes correctly by resetting the cursor if a failover occurs between (or during) iterations.

What do you think?

@TheIndra55

…ration The original implementation acquired the master client lease once and held it for the whole iteration. With the default `masterPoolSize` of 1, any command issued from inside the `for await` loop body (e.g. `sentinel.mGet(keys)`) would wait for a free slot that the iterator never released, hanging the caller indefinitely (reported by @TheIndra55). Move the acquire/release into the per-page loop so the pool is free between pages and consumer commands can run. The persistent `topology-change` listener still resets the cursor on `MASTER_CHANGE`, and if a failover occurs during a scan call the partial reply is discarded. Docs: - Clarify that the iterator inherits SCAN's standard guarantees plus an extra source of duplicates on failover. - Replace the "Set or Bloom filter" wording with a `Set` recommendation and a separate note that Bloom filters trade correctness for memory. - Document the per-page lease behaviour so users know the iterator is safe to combine with other commands inside the loop body. Tests: add a regression test that issues `mGet` from inside the loop on the default `masterPoolSize: 1` setup and asserts no deadlock. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

nkaradzhov · 2026-06-02T14:08:28Z

Hi @harshrai654 — first off, apologies for sitting on this for so long.
I pushed your Dec 9 proposal (acquire/release per scan page) onto the branch and rebased over master:

Reworked the docs (docs/sentinel.md) to
- (a) make the Set vs Bloom filter trade-off explicit per @elimelt's nit,
- (b) call out the divergence from standalone SCAN guarantees, and
- (c) document the per-page lease behavior so users know commands inside the loop are safe.
Added a regression test (should not deadlock when consumer runs commands inside the loop with masterPoolSize 1)

cursor

Cursor Bugbot has reviewed your changes and found 1 potential issue.

^{Reviewed by Cursor Bugbot for commit 27f3d53. Configure here.}

cursor · 2026-06-02T14:11:55Z

+
+        cursor = reply.cursor;
+        yield reply.keys;
+      } while (cursor !== "0");


Failover during cursor-zero scan silently terminates iterator

High Severity

When a MASTER_CHANGE event fires during a scan call where cursor is "0" (e.g., the very first iteration, or right after a failover restart), the continue at line 699 jumps to the do…while condition. Because line 702 (cursor = reply.cursor) was skipped, cursor is still "0", so cursor !== "0" evaluates to false and the loop exits immediately — yielding zero keys and silently terminating the iterator instead of restarting the scan on the new master.

^{Reviewed by Cursor Bugbot for commit 27f3d53. Configure here.}

jit-ci Bot reviewed Nov 26, 2025

View reviewed changes

harshrai654 force-pushed the sentinel-scan-iterator branch from 65129d2 to 1f2471f Compare November 28, 2025 17:45

jit-ci Bot reviewed Nov 28, 2025

View reviewed changes

elimelt reviewed Dec 2, 2025

View reviewed changes

nkaradzhov mentioned this pull request Jun 2, 2026

feat(sentinel): add scan iterator #3278

Closed

harshrai654 and others added 3 commits June 2, 2026 17:04

Sentinel scan iterator

2d2fc5f

fix: cleaning up frame in between tests

074feec

nkaradzhov force-pushed the sentinel-scan-iterator branch from 2046035 to 27f3d53 Compare June 2, 2026 14:05

cursor Bot reviewed Jun 2, 2026

View reviewed changes

Conversation

harshrai654 commented Nov 26, 2025 • edited by cursor Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

Checklist

Uh oh!

jit-ci Bot commented Nov 26, 2025

Uh oh!

jit-ci Bot left a comment

Choose a reason for hiding this comment

Uh oh!

nkaradzhov commented Nov 28, 2025

Uh oh!

jit-ci Bot left a comment

Choose a reason for hiding this comment

Uh oh!

harshrai654 commented Nov 29, 2025

Uh oh!

elimelt Dec 2, 2025

Choose a reason for hiding this comment

Uh oh!

harshrai654 Dec 2, 2025

Choose a reason for hiding this comment

Uh oh!

elimelt Dec 2, 2025

Choose a reason for hiding this comment

Uh oh!

nkaradzhov commented Dec 2, 2025

Uh oh!

harshrai654 commented Dec 2, 2025

Uh oh!

TheIndra55 commented Dec 8, 2025

Uh oh!

harshrai654 commented Dec 9, 2025

Uh oh!

nkaradzhov commented Jun 2, 2026

Uh oh!

cursor Bot left a comment

Choose a reason for hiding this comment

Uh oh!

cursor Bot Jun 2, 2026

Choose a reason for hiding this comment

Failover during cursor-zero scan silently terminates iterator

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

harshrai654 commented Nov 26, 2025 •

edited by cursor Bot

Loading