Skip to content

perf(redis): use sorted set index for O(log N) schedule claiming#14

Open
isimisi wants to merge 4 commits into
boringnode:mainfrom
Juristic:feat/zset-due-index
Open

perf(redis): use sorted set index for O(log N) schedule claiming#14
isimisi wants to merge 4 commits into
boringnode:mainfrom
Juristic:feat/zset-due-index

Conversation

@isimisi
Copy link
Copy Markdown

@isimisi isimisi commented May 15, 2026

explanation: #13 (comment)

@RomainLanz
Copy link
Copy Markdown
Member

Hi!

Thanks for the PR.

We need a migration path for existing Redis users before merging this.

With this change, claimDueSchedule() only reads from the new schedules::due ZSET, but users upgrading from the current version will only have the existing schedule hashes plus schedules::index. Those schedules will never be claimed until something manually rewrites them or calls backfillDueIndex().

Can we make the migration automatic/idempotent, or expose and document a required migration step? For example, on startup or first claim, backfill schedules::due from schedules::index for active schedules with next_run_at.

@isimisi
Copy link
Copy Markdown
Author

isimisi commented May 26, 2026

Hey @RomainLanz, great point. I completely missed the migration path for existing users.

I'll add an automatic backfill that runs on startup: on first claimDueSchedule() call (or worker boot), scan schedules::index and populate schedules::due from any schedule that has a next_run_at. I'll make it idempotent so it's safe to run multiple times across worker restarts or multi-instance deployments.

Existing users upgrading will have schedules in the legacy format
(hashes + SET) but not in the new ZSET. Run backfillDueIndex() once
per worker process on the first claimDueSchedule() call so schedules
keep firing without manual intervention.
@isimisi
Copy link
Copy Markdown
Author

isimisi commented Jun 2, 2026

Hey @RomainLanz, I pushed a follow-up commit that handles the migration automatically.

On the first claimDueSchedule() call per worker process, it runs backfillDueIndex() to populate the ZSET from existing schedules. An instance flag skips it on subsequent calls, so the cost is a one-time.

The approach is idempotent (ZADD with the same score is a no-op), safe under concurrent workers, and self-healing if the ZSET is ever cleared. backfillDueIndex() is public for anyone who wants manual control.

Does this work for u? Otherwise the we remove the automatic backfill and let users handle it themselves one-time, then we don't need to have a check on claimDueSchedule().

@RomainLanz
Copy link
Copy Markdown
Member

Oops, I have pushed some changes.

Can you resolve the conflicts? 😅

@isimisi
Copy link
Copy Markdown
Author

isimisi commented Jun 3, 2026

Should be good now! We have some formatting differences in our editors it seem. Don't know if you care about that haha

@RomainLanz
Copy link
Copy Markdown
Member

RomainLanz commented Jun 3, 2026

Thanks for resolving the conflicts and adding the automatic backfill.

I'm still not fully comfortable merging this as-is.

The main concern is that schedules::due becomes a second source of truth for next_run_at. claimDueSchedule() now trusts the ZSET score and does not re-check the schedule hash next_run_at before claiming. If the ZSET ever contains a stale due score while the hash says the schedule is not due anymore, we could claim it too early.

Since the hash is still the canonical schedule data everywhere else, can we keep the ZSET as a derived index only? For example, the Lua script could validate next_run_at <= now from the hash before claiming, and repair/remove stale ZSET entries when it finds them.

I'm also a bit concerned about the automatic backfill running on the first claimDueSchedule() call per adapter instance. That hides an O(N) scan in the scheduler hot path, and multiple workers can all discover/run it at startup. I think I’d prefer either an explicit migration/bootstrap step, or a Redis-level idempotent backfill marker/lock.

The direction is good, but before merging I'd like us to tighten the invariant: the schedule hash stays canonical, schedules::due stays derived, and claiming repairs stale index entries instead of trusting them.

PS: You can run yarn format:fix to fix the formating

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants