
feat(allocation-policy): Per-org overrides for bytes-scanned policy #7975

Open
phacops wants to merge 4 commits into master from feat/per-org-bytes-scanned-overrides

@phacops phacops commented May 28, 2026

Add per-organization controls to BytesScannedRejectingPolicy so individual noisy orgs can be tuned without affecting every other organization.

Per-org scan-limit overrides

Two new configs override the sliding-window scan limit on the organization branch:

  • organization_referrer_scan_limit_override, keyed by (organization_id, referrer)
  • organization_scan_limit_override, keyed by organization_id

Overrides are resolved in order of specificity, with the first one set winning:

(organization_id, referrer)
  > organization_id
  > (all orgs, referrer)      # existing referrer_all_organizations_scan_limit_override
  > default                    # existing organization_referrer_scan_limit
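
The precedence above can be sketched as a first-match lookup. This is a minimal illustration, not the policy's actual code: the function name, the argument names, and the `UNSET` sentinel of -1 are assumptions made here, since the PR text does not show how an unset override is represented.

```python
# Hypothetical sentinel meaning "no override configured" (an assumption).
UNSET = -1


def resolve_org_scan_limit(
    org_referrer_override: int,
    org_override: int,
    all_orgs_referrer_override: int,
    default_limit: int,
) -> int:
    """Return the first configured value, most specific first."""
    candidates = (
        org_referrer_override,       # (organization_id, referrer)
        org_override,                # organization_id
        all_orgs_referrer_override,  # (all orgs, referrer)
    )
    for candidate in candidates:
        if candidate != UNSET:
            return candidate
    # Nothing more specific is set, so fall back to the default limit.
    return default_limit
```

With only the org-level and all-orgs overrides set, the org-level value is returned because it is the more specific of the two.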

Per-org max_bytes_to_read cap

Two more configs forward a hard max_bytes_to_read value to ClickHouse and bypass the sliding-window check entirely:

  • organization_referrer_max_bytes_to_read, keyed by (organization_id, referrer)
  • organization_max_bytes_to_read, keyed by organization_id

When set, the query is allowed to run at full threads with the cap applied; ClickHouse aborts it if it would scan more than the cap. (org_id, referrer) wins over org_id. This complements the existing global limit_bytes_instead_of_rejecting flow, which only caps queries after a tenant exceeds its scan limit.
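
A rough sketch of how the two cap configs might be resolved, under the same caveat: the function name and the `NO_CAP` sentinel are hypothetical, and returning `None` to mean "fall through to the sliding-window check" is an assumption about the control flow rather than something stated in the PR.

```python
from typing import Optional

# Hypothetical sentinel meaning "no cap configured" (an assumption).
NO_CAP = -1


def resolve_org_max_bytes_to_read(org_referrer_cap: int, org_cap: int) -> Optional[int]:
    """Pick the hard max_bytes_to_read cap for an org-keyed query, if any."""
    if org_referrer_cap != NO_CAP:
        # (org_id, referrer) is more specific and wins over org_id.
        return org_referrer_cap
    if org_cap != NO_CAP:
        return org_cap
    # No cap set: the caller would consult the sliding window instead.
    return None
```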

The project branch and cross-org behavior are unchanged.

Scope: which queries these levers affect

All four configs above only fire on the policy's organization branch — i.e. when tenant_ids carries organization_id and no project_id. That's typically cross-project work (org-wide Discover, "All Projects" views, subscription-style aggregations across an org).

Single-project queries carry both organization_id and project_id and resolve to the project branch, where none of these overrides fire. The only existing knobs on the project branch remain project_referrer_scan_limit (global default) and referrer_all_projects_scan_limit_override (global per referrer). A per-org override that fires on the project branch is not added here — if a noisy org's per-project traffic is the problem, that lever still needs to be designed.
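
To make the scoping concrete, here is a simplified stand-in for the branch selection. The real logic lives in `_get_customer_tenant_key_and_value()`, whose signature and edge cases are not shown in this PR; `pick_tenant_branch` and `org_overrides_apply` are hypothetical names used only for illustration.

```python
from typing import Mapping, Optional, Tuple, Union

TenantValue = Union[str, int]


def pick_tenant_branch(
    tenant_ids: Mapping[str, TenantValue],
) -> Tuple[Optional[str], Optional[TenantValue]]:
    """Simplified stand-in: project_id takes precedence over organization_id."""
    if "project_id" in tenant_ids:
        return "project_id", tenant_ids["project_id"]
    if "organization_id" in tenant_ids:
        return "organization_id", tenant_ids["organization_id"]
    return None, None


def org_overrides_apply(tenant_ids: Mapping[str, TenantValue]) -> bool:
    """The four new configs only matter on the organization branch."""
    key, _ = pick_tenant_branch(tenant_ids)
    return key == "organization_id"
```

A query carrying both IDs resolves to the project branch, so `org_overrides_apply({"organization_id": 1, "project_id": 2, "referrer": "r"})` is false in this sketch.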

Concrete usage — raise the cross-project quota for one big org past the current global:

  • set organization_scan_limit_override with params {"organization_id": <id>} to the higher byte value
  • (org_id, referrer) can still narrow that later via organization_referrer_scan_limit_override if a specific referrer needs a different number

organization_max_bytes_to_read is a cap, not a limit raise — it pins a hard ClickHouse max_bytes_to_read on every query from that org and bypasses the sliding window. Use it to contain blast radius, not to grant more headroom.

Bug fix on top of the original approval

The Seer review on the first round flagged that the org-cap check ran before the policy resolved whether a query was project-keyed or org-keyed. Because Sentry usually sends both organization_id and project_id, a project-keyed query would have silently picked up the org cap and bypassed the project-level sliding-window rate limit. Fixed by moving the cap check below _get_customer_tenant_key_and_value() and gating it on customer_tenant_key == "organization_id". Added a regression test (test_org_caps_do_not_apply_to_project_queries).
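
The difference between the two orderings can be shown with a toy model. Everything here is hypothetical (`tenant_key`, `cap_before_fix`, `cap_after_fix`, and the plain dict of per-org caps); it only mirrors the shape of the bug described above, not the policy's real code.

```python
from typing import Mapping, Optional, Union

TenantValue = Union[str, int]


def tenant_key(tenant_ids: Mapping[str, TenantValue]) -> Optional[str]:
    """Simplified stand-in for the policy's branch resolution."""
    if "project_id" in tenant_ids:
        return "project_id"
    if "organization_id" in tenant_ids:
        return "organization_id"
    return None


def cap_before_fix(
    tenant_ids: Mapping[str, TenantValue], org_caps: Mapping[int, int]
) -> Optional[int]:
    """Buggy ordering: looks up the org cap without resolving the branch."""
    org_id = tenant_ids.get("organization_id")
    return org_caps.get(org_id) if isinstance(org_id, int) else None


def cap_after_fix(
    tenant_ids: Mapping[str, TenantValue], org_caps: Mapping[int, int]
) -> Optional[int]:
    """Fixed ordering: the cap applies only when the query is org-keyed."""
    if tenant_key(tenant_ids) != "organization_id":
        return None
    org_id = tenant_ids["organization_id"]
    return org_caps.get(org_id) if isinstance(org_id, int) else None
```

For a query shaped like `{"organization_id": 1, "project_id": 2}`, the first function returns the org cap while the second returns `None`, which is the behavior the regression test is described as covering.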

Drive-by fixes

Touching the file caused the pre-commit hook to flag pre-existing lint/mypy issues in the same files (E712 truthiness comparisons, untyped tenant_ids dicts, ResourceIdentifier arg type, and a # type: ignore that drifted off the offending line after ruff reformatted it). These are fixed in the same PR since the hook blocks otherwise.

…ngPolicy

Add two new Configuration entries on BytesScannedRejectingPolicy:

- organization_referrer_scan_limit_override, keyed by
  (organization_id, referrer)
- organization_scan_limit_override, keyed by organization_id

Previously the only way to override the scan limit for the
organization branch was the per-referrer override that applied to
every organization, which made it impossible to tune the limit for a
single noisy org without affecting everyone else.

Overrides on the organization branch are now resolved in order of
specificity, with the first one set winning:

  (organization_id, referrer)
    > organization_id
    > (all orgs, referrer)
    > default

The project branch and cross-org behavior are unchanged.

Also fix pre-existing lint/mypy issues in the same files that the
pre-commit hook surfaces once the files are touched (E712 truthiness
asserts, untyped tenant_ids dicts, ResourceIdentifier arg type, and a
misplaced type: ignore exposed by ruff reformatting).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Agent transcript: https://claudescope.sentry.dev/share/xXAC6_vLoFmdoaI4mlEdxiC7CUhp2mohJV2MuJoaNPU
@phacops phacops requested a review from a team as a code owner May 28, 2026 00:17
Add two new Configuration entries on BytesScannedRejectingPolicy that
forward a hard max_bytes_to_read value to ClickHouse and bypass the
sliding-window scan limit for the configured organization:

- organization_referrer_max_bytes_to_read, keyed by
  (organization_id, referrer)
- organization_max_bytes_to_read, keyed by organization_id

When either is set the query runs at full threads with the configured
cap and the sliding window is not consulted; ClickHouse aborts the
query if it would scan more than the cap. (org_id, referrer) is more
specific and wins over org_id.

This complements the existing global limit_bytes_instead_of_rejecting
flow, which only caps queries after a tenant exceeds its scan limit.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Agent transcript: https://claudescope.sentry.dev/share/K7ElxY0inzZzY1icw8PnMWaO4jjRKEoV3AthvEUGg7E
@phacops phacops changed the title from "feat(allocation-policy): Add per-org overrides to BytesScannedRejectingPolicy" to "feat(allocation-policy): Per-org overrides for bytes-scanned policy" May 28, 2026
Comment thread snuba/query/allocation_policies/bytes_scanned_rejecting_policy.py Outdated
…ueries

Sentry queries usually carry both organization_id and project_id, and
the policy resolves those to the project_id branch. The new org cap was
checked before that resolution, so any project query with an
organization_id in tenant_ids would silently pick up the org cap and
bypass the project-level sliding-window limit.

Move the cap check after _get_customer_tenant_key_and_value() and gate
it on customer_tenant_key == "organization_id". Adds a regression test
covering the (org_id + project_id) shape.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Agent transcript: https://claudescope.sentry.dev/share/ZggQ4p_AlWKzjkr6xJZ5kCjchDzv_CTIpf-S8WXChVs
Comment thread snuba/query/allocation_policies/bytes_scanned_rejecting_policy.py
`_get_quota_allowance` bypasses the sliding-window scan limit for
org-keyed queries that run under a per-org `max_bytes_to_read` cap, but
`_update_quota_balance` still recorded those queries' bytes_scanned into
the same window. If the cap is later removed, the window has phantom
usage from the capped period and queries get rejected against quota
they never consumed.

Mirror the bypass in `_update_quota_balance` and add a regression test.

Co-Authored-By: Claude <noreply@anthropic.com>

Agent transcript: https://claudescope.sentry.dev/share/tvrt5qSDhk3FhvibKVFT_XWjBOhbIdYwDfgHykOwGVw
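
The phantom-usage problem fixed in the last commit can be illustrated with a toy in-memory counter. This is only a sketch under stated assumptions: `WindowSketch` is a hypothetical class, it has no time-based expiry like a real sliding window, and its `allow` and `record` methods merely stand in for `_get_quota_allowance` and `_update_quota_balance`.

```python
from typing import Dict, Optional


class WindowSketch:
    """Toy per-org byte counter; not a real sliding window (no expiry)."""

    def __init__(self, limit: int) -> None:
        self.limit = limit
        self.used: Dict[int, int] = {}

    def allow(self, org_id: int, cap: Optional[int]) -> bool:
        # Capped queries bypass the window check entirely.
        if cap is not None:
            return True
        return self.used.get(org_id, 0) < self.limit

    def record(self, org_id: int, bytes_scanned: int, cap: Optional[int]) -> None:
        # Mirror the bypass: capped queries must not consume window quota,
        # otherwise removing the cap later leaves phantom usage behind.
        if cap is not None:
            return
        self.used[org_id] = self.used.get(org_id, 0) + bytes_scanned
```

If `record` skipped the `cap is not None` check, a capped query scanning 1000 bytes against a limit of 100 would cause the next uncapped query to be rejected once the cap was removed.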