Serve direct thumbnail URLs to reduce load on main app server#1331
Serve direct thumbnail URLs to reduce load on main app server#1331mihow wants to merge 64 commits into
Conversation
…edirects, and add index for (source_image, label).
Conflicts: - ami/main/tests.py: combined import additions (create_captures_from_files + create_detections, both used). - ami/main/migrations/0088_*: renamed 0088_sourceimagethumbnail_and_more.py to 0089_*, repointed dependency to the new 0088_detection_det_srcimg_created_idx from main. Linear chain, no merge migration needed.
JPEG natively supports only L, RGB, and CMYK modes. Source images in RGBA/P/LA/PA modes (common for uploaded PNGs) raised "OSError: cannot write mode <X> as JPEG" on `img.save(format="JPEG")`, 500-ing the thumbnail endpoint. Convert to RGB first when needed. Adds a regression test that feeds an RGBA PNG through find_or_generate and asserts the redirect succeeds and the thumbnail row lands. Closes review thread on ami/main/models.py L2241. Co-Authored-By: Claude <noreply@anthropic.com>
Two changes, one root cause.
`SourceImage.last_modified` tracks the source bytes' mtime (S3 LastModified
on the sync path) and is the freshness signal the thumbnail regen check uses
to decide whether to invalidate a cached thumbnail. It is NOT the same as
`updated_at` (BaseModel auto_now), which bumps on every row save (cached
counts, detection counts, etc.) and would force regen far too often.
The upload path (`create_source_image_from_upload`) never set this field,
so:
1. Uploaded captures had `last_modified=None` forever, breaking parity
with sync-created rows.
2. The thumbnail regen check `thumb.last_modified < self.last_modified`
raised `TypeError` on the 2nd+ request for any uploaded capture's
thumbnail (datetime < None).
Fixes:
* `create_source_image_from_upload`: set `last_modified=timezone.now()` so
uploaded captures have a real source mtime, matching the sync path's
behaviour of copying the S3 LastModified header.
* `find_or_generate_thumbnail_for_label`: guard the comparison. None on
either side now means "no signal of source change, trust the cached
thumb" instead of crashing. Covers legacy rows synced before the upload
backfill.
Tests:
* `test_thumbnail_source_last_modified_none_does_not_crash` reproduces the
TypeError class of bug by nulling `last_modified` after a thumbnail row
exists and asserts the 2nd request reuses the cache.
* `test_upload_handler_backfills_last_modified` covers the upload-path
backfill (asserts the new field falls inside [before, after] of the
upload call).
Closes review thread on ami/main/models.py L2227.
Co-Authored-By: Claude <noreply@anthropic.com>
Two F541 (Ruff: f-string without placeholders) hits, plus a small behavioural cleanup on the list action. * `SourceImageThumbnailViewSet.list` now raises `MethodNotAllowed` (405) instead of `NotFound` (404). The route exists for `/captures/thumbnails/<pk>/?label=...` only; listing has no defined semantics. 405 sets the Allow header and is the correct REST response. * Updated `test_thumbnail_no_list` to expect 405 and to use a plain string literal (the previous test used an f-string with no placeholders). Closes review thread on ami/main/tests.py L237 (Ruff F541). Co-Authored-By: Claude <noreply@anthropic.com>
When `settings.THUMBNAILS['SIZES']` is empty (misconfiguration), the default-label fallback `next(iter(_sizes))` raised `StopIteration` and surfaced as a 500 with no useful detail. Replace with an explicit empty check that returns 404 with a clear API error message, and use `next(iter(_sizes))` only after we know the dict is non-empty (also switch the fallback expression to `or next(...)` since the previous form passed `next(iter(...))` as the default arg unconditionally — fine for non-empty dicts but still worth tidying alongside the guard). Adds a test that overrides THUMBNAILS with an empty SIZES dict and asserts the endpoint returns 404 with the configured-error message. Co-Authored-By: Claude <noreply@anthropic.com>
`requests.get(url)` with no timeout blocks indefinitely if the upstream storage is slow or unreachable. The thumbnail-generation path (`SourceImage.find_or_generate_thumbnail_for_label`) runs this synchronously inside a Django web request, so a stuck upstream ties up a gunicorn worker; a gallery's worth of stuck calls drains the pool. Default timeout is `(connect=5s, read=30s)`. Callers that need different limits (e.g. very large originals on a slow link) can override by passing `timeout=`. Behaviour for fast / responsive upstreams is unchanged. Tests verify the default is finite and that explicit overrides are honored. Co-Authored-By: Claude <noreply@anthropic.com>
…rent-gen race Two related bugs in `find_or_generate_thumbnail_for_label`: 1. Under concurrent gen for the same (source_image, label) pair the loser could either (a) hit the new UniqueConstraint and 500 with IntegrityError when its delete-loop ran before the winner's INSERT, or (b) destroy the winner's row + storage blob when the delete-loop ran after. Either outcome is wrong; both were possible because the code did `filter(label=label).delete()` then `thumbnails.create(...)` instead of a single atomic upsert. 2. The pre-save delete-loop also deleted the storage blob BEFORE the new `default_storage.save()`, which on storage backends that suffix colliding keys (FileSystemStorage default) left the row pointing at a path that the next save could overwrite in the same way the racer would. Belt-and-braces failure where the cleanup order itself was the bug. Fix: * Replace the delete-then-create with `update_or_create` keyed on `label`. Django serializes via the UniqueConstraint and, on collision, catches IntegrityError on INSERT and re-runs as UPDATE. No 500s, no destructive deletes of concurrent rows. * Snapshot the prior `path` before the new save and clean it up after the row update only when the backend gave us a new key (`prior_path != thumbnail_path`). Backends that overwrite in place (boto3 with `AWS_S3_FILE_OVERWRITE=True`) skip the cleanup and avoid the file-deletion race entirely. * Force-bump `thumb.last_modified` on the UPDATE branch via `QuerySet.update` because `auto_now_add=True` only fires on INSERT — without this, regen via source-mtime bump would re-trigger regen on every subsequent request as the field's pre_save callback preserved the existing (now-stale) value. Tests: * `test_thumbnail_regen_reuses_row_via_upsert`: delete the storage blob out from under a cached row, re-request, assert the row's PK is preserved (proving the UPDATE path was taken, not a fresh INSERT). * `test_thumbnail_exists_newer_modified_source` updated to assert row reuse + freshness bump rather than the previous (incidental) PK change. The `SourceImageThumbnail.last_modified` field semantics are still muddled (see PR thread; rename or drop tracked separately). Co-Authored-By: Claude <noreply@anthropic.com>
✅ Deploy Preview for antenna-preview canceled.
|
…EAD, progressive JPEG, redirect caching, blank-path guard) into feat/thumbnails-serializer-direct-urls Co-Authored-By: Claude <noreply@anthropic.com> # Conflicts: # ami/main/models.py # ami/main/tests.py
…ec width The generator records the requested spec width on the row (see parent branch), so the warm-check can use strict equality instead of the 2px tolerance that compensated for PIL's aspect-ratio rounding. Legacy rows that stored encoder output read as stale, fall back to the route URL, and self-heal with a single regeneration. Co-Authored-By: Claude <noreply@anthropic.com>
The test pinned a Pillow encoder flag rather than behavior we own; it served its purpose during development. Co-Authored-By: Claude <noreply@anthropic.com>
…facts Several comment blocks added during review explained why a change was correct relative to code that no longer exists. Keep only the constraints a future reader needs (spec-width vs encoder output, auto_now_add bump, presign-lifetime bound); drop the change-justification prose, which lives in the commit history. Co-Authored-By: Claude <noreply@anthropic.com>
…thumbnails-serializer-direct-urls Co-Authored-By: Claude <noreply@anthropic.com>
Same pass as the parent branch: keep the contract statements (prefetch required, annotation required, strict width semantics) and drop the review-era justification prose. Co-Authored-By: Claude <noreply@anthropic.com>
… sweep Co-Authored-By: Claude <noreply@anthropic.com>
…to feat/thumbnails-serializer-direct-urls Co-Authored-By: Claude <noreply@anthropic.com>
| # ``thumb.width`` stores the requested spec width (see the generator), so | ||
| # strict equality works; legacy encoder-width rows read stale and self-heal. | ||
| width_match = thumb is not None and thumb.width == spec["width"] | ||
| warm = thumb is not None and thumb.path and width_match and not thumb.is_source_changed |
There was a problem hiding this comment.
So this "warm" (thumb exists and is valid) check is implemented here (depending on a new annotation to the queryset) and in SourceImage.find_or_generate_thumbnail_for_label. In both cases we have already queried/loaded a source image and thumb (or None) objects, my inclination would be to extract the non-annotated version as a helper method e.g. SourceImage.thumbnail_is_valid(self, spec, thumb).
There was a problem hiding this comment.
thanks for reviewing again @loppear! I implemented the singular helper as you suggested. I did add an extra guard that should ensure the thumbnails are prefetched and the thumbnail_is_valid check does not trigger n+1 queries. I'd like to integrate Django zen-queries into the project sometime this year to catch introductions of n+1 queries automatically, since that has been the main source of slow load times as the database grows
Single-sources the 'is this thumbnail good?' predicate so the generator gate and the warm/cold URL check (#1331) share one definition. The helper takes (spec, thumb) and compares in memory; legacy encoder-width rows read invalid and self-heal on next generation. Co-Authored-By: Claude <noreply@anthropic.com>
…force prefetch Extracts SourceImage.thumbnail_is_valid(spec, thumb) so the generator regen gate and the warm/cold URL decision in thumbnail_urls share one in-memory predicate (per Luke's review on #1331). With the predicate no longer needing SQL, with_thumbnails() collapses to a plain prefetch_related('thumbnails'). thumbnail_urls now guards on prefetched thumbnails and raises if missing, turning a forgotten prefetch into a loud error instead of a silent N+1 (TODO: drop once django-zen-queries enforces this globally). The guard surfaced one un-prefetched caller, Event.first_capture(), now wired through with_thumbnails(). Co-Authored-By: Claude <noreply@anthropic.com>
1ebb2d5 to
15677a2
Compare
a80d20d to
63d103f
Compare
…lizer-direct-urls # Conflicts: # ami/main/api/serializers.py # ami/main/models.py # ami/main/tests.py
✅ Deploy Preview for antenna-ssec canceled.
|
There was a problem hiding this comment.
Pull request overview
This PR updates the captures/events API to emit direct storage thumbnail URLs for “warm” thumbnails (cached + matching current spec + source not changed) while keeping the existing redirect endpoint URL for cold/stale thumbnails, reducing per-gallery HTTP fan-out to Django without introducing DB N+1s by enforcing a thumbnail prefetch contract.
Changes:
- Add
SourceImage.thumbnail_urls()(warm/cold URL selection) andSourceImageQuerySet.with_thumbnails()(prefetch contract) in the model layer. - Switch thumbnail serialization to delegate to
SourceImage.thumbnail_urls()and update view querysets/prefetches to include.with_thumbnails(). - Add tests covering warm-path direct URLs, stale conditions (width mismatch, legacy width, source-changed), and the prefetch requirement.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| ami/main/models.py | Adds warm/cold thumbnail URL logic and a queryset helper to prefetch thumbnails; reuses the validity predicate for generation and serialization. |
| ami/main/api/serializers.py | Routes the thumbnails field through SourceImage.thumbnail_urls() and documents the required prefetch contract. |
| ami/main/api/views.py | Applies .with_thumbnails() to captures querysets and event example-captures prefetch to satisfy the serializer/model contract. |
| ami/main/tests.py | Adds regression tests for warm direct URLs, stale fallbacks, source-changed handling, and the “requires prefetch” guard. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
|
@coderabbitai full review |
✅ Action performedFull review finished. |
The strict prefetch guard raised on every non-list serialization of a SourceImage through the shared serializer — notably the create/update write responses, which serialize a freshly-saved instance with no prefetch cache (500 on POST /captures/). Replace the raise with a no-query fallback: when thumbnails aren't prefetched, emit route URLs for every label without querying. This keeps the zero-N+1 guarantee (never a per-object SELECT) while not breaking single-object serialization; list endpoints still get warm storage URLs via with_thumbnails(), pinned by the list query-count tests. Co-Authored-By: Claude <noreply@anthropic.com>
…dth test The runtime no longer raises when thumbnails are unprefetched (it falls back to route URLs without querying), so the warm-list contract is enforced only by tests. Add a multi-row test asserting a warm captures list serves direct storage URLs while hitting the thumbnail table at most once — guards both a dropped with_thumbnails() (route URLs, perf win lost) and a per-row lazy read (N+1). Merge the two strict-width-mismatch tests (settings-changed +100 and legacy encoder rounding -1) into one subTest-parametrized case; they exercised the same branch. Fix a stale comment that claimed thumbnail_urls still raises without a prefetch. Co-Authored-By: Claude <noreply@anthropic.com>
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@ami/main/models.py`:
- Around line 1435-1468: The _regroup_lock function has a race condition in its
finally block where the separate cache.get() and cache.delete() operations on
lock_key allow another worker to acquire the same lock between those two calls
if the TTL expires. Replace the two separate operations with a single atomic
compare-and-delete operation using a Lua script or Redis primitive (via
django_redis's script support) that verifies the token matches and deletes the
lock in one indivisible step, preventing the TTL expiration window from allowing
lock acquisition by another worker between the get and delete calls.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 6ebd9a13-5f3f-4f60-a577-0dc6f1f464b0
📒 Files selected for processing (4)
ami/main/api/serializers.pyami/main/api/views.pyami/main/models.pyami/main/tests.py
🚧 Files skipped from review as they are similar to previous changes (2)
- ami/main/api/views.py
- ami/main/api/serializers.py
There was a problem hiding this comment.
Caution
Inline review comments failed to post. This is likely due to GitHub's internal server error or limits when posting large numbers of comments. If you are seeing this consistently it is likely a permissions issue. Please check "Moderation" -> "Code review limits" under your organization settings.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@ami/main/models.py`:
- Around line 1435-1468: The _regroup_lock function has a race condition in its
finally block where the separate cache.get() and cache.delete() operations on
lock_key allow another worker to acquire the same lock between those two calls
if the TTL expires. Replace the two separate operations with a single atomic
compare-and-delete operation using a Lua script or Redis primitive (via
django_redis's script support) that verifies the token matches and deletes the
lock in one indivisible step, preventing the TTL expiration window from allowing
lock acquisition by another worker between the get and delete calls.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 6ebd9a13-5f3f-4f60-a577-0dc6f1f464b0
📒 Files selected for processing (4)
ami/main/api/serializers.pyami/main/api/views.pyami/main/models.pyami/main/tests.py
🚧 Files skipped from review as they are similar to previous changes (2)
- ami/main/api/views.py
- ami/main/api/serializers.py
🛑 Comments failed to post (1)
ami/main/models.py (1)
1435-1468:
⚠️ Potential issue | 🟠 Major🧩 Analysis chain
🏁 Script executed:
#!/bin/bash set -euo pipefail printf '\n== Cache backend configuration ==\n' rg -n -C3 'CACHES\s*=|django_redis|LocMemCache|MemcachedCache|RedisCache|CACHE_URL' \ --glob '*.py' --glob '*.env*' --glob '*.yml' --glob '*.yaml' printf '\n== Regroup task time limits / lock TTL ==\n' rg -n -C4 'regroup_events|soft_time_limit|time_limit|REGROUP_LOCK_TTL_SECONDS' \ --glob '*.py'Repository: RolnickLab/antenna
Length of output: 31016
Make the regroup lock release atomic using a Lua script or Redis primitive.
The
_regroup_lockfunction uses two separate cache operations (cache.get()thencache.delete()on lines 1466–1468) to verify and release the lock. This creates a race condition when a task approaches the 600-second TTL: if the task hitssoft_time_limit(600 seconds) right as the key's TTL expires, another worker can acquire the same lock between the get and delete calls, and the finally block will then delete that new lock. Result: overlapping regroup runs.Since the cache backend is Redis (django-redis), use a Lua script or Redis's atomic primitives (e.g., via
django_redis's script support) to replace the separate get-and-delete with a single compare-and-delete operation that cannot be interrupted by TTL expiry.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@ami/main/models.py` around lines 1435 - 1468, The _regroup_lock function has a race condition in its finally block where the separate cache.get() and cache.delete() operations on lock_key allow another worker to acquire the same lock between those two calls if the TTL expires. Replace the two separate operations with a single atomic compare-and-delete operation using a Lua script or Redis primitive (via django_redis's script support) that verifies the token matches and deletes the lock in one indivisible step, preventing the TTL expiration window from allowing lock acquisition by another worker between the get and delete calls.
Stacked on #1306. Base branch:
feat/thumbnail-source-images. Several pieces of the original proposal were promoted to the parent PR on 2026-06-10 so it deploys safely on its own; what remains here is only the warm/cold URL classification (+223/−20 over four files).Summary
If a thumbnail has already been generated, and the source image hasn't changed, list responses return the direct file-storage URL instead of the Django redirect route. Cold or stale thumbnails keep the existing redirect route, which generates them on first request.
Here you can see the small thumbnail has already been fetched, but the medium one has not.

Two separate costs — what this fixes and what it must not introduce
Earlier drafts of this PR (and some review discussion) conflated these, so spelling them out:
Browser HTTP round-trips — what this PR removes. With the parent alone, every thumbnail
<img>is a request throughSourceImageThumbnailViewSet: DRF auth,get_object(), a thumbnail-row check, then a 302 to storage. This is not a database N+1 — it is per-image HTTP fan-out, paid by every client. A 40-card gallery page is 80 of those requests. The parent has since reduced the per-request cost (no storage existence check, and the 302 is now browser-cached for five minutes), but each client still pays the full fan-out on first view and again whenever the cache expires. This PR removes it for warm rows: a measurement on a dev deployment showed Django requests per warm gallery view going from 80 to 0.Database N+1 — what this PR must avoid introducing. The parent's serializer never touches the thumbnail table: it builds route URLs from
obj.pkplus settings, so there is no pre-existing N+1 to fix (an earlier draft wrongly claimed there was — see this comment and Luke's review thread). This PR makes the serializer readobj.thumbnails.all()per row to decide warm vs cold, which would be a new N+1 on its own.SourceImageQuerySet.with_thumbnails()prefetches it as one extra query per page, and the tests pin that contract.In short: the win is fewer HTTP requests to Django; the prefetch is self-defense against a regression this PR would otherwise create, not a query-count improvement over the parent.
Why generation stays lazy (and why the cached row matters)
Captures have a sparse access pattern: a project can hold on the order of a million captures, but users review occurrence crops — captures themselves are opened only when troubleshooting or checking the context around a detection, so only a few hundred per project are ever viewed. That rules out pre-warming thumbnails at upload/sync time (it would spend storage and compute on images nobody opens) and with it the simpler "deterministic URL, no lookup" design, which only works when the blob is guaranteed to exist.
With generation lazy, something must answer "has this thumbnail been generated, at the current spec, and is it still fresh?" per capture and label — that is exactly what the
SourceImageThumbnailrow records, and why this PR reads it (via the prefetch below) instead of constructing URLs blindly.What changes
ami/main/models.pySourceImageQuerySet.with_thumbnails()— a plainprefetch_related("thumbnails")sothumbnail_urlscan decide warm/cold in memory instead of firing a SELECT per row. (An earlier revision pushed anis_source_changedannotation into this prefetch; it was dropped in favour of the in-memorythumbnail_is_validpredicate below, so the same check is shared with the generator.)SourceImage.thumbnail_urls(request)— per-label{label: url}builder, the single source of truth for the warm/cold decision. Warm = a cached row exists with a non-blank path, a width equal to the requested spec width, and no source change →default_storage.url(row.path). Anything else → route URL viareverse_with_params, exactly as the parent serializer built it. The width check is strict equality because the generator (since the spec-width fix in Thumbnails API for SourceImages #1306) records the requested width on the row; legacy rows that stored the encoder's output read as stale here and self-heal through one regeneration.SourceImage.thumbnail_is_valid(spec, thumb)— the in-memory validity predicate (non-blank path, width equal to the spec, source not changed) shared bythumbnail_urlsand the lazy generator gate, so the warm/cold rule lives in one place (per Luke's review).thumbnailswas not prefetched (a freshly created instance in a write response, or a caller that skippedwith_thumbnails), every label falls back to the route URL without querying. This keeps single-object serialization working and guarantees no per-row N+1; it also means the warm-cache win is enforced by the list query-count test rather than at runtime (until Django zen-queries lands).ami/main/api/serializers.pySourceImageThumbnailSerializer.get_thumbnailsbecomes a one-line passthrough:ami/main/api/views.pySourceImageViewSet.get_queryset→.with_thumbnails().EventViewSet.get_queryset's innerexample_capturesPrefetch → same.List of changes
default_storage.url(row.path)when the cached row matches the configured width and isn't source-changed<img src=>); no FE change.SourceImageThumbnailViewSet+ concurrency-safe upsertthumbnail_is_validcompares row vs sourcelast_modified; stale rows fall back to the route URLwith_thumbnails()prefetch is the contract; viewsets wire it; un-prefetched callers fall back to route URLs without querying; a multi-row test asserts a warm list serves storage URLs with one thumbnail querytest_thumbnail_urls_model_method_emits_warm_and_routeexercisesSourceImage.thumbnail_urlsdirectlyContract change
thumbnails.{small,medium}URL shape changes for cached rows fromhttp://<host>/api/v2/captures/thumbnails/<id>/?label=smallto e.g.https://<bucket>.s3.amazonaws.com/thumbnails/capture_<id>/small.jpg.Frontend treats the value as opaque (see
ui/src/data-services/models/capture.ts:170-182,ui/src/pages/captures/capture-columns.tsx). No frontend change required.On deployments using
FileSystemStorage(e.g. local dev),default_storage.url()returns a relative/media/...URL that Django still serves — the "bypass Django" win applies fully only where storage URLs are external (S3/MinIO). Compose with a CDN later viaAWS_S3_CUSTOM_DOMAINand these URLs front-load automatically.Promoted to #1306 (no longer in this diff)
These shipped on the parent so it is solo-deployable: removal of the per-request
default_storage.exists()check, progressive JPEG encoding,Cache-Control: private, max-age=300on the redirect, the blank-path regen guard, and the spec-width fix (replacing the 2px width tolerance an earlier revision of this branch carried).Luke's review threads addressed
reverse_with_paramsalready exists — adopted.lru_cachehelper is obfuscation — dropped; warm path just callsdefault_storage.url(row.path).is_source_changedannotation and warm-path check.What this does NOT change
SourceImageThumbnailViewSet) and its concurrency-safe upsert path are unchanged.Known follow-ups
<img>until the row is removed (consequence of theexists()removal, which lives in Thumbnails API for SourceImages #1306). Planned as a periodic integrity-check task; separate ticket.Event.first_capture()atami/main/models.py:1277is a per-row method that runs a fresh query; pre-existing, unchanged.Verification
<img>requests go direct to storage; zero requests to/api/v2/captures/thumbnails/....test_warm_list_serves_storage_urls_with_single_thumbnail_query).🤖 Generated with Claude Code
Summary by CodeRabbit
Performance
Tests