fix: resolve race condition in file logging for /ai/log_correction endpoint#2152
Conversation
|
@ArshVermaGit is attempting to deploy a commit to the ritesh Team on Vercel. A member of the Team first needs to authorize it. |
📝 WalkthroughWalkthroughThis PR migrates ticket storage from in-memory to Supabase-backed persistence, hardens endpoint authentication against user-id spoofing, adds concurrency safety to the corrections log, and offloads long-running AI operations to background threads to prevent event loop blocking. ChangesTicket Storage & Concurrency Hardening
AI Pipeline Threading
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Possibly related PRs
Suggested labels
Poem
🚥 Pre-merge checks | ✅ 3 | ❌ 2❌ Failed checks (2 warnings)
✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
ArshVermaGit
left a comment
There was a problem hiding this comment.
Hi @ritesh-1918 ! The issue has been resolved. Please review the PR and merge it under GSSoC. Thanks!
There was a problem hiding this comment.
Actionable comments posted: 3
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
backend/main.py (1)
647-682:⚠️ Potential issue | 🟠 Major | 🏗️ Heavy liftTicket creation and the initial system message are not atomic.
If the
ticketsinsert succeeds andticket_messagesfails, this handler returns 500 even though the ticket row already exists. Because ticket inserts also trigger the notifier webhook, a client retry can create duplicate tickets and duplicate notifications. This needs a single DB transaction/RPC or an idempotent compensation strategy.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@backend/main.py` around lines 647 - 682, The ticket creation and the initial system message must be performed atomically to avoid partial commits; replace the separate supabase.table("tickets").insert(...) and supabase.table("ticket_messages").insert(...) calls with a single database transaction or server-side RPC that inserts the ticket (using final_data), inserts the initial system message (message/msg, sender_id, sender_name, sender_role) and returns the new ticket id in one operation, and only after the transaction succeeds call duplicate_service.add_ticket(str(ticket_id), duplicate_text); ensure the notifier webhook is invoked only after the transaction completes (or move webhook firing into the same RPC) and preserve the existing duplicate_index_warning/duplicate_indexed handling in the non-transactional duplicate indexing path.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@backend/.env.example`:
- Around line 1-3: The .env.example is missing the Supabase service-role key
used by backend/services/auto_close_service.py; add a new entry named
SUPABASE_SERVICE_ROLE_KEY alongside SUPABASE_URL and SUPABASE_ANON_KEY so
environments include the service role secret required by the
auto-close/background job path and documents its purpose (e.g., indicate it's
the service role secret for server-side operations used by
auto_close_service.py).
In `@backend/main.py`:
- Around line 706-728: The create_ticket and update_ticket endpoints accept raw
dicts and allow unguarded writes; either remove these routes or harden them by
reusing the same auth helper used by /tickets/save to validate the bearer token,
user-id and tenant, then validate and sanitize the request body against a
whitelisted Pydantic model (only allowed fields like title, body, status, etc.),
explicitly strip or reject any server-controlled columns (id, created_at,
tenant_id, user_id, viewed_at), and only pass the sanitized payload to
supabase.table("tickets").insert(...) or .update(...). Ensure update_ticket also
verifies the ticket belongs to the authenticated user/tenant before applying
updates.
- Around line 38-47: The current shared supabase client (created via
create_client and stored in supabase) only validates tokens with
supabase.auth.get_user(token) but does not bind the caller JWT to subsequent
table(...).select/insert/update calls, so RLS won’t apply; fix by creating a
per-request Supabase Client or session using the incoming JWT (either build a
per-request Client with ClientOptions including headers Authorization: Bearer
<jwt> or call client.auth.set_session(...) with the token) and use that
per-request client instance for all profiles/tickets DB operations instead of
the global supabase object (referencing create_client, Client, supabase,
supabase.auth.get_user, table(...).select/insert/update, and auth.set_session).
---
Outside diff comments:
In `@backend/main.py`:
- Around line 647-682: The ticket creation and the initial system message must
be performed atomically to avoid partial commits; replace the separate
supabase.table("tickets").insert(...) and
supabase.table("ticket_messages").insert(...) calls with a single database
transaction or server-side RPC that inserts the ticket (using final_data),
inserts the initial system message (message/msg, sender_id, sender_name,
sender_role) and returns the new ticket id in one operation, and only after the
transaction succeeds call duplicate_service.add_ticket(str(ticket_id),
duplicate_text); ensure the notifier webhook is invoked only after the
transaction completes (or move webhook firing into the same RPC) and preserve
the existing duplicate_index_warning/duplicate_indexed handling in the
non-transactional duplicate indexing path.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 628be068-a2ed-4d8c-9b14-d2e5fbeb741d
📒 Files selected for processing (2)
backend/.env.examplebackend/main.py
| # Supabase Configuration | ||
| SUPABASE_URL=https://your-project.supabase.co | ||
| SUPABASE_SERVICE_KEY=your-service-key | ||
| SUPABASE_ANON_KEY=your-anon-key |
There was a problem hiding this comment.
Keep the backend service-role secret documented here.
This template now only shows the anon key, but backend/services/auto_close_service.py still creates its own Supabase client from SUPABASE_SERVICE_ROLE_KEY. Fresh environments copied from this file will start the API and silently break the auto-close/background job path.
Suggested env template update
# Supabase Configuration
SUPABASE_URL=https://your-project.supabase.co
SUPABASE_ANON_KEY=your-anon-key
+SUPABASE_SERVICE_ROLE_KEY=your-service-role-key📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| # Supabase Configuration | |
| SUPABASE_URL=https://your-project.supabase.co | |
| SUPABASE_SERVICE_KEY=your-service-key | |
| SUPABASE_ANON_KEY=your-anon-key | |
| # Supabase Configuration | |
| SUPABASE_URL=https://your-project.supabase.co | |
| SUPABASE_ANON_KEY=your-anon-key | |
| SUPABASE_SERVICE_ROLE_KEY=your-service-role-key |
🧰 Tools
🪛 dotenv-linter (4.0.0)
[warning] 3-3: [UnorderedKey] The SUPABASE_ANON_KEY key should go before the SUPABASE_URL key
(UnorderedKey)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@backend/.env.example` around lines 1 - 3, The .env.example is missing the
Supabase service-role key used by backend/services/auto_close_service.py; add a
new entry named SUPABASE_SERVICE_ROLE_KEY alongside SUPABASE_URL and
SUPABASE_ANON_KEY so environments include the service role secret required by
the auto-close/background job path and documents its purpose (e.g., indicate
it's the service role secret for server-side operations used by
auto_close_service.py).
| # Initialize Supabase Client (Anon Key to respect RLS) | ||
| try: | ||
| from supabase import create_client, Client | ||
| url = os.environ.get("SUPABASE_URL") | ||
| key = os.environ.get("SUPABASE_SERVICE_KEY") | ||
| key = os.environ.get("SUPABASE_ANON_KEY") | ||
| if not url or not key: | ||
| print("[ERROR] SUPABASE_URL or SUPABASE_SERVICE_KEY not set in backend/.env") | ||
| print("[ERROR] SUPABASE_URL or SUPABASE_ANON_KEY not set in backend/.env") | ||
| supabase = None | ||
| else: | ||
| supabase = create_client(url, key) |
There was a problem hiding this comment.
🧩 Analysis chain
🌐 Web query:
For supabase-py 2.22.4, if a client is created once with create_client(SUPABASE_URL, SUPABASE_ANON_KEY)and a request later callssupabase.auth.get_user(token), do subsequent table(...).select/insert/updatecalls automatically run with that user's JWT for RLS, or must the request create/set a client with the incomingAuthorization: Bearer header per request?
💡 Result:
When you call supabase.auth.get_user(token), the Supabase Python client validates the provided JWT but does not automatically associate that token with subsequent database operations (e.g., table.select). Row Level Security (RLS) relies on the Authorization header of the specific database request, so simply validating a user via get_user is insufficient to propagate that user's identity to your database queries [1][2][3]. To perform database operations on behalf of a user, you must explicitly set their session context so that the client includes the appropriate JWT in the Authorization header for those requests [1][4][3]. There are two primary ways to achieve this: 1. Use set_session: If you have a client instance, you can use client.auth.set_session(access_token=jwt, refresh_token=...) [3]. This updates the client's internal state to use that JWT for subsequent requests [4][5]. 2. Initialize with ClientOptions: For server-side or stateless requests (such as in FastAPI or other backend frameworks), it is often recommended to create a new client instance per request by passing the JWT in the headers via ClientOptions [6][1][7]: from supabase import create_client, ClientOptions options = ClientOptions(headers={"Authorization": f"Bearer {your_jwt}"}) client = create_client(SUPABASE_URL, SUPABASE_ANON_KEY, options=options) This ensures the client correctly sends the user's JWT in the Authorization header for all API calls, allowing Supabase to enforce RLS policies based on that user's identity [6][8]. Note: Earlier versions of the library had bugs where the client would overwrite custom Authorization headers with the anon key, but these have been addressed in later versions of the 2.x release line [2][8][9]. If you are working in a backend context where you have elevated privileges, you might alternatively use the service role key, though this bypasses RLS and should be used with caution [4][3]. Top results: [6][1][8][7][3]
Citations:
- 1: Supabase Storage API Session is None in FastAPI with AuthBearer supabase/supabase-py#869
- 2: Create a client with Auth context of a user supabase/supabase-py#420
- 3: https://stackoverflow.com/questions/78050169/supabase-with-jwt-auth-from-python-client-doesnt-work-with-rls
- 4: cannot authenticate user supabase/supabase-py#645
- 5: fix: refresh token updates auth headers and sets realtime auth supabase/supabase-py#1171
- 6: Improving the flow of handling sessions for better clarity due to RLS issues supabase/supabase-py#915
- 7: bug: client access token for auth admin API is not properly refreshed supabase/supabase-py#1143
- 8: fix(user auth context): do not overwrite provided client options Authorization header supabase/supabase-py#766
- 9: Create a client with Auth context of a user supabase/supabase-py#420
Bind the caller’s JWT to Supabase requests so RLS actually applies
supabase.auth.get_user(token) only validates the JWT; with supabase-py, it does not automatically associate that JWT with subsequent table(...).select/insert/update calls made on a shared create_client(SUPABASE_URL, SUPABASE_ANON_KEY) instance. As a result, the later profiles/tickets DB operations won’t run in the caller’s RLS context. Create a per-request client/session (e.g., per-request ClientOptions with Authorization: Bearer <jwt> or client.auth.set_session(...)) and use that client for every ticket/profile query.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@backend/main.py` around lines 38 - 47, The current shared supabase client
(created via create_client and stored in supabase) only validates tokens with
supabase.auth.get_user(token) but does not bind the caller JWT to subsequent
table(...).select/insert/update calls, so RLS won’t apply; fix by creating a
per-request Supabase Client or session using the incoming JWT (either build a
per-request Client with ClientOptions including headers Authorization: Bearer
<jwt> or call client.auth.set_session(...) with the token) and use that
per-request client instance for all profiles/tickets DB operations instead of
the global supabase object (referencing create_client, Client, supabase,
supabase.auth.get_user, table(...).select/insert/update, and auth.set_session).
| @app.post("/tickets") | ||
| @limiter.limit("30/minute") | ||
| async def create_ticket(ticket: dict, request: Request): | ||
| """Save a new ticket into the system (persisted to Supabase).""" | ||
| if not supabase: | ||
| raise HTTPException(status_code=500, detail="Database connection not initialized") | ||
|
|
||
| res = supabase.table("tickets").insert(ticket).execute() | ||
| if not res.data: | ||
| raise HTTPException(status_code=400, detail="Failed to create ticket") | ||
|
|
||
| print(f"[DB] Ticket created: {res.data[0].get('id')}") | ||
| return res.data[0] | ||
|
|
||
|
|
||
| @app.patch("/tickets/{ticket_id}") | ||
| @limiter.limit("60/minute") | ||
| async def update_ticket(ticket_id: str, updates: dict, request: Request): | ||
| """Partially update a ticket's fields (e.g., status, viewed_at) via Supabase.""" | ||
| if not supabase: | ||
| raise HTTPException(status_code=500, detail="Database connection not initialized") | ||
|
|
||
| raise HTTPException(status_code=404, detail="Ticket not found") | ||
| res = supabase.table("tickets").update(updates).eq("id", ticket_id).execute() |
There was a problem hiding this comment.
POST /tickets and PATCH /tickets/{ticket_id} reopen an unguarded write path.
These endpoints accept raw dict payloads and write them directly to tickets without the bearer-token, user-id, or tenant checks added to /tickets/save. That leaves an alternate route that can bypass the new spoofing protection and mutate server-controlled columns if table policies allow it. Please either remove these routes or apply the same auth helper plus a whitelisted request model for allowed fields.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@backend/main.py` around lines 706 - 728, The create_ticket and update_ticket
endpoints accept raw dicts and allow unguarded writes; either remove these
routes or harden them by reusing the same auth helper used by /tickets/save to
validate the bearer token, user-id and tenant, then validate and sanitize the
request body against a whitelisted Pydantic model (only allowed fields like
title, body, status, etc.), explicitly strip or reject any server-controlled
columns (id, created_at, tenant_id, user_id, viewed_at), and only pass the
sanitized payload to supabase.table("tickets").insert(...) or .update(...).
Ensure update_ticket also verifies the ticket belongs to the authenticated
user/tenant before applying updates.
|
Hi @ArshVermaGit! Thanks for the contribution. I have triaged your PR and set it to merge into the
Welcome to the HELPDESK.AI developer family! 🚀💻 |
Summary
This PR fixes a critical data-corruption bug in the admin correction logging mechanism where concurrent requests to /ai/log_correction could mangle the corrections_log.json file.
Changes Made
Introduced a global asyncio.Lock() instance (corrections_log_lock) in backend/main.py.
Wrapped the read-modify-write logic for corrections_log.json inside an async with corrections_log_lock: block.
Ensures strict sequential file access during periods of high concurrency, completely eliminating the race condition.
Resolved Issue
Resolves #2151
Summary by CodeRabbit
New Features
Bug Fixes
Performance Improvements