Return HTTP 503 on concurrent rename instead of 500#4646
Conversation
|
Although this change makes sense for me, I think there's a disconnect between Iceberg spec, and our implementation. The spec for |
a20a385 to
d329502
Compare
|
Thanks @nandorKollar , good catch. The rename endpoints define I've updated the PR to map the concurrency statuses ( (If Team prefer server-side retry for rename as the original TODO - that'd be a larger change; happy to do it as a follow-up.) |
| // here because the rename endpoint reserves 409 for "target already exists" (handled by | ||
| // the ENTITY_ALREADY_EXISTS case above). | ||
| case BaseResult.ReturnStatus.TARGET_ENTITY_CONCURRENTLY_MODIFIED: | ||
| case BaseResult.ReturnStatus.ENTITY_CANNOT_BE_RESOLVED: |
There was a problem hiding this comment.
I think we should narrow down this to case BaseResult.ReturnStatus.TARGET_ENTITY_CONCURRENTLY_MODIFIED, I'm not sure about ENTITY_CANNOT_BE_RESOLVED. Can a retry solve the problem with entity resolution?
Thanks, 503 sounds better, but still not the best response code IMHO. I think it is intended to indicate the client to slow down. It seems to me, that Iceberg spec doesn't have a clear response code for rename operations, which indicate that there was a conflict, the client should retry the operation. |
|
Opened a discussion on the dev list: https://lists.apache.org/thread/tr8zh8121t2jb41s0q2yd9s73y2tp2tq |
I'll hold off finalizing until the dev list discussion wraps up, then update this - splitting ENTITY_CANNOT_BE_RESOLVED to 404 and aligning the rest with whatever we decide. Thanks for taking it to the list. |
| @@ -2494,10 +2495,14 @@ private void renameTableLike( | |||
| case BaseResult.ReturnStatus.ENTITY_NOT_FOUND: | |||
There was a problem hiding this comment.
If the goal is to tackle concurrent renames, I think we need to handle CATALOG_PATH_CANNOT_BE_RESOLVED as well. It's raised when a catalog path resolution fails during a write, see TransactionalMetaStoreManagerImpl.renameEntity(). I suggest it be mapped to NoSuchNamespaceException.
| // here because the rename endpoint reserves 409 for "target already exists" (handled by | ||
| // the ENTITY_ALREADY_EXISTS case above). | ||
| case BaseResult.ReturnStatus.TARGET_ENTITY_CONCURRENTLY_MODIFIED: | ||
| case BaseResult.ReturnStatus.ENTITY_CANNOT_BE_RESOLVED: |
There was a problem hiding this comment.
ENTITY_CANNOT_BE_RESOLVED is very similar to CATALOG_PATH_CANNOT_BE_RESOLVED: in TransactionalMetaStoreManagerImpl.renameEntity() the former is raised when the old path cannot be resolved, and the latter, when the new path cannot be resolved.
I'd note though that the NoSQL persistence does not raise CATALOG_PATH_CANNOT_BE_RESOLVED.
I'd suggest to group them together and throw NoSuchNamespaceException instead.
However, NoSuchNamespaceException maps to 404 which is non-retriable. But the comments on BaseResult for both codes say they are retriable:
I actually think the comments are wrong. If either the old or new path has been deleted by a concurrent commit, clients should not retry.
There was a problem hiding this comment.
Interesting info, but IIRC TransactionalMetaStoreManagerImpl is not actually used in actual OSS call paths... perhaps only with the in-memory persistence, but JDBC does not use it either, I'm pretty sure 🤔
There was a problem hiding this comment.
I ALWAYS get fooled by its name 😅
Looking at AtomicOperationMetaStoreManager.renameEntity() this time: oddly enough, it does not raise neither CATALOG_PATH_CANNOT_BE_RESOLVED nor ENTITY_CANNOT_BE_RESOLVED. It actually seems to not care about the validity of the entity path before and after 🤷♂️
There was a problem hiding this comment.
Interesting... 🤔 @vigneshio : How did you hit these errors in practice?
There was a problem hiding this comment.
If ENTITY_CANNOT_BE_RESOLVED is not a valid end expected response of a rename, then shouldn't we handle it as 'everything else', and throw new IllegalStateException( "Unknown error status " + returnedEntityResult.getReturnStatus());?
There was a problem hiding this comment.
ENTITY_CANNOT_BE_RESOLVED is used by the NoSQL persistence.
| // Transient concurrency conditions: surface as 503 so clients can retry. We avoid 409 | ||
| // here because the rename endpoint reserves 409 for "target already exists" (handled by | ||
| // the ENTITY_ALREADY_EXISTS case above). | ||
| case BaseResult.ReturnStatus.TARGET_ENTITY_CONCURRENTLY_MODIFIED: |
There was a problem hiding this comment.
I agree, and it's very unfortunate, that we can't use 409. It describes perfectly what happened, and it's retriable; but the Iceberg spec is very opinionated and maps this code to the "already exists" case (and seems to imply that the error is not retriable, in contradiction with the HTTP spec).
We can't force a 409 for other use cases because clients would surface the error as an AlreadyExistsException:
So, I agree that ServiceUnavailableException is the least worst choice. It's retriable, which is all that counts.
Note: 429 (Too Many Requests) is also retriable but should in theory include a Retry-After header in the response. It may trigger a client throttling that would be undesirable.
There was a problem hiding this comment.
I agree, and it's very unfortunate, that we can't use 409. It describes perfectly what happened, and it's retriable; but the Iceberg spec is very opinionated and maps this code to the "already exists" case (and seems to imply that the error is not retriable, in contradiction with the HTTP spec).
Exactly, this conflict is a non-resolvable conflict, which is normally not solvable with a retry.
There was a problem hiding this comment.
Retry-After is optional in 429 responses, AFAIK. If we go with 429, I'd think we should not set Retry-After to indicate that the server is not asking the client to back off, and the client is free to decide when to retry.
|
@vigneshio it seems there is a consensus on the mailing list that 503 is the best choice among the available options for signaling a conflict error. We should probably open a follow-up issue to implement server-side retries for conflicting rename operations. However, ENTITY_CANNOT_BE_RESOLVED should return a 404 instead. cc. @flyrain @adutra @dimas-b @rmannibucau |
|
I'll just write it for the record since you kind of converge but I still think this lead to a wrong behavior on the client side:
Another thing to consider I think is that if the management of the mutations is implemented as a queue internally (potentially distributed but let's stick to the design) then you never have this ambiguity and plain iceberg status are fine (409, 404 mainly there), so maybe the question is not which status but more how to implement it right - think polaris can just delay the response to the previous execution "end" somehow. The overall concern is using a semantic the client and gateways/proxies know and associate it another meaning leading to a wrong behavior (the 503 global cache - think circuit breakers - is a good example of that). |
d329502 to
15da8eb
Compare
|
Thanks @dimas-b @nandorKollar @adutra . Based on the discussion on the dev list, I've updated the PR:
I have also opened follow-up issue #4729 for server-side retries on rename conflicts. I agree with @rmannibucau that server-side retry is the better long-term solution, and that work can be handled separately. |
Concurrent modification during renameTable/renameView now returns 503 (retriable transient conflict) instead of a bare 500. Source/target paths that cannot be resolved (ENTITY_CANNOT_BE_RESOLVED, CATALOG_PATH_CANNOT_BE_RESOLVED) now return 404 NoSuchNamespaceException rather than 500, since a concurrently dropped path is not retriable.
15da8eb to
51bea3d
Compare
adutra
left a comment
There was a problem hiding this comment.
Thanks @vigneshio for this PR!
|
Re: 412 - I think it is closely related to |
|
@dimas-b agree but 412 doesnt have the semantic issues of 503 and 429 in middleware (it is poorly used today AFAIK) so "least worse" from my window ;) - the real issue is trying to use a spec for assets for an API, this always had been wrong by design and this is where *RPC style solutions are way more relevant - hopefully iceberg get a JSON-RPC catalog a day ;) |
|
None of the options are ideal 😅 We're effectively trying to work around an IRC spec problem, while allowing reasonable clients to recover from this kind of failure without having to make any Polaris-specific assumptions. As I commented on With 503, the RFC is pretty lenient towards servers. I do not think clients can assume any service-wise outage on receiving a 503 response from one particular request. As Polaris will not provide a |
So - as 504 is - is literally global/for the server so often resilience4j, mp fault tolérance and friends use an impl opening the circuit breaker after a few occurrences - and it is not insane and is quite common on load balancers as behavior. |
When a concurrent modification occurs during
renameTable/renameView, theoperation now returns HTTP 503 (Service Unavailable) instead of HTTP 500
(Internal Server Error), signalling a transient, retryable condition.
Why
renameTableLikeinIcebergCatalogpreviously threw a bareRuntimeExceptionfor both
TARGET_ENTITY_CONCURRENTLY_MODIFIEDandENTITY_CANNOT_BE_RESOLVED.IcebergExceptionMappermaps unknownRuntimeExceptions to 500, so clientsreceived an opaque server error instead of a retryable signal. The code even
admitted this was temporary: "this is temporary. Should throw a special error
that will be caught and retried".
Both statuses are documented in
BaseResult.ReturnStatusas retryableconcurrency conditions ("the client should retry"), and the rename
implementation (
TransactionalMetaStoreManagerImpl.renameEntityInCurrentTxn)returns both on real races, so each branch is reachable in practice.
HTTP 409 is intentionally not used here: the Iceberg REST rename endpoint
reserves 409 for "the target identifier to rename to already exists" (which
Polaris already uses via the
ENTITY_ALREADY_EXISTScase). Mapping a concurrentmodification to 409 would collide with that meaning, so a spec-compliant client
would surface it as
AlreadyExistsExceptionrather than retrying. The renameendpoint lists 503 for the transient case, so this change uses 503 instead.
(Thanks to @nandorKollar for catching the 409 semantic mismatch in review.)
Changes
IcebergCatalog.renameTableLike: replaced the bareRuntimeExceptionwithServiceUnavailableException(HTTP 503) for concurrent rename failures(
TARGET_ENTITY_CONCURRENTLY_MODIFIED,ENTITY_CANNOT_BE_RESOLVED).testConcurrencyConflictRenameTablecovering bothstatuses, verifying they surface as
ServiceUnavailableException(503).CHANGELOG.md: added a Fixes entry.Known follow-up (out of scope)
The same
renameTableLikeswitch still lacks a case forCATALOG_PATH_CANNOT_BE_RESOLVED(cross-namespace renames where the destinationpath can't be resolved). That falls through to
default → IllegalStateException → 500, whereasupdateTableLikehandles it asNotFoundException(404). Thisis a pre-existing gap not introduced by this change; tracking separately.
Testing
./gradlew :polaris-runtime-service:compileTestJava— passes./gradlew :polaris-runtime-service:spotlessCheck :polaris-runtime-service:checkstyleMain :polaris-runtime-service:checkstyleTest— passes./gradlew :polaris-runtime-service:test --tests "org.apache.polaris.service.catalog.iceberg.IcebergCatalogRelationalTest.testConcurrencyConflictRenameTable"— passes (both parameterized cases)./gradlew :polaris-runtime-service:test --tests "org.apache.polaris.service.catalog.iceberg.IcebergCatalogRelationalTest.testConcurrencyConflictUpdateTableDuringFinalTransaction"— passes (regression check)Checklist
CHANGELOG.md(if needed)site/content/in-dev/unreleased(if needed)