Add Crr Cascade capabilities to backbeat crr replication#2747
Add Crr Cascade capabilities to backbeat crr replication#2747SylvainSenechal wants to merge 3 commits into
Conversation
Hello sylvainsenechal,My role is to assist you with the merge of this Available options
Available commands
Status report is not available. |
| "@aws-sdk/client-sts": "^3.921.0", | ||
| "@aws-sdk/credential-providers": "^3.921.0", | ||
| "@scality/cloudserverclient": "^1.0.8", | ||
| "@scality/cloudserverclient": "file:../cloudserverclient/scality-cloudserverclient-v1.0.9.tgz", |
There was a problem hiding this comment.
@scality/cloudserverclient points to a local file path (file:../cloudserverclient/scality-cloudserverclient-v1.0.9.tgz). This will break CI and other developers' builds. Must be changed to a proper registry version or git-pinned tag before merge.
| "@scality/cloudserverclient": "file:../cloudserverclient/scality-cloudserverclient-v1.0.9.tgz", | |
| "@scality/cloudserverclient": "^1.0.9", |
— Claude Code
| const ObjectMDLocation = require('arsenal').models.ObjectMDLocation; | ||
| const { errors, jsutil, models, versioning } = require('arsenal'); | ||
| const ObjectMDLocation = models.ObjectMDLocation; | ||
| const { decode, checkCrrCascadeEvent } = versioning.VersionID; |
There was a problem hiding this comment.
Arsenal is pinned to 8.3.9, but checkCrrCascadeEvent and decode are imported from versioning.VersionID. Neither checkCrrCascadeEvent nor the getMicroVersionId() method (called on source/dest entries throughout this PR) appear to exist in arsenal 8.3.9 or in backbeat's own models. The arsenal dependency likely needs a version bump for this PR to work.
— Claude Code
| if (err.ObjNotFound || err.name === 'ObjNotFound') { | ||
| return cbOnce(err); | ||
| } | ||
| if (err.$metadata?.httpStatusCode === 409) { |
There was a problem hiding this comment.
Any 409 from the destination is assumed to be a cascade-stale scenario and the replication is silently marked COMPLETED. If cloudserver ever returns 409 for a different reason, the object would never be replicated. Consider checking for a more specific signal (e.g. a response body field or custom error code) rather than relying solely on the HTTP status code.
— Claude Code
|
There was a problem hiding this comment.
Can already check this pr, but should really be reviewed after all the other cascade prs, as changes in these pr would also mean changes here
There was a problem hiding this comment.
I think we can functional tests instead of just these,
But waiting for Arsenal/cloudserver to be merged, as it will be easier to make these tests (functional tests in backbeat rely on an image of cloudserver)
| "@aws-sdk/client-sts": "^3.921.0", | ||
| "@aws-sdk/credential-providers": "^3.921.0", | ||
| "@scality/cloudserverclient": "^1.0.8", | ||
| "@scality/cloudserverclient": "file:../cloudserverclient/scality-cloudserverclient-v1.0.9.tgz", |
There was a problem hiding this comment.
@scality/cloudserverclient is pinned to a local file path (file:../cloudserverclient/scality-cloudserverclient-v1.0.9.tgz). This will break CI and other developers' builds. Before merging, this needs to be changed to a proper npm version (e.g., ^1.0.9) or a git tag reference consistent with the other git-based deps in this project.
| "@scality/cloudserverclient": "file:../cloudserverclient/scality-cloudserverclient-v1.0.9.tgz", | |
| "@scality/cloudserverclient": "^1.0.9", |
— Claude Code
Codecov Report❌ Patch coverage is
Additional details and impacted files
... and 7 files with indirect coverage changes
@@ Coverage Diff @@
## development/9.5 #2747 +/- ##
===================================================
- Coverage 74.73% 74.58% -0.16%
===================================================
Files 199 199
Lines 13650 13691 +41
===================================================
+ Hits 10201 10211 +10
- Misses 3439 3470 +31
Partials 10 10
Flags with carried forward coverage won't be shown. Click here to find out more. 🚀 New features to boost your workflow:
|
| entry: destEntry.getLogInfo(), | ||
| }); | ||
| return doneOnce(null, partAlreadyAtDest); | ||
| default: { |
There was a problem hiding this comment.
When cascadeLoopDetected or cascadeDataComplete is returned as an error from a part, this error flows through the retry wrapper in _getAndPutPart. The BackbeatTask.retry() method checks err.retryable === undefined and then inspects err.code, err.name, and err.message to detect network errors. Since these cascade signals are plain objects ({}), none of those properties exist, so no mutation occurs today. But if a future change accidentally adds such a property to these sentinels, retry() would mutate the shared singleton (setting retryable = true) and corrupt all subsequent cascade checks. Consider using Object.freeze() on the sentinel objects at line 33-35 to prevent accidental mutation.
— Claude Code
|
| partResults.length > 0 && | ||
| partResults.every(result => result === partAlreadyAtDest); | ||
| const destLocations = allPartsAlreadyAtDest ? [] : | ||
| (partResults || []).filter(result => result && result !== partAlreadyAtDest); |
There was a problem hiding this comment.
Bug: when err is truthy (line 434 returns early), destLocations is used in the _deleteOrphans call, but destLocations was computed on line 432-433 with the !err guard already making allPartsAlreadyAtDest false. The filter on line 433 still runs (partResults || []).filter(...) which accesses partResults — but after an error, partResults from mapLimitWaitPendingIfError may contain partial results mixed with partAlreadyAtDest sentinel values. These sentinel objects (empty {}) will pass the result && result !== partAlreadyAtDest filter and be passed to _deleteOrphans, which tries to read .key and .dataStoreName from them. Since they're empty objects, they'll produce undefined keys — likely harmless because _deleteOrphans filters out falsy .key, but worth a defensive check or comment explaining why this is safe.
— Claude Code
| "@smithy/node-http-handler": "^3.3.3", | ||
| "JSONStream": "^1.3.5", | ||
| "arsenal": "git+https://github.com/scality/arsenal#8.3.9", | ||
| "arsenal": "git+https://github.com/scality/Arsenal#2c429ab35a5ac82c3dafa5a0296a49a23a9c8a4a", |
There was a problem hiding this comment.
Arsenal dependency is pinned to a raw commit hash (2c429ab) instead of a release tag. All other git-based deps in this file use tags (e.g. breakbeat#v1.0.3, bucketclient#8.2.4). Pin to a tagged release once the arsenal PR is merged to keep the dependency auditable and consistent with the project convention.
— Claude Code
| "@aws-sdk/client-sts": "^3.921.0", | ||
| "@aws-sdk/credential-providers": "^3.921.0", | ||
| "@scality/cloudserverclient": "^1.0.8", | ||
| "@scality/cloudserverclient": "file:./scality-cloudserverclient-v1.0.9.tgz", |
There was a problem hiding this comment.
Vendoring a .tgz into the repo adds a binary blob to git history that can never be garbage-collected. Prefer publishing @scality/cloudserverclient@1.0.9 to the registry (or using a git tag reference like the other deps) and pin to it here. If vendoring is intentional for CI reasons, it should be called out in the PR description.
— Claude Code
| entry: destEntry.getLogInfo(), | ||
| }); | ||
| return doneOnce(null, partAlreadyAtDest); | ||
| default: { |
There was a problem hiding this comment.
Trailing whitespace on this line (after the opening brace {). Minor, but some CI lint checks catch this.
— Claude Code
| // update location, replication status and put metadata in | ||
| // target bucket | ||
| (location, next) => { | ||
| (location, allPartsAlreadyAtDest, next) => { |
There was a problem hiding this comment.
In _processQueueEntryRetryFull, the second waterfall stage now passes allPartsAlreadyAtDest as the mdOnly argument to _putMetadata. This means a full retry where all parts happened to already exist at the destination will send ReplicationContent: 'METADATA' in the putMetadata call. Verify this is the intended behavior — it changes the semantics from "always full replication on retry" to "maybe metadata-only on retry if cascade detected all parts present." If a retry was triggered because metadata was stale but data was fine, this seems correct; if the retry was triggered for a data integrity reason, skipping the data write could mask the issue.
— Claude Code
| { | ||
| method: 'ReplicateObject._getAndPutPartOnce', | ||
| entry: destEntry.getLogInfo(), | ||
| }); |
There was a problem hiding this comment.
cascadeLoopDetected is passed as the err argument via doneOnce(cascadeLoopDetected). This sentinel is an empty object {}, so upstream retry logic in BackbeatTask._retry will see it as a truthy error and may retry the operation before _handleReplicationOutcome gets to check for it. Confirm that _getAndPutPart (the retry wrapper around _getAndPutPartOnce) won't retry on this sentinel — if it does, the loop detection is bypassed and the part is retried unnecessarily.
— Claude Code
|
4c64ed6 to
3237f9e
Compare
| "@smithy/node-http-handler": "^3.3.3", | ||
| "JSONStream": "^1.3.5", | ||
| "arsenal": "git+https://github.com/scality/arsenal#8.3.9", | ||
| "arsenal": "git+https://github.com/scality/Arsenal#2c429ab35a5ac82c3dafa5a0296a49a23a9c8a4a", |
There was a problem hiding this comment.
Arsenal is pinned to a raw commit hash (2c429ab...) instead of a semantic version tag. Per project conventions, git-based deps (arsenal, vaultclient, etc.) should pin to tags (e.g. #8.x.y). Commit hashes are opaque — it's unclear which features/fixes are included, and there's no semver contract. This also makes it harder for reviewers and operators to reason about what changed.
— Claude Code
| "@aws-sdk/client-sts": "^3.921.0", | ||
| "@aws-sdk/credential-providers": "^3.921.0", | ||
| "@scality/cloudserverclient": "^1.0.8", | ||
| "@scality/cloudserverclient": "file:./scality-cloudserverclient-v1.0.9.tgz", |
There was a problem hiding this comment.
A binary .tgz is committed to the repo and referenced via file: in package.json. This permanently inflates the git history (binary blobs can't be diffed or garbage-collected), prevents code review of the dependency contents, and bypasses the normal package registry workflow. Consider publishing @scality/cloudserverclient@1.0.9 to the npm registry (or a private registry) and referencing it as a versioned dependency instead.
— Claude Code
The cascade replication logic in Review by Claude Code |
Issue: BB-767
Related PRs :
Arsenal : scality/Arsenal#2628
Cloudserver : scality/cloudserver#6179
CloudserverClient : scality/cloudserverclient#24
S3utils : scality/s3utils#395