feat: Artifacts — attach binary blobs to spans#1931
Conversation
`logfire.Artifact` attaches an image, audio clip, PDF, or large JSON payload to a span. The blob is uploaded to object storage out of band (sync or in the background, chosen per artifact); the span carries only a small content-addressed reference, so it is not subject to attribute size limits. - `Artifact` from bytes, a file path, or a binary handle. - `upload='background'` (default — never blocks the caller; drops with a warning if the upload queue is full) or `upload='sync'` (inline, guaranteed delivery, frees the source immediately). - `json_schema` / `json_encoder` hooks: an artifact serializes to a reference object; the blob upload is brokered out of band. - `logfire-api` stubs and a docs page. The backend side ships separately in the platform repo. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
Backend side: pydantic/platform#21850 |
Deploying logfire-docs with
|
| Latest commit: |
ead1587
|
| Status: | ✅ Deploy successful! |
| Preview URL: | https://2206d6f6.logfire-docs.pages.dev |
| Branch Preview URL: | https://feat-artifacts.logfire-docs.pages.dev |
There was a problem hiding this comment.
Please don't generate pyi files in PRs, they're just clutter. the CLAUDE.md should probably make this more explicit.
| logging call never blocks. If uploads cannot keep up, queued artifacts are dropped | ||
| with a warning rather than stalling your program. | ||
| - **`sync`** — the upload runs inline; the logging call returns only once the blob is | ||
| stored. Use this when you need delivery guaranteed, or when you want to free the |
There was a problem hiding this comment.
Delivery is far from guaranteed with sync, it still just silently swallows request exceptions without retrying.
There was a problem hiding this comment.
Do you think we should make the guarantee stronger (error) or make the docstrings match current impl?
There was a problem hiding this comment.
i think we eventually want a stronger guarantee, but it doesn't have to be in the first pass. until then, the docs should be accurate.
| from ..artifacts import Artifact | ||
| from ..utils import log_internal_error | ||
|
|
||
| # Default ceiling on bytes queued for background upload. When exceeded, `submit` blocks. |
There was a problem hiding this comment.
it doesn't block, it drops
There was a problem hiding this comment.
should we make it drop, block, spill to disk...?
There was a problem hiding this comment.
for background uploading, i think not blocking is part of the contract. i think it should spill to disk up to a higher limit, then drop.
| on the calling thread. | ||
| """ | ||
|
|
||
| def __init__(self, *, base_url: str, token: str, max_queue_bytes: int = DEFAULT_MAX_QUEUE_BYTES) -> None: |
There was a problem hiding this comment.
max_queue_bytes isn't actually user configurable and probably should be
| # there can break the signature, so only the backend `/blob` endpoint gets auth. | ||
| put_headers = self._auth if target['requires_auth'] else {} | ||
| put = requests.request( | ||
| target['method'], target['url'], data=artifact.read(), headers=put_headers, timeout=_REQUEST_TIMEOUT |
There was a problem hiding this comment.
when the artifact is read from a file, it doesn't live in memory in the queue. but max_queue_bytes seems to be intended to save memory, so should it apply to file artifacts? in fact, what if we stored all artifacts to files to allow a bigger queue?
There was a problem hiding this comment.
yeah I think it would make sense to force all artifacts to buffer on disk, they are almost by definition large
| 'attach_context', | ||
| 'url_from_eval', | ||
| 'Artifact', | ||
| 'UploadMode', |
There was a problem hiding this comment.
Do users need to use logfire.UploadMode? this seems like clutter in the root package.
| # Network/HTTP failures are operational, not bugs: the artifact reference is | ||
| # still recorded on the span, the blob just isn't stored. Best-effort upload — | ||
| # never crash the caller's logging call over it. | ||
| pass |
|
|
||
| def __init__( | ||
| self, | ||
| data: bytes, |
There was a problem hiding this comment.
IMO data should accept:
- bytes
- str, which is encoded to bytes
- Path, which is like from_file
- a file handle, and then from_file_handle isn't needed at all
- any other object, which goes through the logfire JSON encoding
Then the constructor can handle basically anything, and from_file is only a convenience to treat str as a path instead of actual data, and to maybe save some memory depending on implementation details.
| return self._compute()[0] | ||
|
|
||
| @property | ||
| def size_bytes(self) -> int: |
There was a problem hiding this comment.
does the backend check that this is reported honestly?
| UUID: _to_str, | ||
| Exception: _to_str, | ||
| # An artifact serialises to its reference object; the blob is uploaded separately. | ||
| Artifact: lambda o, _: o.reference(), |
There was a problem hiding this comment.
we need a way to mark the contents of these as always exempt from scrubbing. this shape doesn't allow us to without adding some new machinery. if it was {'logfire.artifact': {...}} then adding logfire.artifact to the scrubber safe keys would work.
that wouldn't help if someone writes secret=Artifact(...), which will get scrubbed by default either way. not sure if we want to do something about that.
There was a problem hiding this comment.
maybe we just say scrubbing doesn't apply to artifcats for now and figure it out in a followup?
There was a problem hiding this comment.
i'm saying we need to figure out how to make scrubbing not apply to artifacts, right now it can. not the contents, the reference.
There was a problem hiding this comment.
that seems like something we can sort out. i'm just asking if we can ship whatever falls out naturally now (even if it's kinda broken) and come up with the right apis, etc. later or if you think it needs to be bundled into this change?
There was a problem hiding this comment.
i thought the goal right now was to settle on a good API? what kind of review are you looking for?
What
Adds
logfire.Artifact— attach a binary blob (image, audio, PDF, large JSON) to a span. The blob is uploaded to object storage out of band; the span carries only a small content-addressed (sha256) reference, so it is not subject to attribute size limits and does not bloat telemetry.This is the SDK side. The backend side ships separately in the platform repo (PR linked below once open).
Usage
Artifact.from_file), or a binary handle (Artifact.from_file_handle— handles temp/spooled files).uploadis chosen per artifact:background(default) — never blocks the caller. If the upload queue is over its byte budget, the artifact is dropped with a warning rather than applying backpressure.sync— uploaded inline; the call returns once the blob is stored, so the source can be freed immediately.How it works
An artifact serializes to a reference object (
{"type": "logfire.artifact", "sha256": …}) via the existingjson_schema/json_encoderhooks. A background uploader runs the register → PUT → finalize handshake against the backend; signed object-store URLs are PUT to without the bearer token.Changes
logfire/_internal/artifacts.py—Artifact+ theArtifactSourceabstraction (bytes / path; designed so streaming sources can be added later).logfire/_internal/exporters/artifact_uploader.py— background uploader (bounded queue, drop-on-full).json_schema/json_encoder/main/confighooks;Artifact+UploadModeexports.logfire-apistubs regenerated; docs page atreference/advanced/artifacts.md.Verification
tests/test_artifacts.py— 14 tests pass (construction from each source, schema/encoder hooks, span integration, uploader sync/background/dedup/drop-on-full/error-swallowing).ruff+pyrightclean.Note:
test_logfire_api.py::test_runtime[with_logfire]currently fails on an unrelateddspyDeprecationWarningraised byinstrument_dspy— pre-existing, not introduced here.🤖 Generated with Claude Code