-
Notifications
You must be signed in to change notification settings - Fork 870
Allow OpenTelemetry*Client.JsonSerializerOptions to control full OTel message formatting #7558
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Changes from 1 commit
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -8,6 +8,7 @@ | |
| using System.Text.Encodings.Web; | ||
| using System.Text.Json; | ||
| using System.Text.Json.Serialization.Metadata; | ||
| using Microsoft.Shared.Diagnostics; | ||
|
|
||
| #pragma warning disable CA1307 // Specify StringComparison for clarity | ||
| #pragma warning disable CA1308 // Normalize strings to uppercase | ||
|
|
@@ -19,6 +20,19 @@ internal static class OtelMessageSerializer | |
| { | ||
| internal static readonly JsonSerializerOptions DefaultOptions = CreateDefaultOptions(); | ||
|
|
||
| /// <summary> | ||
| /// Validates that the given options include the required OTel type resolver. | ||
| /// </summary> | ||
| internal static void ThrowIfMissingOtelResolver(JsonSerializerOptions options) | ||
| { | ||
| if (!options.TypeInfoResolverChain.Any(r => r is OtelContext)) | ||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think this check is a bit too aggressive and I'm not aware of precedent elsewhere. Why do we need
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Good point, it won't hold in that case. For context, OtelContext is internal and is needed for AOT serialization of OTel message-part POCOs. The goal of the change is to allow some user-options to control serialization, so we can either: a. Reject user options that wouldn't produce otel-conformant json, that is require snake-case and OtelContext first in the resolver chain, "first" is the simple, enforceable approximation that guarantees correctness at the cost of rejecting some valid-but-unusual chains. The exception message will tell users to copy the existing options ( b. Copy a curated set of user knobs onto our fixed options (e.g. WriteIndented, Encoder) and ignore the rest. This never throws, but silently dropping options the user set is its own surprise as @tarekgh mentioned, which is why I leaned toward throwing.
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. c. Do nothing: accept any options, document the guidance. No validation, no throwing, no repairing. The remarks advise copying the default and overwriting only what you need. If a user passes options that break conformance, that's on them.
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. a. and c. would be breaking in different ways: (a) Reject — API/compile/throw break. Previously-accepted options now throw ArgumentException from the setter. That's a source/binary behavioral break: code that compiled and ran fine now fails. Clearly breaking. (c) Do nothing — behavioral break, not an API break. The setter still accepts everything (no throw, same signature), so nothing breaks at compile/bind time. But options that used to be ignored for the envelope now control it, for example, a user passing
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Good catch, agreed, the check targets the wrong thing. Presence of OtelContext does not guarantee it is used (a DefaultJsonTypeInfoResolver at the front resolves everything first), and it is not necessary either, since any resolver that can resolve the OTel types works. GetTypeInfo already throws on its own when nothing resolves the types, so the guard adds no reliable value. I will drop it. Note too that the conformance critical bits (PropertyNamingPolicy, DefaultIgnoreCondition, Encoder) come from the live options, not the resolver, so if we want to guarantee correct telemetry the better levers are validating those settings or narrowing the API. Happy to go whichever way @eiriktsarpalis and @jozkee prefer.
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Users can always assign a new JSO to the setter but then modify its settings while it is still mutable. So I don't see how validation at the setter would prevent anything. You might consider running validation when you first try to serialize after the JSO has been marked read-only. Can I ask why you specifically are looking for OtelContext? I don't think it's a good idea to demand the presence of what amounts to internal implementation detail and users cannot directly control. I would suggest instead to examine specific structural properties of the resultant
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
As a convenient way of spotting JSO not created with the copy ctor, also for AOT, if we don't do the check, we will get a cryptic
I will give it a try, but I suspect that may be too brittle. |
||
| { | ||
| Throw.ArgumentException( | ||
| nameof(options), | ||
| "The provided JsonSerializerOptions is missing the required OpenTelemetry type resolver."); | ||
| } | ||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Good point, the message is not actionable as written. Since the OtelContext resolver is internal, a user has no obvious way to satisfy the requirement starting from a blank JsonSerializerOptions. Suggest pointing them at the supported path, for example: "The provided JsonSerializerOptions is missing the required OpenTelemetry type resolver. Copy the client's existing JsonSerializerOptions (new JsonSerializerOptions(client.JsonSerializerOptions)) and override only the properties you need." That matches the pattern already shown in the new XML doc remarks. |
||
| } | ||
|
|
||
| private static readonly JsonElement _emptyObject = | ||
| JsonSerializer.SerializeToElement(new object(), DefaultOptions.GetTypeInfo(typeof(object))); | ||
|
|
||
|
|
@@ -36,8 +50,10 @@ private static JsonSerializerOptions CreateDefaultOptions() | |
| } | ||
|
|
||
| internal static string SerializeChatMessages( | ||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. While we are here: two other OTel middlewares call this same SerializeChatMessages but do not pass an options argument, so they always fall back to DefaultOptions and give the user no way to customize telemetry formatting (the same limitation #7514 describes for chat/realtime):
Neither client exposes a JsonSerializerOptions property at all. Should this PR (or a follow-up) extend the same treatment to them, adding the property and flowing it through as options: _jsonSerializerOptions, so the behavior is consistent across all OTel clients that emit gen_ai message JSON? If it is out of scope here, a tracking issue would be good so they do not drift. |
||
| IEnumerable<ChatMessage> messages, ChatFinishReason? chatFinishReason = null, JsonSerializerOptions? customContentSerializerOptions = null) | ||
| IEnumerable<ChatMessage> messages, ChatFinishReason? chatFinishReason = null, JsonSerializerOptions? options = null) | ||
| { | ||
| options ??= DefaultOptions; | ||
|
|
||
| List<object> output = []; | ||
|
|
||
| string? finishReason = | ||
|
|
@@ -225,12 +241,7 @@ internal static string SerializeChatMessages( | |
| JsonElement element = _emptyObject; | ||
| try | ||
| { | ||
| JsonTypeInfo? unknownContentTypeInfo = | ||
| customContentSerializerOptions?.TryGetTypeInfo(content.GetType(), out JsonTypeInfo? ctsi) is true ? ctsi : | ||
| DefaultOptions.TryGetTypeInfo(content.GetType(), out JsonTypeInfo? dtsi) ? dtsi : | ||
| null; | ||
|
|
||
| if (unknownContentTypeInfo is not null) | ||
| if (options.TryGetTypeInfo(content.GetType(), out JsonTypeInfo? unknownContentTypeInfo)) | ||
| { | ||
| element = JsonSerializer.SerializeToElement(content, unknownContentTypeInfo); | ||
| } | ||
|
|
@@ -252,7 +263,7 @@ internal static string SerializeChatMessages( | |
| output.Add(m); | ||
| } | ||
|
|
||
| return JsonSerializer.Serialize(output, DefaultOptions.GetTypeInfo(typeof(IList<object>))); | ||
| return JsonSerializer.Serialize(output, options.GetTypeInfo(typeof(IList<object>))); | ||
| } | ||
|
|
||
| /// <summary>Derives the OTel <c>modality</c> classifier from a media type's top-level type.</summary> | ||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -123,14 +123,32 @@ public OpenTelemetryRealtimeClientSession(IRealtimeClientSession innerSession, I | |
| _tokenUsageHistogram = OtelMetricHelpers.CreateGenAITokenUsageHistogram(_meter); | ||
| _operationDurationHistogram = OtelMetricHelpers.CreateGenAIOperationDurationHistogram(_meter); | ||
|
|
||
| _jsonSerializerOptions = AIJsonUtilities.DefaultOptions; | ||
| _jsonSerializerOptions = OtelMessageSerializer.DefaultOptions; | ||
| } | ||
|
|
||
| /// <summary>Gets or sets JSON serialization options to use when formatting realtime data into telemetry strings.</summary> | ||
| /// <remarks> | ||
| /// <para> | ||
| /// The default value uses <see cref="System.Text.Json.JsonNamingPolicy.SnakeCaseLower"/> property naming, | ||
| /// <see cref="System.Text.Json.JsonSerializerOptions.WriteIndented"/> set to <see langword="true"/>, | ||
| /// and includes type metadata for all built-in OpenTelemetry message-part types. | ||
| /// </para> | ||
| /// <para> | ||
| /// To customize settings, it is recommended to copy the current options and override individual properties: | ||
| /// <code> | ||
| /// session.JsonSerializerOptions = new JsonSerializerOptions(session.JsonSerializerOptions) { WriteIndented = false }; | ||
| /// </code> | ||
| /// </para> | ||
| /// </remarks> | ||
| public JsonSerializerOptions JsonSerializerOptions | ||
| { | ||
| get => _jsonSerializerOptions; | ||
| set => _jsonSerializerOptions = Throw.IfNull(value); | ||
| set | ||
| { | ||
| _ = Throw.IfNull(value); | ||
| OtelMessageSerializer.ThrowIfMissingOtelResolver(value); | ||
| _jsonSerializerOptions = value; | ||
| } | ||
| } | ||
|
|
||
| /// <inheritdoc /> | ||
|
|
@@ -311,7 +329,7 @@ private static void AddOutputModalitiesTag(Activity? activity, HashSet<string>? | |
| } | ||
|
|
||
| /// <summary>Adds output messages tag to the activity if there are messages to add.</summary> | ||
| private static void AddOutputMessagesTag(Activity? activity, List<RealtimeOtelMessage>? outputMessages) | ||
| private void AddOutputMessagesTag(Activity? activity, List<RealtimeOtelMessage>? outputMessages) | ||
| { | ||
| if (activity is { IsAllDataRequested: true } && outputMessages is { Count: > 0 }) | ||
| { | ||
|
|
@@ -498,15 +516,15 @@ private static void AddOutputMessagesTag(Activity? activity, List<RealtimeOtelMe | |
| } | ||
|
|
||
| /// <summary>Serializes a single message to OTel format (as an array with one element).</summary> | ||
| private static string SerializeMessage(RealtimeOtelMessage message) | ||
| private string SerializeMessage(RealtimeOtelMessage message) | ||
| { | ||
| return JsonSerializer.Serialize(new[] { message }, OtelContext.Default.IEnumerableRealtimeOtelMessage); | ||
| return JsonSerializer.Serialize(new[] { message }, _jsonSerializerOptions.GetTypeInfo(typeof(IEnumerable<RealtimeOtelMessage>))); | ||
| } | ||
|
Comment on lines
+527
to
530
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Agreed this is a real gap. SerializeMessage/SerializeMessages can run from the non-streaming path before the options are frozen (today they are only made read-only in GetStreamingResponseAsync), so a caller could mutate JsonSerializerOptions after the first serialization and get inconsistent telemetry, and it is not safe under concurrency. Suggest freezing at the point of first serialization (for example call MakeReadOnly() at the top of SerializeMessage/SerializeMessages, or freeze once in the setter) so the options stay immutable for the lifetime of any emitted span. |
||
|
|
||
| /// <summary>Serializes content items to OTel format.</summary> | ||
| private static string SerializeMessages(IEnumerable<RealtimeOtelMessage> messages) | ||
| private string SerializeMessages(IEnumerable<RealtimeOtelMessage> messages) | ||
| { | ||
| return JsonSerializer.Serialize(messages, OtelContext.Default.IEnumerableRealtimeOtelMessage); | ||
| return JsonSerializer.Serialize(messages, _jsonSerializerOptions.GetTypeInfo(typeof(IEnumerable<RealtimeOtelMessage>))); | ||
| } | ||
|
Comment on lines
+533
to
536
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Same root cause as the SerializeMessage comment above. SerializeMessages also reads _jsonSerializerOptions without ensuring it is read-only first, so options can be mutated between operations and produce inconsistent output. Whatever freezing approach we pick should cover both methods (freezing once on first use, or in the setter, would handle both at once). |
||
|
|
||
| /// <summary>Extracts content from an AIContent list and converts to OTel format.</summary> | ||
|
|
@@ -622,17 +640,7 @@ private static string SerializeMessages(IEnumerable<RealtimeOtelMessage> message | |
| JsonElement element = default; | ||
| try | ||
| { | ||
| JsonTypeInfo? unknownContentTypeInfo = null; | ||
| if (_jsonSerializerOptions?.TryGetTypeInfo(content.GetType(), out JsonTypeInfo? ctsi) ?? false) | ||
| { | ||
| unknownContentTypeInfo = ctsi; | ||
| } | ||
| else if (AIJsonUtilities.DefaultOptions.TryGetTypeInfo(content.GetType(), out JsonTypeInfo? dtsi)) | ||
| { | ||
| unknownContentTypeInfo = dtsi; | ||
| } | ||
|
|
||
| if (unknownContentTypeInfo is not null) | ||
| if (_jsonSerializerOptions.TryGetTypeInfo(content.GetType(), out JsonTypeInfo? unknownContentTypeInfo)) | ||
| { | ||
| element = JsonSerializer.SerializeToElement(content, unknownContentTypeInfo); | ||
| } | ||
|
|
@@ -716,7 +724,7 @@ private static string SerializeMessages(IEnumerable<RealtimeOtelMessage> message | |
| { | ||
| _ = activity.AddTag( | ||
| OpenTelemetryConsts.GenAI.SystemInstructions, | ||
| JsonSerializer.Serialize(new object[1] { new OtelGenericPart { Content = options.Instructions } }, OtelContext.Default.IListObject)); | ||
| JsonSerializer.Serialize(new object[1] { new OtelGenericPart { Content = options.Instructions } }, _jsonSerializerOptions.GetTypeInfo(typeof(IList<object>)))); | ||
| } | ||
|
|
||
| } | ||
|
|
@@ -725,7 +733,7 @@ private static string SerializeMessages(IEnumerable<RealtimeOtelMessage> message | |
| { | ||
| _ = activity.AddTag( | ||
| OpenTelemetryConsts.GenAI.Tool.Definitions, | ||
| JsonSerializer.Serialize(options.Tools.Select(t => OtelFunction.Create(t, includeOptionalProperties: EnableSensitiveData)), OtelContext.Default.IEnumerableOtelFunction)); | ||
| JsonSerializer.Serialize(options.Tools.Select(t => OtelFunction.Create(t, includeOptionalProperties: EnableSensitiveData)), _jsonSerializerOptions.GetTypeInfo(typeof(IEnumerable<OtelFunction>)))); | ||
| } | ||
| } | ||
| } | ||
|
|
||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Design question: is the breaking change worth it, or could we avoid it entirely? Rejecting options that lack the resolver is what makes this source/binary breaking, since the setter previously accepted any options object silently. An alternative is to repair instead of reject: if the OtelContext resolver is missing from the supplied options, add it to the TypeInfoResolverChain (cloning first if the instance is read-only) rather than throwing. That keeps previously valid callers working, still honors the user formatting (the #7514 fix), and guarantees conformance because the resolver is always present. The tradeoff is that silently augmenting a caller-provided options object is its own minor surprise. If we deliberately want the strictness, that is fine, but it would be good to call out the breaking change explicitly in the PR description and changelog and confirm it is acceptable for this type.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. This begs the question, should we repair also the PropertyNamingPolicy to always be SnakeCaseLower or were you suggesting just repairing the OtelContext and throw if PropertyNamingPolicy != SnakeCaseLower?
Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great question. I dug into this and the short version is that just repairing the OtelContext resolver is not enough, and repairing only PropertyNamingPolicy still leaves gaps.
The key finding is that OtelContext's source-gen settings are not baked into the property metadata when its resolver is reused on a different JsonSerializerOptions instance. The wire format follows the live options. A quick repro with a snake_case context:
So three settings are conformance critical and all come from the live options: PropertyNamingPolicy (must be SnakeCaseLower, otherwise the gen_ai attribute names are wrong), DefaultIgnoreCondition (WhenWritingNull, otherwise null parts leak in), and Encoder (UnsafeRelaxedJsonEscaping for readable UTF-8). WriteIndented is the only one that is safe to vary, and it is the actual ask in #7514.
That makes full silent repair unattractive: we would have to clone the supplied options (they are read-only), inject the resolver, and force three of the user's settings while keeping only WriteIndented. Since this is a get/set property, get would then return something different from what was set, and we would be silently discarding most of what the caller passed. That is more code and more surprising semantics than the break it avoids.
On whether avoiding the breaking change is worth it: the break is behavioral only (the property signature is unchanged), it fails loudly and early at the setter rather than corrupting data, it lands on a niche audience (only people who set this property today), and the behavior it replaces is the bug we are fixing. So I would lean toward not contorting the design to preserve it, and instead documenting it as an intentional low severity break.
My suggestions, in order of preference:
If an API adjustment is acceptable, make invalid states unrepresentable instead of validating a free-form options object. For example a configure callback Action that receives a clone of DefaultOptions (re-asserting the critical settings after it runs), or a small targeted knob such as a WriteIndented/compact toggle. This removes the repair-vs-throw question entirely and directly serves Allow controlling JSON formatting for gen_ai input/output message telemetry #7514.
If we keep the property, validate and throw, but strengthen the check beyond the resolver to also require PropertyNamingPolicy == SnakeCaseLower (the one that actually corrupts the convention's attribute names), and ideally DefaultIgnoreCondition and Encoder too. Pair it with the actionable "copy the existing options" message.
I would avoid the silent repair path: it is the only option that fully prevents the break, but it costs the most complexity and the least predictable behavior.