Improve XML paging serialization performance by astruebi · Pull Request #3690 · samply/blaze

astruebi · 2026-06-09T09:23:16Z

Summary

This draft PR improves FHIR XML paging serialization performance by reducing generic XML writer overhead and adding targeted direct XML writers for common FHIR structures.

The full original WIP history is preserved in the fork branch codex/xml-direct-writer-wip. This branch is the review-oriented version with the same final code changes split into larger logical commits.

Commit Structure

1. Optimize XML response output
   - Adds initial XML response path optimizations and profiling/benchmark support under profiling/xml-paging.
2. Reduce generic XML writer overhead
   - Reduces generic XML writer overhead through cheaper field iteration, precomputed tag fragments, fewer sequence allocations, faster escaping, and optimized FHIR type lookup.
3. Specialize XML search bundle entries
   - Adds a fast path for common search Bundle.entry shapes.
4. Add direct XML writers for common complex types
   - Introduces XmlDirectWriter and covers first common complex types with regression tests.
5. Write XML through a UTF-8 writer
   - Adds XmlUtf8Writer and routes XML output through the UTF-8 writer path.
6. Expand direct XML writing for common values
   - Extends direct XML output for Period, Coding, Identifier, and Meta cases and optimizes ASCII paths in the UTF-8 writer.
7. Document XML optimization handover
   - Adds historical optimization context, benchmark notes, verification notes, and discarded experiments.
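
For readers unfamiliar with the techniques named in the commit list, the following is a minimal Java sketch of "precomputed tag fragments" combined with an escaping fast path. It is not the code from this PR: the class `XmlEscapeSketch` and its methods are hypothetical names chosen for illustration, and the sketch ignores control characters, which a production writer would also have to handle in attribute values.

```java
// Hypothetical sketch; names are illustrative and not taken from the PR.
final class XmlEscapeSketch {

    // Precomputed fragments for a primitive element such as <start value="..."/>.
    private static final String START_OPEN = "<start value=\"";
    private static final String VALUE_CLOSE = "\"/>";

    private static boolean needsEscape(char c) {
        return c == '&' || c == '<' || c == '>' || c == '"';
    }

    // Appends s to out, escaping the four characters checked by needsEscape.
    static void appendEscaped(StringBuilder out, String s) {
        int n = s.length();
        int i = 0;
        // Fast path: scan until the first character that needs escaping.
        while (i < n && !needsEscape(s.charAt(i))) {
            i++;
        }
        if (i == n) {
            out.append(s); // nothing to escape, copy in one call
            return;
        }
        out.append(s, 0, i); // copy the clean prefix in one call
        for (; i < n; i++) {
            char c = s.charAt(i);
            switch (c) {
                case '&': out.append("&amp;"); break;
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                case '"': out.append("&quot;"); break;
                default: out.append(c);
            }
        }
    }

    // Writes one primitive element from the precomputed fragments.
    static String writeStart(String value) {
        StringBuilder out = new StringBuilder(
            START_OPEN.length() + value.length() + VALUE_CLOSE.length());
        out.append(START_OPEN);
        appendEscaped(out, value);
        out.append(VALUE_CLOSE);
        return out.toString();
    }
}
```

Most FHIR primitive values (dates, codes, identifiers) contain no characters that need escaping, so the scan-then-copy path is the one that would be hit almost every time.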

Verification

Run on codex/xml-direct-writer-review:

make -C modules/fhir-structure fmt            OK
make -C modules/fhir-structure lint           OK
make -C modules/fhir-structure test           OK, 220 tests / 4551 assertions
make -C modules/fhir-structure test-coverage  OK, ALL FILES 95.15% Forms / 94.59% Lines
make -C modules/fhir-structure clean          OK
make -C modules/fhir-structure prep           OK
make uberjar                                  OK

Additional jar check:

javap confirms XmlDirectWriter contains writeIdentifier and writeMeta.
javap confirms removed/experimental writeReference is not present.

Local Benchmark

Three local Blaze instances were used with the same dataset. The PR instance was started from a freshly built image of this branch and used a copied Docker volume from the existing optimized Blaze instance, so the resource counts match exactly:

Patient:     29021
Encounter:   58271
Observation: 19187
Condition:   46217
Consent:     19887

Comparison below is XML output, median of runs 2-8:

PR (port 8090) vs old baseline (port 8081)

Encounter   1000:  19.1 ms vs  187.4 ms   9.8x faster
Encounter   5000:  39.8 ms vs  585.1 ms  14.7x faster
Observation 1000:  12.2 ms vs  132.4 ms  10.9x faster
Observation 5000:  35.9 ms vs  680.3 ms  19.0x faster
Condition   1000:   7.4 ms vs   63.5 ms   8.6x faster
Condition   5000:  21.0 ms vs  313.9 ms  15.0x faster
Consent     1000:  25.9 ms vs  770.3 ms  29.7x faster
Consent     5000: 113.7 ms vs 3581.9 ms  31.5x faster
Patient     1000:   9.1 ms vs   97.9 ms  10.8x faster
Patient     5000:  30.4 ms vs  437.4 ms  14.4x faster

A direct comparison between the fresh PR instance on 8090 and the existing optimized WIP instance on 8080 showed essentially identical timings, with identical XML byte sizes and only about 1-2% variance.
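
The "median of runs 2-8" figures above discard the first run as warm-up. A small sketch of that calculation is shown here in Java; the benchmark scripts under profiling/xml-paging may compute it differently, and `MedianSketch` is a hypothetical name.

```java
import java.util.Arrays;

// Hypothetical helper; not part of the PR's profiling scripts.
final class MedianSketch {

    // Median of the runs numbered from..to (1-based, inclusive),
    // e.g. from=2, to=8 skips the first (warm-up) run.
    static double medianOfRuns(double[] runsMs, int from, int to) {
        double[] window = Arrays.copyOfRange(runsMs, from - 1, to);
        Arrays.sort(window);
        int mid = window.length / 2;
        return window.length % 2 == 1
            ? window[mid]
            : (window[mid - 1] + window[mid]) / 2.0;
    }

    // Speedup factor as reported in the tables: old time divided by new time.
    static double speedup(double oldMs, double newMs) {
        return oldMs / newMs;
    }
}
```

Applied to the Encounter 1000 row, `speedup(187.4, 19.1)` gives roughly 9.8, matching the reported factor.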

Notes

This is intentionally a draft PR. The implementation is performance-oriented. Review should focus on the maintainability of the XML writer fast paths and on whether the profiling artifacts should stay in the final PR.

Default Page Size Benchmark

The same local setup was also benchmarked without an explicit _count parameter. Blaze therefore used its default page size of 50. XML output, median of runs 6-30:

PR (port 8090) vs old baseline (port 8081), default page size 50

Encounter:    3.31 ms vs 12.25 ms   3.7x faster
Observation:  3.10 ms vs 13.05 ms   4.2x faster
Condition:    2.93 ms vs  9.62 ms   3.3x faster
Consent:      3.84 ms vs 39.56 ms  10.3x faster
Patient:      2.84 ms vs  7.29 ms   2.6x faster

The speedup is smaller than with _count=1000 or _count=5000, because fixed request, DB, and HTTP overhead account for a larger share of total time at 50 resources per page.
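
The effect can be illustrated with a simple model in which each request has a fixed cost that the optimization does not touch. This is only a sketch of the reasoning: the fixed cost was not measured separately in this PR, and `OverheadModelSketch` is a hypothetical name.

```java
// Hypothetical model of the reasoning above; not derived from measured fixed costs.
final class OverheadModelSketch {

    // Observed end-to-end speedup when only the serialization part gets faster
    // and a fixed per-request cost (request handling, DB, HTTP) stays the same.
    static double observedSpeedup(double fixedMs, double oldSerializeMs, double newSerializeMs) {
        return (fixedMs + oldSerializeMs) / (fixedMs + newSerializeMs);
    }
}
```

With purely illustrative numbers, a 10x faster serializer (10 ms down to 1 ms) next to a 2 ms fixed cost yields `observedSpeedup(2, 10, 1) == 4.0`. The larger the fixed share, the closer the observed factor moves toward 1.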

Full Paging Download Benchmark

The following benchmark follows all next links until the complete resource type has been downloaded. XML compares this PR on port 8090 against the old baseline on port 8081. JSON was measured only on the PR instance. Values are medians of three full downloads.

| Resource | `_count` | Pages | Entries | PR XML | Old XML | XML Speedup | PR JSON |
|---|---|---|---|---|---|---|---|
| Encounter | 50 | 1166 | 58,271 | 31.799 s | 39.485 s | 1.24x | 36.458 s |
| Encounter | 1000 | 59 | 58,271 | 3.382 s | 10.365 s | 3.06x | 3.960 s |
| Encounter | 5000 | 12 | 58,271 | 2.183 s | 8.818 s | 4.04x | 2.611 s |
| Observation | 50 | 384 | 19,187 | 12.073 s | 14.146 s | 1.17x | 12.971 s |
| Observation | 1000 | 20 | 19,187 | 1.232 s | 3.885 s | 3.15x | 1.443 s |
| Observation | 5000 | 4 | 19,187 | 0.955 s | 3.496 s | 3.66x | 0.992 s |
| Condition | 50 | 925 | 46,217 | 28.406 s | 30.699 s | 1.08x | 28.437 s |
| Condition | 1000 | 47 | 46,217 | 2.192 s | 5.236 s | 2.39x | 2.421 s |
| Condition | 5000 | 10 | 46,217 | 1.224 s | 4.166 s | 3.40x | 1.335 s |
| Consent | 50 | 398 | 19,887 | 14.529 s | 25.196 s | 1.73x | 16.109 s |
| Consent | 1000 | 20 | 19,887 | 2.941 s | 14.897 s | 5.07x | 3.886 s |
| Consent | 5000 | 4 | 19,887 | 2.440 s | 13.995 s | 5.74x | 3.396 s |
| Patient | 50 | 581 | 29,021 | 17.614 s | 19.306 s | 1.10x | 18.444 s |
| Patient | 1000 | 30 | 29,021 | 1.534 s | 4.034 s | 2.63x | 1.701 s |
| Patient | 5000 | 6 | 29,021 | 0.865 s | 3.411 s | 3.94x | 0.973 s |

At the default page size of 50, fixed paging and HTTP overhead dominate, so the XML speedup is modest. With larger pages, XML serialization becomes a larger share of total runtime and the PR shows substantially larger gains. On this dataset, PR XML is also slightly faster than PR JSON for the measured full downloads.
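
The download loop behind this benchmark can be sketched as follows. This is a simplified Java illustration rather than the actual benchmark script: the `fetch` function stands in for an HTTP GET plus Bundle parsing, and `fakeServer` is an in-memory stand-in whose "URLs" are just offsets. All names are hypothetical.

```java
import java.util.function.Function;

// Hypothetical sketch of "follow all next links"; not the PR's benchmark script.
final class PagingSketch {

    // One page of a search result: entry count plus the next link (null on the last page).
    static final class Page {
        final int entries;
        final String next;
        Page(int entries, String next) { this.entries = entries; this.next = next; }
    }

    static final class Totals {
        final int pages;
        final int entries;
        Totals(int pages, int entries) { this.pages = pages; this.entries = entries; }
    }

    // Fetches pages starting at firstUrl until a page carries no next link.
    static Totals downloadAll(String firstUrl, Function<String, Page> fetch) {
        int pages = 0;
        int entries = 0;
        String url = firstUrl;
        while (url != null) {
            Page page = fetch.apply(url);
            pages++;
            entries += page.entries;
            url = page.next;
        }
        return new Totals(pages, entries);
    }

    // In-memory stand-in for a server holding `total` resources, serving `count` per page.
    static Function<String, Page> fakeServer(int total, int count) {
        return url -> {
            int offset = Integer.parseInt(url);
            int n = Math.min(count, total - offset);
            int nextOffset = offset + n;
            return new Page(n, nextOffset < total ? Integer.toString(nextOffset) : null);
        };
    }
}
```

Running `downloadAll("0", fakeServer(58271, 50))` visits 1166 pages and 58,271 entries, which matches the Pages and Entries columns for Encounter at `_count` 50.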

alexanderkiel

I don't like the approach with the low-level XmlUtf8Writer.

Benchmark a clean Woodstox XMLStreamWriter emit path against this branch. This is the real experiment worth running. Drop both XmlUtf8Writer and the hand-rolled write-xml-element, drive WstxOutputFactory directly from your xml-handlers/unform-xml walk (no data.xml Element tree, no emit). That removes the intermediate tree that made main slow, keeps a maintained library doing escaping/encoding, and deletes ~200 lines of hand-rolled byte-twiddling + the duplicated escaping now living in both Java (XmlUtf8Writer.writeEscaped) and Clojure (write-xml-str). If it's within a few percent, the maintenance win is worth it.
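
As a rough idea of what such an emit path could look like, here is a minimal Java sketch that writes a Period-shaped element through the standard StAX `XMLStreamWriter` API. It uses `XMLOutputFactory.newFactory()` so that it runs on a plain JDK; with Woodstox on the classpath one would presumably construct `com.ctc.wstx.stax.WstxOutputFactory` instead. The class `StaxPeriodSketch` is hypothetical, the FHIR namespace is omitted, and nothing here is wired to the xml-handlers walk.

```java
import java.io.ByteArrayOutputStream;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

// Hypothetical sketch of a streaming emit path; not code from this PR.
final class StaxPeriodSketch {

    // Assumption: a shared factory instance. Swap in Woodstox's factory to benchmark it.
    private static final XMLOutputFactory FACTORY = XMLOutputFactory.newFactory();

    // Emits <period><start value="..."/><end value="..."/></period> as UTF-8 bytes.
    // Escaping and encoding are left entirely to the StAX implementation.
    static byte[] writePeriod(String start, String end) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try {
            XMLStreamWriter w = FACTORY.createXMLStreamWriter(out, "UTF-8");
            w.writeStartElement("period");
            w.writeEmptyElement("start");
            w.writeAttribute("value", start);
            w.writeEmptyElement("end");
            w.writeAttribute("value", end);
            w.writeEndElement();
            w.flush();
            w.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("XML output failed", e);
        }
        return out.toByteArray();
    }
}
```

The point of the experiment would be that no intermediate element tree is built and no escaping logic lives in project code.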

alexanderkiel · 2026-06-09T13:25:35Z

+import java.io.OutputStream;
+import java.io.Writer;
+
+public final class XmlUtf8Writer extends Writer {


This class is a bit too low-level.

astruebi added 7 commits June 9, 2026 10:03

Optimize XML response output

68b84d3

Add profiling scripts and baseline measurements for XML paging while reducing overhead in the XML response writer path.

Reduce generic XML writer overhead

37484fb

Precompute tag fragments, streamline repeated field writing, avoid unnecessary sequence allocation and use a faster XML escaping helper in the hot path.

Specialize XML search bundle entries

cddb538

Add a fast path for common search Bundle.entry shapes to avoid generic XML field iteration for fullUrl, resource and search metadata.

Add direct XML writers for common complex types

53928a0

Introduce XmlDirectWriter and dispatch selected FHIR complex types through direct XML serialization with regression coverage for Period values.

Write XML through a UTF-8 writer

2f81c10

Add XmlUtf8Writer and route XML serialization through it so primitive XML output can avoid extra character encoding work.

Expand direct XML writing for common values

3970432

Extend direct XML serialization to additional Period, Coding, Identifier and Meta cases, and optimize the UTF-8 writer for common ASCII output paths.

Document XML optimization handover

993f31e

Capture the optimization context, verification results, benchmark observations and discarded experiments for follow-up review.
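
To make the "common ASCII output paths" idea from the UTF-8 writer commits concrete, here is a minimal Java sketch of a `Writer` that copies ASCII characters straight into a byte buffer and falls back to the JDK encoder for everything else. It is not the PR's XmlUtf8Writer: `AsciiFastPathWriter` is a hypothetical name, and the sketch assumes that a surrogate pair is never split across two `write` calls.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch; assumes surrogate pairs are never split across write calls.
final class AsciiFastPathWriter extends Writer {
    private final OutputStream out;
    private final byte[] buf = new byte[8192];
    private int pos;

    AsciiFastPathWriter(OutputStream out) {
        this.out = out;
    }

    @Override
    public void write(char[] cbuf, int off, int len) throws IOException {
        int end = off + len;
        int i = off;
        while (i < end) {
            // Fast path: an ASCII char is exactly one UTF-8 byte.
            while (i < end && cbuf[i] < 0x80) {
                if (pos == buf.length) {
                    flushBuffer();
                }
                buf[pos++] = (byte) cbuf[i++];
            }
            if (i == end) {
                break;
            }
            // Slow path: collect the run of non-ASCII chars and let the JDK encode it.
            int start = i;
            while (i < end && cbuf[i] >= 0x80) {
                i++;
            }
            flushBuffer();
            out.write(new String(cbuf, start, i - start).getBytes(StandardCharsets.UTF_8));
        }
    }

    private void flushBuffer() throws IOException {
        if (pos > 0) {
            out.write(buf, 0, pos);
            pos = 0;
        }
    }

    @Override
    public void flush() throws IOException {
        flushBuffer();
        out.flush();
    }

    @Override
    public void close() throws IOException {
        flush();
        out.close();
    }
}
```

This is the kind of hand-written encoding code the review asks to weigh against a maintained library, so the sketch is best read as an illustration of the trade-off rather than a recommendation.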

alexanderkiel requested changes Jun 9, 2026

astruebi wants to merge 7 commits into samply:main from astruebi:codex/xml-direct-writer-review
