Improve XML paging serialization performance #3690

Draft
astruebi wants to merge 7 commits into samply:main from astruebi:codex/xml-direct-writer-review
astruebi commented Jun 9, 2026

Summary

This draft PR improves FHIR XML paging serialization performance by reducing generic XML writer overhead and adding targeted direct XML writers for common FHIR structures.

The full original WIP history is preserved in the fork branch codex/xml-direct-writer-wip. This branch is the review-oriented version with the same final code changes split into larger logical commits.

Commit Structure

  1. Optimize XML response output
    • Adds initial XML response path optimizations and profiling/benchmark support under profiling/xml-paging.
  2. Reduce generic XML writer overhead
    • Reduces generic XML writer overhead through cheaper field iteration, precomputed tag fragments, fewer sequence allocations, faster escaping, and optimized FHIR type lookup.
  3. Specialize XML search bundle entries
    • Adds a fast path for common search Bundle.entry shapes.
  4. Add direct XML writers for common complex types
    • Introduces XmlDirectWriter and covers first common complex types with regression tests.
  5. Write XML through a UTF-8 writer
    • Adds XmlUtf8Writer and routes XML output through the UTF-8 writer path.
  6. Expand direct XML writing for common values
    • Extends direct XML output for Period, Coding, Identifier, and Meta cases and optimizes ASCII paths in the UTF-8 writer.
  7. Document XML optimization handover
    • Adds historical optimization context, benchmark notes, verification notes, and discarded experiments.
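
To illustrate what "faster escaping" and "ASCII paths in the UTF-8 writer" can mean in practice, here is a minimal sketch. It is not the PR's XmlUtf8Writer: the class `XmlEscapeSketch` and its methods are hypothetical names, and the sketch only handles the four markup characters `&`, `<`, `>` and `"`. Control characters and unpaired surrogates are out of scope. The idea is to write unescaped runs in bulk and to narrow ASCII-only runs directly to bytes instead of going through the general encoder.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

/** Hypothetical illustration; not the PR's XmlUtf8Writer. */
public final class XmlEscapeSketch {

    private XmlEscapeSketch() {}

    /** Returns the entity for a markup character, or null if c needs no escaping. */
    private static String entityFor(char c) {
        switch (c) {
            case '&': return "&amp;";
            case '<': return "&lt;";
            case '>': return "&gt;";
            case '"': return "&quot;";
            default: return null;
        }
    }

    /** Writes s to out as escaped UTF-8, emitting unescaped runs in bulk. */
    public static void writeEscaped(OutputStream out, String s) throws IOException {
        int runStart = 0;
        for (int i = 0; i < s.length(); i++) {
            String entity = entityFor(s.charAt(i));
            if (entity != null) {
                writeRun(out, s, runStart, i);
                out.write(entity.getBytes(StandardCharsets.US_ASCII));
                runStart = i + 1;
            }
        }
        writeRun(out, s, runStart, s.length());
    }

    /** Writes s[from, to): ASCII-only runs are narrowed directly, others use the UTF-8 encoder. */
    private static void writeRun(OutputStream out, String s, int from, int to) throws IOException {
        if (from == to) {
            return;
        }
        byte[] ascii = new byte[to - from];
        for (int i = from; i < to; i++) {
            char c = s.charAt(i);
            if (c >= 0x80) {
                // Non-ASCII found: fall back to the general encoder for the whole run.
                out.write(s.substring(from, to).getBytes(StandardCharsets.UTF_8));
                return;
            }
            ascii[i - from] = (byte) c;
        }
        out.write(ascii);
    }

    /** Convenience wrapper for trying the sketch on small strings. */
    public static String escapeToString(String s) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try {
            writeEscaped(out, s);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }
}
```

Runs are only split at ASCII markup characters, so a surrogate pair is never cut in half before it reaches the encoder.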

Verification

Run on codex/xml-direct-writer-review:

make -C modules/fhir-structure fmt            OK
make -C modules/fhir-structure lint           OK
make -C modules/fhir-structure test           OK, 220 tests / 4551 assertions
make -C modules/fhir-structure test-coverage  OK, ALL FILES 95.15% Forms / 94.59% Lines
make -C modules/fhir-structure clean          OK
make -C modules/fhir-structure prep           OK
make uberjar                                  OK

Additional jar check:

javap confirms XmlDirectWriter contains writeIdentifier and writeMeta.
javap confirms removed/experimental writeReference is not present.

Local Benchmark

Three local Blaze instances were used with the same dataset. The PR instance was started from a freshly built image of this branch and used a copied Docker volume from the existing optimized Blaze instance, so the resource counts match exactly:

Patient:     29021
Encounter:   58271
Observation: 19187
Condition:   46217
Consent:     19887

Comparison below is XML output, median of runs 2-8:

PR 8090 vs old 8081

Encounter   1000:   19.1 ms vs  187.4 ms   9.8x faster
Encounter   5000:   39.8 ms vs  585.1 ms  14.7x faster
Observation 1000:   12.2 ms vs  132.4 ms  10.9x faster
Observation 5000:   35.9 ms vs  680.3 ms  19.0x faster
Condition   1000:    7.4 ms vs   63.5 ms   8.6x faster
Condition   5000:   21.0 ms vs  313.9 ms  15.0x faster
Consent     1000:   25.9 ms vs  770.3 ms  29.7x faster
Consent     5000:  113.7 ms vs 3581.9 ms  31.5x faster
Patient     1000:    9.1 ms vs   97.9 ms  10.8x faster
Patient     5000:   30.4 ms vs  437.4 ms  14.4x faster

A direct comparison between the fresh PR instance on 8090 and the existing optimized WIP instance on 8080 showed essentially identical timings, with identical XML byte sizes and only about 1-2% variance.
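
For readers who want to reproduce the "median of runs 2-8" statistic, the following is a small sketch of how such a value can be computed. The class and method names are hypothetical, and treating the skipped first run as warm-up is an assumption; the PR does not say why run 1 is excluded.

```java
import java.util.Arrays;

/** Hypothetical helper; not part of the PR's profiling scripts. */
public final class BenchmarkMedian {

    private BenchmarkMedian() {}

    /**
     * Median of runs firstRun..lastRun (1-based, inclusive).
     * Runs before firstRun are ignored, e.g. as warm-up.
     */
    public static double medianOfRuns(double[] timingsMs, int firstRun, int lastRun) {
        if (firstRun < 1 || lastRun > timingsMs.length || firstRun > lastRun) {
            throw new IllegalArgumentException("run window out of range");
        }
        // copyOfRange takes a 0-based start (inclusive) and end (exclusive).
        double[] window = Arrays.copyOfRange(timingsMs, firstRun - 1, lastRun);
        Arrays.sort(window);
        int mid = window.length / 2;
        return window.length % 2 == 1
            ? window[mid]
            : (window[mid - 1] + window[mid]) / 2.0;
    }
}
```

With eight recorded runs, `medianOfRuns(timings, 2, 8)` takes the median over seven values, so a single slow first request cannot skew the result.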

Notes

This is intentionally a draft PR. The implementation is performance-oriented and should be reviewed especially for maintainability of the XML writer fast paths and whether the profiling artifacts should stay in the final PR.

Default Page Size Benchmark

The same local setup was also benchmarked without an explicit _count parameter. Blaze therefore used its default page size of 50. XML output, median of runs 6-30:

PR 8090 vs old 8081, default page size 50

Encounter:    3.31 ms vs 12.25 ms   3.7x faster
Observation:  3.10 ms vs 13.05 ms   4.2x faster
Condition:    2.93 ms vs  9.62 ms   3.3x faster
Consent:      3.84 ms vs 39.56 ms  10.3x faster
Patient:      2.84 ms vs  7.29 ms   2.6x faster

The speedup is smaller than with _count=1000 or _count=5000, because fixed request, DB, and HTTP overhead account for a larger share of total time at 50 resources per page.
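
The dilution effect can be made explicit with a simple two-term model. This is an illustrative assumption, not something measured in this PR: each request is taken to cost a fixed overhead $F$ (request handling, DB access, HTTP) plus a serialization time $S$ that grows with page size, and $F$ is assumed to be the same on both instances.

$$
\text{speedup} = \frac{T_{\text{old}}}{T_{\text{new}}} = \frac{F + S_{\text{old}}}{F + S_{\text{new}}}
$$

$$
F \gg S_{\text{old}} \;\Rightarrow\; \text{speedup} \to 1,
\qquad
F \ll S_{\text{new}} \;\Rightarrow\; \text{speedup} \to \frac{S_{\text{old}}}{S_{\text{new}}}
$$

Under this model, small pages sit closer to the first limit and large pages closer to the second, which matches the direction of the measured numbers.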

Full Paging Download Benchmark

The following benchmark follows all next links until the complete resource type has been downloaded. XML compares this PR on port 8090 against the old baseline on port 8081. JSON was measured only on the PR instance. Values are medians of three full downloads.

Resource     _count  Pages  Entries    PR XML   Old XML  XML Speedup   PR JSON
Encounter        50   1166   58,271  31.799 s  39.485 s        1.24x  36.458 s
Encounter      1000     59   58,271   3.382 s  10.365 s        3.06x   3.960 s
Encounter      5000     12   58,271   2.183 s   8.818 s        4.04x   2.611 s
Observation      50    384   19,187  12.073 s  14.146 s        1.17x  12.971 s
Observation    1000     20   19,187   1.232 s   3.885 s        3.15x   1.443 s
Observation    5000      4   19,187   0.955 s   3.496 s        3.66x   0.992 s
Condition        50    925   46,217  28.406 s  30.699 s        1.08x  28.437 s
Condition      1000     47   46,217   2.192 s   5.236 s        2.39x   2.421 s
Condition      5000     10   46,217   1.224 s   4.166 s        3.40x   1.335 s
Consent          50    398   19,887  14.529 s  25.196 s        1.73x  16.109 s
Consent        1000     20   19,887   2.941 s  14.897 s        5.07x   3.886 s
Consent        5000      4   19,887   2.440 s  13.995 s        5.74x   3.396 s
Patient          50    581   29,021  17.614 s  19.306 s        1.10x  18.444 s
Patient        1000     30   29,021   1.534 s   4.034 s        2.63x   1.701 s
Patient        5000      6   29,021   0.865 s   3.411 s        3.94x   0.973 s

At the default page size of 50, fixed paging and HTTP overhead dominate, so the XML speedup is modest. With larger pages, XML serialization becomes a larger share of total runtime and the PR shows substantially larger gains. On this dataset, PR XML is also slightly faster than PR JSON for the measured full downloads.
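
A full paging download of the kind measured above can be sketched as follows. This is not the benchmark script used for the PR; `NextLinkSketch`, `nextLink` and `downloadAllPages` are hypothetical names. The sketch relies only on the standard FHIR convention that a search Bundle carries a top-level `link` element with `relation` set to `next` while more pages exist.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.util.Optional;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

/** Hypothetical paging client; not the PR's benchmark tooling. */
public final class NextLinkSketch {

    private static final String FHIR_NS = "http://hl7.org/fhir";

    private NextLinkSketch() {}

    /** Returns the URL of the Bundle.link with relation "next", if any. */
    public static Optional<String> nextLink(String bundleXml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            factory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
            Element bundle = factory.newDocumentBuilder()
                .parse(new ByteArrayInputStream(bundleXml.getBytes(StandardCharsets.UTF_8)))
                .getDocumentElement();
            // Only direct children of Bundle are inspected, so entry-level links are ignored.
            for (Node n = bundle.getFirstChild(); n != null; n = n.getNextSibling()) {
                if (isFhir(n, "link") && "next".equals(childValue(n, "relation"))) {
                    return Optional.ofNullable(childValue(n, "url"));
                }
            }
            return Optional.empty();
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse Bundle XML", e);
        }
    }

    private static boolean isFhir(Node n, String localName) {
        return n.getNodeType() == Node.ELEMENT_NODE
            && FHIR_NS.equals(n.getNamespaceURI())
            && localName.equals(n.getLocalName());
    }

    /** Returns the value attribute of the named child element, or null if absent. */
    private static String childValue(Node parent, String localName) {
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (isFhir(n, localName) && ((Element) n).hasAttribute("value")) {
                return ((Element) n).getAttribute("value");
            }
        }
        return null;
    }

    /** Follows next links from firstPageUrl and returns the number of pages fetched. */
    public static int downloadAllPages(HttpClient client, String firstPageUrl)
            throws IOException, InterruptedException {
        int pages = 0;
        Optional<String> url = Optional.of(firstPageUrl);
        while (url.isPresent()) {
            HttpRequest request = HttpRequest.newBuilder(URI.create(url.get()))
                .header("Accept", "application/fhir+xml")
                .GET()
                .build();
            String body = client.send(request, HttpResponse.BodyHandlers.ofString()).body();
            pages++;
            url = nextLink(body);
        }
        return pages;
    }
}
```

Timing a call to `downloadAllPages` per resource type and page size would give numbers comparable in shape to the table, though client-side XML parsing adds overhead that a leaner benchmark client might avoid.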

astruebi added 7 commits June 9, 2026 10:03
Add profiling scripts and baseline measurements for XML paging while reducing overhead in the XML response writer path.
Precompute tag fragments, streamline repeated field writing, avoid unnecessary sequence allocation and use a faster XML escaping helper in the hot path.
Add a fast path for common search Bundle.entry shapes to avoid generic XML field iteration for fullUrl, resource and search metadata.
Introduce XmlDirectWriter and dispatch selected FHIR complex types through direct XML serialization with regression coverage for Period values.
Add XmlUtf8Writer and route XML serialization through it so primitive XML output can avoid extra character encoding work.
Extend direct XML serialization to additional Period, Coding, Identifier and Meta cases, and optimize the UTF-8 writer for common ASCII output paths.
Capture the optimization context, verification results, benchmark observations and discarded experiments for follow-up review.

alexanderkiel left a comment

Member

I don't like the approach with the low-level XmlUtf8Writer.

Benchmark a clean Woodstox XMLStreamWriter emit path against this branch; that is the experiment worth running. Drop both XmlUtf8Writer and the hand-rolled write-xml-element, and drive WstxOutputFactory directly from your xml-handlers/unform-xml walk (no data.xml Element tree, no emit). That removes the intermediate tree that made main slow, keeps a maintained library doing escaping/encoding, and deletes ~200 lines of hand-rolled byte-twiddling plus the escaping now duplicated in both Java (XmlUtf8Writer.writeEscaped) and Clojure (write-xml-str). If it's within a few percent, the maintenance win is worth it.
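
As a rough idea of what such an emit path could look like, here is a sketch that writes a FHIR Period through the standard StAX `XMLStreamWriter` API without building an element tree. It is an assumption-laden illustration, not code from this PR or from Blaze: `StaxPeriodSketch` and its methods are hypothetical, and it uses whatever `XMLOutputFactory` the JDK resolves. Since Woodstox's `WstxOutputFactory` is an `XMLOutputFactory` implementation, using it should only change how the factory is obtained.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

/** Hypothetical sketch of a tree-free StAX emit path. */
public final class StaxPeriodSketch {

    // Resolved through the standard StAX lookup; Woodstox is picked up if configured.
    private static final XMLOutputFactory FACTORY = XMLOutputFactory.newFactory();

    private StaxPeriodSketch() {}

    /** Writes a Period-shaped element; null parts are skipped. */
    static void writePeriod(XMLStreamWriter w, String name, String start, String end)
            throws XMLStreamException {
        w.writeStartElement(name);
        writeValueElement(w, "start", start);
        writeValueElement(w, "end", end);
        w.writeEndElement();
    }

    /** Writes an empty element with a value attribute; escaping is left to the library. */
    private static void writeValueElement(XMLStreamWriter w, String name, String value)
            throws XMLStreamException {
        if (value != null) {
            w.writeEmptyElement(name);
            w.writeAttribute("value", value);
        }
    }

    /** Serializes a period fragment to a string, for trying the sketch out. */
    public static String periodXml(String start, String end) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try {
            XMLStreamWriter w = FACTORY.createXMLStreamWriter(out, "UTF-8");
            writePeriod(w, "period", start, end);
            w.flush();
            w.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }
}
```

In a real comparison the writer would wrap the response output stream once per Bundle, and the walk would call methods like `writePeriod` for each value, so the benchmark would measure the library's escaping and encoding against the hand-rolled byte path.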

import java.io.OutputStream;
import java.io.Writer;

public final class XmlUtf8Writer extends Writer {

This class is a bit too low-level.
