Skip to content

ENH: add xarray-native open_datatree with engine= parameter#335

Open
aladinor wants to merge 17 commits into
openradar:mainfrom
aladinor:feat/open-datatree-engine
Open

ENH: add xarray-native open_datatree with engine= parameter#335
aladinor wants to merge 17 commits into
openradar:mainfrom
aladinor:feat/open-datatree-engine

Conversation

@aladinor

@aladinor aladinor commented Feb 20, 2026

Copy link
Copy Markdown
Member

Summary

Closes #329

Note: This is a draft. Full implementation depends on #333 (station coords + optional_groups) and #332 (NEXRAD chunk support) being merged first. After those land, this branch will be rebased to incorporate:

  • optional_groups parameter flow-through in open_groups_as_dict / open_datatree
  • Station coordinates as root coordinates (propagates automatically via _assign_root)
  • NEXRAD chunk list/tuple input support in open_groups_as_dict

Adds open_datatree() and open_groups_as_dict() methods with supports_groups = True to three prototype backends (ODIM, CfRadial1, NEXRAD Level2), enabling:

import xradar as xd
import xarray as xr

# New unified xradar API
dtree = xd.open_datatree("file.h5", engine="odim")

# xarray native API
dtree = xr.open_datatree("file.h5", engine="odim")

# Legacy (emits FutureWarning)
dtree = xd.io.open_odim_datatree("file.h5")

Changes

  • xradar/io/backends/common.py: Add shared _build_groups_dict() helper and _deprecation_warning() utility
  • xradar/io/backends/odim.py: Add supports_groups=True, open_groups_as_dict(), open_datatree(); refactor open_odim_datatree() to delegate with deprecation warning
  • xradar/io/backends/cfradial1.py: Same pattern (unique single-file slicing logic)
  • xradar/io/backends/nexrad_level2.py: Same pattern (store-based with sweep resolution)
  • xradar/io/__init__.py: Add engine registry and unified open_datatree(file, engine=) function
  • xradar/__init__.py: Export open_datatree at package level
  • requirements.txt: Exclude broken netCDF4 1.7.3/1.7.4
  • tests/io/test_backend_datatree.py: 24 new tests (integration, deprecation, equivalence)
  • examples/notebooks/Open-Datatree-Engine.ipynb: Demo notebook

Depends on

Test plan

@review-notebook-app

Copy link
Copy Markdown

Check out this pull request on  ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

@aladinor aladinor force-pushed the feat/open-datatree-engine branch from 3542304 to 173ae84 Compare February 26, 2026 23:18
@codecov

codecov Bot commented Feb 27, 2026

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 98.98477% with 4 lines in your changes missing coverage. Please review.
✅ Project coverage is 94.41%. Comparing base (d723857) to head (9830e01).

Files with missing lines Patch % Lines
xradar/io/__init__.py 95.45% 1 Missing ⚠️
xradar/io/backends/common.py 97.67% 1 Missing ⚠️
xradar/io/backends/hpl.py 96.66% 1 Missing ⚠️
xradar/io/backends/nexrad_level2.py 98.43% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #335      +/-   ##
==========================================
+ Coverage   94.18%   94.41%   +0.23%     
==========================================
  Files          29       29              
  Lines        6416     6538     +122     
==========================================
+ Hits         6043     6173     +130     
+ Misses        373      365       -8     
Flag Coverage Δ
notebooktests 0.00% <0.00%> (ø)
unittests 94.41% <98.98%> (+0.23%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

aladinor added a commit to aladinor/xradar that referenced this pull request Mar 31, 2026
@aladinor aladinor force-pushed the feat/open-datatree-engine branch from a9c9519 to 6986717 Compare March 31, 2026 02:51
@aladinor aladinor marked this pull request as ready for review March 31, 2026 02:55
@aladinor aladinor marked this pull request as draft March 31, 2026 03:04
@aladinor aladinor mentioned this pull request Apr 1, 2026
1 task
@kmuehlbauer

Copy link
Copy Markdown
Collaborator

@aladinor The necessary PR are merged, do you have bandwidth to pick up working on this? This would tremendously improve user experience and overall alignment.

aladinor and others added 15 commits May 11, 2026 18:42
Implement open_datatree/open_groups_as_dict on three prototype backends
(ODIM, CfRadial1, NEXRAD Level2) with supports_groups=True, enabling
xr.open_datatree(file, engine="odim") and a unified xd.open_datatree()
entry point. Deprecate standalone open_*_datatree functions with
FutureWarning.

Closes openradar#329

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Resolve rebase conflicts with openradar#333 (station coords + optional_groups):
- Thread optional_groups parameter through open_groups_as_dict and
  _build_groups_dict for all three backends (ODIM, CfRadial1, NEXRAD)
- Update test assertions to match optional_groups=False default
- Fix read-only array bug in util._ipol_time (use .values.copy())
Rebase fixes:
- Resolve conflicts in common.py, odim.py, cfradial1.py, nexrad_level2.py
- Keep _prepare_backend_ds from PR openradar#345 alongside _build_groups_dict
- Remove unused _attach_sweep_groups imports

Bug fixes:
- Fix site_coords → site_as_coords parameter name in ODIM and NEXRAD
- Fix _assign_root tuple unpacking in NEXRAD open_groups_as_dict
- Add incomplete_sweep drop/pad/chunk logic to NEXRAD open_groups_as_dict
- Fix site_coords undefined in open_cfradial1_datatree deprecation wrapper
- Remove duplicate _deprecation_warning call in open_odim_datatree
- Guard against empty sweep list (sweep=[]) with ValueError
- Remove unused exp_sweeps variable

New tests (30 total, up from 24):
- test_site_coords_false: verifies kwarg flows through correctly
- test_xr_open_datatree_cfradial1: was missing
- test_nexrad_groups_dict_optional_groups: was missing
- test_nexrad_empty_sweep_list_raises: edge case
- TestEngineRegistry: verifies registry completeness
- Deprecation warning count changed from >= 1 to == 1
Phase 2: Convert 7 standard-pattern backends (GAMIC, IRIS, Furuno,
Rainbow, DataMet, HPL, Metek) to support open_groups_as_dict() and
open_datatree() with supports_groups=True. All 10 backends now
registered in _ENGINE_REGISTRY.

Code improvements:
- Extract _resolve_sweeps helper to common.py, replacing duplicated
  sweep normalization in all backends
- Fix stacklevel=3 → 4 in _deprecation_warning
- Fix HPL sweep=[N] incorrect +1 offset
- Fix open_hpl_datatree optional=None → True
- Fix GAMIC/IRIS deprecated wrappers missing site_as_coords remap
- Strip station vars from sweeps in _build_groups_dict

Tests (74 total):
- All 9 engines parametrized (basic open, int/string sweep, kwargs,
  empty list guard)
- CfRadial1 tested separately (needs engine="h5netcdf")
- xr.open_datatree tested for 5 engines
- supports_groups verified for all 10 engines
- Engine registry exact match assertion
- Deprecation FutureWarning tested for all 10 deprecated functions
UF backend:
- Add supports_groups=True, open_groups_as_dict(), open_datatree()
- Use _resolve_sweeps for sweep normalization
- Drop _STATION_VARS from sweeps in groups_dict
- Deprecate open_uf_datatree() with FutureWarning
- Register "uf" in _ENGINE_REGISTRY (11/11 complete)

Bug fixes:
- Fix NEXRAD deprecated wrapper: site_coords=site_as_coords instead
  of hardcoded False
- Fix NEXRAD/UF: drop _STATION_VARS from sweep datasets in
  open_groups_as_dict (matching _build_groups_dict behavior)

Tests (87 total):
- Add xr.open_datatree tests for all 11 engines
- UF added to all parametrized test fixtures
- Engine registry now asserts all 11 engines
Phase 2 of PR openradar#335 absorption after rebase onto upstream/main.

- Extract `_sweep_attrs_from_msg5_elev(elev)` helper, shared by
  `_assign_sweep_attrs` and `NexradLevel2BackendEntrypoint.open_groups_as_dict`
  to avoid duplicating the MSG_5_ELEV -> attrs mapping in two places.
- In `open_nexradlevel2_datatree`, translate the legacy `site_coords=`
  kwarg into the canonical `site_as_coords` parameter instead of
  silently dropping it. Mirrors the precedent at
  `xradar/io/backends/cfradial1.py:382` and keeps existing callers
  using the old name working.
- Add `test_open_nexradlevel2_datatree_legacy_site_coords_kwarg`
  pinning that translation against silent regressions.
- Add `test_actual_elevation_cuts_invariant_under_sweep_selection`
  confirming the root attr reflects what's recorded in the file,
  not the user-selected sweeps.
…adial2"

Phase 3 of PR openradar#335. Adds `CfRadial2BackendEntrypoint` so users can:

    xr.open_datatree(file, engine="cfradial2")
    xd.open_datatree(file, engine="cfradial2")
    xr.open_dataset(file, engine="cfradial2", group="sweep_0")

The legacy `open_cfradial2_datatree(...)` becomes a FutureWarning shim
that delegates to the new class. Pattern mirrors the established
conversions (odim, cfradial1, gamic, iris, nexradlevel2, ...).

- Extract `_build_cfradial2_dtree_dict(...)` helper.
- Add `CfRadial2BackendEntrypoint` with `open_dataset`,
  `open_groups_as_dict`, `open_datatree`. The per-group `open_dataset`
  resolves canonical sweep names (`sweep_2`) against file-native
  variants (`sweep_02`) and loads the dataset inside the
  `with open_datatree(...)` block to avoid lazy-load failures after
  the underlying file handle closes.
- Register in `xradar/io/__init__.py:_ENGINE_REGISTRY` and
  `pyproject.toml [project.entry-points."xarray.backends"]`.
- Add session-scoped `cfradial2_file` fixture in `tests/conftest.py`.
- Add `open_cfradial2_datatree` to `_DEPRECATED_FUNCTIONS` map
  (FutureWarning regression guard).
- Add four targeted tests for the new public APIs.
- Fix `test_open_cfradial2_roundtrip`: cfradial1 attaches station
  coords to each sweep, cfradial2 places them at root only; drop
  station coords on the cfradial1 side so the DBZ comparison
  succeeds.
Phase 4 of PR openradar#335. Adds `supports_groups`, `open_groups_as_dict`,
and `open_datatree` to `IMDBackendEntrypoint` so users can:

    xr.open_datatree(file, engine="imd")
    xd.open_datatree(file, engine="imd")

The legacy `open_imd_datatree(...)` stays public and undeprecated
because IMD volumes legitimately span multiple files (one sweep per
file, stacked via `util.create_volume`), which the `engine="imd"`
registry entry does not support. The function's docstring documents
the single-vs-multi-file split with a `.. note::` block.

- Extract `_build_single_imd_dtree_dict(...)` helper.
- Add the standard xarray decoder kwargs (`mask_and_scale`,
  `decode_times`, `decode_timedelta=False`, ...) to
  `open_groups_as_dict` so the datatree path matches `open_dataset`.
- Preserve file-handle closer across `_conform_imd_sweep`'s rename
  chain via `ds.set_close(...)` so `open_datatree(...).close()`
  properly releases the netcdf4 handle.
- Register `IMDBackendEntrypoint` in `_ENGINE_REGISTRY`.
- Add `test_xr_open_datatree_imd_engine`,
  `test_xd_open_datatree_imd_engine`, and
  `test_open_imd_datatree_no_futurewarning` (carve-out invariant).
Phase 5 of PR openradar#335. Replaces the legacy .ipynb (wiped by openradar#348's MyST
move) with docs/notebooks/Open-Datatree-Engine.md and demos the new
xr.open_datatree(file, engine=...) API across every registered engine
(odim, cfradial1, cfradial2, nexradlevel2, gamic, iris, furuno,
rainbow, datamet, hpl, metek, uf, imd), including the IMD multi-file
carve-out via open_imd_datatree.

- Add xd.io.list_engines() as the public way to enumerate registered
  engines (replaces a _ENGINE_REGISTRY private import that previously
  leaked into user docs).
- Register the new notebook under the "Get started" toctree in
  docs/usage.md.
- Add test_demo_notebook_lists_all_engines as a bitrot guard so a new
  entry in _ENGINE_REGISTRY cannot land without a notebook section.
- Smoke-tested via jupyter nbconvert --execute end-to-end.
…2 + imd

Phase 6 of PR openradar#335.

- Add ("cfradial2", "cfradial2_file") to engine_and_file so the five
  parametrized tests in TestXdOpenDatatree exercise cfradial2.
- Add ("imd", "imd_file") with pytest.mark.skip so a future maintainer
  can't accidentally re-include IMD without noticing it is a
  single-sweep-per-file carve-out.
- Add test_xr_open_datatree_cfradial2 and test_xr_open_datatree_imd to
  TestXrOpenDatatree.
- Add TestIMDMultiFile covering the single-file engine path and the
  module-level multi-file open_imd_datatree(list_of_paths) path.
- Make CfRadial2BackendEntrypoint.open_groups_as_dict accept and
  *honor* site_coords (False now drops latitude/longitude/altitude
  from root coords; True is the default). Add TestCfRadial2SiteCoords
  covering both branches.
- Move the empty-sweep-list check into cfradial2's
  _iter_selected_sweeps so the validation lives next to the parsing
  logic (single source of truth).
Phase 7 of PR openradar#335.

NEXRAD parameters were previously impossible to discover from the
rendered docs because every `open_groups_as_dict` and `open_datatree`
method shipped with an empty docstring. Fix that for all 13 backends.

xradar/io/backends/common.py gains:
- COMMON_BACKEND_PARAMS_DOC: the shared NumPy-style Parameters block
  covering `filename_or_obj`, the seven xarray decoder kwargs, plus
  xradar-specific `sweep`, `first_dim`, `optional`, `optional_groups`.
  `optional_groups` explicitly states the three subgroups it controls
  (`/radar_parameters`, `/georeferencing_correction`,
  `/radar_calibration`), defaults to False.
- REINDEX_PARAMS_DOC, HDF5_PARAMS_DOC, SITE_COORDS_PARAM_DOC,
  LOCK_PARAM_DOC: per-feature blocks shared across backends.
- _compose_docstring(summary, *extra_blocks): textwrap-normalized
  composer that authors can write at any indent level.

Each backend imports the blocks it needs and attaches the composed
result via `ClassName.method.__doc__ = _compose_docstring(...)` after
the class definition.

docs/conf.py: add `autodoc_default_options = {"members": True, ...}`
so Sphinx renders the new method docstrings on each entrypoint class.

tests/io/test_backend_datatree.py: add TestDocstrings (parametrized
over _ENGINE_REGISTRY) so a future refactor cannot silently delete a
docstring without breaking a test. Add two unit tests for
_compose_docstring.
Phase 8 (final audit) of PR openradar#335. The original Phase 1/2 backend
conversion shipped four regressions in the legacy `open_*_datatree`
wrappers and in cfradial1's per-sweep handling. Earlier phase smoke
tests didn't run the full per-backend test files, so these escaped
until the Phase 8 sweep.

- odim: translate legacy `site_as_coords=` kwarg to the canonical
  `site_coords=` accepted by the entrypoint (NEXRAD/cfradial1 pattern).
- uf: normalize `"/sweep_N"` NodePath strings and raise ValueError on
  non-int/non-str list items (NEXRAD pattern).
- hpl: HplFile stores a junk lead-in entry at internal `sweep_0` for
  multi-sweep files. Shift integer sweep indices by +1 ONLY when
  n_sweeps > 1, so the new engine= API matches upstream behavior.
  Single-sweep files are unaffected (no junk entry).
- cfradial1: drop _STATION_VARS from per-sweep datasets in
  open_groups_as_dict so station coords live only on the root,
  matching upstream's _attach_sweep_groups behavior.
- history.md: enrich the PR openradar#335 entry to mention cfradial2/imd,
  xd.io.list_engines(), NumPy-style docstrings, and the demo
  notebook.

The remaining test_open_furuno_datatree failure (asserts 21 vars,
gets 18) is pre-existing on upstream/main and unrelated.
@aladinor aladinor force-pushed the feat/open-datatree-engine branch from f105ff5 to 07709da Compare May 12, 2026 15:17
@aladinor aladinor marked this pull request as ready for review May 12, 2026 15:17
… (18)

The assertion `len(dtree[sample_sweep].variables) == 21` was bumped from
18 to 21 by PR openradar#337 with the design intent of keeping latitude/longitude/
altitude as plain data variables on each sweep. But both the legacy
`_attach_sweep_groups` and the new `_build_groups_dict` actually drop
station vars from sweeps entirely (consistent with the parallel
`TestStationCoordsOnRoot` tests in test_io.py, which assert station vars
are NOT in sweep coords *and* NOT in sweep data_vars for all 4 backends
they cover).

Net result: the 21-count assertion was stale on upstream/main as well —
this commit just realigns it with the actual cross-backend behavior.
@aladinor aladinor requested review from kmuehlbauer and removed request for kmuehlbauer May 12, 2026 15:43
@aladinor

Copy link
Copy Markdown
Member Author

@openradar/developers — heads-up to reviewers: this PR was developed with the help of Claude Code (Anthropic's coding agent), specifically the Opus 4.7 (1M context) model (claude-opus-4-7). Concretely, the agent drove the 8-phase rebase + per-backend conversion + docstring normalisation, with each phase gated by my review/approval and run through a 5-skill audit pass (/simplify, /xarray, /python-design-patterns, /python-code-review, /python-testing) before commit.

Because it's a large, cross-cutting change touching every backend + the entrypoint registry + docs + tests, I'd really appreciate:

  • Multiple human reviewers on this one. The xarray-native engine= surface is now the canonical user-facing API, so any subtle behavior drift between backends will be hard to walk back later.
  • AI-assisted review is welcome too — if you have Claude / Copilot / Cursor / etc. set up to review PRs, please run it. The agent's own audits flagged the regressions I fixed in commits caf1937, 07709da, and db24741, but an independent AI pass (different model, different prompts) would help catch anything my agent's audits glossed over.

Known rough edges I'm aware of (happy to address as separate issues or in-PR depending on reviewer preference):

Thanks for your patience reviewing!

Phase 8 follow-up. Replace every `xd.io.open_<format>_datatree(...)`
call in the existing 20 notebooks with the new
`xd.open_datatree(file, engine="<format>", ...)` API so the rendered
docs teach the canonical surface and stop emitting `FutureWarning`s
during doc builds.

- 18 notebooks migrated: Assign_GeoCoords, CfRadial1, CfRadial1_Export,
  CfRadial1_Model_Transformation, Furuno, GAMIC, Georeference_TargetCRS,
  HaloPhotonics, Iris, MRR, Mapping_Sweeps, NexradLevel2, ODIM_H5,
  Rainbow, Transform, UF, multiple-sweeps-into-volume-scan,
  nexrad_read_chunks, plot-ppi.
- IMD: single-file path now uses `xd.open_datatree(file, engine="imd")`;
  multi-file path keeps `xd.io.open_imd_datatree([files])` (the
  documented carve-out from the engine= API).
- Open-Datatree-Engine.md: the one remaining `xd.io.open_odim_datatree`
  call is intentional — it's in the "deprecated shims" section
  demonstrating the FutureWarning.
- `help(xd.io.open_*_datatree)` / `?xd.io.open_*_datatree` introspection
  calls updated to `help(xd.open_datatree)` / `?xd.open_datatree`.
- HaloPhotonics: drop `backend_kwargs=dict(...)` indirection in favor
  of direct `latitude=` / `longitude=` kwargs accepted by the entrypoint.

Smoke-tested via `pytest -n auto --dist loadscope docs/notebooks`:
25 of 26 notebooks pass. The one failure
(`multiple-sweeps-into-volume-scan`) is `RuntimeError: Requested
MovieWriter (ffmpeg) not available` — an env-only issue (ffmpeg is
not installed in my dev shell, but CI has it). All migrated calls
themselves execute cleanly.
@syedhamidali

Copy link
Copy Markdown
Member

I think we should delay it until we complete some of the tasks in Milestone 1.0.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Development

Successfully merging this pull request may close these issues.

Implement xarray-native open_datatree with engine parameter

3 participants