ENH: add xarray-native open_datatree with engine= parameter#335
ENH: add xarray-native open_datatree with engine= parameter#335aladinor wants to merge 17 commits into
Conversation
|
Check out this pull request on See visual diffs & provide feedback on Jupyter Notebooks. Powered by ReviewNB |
3542304 to
173ae84
Compare
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #335 +/- ##
==========================================
+ Coverage 94.18% 94.41% +0.23%
==========================================
Files 29 29
Lines 6416 6538 +122
==========================================
+ Hits 6043 6173 +130
+ Misses 373 365 -8
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
a9c9519 to
6986717
Compare
|
@aladinor The necessary PR are merged, do you have bandwidth to pick up working on this? This would tremendously improve user experience and overall alignment. |
Implement open_datatree/open_groups_as_dict on three prototype backends (ODIM, CfRadial1, NEXRAD Level2) with supports_groups=True, enabling xr.open_datatree(file, engine="odim") and a unified xd.open_datatree() entry point. Deprecate standalone open_*_datatree functions with FutureWarning. Closes openradar#329 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Resolve rebase conflicts with openradar#333 (station coords + optional_groups): - Thread optional_groups parameter through open_groups_as_dict and _build_groups_dict for all three backends (ODIM, CfRadial1, NEXRAD) - Update test assertions to match optional_groups=False default - Fix read-only array bug in util._ipol_time (use .values.copy())
Rebase fixes: - Resolve conflicts in common.py, odim.py, cfradial1.py, nexrad_level2.py - Keep _prepare_backend_ds from PR openradar#345 alongside _build_groups_dict - Remove unused _attach_sweep_groups imports Bug fixes: - Fix site_coords → site_as_coords parameter name in ODIM and NEXRAD - Fix _assign_root tuple unpacking in NEXRAD open_groups_as_dict - Add incomplete_sweep drop/pad/chunk logic to NEXRAD open_groups_as_dict - Fix site_coords undefined in open_cfradial1_datatree deprecation wrapper - Remove duplicate _deprecation_warning call in open_odim_datatree - Guard against empty sweep list (sweep=[]) with ValueError - Remove unused exp_sweeps variable New tests (30 total, up from 24): - test_site_coords_false: verifies kwarg flows through correctly - test_xr_open_datatree_cfradial1: was missing - test_nexrad_groups_dict_optional_groups: was missing - test_nexrad_empty_sweep_list_raises: edge case - TestEngineRegistry: verifies registry completeness - Deprecation warning count changed from >= 1 to == 1
Phase 2: Convert 7 standard-pattern backends (GAMIC, IRIS, Furuno, Rainbow, DataMet, HPL, Metek) to support open_groups_as_dict() and open_datatree() with supports_groups=True. All 10 backends now registered in _ENGINE_REGISTRY. Code improvements: - Extract _resolve_sweeps helper to common.py, replacing duplicated sweep normalization in all backends - Fix stacklevel=3 → 4 in _deprecation_warning - Fix HPL sweep=[N] incorrect +1 offset - Fix open_hpl_datatree optional=None → True - Fix GAMIC/IRIS deprecated wrappers missing site_as_coords remap - Strip station vars from sweeps in _build_groups_dict Tests (74 total): - All 9 engines parametrized (basic open, int/string sweep, kwargs, empty list guard) - CfRadial1 tested separately (needs engine="h5netcdf") - xr.open_datatree tested for 5 engines - supports_groups verified for all 10 engines - Engine registry exact match assertion - Deprecation FutureWarning tested for all 10 deprecated functions
UF backend: - Add supports_groups=True, open_groups_as_dict(), open_datatree() - Use _resolve_sweeps for sweep normalization - Drop _STATION_VARS from sweeps in groups_dict - Deprecate open_uf_datatree() with FutureWarning - Register "uf" in _ENGINE_REGISTRY (11/11 complete) Bug fixes: - Fix NEXRAD deprecated wrapper: site_coords=site_as_coords instead of hardcoded False - Fix NEXRAD/UF: drop _STATION_VARS from sweep datasets in open_groups_as_dict (matching _build_groups_dict behavior) Tests (87 total): - Add xr.open_datatree tests for all 11 engines - UF added to all parametrized test fixtures - Engine registry now asserts all 11 engines
Phase 2 of PR openradar#335 absorption after rebase onto upstream/main. - Extract `_sweep_attrs_from_msg5_elev(elev)` helper, shared by `_assign_sweep_attrs` and `NexradLevel2BackendEntrypoint.open_groups_as_dict` to avoid duplicating the MSG_5_ELEV -> attrs mapping in two places. - In `open_nexradlevel2_datatree`, translate the legacy `site_coords=` kwarg into the canonical `site_as_coords` parameter instead of silently dropping it. Mirrors the precedent at `xradar/io/backends/cfradial1.py:382` and keeps existing callers using the old name working. - Add `test_open_nexradlevel2_datatree_legacy_site_coords_kwarg` pinning that translation against silent regressions. - Add `test_actual_elevation_cuts_invariant_under_sweep_selection` confirming the root attr reflects what's recorded in the file, not the user-selected sweeps.
…adial2" Phase 3 of PR openradar#335. Adds `CfRadial2BackendEntrypoint` so users can: xr.open_datatree(file, engine="cfradial2") xd.open_datatree(file, engine="cfradial2") xr.open_dataset(file, engine="cfradial2", group="sweep_0") The legacy `open_cfradial2_datatree(...)` becomes a FutureWarning shim that delegates to the new class. Pattern mirrors the established conversions (odim, cfradial1, gamic, iris, nexradlevel2, ...). - Extract `_build_cfradial2_dtree_dict(...)` helper. - Add `CfRadial2BackendEntrypoint` with `open_dataset`, `open_groups_as_dict`, `open_datatree`. The per-group `open_dataset` resolves canonical sweep names (`sweep_2`) against file-native variants (`sweep_02`) and loads the dataset inside the `with open_datatree(...)` block to avoid lazy-load failures after the underlying file handle closes. - Register in `xradar/io/__init__.py:_ENGINE_REGISTRY` and `pyproject.toml [project.entry-points."xarray.backends"]`. - Add session-scoped `cfradial2_file` fixture in `tests/conftest.py`. - Add `open_cfradial2_datatree` to `_DEPRECATED_FUNCTIONS` map (FutureWarning regression guard). - Add four targeted tests for the new public APIs. - Fix `test_open_cfradial2_roundtrip`: cfradial1 attaches station coords to each sweep, cfradial2 places them at root only; drop station coords on the cfradial1 side so the DBZ comparison succeeds.
Phase 4 of PR openradar#335. Adds `supports_groups`, `open_groups_as_dict`, and `open_datatree` to `IMDBackendEntrypoint` so users can: xr.open_datatree(file, engine="imd") xd.open_datatree(file, engine="imd") The legacy `open_imd_datatree(...)` stays public and undeprecated because IMD volumes legitimately span multiple files (one sweep per file, stacked via `util.create_volume`), which the `engine="imd"` registry entry does not support. The function's docstring documents the single-vs-multi-file split with a `.. note::` block. - Extract `_build_single_imd_dtree_dict(...)` helper. - Add the standard xarray decoder kwargs (`mask_and_scale`, `decode_times`, `decode_timedelta=False`, ...) to `open_groups_as_dict` so the datatree path matches `open_dataset`. - Preserve file-handle closer across `_conform_imd_sweep`'s rename chain via `ds.set_close(...)` so `open_datatree(...).close()` properly releases the netcdf4 handle. - Register `IMDBackendEntrypoint` in `_ENGINE_REGISTRY`. - Add `test_xr_open_datatree_imd_engine`, `test_xd_open_datatree_imd_engine`, and `test_open_imd_datatree_no_futurewarning` (carve-out invariant).
Phase 5 of PR openradar#335. Replaces the legacy .ipynb (wiped by openradar#348's MyST move) with docs/notebooks/Open-Datatree-Engine.md and demos the new xr.open_datatree(file, engine=...) API across every registered engine (odim, cfradial1, cfradial2, nexradlevel2, gamic, iris, furuno, rainbow, datamet, hpl, metek, uf, imd), including the IMD multi-file carve-out via open_imd_datatree. - Add xd.io.list_engines() as the public way to enumerate registered engines (replaces a _ENGINE_REGISTRY private import that previously leaked into user docs). - Register the new notebook under the "Get started" toctree in docs/usage.md. - Add test_demo_notebook_lists_all_engines as a bitrot guard so a new entry in _ENGINE_REGISTRY cannot land without a notebook section. - Smoke-tested via jupyter nbconvert --execute end-to-end.
…2 + imd Phase 6 of PR openradar#335. - Add ("cfradial2", "cfradial2_file") to engine_and_file so the five parametrized tests in TestXdOpenDatatree exercise cfradial2. - Add ("imd", "imd_file") with pytest.mark.skip so a future maintainer can't accidentally re-include IMD without noticing it is a single-sweep-per-file carve-out. - Add test_xr_open_datatree_cfradial2 and test_xr_open_datatree_imd to TestXrOpenDatatree. - Add TestIMDMultiFile covering the single-file engine path and the module-level multi-file open_imd_datatree(list_of_paths) path. - Make CfRadial2BackendEntrypoint.open_groups_as_dict accept and *honor* site_coords (False now drops latitude/longitude/altitude from root coords; True is the default). Add TestCfRadial2SiteCoords covering both branches. - Move the empty-sweep-list check into cfradial2's _iter_selected_sweeps so the validation lives next to the parsing logic (single source of truth).
Phase 7 of PR openradar#335. NEXRAD parameters were previously impossible to discover from the rendered docs because every `open_groups_as_dict` and `open_datatree` method shipped with an empty docstring. Fix that for all 13 backends. xradar/io/backends/common.py gains: - COMMON_BACKEND_PARAMS_DOC: the shared NumPy-style Parameters block covering `filename_or_obj`, the seven xarray decoder kwargs, plus xradar-specific `sweep`, `first_dim`, `optional`, `optional_groups`. `optional_groups` explicitly states the three subgroups it controls (`/radar_parameters`, `/georeferencing_correction`, `/radar_calibration`), defaults to False. - REINDEX_PARAMS_DOC, HDF5_PARAMS_DOC, SITE_COORDS_PARAM_DOC, LOCK_PARAM_DOC: per-feature blocks shared across backends. - _compose_docstring(summary, *extra_blocks): textwrap-normalized composer that authors can write at any indent level. Each backend imports the blocks it needs and attaches the composed result via `ClassName.method.__doc__ = _compose_docstring(...)` after the class definition. docs/conf.py: add `autodoc_default_options = {"members": True, ...}` so Sphinx renders the new method docstrings on each entrypoint class. tests/io/test_backend_datatree.py: add TestDocstrings (parametrized over _ENGINE_REGISTRY) so a future refactor cannot silently delete a docstring without breaking a test. Add two unit tests for _compose_docstring.
Phase 8 (final audit) of PR openradar#335. The original Phase 1/2 backend conversion shipped four regressions in the legacy `open_*_datatree` wrappers and in cfradial1's per-sweep handling. Earlier phase smoke tests didn't run the full per-backend test files, so these escaped until the Phase 8 sweep. - odim: translate legacy `site_as_coords=` kwarg to the canonical `site_coords=` accepted by the entrypoint (NEXRAD/cfradial1 pattern). - uf: normalize `"/sweep_N"` NodePath strings and raise ValueError on non-int/non-str list items (NEXRAD pattern). - hpl: HplFile stores a junk lead-in entry at internal `sweep_0` for multi-sweep files. Shift integer sweep indices by +1 ONLY when n_sweeps > 1, so the new engine= API matches upstream behavior. Single-sweep files are unaffected (no junk entry). - cfradial1: drop _STATION_VARS from per-sweep datasets in open_groups_as_dict so station coords live only on the root, matching upstream's _attach_sweep_groups behavior. - history.md: enrich the PR openradar#335 entry to mention cfradial2/imd, xd.io.list_engines(), NumPy-style docstrings, and the demo notebook. The remaining test_open_furuno_datatree failure (asserts 21 vars, gets 18) is pre-existing on upstream/main and unrelated.
f105ff5 to
07709da
Compare
… (18) The assertion `len(dtree[sample_sweep].variables) == 21` was bumped from 18 to 21 by PR openradar#337 with the design intent of keeping latitude/longitude/ altitude as plain data variables on each sweep. But both the legacy `_attach_sweep_groups` and the new `_build_groups_dict` actually drop station vars from sweeps entirely (consistent with the parallel `TestStationCoordsOnRoot` tests in test_io.py, which assert station vars are NOT in sweep coords *and* NOT in sweep data_vars for all 4 backends they cover). Net result: the 21-count assertion was stale on upstream/main as well — this commit just realigns it with the actual cross-backend behavior.
|
@openradar/developers — heads-up to reviewers: this PR was developed with the help of Claude Code (Anthropic's coding agent), specifically the Opus 4.7 (1M context) model ( Because it's a large, cross-cutting change touching every backend + the entrypoint registry + docs + tests, I'd really appreciate:
Known rough edges I'm aware of (happy to address as separate issues or in-PR depending on reviewer preference):
Thanks for your patience reviewing! |
Phase 8 follow-up. Replace every `xd.io.open_<format>_datatree(...)` call in the existing 20 notebooks with the new `xd.open_datatree(file, engine="<format>", ...)` API so the rendered docs teach the canonical surface and stop emitting `FutureWarning`s during doc builds. - 18 notebooks migrated: Assign_GeoCoords, CfRadial1, CfRadial1_Export, CfRadial1_Model_Transformation, Furuno, GAMIC, Georeference_TargetCRS, HaloPhotonics, Iris, MRR, Mapping_Sweeps, NexradLevel2, ODIM_H5, Rainbow, Transform, UF, multiple-sweeps-into-volume-scan, nexrad_read_chunks, plot-ppi. - IMD: single-file path now uses `xd.open_datatree(file, engine="imd")`; multi-file path keeps `xd.io.open_imd_datatree([files])` (the documented carve-out from the engine= API). - Open-Datatree-Engine.md: the one remaining `xd.io.open_odim_datatree` call is intentional — it's in the "deprecated shims" section demonstrating the FutureWarning. - `help(xd.io.open_*_datatree)` / `?xd.io.open_*_datatree` introspection calls updated to `help(xd.open_datatree)` / `?xd.open_datatree`. - HaloPhotonics: drop `backend_kwargs=dict(...)` indirection in favor of direct `latitude=` / `longitude=` kwargs accepted by the entrypoint. Smoke-tested via `pytest -n auto --dist loadscope docs/notebooks`: 25 of 26 notebooks pass. The one failure (`multiple-sweeps-into-volume-scan`) is `RuntimeError: Requested MovieWriter (ffmpeg) not available` — an env-only issue (ffmpeg is not installed in my dev shell, but CI has it). All migrated calls themselves execute cleanly.
|
I think we should delay it until we complete some of the tasks in Milestone 1.0. |
Summary
Closes #329
Adds
open_datatree()andopen_groups_as_dict()methods withsupports_groups = Trueto three prototype backends (ODIM, CfRadial1, NEXRAD Level2), enabling:Changes
xradar/io/backends/common.py: Add shared_build_groups_dict()helper and_deprecation_warning()utilityxradar/io/backends/odim.py: Addsupports_groups=True,open_groups_as_dict(),open_datatree(); refactoropen_odim_datatree()to delegate with deprecation warningxradar/io/backends/cfradial1.py: Same pattern (unique single-file slicing logic)xradar/io/backends/nexrad_level2.py: Same pattern (store-based with sweep resolution)xradar/io/__init__.py: Add engine registry and unifiedopen_datatree(file, engine=)functionxradar/__init__.py: Exportopen_datatreeat package levelrequirements.txt: Exclude broken netCDF4 1.7.3/1.7.4tests/io/test_backend_datatree.py: 24 new tests (integration, deprecation, equivalence)examples/notebooks/Open-Datatree-Engine.ipynb: Demo notebookDepends on
optional_groupsparameter (will require rebase)open_groups_as_dict)Test plan
pytest tests/io/test_backend_datatree.py)ruff checkcleanblack --checkcleanxd.open_datatree()works for all 3 enginesxr.open_datatree()works for all 3 enginesFutureWarningand still produce correct results