
Consolidate Asset/Dandiset models; gate publish validation on datePublished #419

Draft
candleindark wants to merge 1 commit into master from consolidate-models



candleindark commented Jun 8, 2026


Summary

Make every model class's schemaKey value equal its class name, so that an eventual LinkML translation can use schemaKey as its type designator (designates_type). Three classes violated this (BareAsset → "Asset", PublishedAsset → "Asset", PublishedDandiset → "Dandiset"), and fixing them takes two distinct changes:

  1. Consolidate the publication-specific variants into their base classes: PublishedDandiset merges into Dandiset and PublishedAsset into Asset. Publication requirements now fire via a datePublished-gated validator instead of separate classes.
  2. Align BareAsset.schemaKey to "BareAsset".

Stored schemaKey values in archive data are unchanged, and the old names remain as deprecated aliases so dandi-cli/dandi-archive imports keep working. Adopting this release does, however, require one lockstep dandi-cli change (see Follow-ups).
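
A rough illustration of why the key/name mismatch matters: a type designator resolves a record's class from its schemaKey, which only works if each key names exactly one class. The sketch below uses hypothetical stand-in classes and a hypothetical `build_registry` helper (neither is dandischema code) and applies the same schemaKey == class-name rule this PR gives ensure_schemakey; because class names are unique, that rule also rules out two classes sharing a key.

```python
from typing import Dict


class Dandiset:
    schemaKey = "Dandiset"


class PublishedDandiset:
    # Pre-consolidation shape: the published variant reused its base class's key.
    schemaKey = "Dandiset"


def build_registry(*classes: type) -> Dict[str, type]:
    """Map each schemaKey to one class, refusing keys that differ from the class name."""
    registry: Dict[str, type] = {}
    for cls in classes:
        key = getattr(cls, "schemaKey")
        if key != cls.__name__:
            raise ValueError(f"{cls.__name__}.schemaKey is {key!r}, not its class name")
        registry[key] = cls
    return registry
```

`build_registry(Dandiset)` succeeds, while `build_registry(Dandiset, PublishedDandiset)` raises `ValueError` instead of letting one class silently shadow the other under the "Dandiset" key.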

What changed (all in dandischema/models.py)
  • Merge PublishedDandiset → Dandiset and PublishedAsset → Asset. Publication-only fields (doi, publishedBy, datePublished, releaseNotes on Dandiset; publishedBy, datePublished on Asset) become optional, and the publication requirements move into a datePublished-gated check_publication_status model validator:

    • datePublished is None (draft) → publication-only fields must be absent;
    • datePublished set (published) → enforce the former Published* rules (publishedBy/url/doi presence, stricter id/url patterns, check_filesbytes, digest_sha256check), reporting all violations in one error.

    dandi-archive injects datePublished before validating, so the gated checks fire exactly as the Published* classes did.

  • Keep BareAsset (Asset still inherits it); align its schemaKey to Literal["BareAsset", "Asset"], so that Asset can narrow it to "Asset" (the same idiom Contributor and Activity already use).

  • Remove the Publishable mixin; add PublishedDandiset/PublishedAsset as deprecated aliases.

  • Simplify ensure_schemakey to a plain schemaKey == class-name check.

  • to_datacite asserts the published precondition (datePublished set) that the PublishedDandiset type used to guarantee.
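
The gating logic can be sketched without Pydantic. The PR implements it as a Pydantic model validator named check_publication_status; the stand-in below runs the same kind of check from a dataclass `__post_init__` so it executes with the standard library alone. `DandisetSketch`, `PublicationError`, and the simplified published-id pattern are assumptions for illustration, not the real dandischema fields or patterns, and the check_filesbytes/digest_sha256check parts are omitted.

```python
import re
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional

# Hypothetical, simplified stand-in for the stricter published-id pattern.
PUBLISHED_ID = re.compile(r"DANDI:\d{6}/\d+\.\d+\.\d+")


class PublicationError(ValueError):
    """Carries every publication-rule violation found in one pass."""

    def __init__(self, problems: List[str]) -> None:
        super().__init__("; ".join(problems))
        self.problems = problems


@dataclass
class DandisetSketch:
    # Only the fields the gate inspects are modeled here.
    id: str
    url: Optional[str] = None
    doi: Optional[str] = None
    publishedBy: Optional[str] = None
    datePublished: Optional[datetime] = None
    releaseNotes: Optional[str] = None

    def __post_init__(self) -> None:
        # Runs on every construction, so the gate travels with the data.
        self.check_publication_status()

    def check_publication_status(self) -> None:
        problems: List[str] = []
        if self.datePublished is None:
            # Draft: publication-only fields must be absent.
            for name in ("doi", "publishedBy", "releaseNotes"):
                if getattr(self, name) is not None:
                    problems.append(f"{name} must be absent on a draft")
        else:
            # Published: requirements the Published* class used to impose.
            for name in ("publishedBy", "url", "doi"):
                if getattr(self, name) is None:
                    problems.append(f"{name} is required once datePublished is set")
            if not PUBLISHED_ID.fullmatch(self.id):
                problems.append("id does not match the published-id pattern")
        if problems:
            raise PublicationError(problems)
```

Constructing `DandisetSketch(id="bad", datePublished=datetime(2024, 1, 1))` raises once with four problems (three missing fields plus the id pattern), mirroring the "report all violations in one error" behavior described above.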

Why gate on datePublished rather than an explicit (context-triggered) validator

The publish-only checks could instead be triggered explicitly by the caller (e.g. a Pydantic validation-context flag passed by the publish flow) rather than by the presence of datePublished. We chose data-driven gating because:

  • It can be expressed in LinkML; an explicit context cannot. Gating on a real field is a conditional constraint on the data, which is exactly what a LinkML rule expresses (gen-json-schema emits it as if/then). A caller-supplied validation mode has no LinkML equivalent — LinkML describes only the data instance, not how validation is invoked — so it would have to live as hand-written Python forever, outside the source of truth, working against the migration this change is meant to enable.
  • It travels with the data. The gated checks fire through both validate() and direct construction (Dandiset(**data) / model_validate). A context flag only rides model_validate(..., context=...): the terse PublishedDandiset(**meta) form (used by to_datacite and in tests) cannot pass one, so it would silently skip the publish checks. Beyond the call-site rewrites this would force, a constructor whose enforcement quietly depends on how it is invoked is a lasting programming hazard.
  • It matches the domain. datePublished present ⟺ published is a real invariant: a draft never carries it, and the archive injects it only when building a publishable version. "Published" is a state of the record, not a mode a caller opts into.
  • Published-ness stays introspectable. With gating, "is this published?" is just obj.datePublished is not None; with a context flag, whether an object was validated as published is ephemeral to the call and recorded nowhere on the object.
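
To make the first point concrete, here is roughly what a data-driven gate looks like as a JSON Schema if/then/else conditional, the form gen-json-schema is said above to emit for a LinkML rule. Both the schema fragment and the `satisfies` evaluator are hand-written illustrations: the exact gen-json-schema output is not reproduced, the field lists are simplified, and the evaluator handles only the few keywords the fragment uses, so it is no substitute for a real JSON Schema validator. Note that nothing in it refers to how validation was invoked; the branch is chosen purely from the instance.

```python
from typing import Any, Dict

# Hypothetical approximation of the conditional constraint on a Dandiset record.
PUBLICATION_RULE: Dict[str, Any] = {
    "if": {"required": ["datePublished"]},
    "then": {"required": ["publishedBy", "url", "doi"]},
    "else": {
        "not": {
            "anyOf": [
                {"required": ["doi"]},
                {"required": ["publishedBy"]},
                {"required": ["releaseNotes"]},
            ]
        }
    },
}


def satisfies(schema: Dict[str, Any], instance: Dict[str, Any]) -> bool:
    """Evaluate only the keywords used above: required, not, anyOf, if/then/else."""
    if "required" in schema and not all(k in instance for k in schema["required"]):
        return False
    if "not" in schema and satisfies(schema["not"], instance):
        return False
    if "anyOf" in schema and not any(satisfies(s, instance) for s in schema["anyOf"]):
        return False
    if "if" in schema:
        branch = "then" if satisfies(schema["if"], instance) else "else"
        if branch in schema and not satisfies(schema[branch], instance):
            return False
    return True
```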

Compatibility
  • Generated JSON Schema (verified by before/after diff): draft Dandiset/Asset gain only optional, readOnly publication properties — nothing newly required, no pattern tightened — so the dandi-archive Meditor is unaffected (it hides readOnly props). The published-*.json schemas become the relaxed versions, with publication strictness enforced by the gated validator.
  • PublishedDandiset/PublishedAsset/BareAsset imports continue to work in dandi-cli/dandi-archive.

Follow-ups (separate PRs)
  • dandi-cli (lockstep with adopting this release): set schemaKey = "Asset" on bare metadata at upload, since BareAsset.schemaKey is now "BareAsset" and the server does not normalize it.
  • Later: remove the deprecated aliases once consumers adopt the consolidated names (and update dandi-archive's views/schema.py model mapping).
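
On the dandi-cli side, the lockstep change amounts to retagging the metadata before it is sent. The helper below is hypothetical (dandi-cli's actual upload code and function names are not part of this PR); it only illustrates the rule that the client, not the server, sets schemaKey to "Asset".

```python
from typing import Any, Dict


def as_asset_metadata(bare: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical upload-side helper: retag BareAsset metadata as an Asset.

    The server does not normalize schemaKey, so the client must send "Asset".
    """
    key = bare.get("schemaKey")
    if key not in (None, "BareAsset", "Asset"):
        raise ValueError(f"unexpected schemaKey {key!r} for asset metadata")
    # Return a copy so the caller's BareAsset metadata is left untouched.
    return {**bare, "schemaKey": "Asset"}
```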

Test plan

  • tox -e py3 (269 passed, 17 skipped — skips are environment-only: no DOI_PREFIX / instance name not "DANDI")
  • tox -e lint,typing
  • Generated JSON Schema before/after diff reviewed
  • Added tests for the datePublished-gated publish requirements and the coherence invariant
  • Confirm against dandi-archive's publish/validation flow once it adopts this release


codecov Bot commented Jun 8, 2026


Codecov Report

❌ Patch coverage is 37.77778% with 56 lines in your changes missing coverage. Please review.
✅ Project coverage is 48.45%. Comparing base (d752738) to head (1298b26).

Files with missing lines            Patch %   Lines
dandischema/models.py               0.00%     55 Missing ⚠️
dandischema/datacite/__init__.py    50.00%    1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master     #419      +/-   ##
==========================================
+ Coverage   48.31%   48.45%   +0.13%     
==========================================
  Files          19       19              
  Lines        2434     2456      +22     
==========================================
+ Hits         1176     1190      +14     
- Misses       1258     1266       +8     
Flag        Coverage Δ
unittests   48.45% <37.77%> (+0.13%) ⬆️

Flags with carried forward coverage won't be shown.


…Published

Collapse the publication-specific model variants into their base classes so
that each class's schemaKey value matches its class name, which is what an
eventual LinkML translation needs to use schemaKey as a type designator
(designates_type). The three classes whose schemaKey differed from their class
name (BareAsset -> "Asset", PublishedAsset -> "Asset", PublishedDandiset ->
"Dandiset") were the blockers.

Changes (all in dandischema/models.py):

- Merge PublishedDandiset into Dandiset and PublishedAsset into Asset. The
  publication-only fields (doi, publishedBy, datePublished, releaseNotes on
  Dandiset; publishedBy, datePublished on Asset) become optional, and the
  publication requirements move into a datePublished-gated
  `check_publication_status` model validator on each class:
    - when datePublished is None (a draft), the publication-only fields must be
      absent;
    - when datePublished is set (published), enforce the former Published*
      requirements (publishedBy/url/doi presence, the stricter id/url patterns,
      check_filesbytes, digest_sha256check). All violations are reported
      together in one error.
  dandi-archive's publish flow injects datePublished before validating, so the
  gated checks fire exactly as the Published* classes did before.

- Keep BareAsset as a distinct class (Asset still inherits from it) but align
  its schemaKey to Literal["BareAsset"], so both BareAsset and Asset are
  schemaKey-aligned. The client (dandi-cli) is responsible for setting
  schemaKey to "Asset" when uploading bare metadata as an Asset.

- Remove the Publishable mixin; add PublishedDandiset/PublishedAsset as
  deprecated aliases of Dandiset/Asset for backward compatibility (dandi-cli,
  dandi-archive). These will be removed in a follow-up once consumers migrate.

- Simplify DandiBaseModel.ensure_schemakey to a plain schemaKey == class-name
  check now that no class intentionally diverges.

to_datacite now asserts the published precondition (datePublished set) that the
PublishedDandiset type used to guarantee. Tests updated for the merged models:
the published variants report the same missing fields as their base classes,
and the publication requirements are exercised on complete, datePublished
instances; new tests cover the publication-coherence invariant.

Generated JSON Schema diff: the draft schemas (Dandiset, Asset) gain only
optional, readOnly publication properties (nothing newly required, no pattern
tightened), so the dandi-archive Meditor is unaffected; the published-*.json
schemas become the relaxed versions, with publication strictness now enforced
by the gated Pydantic validator.

Co-Authored-By: Claude Code 2.1.161 / Claude Opus 4.8 claude-opus-4-8 <noreply@anthropic.com>

Labels

None yet

Projects

None yet


1 participant