DOC: AI Usage Policy (draft for discussion) #363
Adds docs/ai_policy.md, adapted from xarray's policy (doc/contribute/ai-policy.md) with attribution, and xradar-specific additions:
- "CI, Packaging, and Dependency Changes" subsection requiring an issue-first discussion before any AI-assisted PR touches GitHub Actions, dependencies, pyproject/environment files, pre-commit config, or security-sensitive areas. Rationale: supply-chain risk, raised by @zssherman in openradar#354.
- "Disclosing AI Usage" section recommending (not requiring) that PR descriptions note the tool/model and version when AI was used.
Also:
- Short "AI Usage Policy" pointer section in CONTRIBUTING.md linking to the full policy.
- Adds ai_policy to docs/index.md toctree.
- History entry (PR number placeholder).
Discussed in openradar#354.
Reframe "Large AI-Assisted Contributions" -> "Prefer Small PRs and Open an Issue First". Drop the absolute 2,000-line example in favor of review burden as the criterion (a 100-line change can also be too dense to review in isolation). Add an explicit "strongly encouraged" issue-first step so maintainers can validate scope and structure before any code is written.
Codecov Report
✅ All modified and coverable lines are covered by tests.
Coverage Diff

| | main | #363 | +/- |
|---|---|---|---|
| Coverage | 93.85% | 93.85% | |
| Files | 28 | 28 | |
| Lines | 6165 | 6165 | |
| Hits | 5786 | 5786 | |
| Misses | 379 | 379 | |
This looks great! Thanks for putting this together!
I want to bring this to your attention: the maintainers of GDAL are currently discussing restricting AI/LLM usage in OSGeo/gdal#14500. The arguments on the gdal-dev list are also worth checking: https://lists.osgeo.org/pipermail/gdal-dev/2026-May/061592.html. A particularly interesting example of AI introducing bloated code that an experienced maintainer did not catch in review:
There is a nice overview of different AI policies over at https://github.com/melissawm/open-source-ai-contribution-policies. A nice blog post from Stéfan van der Walt covers many aspects of the ongoing discussions: https://blog.scientific-python.org/scientific-python/community-considerations-around-ai/. A main concern in many of the above is copyright. The suggested policy does not mention copyright or code ownership at all. It is worth discussing how we should go forward.
This is an interesting and difficult topic which redefines our work. Here are a few comments:
As mentioned before, GDAL, a cornerstone project in geospatial (MIT license), just adapted their AI policy, restricting AI usage to the minimum. We should be very careful in adopting AI usage, especially for the following points:
- Can we legally prove that a third party doesn't actually own LLM-generated code which is about to be merged into our code base?
- Is it possible at all to check that LLM-generated code is not violating any licenses?
- Who owns the LLM-generated code, if authorship requires human creation?
I created an experimental PR using AI for both implementation and review to help the discussion (#383).
@kmuehlbauer I don't think there is any practical way to enforce copyright other than requiring that the human submitting the PR attest that they have not violated copyright (and anyone can simply claim they have not).
@rcjackson Yes, sure, we are trusting each contributor that the submitted code is of their own origin and/or can be ingested legally. But given that we know LLMs have been trained on FOSS and any other available code, how can the human submitting the PR attest to that at all? How should the human know whether the LLM reproduced a copyrighted part of its training data? We would be shifting our trust in humans to trust in black boxes. Who is the author of that particular generated code? Is it the human submitter, or does it belong to the public domain? There are so many unanswered questions regarding LLM-generated code that I find it difficult, if not impossible, to use it safely.
@kmuehlbauer there is no guarantee that a human-only PR does not violate copyright either, if you are going to apply that standard to every PR. A human can do the same thing, and humans often do learn from FOSS. If you apply your standards, then no PR can ever be trusted, human or AI.
@rcjackson Thanks, that's all valid. And, true, there is no such guarantee. A human contributor may be imperfect, but they are still a legible and accountable source. An LLM is not. If we treat both as equivalent, we are not being consistent, IMHO. The only escape hatch is to trust the human submitter of LLM-generated code as if it were entirely their own code. Going down that path still does not account for any future legal developments. Yes, maybe I'm too cautious and unsure about it. But others obviously feel the same, if we look at how the different AI policies are evolving across projects and organizations. There is still no real consensus on how to properly classify or govern LLM-generated contributions. That lack of consensus is itself a signal that the problem is not settled at all. I'm with those who converge on caution rather than assuming equivalence between human and LLM-generated code. It would be great to hear more opinions. Should we be more strict or more permissive here? What concrete criteria would make LLM-assisted contributions clearly acceptable without weakening our expectations around authorship, accountability, and provenance?
Summary
Opens for community discussion an AI Usage Policy for xradar, adapted from xarray's AI Usage Policy with two xradar-specific additions.
This is a draft so the whole community can chime in before we merge. Continues the discussion started in #354.
What's in the draft
- docs/ai_policy.md: full policy (attribution to xarray in the preamble).
- CONTRIBUTING.md: short pointer section between "Types of Contributions" and "Get Started!" linking to the full doc.
- docs/index.md: new ai_policy entry in the toctree.
- docs/history.md: Development entry (PR-number placeholder to update once this lands).
Structure
The policy mirrors xarray's structure so folks coming from that ecosystem find it familiar:
pyproject.toml/environment.yml/pre-commit config, and security-sensitive areas require an issue-first discussion; AI is not a reliable guide to the security or maintenance implications of a new dependency. Raised by @zssherman in MNT: Improve Contributing Guide #354.
Discussion points
This is a policy document — language matters more than usual. Feedback especially welcome on:
docs/ai_policy.md or somewhere else (e.g. a new docs/contribute/ subfolder to mirror xarray's layout).
CC: @kmuehlbauer @syedhamidali @mgrover1 @egouden @zssherman @scollis @rcjackson @jrobrien91
Test plan
ai_policy in the toctree (cd docs && make html).
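As a discussion aid only, the issue-first rule for CI, packaging, and dependency changes could be sketched as a simple path check. This is a minimal sketch, not part of the PR: the function name needs_issue_first and the pattern list are hypothetical, and the exact file names (for example .pre-commit-config.yaml) are assumptions rather than paths confirmed in the xradar repository.

```python
from fnmatch import fnmatchcase

# Hypothetical patterns for the areas the draft policy names: GitHub Actions,
# pyproject/environment files, and pre-commit config. The concrete file names
# are assumptions, not taken from the xradar repository.
SENSITIVE_PATTERNS = [
    ".github/workflows/*",
    "pyproject.toml",
    "environment.yml",
    ".pre-commit-config.yaml",
]


def needs_issue_first(changed_paths):
    """Return the changed paths that match any sensitive pattern."""
    return [
        path
        for path in changed_paths
        if any(fnmatchcase(path, pattern) for pattern in SENSITIVE_PATTERNS)
    ]


# Example: a PR touching one docs file, one workflow, and pyproject.toml.
flagged = needs_issue_first(
    ["docs/ai_policy.md", ".github/workflows/ci.yml", "pyproject.toml"]
)
print(flagged)
```

A check like this could only flag candidate PRs for a maintainer to look at; whether a change is security-sensitive still needs human judgment.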