Skip to content

Doc fix for transpose#521

Merged
sghelichkhani merged 1 commit into
mainfrom
sghelichkhani/richards-core-01-equation-infra
May 28, 2026
Merged

Doc fix for transpose#521
sghelichkhani merged 1 commit into
mainfrom
sghelichkhani/richards-core-01-equation-infra

Conversation

@sghelichkhani

@sghelichkhani sghelichkhani commented May 22, 2026

Copy link
Copy Markdown
Collaborator

First of five PRs decomposing the Richards-equation foundation work (see ANATOMY-BRANCHES.md on sghelichkhani/richards-core for the full split). This is the equation-infrastructure piece; nothing Richards-specific lands here.

Equation gains two reserved eq_attrs (solution and nonlinear_coefficients) plus a new resolve_coefficient method that substitutes the equation's solution by the current trial via ufl.replace. Term implementations call resolve_coefficient instead of reading the attribute directly, so a single residual term can serve both linear and nonlinear coefficients (the substitution is a no-op when the expression does not contain the solution).

scalar_equation.diffusion_term is migrated to use the new mechanism. Existing callers that pass a solution-independent diffusivity see no behavioural change. New callers (e.g. Richards' K(h) in the follow-up PR) can declare the coefficient as a UFL expression in the equation's solution and list its attribute name in nonlinear_coefficients. The Equation constructor validates that the named attribute is a UFL expression actually containing solution, catching the common foot-gun of building the coefficient against a different Function.

About 70 lines across two files.

Comment thread gadopt/equations.py Outdated
"""
val = getattr(self, name)
solution = getattr(self, "solution", None)
if solution is None or not isinstance(val, ufl.core.expr.Expr):

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should you not check whether val is in "nonlinear_coefficients"? e.g. this may be the diffusion term asking for the diffusivity but it's not used as a non-linear coefficient, but we are using a different non-linear coefficient elsewhere in the equation. In that case solution is not None and so it will try to replace solution; if we simply wanted to lag diffusivity (hence it's not non-linear but does depend on solution) that would do the wrong thing.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is now fully addressed in the redesign. resolution is now gated on membership in nonlinear_coefficients, not on solution is not None, so a coefficient that merely happens to depend on the solution is never substituted unless you explicitly declare it

@stephankramer

Copy link
Copy Markdown
Collaborator

So this is tricky business with implications for future extensions that I'd like to consider. So we're trying to deal with coefficients that depend on the solution, making the term nonlinear. Of course, the distinction between coefficients and the actual term is a bit arbitrary, e.g. you could implement a specific nonlinear diffusion term where the solution-dependent diffusion is just hard-coded as function of trial - but that's of course not very flexible. Other example we could use the source_term, declare the "source" coefficient to be a nonlinear coefficient source=-alpha*solution - and all of a sudden we have a sink-term. However, it's not completely arbitrary: one important restriction is that the coefficient should be point-wise expressions, i.e. they should not attempt to have some kind of spatial derivative in them (unless you know what you're doing). So we define a (non-conservative) advection term as $$u\cdot\nabla c$$ where $$c$$ solution, and $$u$$ is a vector that optionally also depends on the solution: $$u(c)$$ - or you could implement an even more general "advection-like" term $$u(c)\cdot\nabla f(c)$$ - and the term implementation takes care of spatial discretisation things like integrating by parts, jump terms etc. The other consideration, which is a little more subtle and subjective, of distinguishing the trial function as it occurs explicitly in the term and as it occurs through solution-dependent coefficients - is because we don't always want to solve the time-discretised equation in its full non-linear form: a very common approach is to lag the solution in what is thought of as the coefficients, e.g.:
$$\frac{c^{n+1} - c^n}{\Delta t} + u(c^\ast) \cdot \nabla c^{n+1} - \kappa(c^\ast) \nabla^2 c^{n+1} = 0$$
where $$c^\ast$$ is a previous value (e.g. $$u^n$$ or you could iterate this in a Picard iteration). So this is why in our previous version of the equations (based on the Thetis code) has these u_lagged arguments. So you linearize the equations (in a physics dependent way) around the previous timestep: you're still solving the nonlinear PDE, except after time-discretisation it becomes linear. For the shallow water equations you typically do

$$\begin{aligned} \frac{u^{n+1} - u^n}{\Delta t} + u^\ast\cdot\nabla u^{n+1} + g\eta^{n+1} = 0 \\\ \frac{\eta^{n+1} - \eta^n}{\Delta t} + \nabla\cdot\left((b+\eta^*)u^{n+1}\right) = 0 \end{aligned}$$

Note that in the $$\nabla\cdot(Hu)$$ term we use $$u^{n+1}$$ so we need the trial of $$u$$ even though it's the equation (with the time derivative) in $$\eta$$. So things get a bit more complicated when we solve the equations coupled.

Sorry this a bit of blurb, just wanted to think ahead a bit - but I think your solution should of using ufl.replace works nicely for all of that. So I think the assumption should be that all terms are linear in trial (as far as trial appears explicitly in the formula) as they are now, and then non-linearity may enter through "non-linear coefficients" .

Two points I like a little less:

  1. The way the solution and nonlinear_coefficients are stuffed as "special" entries in the eq_attrs dict. Yes we can stuff all arguments in a dict, instead of having separate arguments, doesn't mean we should :-). I feel the existing entries in eq_attrs are very different things, they are things we want to pass on directly to the terms - whereas solution and nonlinear_coefficients is data that is used by the equation itself, i.e. we don't want to see eq.solution being used in a term directly.
  2. Your mechanism of declaring nonlinear coefficients relies on the term that uses it "to do the right thing", i.e. use kappa = eq.resolve_coefficient("diffusivity", trial) instead of kappa = eq.diffusivity. If the term doesn't, it silently does the wrong thing.

One way to deal with 2. is to only populate those equation attributes that are not non-linear, i.e. if you declare diffusivity to be non-linear, and a term tries to access eq.diffusivity you get an AttributeError. But that also suggests a different solution for 1.: have eq_attrs be a dictionary of things that are directly and unaltered passed on to the terms (by being set as attributes on eq) and have a separate dictionary argument nonlinear_coefficients that contain the coefficient expressions that need replacing solution with trial, and another argument for the solution (both of those optional, except you need to provide either both or neither).

I know this might be a pain, if you want to expose this directly to the user you'd have to pass these through to the solver interfaces - but tbh I think it might be nicer to hide this from the user. This is ultimately a time-discretisation choice. So one way one to do this is to say the user provides the expressions for the various coefficients to the solver, which may or may not depend on solution - and then the solver decides whether or not these expressions are passed on as non_linear coefficients or as standard eq_attrs. This also limits the number of changes if you just want to do it for one solver at the moment.

@w3168

w3168 commented May 26, 2026

Copy link
Copy Markdown
Collaborator

firedrakeproject/firedrake#5119 Is this PR relevant here?

@stephankramer

Copy link
Copy Markdown
Collaborator

Not directly, but it is a very interesting PR - and a further example indeed of what you could do with lagging different terms/coefficients.

@sghelichkhani

Copy link
Copy Markdown
Collaborator Author

Thanks @stephankramer . Excellent info. Let me first play back what I understood, so we're sure we're talking about the same thing.

If I read you correctly, you're happy with using ufl.replace, and the mental model should be that every term is linear in the trial as the trial appears explicitly in the term, with all the nonlinearity entering through the coefficients. You also want two guard rails on that: coefficients have to be pointwise functions of the solution so no spatial derivatives hiding inside them, and second the substitution target shouldn't be hard-wired to the trial, because a lot of the time we actually want to lag a coefficient to a previous value so that the time-discretised equation comes out linear. (The shallow water example is interesting, since Liam has already had a go at shallow water stuff to couple with groundwater)

Now here is the part that I think you are unhappy with: that I've smuggled solution and nonlinear_coefficients into eq_attrs, second, that the whole thing relies on every term author remembering to call resolve_coefficient instead of reading eq.diffusivity directly, and if they forget there's no error, just a quietly wrong answer. Third, that whether a coefficient is treated as nonlinear or lagged is a time-discretisation decision that the user setting up the physics shouldn't have to know about, so it ought to be hidden inside the solver (so basically somehow exposing this in the equation).

If thats a fair summary, how about we:

  1. pull solution, nonlinear_coefficients and a lagged_values dict out of eq_attrs and make them proper keyword arguments on Equation, all defaulting to nothing. eq_attrs goes back to meaning exactly what it meant before and the reserved-name hack disappears, so we're actually a bit closer to the current main than this PR is.
  2. lagged_values is the per-coefficient table: a coefficient named there resolves against its lagged Function, anything else resolves against the trial. That covers the shallow water case directly.
  3. For the silent-failure problem: How about we make it impossible rather than just loud. The idea is to stop terms ever touching a raw unresolved coefficient. Ther decision moves into Equation.residual, which substitutes each coefficient against its target up front and then hands the terms a thin view of the equation where reading eq.diffusivity already gives you the substituted expression. So the term just writes the natural thing and gets the right answer, and there's no second method to forget. The nice side effect is that the change to diffusion_term goes away entirely, it goes back to kappa = eq.diffusivity, and when there are no nonlinear coefficients residual takes an early exit straight to the existing code path, so every other solver is byte-for-byte unchanged and doesn't need a single edit.

That last bit matters to me too, I didn't want this to be something every solver has to grow an argument for.

On the pointwise guarantee, I'd add a check at construction that walks the coefficient expression and rejects it if it finds a spatial derivative or a facet restriction sitting on top of anything that contains the solution.
The user never sees any of this. They hand the solver the coefficient expressions, some of which happen to depend on the solution, and the solver decides what to register as nonlinear and what to lag. So the time-discretisation choice lives where you wanted it.

One thing I'd like your read on: for the mixed/coupled case, do we substitute the whole mixed solution or the indexed sub-component? It doesn't bite for the scalar Richards case I need right now, but still ..

@stephankramer

Copy link
Copy Markdown
Collaborator

I think your summary is accurate. Maybe I've come on a bit too strong about my "mental model" - I didn't necessarily ask for guardrails, I just wanted to sketch how I see these terms - also my view is of course not the only valid way! So it breaks down in two things:

  1. Terms typically have a "canonical" linearisation where it's only explicitly dependent on the solution (trial) in one place, and the other factors are thought of as coefficients which may or may not depend on trial as well - making the term non-linear in the former case. This distinction between explicit trial and coefficient is arbitrary, and I could well imagine you would have to different implementations of the same term, each with a different decomposition, e.g.:
def div_term1(eq, trial, test):  # trial is velocity u
     return div(eq.rho * trial) * dx
     
def div_term2(eq, trial, test): # trial is scalar rho
     return div(trial * eq.velocity) * dx

both implementing $$\nabla \cdot(\rho u)$$. Which one is more appropriate typically depends on physics, depending on whether $$(\nabla \rho) \cdot u$$ or $$\rho \nabla\cdot u$$ is more important. So you could use a different version in different equations. Anyway, it seems that we can make such a semi-linearisation in all terms we've implemented so far, but this of course does not preclude the possibility of a term that doesn't have an obvious decomposition, e.q. sqrt(dot(grad(u), grad(u)))*dx. So I'm not advocating this as a hard rule. Just that, if there is a "canonical semi-linearisation" that's how we should do it.
2. I said that coefficients should be point-wise and that gradients etc. should be handled in the terms explicitly. I think I stand by my claim of coefficients have to be point-wise but maybe my claim that there should be no gradients in the coefficient expressions is too strong. For example, if u is a continuous P1 field, then grad(u) is a perfectly valid, point-wise expression, it just happens to be a P0 expression. That is in fact exactly what we do with strain-rate dependent viscosity, we form it as point-wise (but piece-wise discontinous) expression and pass it as a coefficient to the viscosity term (well actually we pass it through approximation, see below). So I think the checks for not having gradients are a bit too much. Maybe we could check for facet restrictions and such, but then again this feels a bit out-of-scope for this PR: these sorts of rules don't really have to do with coefficients being non-linear or not.

The other potential slight mis-communication is that I didn't necessarily advocate for there to be functionality where the non-linear coefficients are replaced by something other than the trial function provided to residual. In the lagging case examples above, which are indeed a strong motivation for me, I was just thinking that if you are lagging diffusivity with $$\kappa(c^\ast)$$ you just provide it as a normal coefficient in eq_attrs. I mean if you want the equation to replace solution with something else (say c_star) but you have to already provide that "something else" when the equation is created, then surely whoever creates that equation could have done the substitution beforehand. Unless I'm missing something, I just hope we can make things a bit simpler if we don't add this functionality - i.e. if it is a non-linear coefficient always replace solution with trial.

So having that out of the way, I like the idea of doing the substitution before we call the terms - leaving the access of coefficients inside the terms exactly as they were - otherwise we indeed would have to be calling the resolve machinery everywhere, as potentially any coefficient could be declared as non-linear, there basically wouldn't be any "normal" coefficient left if we were to be consistent. So that's great. I do think this exposes something I never quite liked in the design of the Equation class, that it serves two purposes: 1. Something that represents the equation as a whole, that does prep-work for the equation as a whole, contains the terms of the equation 2. a bucket of things we want to pass on to the term. Clearly none of the things I mentioned under 1. should be accessed by an individual term. This sort of sandwiching of classes, where class A contains objects of class B that can also access class A, leads to very hard follow data flow (class B here are the term, which we implement as functions, except they really aren't!). Also clearly Equation is not a dataclass. It only is in its second role and even there it's overkill. So I would advocate....hm actually this is a bit out-of-scope as it requires some more change, what I was going to say: I would have Equation as a normal class with a normal __init__, which builds a dictionary of things it wants to pass on to the terms, then in its residual() method it makes a copy of that dictionary in which it substitutes solution for all entries that have been declared non-linear, and then it passes that on as an AttrDict to the terms, i.e. terms don't get to see the equation at all. Anyway that requires some more discussion in another PR.

Finally, approximation I think that may also be a design flaw if we want to generalize things further down the line. The trouble is we now have two different ways of passing on coefficient expresssions, directly in eq_attrs or via approximation, and from a term's perspective it's really not clear which one it should be. We now have viscosity term that does eq.approximation.viscosity and a diffusion term that does eq.diffusivity. As I started argueing above, in both cases we need a mechanism to make them "nonlinear". For the moment we are relying on the fact that the only case we have non-linear viscosity is in "steady state" Stokes so we now beforehand what solution we are solving for. If that viscosity term appears in a time-dependent equation (@w3168 ?) which is time-discretised with some multi-stage RK, it will need its non-linear viscosity substituing with the trial that corresponds to the different RK stages. Reversely, I could see cases appear where it would make sense for the approximation to construct the diffusivity term. Finally passing the approximation down to the term means that the term has to make all sorts of assumptions about what's available on that approximation. So I think the solution here is that we should not pass approximations down to the term. Terms should be defined more abstractly in a mathematical form that does not depend on the specific physics we have in mind. That means it should be the job of the Solver, which is the layer that knows about the physics as well as the equation it wants to solve, to grab the viscosity off the approximation and pass it down to Equation, who passes it down to a term, who then directly accesses eq.viscosity. Anyway, again this is out-of-scope for this PR.

Final, final point, on the coupled case with mixed-functions. Yeah, we need to think a bit more about this. Maybe there needs to be something like a CoupledEquation which has its own residual() where the trial that you pass in is the entire mixed Function, which is then split and the components are passed into the individual terms. However, that requires the CoupledEquation to know which component goes into which term. Say you have the momentum equation in NS or SWE, which has an advection term that needs u as the trial, and a PressureGradient term that needs p. At the moment in Stokes in the pressure_gradient_term we just pass in eq.p and ignore trial - but that's not going to be right for time-dependent, multi-stage RK.

@stephankramer

Copy link
Copy Markdown
Collaborator

Let me know what you think about the 2 things I think aren't needed (checking for gradients and substitution with something else than trial) cause that may simplify the changes quite a bit, and then I can have another look.

Comment thread gadopt/equations.py Outdated
_SPATIAL_DERIV = (Grad, Div, NablaGrad, NablaDiv, Curl, ReferenceGrad, ReferenceDiv)


def _expr_contains(expr: ufl.core.expr.Expr, source: Any) -> bool:

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have a depends_on() in utility.py. Can we not use that?

Comment thread gadopt/equations.py Outdated
name: self._resolve(name, trial, term_key)
for name in self._nonlinear_coeffs
}
forms.append(term(_BoundEquation(self, resolved), trial))

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This creates a new _BoundEquation for every term, even though they are all the same.

@stephankramer

Copy link
Copy Markdown
Collaborator

Actually, scratch all that. My "mental model" was based on the idea that we call residual() with diffferent linear combinations for the trial argument representing the different RK stages - which is what we used to do. However, now we simply construct a single UFL expression F that we then pass on to Irksome which will do a massive replace on all of F to replace solution with the time-staged solution and thus construct the UFL expression for the different stages. So now I'm confused why we need this PR at all? We already know beforehand which argument is going to be used for trial when residual() is called, namely the same solution. So why can't you just use that in the diffusivity expression you provide and IRKSome will do the right thing?

If that's right, and I'm not overlooking something then a) we don't need any of the stuff we discussed here and b) we could (in a future PR) make things even simpler in the terms if we scratch the trial argument, e.g. the scalar advection and diffusion terms would just use eq.scalar instead of trial, i.e. everything is a coefficient - and the code that creates the equation can do whatever linearisation, lagging, whatever by filling in what they want for the coefficients (solution dependent or not).

@sghelichkhani sghelichkhani force-pushed the sghelichkhani/richards-core-01-equation-infra branch from ad25749 to 180ec15 Compare May 28, 2026 05:35
@w3168

w3168 commented May 28, 2026

Copy link
Copy Markdown
Collaborator

Hi @stephankramer thanks for looking through all this! I've been trying to follow on because I suggested to @sghelichkhani here #508 to try and simplify the PR by using the existing diffusion term in scalar_equation.py instead of reimplementing similar code (especially the sipg stuff) in richardson.py.

I think this PR came about because Sia wasn't getting the same results when using the nonlinear Richards equation which has a diffusivity that depends on the solution...

Judging by your last message @stephankramer I think the above should just work...

@sghelichkhani was there an error message or was it just giving different results when you tried swapping the Richards diffusion term to this one...?

@sghelichkhani sghelichkhani force-pushed the sghelichkhani/richards-core-01-equation-infra branch from 180ec15 to 9d137c0 Compare May 28, 2026 06:27
@sghelichkhani

Copy link
Copy Markdown
Collaborator Author

Yes! Thanks @stephankramer and @w3168 . This review as annoying as it was for me ended up being the most effective one!!!

To be honest I spent the past few hours struggling to find out what the issue was ... A(t this point it might be the same confusion mentioned by @stephankramer . My thinking was that diffusivity(solution) is not the same as diffusivity(trial). Because self.solution would be to lag diffusivity. BUT I hope/think Irksome here deals with this issue by recognising the solution is the same as trial and would fix this! Right?

@sghelichkhani

Copy link
Copy Markdown
Collaborator Author

Yeah I think as long as K is defined with the same solution passed on to Irksome's timestepper, irksome/tools.py:122-125 does the correct thing by replacing every instances with the stage solution. Very smart!!!

@sghelichkhani

Copy link
Copy Markdown
Collaborator Author

So as much as this was a big pain, thanks @w3168 for suggesting to rewrite this :-D

@sghelichkhani

Copy link
Copy Markdown
Collaborator Author

So this PR is just a doc fix! Feel free to approve and I merge!

@sghelichkhani sghelichkhani changed the title Add nonlinear_coefficients mechanism for solution-dependent diffusivity Doc fix for transpose May 28, 2026
@sghelichkhani sghelichkhani enabled auto-merge (squash) May 28, 2026 06:56
@stephankramer

Copy link
Copy Markdown
Collaborator

BUT I hope/think Irksome here deals with this issue by recognising the solution is the same as trial and would fix this! Right?

To be precise, the point is that because we only call residual once, we know what value we provide to the trial argument and that value is the firedrake Function that we also declare to IRKSome to be the solution Function. As long as you construct your diffusivity with the same solution (typically this is the same solution Function the user sees as well), then after calling residual() we simply have one big UFL form with solution being used in the right places - so there's nothing for IRKSome to fix or recognize as special, it does what it always does: replace solution with the intermediate solution expression for the different solver stages in RK, say $$x^{(n)} + \sum_j^i a_{ij} k_j$$ for DIRK at level $i$ where we are solving for $k_i$.

This also means that we (for now) don't have to worry about some coefficients coming differently via approximation, and viscosity should also just work with IRKSome. In the future it probably would be more robust if we get rid of the trial argument, as now we rely on "knowing what value is going to be provided for it".

Final point for future reference: if you want to do lagging in the simplest way, which is to use the values currently stored in the solution Function, you simply have to make a copy of solution and use that in the expression for the coefficient - otherwise of course IRKSome will come in and treat it as "the solution to be solved for".

@stephankramer stephankramer left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just like to point out the irony of the fact that after this long journey, with AI-assisted code, all we end up with is a single bloody em-dash! :-)

@sghelichkhani sghelichkhani merged commit 8a770d4 into main May 28, 2026
5 checks passed
@sghelichkhani sghelichkhani deleted the sghelichkhani/richards-core-01-equation-infra branch May 28, 2026 08:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants