A complete, tested branch implementing the proposal below is ready at https://github.com/fabio-rovai/causal-perception-implementation/tree/pn-ps-identification-bounds (15 passing tests, additive and opt-in). I can open it as a PR whenever you prefer.
distances.py compares marginals, so cross-world counterfactual coupling is invisible: add PN/PS + Fréchet bounds (opt-in)?
Hi, and thanks for open-sourcing this. I have been reading through the causal
perception implementation and I think I have spotted a subtle but important
identification gap, and I would like to check whether you would welcome a small
opt-in PR before I send one.
What I think is happening
distances.py (W2, KL, TV) takes two 1D sample arrays, and
perception.run_perception feeds it the per-individual outcome-probability
vectors as 1D marginals. The cross-world joint P(Y_0, Y_1) of the binary outcome
is never formed. On top of that, LinearANM.abduct explicitly does no noise
abduction for Y (the comment says the counterfactual probability is computed from
the classifier on counterfactual parents). So the cross-world coupling of the
binary outcome is not pinned down by anything in the pipeline, and any two SCMs
that share the two interventional marginals but differ in how they couple the two
worlds will look identical to W2/KL/TV.
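To make the blindness concrete, here is a minimal sketch, not the repository's code. `w2_1d` is a hypothetical stand-in for a 1D distance and does not reproduce `distances.py`; the probability vectors are synthetic. It shows that any distance computed from two unpaired 1D arrays is unchanged when the pairing between the two worlds is destroyed by a permutation.

```python
import numpy as np

def w2_1d(a, b):
    """Empirical 1D 2-Wasserstein distance for equal-size samples.

    Hypothetical stand-in, not the repo's implementation: in 1D the optimal
    transport plan matches sorted samples, so the input pairing is discarded.
    """
    a_sorted, b_sorted = np.sort(a), np.sort(b)
    return float(np.sqrt(np.mean((a_sorted - b_sorted) ** 2)))

rng = np.random.default_rng(0)

# Synthetic per-individual outcome probabilities in the two worlds.
p_factual = rng.uniform(0.2, 0.8, size=1000)
p_counterfactual = p_factual + 0.1                    # tightly coupled worlds
p_shuffled = rng.permutation(p_counterfactual)        # same marginal, coupling destroyed

w2_paired = w2_1d(p_factual, p_counterfactual)
w2_shuffled = w2_1d(p_factual, p_shuffled)

corr_paired = float(np.corrcoef(p_factual, p_counterfactual)[0, 1])
corr_shuffled = float(np.corrcoef(p_factual, p_shuffled)[0, 1])

print(f"W2 paired = {w2_paired:.4f}, W2 shuffled = {w2_shuffled:.4f}")
print(f"corr paired = {corr_paired:.3f}, corr shuffled = {corr_shuffled:.3f}")
```

The two W2 values coincide while the correlation between worlds changes, which is the sense in which the coupling is invisible to a marginal distance.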
A small witness
I ran a quick check with two binary potential-outcome models that share their
marginals exactly (R0 = P(Y_0=1) = 0.5, R1 = P(Y_1=1) = 0.7):
- monotone coupling, p11 = P(Y_0=1, Y_1=1) = 0.50 -> PN = P(Y_0=0 | Y_1=1) = 0.286
- independent outcomes, p11 = 0.35 -> PN = 0.500
compute_all_distances on the 1D marginals reads ~0 for W2, KL and TV in both
cases (the marginals are identical), but the probability of necessity separates
the two models by about 0.214. The marginal distances are blind to exactly that.
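The witness can be reproduced with a few lines. This is a standalone sketch: the helper names (`joint_from_p11`, `probability_of_necessity`, `marginal_tv`) are mine for illustration and are not functions in the repository, and `marginal_tv` is a simple stand-in for what `compute_all_distances` sees, not a call to it.

```python
import numpy as np

def joint_from_p11(r0, r1, p11):
    """2x2 joint P(Y_0=a, Y_1=b), indexed [a, b], with P(Y_0=1)=r0, P(Y_1=1)=r1."""
    p10 = r0 - p11                    # Y_0=1, Y_1=0
    p01 = r1 - p11                    # Y_0=0, Y_1=1
    p00 = 1.0 - p11 - p10 - p01       # Y_0=0, Y_1=0
    joint = np.array([[p00, p01], [p10, p11]])
    if np.any(joint < -1e-12):
        raise ValueError("p11 is outside the Frechet range for these marginals")
    return joint

def probability_of_necessity(joint):
    """PN = P(Y_0=0 | Y_1=1), the definition used in the witness."""
    return float(joint[0, 1] / joint[:, 1].sum())

def marginal_tv(joint_a, joint_b):
    """Largest total-variation gap between the two models' per-world marginals."""
    tv_world0 = 0.5 * np.abs(joint_a.sum(axis=1) - joint_b.sum(axis=1)).sum()
    tv_world1 = 0.5 * np.abs(joint_a.sum(axis=0) - joint_b.sum(axis=0)).sum()
    return float(max(tv_world0, tv_world1))

R0, R1 = 0.5, 0.7
monotone = joint_from_p11(R0, R1, min(R0, R1))    # p11 = 0.50
independent = joint_from_p11(R0, R1, R0 * R1)     # p11 = 0.35

pn_monotone = probability_of_necessity(monotone)
pn_independent = probability_of_necessity(independent)
tv_gap = marginal_tv(monotone, independent)

print(f"PN monotone    = {pn_monotone:.3f}")                   # 0.286
print(f"PN independent = {pn_independent:.3f}")                # 0.500
print(f"PN gap         = {pn_independent - pn_monotone:.3f}")  # 0.214
print(f"marginal TV    = {tv_gap:.1e}")
```

The marginal TV is zero up to floating-point error, while PN moves from 0.286 to 0.500.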
This is not a bug in the distances; it is an identification fact. With only the
two marginals and no abducted outcome noise, P(Y_0, Y_1) is only Fréchet-bounded.
In the fair-credit framing this matters, because a point counterfactual on a
protected attribute quietly hides an interval.
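As a sketch of what the interval looks like, the Fréchet bounds on P(Y_0=0, Y_1=1) are max(0, R1 - R0) and min(R1, 1 - R0), with R0 = P(Y_0=1) and R1 = P(Y_1=1). Dividing by R1 gives bounds on PN = P(Y_0=0 | Y_1=1). I am assuming the usual definition PS = P(Y_1=1 | Y_0=0), which divides by 1 - R0 instead. The function names below are hypothetical and need not match the branch.

```python
def frechet_p01_bounds(r0, r1):
    """Sharp bounds on P(Y_0=0, Y_1=1) given only P(Y_0=1)=r0 and P(Y_1=1)=r1."""
    lower = max(0.0, r1 - r0)
    upper = min(r1, 1.0 - r0)
    return lower, upper

def pn_ps_bounds(r0, r1):
    """Identification intervals for PN and PS from the two marginals alone.

    Assumed definitions (PS is my assumption, PN follows the witness):
      PN = P(Y_0=0 | Y_1=1) = P(Y_0=0, Y_1=1) / r1
      PS = P(Y_1=1 | Y_0=0) = P(Y_0=0, Y_1=1) / (1 - r0)
    """
    if not (0.0 < r1 <= 1.0 and 0.0 <= r0 < 1.0):
        raise ValueError("need 0 < r1 <= 1 and 0 <= r0 < 1 for PN and PS to be defined")
    lower, upper = frechet_p01_bounds(r0, r1)
    return {
        "PN": (lower / r1, upper / r1),
        "PS": (lower / (1.0 - r0), upper / (1.0 - r0)),
    }

bounds = pn_ps_bounds(0.5, 0.7)
print("PN in [{:.3f}, {:.3f}]".format(*bounds["PN"]))   # [0.286, 0.714]
print("PS in [{:.3f}, {:.3f}]".format(*bounds["PS"]))   # [0.400, 1.000]
```

For the witness marginals, the monotone coupling sits at the lower end of the PN interval (0.286) and the independent coupling (0.500) sits strictly inside it.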
Proposal
Would you welcome a small, additive, opt-in PR that:
- reports PN and PS alongside their sharp Fréchet identification bounds from the
two marginals (with the assumption stated explicitly in the docstrings),
- names two reference couplings as point estimates inside the bounds (monotone,
which attains a Fréchet endpoint, and independent, which lies in the interior),
and
- adds a run_* script and tests, including the witness above,
with zero change to any existing module, output or default? I would keep it
entirely separate from the current distance pipeline so nothing you rely on
moves.
Happy to sign the CLA. If this is useful I will open the PR; if you would rather
shape it differently first, I am glad to discuss here.