Switch Tensor to use ArcArray instead of Array#16256
Conversation
|
One or more of the following people are relevant to this code:
|
Coverage Report for CI Build 26910624413Coverage decreased (-0.006%) to 87.465%Details
Uncovered Changes
Coverage Regressions12 previously-covered lines in 6 files lost coverage.
Coverage Stats
💛 - Coveralls |
mtreinish
left a comment
There was a problem hiding this comment.
I'm onboard with moving to ArcArray here, but I just have one comment on the ndarray API usage. I think a lot of the into_shared() (if not all of them, I didn't check every individual call) would be better suited as an .into() to move the owned array's data into the arc array instead of copying it.
| DType::F64 => Tensor::F64($arr.mapv(|x: $src| x as f64)), | ||
| DType::C64 => Tensor::C64($arr.mapv(|x: $src| Complex32::new(x as f32, 0.0))), | ||
| DType::C128 => Tensor::C128($arr.mapv(|x: $src| Complex64::new(x as f64, 0.0))), | ||
| DType::Bit => Tensor::Bit($arr.mapv(|x: $src| x as u8).into_shared()), |
There was a problem hiding this comment.
Does this need to be into_shared(), I think we could use into() instead? Looking at the docs on this method: https://docs.rs/ndarray/latest/ndarray/type.ArcArray.html#method.into_shared it seems to indicate it will potentially clone the data again. We just made a new copy with mapv() so we should be able to just run .into() to just move it into an Arc. This applies a bunch of places I think, basically anywhere we construct a new owned array for the sole purpose of creating an ArcArray where we don't need to preserve the the original owned array.
There was a problem hiding this comment.
Since into_shared() is part of BaseArray, it needs to cover Array, ArcArray, and CowArray. The only place where it clones the underlying data is when called on a CowArray that does not own its data. This is true even if the data is strided or otherwise not contiguous in an array view.
The docs do recommend using into() instead of into_shared() when writing a function that's generic over ArcArray and Array: this is because into_shared demands Clone (because of the CowArray implementation) whereas into does not require Clone in this case when going into an ArcArray.
This is all to say: I'm pretty sure all of the into_shared in this PR are fine and can't cause data copies. I just checked, and all of them can be turned into into(). So I think this comes down to style. Do we want to prefer into() when the type inference is available instead of into_shared() as a blanket convention so that we're a bit more on our toes about the CowArray copy case?
There was a problem hiding this comment.
Yep, you're right. Looking deeper at the ndarray code, the OwnedArray DataOwned::into_shared implementation just used ArcArray::from internally:
so it's doing the same thing as into(). I prefer into_shared() as I think it's more clear (just as I had that nit comment about to_vec() on the previous PR), I only left the comment because of the docs warning. Lets leave it as is.
I think it's unlikely we'll deal with a lot of CowArray objects if we're using ArcArray already here but it's good to cover that use case just in case.
mtreinish
left a comment
There was a problem hiding this comment.
Before I enqueue I just had one small question
| use thiserror::Error; | ||
|
|
||
| /// Dynamic-dimensional [`ArcArray`]; the storage type for every [`Tensor`] variant. | ||
| type ArcArrayD<T> = ArcArray<T, IxDyn>; |
There was a problem hiding this comment.
Actually why do we need this type? Isn't it in ndarray already? https://docs.rs/ndarray/latest/ndarray/type.ArcArrayD.html
There was a problem hiding this comment.
Ah, looks like it was introduced in ndarary 0.17, but we're on 0.16. Can I upgrade to that version?
There was a problem hiding this comment.
We're stuck on 0.16 for right now because of rustworkx-core IIRC. We get arrays from rustworkx-core (like adjacency and distance matrices) so the version we use internally for Qiskit has to match the version rustworkx-core uses which is unfortunately pinned to 0.16 right now. It's something we want to fix for the next rustworkx-core release
The motivation for this change in what's stored by a tensor is to allow a cheap cloning mechanism that, in particular, a QuantumProgram can exploit. Since some arrays it deals with may be large, we don't want to force it to clone entire arrays when it needs to do things like fan out a tensor to multiple descendent nodes when evaluating the graph.
See this one commit for the diff being introduced here: 257cd56
PR Stack
AI/LLM disclosure