BUG Fall back to pickle for object-dtype numpy arrays #93
Merged
np.save and np.lib.format.write_array cannot serialize arrays with object dtype; skip the numpy-specific path for those so they fall back to pickle.
Owner
Thank you! Much appreciated. I added a test for this for future-proofing and will do a minor release ASAP with the fix.
I think Jug's file store has mismatched write and read paths for numpy object arrays. The write uses np.lib.format.write_array and then the read tries to use np.lib.format.read_array, which has defaulted to allow_pickle=False since numpy 1.16.3.
Reproduction
You have to pass --aggressive-unload to force Jug to evict each completed task's result from memory, so downstream tasks must reload from disk. (The same failure occurs without the flag whenever two workers run concurrently, because worker B never holds worker A's in-memory result.)
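The underlying mismatch can also be seen without Jug, using numpy alone. The following is a minimal sketch, not Jug's actual code path: it writes an object array with np.lib.format.write_array (which pickles by default) and then reads it back with np.lib.format.read_array left at its default allow_pickle=False. The array contents are arbitrary illustration data.

```python
import io

import numpy as np

# Arbitrary object-dtype array; the contents are illustration data only.
arr = np.array([{"a": 1}, None, "text"], dtype=object)

buf = io.BytesIO()
# write_array pickles object arrays because allow_pickle defaults to True here.
np.lib.format.write_array(buf, arr)

buf.seek(0)
try:
    # read_array defaults to allow_pickle=False, so object arrays are rejected.
    np.lib.format.read_array(buf)
    failed = False
except ValueError as exc:
    failed = True
    print("read failed:", exc)

# Opting in to pickle on the read side recovers the data.
buf.seek(0)
restored = np.lib.format.read_array(buf, allow_pickle=True)
```

Passing allow_pickle=True on the read side would be an alternative fix, but it means unpickling whatever is stored in the .npy payload, which is the behaviour the numpy default is meant to guard against.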
Fix
Change the writer: detect dtype=object at write time and route through encode_to (the zlib+pickle path) so the existing decode_from fallback works correctly.
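The idea can be sketched as follows. This is an illustration under stated assumptions, not the code in this PR: dump_value, load_value and the one-byte tags are hypothetical stand-ins for Jug's encode_to and decode_from, whose real on-disk format may differ. The point is only the branch on the dtype at write time.

```python
import io
import pickle
import zlib

import numpy as np

# Hypothetical one-byte tags marking which path wrote the payload.
NPY_TAG = b"N"
PICKLE_TAG = b"P"


def dump_value(value):
    """Serialize value, using the numpy format only when no pickling is needed."""
    if isinstance(value, np.ndarray) and not value.dtype.hasobject:
        buf = io.BytesIO()
        # Plain numeric arrays never need pickle, so forbid it explicitly.
        np.lib.format.write_array(buf, value, allow_pickle=False)
        return NPY_TAG + zlib.compress(buf.getvalue())
    # Object arrays (and everything else) take the zlib+pickle path.
    return PICKLE_TAG + zlib.compress(
        pickle.dumps(value, protocol=pickle.HIGHEST_PROTOCOL)
    )


def load_value(data):
    """Inverse of dump_value; the numpy branch never has to unpickle."""
    tag, payload = data[:1], zlib.decompress(data[1:])
    if tag == NPY_TAG:
        return np.lib.format.read_array(io.BytesIO(payload))
    return pickle.loads(payload)


numeric = np.arange(6, dtype=np.int64).reshape(2, 3)
objects = np.array([{"a": 1}, None, "text"], dtype=object)

numeric_blob = dump_value(numeric)
objects_blob = dump_value(objects)
```

Checking dtype.hasobject rather than dtype == object also catches structured dtypes that contain object fields, which the numpy format likewise can only store by pickling.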
I am open to other fixes; this just seemed the most obvious (if a touch inelegant).
Let me know what you think. I am happy to do something different if you think it would be better.