
BUG Fall back to pickle for object-dtype numpy arrays#93

Merged
luispedro merged 1 commit into luispedro:main from justinrporter:main on Jun 17, 2026


@justinrporter

Contributor

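I think Jug's file store has mismatched write and read paths for numpy object arrays. The write path uses np.lib.format.write_array, whose allow_pickle defaults to True, so the object array is pickled inside the .npy stream. The read path then uses np.lib.format.read_array, which since numpy 1.16.3 has defaulted to allow_pickle=False, so it refuses to load that same array.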
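A minimal standalone sketch of the mismatch, using an in-memory io.BytesIO buffer in place of Jug's file store (the buffer is an assumption for illustration, not how Jug stores results). It shows write_array accepting the object array with its default arguments and read_array rejecting it with its defaults:

```python
import io

import numpy as np

names = np.array(['foo', 'bar', 'baz'], dtype=object)

buf = io.BytesIO()
# write_array defaults to allow_pickle=True, so the object array is written
# as a .npy header followed by a pickle of its elements.
np.lib.format.write_array(buf, names)

buf.seek(0)
try:
    # read_array defaults to allow_pickle=False and refuses object arrays.
    np.lib.format.read_array(buf)
    error = None
except ValueError as exc:
    error = str(exc)
print(error)

# Explicitly opting in to pickle on the read side loads the data.
buf.seek(0)
restored = np.lib.format.read_array(buf, allow_pickle=True)
print(restored)
```

Opting in with allow_pickle=True on read would be the other obvious fix, but it would also unpickle whatever happens to be in the store.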

Reproduction

import jug
import numpy as np

@jug.TaskGenerator
def make_feature_names():
    return np.array(['foo', 'bar', 'baz'], dtype=object)

@jug.TaskGenerator
def use_names(names):
    return len(names)

names = make_feature_names()
result = use_names(names)
$ jug execute --aggressive-unload mwe_bug.py
[...]
CRITICAL Exception while running mwe_bug.use_names:
  Error -3 while decompressing data: incorrect header check

You have to pass --aggressive-unload to force Jug to evict each completed task's result from
memory, so that downstream tasks must reload it from disk; otherwise use_names receives the
in-memory array and the bug stays hidden. (The same failure occurs without the flag whenever
two workers run concurrently, because worker B never holds worker A's in-memory result.)
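The zlib message in the log can be reproduced outside Jug. This sketch assumes, as the error text and the "decode_from fallback" mentioned in the fix suggest, that after read_array fails the reader hands the same raw bytes to zlib. A .npy stream begins with the magic bytes b'\x93NUMPY', which are not a valid zlib header:

```python
import io
import zlib

import numpy as np

buf = io.BytesIO()
np.lib.format.write_array(buf, np.array(['foo', 'bar', 'baz'], dtype=object))
raw = buf.getvalue()

# The .npy magic prefix, not a zlib stream header.
print(raw[:6])

try:
    # Assumed fallback step: treat the stored bytes as zlib-compressed data.
    zlib.decompress(raw)
    message = None
except zlib.error as exc:
    message = str(exc)
print(message)
```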

Fix

Change the writer: detect dtype=object at write time and route
through encode_to (the zlib+pickle path) so the existing decode_from
fallback works correctly.

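# Only plain, non-object ndarrays take the raw .npy path; object-dtype arrays
# fall through to the zlib+pickle path (encode_to).
if not compress_numpy and type(value) is np.ndarray and value.dtype.kind != 'O':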
    np.lib.format.write_array(output, value)
    return
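To check that the patched condition round-trips both kinds of array, here is a simplified model of the write and read paths. The names write_value and read_value are hypothetical; they only mimic the behaviour described in this PR (raw .npy for plain arrays, zlib+pickle otherwise, and a reader that tries .npy first and falls back to zlib+pickle). This is not Jug's actual encode_to/decode_from code.

```python
import io
import pickle
import zlib

import numpy as np


def write_value(output, value, compress_numpy=False):
    """Hypothetical writer mirroring the patched condition."""
    if not compress_numpy and type(value) is np.ndarray and value.dtype.kind != 'O':
        np.lib.format.write_array(output, value)
        return
    # Everything else, including object arrays, goes through zlib+pickle.
    output.write(zlib.compress(pickle.dumps(value)))


def read_value(stream):
    """Hypothetical reader: try raw .npy first, then zlib+pickle."""
    start = stream.tell()
    try:
        # Raises ValueError when the magic prefix is missing (zlib data).
        return np.lib.format.read_array(stream)
    except ValueError:
        stream.seek(start)
        return pickle.loads(zlib.decompress(stream.read()))


for value in (np.array(['foo', 'bar', 'baz'], dtype=object), np.arange(5)):
    buf = io.BytesIO()
    write_value(buf, value)
    buf.seek(0)
    restored = read_value(buf)
    print(restored.dtype, restored.tolist())
```

One design note: the check on dtype.kind mirrors the patch, but a structured dtype with an object field (for example np.dtype([('a', object)])) has kind 'V', so it would still take the .npy path. Testing value.dtype.hasobject instead would route those through pickle as well, since read_array refuses any dtype with hasobject set.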

I am open to other fixes; this just seemed the most obvious one (if a touch inelegant).

Let me know what you think; I'm happy to do something different if you think it would be better.

np.lib.format.read_array cannot load object-dtype arrays written by
np.lib.format.write_array unless allow_pickle=True; skip the numpy-specific
path for those so they fall back to pickle.
luispedro added a commit that referenced this pull request Jun 17, 2026
luispedro added a commit that referenced this pull request Jun 17, 2026
luispedro merged commit 7ed1b3b into luispedro:main Jun 17, 2026
7 checks passed
@luispedro

Owner

Thank you! Much appreciated. I added a test for this for future-proofing and will do a minor release with the fix as soon as possible.
