Fix mle ValueError on count tables with spaces in sgRNA/gene names (#7) #8

Merged: davidliwei merged 1 commit into main from fix/mle-count-table-tab-split on Jun 17, 2026

@davidliwei (Owner)

Fixes #7.

Problem

mageck2 mle crashes with a cryptic numpy error on a count table that mageck2 test (RRA) parses fine:

ValueError: setting an array element with a sequence. The requested array
has an inhomogeneous shape after 1 dimensions. The detected shape was (3,) + inhomogeneous part.

The error is raised by np.matrix(ginst.nb_count) in read_gene_from_file.

Root cause

read_gene_from_file split count-table lines on generic whitespace (line.strip().split()), while the rest of mageck2 splits on tab (e.g. mageckCount.py, splitter='\t').

When an sgRNA or gene name contains a space (common for control entries such as Non-Targeting Control in a with-control library), the whitespace split shifts the count columns. A non-numeric token (Control) is then read as a count; float() raises ValueError, which was silently swallowed without appending a value. That left one sample's list one element short, making nb_count ragged and crashing later in np.matrix(). RRA was unaffected because it splits on tab.
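
The column shift can be illustrated with a small standalone sketch. This is not the mageck2 code: collect_counts, the two-sample layout (counts in columns 2 and 3), and the example rows are assumptions made for illustration, and grouping by gene is left out.

```python
def collect_counts(lines, split):
    """Collect per-sample count lists, skipping tokens that float() rejects.

    Simplified stand-in for the pattern described above (hypothetical helper).
    Columns 2 and 3 are assumed to hold the two sample counts.
    """
    per_sample = [[], []]
    for line in lines:
        fields = split(line.strip())
        for sample, col in enumerate((2, 3)):
            try:
                per_sample[sample].append(float(fields[col]))
            except ValueError:
                pass  # swallowed: nothing is appended for this sample
    return per_sample


# Hypothetical rows: sgRNA, gene, count for sample 1, count for sample 2.
rows = [
    "sg1\tGENE1\t10\t20",
    "ctrl_1\tNon-Targeting Control\t120\t95",
]

by_whitespace = collect_counts(rows, lambda s: s.split())
by_tab = collect_counts(rows, lambda s: s.split("\t"))

print(by_whitespace)  # [[10.0], [20.0, 120.0]]  (ragged: sample 1 is one short)
print(by_tab)         # [[10.0, 120.0], [20.0, 95.0]]
```

With the whitespace split, the second row yields the token Control in the first count column, so the first sample's list ends up shorter than the second's. The tab split keeps both lists the same length.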

Reproduced with a minimal table containing a gene name with a space; with this change, the real read_gene_from_file parses it correctly.

Changes

  • Split non-CSV count tables on \t, consistent with the rest of mageck2 (the actual fix).
  • Fail loudly with line/column/value context on an unparseable count, instead of silently producing a ragged matrix and an opaque downstream crash.
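
A minimal sketch of what the two changes amount to, assuming a tab-separated line with sgRNA and gene in the first two columns. The helper name parse_count_fields, its parameters, and the error message wording are hypothetical; the actual change lives inside read_gene_from_file and may differ in detail.

```python
def parse_count_fields(line, line_no, sample_cols):
    """Return (sgrna, gene, counts) for one tab-separated count-table line.

    Hypothetical helper. Raises ValueError with line/column/value context
    when a count cannot be parsed, instead of silently skipping it.
    """
    fields = line.rstrip("\r\n").split("\t")
    counts = []
    for col in sample_cols:
        try:
            counts.append(float(fields[col]))
        except (ValueError, IndexError):
            value = fields[col] if col < len(fields) else "<missing>"
            raise ValueError(
                f"line {line_no}, column {col + 1}: cannot parse count {value!r}"
            ) from None
    return fields[0], fields[1], counts


# A name containing a space stays in one field when splitting on tab.
sgrna, gene, counts = parse_count_fields(
    "ctrl_1\tNon-Targeting Control\t120\t95\n", 2, (2, 3)
)

# An unparseable count now fails with context.
try:
    parse_count_fields("sg2\tGENE2\tNA\t7\n", 3, (2, 3))
except ValueError as exc:
    message = str(exc)  # "line 3, column 3: cannot parse count 'NA'"
```

Raising at the point of the bad value keeps the failure next to its cause, rather than surfacing later as an inhomogeneous-shape error when the matrix is built.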

🤖 Generated with Claude Code

read_gene_from_file split count-table lines on generic whitespace, while
the rest of mageck2 (e.g. mageckCount.py) splits on tab. When an sgRNA or
gene name contains a space -- common for control entries such as
"Non-Targeting Control" in a with-control library -- the whitespace split
shifts the count columns, a non-numeric token is read as a count, and the
ValueError was silently swallowed without appending. That left nb_count
ragged and later crashed in np.matrix() with an opaque "inhomogeneous
shape" error. RRA (mageck2 test) was unaffected because it splits on tab.

- Split non-CSV count tables on '\t', consistent with the rest of mageck2.
- Fail loudly with line/column/value context on an unparseable count
  instead of silently producing a ragged matrix.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@davidliwei merged commit 21698ea into main on Jun 17, 2026
6 checks passed

Development

Successfully merging this pull request may close these issues.

ValueError when running mle