Skip to content

feat: support column-projected worksheet reads#660

Closed
alexander-beedie wants to merge 1 commit into
tafia:masterfrom
alexander-beedie:feat/worksheet-range-cols
Closed

feat: support column-projected worksheet reads#660
alexander-beedie wants to merge 1 commit into
tafia:masterfrom
alexander-beedie:feat/worksheet-range-cols

Conversation

@alexander-beedie

@alexander-beedie alexander-beedie commented Jun 12, 2026

Copy link
Copy Markdown
Contributor

Ref: pola-rs/polars#27677

This PR allows for projecting columns at read time (for xlsx/xlsb), so we can omit columns from the streaming reader before collection - this means peak allocation tracks the selected columns, not the entire sheet's Range.

The linked issue shows a ~40GB allocation which would only need to be a few MB if the required columns could be declared (and the remaining columns filtered out) before we allocate.

@lukapeschke lukapeschke left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for addressing this! I'm thinking we could maybe make the new methods take an non-exhaustive options struct ? This would allow to add support for more options later without having to modify the existing API.

Something like this:

#[derive(Debug, Clone, Default)]
#[non_exhaustive]
pub struct RangeOptions {
    column_indices: Option<Vec<u32>>, // or an ordered set. Or (u32, u32) coordinates
    // In another PR, allow to select rows as well
    row_indices: Option<Vec<u32>>, 
}


fn worksheet_range_opts(
       &mut self,
       name: &str,
       opts: RangeOptions,
   ) -> Result<Range<Data>, Self::Error> {]

This would allow us to select only the rows we need , and we could also move the header_row option there eventually, since it's more sheet-specific than reader-specific.

Looks good otherwise, thanks for the extra tests 👍

@alexander-beedie alexander-beedie marked this pull request as draft June 13, 2026 04:01
@alexander-beedie

alexander-beedie commented Jun 13, 2026

Copy link
Copy Markdown
Contributor Author

Thanks for addressing this! I'm thinking we could maybe make the new methods take an non-exhaustive options struct ? This would allow to add support for more options later without having to modify the existing API.

@lukapeschke: adding support for rows here is definitely a good move; do you have an idea if there are other meaningful options that would be useful outside of row/col definitions?

In the meantime I'll drop this PR into Draft and start adding rows support right away. I also realised it would be nice to accept RangeBounds for improved ergonomics - probably just need suitable From/Into definitions...

reader.worksheet_range_ref_cols("Sheet1", 0..5)?;       // range
reader.worksheet_range_ref_cols("Sheet1", 0..=4)?;      // inclusive
reader.worksheet_range_ref_cols("Sheet1", 2..)?;        // from col 2 to end
reader.worksheet_range_ref_cols("Sheet1", [1, 3, 5])?;  // explicit/disjoint list
reader.worksheet_range_ref_cols("Sheet1", ..)?;         // all (no projection)

...but ideally we would support declaring disjoint ranges/indices in one go to select a region 🤔

reader.worksheet_range_ref_region("S", 0..5, ..)?;           // range of cols, all rows
reader.worksheet_range_ref_region("S", [1,3,5], 0..100)?;    // specific cols, range of rows
reader.worksheet_range_ref_region("S", [0..3, 8..10], ..)?;  // disjoint col ranges, all rows

@alexander-beedie alexander-beedie force-pushed the feat/worksheet-range-cols branch from 8f9705a to 5005741 Compare June 13, 2026 07:04
@alexander-beedie

alexander-beedie commented Jun 14, 2026

Copy link
Copy Markdown
Contributor Author

Closing here and raising as a fresh PR... New version now handles regions (that can potentially be disjoint) instead of only cols, and is essentially a rewrite 👌

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants