feat: support region (column|row) projection for worksheet reads #661

Open
alexander-beedie wants to merge 1 commit into tafia:master from alexander-beedie:feat/region-projected-reads

@alexander-beedie alexander-beedie commented Jun 14, 2026

Contributor

Ref: pola-rs/polars#27677

Allows for projecting a region (columns/rows) from the sheet's Range at read time so we can collect/allocate only what is needed, instead of (potentially) allocating the entire Range. The linked issue shows a ~40GB allocation that would only need to be a few MB if the required region could be declared before reserving memory.

  • Introduces an IndexSet for row/col indices that can init from a range, index, list of indices, or list of ranges.
  • Extends ReaderRef with worksheet_range_region_ref and worksheet_range_region.
  • Factors the common code out into collect_cells_into_range.
  • No changes to the existing public API.
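
To make the first bullet concrete, here is a minimal sketch of how an index set could accept a full range, a single range, a single index, a list of indices, or a list of ranges. It is an illustration only: the enum layout, the `from_ranges` helper, and the `contains` method are assumptions made for this sketch and are not taken from the PR's src/index_set.rs.

```rust
use std::ops::{Range, RangeFull};

/// Hypothetical stand-in for the PR's `IndexSet`; the real type may differ.
#[derive(Debug, Clone, PartialEq, Default)]
pub enum IndexSet {
    /// Every index is selected (the `..` case).
    #[default]
    All,
    /// Sorted, non-overlapping, half-open ranges.
    Ranges(Vec<Range<u32>>),
}

impl IndexSet {
    /// Drop empty ranges, sort by start, and merge overlapping or adjacent ranges.
    fn from_ranges(mut ranges: Vec<Range<u32>>) -> Self {
        ranges.retain(|r| r.start < r.end);
        ranges.sort_by_key(|r| r.start);
        let mut merged: Vec<Range<u32>> = Vec::new();
        for r in ranges {
            match merged.last_mut() {
                Some(last) if r.start <= last.end => last.end = last.end.max(r.end),
                _ => merged.push(r),
            }
        }
        IndexSet::Ranges(merged)
    }

    /// Linear scan over the merged ranges; fine for a handful of ranges.
    pub fn contains(&self, idx: u32) -> bool {
        match self {
            IndexSet::All => true,
            IndexSet::Ranges(rs) => rs.iter().any(|r| r.contains(&idx)),
        }
    }
}

impl From<RangeFull> for IndexSet {
    fn from(_: RangeFull) -> Self {
        IndexSet::All
    }
}

impl From<Range<u32>> for IndexSet {
    fn from(r: Range<u32>) -> Self {
        Self::from_ranges(vec![r])
    }
}

impl From<u32> for IndexSet {
    fn from(i: u32) -> Self {
        // saturating_add keeps the sketch panic-free; u32::MAX yields an empty set here.
        Self::from_ranges(vec![i..i.saturating_add(1)])
    }
}

impl<const N: usize> From<[u32; N]> for IndexSet {
    fn from(idx: [u32; N]) -> Self {
        Self::from_ranges(idx.iter().map(|&i| i..i.saturating_add(1)).collect())
    }
}

impl<const N: usize> From<[Range<u32>; N]> for IndexSet {
    fn from(rs: [Range<u32>; N]) -> Self {
        Self::from_ranges(rs.to_vec())
    }
}
```

With conversions like these, a method can take `impl Into<IndexSet>` for both the column and row arguments, which is what allows call sites to pass `..`, `0..5`, or `[1, 3, 5]` interchangeably.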

In use

reader.worksheet_range_region_ref("Sheet", 0..5, ..)?;          // range of cols, all rows
reader.worksheet_range_region_ref("Sheet", [1,3,5], 0..100)?;   // specific cols, range of rows
reader.worksheet_range_region_ref("Sheet", [0..3, 8..10], ..)?; // disjoint col ranges, all rows
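
As a rough illustration of why projecting at read time saves memory, the sketch below filters sparse cells against the selected rows and columns and writes them into a dense grid sized to the selection rather than to the sheet's full extent. `SparseCell` and `project_cells` are hypothetical names for this example and do not reflect how collect_cells_into_range is implemented.

```rust
/// Hypothetical sparse cell: (row, col, value). Not calamine's `Cell` type.
type SparseCell = (u32, u32, i64);

/// Keep only cells whose row and column are both selected, placing them in a
/// dense row-major grid with one slot per (selected row, selected column).
///
/// `rows` and `cols` must be sorted ascending without duplicates, because the
/// output position of each cell is found with a binary search.
pub fn project_cells(
    cells: &[SparseCell],
    rows: &[u32],
    cols: &[u32],
) -> Vec<Vec<Option<i64>>> {
    // Allocation is rows.len() * cols.len(), independent of the sheet size.
    let mut grid = vec![vec![None; cols.len()]; rows.len()];
    for &(r, c, v) in cells {
        if let (Ok(ri), Ok(ci)) = (rows.binary_search(&r), cols.binary_search(&c)) {
            grid[ri][ci] = Some(v);
        }
    }
    grid
}
```

In this sketch a stray cell at row 1,000,000 costs nothing unless that row is selected, whereas a dense range spanning the whole sheet would have to reserve every slot up to it.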

(This PR supersedes #660, which supported only column projection.)

@lukapeschke lukapeschke left a comment

Contributor

Thanks a lot, this will be really helpful 🙏

To answer your question on #660:

> do you have an idea if there are other meaningful options that would be useful outside of row/col definition

I'd still be in favor of a struct for selection options to avoid an API breakage later.

I see a few things I'd like to add to a SelectOptions struct (we can do it in other PRs to keep everything reviewable):

  1. I'd like to deprecate the HeaderRow param on the reader and move it to the sheet. Right now it mutates the reader, so if the first sheet has its header on row 11 (for example because there is a chart above it), we set with_header_row(HeaderRow::Row(10)). But if we then want to read another sheet with the same reader, we need to set with_header_row(HeaderRow::default()) first; otherwise the first 10 rows will be ignored.
  2. A max_rows parameter would also be nice: it would allow direct support for fastexcel's limit parameters, and would also be convenient for things such as retrieving only the header row when HeaderRow is FirstNonEmptyRow.
  3. A max_cells safeguard to avoid memory explosions
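
For the third point, a max_cells check could run before any memory is reserved. The sketch below is one possible shape; `check_cell_budget` and `TooManyCells` are hypothetical names. It takes the limit as a `u64` because a full-size xlsx grid (1,048,576 rows by 16,384 columns) holds more cells than `u32::MAX`; that width is a choice made for this sketch.

```rust
/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq)]
pub struct TooManyCells {
    pub requested: u64,
    pub limit: u64,
}

/// Return the number of cells a dense `n_rows` x `n_cols` range would hold,
/// or an error if that exceeds `max_cells`. Intended to run before allocating.
pub fn check_cell_budget(
    n_rows: u32,
    n_cols: u32,
    max_cells: u64,
) -> Result<u64, TooManyCells> {
    // The product of two u32 values always fits in a u64, so this cannot overflow.
    let requested = u64::from(n_rows) * u64::from(n_cols);
    if requested > max_cells {
        Err(TooManyCells { requested, limit: max_cells })
    } else {
        Ok(requested)
    }
}
```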

I guess an API like this would be nice to use:

#[derive(Debug, Clone, Default)]
#[non_exhaustive]
pub struct RangeOptions {
    cols: IndexSet,
    rows: IndexSet,
}

impl RangeOptions {
    pub fn with_cols(self, cols: impl Into<IndexSet>) -> Self;
    pub fn with_rows(self, rows: impl Into<IndexSet>) -> Self;
    // In later PRs
    pub fn with_header_row(self, header_row: HeaderRow) -> Self;
    pub fn with_row_limit(self, limit: u32) -> Self;
    pub fn with_max_cells(self, limit: u32) -> Self;
}
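
The signatures above omit bodies. A minimal compilable sketch of the first two builder methods might look like the following, assuming a deliberately simplified placeholder `IndexSet` (a single optional range) and two accessor methods, `cols()` and `rows()`, that are not part of the proposal.

```rust
use std::ops::Range;

/// Simplified placeholder for `IndexSet`: `None` means "select everything".
#[derive(Debug, Clone, Default, PartialEq)]
pub struct IndexSet(Option<Range<u32>>);

impl From<Range<u32>> for IndexSet {
    fn from(r: Range<u32>) -> Self {
        IndexSet(Some(r))
    }
}

#[derive(Debug, Clone, Default)]
#[non_exhaustive]
pub struct RangeOptions {
    cols: IndexSet,
    rows: IndexSet,
}

impl RangeOptions {
    /// Consume the options and return them with the column selection replaced.
    pub fn with_cols(mut self, cols: impl Into<IndexSet>) -> Self {
        self.cols = cols.into();
        self
    }

    /// Consume the options and return them with the row selection replaced.
    pub fn with_rows(mut self, rows: impl Into<IndexSet>) -> Self {
        self.rows = rows.into();
        self
    }

    /// Accessor added for this sketch only.
    pub fn cols(&self) -> &IndexSet {
        &self.cols
    }

    /// Accessor added for this sketch only.
    pub fn rows(&self) -> &IndexSet {
        &self.rows
    }
}
```

Because the struct is `#[non_exhaustive]` and has private fields, downstream code would build it via `RangeOptions::default().with_cols(0..5)`, so new options can be added later without breaking callers.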

Comment thread src/index_set.rs
Comment thread src/utils.rs
@alexander-beedie alexander-beedie force-pushed the feat/region-projected-reads branch 6 times, most recently from d5b5d63 to ff8c01a (June 15, 2026 13:28)
@jmcnamara

Collaborator

Folks. Let me know when this is complete/ready for merge.

@lukapeschke

Contributor

Thanks for the updates @alexander-beedie! 🙏 Just curious about your opinion on the extensible options struct vs. the cols and rows parameters?

@alexander-beedie

Contributor Author

> Thanks for the updates @alexander-beedie! 🙏 Just curious about your opinion on the extensible options struct vs. the cols and rows parameters?

Yup, I think it's worthwhile - just thinking about implementation :)
