feat(xlsx): implement cell style extraction with rich text and worksheet layout#653
ddimaria wants to merge 1 commit into
…eet layout

Add style support for Calamine xlsx files.

Public API:
- `Xlsx::worksheet_style(sheet)` returns a row x col grid of cell styles using run-length encoding for memory efficiency on large workbooks.
- `Xlsx::worksheet_layout(sheet)` returns column widths and row heights.

Style types (in `src/style.rs`):
- `Style` with optional Font / Fill / Borders / Alignment / NumberFormat / Protection.
- `Color` with theme + tint resolution and indexed-color fallback.
- `RichText` / `TextRun` for cells with mixed inline formatting.
- `StyleRange` with RLE storage and a `cells()` iterator.

Parser in `src/xlsx/style_parser.rs` handles fonts (bold / italic / underline / strikethrough / sz / color), fills, borders (with color and style per side), number formats (built-in + custom format codes), alignment (horizontal / vertical / wrap / indent / shrink / text rotation incl. stacked), protection (locked default per OOXML), theme colors with tint, and sysClr lastClr fallback.

Shared-string reader now decodes rich text runs and preserves their formatting, while also handling plain text that precedes rich runs (consistent with upstream PR tafia#637).

Includes benchmarks in `benches/style.rs` and test fixtures (styles.xlsx, borders.xlsx, EMSI_JobChange_UK.xlsx, problematic_formats.xlsx, styles_1M.xlsx) covering the various code paths.

Co-authored-by: Cursor <cursoragent@cursor.com>
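To illustrate why run-length encoding helps on large sheets, here is a minimal, self-contained sketch of the idea. It does not use the PR's types: `Run`, `RleStyles`, `encode`, and the `u32` style ids are hypothetical names chosen for illustration, and the real `StyleRange` may be laid out quite differently.

```rust
/// Hypothetical run: `count` consecutive cells that share one style id.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Run {
    style_id: u32,
    count: u32,
}

/// Hypothetical RLE container (an assumption, not the PR's `StyleRange`).
#[derive(Debug)]
struct RleStyles {
    runs: Vec<Run>,
}

impl RleStyles {
    /// Compress a row-major slice of per-cell style ids into runs.
    fn encode(ids: &[u32]) -> Self {
        let mut runs: Vec<Run> = Vec::new();
        for &id in ids {
            // Extend the current run when the style repeats.
            if let Some(last) = runs.last_mut() {
                if last.style_id == id {
                    last.count += 1;
                    continue;
                }
            }
            // Otherwise start a new run of length one.
            runs.push(Run { style_id: id, count: 1 });
        }
        RleStyles { runs }
    }

    /// Expand the runs back into one style id per cell.
    fn cells(&self) -> impl Iterator<Item = u32> + '_ {
        self.runs
            .iter()
            .flat_map(|run| std::iter::repeat(run.style_id).take(run.count as usize))
    }
}

fn main() {
    // Mostly-default styling with a short styled span in the middle.
    let ids = [0, 0, 0, 0, 7, 7, 0, 0, 0];
    let rle = RleStyles::encode(&ids);
    println!("{} cells stored as {} runs", ids.len(), rle.runs.len());
    // Decoding must reproduce the original per-cell sequence.
    assert!(rle.cells().eq(ids.iter().copied()));
}
```

Worksheets tend to contain long stretches of identically styled cells, so the number of runs is usually far smaller than the number of cells. The trade-off is that random access to a single cell needs a scan or an auxiliary index.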
@jmcnamara I just pushed up a lint fix for this one.
@@ -0,0 +1,206 @@
// SPDX-License-Identifier: MIT
Rather than generating this file dynamically and requiring another dependency (even if it is a dev one), it would be simpler to just generate the required files and add them to the tests directory.
Also, this doesn't matter for a benchmark test case, but I think Excel only supports 32k styles per workbook.
@@ -0,0 +1,206 @@
// SPDX-License-Identifier: MIT
Same comment as the other benchmark test case: generate the file and add it to the tests directory.
use calamine::{open_workbook, Reader, Xlsx};

/// Example demonstrating how to capture column widths and row heights from Excel files
Give this example a better name, like `read_row_and_column_dimensions.rs`. Also add it to the `examples/README.md` file.
/// Example demonstrating how to capture column widths and row heights from Excel files
fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Open an Excel file
    let path = format!("{}/tests/styles.xlsx", env!("CARGO_MANIFEST_DIR"));
Don't bother using `env!("CARGO_MANIFEST_DIR")` in the examples since it isn't necessary. Just use the explicit path instead: `let path = "tests/styles.xlsx";`
@ddimaria I will try to do a high-level review of this over the next few days/weekend. You will need to rebase and resolve conflicts due to the commits that will be merged ahead of this.
Summary
Adds style support for Calamine xlsx files. Supersedes #538, which had stale review comments and a long discussion thread that made the page hard to load. This PR is a clean, single squashed commit on top of master.
Public API
What's included
- `src/style.rs`: `Style`, `Color`, `Font`, `Fill`, `Borders`, `Alignment`, `NumberFormat`, `Protection`, `RichText`, `TextRun`, `StyleRange`, `WorksheetLayout`.
- `src/xlsx/style_parser.rs`: parser for `styles.xml`, theme colors with tint resolution, indexed-color fallback, `sysClr` last-resolved fallback.
- `StyleRange` with RLE storage for memory efficiency on large workbooks (tested with a 1M-cell fixture).
- Shared-string reader decodes rich text `<r>` runs and preserves per-run formatting. Also handles the case (consistent with #637, "xlsx: fix ignoring rich text <r> after initial plain <t>") where rich runs follow an initial plain `<t>`.
- Benchmarks in `benches/style.rs` (criterion).
- Test fixtures: `styles.xlsx`, `borders.xlsx`, `EMSI_JobChange_UK.xlsx`, `problematic_formats.xlsx`, `styles_1M.xlsx`.
Notes on review feedback from #538
- `STYLE_FEATURE.md` (was flagged for removal in the previous review).
- … `profile.json.gz`).
- `xlsb` support and conditional formatting are intentionally left to follow-up PRs (#628, "Implement conditional formatting", already exists for conditional formatting).
Test plan
- `cargo test --all-features`: 267 tests pass (159 integration, 63 doc, 45 unit), 0 failures.
- `cargo check --tests --all-features`: clean (warnings are pre-existing).
- `cargo fmt --check` / `cargo clippy`: happy to address any remaining lints during review.
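For reviewers unfamiliar with the "theme colors with tint resolution" item, the following is a rough sketch of how a tint in the range -1.0 to 1.0 is commonly applied to a base RGB color, following the OOXML rule as I understand it: convert to HSL, darken the lightness by `L * (1 + tint)` for negative tints, lighten it by `L * (1 - tint) + tint` for positive ones, then convert back. The function names (`rgb_to_hsl`, `hsl_to_rgb`, `apply_tint`) are hypothetical and are not taken from `src/xlsx/style_parser.rs`; the PR's implementation may round or clamp differently.

```rust
/// Convert 8-bit RGB to (hue, saturation, lightness), each in 0.0..=1.0.
fn rgb_to_hsl(r: u8, g: u8, b: u8) -> (f64, f64, f64) {
    let (r, g, b) = (r as f64 / 255.0, g as f64 / 255.0, b as f64 / 255.0);
    let max = r.max(g).max(b);
    let min = r.min(g).min(b);
    let l = (max + min) / 2.0;
    let d = max - min;
    if d == 0.0 {
        return (0.0, 0.0, l); // achromatic: hue and saturation are zero
    }
    let s = d / (1.0 - (2.0 * l - 1.0).abs());
    // Hue sector depends on which channel is the maximum.
    let h = if max == r {
        ((g - b) / d).rem_euclid(6.0)
    } else if max == g {
        (b - r) / d + 2.0
    } else {
        (r - g) / d + 4.0
    };
    (h / 6.0, s, l)
}

/// Convert (hue, saturation, lightness) in 0.0..=1.0 back to 8-bit RGB.
fn hsl_to_rgb(h: f64, s: f64, l: f64) -> (u8, u8, u8) {
    let c = (1.0 - (2.0 * l - 1.0).abs()) * s; // chroma
    let hp = h * 6.0;
    let x = c * (1.0 - (hp.rem_euclid(2.0) - 1.0).abs());
    let (r1, g1, b1) = match hp as u32 {
        0 => (c, x, 0.0),
        1 => (x, c, 0.0),
        2 => (0.0, c, x),
        3 => (0.0, x, c),
        4 => (x, 0.0, c),
        _ => (c, 0.0, x),
    };
    let m = l - c / 2.0;
    let to_u8 = |v: f64| ((v + m) * 255.0).round().clamp(0.0, 255.0) as u8;
    (to_u8(r1), to_u8(g1), to_u8(b1))
}

/// Apply a tint to a base color (hypothetical helper, see the lead-in).
/// Negative tints darken toward black, positive tints lighten toward white.
fn apply_tint(rgb: (u8, u8, u8), tint: f64) -> (u8, u8, u8) {
    let (h, s, l) = rgb_to_hsl(rgb.0, rgb.1, rgb.2);
    let l = if tint < 0.0 {
        l * (1.0 + tint)
    } else {
        l * (1.0 - tint) + tint
    };
    hsl_to_rgb(h, s, l)
}

fn main() {
    let base = (68, 114, 196); // an arbitrary blue used as the base color
    for tint in [-0.5, 0.0, 0.4] {
        println!("tint {:+.1} -> {:?}", tint, apply_tint(base, tint));
    }
}
```

Working in HSL keeps hue and saturation fixed while only the lightness moves, which is why a tinted theme color still reads as the same color family.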