
feat(xlsx): implement cell style extraction with rich text and worksheet layout #653

Open
ddimaria wants to merge 1 commit into tafia:master from ddimaria:feat/styles

@ddimaria ddimaria commented May 12, 2026

Summary

Adds style support for Calamine xlsx files. Supersedes #538, which had stale review comments and a long discussion thread that made the page hard to load. This PR is a clean, single squashed commit on top of master.

Public API

// row × col grid of cell styles (RLE-compressed internally)
let styles = xlsx.worksheet_style("Sheet1")?;
for (row, col, style) in styles.cells() {
    if let Some(font) = style.get_font() {
        if font.is_bold() { /* ... */ }
    }
}

// column widths and row heights
let layout = xlsx.worksheet_layout("Sheet1")?;

What's included

  • src/style.rs: Style, Color, Font, Fill, Borders, Alignment, NumberFormat, Protection, RichText, TextRun, StyleRange, WorksheetLayout.
  • src/xlsx/style_parser.rs: parser for styles.xml, theme colors with tint resolution, indexed-color fallback, sysClr lastClr fallback.
  • Run-length encoded StyleRange for memory efficiency on large workbooks (tested with a 1M-cell fixture).
  • Rich text support: the shared-string reader now decodes <r> runs and preserves per-run formatting. It also handles the case where rich runs follow an initial plain <t>, consistent with #637 ("xlsx: fix ignoring rich text <r> after initial plain <t>").
  • Benchmarks in benches/style.rs (criterion).
  • Test fixtures: styles.xlsx, borders.xlsx, EMSI_JobChange_UK.xlsx, problematic_formats.xlsx, styles_1M.xlsx.
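To illustrate the run-length encoding idea behind StyleRange, here is a minimal sketch for a single row of style ids. It is not the PR's implementation: `RleRow`, `from_ids`, `get`, and `cells` are hypothetical names chosen for this example, and the real `StyleRange` may store and expose its data differently.

```rust
/// Hypothetical run-length encoded row of per-cell style ids.
/// Illustrative only; not the `StyleRange` type from this PR.
#[derive(Debug, Clone, PartialEq)]
pub struct RleRow {
    /// `(style_id, run_length)` pairs in column order.
    runs: Vec<(u32, u32)>,
}

impl RleRow {
    /// Compress a dense slice of style ids into runs of equal neighbours.
    pub fn from_ids(ids: &[u32]) -> Self {
        let mut runs: Vec<(u32, u32)> = Vec::new();
        for &id in ids {
            // Extend the current run when the id repeats.
            if let Some((last, len)) = runs.last_mut() {
                if *last == id {
                    *len += 1;
                    continue;
                }
            }
            // Otherwise start a new run of length 1.
            runs.push((id, 1));
        }
        Self { runs }
    }

    /// Number of stored runs (the memory cost), not the number of cells.
    pub fn run_count(&self) -> usize {
        self.runs.len()
    }

    /// Style id at a zero-based column, or `None` past the end of the row.
    pub fn get(&self, col: u32) -> Option<u32> {
        let mut start = 0u32;
        for &(id, len) in &self.runs {
            if col < start + len {
                return Some(id);
            }
            start += len;
        }
        None
    }

    /// Expand the runs back into `(column, style_id)` pairs.
    pub fn cells(&self) -> impl Iterator<Item = (u32, u32)> + '_ {
        self.runs
            .iter()
            .flat_map(|&(id, len)| std::iter::repeat(id).take(len as usize))
            .enumerate()
            .map(|(col, id)| (col as u32, id))
    }
}
```

With this sketch, `RleRow::from_ids(&[0, 0, 0, 5, 5, 0])` stores three runs for six cells, and `get(3)` returns `Some(5)`. The saving grows with sheets where long stretches of cells share one style, which is the common case in large workbooks.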

Notes on review feedback from #538

  • Squashed into a single commit per @jmcnamara's request.
  • Removed STYLE_FEATURE.md (was flagged for removal in the previous review).
  • Removed accidentally committed profiler artifact (profile.json.gz).
  • xlsb support and conditional formatting are intentionally left to follow-up PRs (#628, "Implement conditional formatting", already covers the latter).

Test plan

  • cargo test --all-features — 267 tests pass (159 integration, 63 doc, 45 unit), 0 failures.
  • cargo check --tests --all-features — clean (warnings are pre-existing).
  • cargo fmt --check / cargo clippy — happy to address any remaining lints during review.

…eet layout

Add style support for Calamine xlsx files.

Public API:
- `Xlsx::worksheet_style(sheet)` returns a row x col grid of cell styles
  using run-length encoding for memory efficiency on large workbooks.
- `Xlsx::worksheet_layout(sheet)` returns column widths and row heights.

Style types (in `src/style.rs`):
- `Style` with optional Font / Fill / Borders / Alignment / NumberFormat
  / Protection.
- `Color` with theme + tint resolution and indexed-color fallback.
- `RichText` / `TextRun` for cells with mixed inline formatting.
- `StyleRange` with RLE storage and a `cells()` iterator.

Parser in `src/xlsx/style_parser.rs` handles fonts (bold / italic /
underline / strikethrough / sz / color), fills, borders (with color and
style per side), number formats (built-in + custom format codes),
alignment (horizontal / vertical / wrap / indent / shrink / text
rotation incl. stacked), protection (locked default per OOXML), theme
colors with tint, and sysClr lastClr fallback.
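As an illustration of what theme color tint resolution involves, the sketch below applies an OOXML-style tint to an RGB color by adjusting its lightness in HLS space: a negative tint darkens (L' = L * (1 + tint)) and a positive tint lightens (L' = L * (1 - tint) + tint). The function names (`rgb_to_hls`, `hls_to_rgb`, `apply_tint`) are assumptions made for this example, and the parser in this PR may compute or round the result differently.

```rust
/// Convert 8-bit RGB to (hue, lightness, saturation), each in 0.0..=1.0.
fn rgb_to_hls(r: u8, g: u8, b: u8) -> (f64, f64, f64) {
    let (r, g, b) = (r as f64 / 255.0, g as f64 / 255.0, b as f64 / 255.0);
    let max = r.max(g).max(b);
    let min = r.min(g).min(b);
    let l = (max + min) / 2.0;
    if max == min {
        // Grey: hue and saturation are zero.
        return (0.0, l, 0.0);
    }
    let d = max - min;
    let s = if l > 0.5 { d / (2.0 - max - min) } else { d / (max + min) };
    let h = if max == r {
        ((g - b) / d).rem_euclid(6.0)
    } else if max == g {
        (b - r) / d + 2.0
    } else {
        (r - g) / d + 4.0
    };
    (h / 6.0, l, s)
}

/// Helper for the HLS to RGB conversion: one channel from a hue offset.
fn hue_to_channel(p: f64, q: f64, mut t: f64) -> f64 {
    if t < 0.0 {
        t += 1.0;
    }
    if t > 1.0 {
        t -= 1.0;
    }
    if t < 1.0 / 6.0 {
        p + (q - p) * 6.0 * t
    } else if t < 0.5 {
        q
    } else if t < 2.0 / 3.0 {
        p + (q - p) * (2.0 / 3.0 - t) * 6.0
    } else {
        p
    }
}

/// Convert (hue, lightness, saturation) back to 8-bit RGB.
fn hls_to_rgb(h: f64, l: f64, s: f64) -> (u8, u8, u8) {
    let to_u8 = |x: f64| (x * 255.0).round() as u8;
    if s == 0.0 {
        let v = to_u8(l);
        return (v, v, v);
    }
    let q = if l < 0.5 { l * (1.0 + s) } else { l + s - l * s };
    let p = 2.0 * l - q;
    (
        to_u8(hue_to_channel(p, q, h + 1.0 / 3.0)),
        to_u8(hue_to_channel(p, q, h)),
        to_u8(hue_to_channel(p, q, h - 1.0 / 3.0)),
    )
}

/// Apply a tint in -1.0..=1.0 to an RGB color by scaling its lightness.
pub fn apply_tint(rgb: (u8, u8, u8), tint: f64) -> (u8, u8, u8) {
    let (h, l, s) = rgb_to_hls(rgb.0, rgb.1, rgb.2);
    let l = if tint < 0.0 {
        l * (1.0 + tint) // darken toward black
    } else {
        l * (1.0 - tint) + tint // lighten toward white
    };
    hls_to_rgb(h, l.clamp(0.0, 1.0), s)
}
```

A tint of 0.0 leaves the color unchanged. Only the lightness is scaled, so hue and saturation carry over from the base theme color.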

Shared-string reader now decodes rich text runs and preserves their
formatting, while also handling plain text that precedes rich runs
(consistent with upstream PR tafia#637).
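The mixed case described above (a plain <t> followed by <r> runs inside one <si>) can be pictured with the following sketch. It works on an already-parsed list of children rather than on XML, and `SiPart`, `Run`, `collect_runs`, and `plain_text` are hypothetical names for this example, not the types or functions added by this PR.

```rust
/// Hypothetical parsed child of a shared-string `<si>` element.
#[derive(Debug, Clone, PartialEq)]
pub enum SiPart {
    /// A bare `<t>` child with no formatting.
    Plain(String),
    /// An `<r>` run: its `<t>` text plus a bold flag read from `<rPr>`.
    Run { text: String, bold: bool },
}

/// Hypothetical formatted run, reduced to text and a bold flag.
#[derive(Debug, Clone, PartialEq)]
pub struct Run {
    pub text: String,
    pub bold: bool,
}

/// Keep every child in document order. A bare `<t>` becomes an
/// unformatted run instead of being dropped or overwriting later runs.
pub fn collect_runs(parts: &[SiPart]) -> Vec<Run> {
    parts
        .iter()
        .map(|part| match part {
            SiPart::Plain(text) => Run { text: text.clone(), bold: false },
            SiPart::Run { text, bold } => Run { text: text.clone(), bold: *bold },
        })
        .collect()
}

/// The cell's plain string value is the concatenation of all run texts.
pub fn plain_text(runs: &[Run]) -> String {
    runs.iter().map(|run| run.text.as_str()).collect()
}
```

For a plain "Hello " followed by a bold run "world", `collect_runs` yields two runs and `plain_text` returns "Hello world", so neither the leading text nor the run formatting is lost.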

Includes benchmarks in `benches/style.rs` and test fixtures
(styles.xlsx, borders.xlsx, EMSI_JobChange_UK.xlsx,
problematic_formats.xlsx, styles_1M.xlsx) covering the various code
paths.

Co-authored-by: Cursor <cursoragent@cursor.com>
@ddimaria

Author

@jmcnamara I just pushed up a lint fix for this one

@@ -0,0 +1,206 @@
// SPDX-License-Identifier: MIT

Collaborator

Rather than generating this file dynamically and requiring another dependency (even a dev one), it would be simpler to just generate the required files and add them to the tests directory.

Also, this doesn't matter for a benchmark test case, but I think Excel only supports 32k styles per workbook.

@@ -0,0 +1,206 @@
// SPDX-License-Identifier: MIT

Collaborator

Same comment as for the other benchmark test case: generate the file and add it to the tests directory.

Comment thread examples/layout.rs

use calamine::{open_workbook, Reader, Xlsx};

/// Example demonstrating how to capture column widths and row heights from Excel files

Collaborator

Give this example a better name like read_row_and_column_dimensions.rs. Also add it to the examples/README.md file.

Comment thread examples/layout.rs
/// Example demonstrating how to capture column widths and row heights from Excel files
fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Open an Excel file
    let path = format!("{}/tests/styles.xlsx", env!("CARGO_MANIFEST_DIR"));

Collaborator

Don't bother to use env!("CARGO_MANIFEST_DIR") in the examples since it isn't necessary. Just use the explicit path instead: let path = "tests/styles.xlsx";

@jmcnamara

Collaborator

@ddimaria I will try to do a high-level review of this over the next few days/weekend.

You will need to rebase/resolve conflicts due to the commits that will be merged ahead of this.

@jmcnamara jmcnamara self-assigned this Jun 3, 2026
@jmcnamara jmcnamara added the needs work for merge The PR needs some rework or clarification. No suitable for merge, yet. label Jun 3, 2026
@jmcnamara jmcnamara mentioned this pull request Jun 16, 2026