fix(markdown): preserve pipe characters inside inline code spans in table cells #7884
sahiee-dev wants to merge 2 commits
…able cells
marked's splitCells only exempts backslash-escaped pipes when splitting table rows into cells, so `||` or `a || b` inside a table cell was treated as column delimiters instead of literal content.
Add a preprocessing step in MarkdownManager.parse() that escapes pipe characters inside backtick code spans on table-row lines before handing the markdown to marked. marked already converts \| back to | after splitting, so no post-processing is needed.
Fixes ueberdosis#7858
🦋 Changeset detected
Latest commit: e52bcc4
The changes in this PR will be included in the next version bump. This PR includes changesets to release 72 packages.
✅ Deploy Preview for tiptap-embed ready!
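Updated the PR based on your feedback. The fix is now entirely inside `extension-table`. I also found and fixed a subtle bug in the first revision: `helper.blockTokens` was called on the full remaining source, so marked advanced its cursor past the end of the table and silently dropped the content that followed. The fix now extracts only the table lines before re-lexing, and I've added a regression test that would have caught this issue.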
…izer
The preprocessing logic now lives entirely in extension-table via a block-level markdownTokenizer, keeping MarkdownManager generic.
Also fixes a subtle raw-field length mismatch where helper.blockTokens was called on the full remaining source, causing marked to advance its cursor past the end of the table and silently drop content that followed.
Fix: extract only the table lines before re-lexing so raw always corresponds to the original source length.
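The raw-length point above can be illustrated with a minimal sketch. It assumes a table block is simply the leading run of lines that begin with `|`; the helper name `extractLeadingTableBlock` is hypothetical and is not taken from the PR's diff, and no marked or Tiptap API is called here.

```typescript
/**
 * Hypothetical helper: return the leading run of table-row lines from `src`,
 * including their line breaks, exactly as they appear in the original source.
 * Assumption: a table row is any line whose first non-space character is `|`.
 */
function extractLeadingTableBlock(src: string): string {
  const lines = src.split('\n')
  let consumed = 0

  for (const line of lines) {
    if (!/^\s*\|/.test(line)) {
      break
    }
    // +1 accounts for the '\n' that split() removed.
    consumed += line.length + 1
  }

  // The final table line may have no trailing newline, so clamp to the source length.
  return src.slice(0, Math.min(consumed, src.length))
}

const source = '| a |\n| - |\n| 1 |\n\nAfter the table'

// `raw` is the untouched original text of the table only.
const raw = extractLeadingTableBlock(source)

// Whatever follows the table is still available to the outer lexer.
const rest = source.slice(raw.length)
```

Because `raw` is sliced from the original source rather than from a preprocessed copy, its length matches what the outer lexer should consume, and `rest` keeps the content after the table.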
Changes Overview
Fixes a parsing bug where pipe characters (`|`) inside backtick inline code spans were incorrectly treated as table column delimiters, causing table rows with cells like `||` or `a || b` to split into the wrong number of columns and lose their code formatting.
Implementation Approach
marked's internal `splitCells` function splits table rows by replacing every `|` with a split marker, only exempting backslash-escaped pipes (`\|`). It has no awareness of backtick code spans, so `||` in a table cell was split into multiple columns.
The fix adds a preprocessing step in `MarkdownManager.parse()`: before the markdown string is passed to marked's lexer, any `|` characters found inside backtick code spans on table-row lines (lines starting with `|`) are escaped to `\|`.
Since `splitCells` already converts `\|` back to `|` after splitting, the cell content is restored correctly with no second pass needed.
The preprocessing handles single, double, and triple backtick spans and is a no-op for lines that are not table rows or contain no backtick spans, so it does not affect any other markdown parsing.
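A minimal sketch of the preprocessing described above, under stated assumptions: the function names `escapeTablePipes`, `escapePipesInCodeSpans`, and `findClosingRun` are hypothetical, the code is not the PR's actual implementation, and it matches a code span by looking for a closing backtick run of the same length as the opening run (so it also covers runs longer than three backticks).

```typescript
/** Hypothetical helper: index of the next backtick run of exactly `length`, or -1. */
function findClosingRun(line: string, from: number, length: number): number {
  let i = from
  while (i < line.length) {
    if (line[i] !== '`') {
      i += 1
      continue
    }
    let end = i
    while (end < line.length && line[end] === '`') {
      end += 1
    }
    if (end - i === length) {
      return i
    }
    i = end
  }
  return -1
}

/** Hypothetical helper: escape `|` inside backtick code spans of a single line. */
function escapePipesInCodeSpans(line: string): string {
  let out = ''
  let i = 0
  while (i < line.length) {
    if (line[i] !== '`') {
      out += line[i]
      i += 1
      continue
    }
    // Measure the opening backtick run.
    let runEnd = i
    while (runEnd < line.length && line[runEnd] === '`') {
      runEnd += 1
    }
    const ticks = line.slice(i, runEnd)
    const close = findClosingRun(line, runEnd, ticks.length)
    if (close === -1) {
      // Unclosed run: copy it through unchanged.
      out += ticks
      i = runEnd
      continue
    }
    // Escape bare pipes; pipes that are already written as `\|` stay as they are.
    const body = line.slice(runEnd, close).replace(/\\?\|/g, '\\|')
    out += ticks + body + ticks
    i = close + ticks.length
  }
  return out
}

/** Hypothetical entry point: only touch lines that start with `|` and contain a backtick. */
function escapeTablePipes(markdown: string): string {
  return markdown
    .split('\n')
    .map(line => (/^\|/.test(line) && line.indexOf('`') !== -1 ? escapePipesInCodeSpans(line) : line))
    .join('\n')
}

const escapedRow = escapeTablePipes('| `||` | or | `a || b` |')
```

Here `escapedRow` holds the row with each pipe inside a code span written as `\|`, while the three column-delimiting pipes outside the spans are left alone. Lines that do not start with `|` pass through unchanged.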
Testing Done
Added 5 tests to `packages/extension-table/__tests__/tableMarkdown.spec.ts`:
- `||` in a table cell as a single cell with a code mark
- `||`, or, `a || b` correctly
- `&&` (was also affected)
All 6 tests in the file pass (1 pre-existing, 5 new).
Verification Steps
… `vite` from `demos/`) and open:
Confirm the table renders with 3 columns and inline code formatting on each code span.
Click Extract Markdown and confirm the output preserves the backtick wrappers.
Additional Notes
The root cause is entirely inside marked's `splitCells`; it is not something Tiptap can fix by overriding the table tokenizer without reimplementing the full table block rule.
The preprocessing approach is the minimal, safe intervention point within Tiptap's own code.
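To make the root cause concrete, here is a simplified model of a row splitter that exempts only backslash-escaped pipes. It is an illustration written for this note, not marked's actual `splitCells` code, and the name `splitRowNaively` is hypothetical.

```typescript
/**
 * Simplified model (NOT marked's real implementation): split a table row on
 * every `|`, treating only `\|` as a literal pipe. Backtick code spans get no
 * special handling, which is the behaviour this PR works around.
 */
function splitRowNaively(row: string): string[] {
  const cells: string[] = []
  let current = ''

  for (let i = 0; i < row.length; i += 1) {
    const ch = row[i]
    if (ch === '\\' && row[i + 1] === '|') {
      // An escaped pipe becomes literal cell content.
      current += '|'
      i += 1
    } else if (ch === '|') {
      cells.push(current.trim())
      current = ''
    } else {
      current += ch
    }
  }
  cells.push(current.trim())

  // Drop the empty cells produced by the leading and trailing pipes.
  if (cells.length > 0 && cells[0] === '') {
    cells.shift()
  }
  if (cells.length > 0 && cells[cells.length - 1] === '') {
    cells.pop()
  }
  return cells
}

// Unescaped pipes inside the code span are treated as delimiters.
const broken = splitRowNaively('| `||` | or |')

// With the pipes escaped, the code span survives as one cell.
const fixed = splitRowNaively('| `\\|\\|` | or |')
```

In this model `broken` has four cells instead of two, while `fixed` yields the code span and `or` as two cells with the pipes restored, which mirrors why escaping before the split needs no second pass.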
Related Issues
Fixes #7858