CsvConverter throws UnicodeDecodeError when charset detection from partial content returns 'ascii' #1949

## Bug

`CsvConverter` uses `stream_info.charset` (detected from the first 4096 bytes) to decode CSV files. When the first 4096 bytes are ASCII-only but the rest of the file contains non-ASCII characters (e.g., UTF-8-encoded accented characters), `.decode('ascii')` raises `UnicodeDecodeError`.
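
The failure mode can be shown with the standard library alone. This is only an illustrative sketch: the 4096-byte window comes from the description above, while the variable names (`data`, `head`, `error`) are made up for the example and do not come from the markitdown source.

```python
# Build a payload whose first 4096 bytes are pure ASCII,
# followed by a UTF-8-encoded non-ASCII character.
data = ("x" * 4096 + ",caf\u00e9\n").encode("utf-8")

# The sniffed prefix looks like plain ASCII, so a detector
# that only sees this window can reasonably report 'ascii'.
head = data[:4096]
head.decode("ascii")  # succeeds

# Decoding the *full* payload with that charset fails.
error = None
try:
    data.decode("ascii")
except UnicodeDecodeError as exc:
    error = exc

print(type(error).__name__, "at byte offset", error.start)
```

The offset reported is where the first byte of the encoded `é` sits, just past the sniffed window.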

## Reproducible example

```python
from markitdown import MarkItDown
import io

# CSV with UTF-8 characters beyond the first 4096 bytes
buf = io.BytesIO(('x' * 4096 + ',caf\u00e9\n').encode('utf-8'))
md = MarkItDown()
result = md.convert(buf)
```

## Expected

The CSV content is decoded successfully, falling back to `charset_normalizer` when decoding with the detected charset fails.

## Actual

`UnicodeDecodeError` is raised because `stream_info.charset` reports `'ascii'` but the file contains UTF-8.

## Affected file

`packages/markitdown/src/markitdown/converters/_csv_converter.py`, lines 45-48

## Note

This is the same class of bug as #1505 (PlainTextConverter), which was fixed in PR #1938.

