Fix incorrect Content-Length for StringIO with multi-byte characters by veeceey · Pull Request #7201 · psf/requests

veeceey · 2026-02-10T08:35:01Z

Summary

super_len() uses seek/tell to measure the length of file-like objects such as StringIO and BytesIO. However, StringIO.tell() returns the character position, not the byte offset. For strings containing multi-byte UTF-8 characters (e.g. emoji), this produces an incorrect Content-Length header that violates RFC 9110 section 8.6.

For example, io.StringIO("\U0001F4A9") (a single emoji) previously returned a length of 1 (character count) instead of 4 (UTF-8 byte count), causing the server to receive a Content-Length: 1 header while 4 bytes are actually sent.
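The mismatch can be reproduced with the standard library alone. The snippet below is a minimal sketch of a seek/tell style measurement, not the actual super_len() code; the variable names are illustrative only.

```python
import io

text = "\U0001F4A9"  # one code point, four bytes in UTF-8

# Measure the stream the way a seek/tell approach would.
stream = io.StringIO(text)
start = stream.tell()
stream.seek(0, io.SEEK_END)
char_length = stream.tell() - start  # StringIO positions count characters
stream.seek(start)                   # restore the original position

# Measure what would actually be sent once the text is encoded.
byte_length = len(text.encode("utf-8"))

print(char_length, byte_length)  # 1 4
```

The same seek/tell measurement on io.BytesIO(text.encode("utf-8")) yields 4, which is why only the text-mode stream is affected.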

This is the same class of bug that was fixed for plain str bodies in #6586 -- str is encoded to UTF-8 before measuring, but StringIO was not. This PR makes StringIO handling consistent with str by reading the remaining text, encoding it to UTF-8, and measuring the byte length.
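The approach described above (read the remaining text, encode it, count the bytes) could look roughly like the following. This is a sketch under stated assumptions, not the diff from this PR: stringio_utf8_length is a hypothetical helper name, and the real change lives inside super_len().

```python
import io


def stringio_utf8_length(stream: io.StringIO) -> int:
    """Hypothetical helper: UTF-8 byte length of the unread text in `stream`.

    The stream position is restored before returning, so the caller can
    still read the body afterwards.
    """
    position = stream.tell()
    try:
        remaining = stream.read()  # text from the current position to the end
    finally:
        stream.seek(position)      # put the cursor back where it was
    return len(remaining.encode("utf-8"))


stream = io.StringIO("h\u00e9llo \U0001F4A9")
stream.read(2)                       # consume "h" and "\u00e9"
print(stringio_utf8_length(stream))  # 8: "llo " is 4 bytes, the emoji is 4
print(stream.tell())                 # 2: position unchanged
```

One trade-off of this design is that the remaining text is read and encoded in memory just to be measured, which costs more than a seek/tell pair on large bodies.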

Before

str       → Content-Length: 4  ✓
bytes     → Content-Length: 4  ✓
BytesIO   → Content-Length: 4  ✓
StringIO  → Content-Length: 1  ✗  (character count, not byte count)

After

str       → Content-Length: 4  ✓
bytes     → Content-Length: 4  ✓
BytesIO   → Content-Length: 4  ✓
StringIO  → Content-Length: 4  ✓

Changes

src/requests/utils.py: In super_len(), detect io.StringIO and read+encode the remaining text to compute the UTF-8 byte length instead of relying on tell().
tests/test_utils.py: Added test_super_len_stringio_multibyte covering single emoji, mixed content, partially-read StringIO, and position preservation.

Test plan

All existing TestSuperLen tests pass (ASCII StringIO, BytesIO, partially-read files, etc.)
New test verifies correct byte count for multi-byte characters
New test verifies correct byte count for partially-read StringIO
New test verifies file position is preserved after super_len() call
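For illustration, the three new checks in the test plan might be shaped like this. It is a sketch only: utf8_remaining_length is a hypothetical stand-in so the example runs without requests, whereas the PR's test (test_super_len_stringio_multibyte) exercises super_len() itself.

```python
import io


def utf8_remaining_length(stream):
    # Stand-in for the behaviour under test (hypothetical, not requests' code).
    position = stream.tell()
    data = stream.read().encode("utf-8")
    stream.seek(position)
    return len(data)


def test_multibyte_byte_count():
    assert utf8_remaining_length(io.StringIO("\U0001F4A9")) == 4
    # Mixed content: 1-byte "a", 2-byte "\u00e9", 4-byte emoji.
    assert utf8_remaining_length(io.StringIO("a\u00e9\U0001F4A9")) == 7


def test_partially_read():
    stream = io.StringIO("ab\U0001F4A9")
    stream.read(2)  # only the emoji is left unread
    assert utf8_remaining_length(stream) == 4


def test_position_preserved():
    stream = io.StringIO("x\U0001F4A9y")
    stream.read(1)
    utf8_remaining_length(stream)
    assert stream.tell() == 1
    assert stream.read() == "\U0001F4A9y"
```

The functions follow pytest naming so they would be collected automatically, but they can also be called directly.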

veeceey · 2026-03-10T13:23:26Z

Just wanted to follow up and see whether this is good to go or needs more work.

StringIO.tell() returns the character position, not the byte offset, so super_len() returned the wrong value for StringIO objects containing multi-byte UTF-8 characters (e.g. emoji). This caused an incorrect Content-Length header that violates RFC 9110 section 8.6.

Read the remaining text and encode it to UTF-8 to measure the true byte length, consistent with how plain str bodies are already handled.

Closes psf#6917

StantonMatt

I reproduced the issue and the fix locally.

On current main, the reporter's core case still gives super_len=1 and Content-Length=1 for StringIO, while plain str gives 4 for both. On this branch the same StringIO case gives super_len=4 and Content-Length=4, and the cursor-preservation and partially-read cases pass for me:

TOX_WORK_DIR=.codex-tmp/tox/requests-7201 tox -e py312-default -- tests/test_utils.py -q -k super_len
11 passed, 208 deselected

One small test gap: since #6917 is user-visible through the prepared request header, I think it would be worth adding a direct PreparedRequest().prepare(..., data=io.StringIO(...)) assertion for Content-Length == "4" as well. The new super_len coverage is useful, but a header-level assertion would lock in the actual behavior that regressed and make the test less dependent on the prepare_content_length -> super_len path staying obvious.
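A header-level check along those lines might look like the sketch below. It assumes requests is installed; content_length_header and the example.com URL are illustrative names, not part of the PR. No network traffic is involved, since the request is only prepared.

```python
import io

from requests import PreparedRequest


def content_length_header(body):
    """Prepare a POST with `body` and return the Content-Length header value."""
    prepared = PreparedRequest()
    prepared.prepare(method="POST", url="http://example.com/upload", data=body)
    return prepared.headers.get("Content-Length")


emoji = "\U0001F4A9"

# BytesIO already measures bytes, so this is the reference value.
print(content_length_header(io.BytesIO(emoji.encode("utf-8"))))

# Per the results reported in this thread, this is "1" on current main
# and "4" on this branch.
print(content_length_header(io.StringIO(emoji)))
```

In a regression test, the second call would be asserted equal to "4".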

veeceey force-pushed the fix/stringio-content-length-warning branch from 20d9eef to 0406663 on March 12, 2026 04:25

StantonMatt reviewed Jun 2, 2026

veeceey wants to merge 1 commit into psf:main from veeceey:fix/stringio-content-length-warning
