Fix incorrect Content-Length for StringIO with multi-byte characters #7201

Open
veeceey wants to merge 1 commit into psf:main from veeceey:fix/stringio-content-length-warning

@veeceey veeceey commented Feb 10, 2026

Summary

Fixes #6917.

super_len() uses seek/tell to measure the length of file-like objects such as StringIO and BytesIO. However, StringIO.tell() returns the character position, not the byte offset. For strings containing multi-byte UTF-8 characters (e.g. emoji), this produces an incorrect Content-Length header that violates RFC 9110 section 8.6.

For example, super_len(io.StringIO("\U0001F4A9")) (a single emoji) previously returned 1 (the character count) instead of 4 (the UTF-8 byte count), so the server received a Content-Length: 1 header while 4 bytes were actually sent.
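The mismatch can be reproduced with the standard library alone. The sketch below is illustrative and is not code from requests: seek_tell_length is a hypothetical helper that mimics a seek/tell measurement, and the StringIO result reflects CPython's io.StringIO, whose positions count code points.

```python
import io

text = "\U0001F4A9"  # one code point, four bytes in UTF-8


def seek_tell_length(stream):
    """Measure a stream by seeking to its end, then restore the cursor.

    Hypothetical helper for illustration; not the requests implementation.
    """
    start = stream.tell()
    end = stream.seek(0, io.SEEK_END)
    stream.seek(start)
    return end - start


char_length = seek_tell_length(io.StringIO(text))                 # counts characters
byte_length = seek_tell_length(io.BytesIO(text.encode("utf-8")))  # counts bytes
encoded_length = len(text.encode("utf-8"))                        # what goes on the wire

print(char_length, byte_length, encoded_length)  # 1 4 4 on CPython
```

Only the StringIO measurement disagrees with the number of bytes actually transmitted.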

This is the same class of bug that was fixed for plain str bodies in #6586 -- str is encoded to UTF-8 before measuring, but StringIO was not. This PR makes StringIO handling consistent with str by reading the remaining text, encoding it to UTF-8, and measuring the byte length.
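A minimal sketch of the approach described above, assuming the body is an io.StringIO and the wire encoding is UTF-8. The name stringio_byte_length is hypothetical; the actual change lives inside super_len() and may be structured differently.

```python
import io


def stringio_byte_length(stream, encoding="utf-8"):
    """Return the encoded size of the unread text in an io.StringIO.

    Hypothetical helper; not the function added by the PR.
    """
    position = stream.tell()
    try:
        remaining = stream.read()  # everything from the cursor to the end
    finally:
        stream.seek(position)      # leave the cursor where the caller had it
    return len(remaining.encode(encoding))


body = io.StringIO("a\U0001F4A9b")
body.read(1)                       # consume "a"
remaining_bytes = stringio_byte_length(body)
print(remaining_bytes, body.tell())  # 5 1
```

One trade-off of this approach is that it reads and encodes the whole remaining text in memory just to measure it, whereas seek/tell measures without reading.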

Before

str       → Content-Length: 4  ✓
bytes     → Content-Length: 4  ✓
BytesIO   → Content-Length: 4  ✓
StringIO  → Content-Length: 1  ✗  (character count, not byte count)

After

str       → Content-Length: 4  ✓
bytes     → Content-Length: 4  ✓
BytesIO   → Content-Length: 4  ✓
StringIO  → Content-Length: 4  ✓

Changes

  • src/requests/utils.py: In super_len(), detect io.StringIO and read+encode the remaining text to compute the UTF-8 byte length instead of relying on tell().
  • tests/test_utils.py: Added test_super_len_stringio_multibyte covering single emoji, mixed content, partially-read StringIO, and position preservation.

Test plan

  • All existing TestSuperLen tests pass (ASCII StringIO, BytesIO, partially-read files, etc.)
  • New test verifies correct byte count for multi-byte characters
  • New test verifies correct byte count for partially-read StringIO
  • New test verifies file position is preserved after super_len() call
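The cases in this plan could be checked roughly as follows. This is a sketch and not the test added in tests/test_utils.py: measure is a stand-in for the patched super_len(), and the inputs in CASES are assumptions chosen for illustration.

```python
import io


def measure(stream):
    # Stand-in for the patched requests.utils.super_len (assumption).
    position = stream.tell()
    remaining = stream.read()
    stream.seek(position)
    return len(remaining.encode("utf-8"))


CASES = [
    # (text, characters consumed before measuring, expected remaining bytes)
    ("\U0001F4A9", 0, 4),        # single emoji
    ("hi \U0001F4A9!", 0, 8),    # mixed ASCII and emoji: 3 + 4 + 1
    ("\u00e9\U0001F4A9", 1, 4),  # partially read: "é" consumed, emoji left
]


def run_cases(measure_fn):
    """Return (length_ok, position_preserved) for every case."""
    results = []
    for text, consumed, expected in CASES:
        stream = io.StringIO(text)
        stream.read(consumed)
        length = measure_fn(stream)
        results.append((length == expected, stream.tell() == consumed))
    return results


assert all(ok and kept for ok, kept in run_cases(measure))
```

Passing the measuring function as a parameter keeps the cases reusable: the same table can be run against super_len itself once the fix is installed.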

veeceey commented Mar 10, 2026

just wanted to follow up and see if this is good to go or needs more work

StringIO.tell() returns the character position, not the byte offset,
so super_len() returned the wrong value for StringIO objects containing
multi-byte UTF-8 characters (e.g. emoji).  This caused an incorrect
Content-Length header that violates RFC 9110 section 8.6.

Read the remaining text and encode it to UTF-8 to measure the true
byte length, consistent with how plain str bodies are already handled.

Closes psf#6917
@veeceey veeceey force-pushed the fix/stringio-content-length-warning branch from 20d9eef to 0406663 Compare March 12, 2026 04:25

@StantonMatt StantonMatt left a comment

I reproduced the issue and the fix locally.

On current main, the reporter's core case still gives StringIO super_len=1 Content-Length=1 while plain str gives 4/4. On this branch the same StringIO case gives super_len=4 Content-Length=4, and the cursor-preservation/partially-read cases pass for me:

TOX_WORK_DIR=.codex-tmp/tox/requests-7201 tox -e py312-default -- tests/test_utils.py -q -k super_len
11 passed, 208 deselected

One small test gap: since #6917 is user-visible through the prepared request header, I think it would be worth adding a direct PreparedRequest().prepare(..., data=io.StringIO(...)) assertion for Content-Length == "4" as well. The new super_len coverage is useful, but a header-level assertion would lock the actual behavior that regressed and make this less dependent on the prepare_content_length -> super_len path staying obvious.

Development

Successfully merging this pull request may close these issues.

Incorrect Content-Length header with StringIO body