Support nonstandard UTF-8 keys in torrent metainfo files #35

Merged
folz merged 2 commits into master from claude/elegant-davinci-dwxn4j on Jun 10, 2026

folz (Owner) commented Jun 10, 2026

Fixes #14

  • Decode nonstandard "name.utf-8" and "path.utf-8" keys (written by Vuze/Azureus, BitComet, etc.) into fields on SingleFile and MultiFile, matching libtorrent/Transmission/anacrolix/parse-torrent behavior
  • Other unknown keys remain ignored; torrent/1 shape validation unchanged; new fields default to nil
  • The crash reported in #14 no longer reproduces on master (fixed by the Bento.Decoder.transform/2 rewrite in #24 and the Bento.Metainfo refactor in #25); this PR restores the data that was being silently dropped
  • Tests: single-file and multi-file torrents with .utf-8 keys, plus tolerance of other unknown keys
  • Benchmarks (Bento.torrent!/1, Elixir 1.14/OTP 25, MIX_ENV=prod, two best-of-5 runs, before → after): single-file 5.7/6.1 → 5.6/5.7 µs/op; multi-file (321 files) 1049/850 → 941/987 µs/op, i.e. within run-to-run noise. mix bench only covers the parser, which this PR does not touch
  • CI: ubuntu-latest jobs are green; the ubuntu-20.04 legs never get scheduled because GitHub retired those runners (a pre-existing issue on all runs)
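One common reading of "best-of-5" is: time several rounds, average each round over many calls to get µs/op, and report the fastest round. A minimal Elixir sketch of that measurement, assuming a zero-arity function to time; BenchSketch and best_of/3 are hypothetical names, and this is not necessarily the script that produced the numbers above.

```elixir
# Hypothetical helper; not part of Bento or its mix bench suite.
defmodule BenchSketch do
  # Calls `fun` `iterations` times per round, repeats for `rounds` rounds,
  # and returns the fastest round's average time per call in microseconds.
  def best_of(fun, rounds \\ 5, iterations \\ 1_000)
      when is_function(fun, 0) and is_integer(rounds) and rounds > 0 and
             is_integer(iterations) and iterations > 0 do
    1..rounds
    |> Enum.map(fn _round ->
      # :timer.tc/1 returns {elapsed_microseconds, result}
      {total_us, :ok} = :timer.tc(fn -> repeat(fun, iterations) end)
      total_us / iterations
    end)
    |> Enum.min()
  end

  # Invokes `fun` n times, discarding its results.
  defp repeat(_fun, 0), do: :ok

  defp repeat(fun, n) do
    fun.()
    repeat(fun, n - 1)
  end
end
```

For example, BenchSketch.best_of(fn -> Bento.torrent!(binary) end) would time decoding, assuming binary holds the raw contents of a .torrent file; calling it twice yields a pair like the ones reported above. Taking the minimum rather than the mean filters out rounds slowed by garbage collection or scheduler noise.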

https://claude.ai/code/session_01BsQoJKLGSnM4Qh1UpAc19H

Clients such as Vuze/Azureus write nonstandard "name.utf-8" and
"path.utf-8" keys holding UTF-8 copies of the name and path for
torrents whose standard fields use a legacy charset. Decoding such
torrents crashed in v0.9.x (binary_to_existing_atom was called on the
unknown keys); since the Decoder.transform/2 rewrite they parse, but
the data was silently dropped.
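To make the v0.9.x failure mode concrete, here is a minimal Elixir sketch of the naive pattern: converting every bencode dictionary key with String.to_existing_atom/1 (Elixir's wrapper around binary_to_existing_atom). AtomKeySketch and its functions are hypothetical and are not Bento's actual v0.9.x code; the point is only that an unrecognized key, for which no atom exists, raises ArgumentError.

```elixir
# Hypothetical sketch of naive key handling; not Bento's v0.9.x implementation.
defmodule AtomKeySketch do
  # Converts every bencode dictionary key to an *existing* atom.
  # String.to_existing_atom/1 raises ArgumentError when no atom with that
  # name has been created yet, e.g. for an unexpected "<blah>.utf-8" key.
  def naive_keys(decoded) when is_map(decoded) do
    Map.new(decoded, fn {key, value} -> {String.to_existing_atom(key), value} end)
  end

  # Wraps naive_keys/1 so the failure can be observed instead of crashing.
  def check(decoded) do
    {:ok, naive_keys(decoded)}
  rescue
    ArgumentError -> {:error, :unknown_key}
  end
end

# Known keys convert because atoms such as :length already exist.
{:ok, %{length: 1}} = AtomKeySketch.check(%{"length" => 1})

# A key with no matching atom makes the naive conversion fail.
{:error, :unknown_key} =
  AtomKeySketch.check(%{"length" => 1, "zz.never.an.atom.utf-8" => "x"})
```

Keeping an explicit table of recognized keys and dropping everything else avoids this crash, since unrecognized keys are never turned into atoms.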

Decode these keys into struct fields of the same name on SingleFile and
MultiFile (including "path.utf-8" in files entries), keep ignoring other
unrecognized keys so torrent shape validation is unaffected, and add
regression tests covering nonstandard keys.
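A minimal Elixir sketch of the approach described above, assuming a decoder that yields a plain map with binary keys: recognized keys (including the nonstandard "path.utf-8") are mapped to struct fields through an explicit table, the new field defaults to nil, and any other key is dropped. FileEntrySketch, from_decoded/1 and the field names are hypothetical; Bento's real SingleFile/MultiFile structs and Bento.Decoder.transform/2 may be organized differently.

```elixir
# Hypothetical sketch; FileEntrySketch and its fields are assumptions,
# not Bento's actual SingleFile/MultiFile structs.
defmodule FileEntrySketch do
  # The nonstandard field defaults to nil when the torrent omits it.
  defstruct length: nil, path: nil, "path.utf-8": nil

  # Recognized bencode keys mapped to struct fields. Any key not listed
  # here is ignored and never converted to an atom.
  @known_keys %{
    "length" => :length,
    "path" => :path,
    "path.utf-8" => :"path.utf-8"
  }

  def from_decoded(decoded) when is_map(decoded) do
    fields =
      for {key, value} <- decoded, Map.has_key?(@known_keys, key), into: %{} do
        {Map.fetch!(@known_keys, key), value}
      end

    struct(__MODULE__, fields)
  end
end

# A files entry as a decoder might return it: Latin-1 bytes in "path",
# the UTF-8 form in "path.utf-8", plus an unrelated unknown key.
entry =
  FileEntrySketch.from_decoded(%{
    "length" => 42,
    "path" => [<<"caf", 0xE9, ".txt">>],
    "path.utf-8" => ["café.txt"],
    "x.unknown" => "ignored"
  })

Map.get(entry, :"path.utf-8")
# => ["café.txt"]
```

The lookup table doubles as the allow-list, so recognizing another key is a one-line change, while defstruct's nil default covers torrents that never write the nonstandard key.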

Benchmarked Bento.torrent!/1 before/after (Elixir 1.14 / OTP 25,
MIX_ENV=prod): single-file ~5.6-7.6 us/op and multi-file (321 files)
~850-1150 us/op, both before and after the change, i.e. within
run-to-run noise. The mix bench suite only exercises
Bento.Parser.parse/1, which is unchanged.

Fixes #14

https://claude.ai/code/session_01BsQoJKLGSnM4Qh1UpAc19H
github-actions (bot) added the document and test labels Jun 10, 2026
folz merged commit f33eb7d into master Jun 10, 2026
5 of 17 checks passed
folz deleted the claude/elegant-davinci-dwxn4j branch June 10, 2026 15:58

Development

Successfully merging this pull request may close these issues.

Torrents with additonal "<blah>.utf-8" fields are unable to be parsed.
