Support nonstandard UTF-8 keys in torrent metainfo files #35
Merged
Clients such as Vuze/Azureus write nonstandard "name.utf-8" and "path.utf-8" keys that hold UTF-8 copies of the name and path for torrents whose standard fields use a legacy charset. Decoding these keys crashed in v0.9.x (binary_to_existing_atom on unknown keys). Since the Decoder.transform/2 rewrite they parse, but the data was silently dropped.

This change decodes these keys into struct fields of the same name on SingleFile and MultiFile (including "path.utf-8" in files entries), keeps ignoring other unrecognized keys so torrent shape validation is unaffected, and adds regression tests covering nonstandard keys.

Benchmarked Bento.torrent!/1 before and after (Elixir 1.14 / OTP 25, MIX_ENV=prod): single-file ~5.6-7.6 µs/op and multi-file (321 files) ~850-1150 µs/op in both cases, within run-to-run noise. The mix bench suite only exercises Bento.Parser.parse/1, which is unchanged.

Fixes #14

https://claude.ai/code/session_01BsQoJKLGSnM4Qh1UpAc19H
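The decoding approach described above (map a fixed set of known string keys to struct fields, skip everything else) can be sketched roughly as follows. This is an illustrative, self-contained sketch and not Bento's actual code: the `Utf8KeysSketch` module, its nested `SingleFile` struct, and `to_struct/1` are hypothetical names, and the field list is an assumption based on the description.

```elixir
defmodule Utf8KeysSketch do
  # Hypothetical stand-in for a single-file metainfo struct (not Bento's definition).
  # Fields that are absent from the input keep their default of nil.
  defmodule SingleFile do
    defstruct [:name, :length, :"name.utf-8"]
  end

  # Whitelist of recognized bencode keys and the struct fields they map to.
  # Looking keys up here avoids converting arbitrary input strings to atoms,
  # so an unknown key can neither raise (as String.to_existing_atom/1 would
  # for a nonexistent atom) nor create new atoms.
  @known_keys %{
    "name" => :name,
    "length" => :length,
    "name.utf-8" => :"name.utf-8"
  }

  # Builds the struct from a decoded info map, silently skipping unknown keys.
  def to_struct(info) when is_map(info) do
    Enum.reduce(info, %SingleFile{}, fn {key, value}, acc ->
      case Map.fetch(@known_keys, key) do
        {:ok, field} -> Map.put(acc, field, value)
        :error -> acc
      end
    end)
  end
end

# Usage: "name" holds Latin-1 bytes, "name.utf-8" holds the UTF-8 form,
# and "x-unknown" is an unrecognized key that is ignored.
info = %{
  "name" => "caf" <> <<0xE9>>,
  "name.utf-8" => "café",
  "length" => 42,
  "x-unknown" => 1
}

torrent = Utf8KeysSketch.to_struct(info)
IO.inspect(Map.get(torrent, :"name.utf-8"))
```

Because the field name contains a dot and a hyphen, it has to be written as a quoted atom (`:"name.utf-8"`), which is why the sketch reads it with `Map.get/2`.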
Fixes #14

- Decode `"name.utf-8"` and `"path.utf-8"` keys (written by Vuze/Azureus, BitComet, etc.) into fields on `SingleFile` and `MultiFile`, matching libtorrent/Transmission/anacrolix/parse-torrent behavior
- `torrent/1` shape validation unchanged; new fields default to `nil`
- The v0.9.x crash was already fixed (by the `Bento.Decoder.transform/2` #24 / Refactor `Bento.Metainfo` #25 transform rewrite); this restores the silently dropped data
- Regression tests for the `.utf-8` keys, plus unknown-keys tolerance
- Benchmarks (`Bento.torrent!/1`, Elixir 1.14/OTP 25, prod, best-of-5 × 2): single-file 5.7/6.1 → 5.6/5.7 µs/op; multi-file (321 files) 1049/850 → 941/987 µs/op, within noise; `mix bench` only covers the untouched parser
- `ubuntu-latest` jobs green; `ubuntu-20.04` legs never schedule (GitHub retired those runners; pre-existing on all runs)

https://claude.ai/code/session_01BsQoJKLGSnM4Qh1UpAc19H