folz/bento

Bento

Bento is a Bencoding library for Elixir that focuses on speed without sacrificing simplicity, completeness, or correctness.

The parser is a single tail-recursive state machine over the input binary, using several techniques to get the most out of the BEAM:

  • A single pass over the input, scanned by byte offset, with strings extracted as zero-copy sub-binaries in one slice.
  • Containers tracked on an explicit stack rather than the call stack, so values return without per-value tuple allocations and arbitrarily deep nesting is safe.
  • Decoding options resolved once, up front into functions, keeping the hot loop free of conditionals.
  • IO list encoding, with encoder dispatch on the value's type directly and the Bento.Encoder protocol reserved for structs and custom types.
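
As a rough illustration of the first technique (scanning by byte offset and slicing strings out in one step), here is a minimal sketch. It is not Bento's parser: the module name, function name, and return shapes are hypothetical, and it handles only a single bencoded byte string, with no leading-zero check.

```elixir
defmodule OffsetScanSketch do
  # Hypothetical helper, not part of Bento. Reads one bencoded byte string
  # ("<length>:<bytes>") that starts at byte offset `pos` of `input`.
  # Returns {:ok, string, next_pos} or {:error, pos_of_problem}.
  def string_at(input, pos) when is_binary(input) and is_integer(pos) and pos >= 0 do
    scan_length(input, pos, pos, 0)
  end

  # Accumulate the decimal length prefix. Only the offset advances;
  # the input binary itself is never split or rebuilt.
  defp scan_length(input, start, pos, len) when pos < byte_size(input) do
    case :binary.at(input, pos) do
      digit when digit in ?0..?9 ->
        scan_length(input, start, pos + 1, len * 10 + (digit - ?0))

      # A colon ends the prefix, but only if at least one digit was seen.
      ?: when pos > start ->
        slice(input, pos + 1, len)

      _other ->
        {:error, pos}
    end
  end

  # Ran off the end of the input while still reading the length prefix.
  defp scan_length(_input, _start, pos, _len), do: {:error, pos}

  # binary_part/3 takes the whole string in one slice; the result normally
  # shares storage with `input` instead of copying the bytes.
  defp slice(input, from, len) when from + len <= byte_size(input) do
    {:ok, binary_part(input, from, len), from + len}
  end

  # Declared length overruns the input.
  defp slice(input, _from, _len), do: {:error, byte_size(input)}
end
```

Calling `OffsetScanSketch.string_at("li1e3:twoe", 4)` yields the string `"two"` together with the offset of the byte that follows it. A real parser would carry that offset straight into the next state instead of returning a tuple per value.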

Bento rejects all malformed input, including the out-of-order and duplicate dictionary keys that BEP-3 forbids. Errors report only the byte position and the offending byte, never a multi-megabyte message. A successful decode therefore guarantees that the input was well-formed Bencoding.
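
The key-ordering rule can be checked in a single pass: if every key is strictly greater than the one before it, the keys are both sorted and unique. The sketch below shows that idea only. The module name and the error tuples are hypothetical and are not Bento's error format.

```elixir
defmodule StrictKeysSketch do
  # Hypothetical helper, not part of Bento. Takes dictionary keys in the
  # order they appeared on the wire and checks that they are strictly
  # ascending. Erlang compares binaries byte by byte, which matches the
  # raw-byte ordering BEP-3 asks for.
  def check(keys) when is_list(keys), do: check(keys, nil, 0)

  defp check([], _prev, _index), do: :ok

  # First key, or a key strictly greater than its predecessor: keep going.
  defp check([key | rest], prev, index) when prev == nil or key > prev do
    check(rest, key, index + 1)
  end

  # Same key twice in a row.
  defp check([key | _rest], prev, index) when key == prev do
    {:error, {:duplicate_key, index}}
  end

  # Anything else is smaller than its predecessor, so out of order.
  defp check([_key | _rest], _prev, index) do
    {:error, {:unsorted_key, index}}
  end
end
```

Because the comparison is strict, duplicates and ordering violations are caught by the same walk, with no need to collect the keys into a set first.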

Encoding always produces canonical Bencoding: dictionary keys are normalized to strings, emitted in byte-wise sorted order, and key collisions (like %{:a => 1, "a" => 2}) are rejected rather than silently emitting an invalid dictionary - so hashes computed over Bento's output (like torrent info-hashes) are correct.
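
The three steps described above (normalize keys, reject collisions, sort byte-wise) can be sketched as follows. This is an illustration under simplifying assumptions, not Bento's encoder: the module name and the `:key_collision` error are hypothetical, and values are limited to integers and binaries to keep it short.

```elixir
defmodule CanonicalDictSketch do
  # Hypothetical helper, not part of Bento. Encodes a flat map whose keys
  # are atoms or binaries and whose values are integers or binaries.
  def encode_dict(map) when is_map(map) do
    pairs = for {key, value} <- map, do: {normalize_key(key), value}
    keys = Enum.map(pairs, &elem(&1, 0))

    if length(Enum.uniq(keys)) != length(keys) do
      # Two distinct Elixir keys collapsed to the same string, e.g. :a and "a".
      {:error, :key_collision}
    else
      body =
        pairs
        # Default term ordering compares binaries byte by byte.
        |> Enum.sort_by(&elem(&1, 0))
        |> Enum.map(fn {key, value} -> [encode_value(key), encode_value(value)] end)

      # Build an IO list and flatten it once at the end.
      {:ok, IO.iodata_to_binary([?d, body, ?e])}
    end
  end

  defp normalize_key(key) when is_binary(key), do: key
  defp normalize_key(key) when is_atom(key), do: Atom.to_string(key)

  defp encode_value(value) when is_integer(value), do: [?i, Integer.to_string(value), ?e]
  defp encode_value(value) when is_binary(value), do: [Integer.to_string(byte_size(value)), ?:, value]
end
```

With this sketch, a map mixing `:foo` and `"qux"` keys encodes with `foo` first, while `%{:a => 1, "a" => 2}` returns the collision error instead of a dictionary with two identical keys.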

Documentation

Documentation is available on Hexdocs.

Installation

Bento is available in Hex. To install it:

  1. Add bento to your list of dependencies in mix.exs:
{:bento, "~> 2.0"}
  2. Then, update your dependencies:
$ mix do deps.get + deps.compile

Usage

Encoding an Elixir data type:

iex> Bento.encode([1, "two", [3]])
{:ok, "li1e3:twoli3eee"}
iex> Bento.encode!(%{"foo" => ["bar", "baz"], "qux" => "norf"})
"d3:fool3:bar3:baze3:qux4:norfe"

Decoding a bencoded string:

iex> Bento.decode("li1e3:twoli3eee")
{:ok, [1, "two", [3]]}
iex> Bento.decode!("d3:fool3:bar3:baze3:qux4:norfe")
%{"foo" => ["bar", "baz"], "qux" => "norf"}

Decoding errors tell you where and what went wrong:

iex> Bento.decode("d3:foo")
{:error, %Bento.SyntaxError{position: 6, ...}}
iex> Bento.decode!("i4x2e")
** (Bento.SyntaxError) unexpected byte at position 2: 0x78 ("x")

Decoding options

  • keys: :strings | :atoms | :atoms! | (key -> term) - how dictionary keys are decoded.
  • strings: :reference | :copy - :reference (default) returns zero-copy sub-binaries into the input; use :copy when decoded values outlive the input (e.g. stored in ETS), so a small retained string doesn't keep a large input binary alive.
  • dicts: :strict | :lenient | :ordered - :strict (default) requires unique, canonically sorted keys as BEP-3 mandates; :lenient skips those checks for non-conforming files; :ordered returns Bento.OrderedDict structs preserving wire order, so even non-canonical input re-encodes byte-for-byte.
iex> Bento.decode!("d1:bi1e1:ai2ee", dicts: :ordered) |> Bento.encode!()
"d1:bi1e1:ai2ee"
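
The retention concern behind strings: :copy can be observed with plain BEAM functions, without Bento. The sketch below uses `binary_part/3` as a stand-in for a decoded string that references its input, and `:binary.copy/1` as a stand-in for what a copying mode would do. The module name is hypothetical, and the sizes are arbitrary.

```elixir
defmodule RetentionSketch do
  # Hypothetical demonstration, not part of Bento. Returns how many bytes
  # each binary keeps reachable, as reported by the runtime.
  def compare do
    # Stand-in for a large bencoded input.
    input = :binary.copy("x", 1_000_000)

    # Stand-in for a decoded string that points into the input.
    slice = binary_part(input, 0, 100)

    # A fresh binary holding only the slice's own bytes.
    copy = :binary.copy(slice)

    %{
      slice_retains: :binary.referenced_byte_size(slice),
      copy_retains: :binary.referenced_byte_size(copy)
    }
  end
end
```

If the slice is stored somewhere long-lived, such as an ETS table or a process state, the garbage collector cannot free the large input while the slice still references it. The copy has no such tie.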

For streams carrying several consecutive values, Bento.decode_prefix/2 parses one value off the front and returns the rest:

iex> Bento.decode_prefix("i1ei2e")
{:ok, 1, "i2e"}
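
To drain every value from such a stream, the call can be repeated until nothing is left. The sketch below takes the prefix decoder as a function argument so that it runs without Bento. It assumes the `{:ok, value, rest}` shape shown above and an `{:error, reason}` tuple on failure. The module and the toy integer-only decoder are hypothetical.

```elixir
defmodule DrainSketch do
  # Hypothetical helper, not part of Bento. `decode_prefix` is any 1-arity
  # function returning {:ok, value, rest} or {:error, reason}.
  def decode_all(input, decode_prefix) when is_binary(input) do
    loop(input, decode_prefix, [])
  end

  # Nothing left: return the values in the order they were read.
  defp loop("", _decode_prefix, acc), do: {:ok, Enum.reverse(acc)}

  defp loop(input, decode_prefix, acc) do
    case decode_prefix.(input) do
      {:ok, value, rest} -> loop(rest, decode_prefix, [value | acc])
      {:error, _reason} = error -> error
    end
  end

  # Toy stand-in that understands only bencoded integers, so the sketch
  # is runnable on its own. It is far more lenient than a real decoder.
  def toy_int_prefix(<<?i, rest::binary>>) do
    case Integer.parse(rest) do
      {number, <<?e, tail::binary>>} -> {:ok, number, tail}
      _other -> {:error, :invalid}
    end
  end

  def toy_int_prefix(_input), do: {:error, :invalid}
end
```

With the toy decoder, `DrainSketch.decode_all("i1ei2e", &DrainSketch.toy_int_prefix/1)` collects both integers into one list. In a real project the second argument would presumably be a capture of Bento.decode_prefix, provided its error return matches the shape assumed here.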

Structs

Structs can derive Bento.Encoder, optionally restricting fields and skipping nils; keys are pre-encoded at compile time:

defmodule MyMeta do
  @derive {Bento.Encoder, skip_nil: true}
  defstruct [:announce, :info, :comment]
end

Already-encoded parts (like a cached info dictionary) can be spliced in without re-encoding via Bento.Fragment:

iex> Bento.encode!(%{"info" => Bento.Fragment.new(cached_info)})

Torrents

Bento is also metainfo-aware and comes with a *.torrent decoder out of the box:

iex> File.read!("./test/_data/ubuntu-14.04.4-desktop-amd64.iso.torrent") |> Bento.torrent!()
%Bento.Metainfo.Torrent{
  info: %Bento.Metainfo.SingleFile{
    length: 1069547520,
    md5sum: nil,
    "piece length": 524288,
    pieces: <<109, 235, 143, 234, 36, 25, 142, 36, 20, 3, 227, 227, 134, 136,
      205, 130, 176, 104, 192, 33, 45, 230, 152, 2, 239, 131, 240, 217, 180,
      251, 153, 170, 31, 127, 175, 166, 9, 254, 133, 8, 42, 229, 43, 139, 86,
      ...>>,
    private: 0,
    name: "ubuntu-14.04.4-desktop-amd64.iso",
    "name.utf-8": nil
  },
  announce: "http://torrent.ubuntu.com:6969/announce",
  "announce-list": [
    ["http://torrent.ubuntu.com:6969/announce"],
    ["http://ipv6.torrent.ubuntu.com:6969/announce"]
  ],
  "creation date": ~U[2016-02-18 20:12:51Z],
  comment: "Ubuntu CD releases.ubuntu.com",
  "created by": nil,
  encoding: nil
}

In addition to parsing torrents via Bento.torrent!/1, Bento can decode any bencoded data into a struct of your choice, like so:

defmodule Name do
  defstruct [:family, :given]
end

iex> Bento.decode!("d6:family4:Folz5:given6:Rodneye", as: %Name{})
%Name{family: "Folz", given: "Rodney"}
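
Conceptually, decoding into a struct means copying values from a string-keyed dictionary into the struct's atom-keyed fields. The sketch below shows one way to do that in plain Elixir. It is not how Bento implements as:, and both module names are hypothetical.

```elixir
defmodule NameSketch do
  # Same shape as the Name struct above, renamed to avoid a clash.
  defstruct [:family, :given]
end

defmodule AsStructSketch do
  # Hypothetical helper, not part of Bento. Fills `template` from a decoded
  # map with string keys. Only fields the struct already declares are
  # looked up, so no new atoms are created from untrusted input.
  def into(decoded, %_{} = template) when is_map(decoded) do
    template
    |> Map.from_struct()
    |> Map.keys()
    |> Enum.reduce(template, fn field, acc ->
      case Map.fetch(decoded, Atom.to_string(field)) do
        {:ok, value} -> Map.put(acc, field, value)
        # Missing keys keep the template's default.
        :error -> acc
      end
    end)
  end
end
```

Dictionary entries with no matching field are ignored in this sketch, and fields with no matching entry keep whatever value the template struct carried.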

Testing

Beyond unit tests, Bento is tested against:

  • A conformance suite of accept/reject vectors in test/bencode_test_suite/, covering the BEP-3 grammar and its edge cases (leading zeros, unterminated values, length overruns, non-string keys, duplicate and unsorted keys, and so on).
  • Property-based tests: encode/decode round-trips over arbitrary (including non-UTF-8) data, canonical-encoding invariants, and fuzzing via random mutation and truncation of valid input - decoding must always return a positioned error and never crash.

Run the full suite with:

$ mix test

Benchmarking

The benchmark suite lives in bench/ as a standalone project and measures both throughput and memory across shape-isolated inputs (large file lists, huge piece strings, many small messages, deep nesting, real torrents):

$ cd bench
$ mix deps.get
$ mix bench.gen      # generate the synthetic corpus
$ mix bench.decode
$ mix bench.encode
$ mix bench.retention  # demonstrate strings: :reference vs :copy retention

Runs are saved under bench/output/runs/ and automatically compared against previous runs, so before/after numbers for a change come for free. HTML reports are written to bench/output/.

We currently benchmark against: Bento (this project), bencode, and Bencodex.

We are aware of, but unable to benchmark against: exbencode (build errors), elixir_bencode (module name conflicts with Bencode), and bencoder (does not compile on Elixir 1.17+).

PRs that add libraries to the benchmarks are greatly appreciated!

License

See LICENSE.

About

🍱 A fast, correct, pure-Elixir library for reading and writing Bencoded metainfo (.torrent) files.
