🤖 Add voice support (send + receive + DAVE/MLS E2EE)#29
Conversation
Implements joining/leaving/moving voice channels, sending audio (Opus passthrough + PCM encode), receiving audio (per-user sinks), and DAVE/MLS end-to-end encryption, faithful to discord.http's pure-asyncio, stdlib-first design. - Phase 0: merge VoiceState/PartialVoiceState into channel.py; delete voice.py - Phase 1: op-4 sender, VOICE_SERVER_UPDATE parser + routing, voice-client registry on Client, PartialChannel.connect() - voice/ subpackage: voice WS (v8) + heartbeat, UDP/IP-discovery, RTP framing, aead_aes256_gcm_rtpsize via cryptography (no PyNaCl), ctypes libopus binding (optional), Ogg/Opus parser, async AudioSource/player + FFmpeg sources, receiver + sinks, DAVE/MLS via the optional davey extra - Phase 7: ExponentialBackoff, voice-close-code reconnect/resume, opt-in resume_voice persistence, default insta-leave on shard reset - Phase 8: public exports, voice example, pyproject voice package + extra Transport encryption needs no extra deps. libopus (PCM encode/decode) and FFmpeg (transcoding) are optional external runtimes; davey (DAVE E2EE) is the sole optional Python dep, gated behind discord.http[voice]. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughAdds an end-to-end Discord voice subsystem: voice-state models and connect API, client voice registry, gateway/shard parsing and routing, voice websocket and UDP transport, RTP AES‑GCM encryptor, libopus bindings and Ogg parser, playback/receive pipeline, optional DAVE E2EE, tests, and an example. ChangesDiscord voice implementation
Sequence Diagram(s)sequenceDiagram
participant Client
participant Shard
participant VoiceConnection
participant VoiceSocket
participant VoiceUDPProtocol
Client->>Shard: change_voice_state(guild_id, channel_id)
Shard-->>VoiceConnection: voice_state_update, voice_server_update
VoiceConnection->>VoiceSocket: connect()/IDENTIFY or RESUME
VoiceSocket-->>VoiceConnection: ready, session_description
VoiceConnection->>VoiceUDPProtocol: create_udp() and discover_ip(ssrc)
VoiceConnection->>VoiceConnection: build Encryptor(secret_key) / init DAVE
VoiceUDPProtocol->>VoiceConnection: incoming RTP -> VoiceReceiver.unpack()
Estimated code review effort🎯 5 (Critical) | ⏱️ ~120 minutes Poem
✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
my bad |
`PLW0717` ("Try clause contains too many statements") is a valid, active
rule in the canonical lint toolchain (`make lint` -> uv-resolved ruff
0.15.x) and was deliberately ignored in the original pyproject.toml. An
earlier commit removed it after a misdiagnosis from a system-installed
ruff build that didn't recognize the selector; that re-exposed 10
PLW0717 violations (7 in pre-existing files). Restoring the maintainer's
ignore entry makes `make lint` pass clean again.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
|
coderabbit review limit is insane @codex review |
AlexFlipnote
left a comment
There was a problem hiding this comment.
Please do test the PR as well and show it working before I take another look, these are the very obvious things I found.
| bot_user = self.bot.application.bot if self.bot.application else None | ||
| if bot_user is not None and int(data["user_id"]) == bot_user.id: | ||
| vc = self.bot._get_voice_client(int(data["guild_id"])) | ||
| if vc is not None: |
There was a problem hiding this comment.
This is a very odd spot to just randomly create asyncio tasks, as none of the parser methods are even use it. Why even?
| ) | ||
|
|
||
| self._revive_voice_clients() | ||
|
|
There was a problem hiding this comment.
This might cause issues because the websocket for intents and voice are two seperate scenarios. I do not believe the voice is affected by a kill and resume inside of the intent socket
| from .receiver import * | ||
| from .sinks import * | ||
|
|
||
| __all__ = ( |
There was a problem hiding this comment.
Don't use __all__ inside of __init__, move it to each individual files for a cleaner approach to the main file instead, just like all the others.
| await self.connection.close_transport() | ||
| self.client._remove_voice_client(self.guild_id) | ||
|
|
||
| async def move_to(self, channel: "PartialChannel") -> None: |
There was a problem hiding this comment.
move_to should accept both a partialchannel or int, for more convinience and less forcing a user to create a partialchannel for one task
|
|
||
| def play( | ||
| self, | ||
| audio: Any, # noqa: ANN401 |
There was a problem hiding this comment.
Don't use Any.. be more strict
| } | ||
|
|
||
|
|
||
| class OpusError(Exception): |
There was a problem hiding this comment.
These I would rather move to the universal errors.py file in the root
| OpusError | ||
| If an explicit ``name`` was given but the library could not be loaded. | ||
| """ | ||
| global _lib, _loaded # noqa: PLW0603 |
There was a problem hiding this comment.
This looks.. a bit sketchy.. please figure out a different way to avoid using global, even you ignoring says a lot
|
|
||
|
|
||
| class VoiceCloseCode(IntEnum): | ||
| """ The voice gateway websocket close codes that govern reconnect behaviour. """ |
| The shared HTTP session, or a freshly created one owned by this socket. | ||
| """ | ||
| try: | ||
| session = self.connection.voice_client.client.state.http.session |
There was a problem hiding this comment.
Please.. find a cleaner approach to this, this is just.. why
|
|
||
|
|
||
| @client.command() | ||
| async def join(ctx: Context): |
There was a problem hiding this comment.
Make it instead, where you target a voice channel instead of replying on hard-coded ints
Implements AlexFlipnote's review comments on AlexFlipnote#29: - enums: rename VoiceOp -> VoiceOpType to match the library's *Type naming - errors: move OpusError/OpusNotLoaded into the root errors.py - opus: drop the module-level `global` loader state in favour of an encapsulated loader object - socket: VoiceCloseCode now subclasses BaseEnum; reuse the bot's shared aiohttp session instead of the deep try/except + self-owned session - gateway_udp: narrow the transport with isinstance instead of a type: ignore, keeping full type safety - connection: parse the voice endpoint with the library's URL helper, import has_dave/DaveManager at module level, and read the reconnect attempt cap from the new Client.voice_reconnect_attempts variable - dave: early-return guard in set_passthrough_mode and a match/case in handle_binary - client (voice): move_to accepts a channel or an int id; play uses a strict AudioSourceInput type alias instead of Any - gateway: voice clients are driven from shard special handlers, not from asyncio tasks created inside the parser; the parser methods are pure again - gateway/client: decouple voice from the intent socket lifecycle (drop the READY/RESUMED revive + reset teardown and the resume_voice plumbing), since the voice websocket manages its own reconnect - voice package: drop __all__ from __init__ and rely on each module's __all__ - example: target the caller's current voice channel instead of hard-coded ids - logging: use f-strings throughout the voice package to match the codebase - tests: port the ogg parser tests to unittest (the project's runner) All checks green: ruff clean, pyright 0 errors, 106 unittest tests pass. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
|
Thanks for the review! Pushed
Also converted the whole voice package's logging to f-strings to match the codebase, and ported the ogg-parser tests to Tested & working ( Functional smoke checks (run via
|
get_member_voice_state() only returns data when the library is caching voice states. The guild_voice_states intent makes Discord SEND the updates, but gateway_cache decides what is kept; with no gateway_cache the cache flags are None, update_voice_state() no-ops, and the bot always thinks the caller is not in a voice channel. Enable GatewayCacheFlags.guilds | GatewayCacheFlags.voice_states (voice states need a cached guild to hang on) and document why both the intent and the cache flag are required. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
There was a problem hiding this comment.
Actionable comments posted: 12
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
discord_http/client.py (1)
133-156:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winReject negative
voice_reconnect_attemptsvalues.
discord_http/voice/connection.py:274-322uses this as a loop bound for reconnects. Negative values currently skip the loop entirely and tear the voice client down immediately, which is a surprising failure mode for a public config knob.Suggested fix
debug_events: bool = False, voice_reconnect_attempts: int = 5 ): + if voice_reconnect_attempts < 0: + raise ValueError("voice_reconnect_attempts must be >= 0") + if application_id is not None: _log.warning( "application_id parameter is no longer needed, it will be fetched automatically." )🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@discord_http/client.py` around lines 133 - 156, Validate the voice_reconnect_attempts parameter in the constructor and reject negative values: in the method that sets voice_reconnect_attempts (look for the constructor where the voice_reconnect_attempts parameter is accepted and assigned to self.voice_reconnect_attempts), add a check like if voice_reconnect_attempts < 0: raise ValueError("voice_reconnect_attempts must be non-negative") so callers get a clear error instead of silently skipping the reconnect loop used in discord_http/voice/connection.py.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@discord_http/channel.py`:
- Around line 541-548: The new VoiceClient is registered with
client._add_voice_client before awaiting vc.connect, but if vc.connect raises or
times out the stale entry remains; wrap the await vc.connect(...) in a
try/except (or try/finally) and on any exception remove the registration (call
the matching removal API, e.g. client._remove_voice_client(self.guild_id) or
client._remove_voice_client(self.guild_id, vc) depending on the existing
signature) before re-raising the exception so failed connect attempts do not
poison the voice-client registry; reference VoiceClient,
client._add_voice_client, vc.connect and the removal helper when making the
change.
- Around line 2864-2873: The returned VoiceState is being built using the stale
self.channel_id instead of the authoritative channel_id from the fresh payload
(r.response); update the channel resolution so you read channel_id from
r.response (e.g., channel_id = r.response.get("channel_id") or equivalent), then
call self._state.cache.get_channel using the guild (resolved from self.guild_id)
and that fresh channel_id to produce the channel variable, and pass that channel
into the VoiceState constructor (VoiceState(state=self._state, data=r.response,
guild=guild, channel=channel)).
In `@discord_http/voice/connection.py`:
- Around line 260-269: The resume path never marks the connection as connected
because _resume() clears self._connected_event but on_resumed() only logs;
update the resume completion to set the connected state so waiters unfreeze:
either call self._connected_event.set() (and any internal connected flag) inside
on_resumed() when the RESUME handshake is confirmed, or set the event
immediately after a successful await self.socket.connect(resume=True) in
_resume(); apply the same change to the other resume block (the one around lines
437-446) so is_connected() becomes true after a successful resume.
- Around line 174-179: The reconnect backoff is being reset on every retry
because connect() unconditionally calls self._backoff.reset(), so
_full_reconnect()'s intended exponential backoff never grows; change the logic
so that self._backoff.reset() is only called when starting a fresh reconnect
sequence or after a successful connection, not on each connect() attempt — e.g.,
remove/reset call from connect() and instead call self._backoff.reset() at the
start of _full_reconnect() (or add a parameter to connect() like
reset_backoff=False and only reset when True); update any similar retry loops
referenced (the block around lines 297-315) to use the same pattern.
- Around line 412-417: The code sets self.dave_protocol_version and
reinitializes when dave_version > 0 but does not clear an existing DAVE session
when the new SESSION_DESCRIPTION reports no DAVE; update the SESSION_DESCRIPTION
handling (where self.dave_protocol_version is set and reinit_dave_session is
called) to explicitly tear down any previous DAVE state when dave_version == 0
by clearing self.dave_session and any related keys/metadata (and invoking the
existing DAVE cleanup path if one exists) so can_encrypt() and the Opus wrappers
cannot use stale E2EE state on a transport-only connection.
In `@discord_http/voice/gateway_udp.py`:
- Around line 89-97: The RTCP filtering is wrong because masking data[1] with
0x7F changes RTCP payload types 200–204 to 72–76 so they are not dropped; update
the check in gateway_udp.py to inspect the raw second byte (use payload_type =
data[1] instead of data[1] & 0x7F) before the 200–204 range check and keep the
existing drop behavior so receiver = self.connection.voice_client._receiver only
gets RTP packets; adjust any related logic using payload_type accordingly.
In `@discord_http/voice/player.py`:
- Around line 335-345: The read method in player.py currently returns a trailing
partial PCM frame on asyncio.IncompleteReadError, which can be treated as a full
frame downstream; change the handler in async def read(self) so it never emits
short frames: on IncompleteReadError always return b"" (EOF) rather than
bytes(exc.partial). Locate the read method and the FRAME_SIZE reference and
replace the partial-return behavior with a direct EOF return; ensure _spawn and
_stdout logic remains unchanged.
- Around line 617-637: is_playing() and is_paused() rely on _end and _resumed
but _end is never set when the playback task _task naturally completes, so
finished players still report as active; fix by ensuring the playback task sets
the _end event (and clears/sets _resumed appropriately) when it finishes or
errors: attach a done callback to the _task created where playback starts (refer
to the _task variable and the is_playing/is_paused methods) that calls
self._end.set() and ensures self._resumed.clear() (or the correct resumed state)
so state queries reflect a finished player.
- Around line 321-333: The FFmpeg option parsing in the constructor uses plain
str.split which breaks quoted arguments; replace uses of `.split()` for both
before_options and options with `shlex.split()` to perform shell-aware
tokenization (i.e., compute before_args = shlex.split(before_options) if
before_options is not None else None and extend args with shlex.split(options)
when options is not None) in the class's __init__ so the arguments passed to
super().__init__ are correctly tokenized for create_subprocess_exec.
In `@discord_http/voice/receiver.py`:
- Around line 67-77: The start method currently just assigns self.sink and can
be called multiple times, leaking the previous sink/decoder/sequence state;
before overwriting self.sink in start(self, sink: "AudioSink") call the
receiver's stop() or cleanup() routine (the same cleanup() used to tear down
decoders and sequence state) to stop the previous listening session and free
resources, then set self.sink and reinitialize any decoder/sequence state needed
for the new session (ensure any references to the old decoder or sequence are
cleared/reset so the new session starts fresh).
In `@discord_http/voice/socket.py`:
- Around line 391-393: The code increments self._out_seq but always writes zero
into the 2-byte sequence field when building the binary frame; update the frame
construction to pack the incremented/masked sequence (self._out_seq) into the
first two bytes (replacing the struct.pack(">H", 0) usage) so the DAVE sequence
is actually transmitted before sending via self.ws.send_bytes(frame), ensuring
sequence masking already applied ((self._out_seq) & 0xFFFF) is used.
In `@examples/voice_example.py`:
- Line 59: The example prints seconds but labels them as milliseconds; update
the two return/send_message calls that reference vc.latency and average_latency
to convert seconds->ms (multiply by 1000) before formatting—e.g., use
(vc.latency * 1000) and (average_latency * 1000) with the same .1f formatting
and "ms" suffix; locate the calls using symbols vc.latency and average_latency
in the example (the send_message/return lines) and change the displayed values
accordingly.
---
Outside diff comments:
In `@discord_http/client.py`:
- Around line 133-156: Validate the voice_reconnect_attempts parameter in the
constructor and reject negative values: in the method that sets
voice_reconnect_attempts (look for the constructor where the
voice_reconnect_attempts parameter is accepted and assigned to
self.voice_reconnect_attempts), add a check like if voice_reconnect_attempts <
0: raise ValueError("voice_reconnect_attempts must be non-negative") so callers
get a clear error instead of silently skipping the reconnect loop used in
discord_http/voice/connection.py.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 36ff5bd3-c233-4647-8a1b-dbd5212df41a
📒 Files selected for processing (26)
discord_http/channel.pydiscord_http/client.pydiscord_http/errors.pydiscord_http/gateway/cache.pydiscord_http/gateway/parser.pydiscord_http/gateway/shard.pydiscord_http/guild.pydiscord_http/utils.pydiscord_http/voice.pydiscord_http/voice/__init__.pydiscord_http/voice/client.pydiscord_http/voice/connection.pydiscord_http/voice/dave.pydiscord_http/voice/encryptor.pydiscord_http/voice/enums.pydiscord_http/voice/gateway_udp.pydiscord_http/voice/oggparse.pydiscord_http/voice/opus.pydiscord_http/voice/player.pydiscord_http/voice/receiver.pydiscord_http/voice/sinks.pydiscord_http/voice/socket.pyexamples/voice_example.pypyproject.tomltests/test_voice_encryptor.pytests/test_voice_oggparse.py
💤 Files with no reviewable changes (1)
- discord_http/voice.py
| self._out_seq = (self._out_seq + 1) & 0xFFFF | ||
| frame = struct.pack(">H", 0) + bytes([opcode & 0xFF]) + payload | ||
| await self.ws.send_bytes(frame) |
There was a problem hiding this comment.
Write the incremented DAVE sequence into the binary frame.
_out_seq is incremented here and then ignored, so every outbound binary frame is sent with sequence 0. That breaks the sequencing this socket tracks for binary voice messages and can desync DAVE/MLS exchanges after the first frame.
Suggested fix
self._out_seq = (self._out_seq + 1) & 0xFFFF
- frame = struct.pack(">H", 0) + bytes([opcode & 0xFF]) + payload
+ frame = struct.pack(">H", self._out_seq) + bytes([opcode & 0xFF]) + payload
await self.ws.send_bytes(frame)🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@discord_http/voice/socket.py` around lines 391 - 393, The code increments
self._out_seq but always writes zero into the 2-byte sequence field when
building the binary frame; update the frame construction to pack the
incremented/masked sequence (self._out_seq) into the first two bytes (replacing
the struct.pack(">H", 0) usage) so the DAVE sequence is actually transmitted
before sending via self.ws.send_bytes(frame), ensuring sequence masking already
applied ((self._out_seq) & 0xFFFF) is used.
| after: | ||
| A callback invoked with any error once playback finishes. | ||
| """ | ||
| from .player import AudioPlayer, _resolve_source |
There was a problem hiding this comment.
I do question the usecase of import inside here a bit
There was a problem hiding this comment.
Did you want it at the top of the file?
|
Please test that it works before I review again 🙃 |
|
I'm on it |
Two reasons the bot thought the caller was not in voice: - ctx.author is only set for interactions tied to a message; for a slash command it is None. The invoking member is ctx.user. - Without Intents.guilds the bot never receives GUILD_CREATE, so no guild is cached: ctx.guild falls back to an empty stub and Cache.update_voice_state bails (get_guild returns None), so voice states are never stored. Add the guilds intent so the guild (and its voice states) actually get cached. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The voice handshake never completed: connect() always timed out waiting on the voice gateway. Debugging live against a real channel surfaced five distinct protocol bugs (each revealed by the next close code once the prior was fixed: timeout -> 4003 -> 4006 -> 4017 -> 4005 -> connected). socket.py: - _send_json: send JSON control frames as TEXT (send_str) instead of BINARY (send_bytes). The voice gateway reserves binary frames for DAVE/E2EE opcodes, so IDENTIFY sent as binary was silently ignored -> handshake timeout. - HELLO handling: send IDENTIFY/RESUME *before* starting the heartbeat loop (via a single ordered _handle_hello coroutine). Previously the heartbeat task was created first and op 3 raced ahead of op 0 -> 4003 Not authenticated. - _heartbeat_loop: sleep one interval before the first beat so nothing is sent between IDENTIFY and READY (mirrors discord.py's keep-alive). - send_binary: frame outbound DAVE messages as opcode(1B)+payload with NO 2-byte sequence prefix. Inbound frames carry a seq prefix but outbound must not; the extra leading 0x00 made Discord read the frame as opcode 0 (IDENTIFY) -> 4005 Already authenticated. Switch to ws.receive() loop so Discord's close reason is logged (invaluable for diagnosis). - add send_transition_ready (op 23) as a JSON frame, per the protocol. connection.py: - on_voice_server_update: preserve the endpoint port. Discord assigns voice servers on non-443 ports (e.g. :2053) and the session/token are bound to that host:port; stripping it connected us to a different instance -> 4006 Session is no longer valid. Only strip a scheme if present. dave.py: - _handle_proposals: decode the operation_type(1B) prefix and pass (operation_type, proposals) to davey.process_proposals (it requires both args); concatenate commit + welcome for the reply. - _handle_commit / _handle_welcome: strip the transition_id(2B) prefix before passing the MLS blob to davey, and ack non-zero transitions via the JSON TRANSITION_READY (op 23) rather than a binary frame. Verified end to end against a live channel that requires DAVE: the bot joins, plays audio, stays connected, and negotiates E2EE (can_encrypt() == True).
Fixed the voice handshake (and DAVE framing) —
|
Address review feedback on the long attribute chain `self.voice_client.client.voice_reconnect_attempts`. Introduce a `client` property on VoiceConnection and use it everywhere the owning bot client was reached via `self.voice_client.client`.
|
Does look much better at least, will test and review it myself when I get the time. |
|
There are a few bugs I was able to spot while doing a quick test
The reason this is very important is doe to a test code like this user_vc = ctx.guild.get_member_voice_state(ctx.user.id)
if not user_vc:
return ctx.response.send_message("Not in a VC", ephemeral=True)
async def call_after():
vc = self.bot._get_voice_client(ctx.guild.id)
if not vc:
vc = await user_vc.channel.connect()
vc.play("funny.mp3")
return await ctx.edit_original_response(content="Now playing...")
return ctx.response.defer(thinking=True, call_after=call_after)If you are able to find the user, you should definetly be able to connect to it
QoL things
|
|
There are some other things I noticed as well while reviewing:
|
What's the expected behaviour here? Should the bot re-connect when a member joins the VC? |
|
If the bot is playing and you leave, it should continue to play, if you join again, and it still plays, still play |
|
Shouldn't the "continue to play" be handled by the developer rather than the library? The bot stops playing because the DAVE connection is reset and needs to be re-established when only the bot is in the voice chat. |
warn them that when they go online (or debug) make it tell there "Yeah, cannot do anything" and perhaps we by default, make the bot leave? I mean if they have to re-establish the connection, just make the bot leave at that point instead, or make an option for it to automatically handle it |
… APIs
Address review feedback on voice support:
- Rejoin (DAVE WrongEpoch -> 4006): route DAVE control ops 21/22/24, which
arrive as JSON text frames rather than binary, to the DAVE manager so the
local MLS epoch stays in sync with the gateway. Add a configurable policy
for the 4006 "session invalid" close (most common when a channel empties
and Discord tears down the DAVE session): disconnect by default, or
reconnect when connect(reconnect_on_session_invalid=True) is passed.
- Shutdown reconnect: tear down active voice clients in GatewayClient.close()
before killing shards, so the voice sockets don't try to reconnect on the
1006 close during shutdown.
- ResourceWarning ("unclosed transport" / "I/O operation on closed pipe"):
close the ffmpeg subprocess transport and reap the process in
_FFmpegAudio.cleanup() so the Windows Proactor pipe transports are released
deterministically instead of by the GC.
- PartialVoiceState.channel: add a `channel` property to PartialVoiceState
(and override it on VoiceState) so user_vc.channel.connect() works and is
statically known for both types.
- Public get_voice_client: rename Client._get_voice_client to the public
Client.get_voice_client and update all callers and the example.
Follow-up: voice rejoin, shutdown, ResourceWarning + public APIsPushed 1. Rejoining a VC breaks (DAVE
|
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
discord_http/voice/dave.py (1)
374-376:⚠️ Potential issue | 🟠 Major | ⚡ Quick winPreserve the negotiated target version here.
_handle_prepare_transition()already records the next(transition_id, protocol_version), but these two blocks overwrite that tuple with(transition_id, self._version). WhenEXECUTE_TRANSITIONlater reads_pending_transition, it can re-apply the old version instead of the negotiated one, so mid-call DAVE transitions never switchcan_encrypt()to the new epoch.Also applies to: 399-401
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@discord_http/voice/dave.py` around lines 374 - 376, The code is overwriting the negotiated protocol version by setting self._pending_transition = (transition_id, self._version) in the send_transition_ready branches; instead, preserve the negotiated target version that _handle_prepare_transition recorded. Change those assignments so they keep the existing protocol_version (e.g., if self._pending_transition exists use its second element) or explicitly set the tuple to (transition_id, negotiated_protocol_version) instead of self._version; this ensures EXECUTE_TRANSITION reads the negotiated epoch (refer to _handle_prepare_transition, _pending_transition, and EXECUTE_TRANSITION).discord_http/client.py (1)
133-133:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winValidate that
voice_reconnect_attemptsis non-negative.The parameter lacks validation to ensure it's >= 0. Negative values could cause unexpected behavior in voice connection retry logic.
🛡️ Proposed fix to add validation
voice_reconnect_attempts: int = 5 ): + if voice_reconnect_attempts < 0: + raise ValueError("voice_reconnect_attempts must be non-negative") + if application_id is not None:🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@discord_http/client.py` at line 133, Add a runtime validation for the voice_reconnect_attempts parameter to ensure it is non-negative: in the function or class initializer that accepts/sets voice_reconnect_attempts (the constructor where voice_reconnect_attempts: int = 5 is declared), check if voice_reconnect_attempts < 0 and raise a ValueError with a clear message like "voice_reconnect_attempts must be >= 0" (or alternatively clamp to 0 if preferred) before using or assigning the value so downstream voice retry logic cannot receive negative values.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Outside diff comments:
In `@discord_http/client.py`:
- Line 133: Add a runtime validation for the voice_reconnect_attempts parameter
to ensure it is non-negative: in the function or class initializer that
accepts/sets voice_reconnect_attempts (the constructor where
voice_reconnect_attempts: int = 5 is declared), check if
voice_reconnect_attempts < 0 and raise a ValueError with a clear message like
"voice_reconnect_attempts must be >= 0" (or alternatively clamp to 0 if
preferred) before using or assigning the value so downstream voice retry logic
cannot receive negative values.
In `@discord_http/voice/dave.py`:
- Around line 374-376: The code is overwriting the negotiated protocol version
by setting self._pending_transition = (transition_id, self._version) in the
send_transition_ready branches; instead, preserve the negotiated target version
that _handle_prepare_transition recorded. Change those assignments so they keep
the existing protocol_version (e.g., if self._pending_transition exists use its
second element) or explicitly set the tuple to (transition_id,
negotiated_protocol_version) instead of self._version; this ensures
EXECUTE_TRANSITION reads the negotiated epoch (refer to
_handle_prepare_transition, _pending_transition, and EXECUTE_TRANSITION).
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 57efe455-592e-486a-8bee-6864bcd44034
📒 Files selected for processing (10)
discord_http/channel.pydiscord_http/client.pydiscord_http/gateway/client.pydiscord_http/gateway/shard.pydiscord_http/voice/client.pydiscord_http/voice/connection.pydiscord_http/voice/dave.pydiscord_http/voice/player.pydiscord_http/voice/socket.pyexamples/voice_example.py
🚧 Files skipped from review as they are similar to previous changes (7)
- discord_http/gateway/shard.py
- examples/voice_example.py
- discord_http/channel.py
- discord_http/voice/client.py
- discord_http/voice/player.py
- discord_http/voice/socket.py
- discord_http/voice/connection.py
…ransition version - Client: raise ValueError when voice_reconnect_attempts is negative, so the voice retry logic can't receive an invalid count. - DaveManager: when handling a commit/welcome that drives a transition, preserve the negotiated protocol version recorded by DAVE_PREPARE_TRANSITION instead of overwriting it with the current version. A new _pending_transition_version() helper keeps the pending version when the transition_id matches and otherwise falls back to the current version (e.g. when the commit arrives first), so EXECUTE_TRANSITION applies the negotiated epoch.
It seems to be quite more solid at least, handshake, reconned and encryption are implemented quite nice. Might want to look into those and then as well, perhaps resolve the ruff/pylint errors to now have github actions complain lmao. Also just make sure it runs and things before I review again yes yes. |
CodeRabbit findings: - channel.py: remove stale voice-client registry entry if connect() fails, so a failed/timed-out join no longer blocks all future joins. - channel.py: PartialVoiceState.fetch() now resolves channel from the fresh fetched payload's channel_id instead of the stale partial's id. - connection.py: exponential backoff now grows across reconnect attempts (connect() no longer resets backoff on internal reconnect calls). - connection.py: a successful RESUMED frame re-sets the connected event so is_connected() recovers after a server-crash resume. - connection.py: clear the stale MLS/DAVE session when a new SESSION_DESCRIPTION negotiates dave_protocol_version 0. - gateway_udp.py: detect RTCP control packets on the unmasked second byte so types 200-204 are actually dropped. - player.py: use shlex.split() for ffmpeg before_options/options to preserve shell quoting. - player.py: FFmpegPCMAudio.read() discards any trailing short PCM frame and signals EOF instead of forwarding a partial frame into libopus. - player.py: AudioPlayer marks itself finished on natural EOF so is_playing() reports correctly. - receiver.py: stop the previous listening session before swapping sinks. - examples/voice_example.py: convert latency (seconds) to ms before labeling. Maintainer feedback: - connection.py: parse the voice server endpoint via utils.URL while preserving the schemeless host:port semantics required to avoid 4006.
There was a problem hiding this comment.
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
discord_http/channel.py (1)
2890-2900:⚠️ Potential issue | 🟠 Major | ⚡ Quick winPreserve a guild object when the cache misses.
fetch()now uses the freshchannel_id, butguildstill comes only fromcache.get_guild(). When cache is cold or disabled, the returnedVoiceStatehasguild_idset butguild=None, andVoiceState._from_data()then skipsmemberhydration even if the REST payload includes it. Fall back to aPartialGuildhere, like the other fetch paths in this module do.Suggested fix
guild = self._state.cache.get_guild(self.guild_id) + if guild is None: + from .guild import PartialGuild + guild = PartialGuild(state=self._state, id=self.guild_id) + channel = None channel_id = utils.get_int(r.response, "channel_id") if channel_id is not None: channel = self._state.cache.get_channel(self.guild_id, channel_id)🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@discord_http/channel.py` around lines 2890 - 2900, The guild lookup when building the VoiceState uses cache.get_guild(self.guild_id) and can return None, causing VoiceState to miss member hydration; change the construction to fall back to a PartialGuild when get_guild returns None (create PartialGuild(self._state, id=self.guild_id) or the module's equivalent) so VoiceState(state=self._state, data=r.response, guild=partial_guild, channel=channel) always has a guild-like object; update the code path that sets guild (the block using self._state.cache.get_guild and VoiceState(...)) to perform this fallback so fetch() and VoiceState._from_data() can hydrate members from the REST payload.discord_http/voice/connection.py (1)
439-452:⚠️ Potential issue | 🟠 Major | ⚡ Quick winFail fast when Discord doesn't offer a supported encryption mode.
Line 441 silently falls back to
SUPPORTED_MODES[0]even when none of Discord's advertisedmodesmatch. That makessend_select_protocol()announce an unsupported mode and turns a clear incompatibility into a later close/timeout during the handshake.Proposed fix
- self.mode = next((m for m in SUPPORTED_MODES if m in modes), SUPPORTED_MODES[0]) + self.mode = next((m for m in SUPPORTED_MODES if m in modes), None) + if self.mode is None: + raise RuntimeError( + f"Discord did not offer a supported voice encryption mode: {modes!r}" + )🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@discord_http/voice/connection.py` around lines 439 - 452, The code silently falls back to SUPPORTED_MODES[0] when none of Discord's advertised modes match, causing send_select_protocol to announce an unsupported mode; change the logic in the block that sets self.mode so it validates that at least one mode in data.get("modes", []) intersects SUPPORTED_MODES and, if none match, raise an explicit error (or return/close the connection) instead of defaulting, then only proceed to call create_udp/discover_ip/_ready_event and socket.send_select_protocol when a supported mode has been chosen (refer to the variables/methods: modes, SUPPORTED_MODES, self.mode, create_udp, discover_ip, self._ready_event, and send_select_protocol).
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@discord_http/channel.py`:
- Around line 549-559: The except Exception block only catches Exception, so
asyncio.CancelledError (a BaseException subclass on Py3.11+) can skip cleanup
and leave _voice_clients[self.guild_id] populated; update the
PartialChannel.connect flow to also handle cancellation by catching
asyncio.CancelledError or BaseException around the vc.connect call and calling
client._remove_voice_client(self.guild_id) before re-raising; ensure vc.connect,
client._remove_voice_client, and the surrounding connect logic are referenced so
the cleanup runs for cancellations as well as regular exceptions.
In `@discord_http/voice/connection.py`:
- Around line 410-413: The loop variable named `_scheme` triggers RUF052; rename
it to a normal local variable (e.g., `scheme`) in the for loop that iterates
over ("wss://", "https://") and update its uses in the loop body where
`host_port.startswith(_scheme)` and `host_port[len(_scheme):]` are referenced
(the code that manipulates `host_port` in connection logic inside
discord_http/voice/connection.py). Ensure only the loop variable name is changed
and that `host_port` slicing and startswith checks now use `scheme`.
---
Outside diff comments:
In `@discord_http/channel.py`:
- Around line 2890-2900: The guild lookup when building the VoiceState uses
cache.get_guild(self.guild_id) and can return None, causing VoiceState to miss
member hydration; change the construction to fall back to a PartialGuild when
get_guild returns None (create PartialGuild(self._state, id=self.guild_id) or
the module's equivalent) so VoiceState(state=self._state, data=r.response,
guild=partial_guild, channel=channel) always has a guild-like object; update the
code path that sets guild (the block using self._state.cache.get_guild and
VoiceState(...)) to perform this fallback so fetch() and VoiceState._from_data()
can hydrate members from the REST payload.
In `@discord_http/voice/connection.py`:
- Around line 439-452: The code silently falls back to SUPPORTED_MODES[0] when
none of Discord's advertised modes match, causing send_select_protocol to
announce an unsupported mode; change the logic in the block that sets self.mode
so it validates that at least one mode in data.get("modes", []) intersects
SUPPORTED_MODES and, if none match, raise an explicit error (or return/close the
connection) instead of defaulting, then only proceed to call
create_udp/discover_ip/_ready_event and socket.send_select_protocol when a
supported mode has been chosen (refer to the variables/methods: modes,
SUPPORTED_MODES, self.mode, create_udp, discover_ip, self._ready_event, and
send_select_protocol).
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: fcca89ab-034f-4855-b90e-55ce10029035
📒 Files selected for processing (8)
discord_http/channel.pydiscord_http/client.pydiscord_http/voice/connection.pydiscord_http/voice/dave.pydiscord_http/voice/gateway_udp.pydiscord_http/voice/player.pydiscord_http/voice/receiver.pyexamples/voice_example.py
🚧 Files skipped from review as they are similar to previous changes (6)
- discord_http/client.py
- examples/voice_example.py
- discord_http/voice/gateway_udp.py
- discord_http/voice/receiver.py
- discord_http/voice/dave.py
- discord_http/voice/player.py
…dings
This addresses a second round of review feedback plus a real reconnect bug
observed in the field, across the voice stack.
== reconnect_on_session_invalid (voice/connection.py) ==
Symptom: a human leaving+rejoining the bot's channel rebuilds the DAVE/MLS
group and Discord closes the voice websocket with 4006 ("session no longer
valid"). The old reconnect re-issued op4 change_voice_state with the SAME
channel_id, which is a no-op for the voice server (the bot never left at the
gateway level), so no fresh VOICE_SERVER_UPDATE ever arrived and every attempt
timed out on _server_event.wait(). A subsequent play command timed out too.
Fix is a hybrid that mirrors discord.py's verified patterns:
- _soft_reconnect(): on 4006, first try a no-leave recovery -- close the old
socket/UDP locally (WITHOUT sending op4), open a fresh VoiceSocket and
re-IDENTIFY (op 0) with the still-valid session_id/token/endpoint. This is
discord.py's _potential_reconnect approach and avoids any user-visible
leave/rejoin blip when the credentials are still usable.
- _full_reconnect(force_refresh=...): only if the soft attempt fails do we fall
back to the leave/rejoin bounce. _force_voice_refresh() sends
change_voice_state(channel_id=None), waits for the gateway leave ack
(_left_event) so the leave+rejoin is not coalesced into a no-op, then connect()
rejoins -- forcing Discord to allocate a fresh voice server. The bounce is
gated behind force_refresh so unrelated reconnect paths (resume fallback,
generic codes) do NOT cause a spurious leave/rejoin.
- connect() now joins using the stable voice_client.channel.id instead of the
mutable self.channel_id, because the leave echo (channel_id=None) clobbers
self.channel_id via on_voice_state_update and would otherwise rejoin nothing.
- _left_event is set/cleared in on_voice_state_update to signal the leave ack.
Why hybrid rather than always-leave: discord.py treats 4006 as an unhandled
code and always leaves+rejoins, but its 4014 path (_potential_reconnect) proves
a no-leave re-IDENTIFY works when creds survive. Soft-first gives a no-blip
recovery in the common case while keeping the robust leave/rejoin fallback. One
soft attempt per 4006 event prevents a soft->4006->soft loop.
== Review findings ==
channel.py (PartialChannel.connect): catch BaseException, not just Exception,
around vc.connect(). asyncio.CancelledError is a BaseException on 3.11+, so a
cancelled/timed-out connect previously skipped cleanup and left a stale
_voice_clients entry that blocked all future connects in the guild.
voice/client.py: removed the VoiceClient.client/.bot duplication, keeping `bot`
to match the library-wide convention (self.bot is used 243x vs self.client 10x,
all in voice; no external/test refs to a voice .client). Updated internal refs
and the VoiceConnection.client property to read self.voice_client.bot.
voice/player.py:
- FFmpegOpusAudio._packets switched from list+pop(0) (O(n)) to collections.deque
+ popleft() for the FIFO packet queue.
- AudioPlayer._run now logs a warning and re-raises if _cleanup() is interrupted
by CancelledError, so the silently-skipped trailing silence/speaking-off is at
least observable. Did not use asyncio.shield (risks hanging shutdown).
voice/gateway_udp.py: IP-discovery detection strengthened from a single
data[1]==0x02 check to data[0]==0x00 and data[1]==0x02 and len(data)>=74 (the
fixed discovery response shape), so a coincidental RTP packet can't resolve the
discovery future.
voice/socket.py:
- _schedule now retains strong references to dispatched tasks in
self._dispatch_tasks (with a done-callback to discard), instead of relying on
a fire-and-forget create_task that could be GC'd mid-flight; dropped the noqa.
- Added VoiceCloseCode.dave_e2ee_required = 4017 for completeness/discoverability
(referenced in a _receive_loop comment).
voice/sinks.py: documented that WaveSink mixes all speakers into one stream and
ignores the `user` arg (behavior unchanged) to prevent surprise.
voice/connection.py (also): added a clarifying note in _full_reconnect that the
backoff is reset once before the loop and grows across attempts; renamed a loop
variable _scheme -> scheme (RUF052); reworked a DAVE docstring summary (D205).
== Validation ==
ruff check --config pyproject.toml (ruff 0.15.15, matching uv.lock) -> clean
pyright on all touched files -> 0 errors/warnings (the pre-existing dave.py:141
typing error on the baseline is unrelated and untouched)
pytest -k "voice or channel" -> 11 passed
Routine, automatically-recovered voice events were logging at INFO/WARN and spamming the console during normal operation (e.g. a member rejoining triggers a 4006 + MLS rebuild that the library transparently recovers from). Demote those to DEBUG; keep INFO/WARN/ERROR only for conditions that are NOT auto-recovered and likely need operator attention. Demoted to DEBUG (transient / self-healing): - connection.py: disconnect+teardown (4014/4022), server-crash resume (4015), 4006 disconnect (when reconnect not opted in), soft-reconnect attempt/success/ fallback, resume-failed fallback, per-attempt "Reconnecting..." line, failed reconnect attempt (will retry), and reconnect success. - socket.py: websocket ERROR frame (the receive loop breaks and the connection auto-reconnects). - dave.py: "Failed to process MLS proposals/commit/welcome" -- each immediately calls _recover_from_invalid_commit() (notify gateway + reinit), so they are fully handled. Kept as-is (genuine, non-auto-recovered failures): - connection.py: rate-limited give-up (WARN), reconnect exhausted after N attempts + teardown (ERROR), DAVE op received without the davey library (WARN). - socket.py: unexpected exception in a dispatch handler (ERROR). - dave.py: DAVE session init failure -- degrades to no session (WARN). - player.py: cleanup interrupted by cancellation -- re-raised, not swallowed (WARN). No behavior change; logging levels only.
|
@AlexFlipnote review |


Summary
Adds voice media support to discord.http — previously the data-plane models (
VoiceState/PartialVoiceState) and theVOICE_STATE_UPDATEparser existed, but there was no op-4 sender, no voice WebSocket, no UDP/RTP/encryption/Opus/player, and noconnect()entry point (the author had deliberately deferred it atgateway/parser.py:1339).This PR implements joining/leaving/moving voice channels, sending audio, receiving audio, and DAVE/MLS end-to-end encryption, staying faithful to discord.http's design: pure-asyncio (no threads), stdlib-first, minimal dependencies.
What changed
Phase 0 — refactor
VoiceState/PartialVoiceStateintochannel.pyand deletedvoice.py, freeing thevoice/name for the new media subpackage. Fixed all importers (gateway/parser.py,gateway/cache.py,guild.py,client.py,__init__.py).Phase 1 — main-gateway plumbing
Shard.change_voice_state(...)(op 4), mirroringchange_presence.VOICE_SERVER_UPDATEparser placeholder + routing of voice state/server updates to the owning voice client (bot-only, scheduled as tasks so the receive loop never blocks).Client(_voice_clients,_get/_add/_remove_voice_client, publicvoice_clients).PartialChannel.connect()(type-guarded — allows unknown/voice/stage-voice).voice/subpackage (~4,000 LOC, 14 modules)aead_aes256_gcm_rtpsize(AES-GCM).ctypeslibopus binding (optional runtime), Ogg/Opus parser.AudioSource/AudioPlayer, FFmpeg Opus passthrough + PCM sources,PCMVolumeTransformer, auto-resolvingplay().AudioSinks (WaveSink,CallbackSink).daveypackage, import-guarded.Phase 7 — robustness
ExponentialBackoffhelper; voice-close-code handling (4014/4022 leave, 4015 resume, 4021 stop, else backoff); opt-inClient(resume_voice=...)persistence; default insta-leave on shard reset; latency surface.Phase 8 — exports / packaging / example
voice/__init__.py+ top-level re-export),examples/voice_example.py, andpyproject.tomlupdates: added thediscord_http.voicepackage, thevoice = ["davey>=0.1.0"]optional extra, and the voice package to the ruff/pyright include lists.Why it was implemented this way
cryptography, whoseAESGCMimplements Discord's first-classaead_aes256_gcm_rtpsizemode — so we avoid PyNaCl entirely (discord.py's voice dep). Opus is a stdlibctypesbinding to a user-installed libopus (no bundled binaries, no pip dep); FFmpeg is an external executable, not a Python dep.daveyis the one optional Python dep, gated behinddiscord.http[voice]. DAVE/MLS cannot be implemented without an MLS library, but default installs add nothing and still work with transport encryption. A clear, actionable error is raised only if Discord forces a DAVE version anddaveyis absent.AudioSource.read()isasync; FFmpeg runs viaasyncio.create_subprocess_exec; the player paces with drift-correctedasyncio.sleep; UDP uses aDatagramProtocol; the heartbeat is an asyncio task.PCMVolumeTransformeruses stdlibarraysinceaudioopis gone in 3.13.connect()lives onPartialChannel(type-guarded, deny only when the type is known and not voice-capable) soVoiceChannel/StageChannelinherit it.resume_voicepersistence (intent-gated) for revival on shard READY/RESUMED.Verification
ruff checkandpyrightare clean across the package (0 errors).daveyall absent — graceful degradation confirmed.Notes
PLW0717) frompyproject.toml— it is not a real rule in current ruff and was aborting every lint run.VOICE_DESIGN_DOC.md(untracked planning doc) was updated to mark the receive path and DAVE/MLS as in-scope, but is intentionally not included in this PR since it isn't tracked upstream.🤖 Generated with Claude Code
Summary by CodeRabbit