feat: add live streaming mode — direct top-quality-layer request and video-frame-cache fast-start#4578
cloudwebrtc wants to merge 11 commits
A subscriber that requests the top spatial layer would briefly decode a lower layer before settling on the requested one (e.g. layer 0 -> layer 2), a visible low->high quality ramp. Two distinct causes:

1. Simulcast.Select latched opportunistically onto the first key frame of any layer <= target, so a lower layer's key frame (which usually arrives first) was selected before the requested layer's.
2. When the subscriber joined before the publisher started, the layers are detected gradually and `maxSeen` climbs 0->1->2. AllocateOptimal caps the target at `min(maxSeen, requested)`, so the target itself ramped up and Select followed it.

The new behavior is gated behind a `LiveStreamingMode` Room config option (default false -> original opportunistic behavior unchanged):

- Select latches directly onto the target layer during initial acquisition.
- Forwarder gains an initial-acquisition grace: while not yet streaming and the requested layer has not been seen, the target/key-frame-request aim straight at the requested layer instead of the highest seen so far. Gated on `maxSeen < requested` so steady-state behavior (incl. overshoot) is unchanged.
- If the requested layer never shows up within the grace, the key frame requester triggers a re-allocation so the target falls back to the highest layer actually seen, avoiding a stall/black screen.
- On bind, a live-streaming subscriber requests the highest layer up front (instead of the adaptive-stream LOW start) so it is acquired directly.

LiveStreamingMode is threaded from config.RoomConfig through the participant (GetLiveStreamingMode) and SubscribedTrack/DownTrack params into the Forwarder and Simulcast layer selector.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
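The allocation rule described in this commit can be sketched roughly as follows. This is a minimal illustration, not the Forwarder's actual code: `targetLayer` and all of its parameters are hypothetical names, and the grace timer is reduced to a boolean.

```go
package main

import "fmt"

// targetLayer is a hypothetical helper sketching the allocation rule from the
// commit message; it is not the Forwarder's real API.
//   maxSeen:   highest spatial layer detected from the publisher so far
//   requested: spatial layer the subscriber asked for
//   streaming: whether the down track has already started forwarding
//   live:      the LiveStreamingMode flag
//   graceOver: whether the initial-acquisition grace has expired
func targetLayer(maxSeen, requested int32, streaming, live, graceOver bool) int32 {
	if live && !streaming && !graceOver && maxSeen < requested {
		// Initial-acquisition grace: aim straight at the requested layer
		// even though it has not been seen yet.
		return requested
	}
	// Original behavior, and the fallback once the grace expires:
	// cap the target at min(maxSeen, requested).
	if maxSeen < requested {
		return maxSeen
	}
	return requested
}

func main() {
	// The subscriber requests layer 2 while layers are detected one by one.
	for _, maxSeen := range []int32{0, 1, 2} {
		fmt.Printf("maxSeen=%d default=%d live=%d\n", maxSeen,
			targetLayer(maxSeen, 2, false, false, false),
			targetLayer(maxSeen, 2, false, true, false))
	}
}
```

With the flag off, the target follows `maxSeen` upward; with the flag on and the grace still running, the target stays at the requested layer throughout.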
Foundation for bootstrapping a newly added down track from a cached GOP instead of requesting a fresh key frame (PLI) from the publisher, which spikes the publisher uplink and re-sends the key frame to all subscribers. This phase only builds and maintains the cache (no change to forwarding behavior yet); the replay/handoff path is a follow-up.

- GOPCache retains, per spatial layer, the RTP packets from the most recent key frame onward (deep-copied, since payload buffers are pooled). Bounded by packet count and bytes; a GOP exceeding the bound, or any packet carrying a dependency descriptor (SVC, out of scope for now), invalidates the layer so callers fall back to a PLI.
- ReceiverBase maintains the cache in forwardRTP when enabled; gated behind the LiveStreamingMode Room config (off by default -> zero overhead).
- LiveStreamingMode is threaded to the publisher-side receiver via a new WithLiveStreamingMode receiver option (mirrors the subscriber-side wiring).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Builds on the phase 1 GOP cache: in live streaming mode a newly added down track is bootstrapped by replaying the publisher's cached key-frame-anchored GOP instead of requesting a fresh key frame via PLI, which spikes the publisher uplink and re-sends the key frame to every subscriber.

- TrackReceiver gains GetCachedGOP(layer); ReceiverBase serves it from the GOP cache, DummyReceiver delegates, RED/wrapped receivers inherit it.
- DownTrack.WriteRTP becomes a thin priming wrapper: while priming, live packets are queued (deep-copied) instead of forwarded; otherwise the fast path is a single atomic load.
- bootstrapFromGOP replays the cached GOP (each packet anchored to "now" so the resume timestamp math behaves like a normal resume / keeps A-V sync), then drains the queued live packets (deduped against the replayed range) and clears priming so subsequent packets forward directly. On no usable cache or a catch-up overflow it resyncs and falls back to the PLI path. The forwarder lock serializes munging, so only the replay path advances munger state while priming.
- keyFrameRequester attempts the GOP bootstrap once before sending a PLI.

Also log every PLI sent to the publisher (SendPLI) to make it easy to confirm PLIs are avoided under live streaming mode.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
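The priming wrapper and the replay-then-drain handoff described above might look roughly like this toy model. All type and method names below are illustrative, not the PR's actual code. The sketch assumes a single goroutine calls `WriteRTP`, and it omits the overflow fallback and the timestamp anchoring.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// pkt is a stand-in for an RTP packet; only the sequence number matters here.
type pkt struct{ seq uint16 }

// downTrack is a toy model of the priming wrapper (hypothetical names).
type downTrack struct {
	priming atomic.Bool
	mu      sync.Mutex
	queue   []pkt    // live packets held back while priming
	sent    []uint16 // stands in for packets written to the subscriber
}

// WriteRTP forwards directly unless the track is priming, in which case the
// live packet is queued. Assumes a single caller goroutine.
func (d *downTrack) WriteRTP(p pkt) {
	if d.priming.Load() { // the non-priming fast path costs one atomic load
		d.mu.Lock()
		if d.priming.Load() { // re-check under the lock: bootstrap may have finished
			d.queue = append(d.queue, p)
			d.mu.Unlock()
			return
		}
		d.mu.Unlock()
	}
	d.forward(p)
}

func (d *downTrack) forward(p pkt) { d.sent = append(d.sent, p.seq) }

// bootstrap replays the cached packets, drains the queued live packets that
// were not already replayed, and then clears priming.
func (d *downTrack) bootstrap(cached []pkt) {
	d.mu.Lock()
	defer d.mu.Unlock()
	replayed := make(map[uint16]struct{}, len(cached))
	for _, p := range cached {
		d.forward(p)
		replayed[p.seq] = struct{}{}
	}
	for _, p := range d.queue {
		if _, dup := replayed[p.seq]; !dup {
			d.forward(p)
		}
	}
	d.queue = nil
	d.priming.Store(false)
}

func main() {
	d := &downTrack{}
	d.priming.Store(true)
	d.WriteRTP(pkt{seq: 3}) // also present in the cache, so it is deduped
	d.WriteRTP(pkt{seq: 4})
	d.bootstrap([]pkt{{seq: 1}, {seq: 2}, {seq: 3}})
	d.WriteRTP(pkt{seq: 5}) // forwarded directly after priming clears
	fmt.Println(d.sent)
}
```

Re-checking the flag under the lock keeps a packet from being queued after the drain has already finished, which would otherwise strand it in the queue.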
Add debug logging to make it easy to correlate publisher key frames with subscriber join/leave and the GOP-cache fast-start path:

- GOPCache logs when it stores a new key frame (starts a new GOP) and when a previous key frame is overwritten (a logger is threaded into NewGOPCache).
- DownTrack logs when it replays the cached GOP to bootstrap a new subscriber and when the replay has caught up to live.
- sendSubscribedQualityUpdate logs the layer-set change pushed to the publisher, which is what makes the publisher's encoder emit a fresh key frame on a subscriber join/leave (as opposed to an SFU PLI).

All are Debugw; no behavior change.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- Bound the GOP cache solely by duration instead of packet count / bytes: a GOP whose span since its key frame exceeds the limit is invalidated (the subscriber falls back to a PLI) rather than trimmed, since replay must start at the key frame. Removes the packet/byte caps.
- Make the limit a room config option `gop_cache_max_duration` (defaults to 2s when unset), threaded to the receiver alongside live_streaming_mode.
- Track GOP-cache bootstrap hit/miss per publisher track and log the running totals, so the cache hit rate over a join burst can be read directly.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
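A duration-bounded, key-frame-anchored cache for a single spatial layer could be sketched as below. This is a simplified illustration under stated assumptions: `packet`, `frameCache`, and `defaultMaxDuration` are hypothetical names, arrival is modeled as an offset rather than a wall-clock time, and the real cache stores `buffer.ExtPacket` values per layer.

```go
package main

import (
	"fmt"
	"time"
)

// defaultMaxDuration mirrors the 2s default mentioned in the commit message.
const defaultMaxDuration = 2 * time.Second

// packet is a simplified stand-in for a cached RTP packet.
type packet struct {
	seq      uint16
	keyFrame bool
	arrival  time.Duration // offset from an arbitrary stream start (simplification)
	payload  []byte
}

// frameCache keeps the packets of one layer from the latest key frame onward.
type frameCache struct {
	maxDuration time.Duration
	pkts        []packet
}

func (c *frameCache) push(p packet) {
	// Deep-copy the payload: the source buffer is assumed to be pooled and reused.
	p.payload = append([]byte(nil), p.payload...)
	switch {
	case p.keyFrame:
		// A new key frame starts a new GOP and replaces the previous one.
		c.pkts = []packet{p}
	case len(c.pkts) == 0:
		// Nothing to anchor to until a key frame arrives.
	case p.arrival-c.pkts[0].arrival > c.maxDuration:
		// Invalidate rather than trim: replay must start at the key frame.
		c.pkts = nil
	default:
		c.pkts = append(c.pkts, p)
	}
}

// get returns a copy of the cached GOP, or nil when the caller should fall
// back to a PLI.
func (c *frameCache) get() []packet {
	if len(c.pkts) == 0 {
		return nil
	}
	return append([]packet(nil), c.pkts...)
}

func main() {
	c := &frameCache{maxDuration: defaultMaxDuration}
	c.push(packet{seq: 1, keyFrame: true})
	c.push(packet{seq: 2, arrival: time.Second})
	fmt.Println("cached packets:", len(c.get()))
	c.push(packet{seq: 3, arrival: 3 * time.Second}) // span exceeds the limit
	fmt.Println("usable after overrun:", c.get() != nil)
}
```

Once the layer is invalidated, the cache stays empty until the next key frame arrives, so a joiner in that window takes the PLI path.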
// GOP to bootstrap without a PLI), incoming live packets are queued instead of forwarded so the
// cached GOP can be sent first; see bootstrapFromGOP.
func (d *DownTrack) WriteRTP(extPkt *buffer.ExtPacket, layer int32) int32 {
	if d.priming.Load() {
Can this be moved to connected? It would be good to avoid adding another check in the packet path. There are already quite a few (all because of my crappy coding). So, I wanted to check if it is possible to move this to when bindAndConnected happens.
Do you mean to replace d.priming with d.connected as the condition here?
No @cloudwebrtc. I was wondering if the flush can happen at line 2542 in 08ab361.
WriteRTP is the packet forward path. Like I said, I already added too many things in there. One check should be fine. But, was just wondering if the flush can be event driven on connect and not have to check in packet path. There are enough complications that I think it will be needed in the packet path. But, I don't fully understand the whole flow yet and wanted to check if it can happen when connected happens.
I think we can leave it in the packet path in this PR and re-visit later if there is a different place where we can add it.
primeQueue            []primedPacket
primeOverflow         bool
gopBootstrapAttempted atomic.Bool
I will have to read this file more carefully. I am not able to read the diffs well. But, a couple of points to note:
- There is a case of a dummy start (it happens for clients like the Go SDK, which uses pion and needs an RTP packet to fire OnTrack; that is not true in the latest pion, but we still do the dummy packet start). Will that interact properly with this?
- There is handling for upstream codec change. Would this need some handling to prevent any races here?
Rename the key-frame-anchored cache and all related identifiers, the Room config item (gop_cache_max_duration -> video_frame_cache_max_duration), log strings, and the source files (gopcache.go -> videoframecache.go). Behavior is unchanged.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
VP8/AV1 simulcast can carry a dependency-descriptor extension yet is still selected on the subscriber side by the simulcast layer selector, so the old "DD present -> invalidate" rule wrongly forced those feeds to PLI fallback.

Gate cache creation on the receiver's video layer mode instead: only simulcast / single-layer-per-stream feeds are cached; true SVC (MULTIPLE_SPATIAL_LAYERS_PER_STREAM) gets no cache and falls back to PLI as before. Drop the per-packet DD invalidation accordingly.

Since DD-carrying simulcast packets are now cached, nil out the pooled ExtDependencyDescriptor pointer in copyExtPacket to avoid retaining recycled memory; the simulcast forward path neither reads it nor emits it on egress.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Rename the gopPkt test helper to vfcPkt and replace the remaining GOP references in comments with VFC for naming consistency. No behavior change.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Live streaming mode (direct top-layer acquisition + video-frame-cache fast start) is enabled only for VP8/H264/H265 - codecs that always carry one spatial layer per RTP stream. VP9/AV1 are excluded for now: they may be SVC (multiple spatial layers per stream), where per-layer key-frame replay and the dependency-descriptor selector do not fit.

The codec is judged once on the publisher side and the resulting flag flows through the subscriber pipeline:

- MediaTrackSubscriptions.AddSubscriber narrows the per-subscription LiveStreamingMode using the negotiated upstream codec, so the down track, forwarder, and initial-quality logic just consume the flag - no codec checks downstream.
- ReceiverBase.EnableVideoFrameCache gates the cache on the receiver codec.

Reading the upstream codec at subscription time (available even when the subscriber joined first) sets the forwarder's flag correctly at construction, so the acquisition grace is armed without depending on DetermineCodec timing.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
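The codec gate described in this commit amounts to a small predicate applied once, after which downstream code only reads the resulting flag. A minimal sketch follows; the function names and the MIME-type strings are assumptions for illustration and may not match how the PR identifies codecs.

```go
package main

import (
	"fmt"
	"strings"
)

// liveStreamingEligible reports whether a codec is one of those the commit
// message lists as always carrying one spatial layer per RTP stream.
// Hypothetical helper; the MIME strings are assumed, not taken from the PR.
func liveStreamingEligible(mimeType string) bool {
	switch strings.ToLower(mimeType) {
	case "video/vp8", "video/h264", "video/h265":
		return true
	default:
		// VP9/AV1 may be SVC, so they are excluded; so is everything else.
		return false
	}
}

// effectiveLiveStreamingMode narrows the room-level flag using the upstream
// codec, so the down track and forwarder can consume a single boolean.
func effectiveLiveStreamingMode(roomFlag bool, upstreamMime string) bool {
	return roomFlag && liveStreamingEligible(upstreamMime)
}

func main() {
	for _, mime := range []string{"video/VP8", "video/H264", "video/VP9", "video/AV1"} {
		fmt.Printf("%s -> %v\n", mime, effectiveLiveStreamingMode(true, mime))
	}
}
```

Narrowing the flag at subscription time keeps codec knowledge in one place, which matches the commit's stated goal of having no codec checks downstream.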
for _, p := range frames {
	rp := *p
	rp.Arrival = now
	d.writeRTP(&rp, layer)
Both the GOP cache and primeQueue send RTP packets as fast as possible, which could overwhelm the buffers or bandwidth along the network path and cause packet loss. Also, the upstream buffer is not synchronized with the GOP cache, so NACK requests from the subscriber would not be handled.
It would be better to reuse the upstream buffer to handle the GOP, which would eliminate the missed NACKs and also reduce memory usage.
Hi @cnderrauber, could you please point out the location of the upstream buffer? Perhaps we need to adjust the length of the upstream buffer? In live streaming mode, we need at least 1-2 seconds of buffering starting from the most recent key frame.
Live streaming mode

Adds an opt-in `room.live_streaming_mode` (default off) for fast, full-quality startup of late-joining subscribers.

What it does

0 → 1 → 2, removing the visible low→high quality ramp.

Notes

`room.gop_cache_max_duration`, default 2s), simulcast / single-layer only; falls back to a PLI when no usable GOP is available.

With `live_streaming_mode=false`, all paths revert to the original behavior.

Verification

Unit tests + race check pass. Needs real-client A/V-sync verification (replay anchors packet arrival to "now").