-
### Observations

MPEG-TS is a good format for our use case: it allows streaming and processing without downloading the entire segment. Our workflow does not tolerate packet loss at the transport layer; if we were to switch to a UDP-based protocol, what would the benefits be? Payments rely on segment boundaries, but we don't need to get rid of segment boundaries to accomplish a streaming workflow. Currently payment info is packed in an HTTP header field. gRPC is not suited for large file transfer, so HTTP is used for segment transfer.

### Proposed protocol: WebSocket

We would get stateful, full-duplex communication with ordered binary and text messages, perhaps JSON-encoded. A good option if a browser needs to send files to our network.

### Proposed custom protocol

Wire format: a Type tag and a Length header followed by the payload (TLV-style framing).

Interface:

```go
// DecodeResult reports how many input bytes a Decode call consumed.
type DecodeResult struct {
	BytesConsumed int
}

// AtomSize separates an atom's own payload size from that of its children.
type AtomSize struct {
	OwnSize      int32
	ChildrenSize int32
}

// Atom is one TLV unit in the stream.
type Atom interface {
	MyTag() uint16
	Decode(from []byte) DecodeResult
	EncodedSize() int
	Encode(to []byte)
}
```
### Backward compatibility

New fields are placed last, so an old version won't read them, effectively just skipping those bytes. Unknown child atoms would likewise be skipped.

### Streaming

No streaming is planned at this layer; streaming responsibility moves into the next abstraction layer that uses this wire protocol. Our existing non-streaming code already suffers from the HTTP protocol handling framing on file boundaries. Instead of a streaming interface we can use a naming convention for large messages.

### Message priority

Can be added on the sender side if required.
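A minimal sketch of this framing, assuming a 2-byte tag and a 4-byte big-endian length (the field widths are not pinned down above, so both are assumptions), including the skip-unknown-atoms behavior described under backward compatibility:

```go
package wire

import (
	"encoding/binary"
	"errors"
)

// EncodeAtom appends one atom to `to`: 2-byte tag, 4-byte length, payload.
func EncodeAtom(to []byte, tag uint16, payload []byte) []byte {
	var hdr [6]byte
	binary.BigEndian.PutUint16(hdr[0:2], tag)
	binary.BigEndian.PutUint32(hdr[2:6], uint32(len(payload)))
	return append(append(to, hdr[:]...), payload...)
}

// DecodeAtoms walks a buffer of atoms, dispatching known tags and silently
// skipping unknown ones — how an old reader stays compatible with new atoms.
func DecodeAtoms(buf []byte, known map[uint16]func(payload []byte)) error {
	for len(buf) > 0 {
		if len(buf) < 6 {
			return errors.New("truncated atom header")
		}
		tag := binary.BigEndian.Uint16(buf[0:2])
		size := int(binary.BigEndian.Uint32(buf[2:6]))
		if len(buf) < 6+size {
			return errors.New("truncated atom payload")
		}
		if handle, ok := known[tag]; ok {
			handle(buf[6 : 6+size])
		} // unknown tag: skip, per the backward-compatibility rule
		buf = buf[6+size:]
	}
	return nil
}
```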
-
Thanks for the work @AlexKordic @cyberj0g. Good spec! I agree with most ideas! Here are some additional aspects I think we need to cover.

### B<>O Failover

Currently, when O stops responding, B retries the given segment with another O. What should we do when there are no segments? How do we switch to another O, and what should be retried? Should we retry at all?

### O Zero-Downtime Deployment

One of the features requested by orchestrator operators is restarting an orchestrator without losing streams. I documented some approaches we can take in the Orchestrator Zero-Downtime Deployment Feature Spec. We need to think about how to address this requirement in the context of low latency transcoding.

### Lossless vs Lossy Transcoding

AFAIU we want to completely replace our current segment-based transcoding with low latency transcoding. Then I think we should cover how to approach two separate streaming use cases: what we currently have is lossless (TCP-based, with segment retries); if we want to focus on low latency, we may need to think about lossy transcoding (UDP-based, no retries).
-
It has big overhead (it was designed for a very different use case/transport layer, and using it over the internet just wastes precious bandwidth; @Thulinma can tell more about it 😄). Also,

We're not talking about allowing packet loss. Here is a good explanation of why QUIC is built over UDP. Summary:

Those are exactly the features we're looking for.

We will not get rid of boundaries, but we need to find a way to signal boundaries inside the stream and embed the payment inside the stream.
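One hedged way to do that in-band, reusing the TLV framing sketched earlier — the tag values and struct shapes below are illustrative, not decided:

```go
// Illustrative control atoms carried in the same stream as media data.
const (
	TagMediaData       uint16 = 0x0001 // MPEG-TS packets or similar
	TagSegmentBoundary uint16 = 0x0002 // marks the end of a logical segment
	TagPayment         uint16 = 0x0003 // payment info, formerly an HTTP header
)

// SegmentBoundary signals a logical segment edge without breaking the stream.
type SegmentBoundary struct {
	Seq uint64 // sequence number of the segment that just ended
}

// PaymentTicket carries the payment info for one logical segment in-band.
type PaymentTicket struct {
	Seq    uint64 // segment this ticket pays for
	Ticket []byte // opaque signed payment payload
}
```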
-
There will be no segments like we have now (separate HTTP requests), but the stream will be divided into logical segments anyway (needed for payment in any case), so B will keep the latest segment in memory and can send it to another O if needed.
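A minimal sketch of that replay buffer, with all names illustrative: B retains the bytes of the in-flight logical segment so it can resend them to another O on failover.

```go
package failover

import "sync"

// SegmentBuffer holds the current logical segment for replay to another O.
type SegmentBuffer struct {
	mu   sync.Mutex
	seq  uint64
	data []byte
}

// StartSegment begins accumulating a new logical segment.
func (b *SegmentBuffer) StartSegment(seq uint64) {
	b.mu.Lock()
	defer b.mu.Unlock()
	b.seq, b.data = seq, b.data[:0]
}

// Append records stream bytes as they are forwarded to the current O.
func (b *SegmentBuffer) Append(p []byte) {
	b.mu.Lock()
	defer b.mu.Unlock()
	b.data = append(b.data, p...)
}

// Snapshot returns a copy of the in-flight segment for resending.
func (b *SegmentBuffer) Snapshot() (uint64, []byte) {
	b.mu.Lock()
	defer b.mu.Unlock()
	cp := make([]byte, len(b.data))
	copy(cp, b.data)
	return b.seq, cp
}
```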
Even now it can't be done without help from the O node itself. And if the O node is doing it, it can be done the same way as in things like Nginx: by handing down connections from the old instance to the new one.
I believe we haven't (yet) discussed that use case.
-
We could refactor each component of the system to work under low-latency constraints. Here is the current data flow: the Broadcaster receives the entire input media file before sending it to the Orchestrator. So the latency would look like: To convert to low-latency logic we could:

This covers the low-latency change for broadcaster ingest; a minimal sketch follows below. Other components should undergo a similar transformation without changing the transport protocol.
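A sketch of that ingest change, with the URL and function names illustrative: B forwards bytes to O as they arrive instead of buffering the entire input first.

```go
package ingest

import (
	"context"
	"io"
	"net/http"
)

// forwardStream streams ingest bytes to the Orchestrator as they arrive.
func forwardStream(ctx context.Context, src io.Reader, orchURL string) error {
	req, err := http.NewRequestWithContext(ctx, http.MethodPost, orchURL, src)
	if err != nil {
		return err
	}
	// With no ContentLength set, the body goes out with chunked transfer
	// encoding, so the upload starts before the input is complete.
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	_, err = io.Copy(io.Discard, resp.Body)
	return err
}
```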
-
Some initial questions, now that I've read through the comments we've had so far:
-
From some quick unscientific testing, so that we have some numbers to talk around: **Latency from source -> screen**
-
Note: `movflags: faststart` requires a second pass over the entire output MP4 file, preventing streaming of the output.
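One hedged alternative is a fragmented MP4, which stays streamable while being written. `frag_keyframe+empty_moov` are standard ffmpeg movflags; the command wiring below is illustrative.

```go
package main

import "os/exec"

// fragmentedMP4 remuxes input into a streamable fragmented MP4.
func fragmentedMP4(input, output string) error {
	cmd := exec.Command("ffmpeg",
		"-i", input,
		"-c", "copy",
		// Write an empty moov up front and flush a fragment at every
		// keyframe, so the file is consumable while still being written.
		"-movflags", "frag_keyframe+empty_moov",
		output,
	)
	return cmd.Run()
}
```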
-
I can only comment on latency from a low-level video encoding/decoding POV, since this is where I have solid experience.

From the video codec POV, the main factor that affects decode latency is the kind/strategy of motion compensation used in the stream in question. At one end of the spectrum, streams optimized for compression use B frames that reference future frames, so the decoder must buffer until those references arrive before it can emit output. On the other end of the spectrum, streams designed for low-latency streaming will usually have all reference frames in the past, no B frames, and much shorter latency. In extreme cases (such as real-time communication with Zoom or similar solutions) the stream can be limited to a single reference frame, and that reference frame may be just the previous one (all frames in the stream apart from the periodic IDR frame are P frames), or the stream may even go without inter-coding at all (just I frames). In such a case, decoder-incurred latency will be around one frame.

On the encoder side, we have full control over the settings, so again we can move between the extremes of a very well compressed bidirectional stream with B frames (and a huge encode window, perhaps even two-pass encoding) and a low latency stream with a single reference frame in the past. The solution I worked on while at Airtame had less than 1/10 s of latency between capture on the local machine and display on the remote one, including network transfer time (albeit on a local WiFi network). I believe one can do even better than this.

I think that ideally we'd want to know what the customer wants (or at the very least detect the type of the stream being sent and match encoder settings to decoder settings) and move along the better-latency <-> better-bandwidth axis.
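For concreteness, a sketch of encoder settings at the low-latency end of the spectrum described above — no B frames, references only in the past, periodic IDR frames. The flags are standard libx264 options exposed by ffmpeg; the wiring is illustrative.

```go
package main

import "os/exec"

// lowLatencyEncode transcodes input with latency-oriented x264 settings.
func lowLatencyEncode(input, output string) error {
	cmd := exec.Command("ffmpeg",
		"-i", input,
		"-c:v", "libx264",
		"-tune", "zerolatency", // disable lookahead and frame reordering
		"-bf", "0",             // no B frames: all references in the past
		"-g", "60",             // periodic IDR frames for recovery
		"-preset", "veryfast",
		output,
	)
	return cmd.Run()
}
```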
-
### Manual testing of latency

On the same machine, started
-
Here is a template for discussing the protocol-related questions we need to decide on, or for documenting decisions if there's already a consensus. Feel free to comment.

### Questions

#### 1. Mist <> Broadcaster protocol

**WebSocket**

Pros:

Cons:

**gRPC**

**IPC: shared memory, pipes, etc.**

Pros:

Cons:

**Embedding Mist (or go-livepeer) as a shared library**

Pros:

Cons:

#### 2. Broadcaster - Orchestrator - Transcoder protocol

**WebSocket**

Pros:

Cons:

**gRPC**

Pros:

Cons:

**QUIC**

Pros:

Cons:

#### 3. Do we need the same protocol for Mist <> B and B-O-T?
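Relating to question 2 above, a hedged sketch of what a bidirectional gRPC stream for B-O-T could look like. `TranscoderClient`, `TranscodeStream`, `TranscodeChunk`, and `deliver` are invented names, not part of go-livepeer; this only shows the shape grpc-go streaming gives us (imports and generated types omitted).

```go
func runStream(ctx context.Context, client TranscoderClient, chunks <-chan []byte) error {
	stream, err := client.TranscodeStream(ctx)
	if err != nil {
		return err
	}
	go func() {
		// Send media chunks as they arrive from ingest...
		for c := range chunks {
			if err := stream.Send(&TranscodeChunk{Data: c}); err != nil {
				return
			}
		}
		stream.CloseSend()
	}()
	// ...while concurrently receiving transcoded packets as they are ready.
	for {
		pkt, err := stream.Recv()
		if err == io.EOF {
			return nil
		}
		if err != nil {
			return err
		}
		deliver(pkt) // hypothetical downstream handler
	}
}
```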
-
### Summary from our meeting

It's beneficial to use the same protocol between all nodes. We want to use

The downside is adding a dependency on a third-party

Performance difference between

We will make incremental changes to meet the deadline and leave some optimizations for later. Priority is the streaming workflow. Optimization to connect Mist and
-
Update: an implementation option with direct streaming between Mist and Transcoders is currently being explored. Proof of concept: #2410
-
### Direct Mist to Transcoder streaming specification

#### About

The idea is to stream the video through the backend in a more direct way, between Mist and T, so that it won't need to pass through the B and O nodes. This seems to be the simplest way to achieve low latency, and makes sense from an architectural standpoint, but may have pitfalls. The sequence diagrams below describe the current transcoding process and the changes needed to implement direct streaming.

#### Current implementation

Interactions in red will need to be revised for direct streaming.

#### Direct streaming

Interactions in green are new logic.

#### Specific updates

**Mist**

**go-livepeer**

On a higher level, changes to B-O-T are minor and boil down to the following:

#### Security

**Segment signing**

Segments won't be signed directly anymore. Instead, a 'transcoding ticket' containing secure URLs will be signed (no changes to the code, just no video bytes in the metadata objects passed through gRPC); a sketch appears at the end of this comment. Potentially, the ETH private keys of MB and T can be used for TLS.

**Verification**

Verification remains intact, except segments and renditions are downloaded from different URLs.

**Failure modes**

The transcoding request from Mist to B will stay synchronous; therefore, all fail-overs, which are already built in, will work as expected:

Mist should stop serving the segment in case of any error reported by B.

#### Unaddressed questions, potential concerns and improvements (in order of priority)

#### Conclusion

With this proposal, we may achieve very low latency fairly easily and make the overall architecture simpler, even deleting a large amount of code from go-livepeer. Still, there are some concerns regarding a) protocol-level security and b) locking ourselves into less flexible video streaming protocols and excluding the B and O nodes from direct video delivery. By using non-application-level protocols M<>T, we may be shifting future work to the C/C++ parts of the pipeline, which may be less desirable, as go-livepeer generally allows development at a higher level, because of the language, blockchain friendliness, and semantics directly related to the service we are creating. On the other hand, it's highly likely (from what is known about how major streaming services function, even taking Mist as an example) that we'll eventually arrive at a direct streaming architecture when scaling the service up, even if the current low latency milestone is implemented in the go-livepeer B-O-T protocol, as was proposed originally.
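A sketch of the 'transcoding ticket' signing described under Security: sign a small metadata object holding the secure URLs instead of the video bytes. The struct shape and helper names are illustrative; go-livepeer would use its ETH (secp256k1) keys, while the standard library is used here for brevity.

```go
package security

import (
	"crypto/ecdsa"
	"crypto/rand"
	"crypto/sha256"
	"encoding/json"
)

// TranscodingTicket is an illustrative shape for the signed metadata object.
type TranscodingTicket struct {
	SegmentURL    string   `json:"segmentUrl"`    // where T fetches the source
	RenditionURLs []string `json:"renditionUrls"` // where T uploads results
	Seq           uint64   `json:"seq"`
}

// signTicket hashes the serialized ticket and signs the digest.
func signTicket(t *TranscodingTicket, key *ecdsa.PrivateKey) ([]byte, error) {
	payload, err := json.Marshal(t)
	if err != nil {
		return nil, err
	}
	digest := sha256.Sum256(payload)
	return ecdsa.SignASN1(rand.Reader, key, digest[:])
}
```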
-
### Abstract
'Low latency' means sending transcoded packets back T->O->B as they are ready, without accumulating them into segments.
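Schematically, the difference is one loop — `transcodedPackets` and `send` below are placeholders, not real go-livepeer names:

```go
// Today: accumulate output into a segment, then send it back.
var segment bytes.Buffer
for pkt := range transcodedPackets {
	segment.Write(pkt) // held back until the whole segment is done
}
send(segment.Bytes())

// Low latency: forward each packet T->O->B the moment it is ready.
for pkt := range transcodedPackets {
	send(pkt)
}
```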
### Motivation

It's a common requirement for live streams to have low latency, comparable to the ping time between producer and consumers. It would be essential if Livepeer is to serve as a backend for a Zoom-like service.
### Current implementation
### Points that will need to be addressed in the low latency implementation