-
### Observations

MPEG-TS is a good format for our use case: it allows streaming and processing without downloading the entire segment. Our workflow does not tolerate packet loss at the transport layer; if we were to switch to a UDP-based protocol, what would the benefits be? Payments rely on segment boundaries, but we don't need to get rid of segment boundaries to accomplish a streaming workflow. Currently payment info is packed in an HTTP header field. gRPC is not suited for large file transfer, so HTTP is used for segment transfer.

### Proposed protocol: WebSocket

We would get stateful, full-duplex communication with ordered binary and text messages, perhaps JSON-encoded. A good option if a browser needs to send files to our network.

### Proposed custom protocol

Wire format: a Type tag and a Length header followed by the payload (TLV-style framing).

Interface:

```go
// DecodeResult reports how many input bytes a Decode call consumed.
type DecodeResult struct {
	BytesConsumed int
}

// AtomSize separates an atom's own payload size from that of its children.
type AtomSize struct {
	OwnSize      int32
	ChildrenSize int32
}

// Atom is one TLV unit in the stream.
type Atom interface {
	MyTag() uint16
	Decode(from []byte) DecodeResult
	EncodedSize() int
	Encode(to []byte)
}
```
### Backward compatibility

New fields are placed last, so an old version won't read them, effectively just skipping those bytes. Unknown child atoms would likewise be skipped.

### Streaming

No streaming is planned at this layer; streaming responsibility moves into the next abstraction layer that uses this wire protocol. Our existing non-streaming code already suffers from the HTTP protocol handling framing on file boundaries. Instead of a streaming interface we can use a naming convention for large messages.

### Message priority

Can be added on the sender side if required.
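A minimal sketch of this framing, assuming a 2-byte tag and a 4-byte big-endian length (the field widths are not pinned down above, so both are assumptions), including the skip-unknown-atoms behavior described under backward compatibility:

```go
package wire

import (
	"encoding/binary"
	"errors"
)

// EncodeAtom appends one atom to `to`: 2-byte tag, 4-byte length, payload.
func EncodeAtom(to []byte, tag uint16, payload []byte) []byte {
	var hdr [6]byte
	binary.BigEndian.PutUint16(hdr[0:2], tag)
	binary.BigEndian.PutUint32(hdr[2:6], uint32(len(payload)))
	return append(append(to, hdr[:]...), payload...)
}

// DecodeAtoms walks a buffer of atoms, dispatching known tags and silently
// skipping unknown ones — how an old reader stays compatible with new atoms.
func DecodeAtoms(buf []byte, known map[uint16]func(payload []byte)) error {
	for len(buf) > 0 {
		if len(buf) < 6 {
			return errors.New("truncated atom header")
		}
		tag := binary.BigEndian.Uint16(buf[0:2])
		size := int(binary.BigEndian.Uint32(buf[2:6]))
		if len(buf) < 6+size {
			return errors.New("truncated atom payload")
		}
		if handle, ok := known[tag]; ok {
			handle(buf[6 : 6+size])
		} // unknown tag: skip, per the backward-compatibility rule
		buf = buf[6+size:]
	}
	return nil
}
```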
-
Thanks for the work @AlexKordic @cyberj0g. Good spec! I agree with most ideas! Here are some additional aspects I think we need to cover.

### B<>O Failover

Currently, when O stops responding, B retries the given segment with another O. What should we do when there are no segments? How do we switch to another O, and what should be retried? Should we retry at all?

### O Zero-Downtime Deployment

One of the features requested by orchestrator operators is restarting an orchestrator without losing streams. I documented some approaches we can take in the Orchestrator Zero-Downtime Deployment Feature Spec. We need to think about how to address this requirement in the context of low latency transcoding.

### Lossless vs Lossy Transcoding

AFAIU we want to completely replace our current segment-based transcoding with low latency transcoding. Then I think we should cover how to approach two separate streaming use cases: what we currently have is lossless (TCP-based, with segment retries); if we want to focus on low latency, we may need to think about lossy transcoding (UDP-based, no retries).
-
It has big overhead (it was designed for a very different use case/transport layer, and using it over the internet just wastes precious bandwidth; @Thulinma can tell more about it 😄). Also,

We're not talking about allowing packet loss. Here is a good explanation of why QUIC is built over UDP. Summary:

Those are exactly the features we're looking for.

We will not get rid of boundaries, but we need to find a way to signal boundaries inside the stream and embed the payment inside the stream.
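One hedged way to do that in-band, reusing the TLV framing sketched earlier — the tag values and struct shapes below are illustrative, not decided:

```go
// Illustrative control atoms carried in the same stream as media data.
const (
	TagMediaData       uint16 = 0x0001 // MPEG-TS packets or similar
	TagSegmentBoundary uint16 = 0x0002 // marks the end of a logical segment
	TagPayment         uint16 = 0x0003 // payment info, formerly an HTTP header
)

// SegmentBoundary signals a logical segment edge without breaking the stream.
type SegmentBoundary struct {
	Seq uint64 // sequence number of the segment that just ended
}

// PaymentTicket carries the payment info for one logical segment in-band.
type PaymentTicket struct {
	Seq    uint64 // segment this ticket pays for
	Ticket []byte // opaque signed payment payload
}
```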
-
There will be no segments like we have now (separate HTTP requests), but the stream will be divided into logical segments anyway (needed for payment in any case), so B will keep the latest segment in memory and can send it to another O if needed.
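A minimal sketch of that replay buffer, with all names illustrative: B retains the bytes of the in-flight logical segment so it can resend them to another O on failover.

```go
package failover

import "sync"

// SegmentBuffer holds the current logical segment for replay to another O.
type SegmentBuffer struct {
	mu   sync.Mutex
	seq  uint64
	data []byte
}

// StartSegment begins accumulating a new logical segment.
func (b *SegmentBuffer) StartSegment(seq uint64) {
	b.mu.Lock()
	defer b.mu.Unlock()
	b.seq, b.data = seq, b.data[:0]
}

// Append records stream bytes as they are forwarded to the current O.
func (b *SegmentBuffer) Append(p []byte) {
	b.mu.Lock()
	defer b.mu.Unlock()
	b.data = append(b.data, p...)
}

// Snapshot returns a copy of the in-flight segment for resending.
func (b *SegmentBuffer) Snapshot() (uint64, []byte) {
	b.mu.Lock()
	defer b.mu.Unlock()
	cp := make([]byte, len(b.data))
	copy(cp, b.data)
	return b.seq, cp
}
```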
Even now it can't be done without help from the O node itself. And if the O node is doing it, it can be done the same way as in things like Nginx: by handing down connections from the old instance to the new one.
I believe we haven't (yet) discussed that use case.
-
We could refactor each component of the system to work under low-latency constraints. Here is the current data flow: the Broadcaster receives the entire input media file before sending it to the Orchestrator. So the latency would look like: To convert to low-latency logic we could:

This covers the low-latency change for broadcaster ingest; a minimal sketch follows below. Other components should undergo a similar transformation without changing the transport protocol.
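A sketch of that ingest change, with the URL and function names illustrative: B forwards bytes to O as they arrive instead of buffering the entire input first.

```go
package ingest

import (
	"context"
	"io"
	"net/http"
)

// forwardStream streams ingest bytes to the Orchestrator as they arrive.
func forwardStream(ctx context.Context, src io.Reader, orchURL string) error {
	req, err := http.NewRequestWithContext(ctx, http.MethodPost, orchURL, src)
	if err != nil {
		return err
	}
	// With no ContentLength set, the body goes out with chunked transfer
	// encoding, so the upload starts before the input is complete.
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	_, err = io.Copy(io.Discard, resp.Body)
	return err
}
```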
-
Some initial questions, now that I've read through the comments we've had so far:
-
From some quick unscientific testing, so that we have some numbers to talk around: **Latency from source -> screen**
-
Note: `movflags: faststart` requires a second pass over the entire output MP4 file, preventing streaming of the output.
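One hedged alternative is a fragmented MP4, which stays streamable while being written. `frag_keyframe+empty_moov` are standard ffmpeg movflags; the command wiring below is illustrative.

```go
package main

import "os/exec"

// fragmentedMP4 remuxes input into a streamable fragmented MP4.
func fragmentedMP4(input, output string) error {
	cmd := exec.Command("ffmpeg",
		"-i", input,
		"-c", "copy",
		// Write an empty moov up front and flush a fragment at every
		// keyframe, so the file is consumable while still being written.
		"-movflags", "frag_keyframe+empty_moov",
		output,
	)
	return cmd.Run()
}
```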
-
I can only comment on latency from a low-level video encoding/decoding POV, since this is where I have solid experience.

From the video codec POV, the main factor that affects decode latency is the kind/strategy of motion compensation used in the stream in question. At one end of the spectrum, streams optimized for compression use B frames that reference future frames, so the decoder must buffer until those references arrive before it can emit output. On the other end of the spectrum, streams designed for low-latency streaming will usually have all reference frames in the past, no B frames, and much shorter latency. In extreme cases (such as real-time communication with Zoom or similar solutions) the stream can be limited to a single reference frame, and that reference frame may be just the previous one (all frames in the stream apart from the periodic IDR frame are P frames), or the stream may even go without inter-coding at all (just I frames). In such a case, decoder-incurred latency will be around one frame.

On the encoder side, we have full control over the settings, so again we can move between the extremes of a very well compressed bidirectional stream with B frames (and a huge encode window, perhaps even two-pass encoding) and a low latency stream with a single reference frame in the past. The solution I worked on while at Airtame had less than 1/10 s of latency between capture on the local machine and display on the remote one, including network transfer time (albeit on a local WiFi network). I believe one can do even better than this.

I think that ideally we'd want to know what the customer wants (or at the very least detect the type of the stream being sent and match encoder settings to decoder settings) and move along the better-latency <-> better-bandwidth axis.
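For concreteness, a sketch of encoder settings at the low-latency end of the spectrum described above — no B frames, references only in the past, periodic IDR frames. The flags are standard libx264 options exposed by ffmpeg; the wiring is illustrative.

```go
package main

import "os/exec"

// lowLatencyEncode transcodes input with latency-oriented x264 settings.
func lowLatencyEncode(input, output string) error {
	cmd := exec.Command("ffmpeg",
		"-i", input,
		"-c:v", "libx264",
		"-tune", "zerolatency", // disable lookahead and frame reordering
		"-bf", "0",             // no B frames: all references in the past
		"-g", "60",             // periodic IDR frames for recovery
		"-preset", "veryfast",
		output,
	)
	return cmd.Run()
}
```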
-
### Manual testing of latency

On the same machine, started
-
Here is a template for discussing the protocol-related questions we need to decide on, or for documenting decisions if there's already a consensus. Feel free to comment.

### Questions

#### 1. Mist <> Broadcaster protocol

**WebSocket**

Pros:

Cons:

**gRPC**

**IPC: shared memory, pipes, etc.**

Pros:

Cons:

**Embedding Mist (or go-livepeer) as a shared library**

Pros:

Cons:

#### 2. Broadcaster - Orchestrator - Transcoder protocol

**WebSocket**

Pros:

Cons:

**gRPC**

Pros:

Cons:

**QUIC**

Pros:

Cons:

#### 3. Do we need the same protocol for Mist <> B and B-O-T?
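Relating to question 2 above, a hedged sketch of what a bidirectional gRPC stream for B-O-T could look like. `TranscoderClient`, `TranscodeStream`, `TranscodeChunk`, and `deliver` are invented names, not part of go-livepeer; this only shows the shape grpc-go streaming gives us (imports and generated types omitted).

```go
func runStream(ctx context.Context, client TranscoderClient, chunks <-chan []byte) error {
	stream, err := client.TranscodeStream(ctx)
	if err != nil {
		return err
	}
	go func() {
		// Send media chunks as they arrive from ingest...
		for c := range chunks {
			if err := stream.Send(&TranscodeChunk{Data: c}); err != nil {
				return
			}
		}
		stream.CloseSend()
	}()
	// ...while concurrently receiving transcoded packets as they are ready.
	for {
		pkt, err := stream.Recv()
		if err == io.EOF {
			return nil
		}
		if err != nil {
			return err
		}
		deliver(pkt) // hypothetical downstream handler
	}
}
```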
-
### Summary from our meeting

It's beneficial to use the same protocol between all nodes. We want to use

The downside is adding a dependency on a third-party

Performance difference between

We will make incremental changes to meet the deadline and leave some optimizations for later. Priority is the streaming workflow. Optimization to connect Mist and
-
Update: an implementation option with direct streaming between Mist and Transcoders is currently being explored. Proof of concept: #2410
-
### Direct Mist to Transcoder streaming specification

#### About

The idea is to stream the video through the backend in a more direct way, between Mist and T, so that it won't need to pass through the B and O nodes. This seems to be the simplest way to achieve low latency, and makes sense from an architectural standpoint, but may have pitfalls. The sequence diagrams below describe the current transcoding process and the changes needed to implement direct streaming.

#### Current implementation

Interactions in red will need to be revised for direct streaming.

#### Direct streaming

Interactions in green are new logic.

#### Specific updates

**Mist**

**go-livepeer**

On a higher level, changes to B-O-T are minor and boil down to the following:

#### Security

**Segment signing**

Segments won't be signed directly anymore. Instead, a 'transcoding ticket' containing secure URLs will be signed (no changes to the code, just no video bytes in the metadata objects passed through gRPC); a sketch appears at the end of this comment. Potentially, the ETH private keys of MB and T can be used for TLS.

**Verification**

Verification remains intact, except segments and renditions are downloaded from different URLs.

**Failure modes**

The transcoding request from Mist to B will stay synchronous; therefore, all fail-overs, which are already built in, will work as expected:

Mist should stop serving the segment in case of any error reported by B.

#### Unaddressed questions, potential concerns and improvements (in order of priority)

#### Conclusion

With this proposal, we may achieve very low latency fairly easily and make the overall architecture simpler, even deleting a large amount of code from go-livepeer. Still, there are some concerns regarding a) protocol-level security and b) locking ourselves into less flexible video streaming protocols and excluding the B and O nodes from direct video delivery. By using non-application-level protocols M<>T, we may be shifting future work to the C/C++ parts of the pipeline, which may be less desirable, as go-livepeer generally allows development at a higher level, because of the language, blockchain friendliness, and semantics directly related to the service we are creating. On the other hand, it's highly likely (from what is known about how major streaming services function, even taking Mist as an example) that we'll eventually arrive at a direct streaming architecture when scaling the service up, even if the current low latency milestone is implemented in the go-livepeer B-O-T protocol, as was proposed originally.
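A sketch of the 'transcoding ticket' signing described under Security: sign a small metadata object holding the secure URLs instead of the video bytes. The struct shape and helper names are illustrative; go-livepeer would use its ETH (secp256k1) keys, while the standard library is used here for brevity.

```go
package security

import (
	"crypto/ecdsa"
	"crypto/rand"
	"crypto/sha256"
	"encoding/json"
)

// TranscodingTicket is an illustrative shape for the signed metadata object.
type TranscodingTicket struct {
	SegmentURL    string   `json:"segmentUrl"`    // where T fetches the source
	RenditionURLs []string `json:"renditionUrls"` // where T uploads results
	Seq           uint64   `json:"seq"`
}

// signTicket hashes the serialized ticket and signs the digest.
func signTicket(t *TranscodingTicket, key *ecdsa.PrivateKey) ([]byte, error) {
	payload, err := json.Marshal(t)
	if err != nil {
		return nil, err
	}
	digest := sha256.Sum256(payload)
	return ecdsa.SignASN1(rand.Reader, key, digest[:])
}
```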
-
### Abstract
'Low latency' means sending transcoded packets back T->O->B as they are ready, without accumulating them into segments.
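Schematically, the difference is one loop — `transcodedPackets` and `send` below are placeholders, not real go-livepeer names:

```go
// Today: accumulate output into a segment, then send it back.
var segment bytes.Buffer
for pkt := range transcodedPackets {
	segment.Write(pkt) // held back until the whole segment is done
}
send(segment.Bytes())

// Low latency: forward each packet T->O->B the moment it is ready.
for pkt := range transcodedPackets {
	send(pkt)
}
```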
### Motivation

It's a common requirement for live streams to have low latency, comparable to the ping time between producer and consumers. It would be essential if Livepeer is to serve as a backend for a Zoom-like service.
### Current implementation
### Points that will need to be addressed in the low latency implementation