Replies: 4 comments
-
This is codec dependent. E.g., VP9 has a concept called superframes, with multiple frames in one chunk; see section 5.26 in the spec. AFAIK there isn't a similar concept for H.264. In the codec registry we say:

> Generally not for decoders, but encoders may use the one on

It's a sanity check -- many hardware decoders don't properly support other types. Chrome only checks the first frame.

You're right that there are reconstruction mechanisms other than IDR, but per the registry, a chunk with `type` `key` must be an IDR frame.
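To illustrate the superframe concept mentioned above: a VP9 superframe carries an index in its last bytes, bracketed by a marker byte of the form `0b110xxxxx` that encodes the frame count and the width of the size fields. A minimal sketch (the function name is made up, and a real demuxer would validate more):

```javascript
// Sketch: read the sub-frame sizes from a VP9 superframe index, if present.
// The marker byte appears both at the start and the end of the index.
function parseSuperframeIndex(buf) {
  const marker = buf[buf.length - 1];
  if ((marker & 0xe0) !== 0xc0) return null; // not a superframe
  const frames = (marker & 0x07) + 1;        // 1..8 sub-frames
  const sizeBytes = ((marker >> 3) & 0x03) + 1; // 1..4 bytes per size
  const indexSize = 2 + frames * sizeBytes;
  if (buf.length < indexSize || buf[buf.length - indexSize] !== marker) {
    return null; // leading marker must match the trailing one
  }
  // Each sub-frame size is little-endian.
  const sizes = [];
  let p = buf.length - indexSize + 1;
  for (let f = 0; f < frames; f++) {
    let size = 0;
    for (let b = 0; b < sizeBytes; b++) size |= buf[p++] << (8 * b);
    sizes.push(size);
  }
  return sizes;
}
```

A chunk whose last byte doesn't match the marker pattern is a single frame, which is why the H.264 case has no equivalent ambiguity.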
-
Thanks, I appreciate the answers and the links to the documents; it seems I had still missed quite a few. Regarding the keyframe, I read up on the spec a bit more ;). It seems that an I-frame with a "recovery point SEI message" (D.2.8 in the H.264 spec) should be allowed as a start of decoding as well. The streams I work with (from a camcorder) have I-frames with recovery point SEI messages every 12 frames, but IDR frames only once every 600-ish frames (in addition to the (out-of-spec) fact that they start with a non-IDR I-frame). For random access/seeking in a video player, having to decode up to 599 frames (worst case) would be a considerable lag for the user.
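For what it's worth, finding those candidate random-access points in an Annex B stream (IDR slices, plus SEI NAL units whose first message is a recovery point) can be sketched like this. The names are made up, and start-code emulation prevention and multi-message SEI NALs are ignored for brevity:

```javascript
// Locate NAL units by their 00 00 01 start codes and record the
// nal_unit_type from the header byte (low 5 bits).
function findNalUnits(buf) {
  const nals = [];
  for (let i = 0; i + 3 < buf.length; i++) {
    if (buf[i] === 0 && buf[i + 1] === 0 && buf[i + 2] === 1) {
      const offset = i + 3; // first byte after the start code = NAL header
      nals.push({ offset, type: buf[offset] & 0x1f });
      i += 2;
    }
  }
  return nals;
}

// SEI payloadType is coded as a run of 0xFF bytes plus one final byte;
// payloadType 6 is recovery_point (D.2.8).
function isRecoveryPointSei(buf, nalOffset) {
  let i = nalOffset + 1, payloadType = 0;
  while (buf[i] === 0xff) { payloadType += 255; i++; }
  payloadType += buf[i];
  return payloadType === 6;
}

// nal_unit_type 5 = IDR slice, 6 = SEI.
function randomAccessCandidates(buf) {
  return findNalUnits(buf).filter(
    (n) => n.type === 5 || (n.type === 6 && isRecoveryPointSei(buf, n.offset))
  );
}
```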
-
One clarification on
-
Dan said: "VP9 has a concept called super frames with multiple frames in it... AFAIK there isn't a similar concept for H.264/AVC. In the codec registry we say..."

[BA] VP9 and AV1 support spatial scalability, whereas H.264/AVC does not (only temporal). One concern I have is whether the WebCodecs and Encoded Transform specifications align on this issue. For example, RTCEncodedVideoFrameMetadata includes support for both
-
I have some questions about the `EncodedVideoChunk` that (afaict) are not addressed in the spec. Maybe I missed some additional information somewhere; if so, please point me in the right direction.

The specific case I'm referring to is decoding an (Annex B) H.264 stream.
**Should an `EncodedVideoChunk` always contain one frame?**

Given the `type` field (indicating whether it's a key frame) and the `timestamp` field, it feels like a chunk points to specifically one frame. In my experience (in Chrome 110), if you split your frame into two `EncodedVideoChunk`s and call `videoDecoder.decode()` twice, either the first call will get you a `VideoFrame` with half the data and the second call will fail, or the first call will fail (i.e. the decoder seems to expect that one call to `decode()` carries (at least) one full `VideoFrame` of data). Likewise, if you feed two full frames of data into one `EncodedVideoChunk`, the decoder seems to decode the first `VideoFrame` and ignore the second. Issue #38 suggests that the spec says it should always be one frame; I would be grateful if someone could point me to the right spot (so I can ask MDN to add it to their documentation).
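Until someone points me at the normative text, here is roughly how I pre-split an Annex B buffer so each chunk holds exactly one frame. This is a sketch with made-up names; it assumes one slice per frame (common for camcorder/webcam streams, but NOT guaranteed by H.264), and it attaches non-VCL NAL units (SPS/PPS/SEI) to the slice that follows them:

```javascript
// Split an Annex B buffer into per-frame chunks, one VCL NAL per frame.
function splitFrames(buf) {
  // Locate every NAL unit: [startOfStartCode, offsetOfNalHeaderByte].
  const bounds = [];
  for (let i = 0; i + 3 < buf.length; i++) {
    if (buf[i] === 0 && buf[i + 1] === 0 && buf[i + 2] === 1) {
      // Fold the leading zero of a 4-byte start code into the unit.
      bounds.push([i > 0 && buf[i - 1] === 0 ? i - 1 : i, i + 3]);
      i += 2;
    }
  }

  const frames = [];
  let auStart = 0;
  for (let k = 0; k < bounds.length; k++) {
    const nalType = buf[bounds[k][1]] & 0x1f; // low 5 bits of NAL header
    if (nalType === 1 || nalType === 5) {
      // A VCL NAL ends the access unit (one-slice-per-frame assumption).
      const end = k + 1 < bounds.length ? bounds[k + 1][0] : buf.length;
      frames.push({ data: buf.subarray(auStart, end), key: nalType === 5 });
      auStart = end;
    }
  }
  return frames;
}
```

Each entry can then be wrapped as `new EncodedVideoChunk({ type: f.key ? 'key' : 'delta', timestamp, data: f.data })` -- at least, that's my reading of the API.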
**Does the provided `timestamp` do anything?**

In my experience (as also described in #565), the `timestamp` field seems to be unused while decoding. Is there actually a use case where the timestamp is useful in this context? (Otherwise, wouldn't it make more sense to make the field optional for decoding -- or to document that it can contain anything?)
**Does the provided `type` do anything (useful)?**

In my experience (in Chrome 110), the `type` field seems to be completely ignored. As far as I can tell this is not entirely per spec (`VideoDecoder.decode()` states "If chunk.type is not key, throw a DataError."), but that is what it does. It feels to me that `type` is not really doing anything useful in this context, since the decoder has to check anyway whether the content is actually a key frame. It does make the load on the webapp higher, because I have to manually parse my bytestream to see whether the next frame is a keyframe or not.

**(Finally, related but not entirely on the subject of `EncodedVideoChunk`:) what is a key chunk anyway?**
The spec says: "An encoded chunk that does not depend on any other frames for decoding". This sounds like an I-frame to me.
However, my Chrome implementation seems to demand an IDR frame, which (afaik) is an I-frame with the additional property that no subsequent frame will refer to any frame preceding it; I could also see that making sense.
Is this something the spec (or people here) has an opinion on?
(I ask because I have some camcorder files whose first IDR frame appears only after about 500 frames, including some 20 I-frames, and VLC plays those first 500 frames fine. I don't know the H.264 spec well enough to tell whether this is per spec or a violation.)
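In case it helps the discussion: telling an IDR apart from a non-IDR I-frame takes the NAL unit type plus the `slice_type` field from the slice header (read with Exp-Golomb coding, clause 9.1 of the H.264 spec). A minimal sketch with made-up names; emulation-prevention bytes are ignored, which is usually harmless for the first few header bits but not strictly correct:

```javascript
// Bit reader over a byte buffer, MSB first.
class BitReader {
  constructor(buf, byteOffset) { this.buf = buf; this.pos = byteOffset * 8; }
  bit() {
    if (this.pos >> 3 >= this.buf.length) throw new RangeError('out of bits');
    const b = (this.buf[this.pos >> 3] >> (7 - (this.pos & 7))) & 1;
    this.pos++;
    return b;
  }
  ue() { // unsigned Exp-Golomb: count leading zeros, then read that many bits
    let zeros = 0;
    while (this.bit() === 0) zeros++;
    let v = 1;
    for (let i = 0; i < zeros; i++) v = (v << 1) | this.bit();
    return v - 1;
  }
}

// nal_unit_type 5 is always an IDR slice; for type 1, slice_type 2 or 7
// (i.e. slice_type % 5 === 2) marks an I slice that is NOT an IDR.
function classifySlice(buf, nalHeaderOffset) {
  const nalType = buf[nalHeaderOffset] & 0x1f;
  if (nalType === 5) return 'IDR';
  if (nalType !== 1) return 'not a slice';
  const r = new BitReader(buf, nalHeaderOffset + 1);
  r.ue(); // first_mb_in_slice (unused here)
  const sliceType = r.ue();
  return sliceType % 5 === 2 ? 'I (non-IDR)' : 'P/B';
}
```

So a stream like mine has many `'I (non-IDR)'` slices between rare `'IDR'` ones, which is exactly the distinction the spec's "key chunk" wording leaves ambiguous.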
Finally, thanks for all the work on this, I do appreciate your efforts! I hope this place is the correct spot to put these questions; if you prefer to have them as issues or something else, let me know!