
API for containers? #24

Closed
pthatcherg opened this issue Sep 18, 2019 · 29 comments
Labels
extension Interface changes that extend without breaking. won't fix No changes

Comments

@pthatcherg
Contributor

It comes up as a common question: can we have an API for media containers? It's something that can be done in JS/wasm and is arguably orthogonal to WebCodecs. But for some formats that you might consider video (GIF, (M)JPEG), the line between container and codec is blurry.

This is a tracking issue for a conversation around this topic. My current opinion is to leave it out of WebCodecs until it's more mature and then perhaps readdress it later.
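To make the container/codec distinction above a little more concrete: even identifying a container is a separate step from decoding. A few magic bytes are enough to sniff the common container formats a demuxer would accept. (A toy sketch only; the function name and format list are illustrative, not part of any proposal.)

```javascript
// Illustrative sketch: sniff a container format from its leading bytes.
function sniffContainer(bytes) {
  const startsWith = (sig, offset = 0) =>
    sig.every((b, i) => bytes[offset + i] === b);
  if (startsWith([0x1a, 0x45, 0xdf, 0xa3])) return "webm/mkv"; // EBML header
  if (startsWith([0x4f, 0x67, 0x67, 0x53])) return "ogg";      // "OggS"
  if (startsWith([0x66, 0x74, 0x79, 0x70], 4)) return "mp4";   // "ftyp" box after 4-byte size
  if (startsWith([0x47, 0x49, 0x46, 0x38])) return "gif";      // "GIF8"
  return "unknown";
}
```

WebM and Matroska share the same EBML signature, which is one reason the two are usually handled by the same demuxer.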

@pthatcherg pthatcherg added the maybe Ideas that might be in scope, and worth discussing label Sep 18, 2019
@steveanton
Contributor

The concern I have with containers is that it exposes a lot of API and specification surface but does not unlock anything new that can't already be done (pretty efficiently even) today.

Following the principles of the Extensible Web Manifesto (https://extensiblewebmanifesto.org/), we should focus on delivering low-level codecs first. Only after that's done (or in parallel, by a different group) should we consider tackling containers.

@pthatcherg
Contributor Author

I agree completely. That's a great way to put it.

@guest271314
Contributor

Key use-cases

  • Non-realtime encoding/decoding/transcoding, such as for local file editing
  • Decoding and encoding images
  • Re-encoding multiple input media streams in order to merge them into one encoded media stream

and potentially

  • Live stream uploading

each could be considered an API that, at least in part, performs some form of file editing. A file could be considered a "container" when technically compared to

  • Extremely low latency live streaming (<3s delay)
  • Cloud gaming
  • Advanced Real-time Communications:
    -- e2e encryption
    -- control over buffer behavior
    -- spatial and temporal scalability

and potentially

  • Live stream uploading

where all of the above is encompassed within the use case of recording media into a container, whether that "container" is an array of images with an index element for "metadata" (width, height, frame duration, or other adjustments made "mid-stream" or post-production; a "codec" or instruction) written to a .json (or, if preferred, Matroska or WebM) "container", allowing both the entire output of WebCodecs and specific time slices to be downloaded as a single "container" (file structure).
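As a purely illustrative sketch of the "array of images with an index element" idea above (the field names here are hypothetical, not a proposed format), such a JSON "container" might look like:

```json
{
  "metadata": { "width": 640, "height": 480, "frameDurationMs": 33.3 },
  "frames": [
    { "index": 0, "data": "<base64-encoded image>" },
    { "index": 1, "data": "<base64-encoded image>" }
  ]
}
```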

Since the topic is at hand and the maintainers of this repository span a wide range of topics, it might be helpful to create a glossary that defines exactly what this repository means by each term it uses.

Non-goal

  • Direct APIs for media containers (muxers/demuxers)

indicates the technical proximity of "codecs" to "containers".

Whether or not WebCodecs includes the reading, writing, and editing of both "codecs" and "containers", that internal decision will not change the actual use case: a single API capable of creating, extending, and modifying both codecs and containers, avoiding the need to stitch together APIs from different specifications that were never designed to interoperate.

One recent reason not to omit "container" procedures from the scope alongside "codecs" is that the two are symbiotic. A single API designed and maintained with that in mind throughout has the potential to solve more than one existing issue, where separate pieces of code in the same domain (for example, media) can produce very different output because they reflect different authors' intent at the time the code was written and merged: time passes, and new technologies become new issues because implementers may not be in accord across the various branches of "media". A single API covering creation, editing, and production of media streams and files is the explicit use case.

@guest271314
Contributor

Example use case: write audio to a WebM file https://plnkr.co/edit/Inb676?p=preview. Ideally, from a front-end perspective, this single API should be able to encode VP8 and Opus to a file, or if Opus is missing from an existing file, write the audio to the file.

@pthatcherg pthatcherg added probably not Probably not in scope, but maybe worth tracking/discussing and removed maybe Ideas that might be in scope, and worth discussing labels Sep 18, 2019
@guest271314
Contributor

@steveanton

(or in parallel by a different group)

What is necessary to start such a group? Post the proposal at https://discourse.wicg.io? (Note, am not a member of W3C).

@guest271314
Contributor

@pthatcherg

It's something that can be done JS/wasm

While there does exist code which can take input images and output a WebM file (https://github.com/GoogleChromeLabs/webm-wasm; https://github.com/thenickdude/webm-writer-js), neither repository's author is interested in implementing writing audio to the same output file:

thenickdude/webm-writer-js#8 (comment)

I don't have any plans to work on adding audio, sorry, and I'm not sure where best to begin either (it probably depends on what format you can capture the audio in and what environment you expect to run in, Chrome, arbitrary browser, Electron, etc).

GoogleChromeLabs/webm-wasm#12

It seems this project doesn't support encoding audio+video yet, just video-only? Is this feasible, or would it be better to just use the more heavyweight ffmpeg.js project for this?

GoogleChromeLabs/webm-wasm#12 (comment)

Yeah, there’s no support for audio and I don’t have any plans to add it. This project was born out of the lackluster capabilities of MediaStreamRecorder.

ffmpeg.js is definitely one choice. But if you already have an encoded audio and video stream, an mkv muxer might do. That would be a lot faster and smaller. Hope this helps!

There is an implementation which is capable of writing audio as Opus to a WebM container https://github.com/kbumsik/opus-media-recorder/.

Meaning this repository can reach maturity without a corresponding container writer of comparable maturity being available to write the output of WebCodecs to a file.

Thus, simply because "JS/wasm" exists does not mean that implementations exist to meet the requirements described at Key use-cases, particularly

  • Non-realtime encoding/decoding/transcoding, such as for local file editing.


@padenot
Collaborator

padenot commented Oct 17, 2019

To write down what was said during TPAC, this might be very important to avoid the proliferation of badly muxed files. Muxing is rather hard to get right.

A possible solution might be a vouched library, as noted above, but there is always the problem of updating it for bug fixes.

@chrisn
Member

chrisn commented May 7, 2021

I've been following the general discussion around WebCodecs, and the need for a media container API seems to be recognised, but I thought I'd add my own use case as an example.

I maintain a library waveform-data.js that produces data for waveform visualisation from audio.

This uses Web Audio decodeAudioData() - but has the well-known problems: it runs on the main thread so UI updates stall during decoding, it requires the entire encoded audio to be held in memory, there's no indication of progress so I can't tell how long it will take to complete, and there's no way to cancel the decode.

For this use case, the simplest solution would be to allow decodeAudioData() to run from a worker context, with an extended API to allow progress notifications and cancellation.

WebCodecs also solves these issues, but introduces a new one. Because the library is generic, it will accept any audio format that decodeAudioData supports. So in order to use Web Codecs the library would have to include code to parse all the container formats, or define an API that moves container parsing to users of the library. Both options increase the amount of JavaScript that needs to be delivered, and unnecessarily so because parsing the container is a capability the browser already has. Also, leaving container parsing to library users would make the library much harder for people to use.
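For context, the post-decode computation behind such a visualisation is simple min/max downsampling of PCM samples; it's only the demux and decode steps that need the platform. A sketch of the idea (function and parameter names are illustrative, not the waveform-data.js API):

```javascript
// Illustrative sketch: compute [min, max] peaks per block of decoded PCM
// samples (e.g. a Float32Array from an AudioBuffer channel), for drawing
// a waveform at a given zoom level.
function computePeaks(samples, samplesPerPixel) {
  const peaks = [];
  for (let i = 0; i < samples.length; i += samplesPerPixel) {
    let min = Infinity, max = -Infinity;
    const end = Math.min(i + samplesPerPixel, samples.length);
    for (let j = i; j < end; j++) {
      if (samples[j] < min) min = samples[j];
      if (samples[j] > max) max = samples[j];
    }
    peaks.push([min, max]);
  }
  return peaks;
}
```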

@chcunningham
Collaborator

Triage note: marking 'extension', as this would clearly be a new API.

@chcunningham chcunningham added the extension Interface changes that extend without breaking. label May 12, 2021
@chcunningham
Collaborator

chcunningham commented May 17, 2021

@chrisn thanks for the use case.

I'm a little torn. I find the argument about existing demuxers to be persuasive, but less so on the muxing side. Browsers have long compiled-in full featured demuxers for <video> and MSE, but for muxing I think the only example is MediaRecorder, and the files it produces are pretty basic. For example, I don't think we currently ship a muxer that could produce a fragmented MP4.

We've found JS demuxing performance is quite good. Performance being equal, there are some advantages to JS, like rapid extensibility and perfect interoperability. My hope is that the download hit is largely amortized away by caching. WDYT?

But the JS answer rings a little hollow because the available libraries for this are pretty limited right now. If folks like the idea, we could organize a community / WG effort to build / centralize.

@dalecurtis
Contributor

I think containers are an entirely separate API from WebCodecs. The interfaces and processing model are likely entirely different from WebCodecs'. E.g., it's likely a streams-based API would work very well for containers. It would also need its own containers registry that describes per-container behavior.

IMHO, the options for solving this use case are:

  • Do nothing and let the JS ecosystem for this flourish.
  • Create an entirely new WebContainers API for muxing/demuxing in a new spec.
  • Extend MediaSourceExtensions for demuxing and MediaRecorder for muxing in their respective specs.
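To illustrate the "streams based API" intuition from the first paragraph: a demuxer naturally consumes arbitrary byte chunks and emits complete packets. A toy sketch for a made-up length-prefixed format (entirely hypothetical; this is not any proposed WebContainers shape):

```javascript
// Toy incremental "demuxer": accepts arbitrary byte chunks, buffers partial
// input, and emits complete packets. Each packet in this made-up format is a
// 1-byte payload length followed by that many payload bytes.
class PacketDemuxer {
  constructor(onPacket) {
    this.pending = new Uint8Array(0);
    this.onPacket = onPacket;
  }
  push(chunk) {
    // Append the new chunk to any leftover bytes from previous pushes.
    const merged = new Uint8Array(this.pending.length + chunk.length);
    merged.set(this.pending);
    merged.set(chunk, this.pending.length);
    this.pending = merged;
    // Emit every complete packet currently buffered.
    while (this.pending.length > 0 && this.pending.length >= 1 + this.pending[0]) {
      this.onPacket(this.pending.slice(1, 1 + this.pending[0]));
      this.pending = this.pending.slice(1 + this.pending[0]);
    }
  }
}
```

The same logic drops straight into a TransformStream `transform()` callback, which is why a streams-based shape fits containers so well: chunk boundaries from the network rarely align with packet boundaries in the file.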

@padenot
Collaborator

padenot commented May 19, 2021

I agree with both @dalecurtis and @chcunningham. Gecko is also running in-content-process WASM demuxers for security reasons (essentially, libogg compiled to WASM running in process), and confirms the findings of the link above. This has been shipped in release for a few versions without a single problem reported.

I prefer option 1 and 2 in @dalecurtis's comment, and this can be a gradual solution (1 then 2 if really needed).

Option 3 I like less; those objects are not at the same abstraction level, and MediaRecorder only supports real-time media (not offline processing), although Gecko implements a proprietary extension that allows encoding faster than real-time, which we only use for testing (not exposed to the web, of course).

@davedoesdev

Is there a list of container projects? I've written a WebM muxer but this doesn't seem like the right place to keep track of them.

@dalecurtis
Contributor

I ended up writing a quick explainer for the third bullet in #24 (comment) (Extending MediaRecorder for muxing):

https://github.com/dalecurtis/mediarecorder-muxer/blob/main/explainer.md

Have your thoughts in #24 (comment) changed at all now that WebCodecs is more fleshed out @padenot?

At least internally folks don't seem to hate it. It's only targeted towards simple use cases as a hedge against a more complete containers API (which we (Chromium) are unlikely to undertake anytime soon). It looks like a fairly small implementation delta. Is this interesting at all?

cc: @youennf

@guillaumebrunerie

Did anyone find a way to generate mp4 files client-side in Chrome? I tried all possible ways, but couldn't find one that works:

  • MediaRecorder only supports real-time encoding, and doesn't support mp4 in Chrome on Mac anyway (only mkv and webm).
  • WebCodecs doesn't give any way to save the encoded stream into an mp4 file.
  • I tried WebCodecs + mp4box-js, but it seems to require a pretty deep understanding of the mp4 format, boxes and such, which I don't have; I just want a "save to mp4" function.

I'm working on a 2D animation app in the browser, I can currently easily export animations as a sequence of frames but not being able to export them as an mp4 file is pretty limiting. The format needs to be mp4 as the videos are meant to be imported in other programs that unfortunately only support mp4.

The other options I have left are

  • Ask my users to use Safari instead (!), as MediaRecorder on Safari does support mp4. Unfortunately my users typically use Chrome, and there are a number of other things that are better supported in Chrome, like file system access and multiscreen support.
  • Use MediaRecorder with something like ffmpeg-wasm, but last time I checked it was around 25M which seems overkill for my app which is otherwise only around 0.5M (and it would only support real-time encoding anyway).
  • Ask my users to download the animation as a sequence of frames, and to then use QuickTime to turn it into a video. Not ideal, obviously.

I'm not sure if it should be in this or another specification, but it seems like a pretty important missing use case. If the reason for it not being included is because it can already be done in Javascript, please link to a library that can actually do it.

@dalecurtis
Contributor

FWIW, MP4 support for MediaRecorder is being worked on in Chrome. You can follow along here: https://bugs.chromium.org/p/chromium/issues/detail?id=1072056

What went wrong with mp4box.js exactly? https://github.com/gpac/mp4box.js/blob/master/test/qunit-iso-creation.js shows how to handle creation. I think the only thing you might need to tweak is the segment size.

@guillaumebrunerie

FWIW, MP4 support for MediaRecorder is being worked on in Chrome. You can follow along here: https://bugs.chromium.org/p/chromium/issues/detail?id=1072056

Great to hear, thanks for the link!

What went wrong with mp4box.js exactly? https://github.com/gpac/mp4box.js/blob/master/test/qunit-iso-creation.js shows how to handle creation. I think the only thing you might need to tweak is the segment size.

Actually I think it is mux.js that I have tried, not mp4box.js. It has a test file https://github.com/videojs/mux.js/blob/main/test/mp4-generator.test.js with a very promising name, but all the code there goes pretty deep into box types and things like that, so I could not manage to create a working mp4 file from that.
I did not know about this mp4box example, but I'll definitely give it another try, thank you!

@guillaumebrunerie

I managed to make MP4Box work with WebCodecs! See code below and a working example at https://codepen.io/Latcarf/pen/NWBmJVw.

The main thing I am still very confused about is the codec string. I couldn't find a single example of a valid H.264 codec string on MDN, and after some trial and error I settled on avc1.64003d (found somewhere online), which seems to mostly work, but I have very little understanding of what it means (even after trying to read everything I can find about profiles and levels). It also doesn't seem to always work; for instance, if you change the size of the video to 200×200, it fails with a rather cryptic DOMException: Encoding error. (without any further explanation).

It would be great if either there was for instance a catch-all codec string h264 (or avc1, or mp4) which would mean "Choose whatever avc1.xxxxxx codec string that you believe is most appropriate", or at the very least some examples on MDN, like "If you want H264 HD video choose this, if you want a small H264 video choose that". I guess the MediaRecorder API already chooses an appropriate codec string on behalf of the user based on the size of the canvas, so it would be great if WebCodecs could do the same.
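For reference, avc1 codec strings follow the pattern avc1.PPCCLL: the H.264 profile_idc, constraint flags, and level_idc, each as two hex digits. A small helper (illustrative only, not a platform API) decodes the string used below:

```javascript
// Illustrative helper: decode an "avc1.PPCCLL" codec string into H.264
// profile, constraint flags, and level. Assumes the common 6-hex-digit form.
function parseAvc1(codec) {
  const m = /^avc1\.([0-9a-f]{2})([0-9a-f]{2})([0-9a-f]{2})$/i.exec(codec);
  if (!m) throw new Error("not an avc1.PPCCLL codec string");
  const profileNames = { 66: "Baseline", 77: "Main", 88: "Extended", 100: "High" };
  const profileIdc = parseInt(m[1], 16);
  return {
    profile: profileNames[profileIdc] ?? `profile_idc ${profileIdc}`,
    constraintFlags: parseInt(m[2], 16),
    level: parseInt(m[3], 16) / 10, // e.g. 0x3d = 61 -> Level 6.1
  };
}
```

So avc1.64003d means High profile, no constraint flags, Level 6.1, while something like avc1.42001f would decode as Baseline profile, Level 3.1 (a common lower-complexity choice).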

It also seems like we cannot create the track upfront because it needs the metadata.decoderConfig.description (which I have no idea what it contains). That’s not a big issue, but it is a bit hard to guess.

Here is my function doing the encoding:

const encodeFramesToMP4 = async ({width, height, fps, frames, renderFrame}) => {
	const f = MP4Box.createFile();
	let track = null;
	const frameDuration = 1_000_000 / fps; // microseconds per frame

	const encoder = new VideoEncoder({
		output: (chunk, metadata) => {
			// The track can only be created once the encoder has emitted the
			// decoder configuration (the avcC record) alongside the first chunk.
			if (track === null) {
				track = f.addTrack({
					timescale: 1_000_000,
					width,
					height,
					avcDecoderConfigRecord: metadata.decoderConfig?.description,
				});
			}

			const buffer = new ArrayBuffer(chunk.byteLength);
			chunk.copyTo(buffer);
			f.addSample(track, buffer, {
				duration: frameDuration,
				is_sync: chunk.type === "key", // mark keyframes so seeking works
			});
		},
		error: (error) => {
			throw error;
		}
	});
	encoder.configure({
		codec: "avc1.64003d",
		width,
		height,
	});

	for (let i = 0; i < frames; i++) {
		const frame = new VideoFrame(
			renderFrame(i),
			{timestamp: i * frameDuration},
		);
		encoder.encode(frame);
		frame.close(); // release the frame's memory promptly
	}
	await encoder.flush();
	encoder.close();
	return f;
}

And here is how it is used. We simply create an OffscreenCanvas and provide a function that can draw a given frame.

const renderExampleVideo = async () => {
	const width = 600;
	const height = 600;
	const canvas = new OffscreenCanvas(width, height);

	const file = await encodeFramesToMP4({
		width,
		height,
		fps: 30,
		frames: 30, // duration in frames
		renderFrame: i => {
			// ...
			// draw frame #i on the canvas
			// ...
			return canvas
		}
	})
	file.save("Example.mp4");
}

Feel free to let me know if there is any issue in my code or anything that could be improved.

@dalecurtis
Contributor

Thanks for sharing! Codec strings can be pretty annoying. Here are some good references if you haven't seen them:
https://developer.mozilla.org/en-US/docs/Web/Media/Formats/codecs_parameter
https://cconcolato.github.io/media-mime-support/

@KevinBoeing
Copy link

I am currently working on a video editor that runs entirely in the Chrome browser. I am currently struggling with the demuxing and muxing process. Mp4Box.js works great, but unfortunately it only allows .mp4 containers to be demuxed. Is there already a demuxer that allows you to demux any container type? I was thinking of ffmpeg.wasm (I saw clipchamp uses it too), but since it's a CLI tool, I have no idea how to use it as an all-in-one demuxer in javascript. Is it even possible? The end result of the demuxing process should be to have all the EncodedVideoChunks in one array.

@dalecurtis
Contributor

For all containers you'd definitely need something like ffmpeg.wasm; #549 shows how this might work. Even if browsers had a containers API, it'd likely only support the formats they already parse (mp4, webm, ogg, etc.).

@KevinBoeing

Thats exactly what I needed. Thanks!

@bartadaniel

bartadaniel commented Jun 21, 2023

I also work on a fully in-browser video editing experience. While I made it work with ffmpeg+webcodecs, I can see massive value in adding something like the WebContainers API. Maybe my angle is a little different here; I had edge cases where I had to do workarounds and fixups on the demuxed streams before I could feed them to the VideoDecoder. Some examples:

  • I had to handle negative presentation timestamps in streams. The negative timestamp is not a problem for the VideoDecoder, but it results in the VideoFrame having a different timestamp than what I gave when calling decode.
  • For some videos, the extradata I receive from ffmpeg is missing some information that is available somewhere in the stream I guess. Without that, the VideoDecoder fails.

Obviously, these things are out of the scope of WebCodecs. But I assume all these touches are already written somewhere in the major browsers because those problematic videos play just fine in Chrome. If I could have access to the video and audio stream in a way that the browser thinks it's appropriate for decoding, that would be a game-changer.
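For the negative-timestamp case, one common workaround (a sketch of the general idea, not necessarily what was done here) is to offset all sample timestamps so the earliest becomes zero before handing them to the decoder:

```javascript
// Illustrative sketch: shift all sample timestamps so the earliest is zero,
// preserving relative spacing. Avoids feeding negative presentation
// timestamps into a decoder that round-trips them into VideoFrame.timestamp.
function normalizeTimestamps(samples) {
  if (samples.length === 0) return [];
  const base = Math.min(...samples.map((s) => s.timestamp));
  return samples.map((s) => ({ ...s, timestamp: s.timestamp - base }));
}
```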

@dalecurtis
Contributor

Maybe unsurprisingly, edge cases are one of the reasons we wouldn't want to do this. The argument being that containers are all edge cases, and an external library meets those needs best. It's likely that if we did undertake this, the API would be limited to very common muxing and demuxing scenarios, e.g. playback with seeking and basic recording scenarios.

IIRC, decoded timestamps should just pass through per spec: https://w3c.github.io/webcodecs/#output-videoframes -- If you're not seeing that, please file an issue with the respective UA. https://crbug.com/new for Chromium.

@aboba
Collaborator

aboba commented May 2, 2024

Can we close this issue?

@padenot
Collaborator

padenot commented May 2, 2024

I think so.

@chrisn
Member

chrisn commented May 2, 2024

I think so too, but worth keeping track of developer interest, so have created w3c/media-and-entertainment#108 - anyone still interested is welcome to comment there.

@aboba aboba added won't fix No changes and removed probably not Probably not in scope, but maybe worth tracking/discussing labels May 5, 2024
@aboba aboba closed this as completed May 5, 2024
@ForeverSc

Recently I designed a WASM demuxer package specifically for WebCodecs. Compared to ffmpeg.wasm it is much smaller, and compared to mp4box.js it supports more formats, such as mkv, webm, and flv. I hope it can help people who need it! https://github.com/ForeverSc/web-demuxer
