Insertable streams for MediaStreamTrack

The content of a MediaStreamTrack is exposed as a stream that can be manipulated or used to generate new content.

Insertable streams for MediaStreamTrack is part of the capabilities project and is currently in development. This post will be updated as the implementation progresses.

Background #

In the context of the Media Capture and Streams API, the MediaStreamTrack interface represents a single media track within a stream. Typically, these are audio or video tracks, but other track types may exist. MediaStream objects consist of zero or more MediaStreamTrack objects, representing various audio or video tracks. Each MediaStreamTrack may have one or more channels. A channel represents the smallest unit of a media stream, such as the audio signal associated with a given speaker, like left or right in a stereo audio track.

What is insertable streams for MediaStreamTrack? #

The core idea behind insertable streams for MediaStreamTrack is to expose the content of a MediaStreamTrack as a collection of streams (as defined by the WHATWG Streams API). These streams can be manipulated to introduce new components.

Granting developers direct access to the video (or audio) stream allows them to modify the stream itself. In contrast, performing the same video manipulation with traditional methods requires intermediaries such as <canvas> elements. (For details of this type of process, see, for example, video + canvas = magic.)

Use cases #

Use cases for insertable streams for MediaStreamTrack include, but are not limited to:

  • Video conferencing gadgets like "funny hats" or virtual backgrounds.
  • Voice processing like software vocoders.

Current status #

Step                                        Status
1. Create explainer                         Complete
2. Create initial draft of specification    In progress
3. Gather feedback & iterate on design      In progress
4. Origin trial                             In progress
5. Launch                                   Not started

How to use insertable streams for MediaStreamTrack #

Enabling support during the origin trial phase #

Starting in Chrome 90, insertable streams for MediaStreamTrack is available as part of the WebCodecs origin trial. The origin trial is expected to end in Chrome 91 (July 14, 2021). If necessary, a separate origin trial will continue for insertable streams for MediaStreamTrack.

Origin trials allow you to try new features and give feedback on their usability, practicality, and effectiveness to the web standards community. For more information, see the Origin Trials Guide for Web Developers. To sign up for this or another origin trial, visit the registration page.

Register for the origin trial #

  1. Request a token for your origin.
  2. Add the token to your pages. There are two ways to do that:
    • Add an origin-trial <meta> tag to the head of each page. For example, this may look something like:
      <meta http-equiv="origin-trial" content="TOKEN_GOES_HERE">
    • If you can configure your server, you can also add the token using an Origin-Trial HTTP header. The resulting response header should look something like:
      Origin-Trial: TOKEN_GOES_HERE
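
As a rough sketch of the header option, assuming a server-side runtime that uses fetch-style Request and Response objects, a handler could attach the token as shown here. The handleRequest name, the ORIGIN_TRIAL_TOKEN constant, and the page markup are placeholders for illustration and are not part of the origin trial mechanism itself.

```javascript
// Placeholder: replace with the token issued for your origin.
const ORIGIN_TRIAL_TOKEN = 'TOKEN_GOES_HERE';

// Hypothetical request handler that serves a page with the Origin-Trial header set.
// The request argument is unused here; a real handler would route on its URL.
function handleRequest(request) {
  const html = '<!doctype html><title>Origin trial test page</title>';
  return new Response(html, {
    headers: {
      'Content-Type': 'text/html; charset=utf-8',
      'Origin-Trial': ORIGIN_TRIAL_TOKEN,
    },
  });
}
```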

Enabling via chrome://flags #

To experiment with insertable streams for MediaStreamTrack locally, without an origin trial token, enable the #enable-experimental-web-platform-features flag in chrome://flags.

Feature detection #

You can feature-detect insertable streams for MediaStreamTrack support as follows.

if ('MediaStreamTrackProcessor' in window && 'MediaStreamTrackGenerator' in window) {
  // Insertable streams for `MediaStreamTrack` is supported.
}
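
The check above references window, which throws a ReferenceError in contexts where window is not defined. A slightly more defensive variant, offered only as a sketch, looks the constructors up on globalThis instead. The supportsInsertableStreams name and its scope parameter are assumptions made for this example.

```javascript
// Returns true when both insertable streams constructors exist on the given scope.
// `scope` defaults to globalThis, which is `window` on a regular page.
function supportsInsertableStreams(scope = globalThis) {
  return (
    typeof scope.MediaStreamTrackProcessor === 'function' &&
    typeof scope.MediaStreamTrackGenerator === 'function'
  );
}
```

Passing the scope as a parameter also makes the check easy to exercise in tests by handing it a stand-in object.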

Core concepts #

Insertable streams for MediaStreamTrack builds on concepts previously proposed by WebCodecs and conceptually splits the MediaStreamTrack into two components:

  • The MediaStreamTrackProcessor, which consumes a MediaStreamTrack object's source and generates a stream of media frames, specifically VideoFrame or AudioFrame objects. You can think of this as a track sink that is capable of exposing the unencoded frames from the track as a ReadableStream. It also exposes a control channel for signals going in the opposite direction.
  • The MediaStreamTrackGenerator, which consumes a stream of media frames and exposes a MediaStreamTrack interface. It can be provided to any sink, just like a track from getUserMedia(). It takes media frames as input. In addition, it provides access to control signals that are generated by the sink.

The MediaStreamTrackProcessor #

A MediaStreamTrackProcessor object exposes two properties:

  • readable: Allows reading the frames from the MediaStreamTrack. If the track is a video track, chunks read from readable will be VideoFrame objects. If the track is an audio track, chunks read from readable will be AudioFrame objects.
  • writableControl: Allows sending control signals to the track. Control signals are objects of type MediaStreamTrackSignal.
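
Because readable is a regular ReadableStream, frames can also be pulled from it one at a time with a reader. The following is a minimal sketch of that pattern. The inspectFrames helper, its limit parameter, and the onFrame callback are hypothetical names, and the only assumption about a frame is that it has a close() method.

```javascript
// Reads up to `limit` frames from a ReadableStream, hands each to `onFrame`,
// and closes every frame after use. Returns the number of frames read.
async function inspectFrames(readable, limit, onFrame) {
  const reader = readable.getReader();
  let count = 0;
  try {
    while (count < limit) {
      const { value: frame, done } = await reader.read();
      if (done) break;
      try {
        onFrame(frame);
      } finally {
        // Release the frame's media resources as soon as it is no longer needed.
        frame.close();
      }
      count++;
    }
  } finally {
    // Signal that no further frames are wanted.
    await reader.cancel();
  }
  return count;
}
```

With a processor in hand, a call might look like inspectFrames(trackProcessor.readable, 10, (frame) => console.log(frame.timestamp)).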

The MediaStreamTrackGenerator #

A MediaStreamTrackGenerator object likewise exposes two properties:

  • writable: A WritableStream that allows writing media frames to the MediaStreamTrackGenerator, which is itself a MediaStreamTrack. If the kind attribute is "audio", the stream accepts AudioFrame objects and fails with any other type. If kind is "video", the stream accepts VideoFrame objects and fails with any other type. When a frame is written to writable, the frame's close() method is automatically invoked, so that its media resources are no longer accessible from JavaScript.
  • readableControl: A ReadableStream that allows reading control signals sent from any sinks connected to the MediaStreamTrackGenerator. Control signals are objects of type MediaStreamTrackSignal.
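
Writing to writable follows the usual WritableStream writer pattern. The sketch below shows one way to do this while respecting backpressure. The writeFrames helper is a hypothetical name, and the frames are assumed to come from some application-defined source.

```javascript
// Writes frames one at a time, waiting for the sink to signal readiness
// so that backpressure from the WritableStream is respected.
// The stream is left open so that more frames can be written later.
async function writeFrames(writable, frames) {
  const writer = writable.getWriter();
  try {
    for (const frame of frames) {
      await writer.ready;
      await writer.write(frame);
    }
  } finally {
    // Unlock the stream so that other code can obtain a writer or pipe into it.
    writer.releaseLock();
  }
}
```

Since a frame written to a MediaStreamTrackGenerator is closed automatically, as described above, the caller should not reuse frames after passing them to this helper.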

In the MediaStream model, apart from media, which flows from sources to sinks, there are also control signals that flow in the opposite direction (i.e., from sinks to sources via the track). A MediaStreamTrackProcessor is a sink and it allows sending control signals to its track and source via its writableControl property. A MediaStreamTrackGenerator is a track for which a custom source can be implemented by writing media frames to its writable field. Such a source can receive control signals sent by sinks via its readableControl property.
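
To observe the control signals that travel from sinks back to the source, one option is to route them through an identity TransformStream that reports each signal before forwarding it. This is only a sketch. The createSignalLogger name and its onSignal callback are assumptions, and the signals are treated as opaque objects without relying on any particular fields.

```javascript
// Creates an identity TransformStream that reports each control signal
// to `onSignal` and then forwards it unchanged.
function createSignalLogger(onSignal = console.log) {
  return new TransformStream({
    transform(signal, controller) {
      onSignal(signal);
      controller.enqueue(signal);
    },
  });
}
```

Given a processor and a generator, the logger could be placed between the two control streams with trackGenerator.readableControl.pipeThrough(createSignalLogger()).pipeTo(trackProcessor.writableControl).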

Bringing it all together #

The core idea is to create a processing chain as follows:

Platform Track → Processor → Transform → Generator → Platform Sinks

For a barcode scanner application, this chain would look as in the code sample below.

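const stream = await navigator.mediaDevices.getUserMedia({ video: true });
const videoTrack = stream.getVideoTracks()[0];

const trackProcessor = new MediaStreamTrackProcessor({ track: videoTrack });
const trackGenerator = new MediaStreamTrackGenerator({ kind: 'video' });

const transformer = new TransformStream({
  async transform(videoFrame, controller) {
    // detectBarcodes() and highlightBarcodes() are application-defined helpers.
    const barcodes = await detectBarcodes(videoFrame);
    const newFrame = highlightBarcodes(videoFrame, barcodes);
    // Release the original frame once the new one has been created.
    videoFrame.close();
    controller.enqueue(newFrame);
  },
});

// Media flows from the processor through the transform into the generator.
trackProcessor.readable.pipeThrough(transformer).pipeTo(trackGenerator.writable);

// Control signals flow in the opposite direction, from the generator back to the processor.
trackGenerator.readableControl.pipeTo(trackProcessor.writableControl);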
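
In a longer-running application it can help to guarantee that input frames are closed even when processing fails, and to have a way to stop the chain. The following sketch shows one possible arrangement using only standard Streams and AbortController features. The createFrameTransformer and runPipeline names are hypothetical, and processFrame stands for any application-defined function that returns a new frame rather than the frame it was given.

```javascript
// Wraps a per-frame function in a TransformStream that always closes the
// input frame, even if processing throws. `processFrame` must return a new
// frame (or a promise for one), not the input frame itself.
function createFrameTransformer(processFrame) {
  return new TransformStream({
    async transform(frame, controller) {
      try {
        controller.enqueue(await processFrame(frame));
      } finally {
        frame.close();
      }
    },
  });
}

// Connects readable -> transformer -> writable and returns a promise for
// completion together with a function that stops the pipeline.
function runPipeline(readable, transformer, writable) {
  const abortController = new AbortController();
  const finished = readable
    .pipeThrough(transformer)
    .pipeTo(writable, { signal: abortController.signal })
    .catch((error) => {
      // Aborting rejects the pipe promise with an AbortError; treat it as a normal stop.
      if (error.name !== 'AbortError') throw error;
    });
  return { finished, stop: () => abortController.abort() };
}
```

With the objects from the sample above, this could be wired up as runPipeline(trackProcessor.readable, createFrameTransformer(processFrame), trackGenerator.writable), calling stop() when the user leaves the page or turns the effect off.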

This article barely scratches the surface of what is possible, and the details are beyond its scope. For more examples, see the extended video processing demo and the audio processing demo. You can find the source code for both demos on GitHub.

Demo #

You can see the QR code scanner demo from the section above in action on a desktop or mobile browser. Hold a QR code in front of the camera and the app will detect it and highlight it. You can see the application's source code on Glitch.

QR code scanner running in desktop browser tab showing a detected and highlighted QR code on the phone the user holds in front of the laptop's camera.

Security and Privacy considerations #

The security of this API relies on existing mechanisms in the web platform. As data is exposed using the VideoFrame and AudioFrame interfaces, the rules of those interfaces to deal with origin-tainted data apply. For example, data from cross-origin resources cannot be accessed due to existing restrictions on accessing such resources (e.g., it is not possible to access the pixels of a cross-origin image or video element). In addition, access to media data from cameras, microphones, or screens is subject to user authorization. The media data this API exposes is already available through other APIs. In addition to the media data, this API exposes some control signals such as requests for new frames. These signals are intended as hints and do not pose a significant security risk.

Feedback #

The Chromium team wants to hear about your experiences with insertable streams for MediaStreamTrack.

Tell us about the API design #

Is there something about the API that does not work like you expected? Or are there missing methods or properties that you need to implement your idea? Do you have a question or comment on the security model? File a spec issue on the corresponding GitHub repo, or add your thoughts to an existing issue.

Report a problem with the implementation #

Did you find a bug with Chromium's implementation? Or is the implementation different from the spec? File a bug at new.crbug.com. Be sure to include as much detail as you can, provide simple instructions for reproducing, and enter Blink>MediaStream in the Components box. Glitch works great for sharing quick and easy repros.

Show support for the API #

Are you planning to use insertable streams for MediaStreamTrack? Your public support helps the Chromium team prioritize features and shows other browser vendors how critical it is to support them.

Send a tweet to @ChromiumDev using the hashtag #InsertableStreams and let us know where and how you are using it.

Acknowledgements #

The insertable streams for MediaStreamTrack spec was written by Harald Alvestrand and Guido Urdaneta. This article was reviewed by Harald Alvestrand, Joe Medley, Ben Wagner, Huib Kleinhout, and François Beaufort. Hero image by Chris Montgomery on Unsplash.
