Virtual reality comes to the web, part II

All about the frame loop

Joe Medley

Recently, I published Virtual reality comes to the web, an article that introduced basic concepts behind the WebXR Device API. I also provided instructions for requesting, entering, and ending an XR session.

This article describes the frame loop, which is a user-agent controlled infinite loop in which content is repeatedly drawn to the screen. Content is drawn in discrete blocks called frames. The succession of frames creates the illusion of movement.

What this article is not

WebGL and WebGL2 are the only means of rendering content during a frame loop in a WebXR App. Fortunately many frameworks provide a layer of abstraction on top of WebGL and WebGL2. Such frameworks include three.js, babylonjs, and PlayCanvas, while A-Frame and React 360 was designed for interacting with WebXR.

This article explains basics of a frame loop, using the Immersive Web Working Group's Immersive VR Session sample (demo, source). If you want to dive into WebGL or one of the frameworks, there's a growing list of resources online.

The players and the game

When trying to understand the frame loop, I kept getting lost in the details. There's a lot of objects in play, and some of them are only named by reference properties on other objects. To help you keep it straight, I'll describe the objects, which I'm calling 'players'. Then I'll describe how they interact, which I'm calling 'the game'.

The players

XRViewerPose

A pose is the position and orientation of something in 3D space. Both viewers and input devices have a pose, but it's the viewer's pose we're concerned with here. Both viewer and input device poses have a transform attribute describing its position as a vector and its orientation as a quaternion relative to the origin. The origin is specified based on the requested reference space type when calling XRSession.requestReferenceSpace().

Reference spaces take a bit to explain. I cover them in depth in Augmented reality. The sample I'm using as the basis for this article uses a 'local' reference space which means the origin is at the viewer's position at the time of session creation without a well-defined floor, and its precise position may vary by platform.

XRView

A view corresponds to a camera viewing the virtual scene. A view also has a transform attribute describing it's position as a vector and its orientation. These are provided both as a vector/quaternion pair and as an equivalent matrix, you can use either representation depending on which best fits your code. Each view corresponds to a display or a portion of a display used by a device to present imagery to the viewer. XRView objects are returned in an array from the XRViewerPose object. The number of views in the array varies. On mobile devices an AR scene has one view, which may or may not cover the device screen. Headsets typically have two views, one for each eye.

XRWebGLLayer

Layers provide a source of bitmap images and descriptions of how those images are to be rendered in the device. This description doesn't quite capture what this player does. I've come to think of it as a middleman between a device and a WebGLRenderingContext. MDN takes much the same view, stating that it 'provides a linkage' between the two. As such, it provides access to the other players.

In general, WebGL objects store state information for rendering 2D and 3D graphics.

WebGLFramebuffer

A framebuffer provides image data to the WebGLRenderingContext. After retrieving it from the XRWebGLLayer, you pass it to the current WebGLRenderingContext. Other than calling bindFramebuffer() (more about that later) you will never access this object directly. You will merely pass it from the XRWebGLLayer to the WebGLRenderingContext.

XRViewport

A viewport provides the coordinates and dimensions of a rectangular region in the WebGLFramebuffer.

WebGLRenderingContext

A rendering context is a programmatic access point for a canvas (the space we're drawing on). To do this, it needs both a WebGLFramebuffer and an XRViewport.

Notice the relationship between XRWebGLLayer and WebGLRenderingContext. One corresponds to the viewer's device and the other corresponds to the web page. WebGLFramebuffer and XRViewport are passed from the former to the latter.

The game

Now that we know who the players are, let's look at the game they play. It's a game that starts over with every frame. Recall that frames are part of a frame loop that happens at a rate that depends on the underlying hardware. For VR applications the frames per second can be anywhere from 60 to 144. AR for Android runs at 30 frames per second. Your code should not assume any particular frame rate.

The basic process for the frame loop looks like this:

Call XRSession.requestAnimationFrame(). In response, the user agent invokes the XRFrameRequestCallback, which is defined by you.
Inside your callback function:
1. Call XRSession.requestAnimationFrame() again.
2. Get the viewer's pose.
3. Pass ('bind') the WebGLFramebuffer from the XRWebGLLayer to the WebGLRenderingContext.
4. Iterate over each XRView object, retrieving its XRViewport from the XRWebGLLayer and passing it to the WebGLRenderingContext.
5. Draw something to the framebuffer.

Because steps 1 and 2a were covered in the previous article, I'll start at step 2b.

Get the viewer's pose

It probably goes without saying. To draw anything in AR or VR, I need to know where the viewer is and where they're looking. The viewer's position and orientation are provided by an XRViewerPose object. I get the viewer's pose by calling XRFrame.getViewerPose() on the current animation frame. I pass it the reference space I acquired when I set up the session. The values returned by this object are always relative to the reference space I requested when I entered the current session. As you may recall, I have to pass the current reference space when requesting the pose.

function onXRFrame(hrTime, xrFrame) {
  let xrSession = xrFrame.session;
  xrSession.requestAnimationFrame(onXRFrame);
  let xrViewerPose = xrFrame.getViewerPose(xrRefSpace);
  if (xrViewerPose) {
    // Render based on the pose.
  }
}

There's one viewer pose that represents the user's overall position, meaning either the viewer's head or the phone camera. The pose tells your application where the viewer is. Actual image rendering uses XRView objects, which I'll get to in a bit.

Before moving on, I test whether the viewer pose was returned in case the system loses tracking or blocks the pose for privacy reasons. Tracking is the XR device's ability to know where it and its input devices are relative to the environment. Tracking can be lost in several ways, and varies depending on the method used for tracking. For example, if cameras on the headset or phone are used for tracking the device may lose its ability to determine where it is in situations with low or no light, or if the cameras are covered.

An example of blocking the pose for privacy reasons is if the headset is showing a security dialog such as a permission prompt, the browser may stop providing poses to the application while this is happening. But I"ve already called XRSession.requestAnimationFrame() so that if the system can recover, the frame loop will continue. If not, the user agent will end the session and call the end event handler.

A short detour

The next step requires objects created during session set-up. Recall that I created a canvas and instructed it to create an XR-compatible Web GL rendering context, which I got by calling canvas.getContext(). All drawing is done using the WebGL API, the WebGL2 API, or a WebGL-based framework such as Three.js. This context was passed to the session object with updateRenderState(), in addition to a new instance of XRWebGLLayer.

let canvas = document.createElement('canvas');
// The rendering context must be based on WebGL or WebGL2
let webGLRenContext = canvas.getContext('webgl', { xrCompatible: true });
xrSession.updateRenderState({
    baseLayer: new XRWebGLLayer(xrSession, webGLRenContext)
  });

Pass ('bind') the WebGLFramebuffer

The XRWebGLLayer provides a framebuffer for the WebGLRenderingContext provided specifically for use with WebXR and replacing the rendering contexts default framebuffer. This is called 'binding' in the language of WebGL.

function onXRFrame(hrTime, xrFrame) {
  let xrSession = xrFrame.session;
  xrSession.requestAnimationFrame(onXRFrame);
  let xrViewerPose = xrFrame.getViewerPose(xrRefSpace);
  if (xrViewerPose) {
    let glLayer = xrSession.renderState.baseLayer;
    webGLRenContext.bindFramebuffer(webGLRenContext.FRAMEBUFFER, glLayer.framebuffer);
    // Iterate over the views
  }
}

Iterate over each XRView object

After getting the pose and binding the framebuffer, it's time to get the viewports. The XRViewerPose contains an array of XRView interfaces each of which represents a display or a portion of a display. They contain information that's needed to render content that's correctly positioned for the device and the viewer such as the field of view, eye offset, and other optical properties. Since I'm drawing for two eyes, I have two views, which I loop through and draw a separate image for each.

When implementing for phone-based augmented reality, I would have only one view but I'd still use a loop. Though it may seem pointless to iterate through one view, doing so lets you have a single rendering path for a spectrum of immersive experiences. This is an important difference between WebXR and other immersive systems.

function onXRFrame(hrTime, xrFrame) {
  let xrSession = xrFrame.session;
  xrSession.requestAnimationFrame(onXRFrame);
  let xrViewerPose = xrFrame.getViewerPose(xrRefSpace);
  if (xrViewerPose) {
    let glLayer = xrSession.renderState.baseLayer;
    webGLRenContext.bindFramebuffer(webGLRenContext.FRAMEBUFFER, glLayer.framebuffer);
    for (let xrView of xrViewerPose.views) {
      // Pass viewports to the context
    }
  }
}

Pass the XRViewport object to the WebGLRenderingContext

An XRView object refers to what's observable on a screen. But to draw to that view I need coordinates and dimensions that are specific to my device. As with the framebuffer, I request them from the XRWebGLLayer and pass them to the WebGLRenderingContext.

function onXRFrame(hrTime, xrFrame) {
  let xrSession = xrFrame.session;
  xrSession.requestAnimationFrame(onXRFrame);
  let xrViewerPose = xrFrame.getViewerPose(xrRefSpace);
  if (xrViewerPose) {
    let glLayer = xrSession.renderState.baseLayer;
    webGLRenContext.bindFramebuffer(webGLRenContext.FRAMEBUFFER, glLayer.framebuffer);
    for (let xrView of xrViewerPose.views) {
      let viewport = glLayer.getViewport(xrView);
      webGLRenContext.viewport(viewport.x, viewport.y, viewport.width, viewport.height);
      // Draw something to the framebuffer
    }
  }
}

The webGLRenContext

In writing, I had a debate with a few colleagues over the naming of the webGLRenContext object. The sample scripts and most WebXR code calls this variable gl. When I was working to understand the samples, I kept forgetting what gl referred to. I've called it webGLRenContext to remind you while your learning that this is an instance of WebGLRenderingContext.

The reason is that using gl allows method names to look like their counterparts in the OpenGL ES 2.0 API, used for creating VR in compiled languages. This fact is obvious if you've written VR apps using OpenGL, but confusing if you're completely new to this technology.

Draw something to the framebuffer

If you're feeling really ambitious, you can use WebGL directly, but I don't recommend that. It's much simpler to use one of the frameworks listed at the top.

Conclusion

This is not the end of WebXR updates or articles. You can find a reference for all of WebXR's interfaces and members at MDN. For upcoming enhancements to the interfaces themselves, follow individual features on Chrome Status.

Virtual reality comes to the web, part II Stay organized with collections Save and categorize content based on your preferences.