All about the frame loop
Recently, I published Virtual reality comes to the web, an article that introduced basic concepts behind the WebXR Device API. I also provided instructions for requesting, entering, and ending an XR session.
This article describes the frame loop, which is a user-agent controlled infinite loop in which content is repeatedly drawn to the screen. Content is drawn in discrete blocks called frames. The succession of frames creates the illusion of movement.
What this article is not
WebGL and WebGL2 are the only means of rendering content during a frame loop in a WebXR app. Fortunately, many frameworks provide a layer of abstraction on top of WebGL and WebGL2. Such frameworks include three.js, babylonjs, and PlayCanvas, while A-Frame and React 360 were designed for interacting with WebXR.
This article is neither a WebGL nor a framework tutorial. It explains basics of a frame loop using the Immersive Web Working Group's Immersive VR Session sample (demo, source). If you want to dive into WebGL or one of the frameworks, the internet provides a growing list of articles.
The players and the game
When trying to understand the frame loop, I kept getting lost in the details. There are a lot of objects in play, and some of them are only named by reference properties on other objects. To help you keep it straight, I'll describe the objects, which I'm calling 'players'. Then I'll describe how they interact, which I'm calling 'the game'.
The players
XRViewerPose
A pose is the position and orientation of something in 3D space. Both viewers
and input devices have a pose, but it's the viewer's pose we're concerned with
here. Both viewer and input device poses have a transform attribute describing
their position as a vector and their orientation as a quaternion relative to the
origin. The origin is specified based on the requested reference space type when
calling XRSession.requestReferenceSpace().
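To make the vector and quaternion concrete, here is a minimal sketch that derives the direction the viewer is facing from a pose's transform. It assumes the WebXR convention that an unrotated viewer looks down the negative Z axis. The name viewerForward is a hypothetical helper for illustration, not part of the API.

```javascript
// Hypothetical helper: returns the unit vector the viewer is facing.
// `pose` is anything shaped like an XRViewerPose, that is, an object whose
// transform.orientation is a unit quaternion with x, y, z, and w members.
function viewerForward(pose) {
  const { x, y, z, w } = pose.transform.orientation;
  // Rotate the default forward vector (0, 0, -1) by the quaternion. The
  // result is the negated third column of the equivalent rotation matrix.
  return {
    x: -2 * (x * z + w * y),
    y: -2 * (y * z - w * x),
    z: -(1 - 2 * (x * x + y * y)),
  };
}
```

Inside a frame callback this could be called as viewerForward(xrViewerPose) once the pose is known to be non-null.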
Reference spaces take a bit to explain. I cover them in depth in Augmented
reality. The sample I'm using as the basis for this article uses a 'local'
reference space, which means the origin is at the viewer's position at the time
of session creation without a well-defined floor, and its precise position may
vary by platform.
XRView
A view corresponds to a camera viewing the virtual scene. A view also has a
transform attribute describing its position as a vector and its orientation.
These are provided both as a vector/quaternion pair and as an equivalent matrix;
you can use either representation depending on which best fits your code. Each
view corresponds to a display or a portion of a display used by a device to
present imagery to the viewer. XRView objects are returned in an array from the
XRViewerPose object. The number of views in the array varies. On mobile devices
an AR scene has one view, which may or may not cover the device screen. Headsets
typically have two views, one for each eye.
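As a rough illustration of the two representations, the sketch below reads each view's position once from the vector and once from the matrix. It assumes the matrix is the 16-element, column-major array exposed by the transform, so the translation sits at indices 12 to 14. The name summarizeViews is a hypothetical helper.

```javascript
// Hypothetical helper: lists the views in a viewer pose, however many exist.
// `xrViewerPose` is anything shaped like an XRViewerPose.
function summarizeViews(xrViewerPose) {
  return xrViewerPose.views.map((xrView) => {
    const { position, matrix } = xrView.transform;
    return {
      // 'left', 'right', or 'none' (for example, a single phone view).
      eye: xrView.eye,
      // Position read from the vector/quaternion pair.
      fromVector: [position.x, position.y, position.z],
      // The same position read from the column-major 4x4 matrix.
      fromMatrix: [matrix[12], matrix[13], matrix[14]],
    };
  });
}
```

Because the function maps over the array, the same code handles a one-view phone session and a two-view headset session.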
XRWebGLLayer
Layers provide a source of bitmap images and descriptions of how those images
are to be rendered in the device. This description doesn't quite capture what
this player does. I've come to think of it as a middleman between a device and a
WebGLRenderingContext. MDN takes much the same view, stating that it 'provides
a linkage' between the two. As such, it provides access to the other players.
In general, WebGL objects store state information for rendering 2D and 3D graphics.
WebGLFramebuffer
A framebuffer provides image data to the WebGLRenderingContext. After
retrieving it from the XRWebGLLayer, you simply pass it to the current
WebGLRenderingContext. Other than calling bindFramebuffer() (more about that
later) you will never access this object directly. You will merely pass it from
the XRWebGLLayer to the WebGLRenderingContext.
XRViewport
A viewport provides the coordinates and dimensions of a rectangular region in
the WebGLFramebuffer.
WebGLRenderingContext
A rendering context is a programmatic access point for a canvas (the space we're
drawing on). To do this, it needs both a WebGLFramebuffer and an XRViewport.
Notice the relationship between XRWebGLLayer and WebGLRenderingContext. One
corresponds to the viewer's device and the other corresponds to the web page.
WebGLFramebuffer and XRViewport are passed from the former to the latter.
The game
Now that we know who the players are, let's look at the game they play. It's a game that starts over with every frame. Recall that frames are part of a frame loop that happens at a rate that depends on the underlying hardware. For VR applications the frames per second can be anywhere from 60 to 144. AR for Android runs at 30 frames per second. Your code should not assume any particular frame rate.
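Because the frame rate varies by device, animation is usually driven by elapsed time rather than by a fixed step per frame. The following is a minimal sketch, assuming the first argument passed to the frame callback (named hrTime in this article's samples) is a timestamp in milliseconds. The names createDeltaTimer and advanceAngle are hypothetical helpers.

```javascript
// Hypothetical helper: returns a function that converts successive frame
// timestamps (in milliseconds) into the seconds elapsed since the last frame.
function createDeltaTimer() {
  let previous = null;
  return function deltaSeconds(hrTime) {
    // The first frame has no predecessor, so it reports zero elapsed time.
    const delta = previous === null ? 0 : (hrTime - previous) / 1000;
    previous = hrTime;
    return delta;
  };
}

// Example use: rotate at a fixed speed regardless of the frame rate.
function advanceAngle(angle, radiansPerSecond, deltaSeconds) {
  return angle + radiansPerSecond * deltaSeconds;
}
```

With this approach an object spinning at one radian per second covers the same angle in one second whether the device delivers 30 frames or 144.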
The basic process for the frame loop looks like this:
1. Call XRSession.requestAnimationFrame(). In response, the user agent invokes the XRFrameRequestCallback, which is defined by you.
2. Inside your callback function:
   a. Call XRSession.requestAnimationFrame() again.
   b. Get the viewer's pose.
   c. Pass ('bind') the WebGLFramebuffer from the XRWebGLLayer to the WebGLRenderingContext.
   d. Iterate over each XRView object, retrieving its XRViewport from the XRWebGLLayer and passing it to the WebGLRenderingContext.
   e. Draw something to the framebuffer.
Because steps 1 and 2a were covered in the previous article, I'll start at step 2b.
Get the viewer's pose
It probably goes without saying, but to draw anything in AR or VR, I need to
know where the viewer is and where they're looking. The viewer's position and
orientation are provided by an XRViewerPose object. I get the viewer's pose by
calling XRFrame.getViewerPose() on the current animation frame, passing it the
reference space I acquired when I set up the session. The values returned by
this object are always relative to that reference space.
function onXRFrame(hrTime, xrFrame) {
  let xrSession = xrFrame.session;
  xrSession.requestAnimationFrame(onXRFrame);
  let xrViewerPose = xrFrame.getViewerPose(xrRefSpace);
  if (xrViewerPose) {
    // Render based on the pose.
  }
}
There's one viewer pose that represents the user's overall position, meaning
either the viewer's head or the phone camera in the case of a smartphone.
The pose tells your application where the viewer is. Actual image rendering uses
XRView objects, which I'll get to in a bit.
Before moving on, I test whether the viewer pose was returned in case the system loses tracking or blocks the pose for privacy reasons. Tracking is the XR device's ability to know where it and/or its input devices are relative to the environment. Tracking can be lost in several ways, which vary depending on the method used for tracking. For example, if cameras on the headset or phone are used for tracking, the device may lose its ability to determine where it is in situations with low or no light, or if the cameras are covered.
An example of blocking the pose for privacy reasons: if the headset is showing
a security dialog such as a permission prompt, the browser may stop providing
poses to the application while this is happening. But I've already called
XRSession.requestAnimationFrame() so that if the system can recover, the frame
loop will continue. If not, the user agent will end the session and call the
end event handler.
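If an application wants to react when poses stop arriving, for example by showing a 'tracking lost' message once instead of on every frame, it can watch for the transition between a pose and no pose. The following is a minimal sketch. The name createPoseTracker and its two callbacks are hypothetical and not part of WebXR.

```javascript
// Hypothetical helper: reports when viewer poses stop and resume.
// onLost runs on the first frame without a pose; onRecovered runs on the
// first frame where a pose is available again.
function createPoseTracker(onLost, onRecovered) {
  let tracking = true;
  return function update(xrViewerPose) {
    const hasPose = Boolean(xrViewerPose);
    if (tracking && !hasPose) onLost();
    if (!tracking && hasPose) onRecovered();
    tracking = hasPose;
    // The return value tells the caller whether it is safe to render.
    return hasPose;
  };
}
```

Inside the frame callback, the result of getViewerPose() would be passed to the returned function, and rendering would proceed only when it returns true. The frame loop keeps running either way because requestAnimationFrame() has already been called.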
A short detour
The next step requires objects created during session setup. Recall that I
created a canvas and instructed it to create an XR-compatible WebGL rendering
context, which I got by calling canvas.getContext(). All drawing is done using
the WebGL API, the WebGL2 API, or a WebGL-based framework such as Three.js. This
context was passed to the session object via updateRenderState(), along with a
new instance of XRWebGLLayer.
let canvas = document.createElement('canvas');
// The rendering context must be based on WebGL or WebGL2
let webGLRenContext = canvas.getContext('webgl', { xrCompatible: true });
xrSession.updateRenderState({
  baseLayer: new XRWebGLLayer(xrSession, webGLRenContext)
});
Pass ('bind') the WebGLFramebuffer
The XRWebGLLayer provides a framebuffer for the WebGLRenderingContext. This
framebuffer is provided specifically for use with WebXR and replaces the
rendering context's default framebuffer. Passing it to the context is called
'binding' in the language of WebGL.
function onXRFrame(hrTime, xrFrame) {
  let xrSession = xrFrame.session;
  xrSession.requestAnimationFrame(onXRFrame);
  let xrViewerPose = xrFrame.getViewerPose(xrRefSpace);
  if (xrViewerPose) {
    let glLayer = xrSession.renderState.baseLayer;
    webGLRenContext.bindFramebuffer(webGLRenContext.FRAMEBUFFER, glLayer.framebuffer);
    // Iterate over the views
  }
}
Iterate over each XRView object
After getting the pose and binding the framebuffer, it's time to get the
viewports. The XRViewerPose contains an array of XRView objects, each of which
represents a display or a portion of a display. They contain information that's
needed to render content that's correctly positioned for the device and the
viewer, such as the field of view, eye offset, and other optical properties.
Since I'm drawing for two eyes, I have two views, which I loop through, drawing
a separate image for each.
When implementing for phone-based augmented reality, I would have only one view but I'd still use a loop. Though it may seem pointless to iterate through one view, doing so allows you to have a single rendering path for a spectrum of immersive experiences. This is an important difference between WebXR and other immersive systems.
function onXRFrame(hrTime, xrFrame) {
  let xrSession = xrFrame.session;
  xrSession.requestAnimationFrame(onXRFrame);
  let xrViewerPose = xrFrame.getViewerPose(xrRefSpace);
  if (xrViewerPose) {
    let glLayer = xrSession.renderState.baseLayer;
    webGLRenContext.bindFramebuffer(webGLRenContext.FRAMEBUFFER, glLayer.framebuffer);
    for (let xrView of xrViewerPose.views) {
      // Pass viewports to the context
    }
  }
}
Pass the XRViewport object to the WebGLRenderingContext
An XRView object refers to what's observable on a screen. But to draw to that
view I need coordinates and dimensions that are specific to my device. As with
the framebuffer, I request them from the XRWebGLLayer and pass them to the
WebGLRenderingContext.
function onXRFrame(hrTime, xrFrame) {
  let xrSession = xrFrame.session;
  xrSession.requestAnimationFrame(onXRFrame);
  let xrViewerPose = xrFrame.getViewerPose(xrRefSpace);
  if (xrViewerPose) {
    let glLayer = xrSession.renderState.baseLayer;
    webGLRenContext.bindFramebuffer(webGLRenContext.FRAMEBUFFER, glLayer.framebuffer);
    for (let xrView of xrViewerPose.views) {
      let viewport = glLayer.getViewport(xrView);
      webGLRenContext.viewport(viewport.x, viewport.y, viewport.width, viewport.height);
      // Draw something to the framebuffer
    }
  }
}
The webGLRenContext
In writing this article I had a debate with a few colleagues over the naming of
the webGLRenContext object. The sample scripts and most WebXR code simply call
this variable gl. When I was working to understand the samples, I kept
forgetting what gl referred to. I've called it webGLRenContext to remind you
while you're learning that this is an instance of WebGLRenderingContext.
The reason the samples use gl is that it allows method names to look like their
counterparts in the OpenGL ES 2.0 API, used for creating VR in compiled
languages. This fact is obvious if you've written VR apps using OpenGL, but
confusing if you're completely new to this technology.
Draw something to the framebuffer
If you're feeling really ambitious, you can use WebGL directly, but I don't recommend that. It's much simpler to use one of the frameworks listed at the top.
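For a sense of what the smallest possible direct WebGL drawing step could look like, the sketch below fills each view's region with a solid color. It is an illustration under stated assumptions, not the sample's rendering code. The name clearView and the EYE_COLORS table are hypothetical, and the scissor test is used because a WebGL clear is limited by the scissor rectangle rather than by the viewport.

```javascript
// Hypothetical colors, one per possible value of XRView.eye.
const EYE_COLORS = {
  left: [0.2, 0.0, 0.0, 1.0],
  right: [0.0, 0.0, 0.2, 1.0],
  none: [0.0, 0.2, 0.0, 1.0],
};

// Hypothetical helper: fills one view's region of the bound framebuffer.
// `gl` is a WebGLRenderingContext, `viewport` is the XRViewport for the
// view, and `xrView` is the XRView being drawn.
function clearView(gl, viewport, xrView) {
  const [r, g, b, a] = EYE_COLORS[xrView.eye] || EYE_COLORS.none;
  // Restrict the clear to this view's rectangle.
  gl.enable(gl.SCISSOR_TEST);
  gl.scissor(viewport.x, viewport.y, viewport.width, viewport.height);
  gl.clearColor(r, g, b, a);
  gl.clear(gl.COLOR_BUFFER_BIT);
  gl.disable(gl.SCISSOR_TEST);
}
```

In the frame loop this would be called inside the loop over the views, after the viewport has been passed to the context. Drawing real geometry would additionally use each view's projection matrix and transform, which is the part the frameworks handle for you.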
Conclusion
This is not the end of WebXR updates or articles. You can find a reference for all of WebXR's interfaces and members at MDN. For upcoming enhancements to the interfaces themselves, follow individual features on Chrome Status.