Positioning virtual objects in real-world views

The Hit Test API lets you position virtual items in a real-world view.

Joe Medley

The WebXR Device API shipped last fall in Chrome 79. As stated then, Chrome's implementation of the API is a work in progress. Chrome is happy to announce that some of the work is finished. In Chrome 81, two new features have arrived: augmented reality session types and the WebXR Hit Test API.

This article covers the WebXR Hit Test API, a means of placing virtual objects in a real-world camera view.

In this article I assume you already know how to create an augmented reality session and that you know how to run a frame loop. If you're not familiar with these concepts, you should read the earlier articles in this series.

The immersive AR session sample

The code in this article is based on, but not identical to, that found in the Immersive Web Working Group's Hit Test sample (demo, source). This example lets you place virtual sunflowers on surfaces in the real world.

When you first open the app, you'll see a blue circle with a dot in the middle. The dot marks where an imaginary line extending from your device intersects the environment. It moves as you move the device. As it finds intersection points, it appears to snap to surfaces such as floors, table tops, and walls. It only appears to snap because hit testing provides the position and orientation of the intersection point, but nothing about the surfaces themselves.

This circle is called a reticle, a temporary image that aids in placing an object in augmented reality. If you tap the screen, a sunflower is placed on the surface at the location and orientation of the reticle, regardless of where you tapped. The reticle continues to move with your device.

A reticle rendered on a wall.
The reticle is a temporary image that aids in placing an object in augmented reality.

Create the reticle

You must create the reticle image yourself since it is provided by neither the browser nor the API. The method of loading and drawing it is framework-specific. If you're not drawing it directly using WebGL or WebGL2, consult your framework documentation. For this reason, I won't go into detail about how the reticle is drawn in the sample. Below I show one line of it for one reason only: so that in later code samples, you'll know what I'm referring to when I use the reticle variable.

let reticle = new Gltf2Node({url: 'media/gltf/reticle/reticle.gltf'});
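
The Gltf2Node above comes from the sample's own scene-graph library. If your app used three.js instead, the equivalent might look roughly like the sketch below. This is an illustration only, assuming three.js, its GLTFLoader addon, and an existing scene object; it is not the sample's code.

// The import path varies by three.js version.
import { GLTFLoader } from 'three/addons/loaders/GLTFLoader.js';

const loader = new GLTFLoader();
let reticle = null;

// Load the reticle model and keep it hidden until a hit test succeeds.
loader.load('media/gltf/reticle/reticle.gltf', (gltf) => {
  reticle = gltf.scene;
  reticle.visible = false;
  scene.add(reticle); // assumes an existing three.js scene
});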

Request a session

When requesting a session, you must request 'hit-test' in the requiredFeatures array as shown below.

navigator.xr.requestSession('immersive-ar', {
  requiredFeatures: ['local', 'hit-test']
})
.then((session) => {
  // Do something with the session
});
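
Because 'hit-test' is listed in requiredFeatures, the request rejects on devices that can't provide it. If you want to fail gracefully, you can first ask whether immersive AR is available at all. The sketch below uses only standard WebXR calls; the fallback behavior is up to you.

// Check for AR support before requesting a session.
if (navigator.xr) {
  navigator.xr.isSessionSupported('immersive-ar').then((supported) => {
    if (!supported) return;  // e.g., show a fallback UI instead
    navigator.xr.requestSession('immersive-ar', {
      requiredFeatures: ['local', 'hit-test']
    })
    .then(onSessionStarted)
    .catch((err) => {
      // The device supports AR, but not a required feature.
      console.warn('Could not create an AR session:', err);
    });
  });
}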

Entering a session

In previous articles, I've presented code for entering an XR session. A version of that code is shown below with some additions. First, I've added a select event listener. When the user taps the screen, a flower will be placed in the camera view based on the pose of the reticle. I'll describe that event listener later.

function onSessionStarted(xrSession) {
  xrSession.addEventListener('end', onSessionEnded);
  xrSession.addEventListener('select', onSelect);

  let canvas = document.createElement('canvas');
  gl = canvas.getContext('webgl', { xrCompatible: true });

  xrSession.updateRenderState({
    baseLayer: new XRWebGLLayer(xrSession, gl)
  });

  xrSession.requestReferenceSpace('viewer').then((refSpace) => {
    xrViewerSpace = refSpace;
    xrSession.requestHitTestSource({ space: xrViewerSpace })
    .then((hitTestSource) => {
      xrHitTestSource = hitTestSource;
    });
  });

  xrSession.requestReferenceSpace('local').then((refSpace) => {
    xrRefSpace = refSpace;
    xrSession.requestAnimationFrame(onXRFrame);
  });
}
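
For reference, the code above assumes several module-scope variables, plus an onSessionEnded() handler registered for the end event. A minimal version of both follows; XRHitTestSource has a cancel() method that releases it when the session ends.

// Module-scope state shared between session setup and the frame loop.
let gl = null;
let xrViewerSpace = null;
let xrHitTestSource = null;
let xrRefSpace = null;

function onSessionEnded(event) {
  // Release the hit test source when the session ends.
  if (xrHitTestSource) {
    xrHitTestSource.cancel();
    xrHitTestSource = null;
  }
}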

Multiple reference spaces

Notice that onSessionStarted() calls XRSession.requestReferenceSpace() twice. I initially found this confusing. Why doesn't the hit test code request an animation frame (starting the frame loop)? And why doesn't the frame loop seem to involve hit tests? The source of the confusion was a misunderstanding of reference spaces. Reference spaces express relationships between an origin and the world.

To understand what this code is doing, pretend that you're viewing this sample with a standalone rig that has both a headset and a controller. To measure distances from the controller, you would use a controller-centered frame of reference. But to draw something to the screen, you would use user-centered coordinates.

In this sample, the viewer and the controller are the same device. But I have a problem. What I draw must be stable with regard to the environment, but the 'controller' I'm drawing with is moving.

For image drawing, I use the local reference space, which gives me stability in terms of the environment. After getting this I start the frame loop by calling requestAnimationFrame().

For hit testing, I use the viewer reference space, which is based on the device's pose at the time of the hit test. The label 'viewer' is somewhat confusing in this context because I'm talking about a controller. It makes sense if you think of the controller as an electronic viewer. After getting this, I call xrSession.requestHitTestSource(), which creates the source of hit test data that I'll use when drawing.
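
By default, the hit test source casts a ray straight out from the supplied space. The options passed to requestHitTestSource() also accept an optional offsetRay (an XRRay) for casting from somewhere other than dead center. The sketch below just makes the default explicit; the sample doesn't need it.

xrSession.requestHitTestSource({
  space: xrViewerSpace,
  // Optional: an XRRay can offset or re-aim the cast. The default ray
  // starts at the space's origin and points along its -Z axis.
  offsetRay: new XRRay()
}).then((hitTestSource) => {
  xrHitTestSource = hitTestSource;
});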

Running a frame loop

The requestAnimationFrame() callback also gets new code to handle hit testing.

As you move your device, the reticle needs to move with it as it tries to find surfaces. To create the illusion of movement, redraw the reticle in every frame. But don't show the reticle if the hit test fails. So, for the reticle I created earlier, I set its visible property to false.

function onXRFrame(hrTime, xrFrame) {
  let xrSession = xrFrame.session;
  xrSession.requestAnimationFrame(onXRFrame);
  let xrViewerPose = xrFrame.getViewerPose(xrRefSpace);

  reticle.visible = false;

  // Reminder: the hitTestSource was acquired during onSessionStarted()
  if (xrHitTestSource && xrViewerPose) {
    let hitTestResults = xrFrame.getHitTestResults(xrHitTestSource);
    if (hitTestResults.length > 0) {
      let pose = hitTestResults[0].getPose(xrRefSpace);
      reticle.visible = true;
      reticle.matrix = pose.transform.matrix;
    }
  }

  // Draw to the screen
}

To draw anything in AR, I need to know where the viewer is and where they're looking. So I first test that xrHitTestSource and xrViewerPose are still valid.

if (xrHitTestSource && xrViewerPose) {
  // ...
}

Now I call getHitTestResults(). It takes the hitTestSource as an argument and returns an array of XRHitTestResult instances. The hit test may find multiple surfaces. The first one in the array is the one closest to the camera. Most of the time you'll use it alone, but the full array is returned for advanced use cases. For example, imagine your camera is pointed at a box sitting on a table on a floor. It's possible that the hit test will return all three surfaces in the array. In most cases, the box is the one I care about. If the length of the returned array is 0, in other words, if no hit was found, skip the rest and try again in the next frame.

let hitTestResults = xrFrame.getHitTestResults(xrHitTestSource);
if (hitTestResults.length > 0) {
  // ...
}
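
If you ever need those other surfaces, you can walk the whole array. The loop below is an illustration, not part of the sample; it logs the position of every hit, closest first.

for (const result of hitTestResults) {
  const pose = result.getPose(xrRefSpace);
  if (pose) {
    const p = pose.transform.position;
    console.log('Hit at', p.x.toFixed(2), p.y.toFixed(2), p.z.toFixed(2));
  }
}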

Finally, I need to process the hit test results. The basic process is this: get a pose from the hit test result, transform (move) the reticle image to the hit test position, then set its visible property to true. The pose represents the pose of a point on a surface.

let pose = hitTestResults[0].getPose(xrRefSpace);
reticle.matrix = pose.transform.matrix;
reticle.visible = true;
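
The pose's transform.matrix is a 16-element Float32Array in column-major order, which the sample's scene graph consumes directly. If your framework wants a position and rotation instead, XRRigidTransform exposes those too:

// XRRigidTransform also provides the components directly.
const position = pose.transform.position;     // DOMPointReadOnly (x, y, z, w)
const rotation = pose.transform.orientation;  // unit quaternion (x, y, z, w)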

Placing an object

An object is placed in AR when the user taps the screen. I already added a select event handler to the session. (See above.)

The important thing in this step is knowing where to place it. Since the moving reticle gives you a constant source of hit tests, the simplest way to place an object is to draw it at the location of the reticle at the last hit test.

function onSelect(event) {
  if (reticle.visible) {
    // The reticle should already be positioned at the latest hit point,
    // so we can just use its matrix to save an unnecessary call to
    // event.frame.getHitTestResults.
    addARObjectAt(reticle.matrix);
  }
}
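
The sample's addARObjectAt() isn't shown in this article. As a rough sketch of what such a function might do, assuming the same scene-graph library as the reticle code, that its nodes support clone(), and a hypothetical sunflower model URL:

// Hypothetical sketch: place a clone of a preloaded model at a given pose.
let flower = new Gltf2Node({url: 'media/gltf/sunflower/sunflower.gltf'});

function addARObjectAt(matrix) {
  let newFlower = flower.clone();
  newFlower.visible = true;
  newFlower.matrix = matrix;
  scene.addNode(newFlower);  // assumes an existing scene node
}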

Conclusion

The best way to get a handle on this is to step through the sample code or try out the codelab. I hope I've given you enough background to make sense of both.

We're not done building immersive web APIs, not by a long shot. We'll publish new articles here as we make progress.

Photo by Daniel Frank on Unsplash