The Shape Detection API detects faces, barcodes, and text in images.
What is the Shape Detection API?
With APIs like navigator.mediaDevices.getUserMedia and the Chrome for Android photo picker, it has become fairly easy to capture images or live video data from device cameras, or to upload local images. So far, this dynamic image data, as well as static images on a page, has not been accessible by code, even though images may actually contain a lot of interesting features such as faces, barcodes, and text.
For example, in the past, if developers wanted to extract such features on the client to build a QR code reader, they had to rely on external JavaScript libraries. This could be expensive from a performance point of view and increase the overall page weight. On the other hand, operating systems including Android, iOS, and macOS, but also hardware chips found in camera modules, typically already have performant and highly optimized feature detectors such as the Android FaceDetector or the iOS generic feature detector, CIDetector.
The Shape Detection API exposes these implementations through a set of JavaScript interfaces. Currently, the supported features are face detection through the FaceDetector interface, barcode detection through the BarcodeDetector interface, and text detection (Optical Character Recognition, or OCR) through the TextDetector interface.
Suggested use cases
As outlined above, the Shape Detection API currently supports the detection of faces, barcodes, and text. The following bullet list contains examples of use cases for all three features.
Face detection
- Online social networking or photo sharing sites commonly let their users annotate people in images. By highlighting the boundaries of detected faces, this task can be facilitated.
- Content sites can dynamically crop images based on potentially detected faces rather than relying on other heuristics, or highlight detected faces with Ken Burns-like panning and zooming effects in story-like formats.
- Multimedia messaging sites can allow their users to overlay funny objects like sunglasses or mustaches on detected face landmarks.
Barcode detection
- Web applications that read QR codes can unlock interesting use cases like online payments or web navigation, or use barcodes for establishing social connections on messenger applications.
- Shopping apps can allow their users to scan EAN or UPC barcodes of items in a physical store to compare prices online.
- Airports can provide web kiosks where passengers can scan their boarding passes' Aztec codes to show personalized information related to their flights.
Text detection
- Online social networking sites can improve the accessibility of user-generated image content by adding detected texts as alt attributes for <img> tags when no other descriptions are provided.
- Content sites can use text detection to avoid placing headings on top of hero images with contained text.
- Web applications can use text detection to translate texts such as restaurant menus.
Current status
Step | Status
---|---
1. Create explainer | Complete
2. Create initial draft of specification | Complete
3. Gather feedback & iterate on design | In progress
4. Origin trial | Complete
5. Launch | Barcode detection: Complete; Face detection: In progress; Text detection: In progress
How to use the Shape Detection API
If you want to experiment with the Shape Detection API locally, enable the #enable-experimental-web-platform-features flag in about://flags.
The interfaces of all three detectors, FaceDetector, BarcodeDetector, and TextDetector, are similar. They all provide a single asynchronous method called detect() that takes an ImageBitmapSource as an input (that is, either a CanvasImageSource, a Blob, or ImageData).
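For illustration, here is a minimal sketch of handing an existing image on the page to a detector. The querySelector() lookup and the logging are placeholders for this example, not part of the API.

// A minimal sketch: any CanvasImageSource, Blob, or ImageData works as input.
// The <img> lookup and console output are illustrative placeholders.
const image = document.querySelector('img');
const detector = new BarcodeDetector();
try {
  const results = await detector.detect(image);
  console.log(results);
} catch (e) {
  console.error('Detection failed:', e);
}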
For FaceDetector and BarcodeDetector, optional parameters can be passed to the detector's constructor that allow for providing hints to the underlying detectors.
Please carefully check the support matrix in the explainer for an overview of the different platforms.
Working with the BarcodeDetector
The BarcodeDetector returns the raw barcode values it finds in the ImageBitmapSource, as well as the bounding boxes and other information such as the formats of the detected barcodes.
const barcodeDetector = new BarcodeDetector({
// (Optional) A series of barcode formats to search for.
// Not all formats may be supported on all platforms
formats: [
'aztec',
'code_128',
'code_39',
'code_93',
'codabar',
'data_matrix',
'ean_13',
'ean_8',
'itf',
'pdf417',
'qr_code',
'upc_a',
'upc_e'
]
});
try {
const barcodes = await barcodeDetector.detect(image);
barcodes.forEach(barcode => searchProductDatabase(barcode));
} catch (e) {
console.error('Barcode detection failed:', e);
}
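Each detected barcode carries rawValue, format, boundingBox, and cornerPoints, so a consumer like the searchProductDatabase() call above could look roughly like the following sketch. The function body, the format filter, and the /api/products endpoint are assumptions for this example.

// Illustrative sketch of a consumer for the detection results above.
// The format filter and the /api/products endpoint are made up.
function searchProductDatabase(barcode) {
  if (barcode.format === 'ean_13' || barcode.format === 'upc_a') {
    fetch(`/api/products?code=${encodeURIComponent(barcode.rawValue)}`)
      .then(response => response.json())
      .then(product => console.log('Found product:', product));
  }
}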
Working with the FaceDetector
The FaceDetector
always returns the bounding boxes of faces it detects in
the ImageBitmapSource
. Depending on the platform, more information
regarding face landmarks like eyes, nose, or mouth may be available.
It is important to note that this API only detects faces.
It does not identify who a face belongs to.
const faceDetector = new FaceDetector({
// (Optional) Hint to try and limit the amount of detected faces
// on the scene to this maximum number.
maxDetectedFaces: 5,
// (Optional) Hint to try and prioritize speed over accuracy
// by, e.g., operating on a reduced scale or looking for large features.
fastMode: false
});
try {
const faces = await faceDetector.detect(image);
faces.forEach(face => drawMustache(face));
} catch (e) {
console.error('Face detection failed:', e);
}
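What drawMustache() does is up to you. Here is a minimal sketch that outlines each detected face and checks for a mouth landmark; the overlay canvas and the actual drawing are assumptions for this example.

// Illustrative sketch: draw a box around each detected face on an overlay
// <canvas> that is assumed to match the image dimensions.
const canvas = document.querySelector('canvas');
const ctx = canvas.getContext('2d');

function drawMustache(face) {
  const { x, y, width, height } = face.boundingBox;
  ctx.strokeStyle = 'deeppink';
  ctx.lineWidth = 4;
  ctx.strokeRect(x, y, width, height);
  // Landmarks may be missing or empty depending on the platform.
  for (const landmark of face.landmarks ?? []) {
    if (landmark.type === 'mouth') {
      // Place the mustache just above the mouth; actual drawing omitted.
      console.log('Mouth at', landmark.locations);
    }
  }
}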
Working with the TextDetector
The TextDetector
always returns the bounding boxes of the detected texts,
and on some platforms the recognized characters.
const textDetector = new TextDetector();
try {
const texts = await textDetector.detect(image);
texts.forEach(text => textToSpeech(text));
} catch (e) {
console.error('Text detection failed:', e);
}
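Tying this back to the accessibility use case above, a sketch along the following lines could write the recognized text into an image's alt attribute. The helper name and the joining logic are assumptions, and rawValue may be empty on platforms that only report text locations.

// Illustrative sketch: use recognized text (where the platform provides it)
// as fallback alt text. The helper name addAltTextFromImage() is made up.
async function addAltTextFromImage(img) {
  const textDetector = new TextDetector();
  const texts = await textDetector.detect(img);
  const recognized = texts.map(text => text.rawValue).filter(Boolean).join(' ');
  if (recognized && !img.alt) {
    img.alt = recognized;
  }
}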
Feature detection
Purely checking for the existence of the constructors to feature detect the Shape Detection API doesn't suffice. The presence of an interface doesn't tell you whether the underlying platform supports the feature. This is working as intended. It's why we recommend a defensive programming approach by doing feature detection like this:
const supported = await (async () => 'FaceDetector' in window &&
  await new FaceDetector().detect(document.createElement('canvas'))
    .then(_ => true)
    .catch(e => e.name === 'NotSupportedError' ? false : true))();
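If you need the same check for more than one detector, you can generalize the pattern into a small helper. The name detectorIsSupported() is made up for this sketch.

// Sketch of a generic helper built on the same pattern;
// detectorIsSupported() is not part of the API.
async function detectorIsSupported(name) {
  if (!(name in window)) return false;
  try {
    await new window[name]().detect(document.createElement('canvas'));
    return true;
  } catch (e) {
    return e.name !== 'NotSupportedError';
  }
}

const faceDetectionSupported = await detectorIsSupported('FaceDetector');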
The BarcodeDetector interface has been updated to include a getSupportedFormats() method, and similar interfaces have been proposed for FaceDetector and TextDetector.
await BarcodeDetector.getSupportedFormats();
/* On a macOS computer logs
[
"aztec",
"code_128",
"code_39",
"code_93",
"data_matrix",
"ean_13",
"ean_8",
"itf",
"pdf417",
"qr_code",
"upc_e"
]
*/
This allows you to detect the specific feature you need, for example, QR code scanning:
if (('BarcodeDetector' in window) &&
((await BarcodeDetector.getSupportedFormats()).includes('qr_code'))) {
console.log('QR code scanning is supported.');
}
This is better than hiding the interfaces because even across platforms, capabilities may vary and so developers should be encouraged to check for precisely the capability (such as a particular barcode format or facial landmark) they require.
Operating system support
Barcode detection is available on macOS, ChromeOS, and Android. Google Play Services are required on Android.
Best practices
All detectors work asynchronously, that is, they do not block the main thread. So don't rely on real-time detection, but rather allow some time for the detector to do its work.
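For example, when scanning live camera frames, await the previous detect() call before scheduling the next one rather than detecting on every frame. The getUserMedia() setup and the barcode handling below are assumptions for this sketch.

// Sketch of a camera scanning loop that waits for each detection to finish
// before scheduling the next one. The <video> element and QR handling are
// illustrative assumptions.
const video = document.querySelector('video');
video.srcObject = await navigator.mediaDevices.getUserMedia({ video: true });
await video.play();

const barcodeDetector = new BarcodeDetector({ formats: ['qr_code'] });

async function scanLoop() {
  try {
    const barcodes = await barcodeDetector.detect(video);
    barcodes.forEach(barcode => console.log(barcode.rawValue));
  } catch (e) {
    console.error('Barcode detection failed:', e);
  }
  // Schedule the next scan only after the previous one has completed.
  requestAnimationFrame(scanLoop);
}
requestAnimationFrame(scanLoop);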
If you are a fan of Web Workers, you'll be happy to know that detectors are exposed there as well. Detection results are serializable and can thus be passed from the worker to the main app via postMessage(). The demo shows this in action.
Not all platform implementations support all features, so be sure to check the support situation carefully and use the API as a progressive enhancement. For example, some platforms might support face detection per se, but not face landmark detection (eyes, nose, mouth, etc.); or the existence and the location of text may be recognized, but not text contents.
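In practice that means checking what each result actually contains before using it. The drawing and output helpers in this sketch are placeholders for your own code.

// Minimal sketch of progressive enhancement on detection results.
// drawBoundingBox(), drawLandmarks(), highlightRegion(), and showTranscript()
// are placeholders, not part of the API.
async function describeImage(image) {
  const faces = await new FaceDetector().detect(image);
  for (const face of faces) {
    drawBoundingBox(face.boundingBox);           // Always available.
    if (face.landmarks && face.landmarks.length) {
      drawLandmarks(face.landmarks);             // Only on some platforms.
    }
  }
  const texts = await new TextDetector().detect(image);
  for (const text of texts) {
    highlightRegion(text.boundingBox);           // Always available.
    if (text.rawValue) {
      showTranscript(text.rawValue);             // Only on some platforms.
    }
  }
}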
Feedback
The Chrome team and the web standards community want to hear about your experiences with the Shape Detection API.
Tell us about the API design
Is there something about the API that doesn't work like you expected? Or are there missing methods or properties that you need to implement your idea? Have a question or comment on the security model?
- File a spec issue on the Shape Detection API GitHub repo, or add your thoughts to an existing issue.
Problem with the implementation?
Did you find a bug with Chrome's implementation? Or is the implementation different from the spec?
- File a bug at https://new.crbug.com. Be sure to include as much detail as you can, simple instructions for reproducing, and set Components to Blink>ImageCapture. Glitch works great for sharing quick and easy repros.
Planning to use the API?
Planning to use the Shape Detection API on your site? Your public support helps us to prioritize features, and shows other browser vendors how critical it is to support them.
- Share how you plan to use it on the WICG Discourse thread.
- Send a tweet to @ChromiumDev using the hashtag #ShapeDetection and let us know where and how you're using it.
Helpful links
- Public explainer
- API Demo | API Demo source
- Tracking bug
- ChromeStatus.com entry
- Blink Component: Blink>ImageCapture