BILIBILI taps MediaPipe's on-device web AI solution to improve video stream UX and boost session duration by 30%

Cecilia Cong
Tyler Mullen

BILIBILI, one of the premier entertainment content platforms in Greater China and Southeast Asia, hosts a massive database of user-generated content, live broadcasts, and gaming experiences that attracts more than 330 million monthly active users (MAU).

One of the distinctive features of BILIBILI's platform is the integration of bullet-screen comments, a popular feature in Japan and China that displays real-time viewer feedback as scrolling text across video streams. Bullet-screen comments add an exciting and immersive element to live video content, keeping viewers actively engaged by letting them express their own thoughts and respond to other viewers' reactions in real time.

The challenge

While bullet-screen comments are an engaging way for viewers to interact with content, it's important to keep the speaker's portrait unobstructed for the best user experience. In the following video, bullet-screen comments can be disruptive and discourage viewers from continuing to watch.

Original state: Initial videos show a person speaking, with comments scrolling across the screen, over the speaker's face.

To make bullet-screen comments flow seamlessly behind a speaker's portrait, you need accurate machine learning segmentation, which can be difficult to run efficiently on-device. That's why, historically, such powerful features had to be supported server-side.

Given how much content BILIBILI serves on a daily basis, processing large portions of it server-side would be very expensive, so their development team needed a client-side solution to reduce costs. Moving to client-side machine learning, however, introduces its own challenge: keeping CPU usage low enough that playback performance doesn't suffer.

Goal: In the end, BILIBILI wanted the bullet-screen comments to scroll from right to left behind the speaker, so as not to block the speaker's face.

The solution: On-device image segmentation

To address these challenges, BILIBILI's developers leveraged Body Segmentation with MediaPipe and TensorFlow.js, a predecessor of MediaPipe's Image Segmenter. This provided an efficient on-device segmentation API, as well as pretrained models for selfie and multi-object segmentation.
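Conceptually, the segmentation API reports, for each pixel, the probability that it belongs to a person. A minimal sketch of turning that output into the transparency mask used for CSS masking below (the helper name and the 0.5 threshold are illustrative assumptions, not BILIBILI's published code):

```javascript
// Sketch: convert per-pixel person probabilities (0..1) from the segmenter
// into RGBA mask pixels where the person is fully transparent (alpha 0) and
// the background is opaque (alpha 255). With CSS masking, comments then
// disappear only where the speaker is.
function probabilitiesToMask(probabilities, threshold = 0.5) {
  const rgba = new Uint8ClampedArray(probabilities.length * 4);
  for (let i = 0; i < probabilities.length; i++) {
    const isPerson = probabilities[i] >= threshold;
    // RGB stays black; only the alpha channel matters for the mask.
    rgba[i * 4 + 3] = isPerson ? 0 : 255;
  }
  return rgba;
}
```

In practice this per-pixel data would be written into a canvas each frame; the pretrained Selfie Segmenter model supplies the probabilities.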

BILIBILI can now quickly iterate and support the feature, while reducing costs and maintaining performance.


Here's how BILIBILI implemented this solution:

  1. Real-time character outlines: BILIBILI used the Selfie Segmenter model to extract the outline of characters throughout the video. This allowed them to create a mask that delineated the boundaries of characters.
  2. Integration with the live comment layer: Then, they merged the extracted character outline with the live comment layer using CSS mask-image properties. By setting the central character area as transparent, the bullet-screen comments can seamlessly appear behind the characters without obstructing them.
    A blue character in a rectangular box points to another box with a gray character, representing the SVG mask. A plus sign with blue lines represents the addition of live comments. Together this equals blue lines behind a character outline, representing comments flowing behind the character.
    A diagram showing how BILIBILI's developers extracted a character outline from a video element and integrated it with a live comment layer using real-time computing by MediaPipe.
  3. Optimization: Finally, BILIBILI tested the implementation to ensure the new feature didn't degrade playback performance.
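Step 2 above can be sketched as follows. The helper and element names are hypothetical; BILIBILI hasn't published its integration code. Both the standard and `-webkit-` prefixed properties are set because browser support for CSS masking varies:

```javascript
// Sketch (hypothetical helper): build the CSS masking properties that hide
// comments over the speaker. maskDataUrl is the exported mask image, with
// the speaker's silhouette transparent and everything else opaque.
function maskStyleFor(maskDataUrl) {
  return {
    webkitMaskImage: `url("${maskDataUrl}")`,
    maskImage: `url("${maskDataUrl}")`,
    // Stretch the mask to cover the whole comment layer.
    webkitMaskSize: "100% 100%",
    maskSize: "100% 100%",
  };
}

// In the render loop (browser only), re-export the mask each frame and
// apply it to the bullet-screen comment layer:
//   Object.assign(commentLayer.style, maskStyleFor(maskCanvas.toDataURL()));
```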

Performance optimization

BILIBILI's developers used OffscreenCanvas and Web Workers to move time-consuming segmentation work off the main thread. They also reduced the size of the mask: because the mask only needs to capture the character's outline, it doesn't depend on the video's full resolution or image quality.

After reducing the mask size, BILIBILI's development team stretched the mask back to full size during composition and merged it with the DOM layer, reducing rendering pressure as much as possible.

A blue character in a box points to a mini identical image. A dotted line points to a small black box with a transparent character. The small black box points to an identical larger box. This minimization process plus live comments, represented by blue lines, is equal to the merged results of comments flowing behind the character.
A diagram demonstrating how BILIBILI minimized the mask size and merged it with a stretched mask.

Increased session duration and click-through rates

By combining the reach and power of the web with the flexibility of MediaPipe's AI solutions, BILIBILI successfully delivered a powerful and engaging web app experience to millions of users. And, within just one month of implementation, BILIBILI saw a notable 30% increase in session duration and a 19% improvement in click-through rate of live streaming videos.

30%

Increased session duration

19%

Higher CTR

With MediaPipe's free, on-device web AI solutions, BILIBILI's developers could efficiently preserve crucial visual elements, keep viewers engaged, and ensure smooth performance, ultimately delivering the premium video streaming experience that viewers expect from a platform leader.

Quote from Jun Liu, senior engineer at BILIBILI: "MediaPipe's solution helped us save development costs, since we didn't have to focus on creating a portrait extraction model ourselves."