BILIBILI, one of the premier entertainment content platforms in Greater China and Southeast Asia, hosts a massive library of user-generated content, live broadcasts, and gaming experiences that attracts more than 330 million monthly active users (MAU).
One of the distinctive features of BILIBILI's platform is the integration of bullet-screen comments, a popular feature in Japan and China that displays real-time viewer feedback as scrolling text across video streams. Bullet-screen comments add an exciting and immersive element to live video content, keeping viewers actively engaged by letting them express their own thoughts and respond to other viewers' reactions in real time.
The challenge
While bullet-screen comments are an engaging way for viewers to interact with content, it's important to keep the speaker's portrait unobstructed for the best user experience. When comments scroll directly over a speaker, they can be disruptive and discourage viewers from continuing to watch.
To enable bullet-screen comments that seamlessly flow behind a speaker's portrait, you need accurate machine learning segmentation, which can be difficult to run efficiently on-device. That's why, historically, such powerful features would need to be supported server-side.
Given how much content BILIBILI serves daily, processing large portions of it server-side would be very expensive, so their development team needed a client-side solution to reduce costs. The further challenge was that moving machine learning to the client can push CPU usage high enough to hinder playback performance.
The solution: On-device image segmentation
To address these challenges, BILIBILI's developers leveraged Body Segmentation with MediaPipe and TensorFlow.js, a predecessor of MediaPipe's Image Segmenter. This provided an efficient on-device segmentation API, as well as pretrained models for selfie and multi-object segmentation.
BILIBILI can now quickly iterate and support the feature, while reducing costs and maintaining performance.
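For reference, a segmenter like this can be set up with just a few calls. The following is a minimal sketch assuming the @tensorflow-models/body-segmentation package with the MediaPipe runtime; the CDN path and model options shown here are illustrative assumptions, not BILIBILI's production configuration.

```ts
import * as bodySegmentation from '@tensorflow-models/body-segmentation';

// Sketch: create a Selfie Segmentation segmenter with the MediaPipe
// runtime. The solutionPath CDN URL is an assumption for illustration.
async function createSegmenter() {
  return bodySegmentation.createSegmenter(
    bodySegmentation.SupportedModels.MediaPipeSelfieSegmentation,
    {
      runtime: 'mediapipe',
      solutionPath:
        'https://cdn.jsdelivr.net/npm/@mediapipe/selfie_segmentation',
      modelType: 'general',
    },
  );
}

async function segmentFrame(
  segmenter: bodySegmentation.BodySegmenter,
  video: HTMLVideoElement,
) {
  // Returns one segmentation per detected person for the current frame.
  return segmenter.segmentPeople(video);
}
```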
Implementation
Here's how BILIBILI implemented this solution:
- Real-time character outlines: BILIBILI used the Selfie Segmenter model to extract the outline of characters throughout the video. This allowed them to create a mask that delineated the boundaries of characters.
- Integration with the live comment layer: Then, they merged the extracted character outline with the live comment layer using the CSS mask-image property. By making the central character area transparent, the bullet-screen comments seamlessly appear behind the characters without obstructing them (see the sketch after this list).
- Optimizing the implementation: BILIBILI tested the implementation to ensure it didn't degrade performance.
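The sketch below shows one way a segmentation result could be turned into a mask and applied to the comment layer with CSS mask-image, using the Body Segmentation API's toBinaryMask helper. The element names, color values, and toDataURL handoff are illustrative assumptions; the key idea is that person pixels are transparent in the mask, so comments are hidden over the speaker and visible everywhere else.

```ts
import * as bodySegmentation from '@tensorflow-models/body-segmentation';

// Sketch: convert a segmentation into a binary mask and apply it to the
// bullet-comment layer via CSS mask-image. `commentLayer` is an
// illustrative assumption, not BILIBILI's actual DOM structure.
async function maskCommentLayer(
  segmenter: bodySegmentation.BodySegmenter,
  video: HTMLVideoElement,
  commentLayer: HTMLElement,
) {
  const segmentation = await segmenter.segmentPeople(video);

  // Person pixels become fully transparent and background pixels fully
  // opaque, so the CSS mask hides comments over the speaker only.
  const mask = await bodySegmentation.toBinaryMask(
    segmentation,
    { r: 0, g: 0, b: 0, a: 0 },   // foreground (person): transparent
    { r: 0, g: 0, b: 0, a: 255 }, // background: opaque
  );

  const canvas = document.createElement('canvas');
  canvas.width = mask.width;
  canvas.height = mask.height;
  canvas.getContext('2d')!.putImageData(mask, 0, 0);

  const url = `url(${canvas.toDataURL()})`;
  commentLayer.style.setProperty('-webkit-mask-image', url);
  commentLayer.style.setProperty('mask-image', url);
  commentLayer.style.setProperty('mask-size', '100% 100%');
}
```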
Performance optimization
BILIBILI's developers used OffscreenCanvas and Web Workers to move the time-consuming segmentation work off the main thread, as sketched below. They also reduced the size of the mask, because it only needs to capture the character outline and doesn't depend on the video's resolution or quality.
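A minimal sketch of that offloading pattern follows; the segmentation-worker.ts file name and the message shape are assumptions for illustration.

```ts
// Main thread: hand rendering control of the mask canvas to a dedicated
// worker so segmentation and mask drawing never block the UI thread.
const maskCanvas = document.createElement('canvas');
const offscreen = maskCanvas.transferControlToOffscreen();
const worker = new Worker(
  new URL('./segmentation-worker.ts', import.meta.url),
  { type: 'module' },
);
worker.postMessage({ canvas: offscreen }, [offscreen]);
```

```ts
// segmentation-worker.ts: runs inside the Web Worker.
onmessage = (event: MessageEvent<{ canvas: OffscreenCanvas }>) => {
  const ctx = event.data.canvas.getContext('2d')!;
  // Per frame: run the segmenter on frames posted from the main thread
  // (for example as ImageBitmap objects), then draw the resulting mask:
  // ctx.putImageData(mask, 0, 0);
};
```

Because the OffscreenCanvas is transferred rather than copied, the worker can draw masks directly into it without round-tripping pixel data through the main thread.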
After reducing the mask size, BILIBILI's development team stretched the mask back up during composition and merged it with the DOM layer to reduce rendering pressure as much as possible.
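A sketch of that downscale-then-stretch idea follows; the mask dimensions and element names are assumptions. The segmenter produces a low-resolution mask, and the compositor stretches it back to the comment layer's size via mask-size, so no full-resolution mask has to be generated per frame.

```ts
// Sketch: keep the mask small and let CSS stretch it during composition.
// MASK_WIDTH, MASK_HEIGHT, and `commentLayer` are illustrative assumptions.
const MASK_WIDTH = 256;
const MASK_HEIGHT = 144;

const maskCanvas = document.createElement('canvas');
maskCanvas.width = MASK_WIDTH;
maskCanvas.height = MASK_HEIGHT;
const maskCtx = maskCanvas.getContext('2d')!;

function composite(mask: ImageData, commentLayer: HTMLElement) {
  // Draw the low-resolution mask...
  maskCtx.putImageData(mask, 0, 0);

  // ...and let the compositor scale it up to the full layer size, which is
  // far cheaper than producing a full-resolution mask on every frame.
  const url = `url(${maskCanvas.toDataURL()})`;
  commentLayer.style.setProperty('-webkit-mask-image', url);
  commentLayer.style.setProperty('mask-image', url);
  commentLayer.style.setProperty('mask-size', '100% 100%');
}
```

Scaling up a coarse mask is visually acceptable here because the mask only needs to trace the character's silhouette, while the video itself stays at full resolution.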
Increased session duration and click-through rates
By combining the reach and power of the web with the flexibility of MediaPipe's AI solutions, BILIBILI successfully delivered a powerful and engaging web app experience to millions of users. And, within just one month of implementation, BILIBILI saw a notable 30% increase in session duration and a 19% improvement in click-through rate of live streaming videos.
With MediaPipe's free, on-device web AI solutions, BILIBILI's developers could efficiently retain crucial visual elements while keeping viewers engaged, ensuring smooth performance, and ultimately, delivering the premium video streaming experience that viewers expect from the platform leader.