Published: January 13, 2024
This is the last in a three-part series on LLM chatbots. The previous articles discussed the power of client-side LLMs and walked you through adding a WebLLM-powered chatbot to a to-do list application.
Some newer devices ship with large language models and other AI models directly on the device. Chrome has proposed integrating Built-in AI APIs into the browser, with a number of APIs in different stages of development. Many of these APIs are going through the standards process, so that websites can rely on the same implementation and model and achieve maximum inference performance.
The Prompt API is one of these APIs. To use it, developers are encouraged to sign up for the early preview program. Once accepted, you'll receive instructions on how to enable the Prompt API in your browser. The Prompt API is also available in an origin trial for Chrome Extensions, so you can test it with real extension users.
Shared model access
The Prompt API behaves similarly to WebLLM. However, there is no model selection this time: you have to use the LLM that ships with the browser. When you enable Built-in AI, Chrome downloads Gemini Nano into the browser. This model can then be shared across multiple origins and runs with maximum performance. There is a GitHub issue where a developer has requested that a model selection feature be added.
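Because availability varies by device, it's a good idea to feature-detect the API before creating a session. The following is a minimal sketch based on the early preview program's API shape, where capabilities() reports whether the model is ready ("readily"), needs a download first ("after-download"), or is unsupported ("no"); the exact shape may differ in your build:

// Feature detection based on the early preview API shape (may change).
if (self.ai?.languageModel) {
  const { available } = await self.ai.languageModel.capabilities();
  if (available === "readily") {
    // Gemini Nano is on the device and ready for inference.
  } else if (available === "after-download") {
    // Supported, but the model needs to be downloaded first.
  } else {
    // "no": Built-in AI is not supported on this device.
  }
}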
Set up the conversation
You can start the message conversation in the exact same way, but the Prompt API also offers a shorthand syntax to specify the system prompt. Start the language model session using the create() method on the self.ai.languageModel property:
const session = await self.ai.languageModel.create({
  systemPrompt: `You are a helpful assistant. You will answer questions related
to the user's to-do list. Decline all other requests not related to the user's
todos. This is the to-do list in JSON: ${JSON.stringify(todos)}`,
});
Answer your first question
Instead of a configuration object to opt into streaming, the Prompt API offers two separate methods:

- prompt() returns the full string.
- promptStreaming() returns an async iterable. In contrast with WebLLM, each reply in the stream contains the full response generated so far, so you don't have to combine the results yourself.
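For example, if you don't need streaming, you can await the complete answer in a single call, using the session created earlier:

const reply = await session.prompt("How many open todos do I have?");
console.log(reply);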
If no other origin has triggered the model download before, your first request may take a very long time while Gemini Nano is downloaded into your browser. If the model is already available, inference starts immediately.
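If you want to surface that progress to the user, Chrome's built-in AI documentation describes a monitor option on create() that emits downloadprogress events. Here is a hedged sketch, assuming that option is available in your preview build:

// Hypothetical: observe the model download while creating a session.
const session = await self.ai.languageModel.create({
  monitor(m) {
    // Fires repeatedly while Gemini Nano is being downloaded.
    m.addEventListener("downloadprogress", (e) => {
      console.log(`Downloaded ${e.loaded} of ${e.total} bytes.`);
    });
  },
});

Once the session is ready, you can stream the answer: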
const stream = session.promptStreaming("How many open todos do I have?");
for await (const reply of stream) {
  // Each reply contains the full response generated so far.
  console.log(reply);
}
Summary
Integrating LLMs into applications can significantly enhance the user experience. While cloud services offer higher-quality models and high inference performance regardless of the user's device, on-device solutions such as WebLLM and Chrome's Prompt API are offline-capable, improve privacy, and reduce costs compared to cloud-based alternatives. Try out these new APIs and make your web applications smarter.