
Build agent-friendly websites

Kasper Kulikowski

Omkar More

Your website has a new type of visitor. Some human users are pivoting from manual navigation to delegating goal-oriented journeys to AI agents. Those autonomous systems can interpret input, plan, and execute actions on behalf of a user.

However, many websites are designed to look beautiful to humans, with complex hover states, shifting layouts, and fluid motion. Those same features can leave a site functionally broken for agents.

How agents view your site

Agents don't look at your website on a monitor. They operate on a machine-readable representation of your site. The quality of this representation determines their performance.

Agents can view your website in three primary ways: screenshots, raw HTML, and the accessibility tree.

Screenshots

The agent takes a snapshot of the rendered page and uses a vision model to identify elements. Based on the screenshot, the agent can recognize that a search bar at the top-right is a global search, while a box in the middle is likely a form field. Visual cues can be helpful, as agents can use color, size, and proximity to determine importance. A big "Delete" button will likely be interpreted with more caution than a small "Help" link. However, analyzing screenshots can be slow and expensive in terms of tokens used, which makes it better suited as a backup when the structure is confusing.

HTML

The agent analyzes the DOM and reads the HTML. It understands how elements are nested, the logical hierarchy of the DOM tree, attributes like IDs and classes that define structure, and raw data strings that form the site's informational backbone. This helps the agent understand the relationship between elements. If a "Buy Now" button is inside a product container, the agent assumes that button belongs to that specific product.
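To make the nesting idea concrete, here is a minimal sketch of how an agent-like script might tie each button to its enclosing product container. It uses Python's standard `html.parser` and assumes well-formed markup. The `product` class name, the `sku-…` ids, and the `ButtonOwnerFinder` and `find_button_owners` names are all hypothetical, chosen for illustration; real agents may infer ownership quite differently.

```python
from html.parser import HTMLParser

# Void elements never have a closing tag, so they are not tracked as "open".
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class ButtonOwnerFinder(HTMLParser):
    """Pairs each <button> label with the id of its nearest product ancestor."""

    def __init__(self):
        super().__init__()
        self.open_elements = []   # (tag, attrs) for every currently open element
        self.pairs = []           # (button label, product id or None)
        self._button_text = None  # text collected while inside a <button>

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        self.open_elements.append((tag, dict(attrs)))
        if tag == "button":
            self._button_text = []

    def handle_data(self, data):
        if self._button_text is not None:
            self._button_text.append(data)

    def handle_endtag(self, tag):
        if tag == "button" and self._button_text is not None:
            label = "".join(self._button_text).strip()
            self.pairs.append((label, self._nearest_product_id()))
            self._button_text = None
        # Assumes well-formed HTML: the closing tag matches the innermost open tag.
        if self.open_elements and self.open_elements[-1][0] == tag:
            self.open_elements.pop()

    def _nearest_product_id(self):
        # Walk from the innermost open element outwards.
        for _, attrs in reversed(self.open_elements):
            if "product" in (attrs.get("class") or "").split():
                return attrs.get("id")
        return None


def find_button_owners(html):
    finder = ButtonOwnerFinder()
    finder.feed(html)
    finder.close()
    return finder.pairs


PAGE = """
<main>
  <article class="product" id="sku-101">
    <h2>Trail Shoes</h2>
    <button>Buy Now</button>
  </article>
  <article class="product" id="sku-202">
    <h2>Rain Jacket</h2>
    <div class="actions"><button>Buy Now</button></div>
  </article>
  <button>Subscribe</button>
</main>
"""

for label, product_id in find_button_owners(PAGE):
    print(f"{label!r} belongs to {product_id}")
```

In this sample, the two "Buy Now" buttons resolve to `sku-101` and `sku-202`, while "Subscribe" has no product ancestor and resolves to `None`. The association comes from the nesting alone, which is why a button placed outside its product container can be ambiguous to an agent.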

Accessibility tree

The accessibility tree is a browser-native API that distills the DOM into what's most important: the roles, names, and states of interactive elements. It's the page's semantic summary, used by assistive technology. For an AI agent, it functions as a high-fidelity map that ignores the visual "noise" of CSS to focus on pure utility. By interpreting this tree, an agent can learn the functional intent of every toggle, slider, and input field.
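The following toy sketch illustrates the kind of role, name, and state summary described above. It is not how a browser computes the accessibility tree: real browsers follow much more detailed role-mapping and accessible-name rules. It handles only a handful of elements (`button`, `a[href]`, and three `input` types), and the `summarize` and `implicit_role` helpers are hypothetical names.

```python
from html.parser import HTMLParser

# A tiny subset of implicit roles for <input>; real mappings cover many more types.
INPUT_ROLES = {"checkbox": "checkbox", "range": "slider", "text": "textbox"}


def implicit_role(tag, attrs):
    """Returns a default role for a few native elements, or None."""
    if tag == "button":
        return "button"
    if tag == "a" and "href" in attrs:
        return "link"
    if tag == "input":
        return INPUT_ROLES.get(attrs.get("type") or "text")
    return None


class AccessibilitySketch(HTMLParser):
    """Collects {role, name, states} for elements that expose a role."""

    def __init__(self):
        super().__init__()
        self.nodes = []
        self._naming = None  # (tag, node) while a node's name is read from its text

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        # An explicit role attribute wins over the element's implicit role.
        role = attrs.get("role") or implicit_role(tag, attrs)
        if role is None:
            return  # no semantics exposed: classes and styling are ignored
        node = {
            "role": role,
            "name": attrs.get("aria-label") or "",
            "states": [s for s in ("checked", "disabled") if s in attrs],
        }
        self.nodes.append(node)
        # Without aria-label, fall back to the element's text content.
        if tag != "input" and not node["name"]:
            self._naming = (tag, node)

    def handle_data(self, data):
        if self._naming:
            self._naming[1]["name"] += data

    def handle_endtag(self, tag):
        if self._naming and self._naming[0] == tag:
            node = self._naming[1]
            node["name"] = " ".join(node["name"].split())  # collapse whitespace
            self._naming = None


def summarize(html):
    parser = AccessibilitySketch()
    parser.feed(html)
    parser.close()
    return parser.nodes


PAGE = """
<div class="hero fancy-gradient">
  <a href="/help">Help</a>
  <input type="checkbox" aria-label="Dark mode" checked>
  <input type="range" aria-label="Volume" disabled>
  <button>  Delete
     account </button>
  <div class="btn" onclick="buy()">Buy Now</div>
</div>
"""

for node in summarize(PAGE):
    print(node)
```

The summary lists a link, a checkbox, a slider, and a button, each with its name and states. The styled `<div class="btn">` does not appear at all, because nothing in its markup declares a role. Using a native `<button>` or an explicit `role` attribute is what makes it visible in this kind of summary.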

Combined modalities

Relying on a single input creates a semantic gap. For example, in the DOM, an agent might see a <div> without knowing you've actually configured it as a functional button with CSS and JavaScript. With screenshots, an agent may identify where that button sits on the screen, but it still doesn't know the button's destination or the action it's designed to trigger.

Modern agents, therefore, combine multiple modalities. They use the DOM and accessibility tree to get a clean, structured list of interactive elements, and then cross-reference that with a visual rendering to understand layout, grouping, and visual cues.
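One way to picture this cross-referencing is to match structured elements against regions detected in a screenshot by comparing their bounding boxes. The sketch below uses intersection-over-union (IoU) for the match. This is an illustrative assumption, not a description of how any particular agent works; the `Box`, `Element`, and `Detection` types, the `cross_reference` function, the sample coordinates, and the 0.5 threshold are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Box:
    left: float
    top: float
    right: float
    bottom: float

    def area(self) -> float:
        return max(0.0, self.right - self.left) * max(0.0, self.bottom - self.top)


@dataclass(frozen=True)
class Element:      # from the DOM / accessibility tree
    role: str
    name: str
    box: Box


@dataclass(frozen=True)
class Detection:    # from a (hypothetical) vision model run on a screenshot
    hint: str
    box: Box


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two boxes, between 0.0 and 1.0."""
    width = min(a.right, b.right) - max(a.left, b.left)
    height = min(a.bottom, b.bottom) - max(a.top, b.top)
    if width <= 0 or height <= 0:
        return 0.0
    intersection = width * height
    return intersection / (a.area() + b.area() - intersection)


def cross_reference(elements, detections, threshold=0.5):
    """Attaches the best-overlapping visual hint to each structured element."""
    merged = []
    for element in elements:
        best: Optional[Detection] = max(
            detections, key=lambda d: iou(element.box, d.box), default=None
        )
        matched = best is not None and iou(element.box, best.box) >= threshold
        merged.append({
            "role": element.role,
            "name": element.name,
            "visual": best.hint if matched else None,
        })
    return merged


elements = [
    Element("button", "Delete", Box(100, 100, 300, 160)),
    Element("link", "Help", Box(500, 10, 540, 30)),
]
detections = [
    Detection("large red region", Box(98, 102, 302, 158)),
    Detection("small gray text", Box(700, 10, 740, 30)),
]

for row in cross_reference(elements, detections):
    print(row)
```

Here the "Delete" button picks up the "large red region" hint because the two boxes overlap almost entirely, while the "Help" link has no overlapping detection and keeps `visual` as `None`. The structured source supplies what the element is, and the visual source adds how prominent it looks.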

Our job is to provide clean signals across all these channels.