Lighthouse flags invalid robots.txt files.
Most Lighthouse audits only apply to the page that you're currently on. However, since robots.txt is defined at the host-name level, this audit applies to your entire domain (or subdomain).
Expand the robots.txt is not valid audit in your report to learn what's wrong with your robots.txt.
Common errors include:
No user-agent specified
Pattern should either be empty, start with "/" or "*"
Invalid sitemap URL
$ should only be used at the end of the pattern
Lighthouse doesn't check that your robots.txt file is in the correct location. To function correctly, the file must be in the root of your domain or subdomain.
Make sure robots.txt doesn't return an HTTP 5XX status code
If your server returns a server error (an HTTP status code in the 500s) for robots.txt, search engines won't know which pages should be crawled. They may stop crawling your entire site, which would prevent new content from being indexed.
To check the HTTP status code, open robots.txt in Chrome and check the request in Chrome DevTools.
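The same check can be scripted. The following is a minimal sketch using only the Python standard library; the function names `is_server_error` and `fetch_robots_status` and the origin `https://example.com` are hypothetical and not part of Lighthouse.

```python
import urllib.error
import urllib.request


def is_server_error(status: int) -> bool:
    """Return True for HTTP 5XX status codes."""
    return 500 <= status <= 599


def fetch_robots_status(origin: str, timeout: float = 10.0) -> int:
    """Request <origin>/robots.txt and return the HTTP status code."""
    url = origin.rstrip("/") + "/robots.txt"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # urlopen raises HTTPError for 4XX and 5XX responses;
        # the status code is available on the exception.
        return err.code
```

For example, `is_server_error(fetch_robots_status("https://example.com"))` would report whether that (placeholder) origin answers with a 5XX code. Network failures such as DNS errors are not handled here and would raise `urllib.error.URLError`.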
Keep robots.txt smaller than 500 KiB
Search engines may stop processing robots.txt midway through if the file is larger than 500 KiB. This can confuse the search engine, leading to incorrect crawling of your site.
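A size check is easy to automate. This sketch assumes you already have the file's raw bytes; the names `MAX_ROBOTS_BYTES` and `exceeds_size_limit` are illustrative.

```python
MAX_ROBOTS_BYTES = 500 * 1024  # 500 KiB, the limit described above


def exceeds_size_limit(robots_body: bytes, limit: int = MAX_ROBOTS_BYTES) -> bool:
    """Return True when the robots.txt body is larger than the limit."""
    return len(robots_body) > limit
```

Measuring the encoded bytes rather than the decoded text matters when the file contains non-ASCII characters, since those take more than one byte each in UTF-8.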
To keep robots.txt small, focus less on individually excluded pages and more on broader patterns. For example, if you need to block crawling of PDF files, don't disallow each individual file. Instead, disallow all URLs containing .pdf by using disallow: /*.pdf.
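To see why one broad pattern can replace many individual rules, here is a rough sketch of wildcard matching. It assumes the common convention that `*` matches any run of characters, a trailing `$` anchors the end of the URL, and patterns otherwise match as prefixes. The helpers `pattern_to_regex` and `is_blocked` are hypothetical, and the sketch ignores allow rules and rule precedence.

```python
import re
from typing import Iterable


def pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern into a compiled regex."""
    anchored = pattern.endswith("$")
    body = pattern[:-1] if anchored else pattern
    # Escape everything, then turn each escaped "*" back into ".*".
    regex = re.escape(body).replace(r"\*", ".*")
    return re.compile("^" + regex + ("$" if anchored else ""))


def is_blocked(path: str, disallow_patterns: Iterable[str]) -> bool:
    """Return True if any non-empty disallow pattern matches the path."""
    return any(pattern_to_regex(p).match(path) for p in disallow_patterns if p)
```

Under these assumptions, the single pattern `/*.pdf` covers `/docs/report.pdf`, `/a/b/c.pdf`, and every other PDF path, so there is no need to list each file.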
Make sure allow and disallow values are either empty or start with / or *.
Don't use $ in the middle of a value (for example, allow: /file$html).
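The two pattern rules above can be expressed as a small check. This is a sketch, not Lighthouse's implementation; `pattern_errors` is a hypothetical helper, and the messages simply reuse the wording from the list of common errors.

```python
def pattern_errors(value: str) -> list:
    """Return the format errors found in one allow/disallow value."""
    errors = []
    # A value must be empty or begin with "/" or "*".
    if value and not value.startswith(("/", "*")):
        errors.append('Pattern should either be empty, start with "/" or "*"')
    # "$" is only acceptable as the very last character.
    if "$" in value[:-1]:
        errors.append("$ should only be used at the end of the pattern")
    return errors
```

For instance, `pattern_errors("/file.html$")` returns an empty list, while `pattern_errors("/file$html")` reports the misplaced `$`.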
User-agent names tell search engine crawlers which directives to follow. You must provide a value for each instance of user-agent so search engines know whether to follow the associated set of directives.
To specify a particular search engine crawler, use a user-agent name from its published list. (For example, here's Google's list of user-agents used for crawling.)
Use * to match all otherwise unmatched crawlers.
User-agent names define the sections of your robots.txt file. Search engine crawlers use those sections to determine which directives to follow. Placing a directive before the first user-agent name means that no crawlers will follow it.
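Both user-agent problems (a missing value, and allow or disallow directives that appear before any user-agent line) can be detected in one pass over the file. The sketch below is an assumption-laden illustration: `find_group_errors` and its error messages are made up for this example and do not reproduce Lighthouse's output.

```python
def find_group_errors(robots_txt: str) -> list:
    """Return (line_number, message) pairs for user-agent problems."""
    errors = []
    seen_user_agent = False
    for number, raw in enumerate(robots_txt.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line or ":" not in line:
            continue
        name, value = (part.strip() for part in line.split(":", 1))
        name = name.lower()
        if name == "user-agent":
            if not value:
                errors.append((number, "user-agent has no value"))
            seen_user_agent = True
        elif name in ("allow", "disallow") and not seen_user_agent:
            errors.append((number, "directive appears before any user-agent"))
    return errors
```

Given the text `"disallow: /private/\nuser-agent: *\ndisallow: /tmp/"`, only line 1 is reported, because the second disallow sits inside the `user-agent: *` section.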
Search engine crawlers only follow directives in the section with the most specific user-agent name. For example, if you have directives for user-agent: * and user-agent: Googlebot-Image, Googlebot Images will only follow the directives in the user-agent: Googlebot-Image section.
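One way to picture "most specific" is as a longest-match lookup. The sketch below assumes simple case-insensitive prefix matching between section names and the crawler's name, with `*` as the fallback; real crawlers may resolve this differently. `select_group` is a hypothetical helper.

```python
from typing import Dict, List, Optional


def select_group(groups: Dict[str, List[str]], crawler: str) -> Optional[str]:
    """Pick the section name a crawler would follow, or None if no section applies."""
    crawler = crawler.lower()
    # Named sections whose user-agent is a prefix of the crawler name.
    candidates = [
        name for name in groups
        if name != "*" and crawler.startswith(name.lower())
    ]
    if candidates:
        return max(candidates, key=len)  # the longest name is the most specific
    return "*" if "*" in groups else None
```

With sections for `*` and `Googlebot-Image`, the lookup for `Googlebot-Image` returns the `Googlebot-Image` section, and the `*` directives are ignored for that crawler.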
Sitemap files are a great way to let search engines know about pages on your website. A sitemap file generally includes a list of the URLs on your website, together with information about when they were last changed.
If you choose to submit a sitemap file in robots.txt, make sure to use an absolute URL.
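Whether a sitemap value is absolute can be checked with the standard library's URL parser. In this sketch, `is_absolute_sitemap_url` is a hypothetical helper, and restricting the scheme to http and https is an assumption made for the example.

```python
from urllib.parse import urlparse


def is_absolute_sitemap_url(value: str) -> bool:
    """Return True when the value has both an http(s) scheme and a host."""
    parsed = urlparse(value.strip())
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)
```

So `https://example.com/sitemap.xml` passes, while relative values such as `/sitemap.xml` or `sitemap.xml` do not.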
robots.txt is not valid audit