
Alexa's Site Audit Crawler

Information regarding our privacy policy, company, and technology can be found on our Privacy Policy and About Us pages.

Alexa uses the Common Crawl to discover backlinks, and its own web crawler to identify SEO issues with your site as part of our Site Audit service. For more information, see the Common Crawl website.

What if I don't want Alexa to crawl my site as part of the Site Audit service?

All you have to do is tell us using a robots.txt file.

Robots.txt is a file that website administrators can place at the root of a site to direct the behavior of web-crawling robots. All of the major web crawlers, including those operated by Google, Yahoo, Bing, and Baidu, respect robots.txt.
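
If you want to preview how a crawler will interpret your robots.txt, Python's standard library includes a parser for the exclusion standard. Below is a minimal sketch using urllib.robotparser; the example.com domain and page path are placeholders for your own site:

import urllib.robotparser

# Download and parse a live robots.txt file.
# "example.com" is a placeholder; substitute your own domain.
rp = urllib.robotparser.RobotFileParser()
rp.set_url("https://example.com/robots.txt")
rp.read()

# Ask whether a crawler identifying itself as "ia_archiver"
# is permitted to fetch a given page.
print(rp.can_fetch("ia_archiver", "https://example.com/some/page.html"))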

The Alexa Site Audit crawler (robot) identifies itself as “ia_archiver” in the HTTP “User-agent” header field, and it strictly adheres to robots.txt rules.
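
Because the crawler announces itself in the User-agent header, its requests are easy to spot in your web server's access log. Here is a minimal sketch in Python; the log path is an assumption (an Apache-style access log) and will differ by server:

# Count and preview requests made by ia_archiver.
# The log path is hypothetical; adjust it for your server setup.
LOG_PATH = "/var/log/apache2/access.log"

with open(LOG_PATH, encoding="utf-8", errors="replace") as log:
    hits = [line.rstrip() for line in log if "ia_archiver" in line]

print(f"{len(hits)} requests from ia_archiver")
for line in hits[:5]:
    print(line)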

To prevent ia_archiver from visiting any part of your site, your robots.txt file should look like this:

User-agent: ia_archiver
Disallow: /

You can also restrict crawling of specific directories. For example, to prevent ia_archiver from visiting the images directory (and its subdirectories):

User-agent: ia_archiver
Disallow: /images/

To allow ia_archiver to visit your entire site, add these lines to your robots.txt file:

User-agent: ia_archiver
Disallow:
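
Before deploying any of the rule sets above, you can confirm how they will be interpreted. The sketch below feeds each configuration to Python's urllib.robotparser and tests sample paths (the paths themselves are placeholders):

import urllib.robotparser

def allowed(rules, path):
    # Parse robots.txt rules given as a list of lines,
    # then test whether ia_archiver may fetch the path.
    rp = urllib.robotparser.RobotFileParser()
    rp.parse(rules)
    return rp.can_fetch("ia_archiver", path)

# Blocking the entire site denies every path.
print(allowed(["User-agent: ia_archiver", "Disallow: /"], "/page.html"))            # False

# Blocking /images/ denies that directory but nothing else.
print(allowed(["User-agent: ia_archiver", "Disallow: /images/"], "/images/a.png"))  # False
print(allowed(["User-agent: ia_archiver", "Disallow: /images/"], "/page.html"))     # True

# An empty Disallow permits the entire site.
print(allowed(["User-agent: ia_archiver", "Disallow:"], "/page.html"))              # True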

For more information regarding robots, crawling, and robots.txt, visit the Web Robots Pages at Robotstxt.org, an excellent source for the latest information on the Robots Exclusion Standard.
