Alexa uses the Common Crawl to discover backlinks, and uses the Alexa web crawler to identify SEO issues on your site as part of our Site Audit service. The Alexa web crawler will not index anything you would like to remain private. For more information about the Common Crawl, see the Common Crawl website.
What if I don't want Alexa to crawl my site as part of the Site Audit service?
All you have to do is tell us using a robots.txt file.
Robots.txt is a file that website administrators can place at the top level of a site to direct the behavior of web crawling robots. All major web crawlers, such as those run by Google, Yahoo, Bing, and Baidu, respect robots.txt.
The Alexa web crawler (robot) identifies itself as “ia_archiver” in the HTTP “User-agent” header field. The Alexa Internet ia_archiver crawler strictly adheres to robots.txt rules.
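To see how a robots.txt-respecting crawler matches rules against a user-agent name, the following minimal sketch uses Python's standard urllib.robotparser module. It is only a stand-in for illustration, not Alexa's own crawler code; the example.com URLs, the sample rules, and the is_allowed helper are assumptions made for this example.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, used only for illustration.
ROBOTS_TXT = """\
User-agent: ia_archiver
Disallow: /images/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the rules in robots_txt let user_agent fetch url."""
    parser = RobotFileParser()
    # parse() accepts the file's lines, so no network request is made.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# ia_archiver is blocked from /images/ but may fetch other paths.
print(is_allowed(ROBOTS_TXT, "ia_archiver", "https://example.com/images/logo.png"))
print(is_allowed(ROBOTS_TXT, "ia_archiver", "https://example.com/index.html"))
# A crawler with a different name is not covered by the ia_archiver rules.
print(is_allowed(ROBOTS_TXT, "SomeOtherBot", "https://example.com/images/logo.png"))
```

Running the same check against your own robots.txt text is a quick way to confirm that a rule targets ia_archiver and not every crawler.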
To prevent ia_archiver from visiting any part of your site, your robots.txt file should look like this:
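As a minimal sketch in standard robots.txt syntax, the two directives below name the ia_archiver user agent and disallow every path under the site root:

```text
User-agent: ia_archiver
Disallow: /
```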
You can also restrict crawling of specific directories. For example, to prevent ia_archiver from visiting the images directory (and its subdirectories):
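A minimal sketch, assuming the images directory sits directly under the site root at /images/ (adjust the path to match your own site):

```text
User-agent: ia_archiver
Disallow: /images/
```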
To allow ia_archiver to visit your entire site, add these lines to your robots.txt file:
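In standard robots.txt syntax, a Disallow directive with an empty value blocks nothing, so a minimal sketch that allows full access looks like this:

```text
User-agent: ia_archiver
Disallow:
```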
For more information about robots, crawling, and robots.txt, visit the Web Robots Pages at Robotstxt.org, an excellent source for the latest information on the Standard for Robots Exclusion.