Internet Archive will ignore robots.txt files to keep historical record accurate

The Internet Archive has announced that, going forward, it will no longer conform to directives given by robots.txt files. These files are predominantly used to advise search engines on which portions of a site should be crawled and indexed to help facilitate search queries.

In the past, the Internet Archive has complied with instructions laid out by robots.txt files, according to a report from Boing Boing. However, the organization has decided that the way these files are written is often at odds with the service it sets out to provide.

“Over time we have observed that the robots.txt files that are geared toward search engine crawlers do not necessarily serve our archival purposes,” stated a blog post that the organization published last week. “Internet Archive’s goal is to create complete ‘snapshots’ of web pages, including the duplicate content and the large versions of files.”

Robots.txt files are increasingly being used to remove entire domains from search engines following their transition from a live, accessible site to a parked domain. If a site goes out of business, and is rendered inaccessible in this way, it also becomes unavailable for viewing via the Internet Archive’s Wayback Machine. The organization apparently receives queries about these sites on a daily basis.
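
To make that concrete, here is a minimal sketch of how a crawler that honors robots.txt might read a blanket exclusion of the kind a parked domain could serve. It uses Python's standard `urllib.robotparser` module. The file contents, the `ExampleArchiveBot` user agent, and the example.com URL are hypothetical stand-ins for illustration, not the Internet Archive's actual crawler or configuration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for a parked domain: it asks every
# crawler to stay away from the entire site.
PARKED_ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    # parse() takes the file as a list of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # "ExampleArchiveBot" and the URL are made-up names for this sketch.
    allowed = is_allowed(
        PARKED_ROBOTS_TXT,
        "ExampleArchiveBot",
        "http://example.com/old-page.html",
    )
    print("allowed" if allowed else "blocked")
```

A crawler that respects this file would skip every page on the domain. A service that applies the current file to its existing snapshots would hide those too, which is the situation the Internet Archive describes.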

The Internet Archive hopes that disregarding robots.txt files will help contribute to an accurate representation of prior points in the web’s history, removing their capacity to muddy the waters with instructions intended for search engines.

The organization has already ceased referring to robots.txt files on sites and pages related to the U.S. government and the U.S. military, to account for the enormous changes that can be made to domains between one administration and the next. This decision has caused no major problems, so there are high hopes that disregarding the files more broadly will be helpful.

Brad Jones
Former Digital Trends Contributor
Brad is an English-born writer currently splitting his time between Edinburgh and Pennsylvania. You can find him on Twitter…