
Internet Archive will ignore robots.txt files to keep historical record accurate

The Internet Archive has announced that it will no longer follow the directives given by robots.txt files. These files are mainly used to tell search engine crawlers which parts of a site should be crawled and indexed.

In the past, the Internet Archive has complied with instructions laid out by robots.txt files, according to a report from Boing Boing. However, the organization has decided that the way these files are written is often at odds with the service it sets out to provide.

“Over time we have observed that the robots.txt files that are geared toward search engine crawlers do not necessarily serve our archival purposes,” stated a blog post that the organization published last week. “Internet Archive’s goal is to create complete ‘snapshots’ of web pages, including the duplicate content and the large versions of files.”

Robots.txt files are increasingly being used to remove entire domains from search engines following their transition from a live, accessible site to a parked domain. If a site goes out of business, and is rendered inaccessible in this way, it also becomes unavailable for viewing via the Internet Archive’s Wayback Machine. The organization apparently receives queries about these sites on a daily basis.
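
To illustrate the mechanism described above, here is a minimal sketch of how a crawler that honors robots.txt might decide whether to fetch a page. It uses Python's standard `urllib.robotparser` module; the two robots.txt bodies, the `example-archiver` user agent, the `allowed` helper and the example.com URLs are all hypothetical, and nothing here reflects the Internet Archive's actual crawler.

```python
from urllib.robotparser import RobotFileParser


def allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical robots.txt for a live site: hide one directory from crawlers.
LIVE_SITE = """\
User-agent: *
Disallow: /search/
"""

# Hypothetical robots.txt for a parked domain: block the entire site.
PARKED_DOMAIN = """\
User-agent: *
Disallow: /
"""

if __name__ == "__main__":
    page = "https://example.com/articles/1999.html"
    print(allowed(LIVE_SITE, "example-archiver", page))      # True
    print(allowed(PARKED_DOMAIN, "example-archiver", page))  # False
```

Under these assumptions, a single blanket `Disallow: /` rule on a parked domain is enough to make a compliant crawler skip every page, including pages that were freely crawlable while the site was live.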

The Internet Archive hopes that disregarding robots.txt files will give a more accurate picture of the web at earlier points in its history, since instructions intended for search engines will no longer muddy the waters.

The organization has already ceased referring to robots.txt files on sites and pages related to the U.S. government and the U.S. military, to account for the enormous changes that can be made to domains between one administration and the next. This decision has caused no major problems, so there are high hopes that discontinuing the use of the files more broadly will be helpful.

Brad Jones
Former Digital Trends Contributor