
A major Wikipedia project fixed millions of old, broken links

Wikipedia’s enormous army of editors does its best to correct pages showing erroneous information and to quickly rewrite entries that have been tampered with by a miscreant, but occasionally false information stays up for longer than you’d like.

With that in mind, many people who use the online encyclopedia like to follow the third-party links at the bottom of the page, which point to the sources for the information in the main article. Those links should not only confirm what the Wikipedia article says but also offer more depth on the subject, making them an invaluable resource for anyone wishing to dig deeper into a particular topic.

The trouble is that sometimes those source articles (whether from the news media, educational institutions, businesses, or research establishments) are taken offline, resulting in a broken link. This can undermine Wikipedia’s credibility for readers trying to verify information that appears in an entry.

The good news is that a team of volunteers from the Internet Archive has been able to restore a colossal nine million broken links on Wikipedia, helping to make those annoying “404 error/page not found” messages a thing of the past.

The Internet Archive is a non-profit digital library that has been archiving web pages since 1996, when the internet as we know it today was in its earliest stages of development. So yes, among its staggering 338 billion archived web pages are many of the ones that Wikipedia linked to but which have since been taken offline.

The Internet Archive’s Mark Graham explained in a blog post this week how it’s been archiving nearly every URL referenced on different Wikipedia sites the moment those links are added or changed — at the rate of about 20 million URLs a week.

It’s also been running a software robot called IABot on more than 20 Wikipedia language editions searching for broken links, Graham wrote. When it finds broken links, IABot looks for archives in the Wayback Machine — a searchable database for web pages — and other web archives to replace them with.
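For the curious, the lookup step described above can be approximated with the Wayback Machine's public availability endpoint. What follows is only a minimal sketch of the idea, not IABot's actual code: the function names are made up for illustration, and the response shape is assumed to be the `archived_snapshots` / `closest` JSON that the endpoint documents.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Public Wayback Machine availability endpoint.
AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def build_query(url, timestamp=None):
    """Build the lookup URL for a page, optionally near a YYYYMMDD timestamp."""
    params = {"url": url}
    if timestamp:
        params["timestamp"] = timestamp
    return AVAILABILITY_ENDPOINT + "?" + urlencode(params)


def closest_snapshot(payload):
    """Return the archived URL from a decoded response, or None if there is none."""
    closest = payload.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest.get("url")
    return None


def find_archive(url, timestamp=None):
    """Ask the Wayback Machine for the snapshot closest to the given timestamp."""
    with urlopen(build_query(url, timestamp), timeout=10) as resp:
        return closest_snapshot(json.load(resp))
```

Calling the hypothetical `find_archive("example.com", "20180101")` would return the address of the nearest archived copy if one exists, which is the kind of replacement a link-repair bot could then write back into the article.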

“Restoring links ensures Wikipedia remains accurate and verifiable and thus meets one of Wikipedia’s three core content policies: ‘Verifiability,’” Graham wrote.

The team plans to continue with its efforts to check and fix links on more Wikipedia sites and increase the speed of its system, as well as look at how it can extend its operation beyond the online encyclopedia.

On a side note, the Wayback Machine is a fun tool anyone can use. Besides helping you access information from old sites, it also lets you see how a site’s design has changed over the years — all you need to do is enter the site’s URL. Enter “”, for example, and then click on different dates on the calendar to see just how clunky the streaming service used to look. The archived pages aren’t dynamic but instead show a snapshot of how it appeared on a particular day.
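If you would rather skip the calendar, a snapshot address can also be put together by hand. The sketch below assumes the common `web.archive.org/web/<timestamp>/<site>` address pattern, under which the Wayback Machine serves the capture nearest to the requested date; the function name and the `example.com` placeholder are purely illustrative.

```python
from datetime import date

# Assumed address prefix for archived captures.
WAYBACK_PREFIX = "https://web.archive.org/web"


def snapshot_url(site, day):
    """Build the address of the capture of `site` closest to `day`."""
    stamp = day.strftime("%Y%m%d")  # the date as YYYYMMDD
    return f"{WAYBACK_PREFIX}/{stamp}/{site}"


print(snapshot_url("example.com", date(2010, 1, 15)))
```

Paste the printed address into a browser to see how the page looked on or near that day.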

Many people who use Wikipedia and know about the Wayback Machine already use the tool to access a snapshot of the lost page, but the Internet Archive’s work to re-establish the links has helped to improve the usability of the site and also boost its credibility in the process.
