A major Wikipedia project fixed millions of old, broken links

Wikipedia’s enormous army of editors do their best to jump on pages showing erroneous information or quickly rewrite entries that have been tampered with by a miscreant, but occasionally the false information stays up for longer than you’d like.

With that in mind, many people who use the online encyclopedia like to follow the third-party links at the bottom of the page, which point to the sources for the information in the main article. Those links should not only confirm the information in the Wikipedia article but also offer more depth on the subject, making them an invaluable resource for those wishing to dig deeper into a particular topic.

The trouble is that sometimes those articles, whether from the news media, educational institutions, businesses, or research establishments, are taken offline, resulting in a broken link. This can undermine the credibility of Wikipedia for those looking to verify information appearing in an entry.

The good news is that a team of volunteers from the Internet Archive has been able to restore a colossal nine million broken links on Wikipedia, helping to make those annoying “404 error/page not found” messages a thing of the past.

The Internet Archive is a non-profit digital library that has been archiving web pages since 1996, when the web as we know it today was in its earliest stages of development. So yes, among its staggering 338 billion archived web pages are many of the ones that Wikipedia linked to but which have since been taken offline.

The Internet Archive’s Mark Graham explained in a blog post this week how it’s been archiving nearly every URL referenced on different Wikipedia sites the moment those links are added or changed — at the rate of about 20 million URLs a week.

It’s also been running a software robot called IABot on more than 20 Wikipedia language editions searching for broken links, Graham wrote. When it finds broken links, IABot looks for archives in the Wayback Machine — a searchable database for web pages — and other web archives to replace them with.
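To make the idea concrete, here is a minimal Python sketch of the lookup step: given a dead link, ask a web archive for its closest snapshot and use that as the replacement. It is not IABot's actual code, which the article does not describe. It assumes the Wayback Machine's public availability endpoint at archive.org/wayback/available and the JSON shape that endpoint is understood to return; the function names and the sample response are hypothetical, made up for illustration.

```python
import json
from urllib.parse import urlencode

# Assumption: the Wayback Machine's public "availability" endpoint.
AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def availability_query(url, timestamp=None):
    """Build the lookup URL for a page, optionally near a YYYYMMDD timestamp."""
    params = {"url": url}
    if timestamp:
        params["timestamp"] = timestamp
    return AVAILABILITY_ENDPOINT + "?" + urlencode(params)


def closest_snapshot(response_text):
    """Return the archived copy's URL from a JSON response, or None if absent."""
    data = json.loads(response_text)
    closest = data.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest.get("url")
    return None


def replacement_for(dead_url, response_text):
    """Pick the archived URL if one exists, otherwise keep the original link."""
    return closest_snapshot(response_text) or dead_url


# Illustrative response only (not real data from the archive).
SAMPLE_RESPONSE = json.dumps({
    "archived_snapshots": {
        "closest": {
            "available": True,
            "url": "http://web.archive.org/web/20130919044612/http://example.com/",
            "timestamp": "20130919044612",
            "status": "200",
        }
    }
})

if __name__ == "__main__":
    print(availability_query("example.com", "20130919"))
    print(replacement_for("http://example.com/", SAMPLE_RESPONSE))
```

The sketch parses a canned response rather than calling the network, so it runs offline. A real tool would fetch the query URL, and would also need to confirm that the original link is actually dead before swapping it.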

“Restoring links ensures Wikipedia remains accurate and verifiable and thus meets one of Wikipedia’s three core content policies: ‘Verifiability,’” Graham wrote.

The team plans to continue with its efforts to check and fix links on more Wikipedia sites and increase the speed of its system, as well as look at how it can extend its operation beyond the online encyclopedia.

On a side note, the Wayback Machine is a fun tool anyone can use. Besides helping you access information from old sites, it also lets you see how a site’s design has changed over the years. All you need to do is enter the site’s URL. Enter “youtube.com”, for example, and then click on different dates on the calendar to see just how clunky the streaming service used to look. The archived pages aren’t dynamic; each one shows a snapshot of how the site appeared on a particular day.

Many people who use Wikipedia and know about the Wayback Machine already use the tool to access a snapshot of the lost page, but the Internet Archive’s work to re-establish the links has helped to improve the usability of the site and also boost its credibility in the process.

Trevor Mogg
Contributing Editor
Not so many moons ago, Trevor moved from one tea-loving island nation that drives on the left (Britain) to another (Japan)…