How to Access Pages Missing from the Internet

Using the Wayback Machine to Find Lost Pages

404 pages have gotten more creative over the years:

404 pages from GitHub (left) and HopperMagic (right) (Source)

However, that does not make them any less annoying, especially when you are searching for critical data. Pages can disappear for many reasons: someone forgot to pay the hosting fees, a government deemed the information subversive, an individual tried to scrub records from the web, or the infrastructure simply failed. The average life of a webpage has been variously reported as 44, 75, and 100 days; whatever the exact number, the Internet is leaky, and content is not guaranteed to stay around forever.

Enter the Wayback Machine: install the Wayback Machine Chrome extension to unlock disappeared pages that have been saved in the Internet Archive:

Wayback Machine in operation on a missing page using the Chrome extension.

You can also go to http://web.archive.org/ and search for the URL:

Saved versions of a previously inaccessible website (here)

This shows you all the versions of the webpage saved over time. If you want to see previous versions of a website, paste in the URL and time travel.

Saved versions of towardsdatascience.com
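The same lookup can be scripted. Below is a minimal sketch, assuming the Internet Archive's public availability endpoint at https://archive.org/wayback/available and its JSON layout (an "archived_snapshots" object holding a "closest" entry). The function names are hypothetical and chosen for illustration; check the endpoint's current documentation before relying on the response format.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoint: the Wayback Machine availability API.
AVAILABILITY_API = "https://archive.org/wayback/available"


def availability_query(page_url, timestamp=None):
    """Build the query URL asking whether page_url has a saved snapshot.

    timestamp is optional, in YYYYMMDD or YYYYMMDDhhmmss form; when given,
    the service is asked for the snapshot closest to that moment.
    """
    params = {"url": page_url}
    if timestamp:
        params["timestamp"] = timestamp
    return f"{AVAILABILITY_API}?{urlencode(params)}"


def closest_snapshot(response):
    """Pull the snapshot URL out of a decoded JSON response, or return None."""
    closest = response.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest["url"]
    return None


def find_snapshot(page_url, timestamp=None):
    """Query the service over the network and return a snapshot URL or None."""
    with urlopen(availability_query(page_url, timestamp), timeout=10) as resp:
        return closest_snapshot(json.load(resp))
```

Calling find_snapshot("towardsdatascience.com", "20180101") should return a web.archive.org link to the capture nearest that date if one exists, and None otherwise. The network call is kept separate from the parsing so that the parsing can be checked against a saved response.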

The Wayback Machine is a digital archive of the World Wide Web and other information on the Internet developed and maintained by the Internet Archive, a non-profit with the modest goal of “archiving the entire Internet and providing universal access to all knowledge”. They maintain a library of digital content free for anyone to access. To date it contains:

All of this information — growing by 15 TB per day (as of 2016) — lives on physical infrastructure at the Internet Archive. Currently, there are over 20,000 disks of up to 8 TB trying to archive the entirety of human knowledge.
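To put those figures in perspective, here is a back-of-the-envelope calculation. It is only a rough sketch: it treats every disk as holding the full 8 TB (the text says "up to"), uses decimal units (1 PB = 1,000 TB), and assumes the 2016 growth rate stays constant over a year.

```python
# Figures quoted in the article.
DISKS = 20_000
MAX_DISK_TB = 8
GROWTH_TB_PER_DAY = 15


def max_capacity_pb(disks=DISKS, disk_tb=MAX_DISK_TB):
    """Upper bound on raw capacity in petabytes (every disk at full size)."""
    return disks * disk_tb / 1000


def yearly_growth_pb(tb_per_day=GROWTH_TB_PER_DAY):
    """Data added per year in petabytes at a constant daily rate."""
    return tb_per_day * 365 / 1000


print(max_capacity_pb())   # 160.0
print(yearly_growth_pb())  # 5.475
```

In other words, the quoted hardware tops out at roughly 160 PB of raw capacity, and the archive was adding about 5.5 PB per year at the 2016 rate.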

This may seem a little overwhelming, but on a practical level, you can use the Internet Archive’s Wayback Machine to access missing web pages that have been saved. If you’re worried about a page being taken down (maybe because it’s controversial), you can also save it through the Chrome extension or the Internet Archive website. Other tools, such as Chrome’s cached pages, may also let you see old web pages. As with all human infrastructure, the Internet and its digital tools crumble over time. Nonetheless, the Internet Archive and the Wayback Machine give you one way to fight the decay.
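Saving a page can also be requested from a script. The sketch below assumes the "Save Page Now" URL pattern, https://web.archive.org/save/ followed by the page address. The helper names and the User-Agent string are hypothetical, and the service may rate-limit or reject automated requests, so treat this as a starting point rather than a guaranteed recipe.

```python
from urllib.request import Request, urlopen

# Assumed endpoint prefix for the Wayback Machine's "Save Page Now" feature.
SAVE_ENDPOINT = "https://web.archive.org/save/"


def save_request(page_url):
    """Build (but do not send) a request asking for a capture of page_url."""
    return Request(
        SAVE_ENDPOINT + page_url,
        headers={"User-Agent": "wayback-save-sketch/0.1"},  # hypothetical client name
    )


def save_page(page_url):
    """Send the capture request and return the HTTP status code.

    urlopen raises an HTTPError for 4xx/5xx responses, so a returned status
    only indicates that the request itself went through.
    """
    with urlopen(save_request(page_url), timeout=60) as resp:
        return resp.status
```

For example, save_page("https://example.com/") asks the archive to capture that page. It is worth confirming the capture afterwards by searching for the URL on the Internet Archive website.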

As always I welcome feedback and constructive criticism. You can find me on Twitter @koehrsen_will.




Data Scientist at Cortex Intel, Data Science Communicator

Will Koehrsen
