December 29th, 2008 by Larry Donahue

Debunking the Wayback Machine

The Internet Archive (www.archive.org) was founded in 1996 by Brewster Kahle, then a search-engine whiz and dot-com multimillionaire, with a dream: “He wanted to back up the Internet.” It quickly became the largest publicly accessible, privately funded digital archive in the world. That same year, in April 1996, Mr. Kahle also co-founded Alexa Internet, which was sold to Amazon.com in 1999.

When the Internet Archive was founded, there were approximately 50 million unique URLs. In July of 2008, Google claimed that it had found approximately 1 trillion unique URLs on the web, with every indication that growth will continue to explode. Over the years, the Internet has become a significant driver of commerce and, increasingly, the subject matter of litigation. The Internet Archive’s Wayback Machine has provided evidence in that litigation, for plaintiffs and defendants alike, and has become a very important tool for attorneys and litigants.

The problem is that most attorneys (and even highly paid expert witnesses) don’t have enough technical experience to truly appreciate the limitations of the Wayback Machine, and they often misuse or misinterpret its results.

I was recently hired as an expert witness by a small Internet company to help defend against a lawsuit from a major music publisher. The case looked absolutely hopeless: the publisher had spent an exorbitant sum on an expert witness, who appeared to have built a watertight case against my client using information provided by the Wayback Machine.

As I read the report of the plaintiff’s expert witness, it became clear to me that the expert witness had absolutely no understanding of the limitations of the Wayback Machine and, as a result, had completely misinterpreted the results. Within a few short weeks, I was able to completely discredit the expert witness, thereby undermining the plaintiff’s case. (This case is still ongoing and has not yet reached final disposition.)

I am currently working on a paper, which I am calling “Debunking the Wayback Machine,” that will detail the advantages and disadvantages of using the Wayback Machine in litigation. The paper will discuss the technical issues and provide easy-to-follow steps and guidelines on how to carefully examine and apply the results of the Wayback Machine for testimonial purposes. Most importantly, it will explain how to discredit anyone who blindly relies on the results of the Wayback Machine to prove or disprove a case.

Consider the issues.

First, the Wayback Machine relies on important disclaimers (for a reason). See www.archive.org/legal/affidavit.php and www.archive.org/about/terms.php.

Second, one should never take the Wayback Machine at face value:

  • It does make changes to the underlying HTML.
  • It can and does make mistakes (usually based on errors or other problems from webservers, the systems running the websites being backed up).
  • Dates don’t always align with what you see (check the dates and links for ALL links, frames and images on each and every page).
  • The dates are mere snapshots, and don’t necessarily represent all the changes that have occurred on a website.
  • It cannot see any text contained within images.
  • It cannot see any information that is accessed from a form (e.g. data contained within a database).
  • In general, it cannot see any web pages that depend on scripts (although there are exceptions).
  • It can paint a false representation of a web page if that web page uses dynamic technology (i.e. technology that produces a result after querying a backend script or database, including but not limited to Flash, ActiveX or AJAX technologies).
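
The date check in the list above can be partly automated. What follows is only a minimal Python sketch, not a definitive tool. It assumes that archived resources are served from addresses of the form web.archive.org/web/<14-digit timestamp>/<original URL>, and the function names (capture_time, date_gaps) and the example.com addresses are hypothetical, introduced purely for illustration. It reports how many days separate each embedded resource from the capture date of the page that displays it.

```python
import re
from datetime import datetime

# Assumption: a capture URL embeds a 14-digit timestamp (YYYYMMDDhhmmss),
# optionally followed by a short modifier such as "im_" for images.
WAYBACK_RE = re.compile(r"web\.archive\.org/web/(\d{14})[a-z_]*/")


def capture_time(url):
    """Return the capture datetime embedded in a Wayback URL, or None."""
    match = WAYBACK_RE.search(url)
    if match is None:
        return None  # not an archived URL (possibly the live website)
    return datetime.strptime(match.group(1), "%Y%m%d%H%M%S")


def date_gaps(page_url, resource_urls):
    """Map each resource URL to its offset in whole days from the page capture.

    A value of None means the resource URL carries no capture timestamp
    at all and deserves a closer look.
    """
    page_time = capture_time(page_url)
    if page_time is None:
        raise ValueError("page_url is not a Wayback capture URL")
    gaps = {}
    for url in resource_urls:
        resource_time = capture_time(url)
        if resource_time is None:
            gaps[url] = None
        else:
            gaps[url] = (resource_time - page_time).days
    return gaps


if __name__ == "__main__":
    page = "http://web.archive.org/web/20050101000000/http://example.com/"
    resources = [
        "http://web.archive.org/web/20050311000000im_/http://example.com/logo.gif",
        "http://www.example.com/banner.gif",
    ]
    for url, gap in date_gaps(page, resources).items():
        print(gap, url)
```

In this hypothetical run, the logo was captured 69 days after the page it appears on, and the banner has no capture date at all. Neither fact is visible in a screen shot.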

And third, the Wayback Machine isn’t always so way back: It can include links back to the existing website. When you’re referencing objects at the existing website, you’re accessing information that exists today, not information from the date you think you’re referencing in the Wayback Machine. Pay special attention to:

  • Images,
  • Forms,
  • Information from database queries, and
  • Framesets
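
One way to look for such references is to read the archived page’s HTML rather than its rendered appearance. The sketch below is a rough illustration in Python under stated assumptions, not a forensic tool. It treats any reference whose host is archive.org (or a subdomain of it) as archived and everything else as live, it inspects only image, form and frame tags, and the names ReferenceAuditor and audit, along with the example.com addresses, are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

# Tag/attribute pairs for the items listed above: images, forms and frames.
WATCHED = {"img": "src", "form": "action", "frame": "src", "iframe": "src"}


class ReferenceAuditor(HTMLParser):
    """Sort a page's references into archived and live buckets."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.archived = []  # (tag, reference) pairs served by the archive
        self.live = []      # (tag, reference) pairs served from elsewhere

    def handle_starttag(self, tag, attrs):
        attr = WATCHED.get(tag)
        if attr is None:
            return
        value = dict(attrs).get(attr)
        if not value:
            return
        # Resolve relative references against the page URL, then classify
        # by host only; the reference itself is stored as written.
        host = urlparse(urljoin(self.base_url, value)).netloc.lower()
        is_archived = host == "archive.org" or host.endswith(".archive.org")
        bucket = self.archived if is_archived else self.live
        bucket.append((tag, value))


def audit(html, base_url):
    """Return (archived, live) reference lists for one archived page."""
    auditor = ReferenceAuditor(base_url)
    auditor.feed(html)
    return auditor.archived, auditor.live


if __name__ == "__main__":
    page_url = "http://web.archive.org/web/20050101000000/http://example.com/"
    sample = (
        '<img src="http://web.archive.org/web/20050101000000im_/'
        'http://example.com/a.gif">'
        '<img src="http://www.example.com/b.gif">'
        '<form action="http://www.example.com/search.cgi"></form>'
    )
    archived, live = audit(sample, page_url)
    for tag, ref in live:
        print("LIVE:", tag, ref)
```

Anything that lands in the live list is fetched from, or submitted to, today’s website, and should not be presented as archived information without further examination. Scripts can also build such references at run time, which this sketch does not attempt to detect.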

These limitations can have a profound impact on what the Wayback Machine delivers, and my paper discusses that impact in depth. Consider these two examples, which I have personally witnessed in the past year:

  • In one example, the Wayback Machine had archived a particular website for years. When you view that website as it stood several years ago, the Wayback Machine shows all the web pages, including a form. The form, however, references an actual script that sits on today’s website (i.e. the script itself is not backed up on the Wayback Machine). Accessing the form through the Wayback Machine therefore gives the false impression that when you hit “submit,” you’re getting the results from several years ago. This is incorrect: when you hit “submit,” the Wayback Machine sends the query to the existing website, and therefore you’re getting today’s information. This is a difficult concept to grasp, and it is made all the more difficult in litigation because most expert reports contain mere screen shots, when a careful examination of the underlying links, data and information provided by the Wayback Machine is needed to properly assess the accuracy and relevance of that information to the case at hand.
  • In another example, the Wayback Machine had archived some, but not all, images of a particular website. When viewing a backed-up website through the Wayback Machine, you see a complete web page, but when you carefully examine the links, you find that not all the images shown have been backed up. A few of the images (in this case, including a very important one) continue to be referenced from the existing website. Anything that comes from outside the Wayback Machine is not archived information, and is therefore subject to change and manipulation over time. In this case, the image in question was key to the case: It provided specific information about the company that a plaintiff attempted to use in litigation.

In conclusion, I believe it’s attorney malpractice to let the opposing side use results from the Wayback Machine in litigation (or to influence settlement or the outcome of a case) without consulting an expert who can carefully examine and scrub the results provided by the Wayback Machine.

Stay tuned for my paper. If you have any questions or are dealing with a matter that involves evidence provided from the Wayback Machine, please feel free to contact us at your earliest convenience.

5 Responses to “Debunking the Wayback Machine”

jack grimes

December 29th, 2008 - 9:15 pm

Great article. Please let me know how to get your published paper.
–jack

Ron Bader

December 30th, 2008 - 8:28 am

Your article was very informative and I am very interested in reading your paper when it is published. I will appreciate it greatly if you will send me information on how to obtain a copy.

Thanks,
Ron

Larry Donahue

December 30th, 2008 - 4:33 pm

Thank you, Ron and Jack. I intend to release the article in late January (or early February at the latest), and will send you guys a copy.

Take care and have a GREAT holiday season!

Larry.

David Sandlin

January 5th, 2009 - 4:52 pm

Larry,

Good article. I have been working in digital forensics for a while now and have tried to use the Wayback Machine. I found many of the same issues you have listed. Thanks for taking the time to actually document the shortcomings.

David

25 Years and Still Counting « Techno Cat

March 23rd, 2010 - 8:01 pm

[…] Mr. Donahue further states that the WaybackMachine archive may also make changes to HTML as well as making other mistakes. See: Debunmking the Wayback Machine […]
