IMDb Message Board Archives: Are They Legal?

Screenshot of IMDb Message Boards

March 1, 2017

It’s no secret there are published archives of Internet Movie Database (IMDb) message boards, which were disabled on February 20, 2017. Users were given permission to archive content for personal use, but publishing archives could be a copyright violation. I’d love to read the archives as much as the next guy, but what fun would it be if I thought I was reading pirated material?

As important as the message boards are, respecting copyright law is more important. The Copyright Act’s goal is to fairly compensate creative efforts, and encourage creativity for the greater good. If everyone ignores that goal, it will be harder for people to create, and humanity will suffer as a whole because of it.

But what if content is no longer available from the copyright holder? What if it’s important information that would benefit society? Fortunately, the law isn’t black and white. It was written to accommodate unique circumstances, and tries to balance the public’s interest with the copyright holder’s interest.

What about the archives? Let’s see if we can’t figure out if it’s legal to publish them. We’ll start at the beginning:

The Copyright Holder

Generally speaking, when you post content to IMDb, you give them non-exclusive rights to it:

Your License to IMDb: If you do post content or submit material, and unless we indicate otherwise, you grant IMDb a nonexclusive, royalty-free, perpetual, irrevocable, and fully sublicensable right to use, reproduce, modify, adapt, publish, translate, create derivative works from, distribute, and display such content throughout the world in any media.

http://www.imdb.com/conditions

Non-exclusive rights give you, as well as IMDb, the right to use your content. IMDb does claim exclusive rights to the compilation of all content, but probably doesn’t want the liability of defending the exclusive rights of every individual submission.

The message boards, on the other hand, had their own superseding Terms and Conditions. The page with those terms can no longer be found on IMDb’s website, but it was archived, ironically enough, on Internet Archive:

Submissions: IMDb shall own exclusive rights, including all intellectual property rights, and shall be entitled to the unrestricted use of these materials for any purpose, commercial or otherwise, without acknowledgment or compensation to you.

http://web.archive.org/web/20161224074352/http://www.imdb.com/help/show_article?boardstc

So, when you posted content on IMDb message boards, you gave IMDb exclusive rights to it, meaning you gave up your right to use it, and no one but IMDb has the right to use it.

Some people have been complaining that they don’t want to see their posts on an archive, because they can’t edit or delete them. Unfortunately for them, they gave up control over what IMDb does with their content when they granted IMDb the rights to it. IMDb would have been well within its rights to archive the content, but chose not to do so:

We can archive the boards and make the content available. We have chosen not to do so. If we were to make a static archive available then that would include all of the inappropriate, abusive and off-topic discussions preserved; people would want the right to edit what they posted, to remove it, to report the content, to post follow-ups because things had changed and so on. By the time you build in all of those features, all you would have is an even more limited and inferior version of the boards which would also largely be going stale.

http://movietvforums.com/community/what-will-happen-to-the-original-database/col-needhams-response-to-archiving-the-content/#post-17

It’s evident that IMDb could have legally created an archive that the contributors had no control over. Our first legal obstacle is cleared. Moving on to:

The Data Scrapers

Google and other search engines use computer programs known as crawlers to index web pages and make them searchable, which is legal and usually desirable. A web scraper goes a step further and actually extracts data, possibly for reproduction on another website, and its legality is often decided in a court of law. The IMDb message board archives that are currently published on websites were obtained through data-scraping. The question is whether that data-scraping was legal.

If you don’t want your data crawled or scraped, there are ways of preventing it. One of those ways is the robots exclusion protocol, better known as robots.txt. IMDb uses robots.txt to discourage the crawling of some of its data, including the message boards, but it is not against the law to ignore the robots exclusion protocol.
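To make the protocol concrete, here is a minimal sketch of how a well-behaved crawler might check robots.txt before fetching a page, using Python’s standard urllib.robotparser module. The rules, the example.com URLs, the “ExampleBot” user agent, and the is_allowed helper are all made up for illustration; they are not IMDb’s actual robots.txt or anyone’s actual scraper.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules for illustration only; NOT IMDb's actual robots.txt.
EXAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /board/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines; no network request is made here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# A message board page falls under the hypothetical Disallow rule.
board_ok = is_allowed(
    EXAMPLE_ROBOTS_TXT, "ExampleBot", "https://www.example.com/board/thread/1"
)
# A title page is not covered by any Disallow rule.
title_ok = is_allowed(
    EXAMPLE_ROBOTS_TXT, "ExampleBot", "https://www.example.com/title/tt0000001/"
)

print("board page allowed:", board_ok)
print("title page allowed:", title_ok)
```

Note that the check is entirely voluntary. Nothing in the protocol stops a scraper from skipping it, which is why robots.txt is a request rather than a lock.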

In copyright cases, the use or absence of robots.txt is likely considered on a case-by-case basis, with courts examining the specific circumstances of each case, and determining if the Copyright Act would be served by their ruling.

In Associated Press v. Meltwater Holdings U.S., Inc., the court rejected the defense that the protocol hadn’t been deployed, arguing that copyright owners should not be forced to use the protocol (which deters search engines and can make it difficult to recover lost data) or they lose the right to prevent unauthorized use. Conversely, in Parker v. Yahoo, Inc., the court ruled that Yahoo had an implied license to create cache copies because the plaintiff knew they would do so in the absence of the protocol.

With respect to future cases involving use of scraped content . . . courts are likely to follow a similar analysis driven by the facts of the specific case. Issues regarding whether the copying is momentary, whether the information extracted is factual, the effect on the market value of the copyrighted material, and the amount and substantiality of the material used are likely to be key issues in these cases . . . Courts are also likely to consider, in the context of defenses to copyright claims, the specific circumstances relating to a website’s deployment of the robots.txt protocol, including whether the defendant has a practice or policy of complying with the protocol if deployed.

https://www.bna.com/legal-issues-raised-by-the-use-of-web-crawling-and-scraping-tools-for-analytics-purposes

What about IMDb’s terms, which state: “Robots and Screen Scraping: You may not use data mining, robots, screen scraping, or similar data gathering and extraction tools on this site, except with our express written consent as noted below.”?

Just having a page with those terms doesn’t always hold up in a court of law.

Basically, scrapers follow the same paths as visitors and search engines, which can be done without explicitly accepting any terms. In Nguyen v. Barnes & Noble, Inc., the court stated that “whether the user knew about the terms depended upon the design and content of the website and the agreement’s web page, noting that where the link to a website’s terms of use is buried at the bottom of the page or tucked away in obscure corners of the website, courts have refused to enforce such an agreement.”

IMDb’s terms are buried in the bottom right-hand corner of their web page, and we all know nobody looks at anything on the bottom of their web pages, especially their message boards, but also their terms.

Actual knowledge of the terms is sometimes demonstrated by the fact that the defendant was issued a cease and desist warning from the plaintiff. As far as I know, this did not occur.

A breach of contract claim also requires a showing of damages. Crawlers and scrapers can cause high bandwidth consumption and server overload. That’s probably why the message boards were crashing for everyone in the final days. More than likely, IMDb knew the message boards were being scraped, and chose not to do anything about it. Device detection specialists could have identified and deterred the scrapers, cease and desist warnings could have been issued, or the message boards could have just been shut down completely. If IMDb was aware of the data-scraping and chose not to act on it, that arguably implies consent to the scraping.

So, the scraping might have been OK. These things are often decided on a case-by-case basis, but I’m leaning towards legal. Of more importance is what happened to the data once it was scraped. Which brings us to:

The Fair Use Doctrine

Copyright law is an aberration in the realm of the freedom of speech. To address this contradiction, there are some exceptions to copyright law, and one of those is the fair use doctrine. The purpose of the fair use doctrine is to allow for limited and reasonable uses as long as they do not interfere with owners’ rights. The courts consider four factors when evaluating fair use:

  • (1) the purpose and character of the use, including whether such use is of a commercial nature or is for nonprofit educational purposes;
  • (2) the nature of the copyrighted work;
  • (3) the amount and substantiality of the portion used in relation to the copyrighted work as a whole; and
  • (4) the effect of the use upon the potential market for or value of the copyrighted work.

https://www.law.cornell.edu/uscode/text/17/107

Certain uses are favored in the statute, including education and research, but that’s no guarantee it will be considered fair use. The four factors have to be weighed together as a whole, and the courts try to balance the public’s interest with the copyright holder’s interest.

The statute was written with the flexibility to allow it to evolve through case law with new circumstances and new types of uses. Unfortunately, that flexibility also makes it difficult to predict if a use would be considered fair. Again, these cases are often decided on a case-by-case basis.

Fortunately, there is case law we can reference when considering if publishing the IMDb message board archives is legal under the fair use doctrine:

In Time Inc. v. Bernard Geis Associates, the court determined the nature of the work was more important to the public good than the copyright holder’s interest. This case concerns an analysis of the famous Zapruder film of John F. Kennedy’s assassination. The court determined the public’s interest in the analysis of the work, which was made possible by the infringement of copyrighted material, outweighed the copyright holder’s interest.

https://www.law.cornell.edu/copyright/cases/293_FSupp_130.htm

https://www.unemed.com/blog/six-seconds-in-dallas-fair-use-and-the-kennedy-assassination

In Righthaven LLC v. Jama, a nonprofit organization posted a newspaper article in its entirety on its website. This was considered transformative fair use because the nonprofit’s purpose in posting it was to educate the public about immigration issues. In the related Righthaven LLC v. Hoehn order, Judge Philip Pro noted that “Noncommercial, nonprofit use is presumptively fair.”

http://fairuse.stanford.edu/overview/fair-use/cases/

http://www.dmlp.org/sites/citmedialaw.org/files/2011-06-20-Order%20Granting%20Mot%20to%20Dismiss%20in%20Righthave%20v%20Hoehn%20Order.pdf

In Kelly v. Arriba Soft Corp., thumbnail reproductions of images were posted on a website, but they were smaller and of poorer quality than the originals, and helped to serve the public by indexing them. Although the defendant used the images for commercial purposes, their use was considered transformative and not a substitute for the originals, and therefore did not undermine the potential market for those images.

http://fairuse.stanford.edu/overview/fair-use/cases/

https://www.bna.com/legal-issues-raised-by-the-use-of-web-crawling-and-scraping-tools-for-analytics-purposes

The three fair use cases above illustrate examples of when:

  • The nature of the use is more important to the public welfare than the copyright holder’s interest.
  • The purpose of the use is non-commercial and nonprofit.
  • The use did not affect the potential market for the copyright holder.

These rulings could all be applied to the IMDb message board archives:

  • The community created the vast repository of information that was contained in the message boards, and that community feels they are important, so they believe they have a right to it that transcends the copyright holder’s right.
  • If the archives are posted on a non-commercial, nonprofit website, it is presumptively fair use.
  • Since the message boards are technically out-of-print, their publication could in no way undermine IMDb’s potential market for them. It could theoretically be argued their negativity reflects badly on IMDb, but that could be difficult to quantify in market terms.

But perhaps the bigger question that should be answered is this: How does Internet Archive get away with publishing copyrighted material?

Internet Archive: Wayback Machine

Internet Archive: Wayback Machine, founded in 1996, is a nonprofit digital library that preserves digital media, and provides online access to over a million users a day. Their stated goal is universal access to all knowledge. From their website:

Most societies place importance on preserving artifacts of their culture and heritage. Without such artifacts, civilization has no memory and no mechanism to learn from its successes and failures. Our culture now produces more and more artifacts in digital form. The Archive’s mission is to help preserve those artifacts and create an Internet library for researchers, historians, and scholars. The Archive collaborates with institutions including the Library of Congress and the Smithsonian.

Do you collect all the sites on the Web?

No, the Archive collects web pages that are publicly available. We do not archive pages that require a password to access, pages that are only accessible when a person types into and sends a form, or pages on secure servers. Pages may not be archived due to robots exclusions and some sites are excluded by direct site owner request.

http://archive.org/about/faqs.php#The_Wayback_Machine

It’s not without controversy:

“We’re sure that there are going to be a lot of people who want to be excluded,” says Kahle, although he notes that in the Internet Archive’s five-year history 90 percent of the complainers have become converts after hearing that the nonprofit’s primary goal is simply to preserve history, not to profit off it. Kahle says it has typically been individuals, not companies, who are most concerned about protecting their intellectual property — or future privacy.

http://www.salon.com/2001/11/02/wayback/

It could be said they put themselves at considerable legal risk:

The fact that the Internet Archive does not request permission to copy websites before remote harvesting puts it at considerable legal risk. Most rightholders may be satisfied with an a-priori or a-posteori opt-out system, but assuming that all will be satisfied is not necessarily a safe way to operate. Neither system overrides the rightholder’s automatic copyright, and offering to take down copied material after the fact is not recognised as sufficient protection should a law suit be brought forward (Charlesworth, 2003).

https://blogs.city.ac.uk/ludiprice/files/2015/03/Internet-Archiving-The-Wayback-Machine-v0rykw.pdf

Which, naturally, led to a lawsuit in December 2005, Internet Archive v. Suzanne Shell, in which Shell took issue with Internet Archive posting copies of her website on their archive:

(I)n Internet Archive v. Shell, the Internet Archive sought dismissal on preemption grounds of the plaintiff’s claim for breach of contract relating to Internet Archive’s crawling and indexing of plaintiff’s website in violation of terms of use that prohibited any copying of plaintiff’s website for a “commercial or financial purpose.” The court rejected Internet Archive’s preemption argument, finding that Internet Archive’s alleged agreement to refrain from use of the material on plaintiff’s website “for commercial or financial purposes … lie(s) well beyond the protections (the website owner) receives through the Copyright Act.” The court reached this conclusion despite the fact that the Internet Archive is a nonprofit entity—apparently on the basis of disputed allegations that Internet Archive’s copying of the content at issue allowed it to “acquir(e) … grant awards, donations, … and the expectation of acquiring additional intellectual property.”

https://www.bna.com/legal-issues-raised-by-the-use-of-web-crawling-and-scraping-tools-for-analytics-purposes

It seems the court felt Internet Archive indirectly benefited financially from her work, which led to a settlement in April 2007:

The Internet Archive said, “Internet Archive has no interest in including materials in the Wayback Machine of persons who do not wish to have their Web content archived. We recognize that Ms. Shell has a valid and enforceable copyright in her Web site and we regret that the inclusion of her Web site in the Wayback Machine resulted in this litigation. We are happy to have this case behind us.” “I respect the historical value of Internet Archive’s goal. I never intended to interfere with that goal nor cause it any harm,” said Ms. Shell.

https://archive.org/post/119669/lawsuit-settled

Despite that, Internet Archive, for the most part, gets away with publishing copyrighted material because:

  • They are nonprofit.
  • They are for research and educational purposes.
  • They honor robots exclusion protocol.
  • They will remove any disputed material.

How does that apply to the IMDb message board archives that are floating around? Let’s add up what we have so far:

Key Factors in Determining the Legality of IMDb Message Board Archives

  • The contributors of the content have no control over it: The exclusive rights were given to IMDb.
  • The data-scraping could well have been legal: It’s not illegal to ignore robots.txt, the terms could be unenforceable, and no cease and desist warnings were issued.
  • Publishing the archives could be legal if it is not-for-profit, it is for research or educational purposes, it doesn’t impact the future earnings for the copyright holder, and no cease and desist warnings have been issued.

What does that mean for the two currently published archives?

Archiveteam.org

Archiveteam.org’s IMDb message board archive is published on Internet Archive. “ArchiveTeam’s web archives will be available to everyone without restrictions or profit as usual.” Rumor has it they intend to make a browsable version of their files. Right now, the data is not in a format that is easy to access.

My Verdict:

If an IMDb message board archive is on Internet Archive, and IMDb or the content creators do not request exclusion, then I’m calling it as legal as anything else you would find on Internet Archive. I suspect far more people would be glad to see it there than would like to see it taken down.

MovieChat.org

The legality of MovieChat.org’s IMDb message board archive is murkier. The website was created on February 8, 2017, but its business status is unknown. The creator does ask for donations, but that doesn’t necessarily make it either for profit or not-for-profit. It certainly could be argued it’s for research or educational purposes, much as Internet Archive is. It does appear the creator will follow the Oakland Archive Policy, which recommends the removal of archived material upon request. From the website:

Are there legal concerns about archiving IMDB’s posts?

No. Before I became a software engineer, I was a lawyer for 4 years practicing in the IP and copyright space, and thus I have a deep understanding of the current situation, which is partly why I decided to start this site. Furthermore, I have spoke with several individuals at AMZN/IMDB, all of whom state that the message boards are of absolute minimum value/concern to the company, and it is highly unlikely that the company would pursue any legal action should someone attempt to archive the boards. In fact, IMDB was actually considering creating an API / data extraction tool to allow users to retrieve whatever they wanted before the boards shut down, but they decided it just wasn’t enough of a priority for them (it wouldn’t bring in any revenue, so why should they do it?). The (much) more likely scenario is that a few individual users may directly request to have their posts removed (much like you can request Google to remove personal information about you from its search results). Such an action would have a very minimal effect on the overall content of our site.

https://www.moviechat.org/movies/general/posts/58a28ce74dcf9300116f391d

IMDb has presumably been notified of the MovieChat.org archive:

Before you posted, I had already sent IMDb a note informing them about (MovieChat.org).

https://www.themoviedb.org/talk/58a20110c3a36879160006c9

This was posted on IMDb’s board on Get Satisfaction, where Col Needham (the CEO of IMDb) is the official rep:

My name’s Jim, and I created http://MovieChat.org as a replacement for IMDB’s message boards.  Key Features:
1. Any movie/show on IMDB is also on MovieChat.org – we have separate boards for each movie/show, just like IMDB
2. I backed up most of the posts for IMDB’s top 10,000 movies/shows – most existing conversations on IMDB should also appear on MovieChat.org – we have over 3 million posts already (and I’m working non-stop to back up even more from IMDB)!

https://getsatisfaction.com/imdb/topics/imdb-message-boards?topic-reply-list[settings][filter_by]=all&topic-reply-list[settings][page]=8#topic-reply-list

At this point, it appears IMDb has knowledge of MovieChat.org’s archive, and if they don’t issue a cease and desist warning, then it seems they are giving MovieChat.org implied consent to publish their copyrighted material.

My Verdict:

I’m calling it legal for now, but it would ultimately have to be decided in a court of law—and only if it gets to that point.

 

Whew, after all that, I feel I can finally peruse the IMDb message board archives with a clear conscience. Also, someone has created a script that inserts the old message boards back into the appropriate IMDb page, right where they used to be, so it’ll be like it never even happened.

 

Resources of Interest:

Copyright Issues Relevant to the Creation of a Digital Archive: A Preliminary Assessment

Summaries of Fair Use Cases

Use of Online Data in the Big Data Era: Legal Issues Raised by the Use of Web Crawling and Scraping Tools For Analytics Purposes

6 thoughts on “IMDb Message Board Archives: Are They Legal?”

  1. Great analysis! I loved the way you laid out your argument, and like Fritzelly says, it’s good stuff to know.

    “The page with those terms can no longer be found on their website, but it was archived, ironically enough, on Internet Archive”

    “we all know nobody looks at anything on the bottom of their web pages, especially their message boards, but also their terms”

    Loved those lines.

    1. Thanks, Instar808. When IMDb said hardly anyone even knew about the message boards, a lot of people said, “Of course not, that’s because they’re buried on the bottom of your web page,” so I couldn’t resist.

      And I don’t know what I’d do without Internet Archive.

  2. Great article with great source material. I was wondering if exclusive rights still apply if something goes out of print, like the IMDb message boards? Keep up the good work!

    1. That’s a good question. I almost included the “out of print” or “reversion of rights” method of regaining copyright in my article. Basically, it means that if your work goes out-of-print, the rights revert back to you, so you have another chance to exploit your work. It prevents your hands from being tied if the copyright holder is no longer publishing your work.

      This could certainly apply to the IMDb message boards, which are now technically “out-of-print.” It could be claimed that republishing them could in no way undermine their market value to IMDb, unless IMDb claims the negativity reflects badly on them.

      I didn’t include it because usually a “reversion of rights” clause has to be in the contract, which it wasn’t in IMDb’s case (it can be requested). Ultimately, I decided it was beyond the scope of this article, which deals with the right of a third party to publish the archives.

      Thanks for your question and compliment!

      “Out of Print” Clauses

      DIY: Reversion of Rights
       
