Internet Archive | Rec 2007

Title: The Internet Archive and the Legal Battleground of 2007: A Case Study in Digital Preservation and Copyright

“It was our hangout,” Pete said. “We posted pictures, ranted about the new shift manager, shared recipes. When the layoffs started in late ’07, everyone poured their hearts out there. But the domain expired in 2009.”

The Short Answer (TL;DR)

In late 2007, the Internet Archive's massive web crawling operation (code-named "rec 2007") inadvertently triggered a global email meltdown. A misconfigured crawler visiting millions of websites harvested thousands of email addresses belonging to auto-responders (such as "out-of-office" replies and "mailer-daemon" bounce handlers) and then began emailing them, setting off infinite reply loops. The resulting flood overwhelmed email servers worldwide, crashed systems at major universities and corporations, and forced the Internet Archive to halt all crawling for several days.
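The loop mechanism described above can be illustrated with a small simulation. This is a minimal sketch, not a reconstruction of the Archive's crawler: the two responders, their addresses, and the helpers `auto_reply` and `count_replies` are hypothetical. It uses the `Auto-Submitted` header from RFC 3834, which marks machine-generated mail; the RFC recommends that responders not answer messages carrying any value other than "no".

```python
from email.message import EmailMessage
from typing import Optional


def is_automatic(msg: EmailMessage) -> bool:
    """RFC 3834: any Auto-Submitted value other than "no" marks the message
    as machine-generated. A missing header is treated as "no"."""
    return str(msg.get("Auto-Submitted", "no")).strip().lower() != "no"


def auto_reply(incoming: EmailMessage, responder: str, guard: bool) -> Optional[EmailMessage]:
    """Hypothetical out-of-office responder. With guard=False it answers every
    message, including other auto-replies, which is what lets a loop form."""
    if guard and is_automatic(incoming):
        return None  # do not answer another machine
    reply = EmailMessage()
    reply["From"] = responder
    reply["To"] = str(incoming["From"])
    reply["Subject"] = "Out of office: " + str(incoming["Subject"] or "")
    reply["Auto-Submitted"] = "auto-replied"  # mark our own reply as automatic
    return reply


def count_replies(guard: bool, limit: int = 50) -> int:
    """Send one human message between two responders and count the replies
    sent before the exchange stops (or hits the safety limit)."""
    a, b = "a@example.com", "b@example.com"  # hypothetical addresses
    msg = EmailMessage()
    msg["From"] = b
    msg["To"] = a
    msg["Subject"] = "hello"
    sent, receiver = 0, a
    while sent < limit:
        reply = auto_reply(msg, receiver, guard)
        if reply is None:
            break
        sent += 1
        msg = reply
        receiver = b if receiver == a else a  # the other side now receives
    return sent
```

Without the guard, `count_replies(guard=False)` only stops at the artificial limit of 50; with it, `count_replies(guard=True)` stops after a single reply, because the second responder recognizes the first reply as automatic.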

Legal and policy concerns also dominated the conversation. Copyright law, robots.txt exclusions, and takedown requests created friction between preservation goals and rights holders' interests. In 2007 the balance still favored site owners' control: robots.txt files often excluded crawlers, and the law remained ambiguous about whether fair use or preservation exceptions covered digital archives. Archivists argued for legal clarity and narrower restrictions to enable responsible long-term preservation. REC 2007 served as a forum to press for policy reforms: clearer archival exceptions in copyright law, safe-harbor provisions for non-commercial preservation, and standardized consent mechanisms for capturing user-contributed content.
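To make the robots.txt mechanism concrete, here is a minimal sketch using Python's standard `urllib.robotparser`. The robots.txt content and URLs are made-up examples, and the `ia_archiver` token is included only as an illustrative user-agent for an archive crawler; treat the exact token as an assumption.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block one named crawler entirely,
# and keep every other crawler out of /private/.
ROBOTS_TXT = """\
User-agent: ia_archiver
Disallow: /

User-agent: *
Disallow: /private/
"""


def allowed(robots_txt: str, agent: str, url: str) -> bool:
    """Return True if `agent` may fetch `url` under the given robots.txt."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse in-memory text, no network
    return parser.can_fetch(agent, url)
```

For example, `allowed(ROBOTS_TXT, "ia_archiver", "https://example.com/")` is `False`, while a generic bot may fetch `https://example.com/public/a.html` but not `https://example.com/private/a.html`. Note that robots.txt is a voluntary convention: it tells a well-behaved crawler what to skip but does not technically prevent access.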

To build its collection, the Archive runs web crawlers: automated programs (spiders) that browse the web, follow links, and download copies of pages. By 2007, the Archive was crawling billions of URLs.
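The crawl loop described above (download a page, keep a copy, follow its links) can be sketched as a breadth-first traversal. This is a minimal illustration under stated assumptions, not the Archive's actual crawler: `crawl`, `LinkExtractor`, and the in-memory `FAKE_WEB` standing in for real HTTP fetches are all hypothetical, and a production crawler would also honor robots.txt, throttle requests, and persist pages to disk.

```python
from collections import deque
from html.parser import HTMLParser
from typing import Callable, Dict, List, Optional
from urllib.parse import urldefrag, urljoin


class LinkExtractor(HTMLParser):
    """Collects href values from <a> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.links: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(seed: str, fetch: Callable[[str], Optional[str]],
          max_pages: int = 100) -> Dict[str, str]:
    """Breadth-first crawl: download a page, store a copy, queue its links."""
    archive: Dict[str, str] = {}
    frontier = deque([seed])
    seen = {seed}
    while frontier and len(archive) < max_pages:
        url = frontier.popleft()
        html = fetch(url)
        if html is None:
            continue  # unreachable page: skip it
        archive[url] = html  # keep a copy, as an archive would
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links and drop #fragments so duplicates collapse.
            absolute, _ = urldefrag(urljoin(url, href))
            if absolute not in seen:
                seen.add(absolute)
                frontier.append(absolute)
    return archive


# Hypothetical three-page site; dict.get returns None for missing pages.
FAKE_WEB = {
    "http://example.com/": '<a href="/a">A</a> <a href="b#top">B</a>',
    "http://example.com/a": '<a href="http://example.com/">home</a>',
    "http://example.com/b": '<a href="/missing">gone</a>',
}
pages = crawl("http://example.com/", FAKE_WEB.get)
```

The `seen` set is what keeps the crawler from revisiting the home page linked from `/a`, and the broken `/missing` link is simply skipped, so `pages` ends up holding the three reachable pages.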

However, 2007 was also a year of transformation. It was the year the Internet Archive (IA) expanded from passive archiving of public web pages into active aggregation of printed literature. This shift brought the organization into direct conflict with the publishing industry and into the complexities of U.S. copyright law. This paper explores how the initiatives launched in 2007, and the legal pressures that mounted that year, laid the groundwork for the litigation the IA faces today.

Once on an item page, you have several options in the right-hand sidebar under "Download Options".
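The same files listed under "Download Options" can also be reached programmatically. The sketch below assumes the Archive's item metadata endpoint (`https://archive.org/metadata/<identifier>`) and download path (`https://archive.org/download/<identifier>/<file>`), and assumes the metadata JSON lists files under a `files` key with `name` and `format` fields; verify these against the Archive's own API documentation. The item identifier and sample response are made up, and no network request is made.

```python
from typing import Dict, Iterable, Optional
from urllib.parse import quote


def metadata_url(identifier: str) -> str:
    """Item metadata endpoint (assumed to return JSON describing the item's files)."""
    return f"https://archive.org/metadata/{quote(identifier)}"


def download_links(identifier: str, metadata: dict,
                   formats: Optional[Iterable[str]] = None) -> Dict[str, str]:
    """Map each file name in a metadata response to its download URL,
    optionally keeping only the given format labels."""
    wanted = set(formats) if formats is not None else None
    links: Dict[str, str] = {}
    for entry in metadata.get("files", []):
        if wanted is None or entry.get("format") in wanted:
            name = entry["name"]
            links[name] = f"https://archive.org/download/{quote(identifier)}/{quote(name)}"
    return links


# Hypothetical response for a made-up item, trimmed to the fields used above.
sample = {"files": [
    {"name": "book.pdf", "format": "Text PDF"},
    {"name": "book_djvu.txt", "format": "DjVuTXT"},
]}
pdf_only = download_links("example-item", sample, formats=["Text PDF"])
```

Filtering by format mirrors choosing a single entry from the sidebar; passing no `formats` returns a link for every file in the item.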

The Fallout (Late 2007)

The flood of looped emails caused widespread problems:
