The NYT is blocking the Internet Archive. That’s a mistake.

The New York Times has started blocking the Internet Archive's web crawlers, going beyond standard robots.txt rules to cut off access entirely. The Guardian is doing the same. The stated reason: fear of AI companies scraping their content. But the Internet Archive isn't an AI company — it's a library, and it's been one for nearly 30 years.

The Wayback Machine holds more than one trillion archived web pages. Wikipedia links to over 2.6 million news articles preserved there across 249 languages. Journalists use it to check what a page said before it was quietly edited. Researchers use it to track how stories evolve. Courts cite it as evidence. When articles get changed, removed, or memory-holed, the Archive is often the only place the original version still exists.

"Organizations like the Internet Archive are not building commercial AI systems," writes EFF's Joe Mullin. "They are preserving a record of our history."

Several publishers are currently suing AI companies over training on copyrighted material, and that is a legitimate fight. But the Archive isn't part of it. Making material searchable is well-established fair use, as the courts confirmed in Authors Guild v. Google, the Google Books case. Blocking a nonprofit archivist doesn't keep your articles out of training datasets. It just means nobody can prove what your article originally said.

"If publishers shut the Archive out, they aren't just limiting bots," Mullin writes. "They're erasing the historical record."
