The Internet Archive just hit a number that sounds abstract until you think about what it means: one trillion webpages saved. Since 1996, this nonprofit has been quietly building a backup copy of the internet itself—a project that started feeling urgent around 2019, when MySpace's server migration accidentally deleted 50 million songs from 14 million artists in a single afternoon.
That's the thing about digital content: it doesn't persist unless someone actively maintains it. A website can vanish when its owner loses interest, when a company shuts down, when a server fails. The internet, despite being everywhere, is fundamentally fragile.
So the Internet Archive built web crawlers to automatically capture publicly available websites, and invited volunteers to upload everything else—old books, obscure music, documents that might otherwise disappear. After nearly three decades, they've collected more than 866 billion webpages, 41 million texts, and millions of other digital artifacts. They're adding around 500 million new websites every day. The total storage: 100,000 terabytes, or roughly the capacity of 50,000 of today's highest-end iPhones.
We're a new kind of news feed.
Regular news is designed to drain you. We're a non-profit built to restore you. Every story we publish is scored for impact, progress, and hope.
Start Your News DetoxThis matters more than it might seem. Journalists use the Archive to verify what a news outlet said five years ago. Researchers trace how misinformation spreads. Historians document how the web has changed. Ordinary people check what their favorite website looked like before a redesign.
The New Pressure
But the Archive is now caught in a collision between two forces. On one side, tech companies are scraping the internet to train AI systems—often in legally murky ways. On the other, major media outlets including The New York Times, The Guardian, and USA Today have started blocking the Archive from preserving their newer content, worried that their articles will end up in AI training datasets without compensation.
It's a legitimate concern. Writers and publishers deserve to be paid for their work, and right now, there's no clear legal or financial framework that covers this scenario. The tension is real. But it also creates a paradox: the more content that gets hidden from the Archive, the harder it becomes to preserve what might be the most fragile information ecosystem we've ever built.
The Archive will keep growing toward its two trillionth webpage. But its future depends on finding a middle ground—ways to protect creators' rights while keeping the permanent record of the internet intact.









