As reported by Nieman Lab last month, some major media organizations, including The New York Times, The Guardian, and Reddit, have started blocking the Wayback Machine from archiving their sites over unfounded concerns about AI scraping.
Last week, tech writer Mike Masnick (Techdirt) explained why this is “a mistake we’re going to regret for generations.”
Today, Mark Graham, director of the Wayback Machine, has published a response to the Nieman Lab reporting, pushing back on the media organizations’ concerns about the Wayback Machine being a backdoor to AI scraping. Graham writes:
“These concerns are understandable, but unfounded… like others on the web today, we expend significant time and effort working to prevent such abuse.”
Read the post to learn how Graham is working to protect the integrity of the Wayback Machine, and why limiting web archiving threatens our shared digital history.
Link rot. There’s nothing quite as frustrating as clicking on a link that leads to nowhere.
WordPress, which powers more than 40% of websites online, recently partnered with the Internet Archive to address this problem. Engineers from the Internet Archive and Automattic worked together to create a plugin that can be added to a WordPress website to improve the user experience and check the Wayback Machine for an archived version of any webpage that has been moved, changed or taken down.
The free Internet Archive Wayback Machine Link Fixer, publicly launched last fall, combats link rot by seamlessly redirecting users to a reliable backup when they reach a missing page. When the plugin is added to a website, it scans the site to see what pages exist and then automatically adds those pages to a queue to be archived. Any page that does not yet have an archived copy is sent for capture.
Once the software is installed on a WordPress website, the plugin will automatically redirect users to the Wayback Machine version of a missing page.
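The lookup step described above can be sketched in a few lines. This is not the plugin's code, which the post does not describe; it is a minimal illustration that assumes the Internet Archive's public availability endpoint (`https://archive.org/wayback/available`) and its documented JSON shape. The function names `availability_url`, `closest_snapshot`, and `resolve_link` are hypothetical.

```python
import json
from typing import Optional
from urllib.parse import urlencode
from urllib.request import urlopen

# Public Wayback Machine availability endpoint (assumed here for illustration).
AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def availability_url(page_url: str) -> str:
    """Build the availability query for a single page URL."""
    return f"{AVAILABILITY_ENDPOINT}?{urlencode({'url': page_url})}"


def closest_snapshot(api_response: dict) -> Optional[str]:
    """Return the archived copy's URL, or None when no usable snapshot is reported."""
    closest = api_response.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest.get("url")
    return None


def resolve_link(page_url: str, timeout: float = 10.0) -> Optional[str]:
    """Ask the Wayback Machine for a fallback copy of a possibly dead link."""
    with urlopen(availability_url(page_url), timeout=timeout) as resp:
        return closest_snapshot(json.load(resp))
```

A site could call `resolve_link` when a request ends in a 404 and redirect to the returned address, falling back to its usual error page when the result is `None`.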
Broken links are one of the web’s most relentless problems. Pew Research found that 38% of webpages that existed in 2013 were no longer accessible a decade later. For web admins, “It’s a never-ending game of whack-a-mole to keep links working,” said Matt Blumberg, Product Manager with the Wayback Machine. “This new tool prevents those inevitable 404s by automatically updating links to a preserved copy and it proactively archives pages in the Wayback Machine, where they’re kept accessible for free, long-term, so your site stays usable without manual fixes.”
Many WordPress websites are homespun and are most susceptible to having links go dead. Remedying this problem is not only valuable to individuals, but also to the overall culture, said Alexander Rose, Director of Long-term Futures for Automattic Inc., the technology company behind WordPress.com.
“We need to have an accurate memory of the things that get said, posted, and the ways that we have communicated over time,” Rose said. “Otherwise we’re either doomed to repeat errors or we’re going to make choices that are uninformed by the past.”
The link fixer is expanding the “heroic effort” made by the Internet Archive over the years to preserve everything from small websites to NASA.gov and WhiteHouse.gov, he said.
“It’s very important that websites have a memory and that the web overall has a memory,” Rose said. “We are increasingly using [the web] as our only source of truth. When links go dead, in effect, the truth goes dead. This has become even more important in the world of AI.”
As the plugin rolls out, Rose and Blumberg said they are open to feedback. The goal is to make the software as easy as possible to use. Next, they will fine-tune the features and promote broad adoption.
“As it becomes a solid piece of software that people know and like, then I think it has a path to being integrated much more deeply,” Rose said. “It’s early days, but every person I’ve talked to about it is excited to see the potential end of the dreaded 404 error.”
Digital journalists increasingly turn to web archives like the Wayback Machine to follow how things on the Internet break, change or disappear – from deleted posts to quietly edited pages.
The web has become not only a source of information but also the subject of media investigations, prompting journalists, researchers and activists to use digital archives to reconstruct timelines, verify claims, uncover hidden connections and hold powerful actors to account.
As online materials grow more fragile and prone to disappearance, the Internet Archive’s Wayback Machine has been critical in making “lost” web pages available, and it recently celebrated archiving more than a trillion pages.
We are also interested in how others use web archives across fields, and what we can learn from each other.
In this piece we draw on the Internet Archive’s News Stories collection to surface practices and use cultures of the Wayback Machine amongst journalists and media organisations. We analysed a dataset of about 8,600 news articles, assembled by the IA via daily Google News keyword searches since 2018.
Drawing on a combination of digital methods, machine learning and lots of reading, we surfaced nine ways that journalists use the Wayback Machine in their reporting.
***
1. following what is deleted
Shifting political alliances are a common driver of online footprint erasure. Deleted tweets have revealed past critics in current allies (here and here), and current career aspirations were juxtaposed with earlier conflicting stances in personal blogs and websites (here, here, here and here).
Unannounced takedowns of collections or site sections on government websites often prompt investigations using archival snapshots. Examples include removed editions of presidential newsletters and deleted staff contact lists for services supporting vulnerable groups, signaling access-to-information breaches.
The removal of official publications also prompted further contextualisation, revealing cases in which information was deleted due to being incomplete, inaccurate or inconveniently timed.
Beyond politics, erasures on corporate websites highlight commercial and reputational pressures, such as deleted statements on forced labour, product safety and climate deception.
2. following what has been altered
Subtle alterations on webpages can also reveal a plain-to-see effort to reshape narratives.
In other cases, small additions to online content have proved just as revealing. A before and after snapshot of a blog post showed how a supposed early warning about a virus threat was added only after the pandemic began. Similarly, changes to a social media platform’s API rules appeared shortly after third-party apps were banned, subtly reframing the policy to align with new restrictions.
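A before-and-after comparison of this kind can be sketched with Python's standard `difflib` module. The example below is only an illustration, not a description of any newsroom's workflow: it assumes the two snapshot texts have already been retrieved, and the helper names `snapshot_url` and `changed_lines` are hypothetical. The address pattern `https://web.archive.org/web/<timestamp>/<url>` is the one the Wayback Machine uses for individual captures.

```python
import difflib
from typing import List


def snapshot_url(timestamp: str, page_url: str) -> str:
    """Address of one Wayback Machine capture (timestamp as YYYYMMDDhhmmss)."""
    return f"https://web.archive.org/web/{timestamp}/{page_url}"


def changed_lines(before: str, after: str) -> List[str]:
    """Return only the lines removed ('- ') or added ('+ ') between two versions."""
    diff = difflib.ndiff(before.splitlines(), after.splitlines())
    return [line for line in diff if line.startswith(("- ", "+ "))]


# Toy texts standing in for two captures of the same page.
earlier = "Our policy\nThird-party apps are welcome."
later = "Our policy\nThird-party apps are welcome.\nAccess may be revoked at any time."

for line in changed_lines(earlier, later):
    print(line)
```

Pairing each reported change with the two capture timestamps gives the earliest and latest dates between which the edit must have happened.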
3. following what is banned
Sometimes removals are deliberate, often at the request of companies seeking to enforce copyright, control branding, or limit liability.
4. following what is broken
Archived snapshots are also often the only way to reconstruct what preceded a link break, when it happened, and what information was effectively cut off.
For example, an investigation into a set of broken URLs on a government website revealed that the pages themselves had not been removed, but the links pointed to outdated servers, creating a false impression of secrecy that sparked a conspiracy theory.
In another case, a major technical glitch took multiple Nigerian government websites offline, cutting off access to official information and showing how even unintentional failures can undermine transparency.
5. following what is hacked
Compromised versions of hacked websites and social media accounts present another way of using archived snapshots as a traceable historical record.
For example, past screenshots of Twitter’s bio page revealed inconsistencies in claims about an alleged takeover of the US president’s social media account. In other cases, such snapshots helped surface a forensic trail and distinguish unauthorised activity carried out by activists (here and here) from activity linked to cybercriminal groups (here).
6. following what is connected
Archived web data often uncovers unexpected ownership links between domains that appear unrelated on the surface.
For example, journalists used analytics codes found in copies of sites maintained by the Wayback Machine to uncover disinformation networks. In another investigation, archived records verified that a website redirect to Joe Biden’s presidential campaign was unrelated to him, debunking conspiracy theories about the domain’s ownership.
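The analytics-code technique mentioned above can be illustrated with a short sketch. It assumes the archived HTML of each site has already been collected, and it looks only for classic Google Analytics identifiers of the form `UA-<account>-<property>`; the investigations referred to here may have relied on other identifiers and tools. The function name and the sample domains are hypothetical.

```python
import re
from collections import defaultdict
from typing import Dict, Set

# Classic Google Analytics tag: UA-<account number>-<property index>.
GA_ID = re.compile(r"\bUA-\d{4,10}-\d{1,4}\b")


def shared_analytics_accounts(pages: Dict[str, str]) -> Dict[str, Set[str]]:
    """Map each analytics account to the domains whose archived HTML contains it.

    Only accounts that appear on more than one domain are returned, since
    those are the ones hinting at common ownership.
    """
    by_account = defaultdict(set)
    for domain, html in pages.items():
        for tag in GA_ID.findall(html):
            account = tag.rsplit("-", 1)[0]  # UA-1234567-2 -> UA-1234567
            by_account[account].add(domain)
    return {acct: doms for acct, doms in by_account.items() if len(doms) > 1}


# Toy archived HTML for three made-up domains.
archived = {
    "site-a.example": "<script>ga('create', 'UA-1234567-1', 'auto');</script>",
    "site-b.example": "<script>ga('create', 'UA-1234567-2', 'auto');</script>",
    "site-c.example": "<script>ga('create', 'UA-7654321-1', 'auto');</script>",
}
print(shared_analytics_accounts(archived))
```

A shared account is a lead rather than proof: the same code can end up on unrelated sites through copied templates, so it needs corroboration from other records.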
Snapshots of a fake Black Lives Matter Facebook page and its associated websites allowed reporters to trace the individuals behind the operation. Similarly, archived versions of Amazon storefronts exposed networks of accounts generating affiliate revenue from coordinated product listings.
7. following what is reported
Archived web pages have proven vital for tracing how stories are presented across media outlets and platforms.
In another case, snapshots of the Google homepage captured during the 2018 State of the Union speech disproved a viral claim that Google ignored Donald Trump’s address in favour of Barack Obama.
8. following what is unchanged
In other investigations, the most revealing detail is what did not change.
For example, during a bushfire crisis in Australia, archived pages showed that a key policy statement by the Greens party was left untouched, despite a disinformation campaign claiming the contrary.
Similarly, a social media account circulated as having been reactivated under a new wave of laissez-faire moderation was, in fact, never suspended.
9. following what is saved
When forums, platforms and websites vanish, it’s the work of crowdsourced archivists that captures their traces before they are gone for good.
These are some of the ways we’ve noticed journalists using web archives – and there are many more! If you know of other interesting examples, we’d love to hear from you.
We hope that these nine ways may help to inspire critical and creative uses of web archives to “follow the changes” – exploring what they can tell us about digital culture and society, and the times we live in.
Thais Lobo is a research associate at the Department of Digital Humanities, King’s College London, with a previous career in journalism.
Jonathan W. Y. Gray is Co-director of the Centre for Digital Culture and Reader in Critical Infrastructure Studies at the Department of Digital Humanities, King’s College London. He is also co-founder of the Public Data Lab; research associate at the Digital Methods Initiative (University of Amsterdam) and the médialab (Sciences Po, Paris). More about his work at jonathangray.org.
Liliana Bounegru is Senior Lecturer (Associate Professor) in Digital Media, Culture and Society at the Department of Digital Humanities, King’s College London. She is also co-founder of the Public Data Lab, member of the Digital Methods Initiative at the University of Amsterdam and associate of the Sciences Po Paris médialab. More about her work can be found at lilianabounegru.org.
How do you commemorate the preservation of 1 trillion web pages in a zine? That was Megan Lotts’ challenge when she was contacted by the Internet Archive last summer.
Lotts is an art librarian at Rutgers, the State University of New Jersey, where she promotes creativity, play, and makerspaces through her teaching and research. She designs zines (short for magazine), which are self-published, handmade objects that are often copied and shared. It was through Lotts’ involvement with zines at the American Library Association (ALA) conference that she was asked by Internet Archive librarian Chris Freeland to create one for the Internet Archive’s October celebration.
For the project, Lotts collaborated with Louisa Cohen and Drew MacDonald at the Internet Archive on images and text to incorporate. Although an avid user of the Internet Archive, Lotts said making the zine prompted her to take a deep dive and discover all new material.
“As a librarian, this is a space where you go for history,” she said of the Internet Archive. “I’m a kind of curious, reflective person, but there were collections that I came across that I didn’t know existed.”
The final product is an 8-page zine that Lotts has shared on the Internet Archive, along with a close-up view of the pages. It includes the Wayback Machine logo, icons of various collections, and an old Polaroid photo of Internet Archive’s digital librarian, Brewster Kahle, next to a vintage computer.
The zine was printed and shared with attendees at the Oct. 22 Internet Archive party in San Francisco. Lotts took a week off from Rutgers to help unveil the zine at the festivities. Upon returning to Rutgers, she said it was fun to show students her work and explain the process. They were excited to hear about her experience, Lotts said, and what she learned behind the scenes at the headquarters.
“My students grew up with the Wayback Machine. They’ve used it since grade school,” said Lotts, 51, who remembers first accessing the Archive in college. “If you think about 1 trillion pages in less than 30 years, that’s outrageous. It’s preserving information for posterity.”
Zines need to be preserved, Lotts maintains, along with other art and cultural artifacts.
“When I give someone a zine, what I’m really hoping is that I’m giving you a moment,” Lotts said, “whether you recognize it or not, to hold this in your hands and get lost from the rest of the world. It’s just a tiny little book … I want people to look at it and think about it. That’s the beauty of the zine.”
Zines can be as elaborate as the one she produced for the Archive, she said, or as simple as creating something with a piece of paper, pen or pencil and an idea. “Those are things that most of us can access and everybody has a story,” said Lotts, who hopes the project inspires people to consider tapping into their creative side to make a zine.
“I’m noticing—as a scholar and as an educator—that people want to engage with the arts. They want to be creative,” said Lotts, who has degrees in fine arts, library science, painting and art history and teaches a class on play. “It’s really powerful for me to see students come alive and think about information and knowledge creation in a playful and exciting way.”
Audrey Witters remembers the creativity of the early web.
When she was launching her career in the mid-1990s, being online was more about exploring and having fun than figuring out how to make a return on investment. Witters said if you were curious about someone’s web page, you could simply click to see their code or email them with questions. She enjoyed how accessible the early web community was and the feeling of connection.
Now a business consultant in San Jose, she spoke at the Internet Archive’s Oct. 22 celebration, praising its efforts to save digital content and encouraging innovation through experimentation.
“Thank you to the Internet Archive for preserving the history of the early web, that time of collective effort and quirky, chaotic creation, so that we can have really fun moments of nostalgia,” Witters said from the stage, “but even more so that the next generation of creators can be inspired to find their own ways to promote exploration, collaboration and joyful expression.”
Witters shared the story of her career and the influence the internet has had on her work before there was much pressure to monetize content.
After graduating with a degree in electrical engineering from Cornell University, Witters built her early career in the tech sector. Witters garnered attention for helping design a small, animated alien GIF at a graphic art software company. Her work was featured in a 1996 book, GIF Animation Studio, by Richard Koman.
In those early days, it was exciting to come into work each morning to see if any new web servers had launched, Witters said. She was on the lookout for new and interesting approaches to digital layout, movement, or interactivity. She followed a graduate student posting pictures of his daily vegetarian lunch – a forerunner of the food bloggers – and witnessed the beginning of e-commerce. Content was diverse and the web reflected a diversity of voices.
Witters leveraged what she learned to develop an expertise in project management, and said she’d like to see more of that early online creativity carried over to confront today’s challenges.
“Business relies on innovation. Innovation is based on creativity, and creativity comes from fun,” she said. “We don’t have a lot of time for fun these days.”
Prioritizing profit without including time for play is not good for individuals, society or businesses in the long run, Witters maintains. As systems evolve, creativity is needed to meet changing demands and unleash new ideas.
For 20 years, Witters worked at Stanford University in the Graduate School of Business, including a decade as the inaugural managing director of online executive programs. Following that role she founded her own company, Learning Impact Advisors, helping higher education clients develop career programs that amplify their mission.
Witters recalls with fondness the “Wild West” days of the early web: “It’s important to preserve that spirit and be inspired by it.”
When someone calls up a single webpage in a digital archive, it’s difficult to understand the scope of the collection. To improve the visibility and appreciation of its resources, the Internet Archive Europe partnered with software engineers and the Internet Archive to develop an interactive display that gives users a sense of everything available at their fingertips.
This fall, an installation was unveiled in the Netherlands and later demonstrated by Internet Archive Founder Brewster Kahle at the October 22 celebration in San Francisco.
“The idea is to be able to show and play with the breadth that people have accomplished and the depth that we have all built together,” Kahle said. “This is the web we built. This is the web that we want. This is the web we want to make go from 1 trillion to 2 trillion to 3 trillion.”
The initial display included screenshots of more than 85,000 Dutch websites preserved over the past 30 years. Visitors to the National Library in the Netherlands used a physical joystick and buttons to explore a variety of webpages in a game-like experience. With their voices, they could direct the machine to zoom in on specific topics or domains. The screenshots are laid out in a semantic grid, where websites with similar topics appear together in a cluster. Both topics and layout are extracted using AI-based tools (a vision-language model, or VLM, and embeddings).
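The idea behind grouping sites with similar topics can be sketched with toy data. The post does not describe how the installation computes its layout, so the example below is only an assumption-laden illustration: it takes made-up two-dimensional vectors in place of real model embeddings and groups them greedily by cosine similarity. The function names and the `threshold` value are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def group_by_similarity(
    embeddings: Dict[str, Sequence[float]], threshold: float = 0.8
) -> List[List[str]]:
    """Greedy grouping: a site joins the first cluster whose seed it resembles."""
    clusters: List[List[str]] = []  # the first name in each list is the seed
    for name, vector in embeddings.items():
        for cluster in clusters:
            if cosine(vector, embeddings[cluster[0]]) >= threshold:
                cluster.append(name)
                break
        else:
            clusters.append([name])
    return clusters


# Toy vectors; a real system would obtain embeddings from a model.
toy = {
    "news-a": [1.0, 0.1],
    "news-b": [0.9, 0.2],
    "sports-a": [0.1, 1.0],
}
print(group_by_similarity(toy))
```

Sites whose vectors point in nearly the same direction land in the same group, which is the property a semantic grid relies on when it places related screenshots next to each other.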
The idea started with Kai Jauslin in 2020 when he was working with the Swiss National Library to help the public visualize its digital collection. Jauslin, a software engineer and owner of Nextension.com (https://www.nextension.com/), and Barbara Signori, a digital librarian, created an interactive display that went live in 2021, reflecting 80,000 snapshots of archived web pages in the Swiss library collection. (It has since grown to more than 115,000.)
Once Kahle saw the Swiss project, he was interested in developing something similar using the Wayback Machine. In January, Jauslin got the green light to make the project open source so he could reuse everything he’d developed for the Swiss library for the Internet Archive Europe. He then collaborated with a team at the Internet Archive including Jefferson Bailey, director of archiving and data services.
“One of the goals of this project was to be able to show the depth [of the collection] and how big everything is,” Jauslin said.
Bailey extracted the data, made over 1 million screenshots, and created formatting to adapt the project framework to feature webpages from the Netherlands collection. The screenshots were used in the interface backed by the Wayback Machine.
“This showcases these collections and makes them more tangible and usable in different ways,” Bailey said. “It’s not just looking at the archive copy of one website, but looking at all of them and searching across categories. You can zoom in and zoom out with functionality that was not available before. It showcases these collections.”
In addition to being a cool tech project, Bailey said, the display has an advocacy element in helping demonstrate the value and scope of digital collections. The display is a good “public engagement” opportunity that lets library patrons interact and grasp the scale of the available resources.
The visibility is a useful tool in making the case to funders and the government to support open resources and library preservation.
At the National Library of the Netherlands, Sophie Ham, curator of the digital collection, said the display shows that life on the internet is worth preserving.
“We were very enthusiastic about this concept [of the display] because our web archive is very hidden. People barely know it’s there,” Ham said. “We need people to acknowledge the importance of a web archive – but to acknowledge it, you have to make it visible and more attractive.”
The display made the collection visible, she said, and the low-barrier, interactive element has been embraced by visitors.
“It helps us get into people’s mind that web archives are as important as books in collections of national libraries,” Ham said.
As technology advances, Jauslin said he hopes the project will continue to expand; Bailey added the hope is to customize the display to other national libraries that express interest.
Erin Malone, the user experience designer behind Kodak’s first website, looks back on the early web with the story of how she and a colleague built the company’s inaugural homepage in 1994, before most of marketing even knew what the web was.
Fresh out of grad school and self-taught in HTML (as everyone was at that time), Malone helped create a pioneering site that today lives on in the Wayback Machine. Her testimonial highlights just how radical those early experiments were, and why preserving them matters.
When I got out of grad school, I started working at Kodak. And in 1994, Mosaic came out. I had just taught myself HTML and another person in the design group that I worked in, his name was Frank Marino, suggested, “Why don't we build a website for Kodak?”
And since I had done a website, I was like, sure, let's do it.
And we asked our boss if that was OK. And he said, yes, because I don't think he really knew what we were talking about. And, you know, marketing wasn't really into the web yet. And they didn't have any objections.
So we built a website that was essentially a big image map with four images coming out of the center. And I think each one linked to, I don't know, a white paper or a page with just some text on it.
We built that in, I think, '94. I think what the Wayback Machine has is dated from 1996, but it's the same image, the same homepage. And it was pretty radical at the time.
From CNN: The Internet Archive has been saving web history for nearly 30 years. CNN’s Hadas Gold goes inside its headquarters to see how the archive is innovating for the AI age and protecting itself from both political and physical threats.
Jean Armour Polly, better known as the Net-mom and as the person who helped popularize the phrase “surfing the internet” in 1994, adds her voice to the celebration of the Internet Archive’s 1 trillionth webpage preserved.
In her message, Polly reflects on the ephemerality of the web—how sites appear, vanish, change, or are censored—and why the Archive’s ability to reveal these shifts is essential to understanding not just events, but who was speaking, who wasn’t, and whose voices history might otherwise forget. Drawing on her own work digitizing fragile Civil War pension files, she compares the care of digital preservation to rescuing stories from dusty barns and bringing them back to life. Polly honors not only creators, but also the librarians and archivists who ensure that our cultural record endures.
Hi, I'm Jean Armour Polly, also known as the Net-mom.
It's because in the early days of the internet, I helped a lot of people take their first baby steps on it. But I'm here today to help congratulate and celebrate the Internet Archive's 1000000000000th webpage archived.
That's just an amazing number. Wow. Because websites are ephemeral. They come up, they go down, links are added, links are deleted. Sometimes they're even censored. The archive reveals all these changes though, and that's important.
It's important for us to not only see how events were covered, but who was talking about them, what they were saying, and sometimes it's even as important or maybe more important about who wasn't talking and whose voices weren't heard.
The archive might even become the Rosetta Stone for future digital archeologists trying to decipher the hieroglyphs of emojis or inscrutable memes.
I have some experience with digitization myself. In recent months, I've been a volunteer at the New York Genealogical and Biographical Society's Digitize New York project, here where I live. We've been scanning and digitizing a huge cache of Civil War pension documents that had formerly been in a lawyer's office, but since 1930, they've been stored in Campbell's soup boxes in a dusty old hay barn.
When I scan something, I think of the soldier and the story that I'm helping to preserve, because it wasn't just about grievous war wounds or diseases he had picked up, but also about his family history, about camp life, about troop movements and battles, things you just can't find in a history book.
And I think about his family, I think about him when I scan these documents, but I also think about who had the forethought to save this stuff, and not just toss it or shred it or burn it, but to keep it in hopes that some day somebody would come along and rescue it, digitize it, so the stories would live.
And that's what the Internet Archive has done and will do. It's so important. Without it, we risk not only losing the websites themselves, but the story of how society and culture has been shaped by them.
So many kudos to the content creators, but also don't forget the critical work of the librarians and the archivists who have preserved them.
Save our stories, protect the past, and help shape our future.
Katherine Maher, President and CEO of NPR, honors the Internet Archive’s milestone of 1 trillion web pages preserved as “1 trillion artifacts and snapshots of our interconnected world.” In her message, Maher celebrates the Archive’s role in protecting the integrity of the open web—keeping news, public discourse, and our shared stories freely accessible to all. She draws parallels between NPR and the Internet Archive, highlighting their shared commitment to access to information, public service, and strengthening societies through knowledge and dialogue. As Maher notes, in an era when information “emerges suddenly, decays rapidly, and disappears instantly,” the Archive’s preservation work is more critical than ever.
Hello everyone. I'm Katherine Maher, president and CEO of NPR. It's an honor to join you today in celebrating a truly historic accomplishment and one close to my heart.
Congratulations to the Internet Archive and everyone who contributed to this milestone of 1 trillion webpages. That's 1 trillion artifacts and snapshots of our interconnected world. It's a testament to the Internet Archive's unwavering commitment to safeguarding the integrity of the open web and its history, ensuring that this vast digital record remains free and open for everyone.
At NPR, we share many common values with the Internet Archive: a deep commitment to access to information, a dedication to public service, and a belief in strengthening societies through information and dialogue. We live today in an era in which information is unstable: it emerges suddenly, decays rapidly, disappears instantly. It's increasingly difficult for anyone to build stability on this volatility, whether you're an independent learner or a society seeking common ground.
So in this moment, the Archive's role in preserving news, public discourse and our shared stories is more critical than ever. The internet is today's living historical record, a cultural mirror reflecting our society, who we are, where we come from, what we perceive matters, how we connect, and how we make sense of ideas, events, and one another.
By preserving this record, the Archive helps us remain grounded in what we know and what we think we believe and accountable to how we change and evolve over time. It supports vital research and allows us to understand current events within a broader context. This preservation counters the challenge of disappearing news and lost meaning online. It provides us an enduring resource for journalists, scholars, and the public alike. It protects our shared stories and it strengthens our civic dialogue.
So let's all celebrate this incredible milestone together. The Internet Archive and the Wayback Machine are trusted, vital resources, and we at NPR are proud to stand with you in this important work. Thank you. Please keep it up. Keep on keeping on.