This article was co-written by Kris Kasianovitz, director of the Institute of Governmental Studies Library at UC Berkeley.


The Question: Where Can PDFs of Government Reports Be Preserved for Easy Discovery?

The Digitization Team was contacted by a UC librarian who wanted to know whether it was possible to upload PDFs into HathiTrust. A professor at her campus had downloaded a collection of reports from the U.S. Surgeon General that she could not readily locate in a government repository. The reports were archived in the End of Term (EOT) Web Archive, but the professor was worried they might not be easy for the public to locate, so the librarian contacted CDL to learn whether they could be uploaded to HathiTrust.


We received this question early in 2025, when librarians were suddenly facing unprecedented threats to online government information posed by the new U.S. administration. The Digitization Team could answer the HathiTrust question (PDFs are not accepted), but we sought help from colleagues who specialize in government information to find solutions for preserving the PDFs.


Given the ongoing threat to government information, we have documented what we learned with our colleagues as we worked together to find a home for the Surgeon General’s reports. Kris Kasianovitz, Library Director at the Institute of Governmental Studies Library at UC Berkeley, generously agreed to offer her insight and co-author this article. We want to thank her as well as other colleagues who suggested paths for the preservation and discovery of these materials, namely Kate Tasker, director of the Industry Documents Library at UC San Francisco, and James R. Jacobs, US Government Information Librarian at Stanford University.


An Unprecedented Threat to Government Information

The responses we received from the government information colleagues we contacted revealed that the problem of preserving born-digital materials is becoming more and more urgent for the library community. The issue isn't just about a single collection of reports; it's about a growing and unprecedented threat to the integrity of government information itself.


While these materials have always been at risk, the reasons for their disappearance were once more innocuous. Government information librarians used to deal with link rot, occasional human error, and, in some isolated cases, removal due to national security or policy changes. Now, however, they are experiencing something far more pernicious: the intentional and widespread erasure of government information promulgated by executive orders and unchecked presidential power.


The new administration has been mandating the removal and modification of information and data at an unprecedented rate. Alarm bells began sounding in January 2025, rapidly reaching a crisis point by February 2nd, when the New York Times reported that over 8,000 federal government sites had been purged. This deliberate removal of taxpayer-funded public records raises a fundamental question: when the U.S. government no longer acts as a reliable steward of its own information, how can librarians, and the public they serve, be assured that these important publications are collected, preserved, and easily discoverable for the future?


Barriers to the Preservation and Discovery of Born-digital Government Information

For decades, libraries have grappled with how best to collect, preserve, catalog, and provide access to born-digital government publications in ways that make them easy to discover. When government content began migrating online, librarians worked to ensure this critical information was protected from the risks of link rot, technological obsolescence, or removal by the government itself. There is now a suite of potential platforms for this purpose, including:


  • Internet Archive’s Archive-It: A subscription web archiving tool which can collect entire websites or discrete PDFs.
  • Institutional repositories: Examples include the Rosetta Repository at the California State Library and the Stanford Digital Repository. These allow librarians to upload PDFs one-by-one or in bulk, and provide stable storage with descriptive metadata.
  • CyberCemetery and similar collaborative projects between universities and government agencies like NARA and GPO: These use a combination of tools to ensure that soon-to-be-defunct government agency websites and publications are collected and preserved.
  • GovInfo.gov: A Trusted Digital Repository (TDR) managed by the Government Publishing Office. With librarian engagement, titles can sometimes be added here, ensuring preservation in a federally managed system.


These options are essential, but they do not necessarily ensure easy discoverability. Each option requires a user to go to the organization’s website to conduct a search – and this, of course, requires that users are aware of these organizations and how to find them.


Why HathiTrust is Not the Solution (Yet)

Researchers often use HathiTrust to assemble corpora of texts for their research, and HathiTrust contains hundreds of thousands of federal government documents. It therefore makes sense that the professor looked to HathiTrust as a viable option for uploading this material for safekeeping and improved discoverability. However, HathiTrust’s vast collection of federal government information was scanned from print volumes, and HathiTrust does not currently include born-digital content in its repository. The organization hopes to have the capacity to do so in the future, but there is currently no way to deposit PDFs or other forms of born-digital content. UC’s investment in HathiTrust remains vital for digitized print federal documents, but the repository is not currently designed for PDF deposit workflows.


What Librarians Can Do

The challenge is immense, but the library community is not simply standing by. In addition to pointing to repositories where born-digital information might be contributed, and helping to find government information that has been removed, the broader community is engaged in projects that are working to ensure the "people's information" persists, providing a lifeline for our shared cultural record in a time of unprecedented risk. The time to act is now.


If you have more information or ideas to add to the list, please let us know.


Resources

Options for Preserving Federal Government PDFs

The FDLP has a process to report “unreported documents” to the Government Publishing Office (GPO), including sending PDFs to GPO to catalog. The GPO is required to collect and preserve these materials. When "unreported" content is contributed, GPO catalogs it, makes the bibliographic data available via the Catalog of Government Publications/OCLC, and adds a digital copy to its digital repository. So far, there is no indication that GPO is shuttering this work.


The Internet Archive has been designated as a Federal Depository Library and anyone can upload files, including PDFs. The only requirement is that a user have an account, which is free. Contributors can select a Creative Commons License or choose public domain when files are uploaded. It’s also possible to batch or bulk upload files using the Internet Archive’s Command-Line Tool.
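As one illustration of how a batch upload might be prepared, the sketch below builds a metadata spreadsheet listing a folder of PDFs, one row per report. It is a minimal example under stated assumptions, not a documented workflow from the Internet Archive or from this article: the folder name, the identifier prefix, and the helper function names are all hypothetical, and the column set (identifier, file, title, mediatype) is a small assumed subset of what a contributor would likely want to supply. Identifier rules and available metadata fields should be checked against the tool's own documentation.

```python
import csv
import re
from pathlib import Path

# Columns written to the spreadsheet (an assumed, minimal set).
FIELDNAMES = ["identifier", "file", "title", "mediatype"]


def make_identifier(prefix, pdf_path):
    """Build a conservative item identifier from a prefix and the file name.

    Only letters, digits, and hyphens are kept; this is a cautious
    assumption, not the tool's official identifier rule.
    """
    stem = re.sub(r"[^A-Za-z0-9]+", "-", pdf_path.stem).strip("-").lower()
    return f"{prefix}-{stem}"[:100]


def build_rows(pdf_paths, prefix):
    """Return one metadata row (a dict) per PDF, sorted by path."""
    rows = []
    for path in sorted(Path(p) for p in pdf_paths):
        rows.append({
            "identifier": make_identifier(prefix, path),
            "file": str(path),
            "title": path.stem.replace("_", " "),
            "mediatype": "texts",
        })
    return rows


def write_spreadsheet(rows, csv_path):
    """Write the rows to a CSV file with a header line."""
    with open(csv_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=FIELDNAMES)
        writer.writeheader()
        writer.writerows(rows)


if __name__ == "__main__":
    # "reports" and "example-sg-reports" are hypothetical placeholders.
    pdfs = Path("reports").glob("*.pdf")
    write_spreadsheet(build_rows(pdfs, "example-sg-reports"), "metadata.csv")
```

Once the spreadsheet has been reviewed by hand, it could be passed to the command-line tool, for example with `ia upload --spreadsheet=metadata.csv` after running `ia configure` to store account credentials. Reviewing titles before uploading is worthwhile, since file names rarely make good bibliographic titles.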


Internet Archive will create a discrete collection for an individual or organization that uploads 50 or more related items of the same media type (such as PDF). Collections have a landing page that includes a description, and provide faceted search along with some usage statistics. For an example of a collection, see the “Reclaim the Records” collection from the Maryland State Archives.


It is also possible to archive unique web pages using the Open WayBack Save Page Now function, although the PDF upload is a better option if you have multiple files or titles.


Internet Archive’s Democracy’s Library is a compilation of 700 collections from over 50 government organizations and includes close to a million government publications in multiple media, including reports, videos, and archived websites. If you create a relevant collection of government documents, it can be added to Democracy’s Library.


The Stanford Digital Repository (SDR) is configured to handle PDF uploads for important documents that are out of scope for the FDLP. Once in the repository, publications can be made available worldwide. While the contributors are generally members of the Stanford community, James Jacobs (jrjacobs@stanford.edu) is willing to discuss potential solutions for government information. The Stanford Library Catalog has a filter that searches only government information in its collections. Limit to “Digital Only” to get materials that have been deposited into the SDR.


  • The Rosetta Repository at the California State Library (CSL)

The California State Library has an online federal government documents collection that is capable of ingesting PDFs. Per the April 3, 2025, CSL webinar on Federal Government Information: How to Locate and Access Online Resources, Bradley Seybold, the regional federal documents librarian, would likely be willing to discuss deposits of materials that are appropriate for the collection. Contact cslgps@library.ca.gov.


Stay Informed and Get Involved - Current Awareness Resources

America's Essential Data, a collaborative effort led by former U.S. Chief Data Scientist Denice Ross, is dedicated to documenting the value that federal government data provides for American lives and livelihoods. The Dearly Departed Datasets serves as an in-memoriam, tracking the loss of federal datasets that once served the American people. The Use Cases track how these datasets are actively used by governments, researchers, and communities. You can get involved by sharing the site on social media, connecting related efforts, submitting a use case, or hosting a group session.


Free Government Information (FGI) was created in 2004 by librarians as a place where those who have a stake in the preservation of and free access to government information (libraries, government agencies, non-profit organizations, journalists, and researchers) can initiate dialogue and build consensus around these issues. FGI promotes free government information through collaboration, education, advocacy, and research.


This page, maintained by government information librarian Kelly Smith at UC San Diego, is a great way to track what is currently happening in U.S. government information. The Weekly Roundups provide awareness and links about federal government activities and reports. The roundups are available going back to August 31, 2019.


    The bot posts whenever a Federal US .gov domain is either added or removed.


THE place to follow all things gov info, GOVDOC-L is the primary discussion forum about government information of all types (federal, state, local, and international), with an emphasis on the Federal Depository Library Program.


A much-needed, must-read publication by librarians James A. Jacobs and James R. Jacobs. This 2025 volume is available as a free ebook download and may also be purchased as a print volume.

 

Listen to a timely conversation between James Jacobs and Jim Jacobs, guided by Shari Laster, about their publication and safeguarding public information in the digital age. Shari will serve as UC’s inaugural Associate University Librarian and Director of Systemwide Library Facilities starting in January 2026. 


 

Born-Digital Government Information Preservation Projects

A community project that tracks actions by the current administration that impact government information. Users are invited to submit government resources that have been removed or censored. The goals of the project are to help the public see the scope of what has been removed, modified, or impacted; and to (when possible) point to where preserved copies of missing documents can be found.


The EOT Web Archive captures federal government websites at risk of changing or disappearing during the transition to a new presidential administration. It includes federal government websites (.gov, .mil, etc.) in the Legislative, Executive, or Judicial branches of the government. Established in 2008, the EOT was created as a collaboration among partners and volunteers from academic libraries, Internet Archive, and non-profit and governmental organizations.


An extension of past End of Term projects, this collaborative project is intended to document the federal government's web presence by archiving government websites and data in an effort to preserve changing or disappearing government information.


The Data Rescue Project started in February 2025 as a coordinated effort of three data organizations, including members of IASSIST, RDAP, and the Data Curation Network. Their goal is to serve as a clearinghouse for data rescue-related efforts and data access points for public US governmental data that are currently at risk. They are working to track and coordinate efforts which include: data gathering, data curation and cleaning, data cataloging, and providing sustained access and distribution of data assets.