NYPL: Seeking Ways to Provide Digital Access to the 20th Century’s Trove of Books : UC HathiTrust Support

(Published August 31, 2023)

This article is based, in part, on presentations delivered by Kathleen Riegelhaupt (Director, eReading, NYPL) and Greg Cram (Associate General Counsel and Director, Information Policy, NYPL) at the Google Books Summit on May 25, 2023.

Photo credit: NYPL/Jonathan Blanc

In the midst of the controlled digital lending hubbub of the last few years, The New York Public Library (NYPL) has been doing a bit of sleuthing to find innovative ways to fulfill its ambitious mission to provide “any book to anybody,” including in digital form. NYPL is unusual among public libraries: It is not actually a public institution, but rather a private non-profit, and it serves a distinctive combination of patrons at its more than 80 neighborhood branches in New York City and in its function as a global research destination. NYPL’s reach is enormous: it serves over 16 million library patrons a year. It was also one of the original 5 partners in the Google Books project (while UC was the 6th) and is a member of HathiTrust. As with many libraries, NYPL’s ability to provide digital access to certain types of books is often limited either by cost (in the case of ebooks that libraries can commercially license) or by whether the work has been digitized (public domain books, which can be made fully available to patrons once they are digitized).

The challenge lies in figuring out how to provide digital access to in-copyright out-of-print volumes (generally books published from 1928 to the 1990s, almost three quarters of the 20th century!). Even well-funded libraries are prevented from providing digital access to these volumes. The lack of access to these books constitutes a huge gap in the scholarly record and is having repercussions on research. Copyright is the limiting factor: Research on each individual volume is often required to accurately determine the copyright status of the books in this corpus, which contains hundreds of thousands of volumes. Even when copyright research is conducted, there are many cases in which determining the legal copyright owner is next to impossible. In the course of its sleuthing, NYPL has discovered that even authors of these books are not always sure of the copyright status of their own works!

NYPL Digitizes the Catalog of Copyright Entries

In 2018, NYPL launched a project to digitize the Catalog of Copyright Entries, and to extract, parse, and transcribe the data to create a publicly accessible database. The database will make it easier to research the copyright status of books published in the United States in the 20th century. In particular, it will help researchers more easily determine whether the required copyright formalities (registration and renewal with the U.S. Copyright Office) were met for an individual book. If the formalities were not met, the book is in the public domain. Early in this project, using data from the digitized scans, Greg Cram made an educated guess that 65-75% of rights holders of books published between 1928 and 1964 did not renew the copyright registration of their works. This means that potentially hundreds of thousands of titles that digital libraries currently keep closed to access, on the assumption that they are in copyright, are actually in the public domain. HathiTrust’s Copyright Review team has been using data from NYPL’s efforts to help target books that are likely public domain to add to their queue for individual copyright review.
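
To make the triage idea concrete, here is a minimal sketch in Python of how renewal data might be used to flag candidates for individual review. It is an illustration only and does not describe NYPL's database or HathiTrust's actual workflow: the `Book` record, the `triage` function, the bucket labels, and the registration numbers are all hypothetical. The only detail taken from this article is the 1928 to 1964 publication range. The output is a queue for human review, not a copyright determination.

```python
from dataclasses import dataclass
from typing import Optional, Set

# The 1928-1964 range comes from the article; every name below is hypothetical.
RENEWAL_YEARS = range(1928, 1965)  # 1928 through 1964, inclusive


@dataclass(frozen=True)
class Book:
    title: str
    publication_year: int
    registration_number: Optional[str]  # None when no registration was located


def triage(book: Book, renewed: Set[str]) -> str:
    """Sort a book into a coarse bucket for human copyright review."""
    if book.publication_year not in RENEWAL_YEARS:
        return "out of scope"
    if book.registration_number is None:
        return "review: no registration located"
    if book.registration_number in renewed:
        return "renewal found: likely still in copyright"
    return "review: no renewal found, possibly public domain"


if __name__ == "__main__":
    renewed = {"A-0001"}  # made-up registration numbers with a renewal on file
    books = [
        Book("Example One", 1950, "A-0001"),
        Book("Example Two", 1950, "A-0002"),
    ]
    for book in books:
        print(book.title, "->", triage(book, renewed))
```

In this sketch a missing renewal only moves a title into a review bucket, which mirrors the article's point that such books are targeted for individual copyright review rather than opened automatically.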

NYPL Seeks to License In-copyright Out-of-Print Books, and Fails

More recently, NYPL has engaged in two more projects to increase the availability of, and access to, digital versions of in-copyright out-of-print books for its patrons. Both projects involve securing licenses (getting legal permission) from copyright holders to allow NYPL to provide digital access to these books. According to Cram, the problem with licensing books of this sort is that nobody knows which (if any) may suddenly become popular again and be a potential commercial success. Because of this possibility, authors and publishers often continue to hold on to their rights and don’t allow access to the books via a Creative Commons license or other means. In an attempt to mitigate this, Cram made sure NYPL had something to offer rights holders before he approached them.

In the first project, NYPL worked with the Authors Guild to locate the authors of in-copyright out-of-print books that were published so long ago that any rights held by the publisher should have reverted to the author. NYPL offered these authors (or their heirs) the opportunity to easily sell their books online in return for a license allowing NYPL to provide access to its patrons. NYPL staff were surprised to learn that many authors were not comfortable warranting that they were the copyright holder, as they could neither locate copies of their publishing contracts nor remember what the contracts stipulated. For contracts this old, it wasn’t in the interest of the publishers that the Library contacted to search for, find, and interpret the contracts to identify the rights holder and clear the work for the Library to serve digitally. In the end, NYPL secured only a handful of licenses from authors and began work to identify a more efficient process by which to clear these books.

NYPL Launches the University Press Backlist Pilot Project

Cram continued to think about how NYPL might solve the problem, and his thoughts led him to the 2009 proposed Google Books Settlement. The settlement (which was ultimately thrown out by the courts) would have given Google blanket permission to make in-copyright out-of-print books available for access. If it had been upheld, the Settlement Agreement would have circumvented the need (for Google) to hunt down and interpret the publisher contracts for each individual book title. NYPL needed a similar blanket agreement if it was going to make good on its mission.

Ultimately, the NYPL team, including Cram, Kathleen Riegelhaupt and Elena Herzen, came up with an approach they think might succeed. They would seek blanket agreements from publishers to allow NYPL to make their entire backlist available online. In exchange, NYPL would provide publishers with the circulation data for each title and digital copies of the books. This way, the publisher would be informed if an older title became commercially viable. If a book did prove to be popular enough to warrant a new publication, NYPL would remove access (or purchase a commercial license). To make sure they had all bases covered, NYPL would also seek licenses from the authors under the assumption that either the author or the publisher (or both) must retain the copyright. This path would ensure NYPL had the required permission.

This approach, referred to as the University Press Backlist Pilot Project, is currently in the proof-of-concept stage. The NYPL team is in discussions with the University of South Carolina Press, University of Massachusetts Press, MIT Press, and the University of Michigan Press about obtaining a license for NYPL to digitize and allow access to volumes on their backlists, in exchange for NYPL providing its circulation statistics on those volumes to the publishers. So far, the results look promising. If the pilot proves successful, NYPL hopes to expand the project to include other presses and publishers.

A Potential Backup Plan to Controlled Digital Lending?

Cram sees this strategy as an alternative to Controlled Digital Lending, which is currently being tested in the courts in a 2020 lawsuit brought by four publishers against the Internet Archive. NYPL’s approach is to work collaboratively with authors and publishers to secure licenses to permit patrons to access in-copyright out-of-print works without impacting the commercial value of those titles. While getting licenses from both the authors and the publishers requires herculean time and effort on the part of NYPL, the approach is unlikely to be subject to infringement lawsuits and the potential slings and arrows of judicial rulings, or to be dependent on Congress to change (or further define) copyright law. And if successful, it will be a boon for scholars and researchers who cannot visit libraries in person to view these out-of-print titles.

Through engaging with copyright holders in this project, NYPL has learned that most authors and publishers want these works available. And NYPL wants to prove that there is public interest in making them available. According to Cram, “there is no data in the Hachette case - we want to produce real data to make decisions. Is there actual commercial value in these older works? If not, there is evidence for legislators and courts to rebalance fair use.”

NYPL seeks, through this project, to license open digital access to these volumes via the publicly available NYPL Digital Research Books platform, so that any visitor to the site can immediately access and read the work. To provide access, other libraries could simply point to the open copy in Digital Research Books. They would have to negotiate a separate agreement with publishers only if they sought to host a copy of a book locally instead of pointing to NYPL's site. For a small minority of titles, access may be limited to NYPL cardholders.

There are currently no easy answers or routes to gaining legal digital access to these works. Given the complex circumstances, any and all legal paths to securing access to the 20th century’s wealth of research and literature are necessary and important. NYPL is a crucial pioneer in these efforts.