Search is a complex activity, and the larger the volume of data being searched, the more complex search becomes. The HathiTrust corpus consists of 17.5 million volumes comprising 6.2 billion pages. Given this breadth of content, crafting your search to retrieve accurate results can be an art form in itself. The HathiTrust home page has the familiar single search box, but the additional radio buttons and links to advanced search or search tips indicate that there's a lot more going on under the hood. Making sense of this complexity is challenging, but understanding how to use two core search options can help:
- Conducting a “Full-text” vs “Catalog” search (i.e. searching the text itself or the item's metadata), and
- Limiting search results to “Full View” volumes (those you can open and read) or including “All Items,” whether or not the volumes can be read.
Each of these options is described below.
Search Decision #1: Full-text or Catalog Search?
Determining whether to use Full-text or Catalog search depends on what you are trying to find and how broadly you want to seek it.
Full-text search looks for your search terms in the text of all the volumes in the repository, as well as the descriptive (bibliographic) metadata provided for all of the volumes.
Full-text search is most useful when you are searching for material that formal subject headings do not capture, such as quotes; the title of an article or story within a larger publication (such as a periodical or anthology); or any names, places, and phrases that are unlikely to appear in the bibliographic metadata.
Pro-tip => The amount of data being searched (6.2 billion pages plus 17+ million bibliographic records) is massive, so make your search terms as specific as possible.
Caveat => The quality and consistency of the OCR text affects the accuracy of the search and the OCR can vary greatly across the corpus, based on the language of the text, the digitization agent, the date of digitization, and the OCR software used. Fortunately, the majority of the HathiTrust collection was digitized by Google and has very good and continually improving OCR.
Catalog search looks for your search terms in the bibliographic metadata alone, not in the OCR text extracted from the volumes.
Catalog search should be the most useful when:
- Looking for
  - a monograph with known title, author, publisher, and/or OCLC number
  - a serial or monographic series with known title
- Searching by Library of Congress (LOC) subject headings
You can also make results more precise by limiting your Catalog Search to specific metadata fields using the “All Fields” drop down list.
Pro-tip => Search by OCLC number using the “ISBN/ISSN” option.
Caveat => Catalog search is limited by the quality of the metadata provided by member libraries. When the metadata for a volume is incomplete, inconsistent, or inaccurate, that volume may not show up in searches for which it is otherwise a good match.
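The guidance for Search Decision #1 can be summarized as a small lookup. The Python sketch below is illustrative only: the `Goal` enum and the `suggest_search_mode` function are hypothetical names invented for this example and are not part of any HathiTrust tool or API. It simply encodes the recommendations given above.

```python
from enum import Enum


class Goal(Enum):
    """Hypothetical labels for the kinds of searches described above."""
    QUOTE = "quote"
    TITLE_WITHIN_LARGER_WORK = "article or story title inside a periodical or anthology"
    NAME_PLACE_OR_PHRASE = "name, place, or phrase unlikely to be in the metadata"
    KNOWN_MONOGRAPH = "monograph with known title, author, publisher, or OCLC number"
    KNOWN_SERIAL = "serial or monographic series with known title"
    SUBJECT_HEADING = "Library of Congress subject heading"


# Goals for which the article recommends searching the text itself.
FULL_TEXT_GOALS = {
    Goal.QUOTE,
    Goal.TITLE_WITHIN_LARGER_WORK,
    Goal.NAME_PLACE_OR_PHRASE,
}


def suggest_search_mode(goal: Goal) -> str:
    """Return the search mode the article recommends for a given goal."""
    return "Full-text" if goal in FULL_TEXT_GOALS else "Catalog"


if __name__ == "__main__":
    for goal in Goal:
        print(f"{goal.value}: {suggest_search_mode(goal)}")
```

This is a starting point rather than a rule: if the suggested mode does not return what you need, trying the other mode is often worthwhile.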
Search Decision #2: Full View or All Items Search Results?
From within a set of search results, you can decide whether to see just “Full View” volumes or “All Items.”
Full View Search Results
Full View volumes in HathiTrust are those that anyone may access and read. For users in the United States, Full View volumes make up roughly 40% of the collection (about 7 million volumes) and are mostly in the public domain (often because they were published before 1928* and/or are US federal government documents).
By default, search results include only volumes that are “Full View” for the user's location (inside or outside the United States). From the search results, you can switch the view to “All Items” to include every volume, whether or not it can be accessed and read.
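The footnoted 1928 date moves forward by one year every January 1. The Python sketch below shows that arithmetic. It assumes the 1928 figure reflects calendar year 2023 and the 95-year US copyright term for older published works. The function name and constant are hypothetical, and the result is only a rule of thumb, not a rights determination for any particular volume.

```python
from datetime import date
from typing import Optional

# Assumption: 95-year US copyright term for older published works,
# which gives 1928 as the cutoff in calendar year 2023.
US_TERM_YEARS = 95


def public_domain_cutoff_year(today: Optional[date] = None) -> int:
    """Return the year before which US publications are generally public domain.

    The value rises by one on each January 1 because it depends only on
    the current calendar year.
    """
    if today is None:
        today = date.today()
    return today.year - US_TERM_YEARS


if __name__ == "__main__":
    print(f"Published before {public_domain_cutoff_year()}")
```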
All Items Search Results
“All Items” search results include everything in the HathiTrust collection: Full View volumes as well as “Limited (search-only)” volumes. “Limited (search-only)” volumes in HathiTrust are restricted because they are (or are assumed to be) in copyright. No one may open “Limited (search-only)” volumes to read them.
How are "Limited (search-only)" volumes useful and why would someone want to include them in their search results?
- They help you discover physical copies you may have access to. Each result offers several ways to locate a physical copy through the "Get This Item" section on the left-hand side of the book reader: links to find the book in a library or on Google Books (which in turn links to online sellers of physical copies, such as AbeBooks and Amazon).
- Though users can’t read the pages of these volumes, the contents can be searched for keywords or phrases, and the number of hits and the pages upon which they are found are displayed.
While searching HathiTrust may seem complicated at first, there are some easy options for optimizing your search results. If you don’t get the results you want at first, try a different search method and see if the results improve!
*This date is incremented one year every January 1.