Zephir, the HathiTrust bibliographic metadata management system, is managed by CDL’s Discovery & Delivery team. In this advice column, Barbara Cormack, the metadata analyst for Zephir, answers common questions about contributing records to Zephir. While these questions were written by fictitious authors, you are welcome to submit your own questions to Zephir (email: zephir-help@ucop.edu).


Dear Zephir,
I’m very interested in the works of Fiona Macleod and was pleased to see so many works by this author in HathiTrust. I was doing some research and found six hits for the title The winged destiny; studies in the spiritual history of the Gael. I thought Zephir was supposed to group or cluster similar records together. Why am I getting so many copies of the same thing in my search results?

   – Lost in L.A.

Dear Lost,
There’s likely more than one explanation for what you report. We’ll start with the most common: there are multiple editions of the work coming up in your search results. The first clue is the different publication dates listed: 1910, 1911, 1905, etc. In HathiTrust, records with different publication years are treated as different editions and are not clustered together. Similarly, works from different publishers, even if they have the same title, are considered different editions and are clustered separately.

Now, for your search, you may have spotted that there are two results for this title dated 1911. If you’ve looked at both records using the MARC display feature in HathiTrust, you’ll have seen that both were published by Duffield, in New York. So why are they not clustered together? The answer lies with the OCLC numbers, an extremely common identifier found in most bibliographic records. The two records you looked at for the Duffield 1911 publication of The winged destiny have different OCLC numbers.

So what is the explanation for this? The most likely one is that two different people at different libraries cataloged this particular title and edition in OCLC, creating duplicate versions of the record in the vast OCLC bibliographic database. Many libraries, including those that contribute to HathiTrust, use OCLC as a source for their bibliographic records, downloading them into their own local systems and exporting them from there to HathiTrust. This is most probably how the different versions of the 1911 Duffield edition of The winged destiny ended up in HathiTrust: the duplicates were submitted by contributing members.


Dear Zephir,
I was looking for some material on missions in Turkey for my thesis and was thrilled to find this title in the HathiTrust catalog, Leavening the Levant (https://catalog.hathitrust.org/Record/006574475). Even more exciting, there were five copies, submitted by four different institutions, and all open to full view. But when I looked at the volume from the Getty Research Institute, it was for a completely different work – something in French, maybe on an art topic? How do you explain this?

    – Suspicious Scholar

Dear Suspicious,
Oh dear, you’ve put your finger on a problem that sometimes crops up in the HathiTrust catalog. Here’s what’s happened. The Zephir system clusters, or groups, records together using OCLC numbers or system ID numbers. Four of the records in this cluster have the OCLC number assigned to the title Leavening the Levant. The fifth record is for the title Nomenclature des gravures sur bois, eaux-fortes et lithographies exécutées à ce jour par J.-E. Laboureur, and it has the unique OCLC number assigned to that particular title. Unfortunately and mysteriously, this bibliographic record also carries the OCLC number from the Leavening the Levant records; that is what causes the French title to be incorrectly clustered with them. The Zephir system is simply grouping the records by the OCLC number they share. In best cataloging practice, a metadata record should have only one primary OCLC number, but for a variety of reasons both historical and unknown, records sometimes have more than one. Zephir does not prevent such records from being ingested; as a result, some records end up incorrectly grouped.
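For readers who like to see the logic spelled out, here is a minimal Python sketch of grouping records by shared OCLC number. It is an illustration only and not Zephir's actual code: the function name `cluster_by_oclc`, the record labels, and the OCLC numbers are all made up for this example, and it ignores the system ID numbers that Zephir also uses.

```python
from collections import defaultdict


def cluster_by_oclc(records):
    """Group record IDs that share at least one OCLC number.

    `records` maps a record ID to the set of OCLC numbers found in it.
    Hypothetical helper for illustration; not Zephir's implementation.
    """
    # Each record starts out as the representative of its own cluster.
    parent = {rec_id: rec_id for rec_id in records}

    def find(rec_id):
        # Follow parent links to the cluster representative.
        while parent[rec_id] != rec_id:
            parent[rec_id] = parent[parent[rec_id]]
            rec_id = parent[rec_id]
        return rec_id

    first_seen = {}  # OCLC number -> first record that carried it
    for rec_id, oclc_numbers in records.items():
        for number in oclc_numbers:
            if number in first_seen:
                # A shared number merges the two records' clusters.
                parent[find(rec_id)] = find(first_seen[number])
            else:
                first_seen[number] = rec_id

    clusters = defaultdict(set)
    for rec_id in records:
        clusters[find(rec_id)].add(rec_id)
    return sorted(sorted(members) for members in clusters.values())


# Made-up OCLC numbers standing in for the real ones.
records = {
    "levant-a": {"1111"},
    "levant-b": {"1111"},
    "laboureur": {"2222", "1111"},  # stray second number
    "duffield-1911-a": {"3333"},
    "duffield-1911-b": {"4444"},
}
clusters = cluster_by_oclc(records)
```

In this toy data, the Laboureur record lands in the same cluster as the two Levant records because of its stray second number, while the two Duffield 1911 records stay apart because their numbers differ.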