A Misconception about OAI-PMH Metadata Formats

OAI-PMH stands for Open Archives Initiative Protocol for Metadata Harvesting. In brief, it is a protocol for publishing and harvesting metadata about objects (such as PDFs or JPGs) in an archive. In my experience, one aspect of this protocol is often misunderstood.

About OAI-PMH

OAI-PMH defines six different commands (called verbs). Three of these commands are used to publish 1) information about the archive itself (Identify), 2) the metadata formats in which data is published (ListMetadataFormats), and 3) the sets of data available in the archive (ListSets). The remaining three commands are used for the actual publishing/harvesting of data: ListIdentifiers publishes a list of record identifiers, ListRecords publishes a list of metadata records, and GetRecord publishes a single metadata record. By supplying the from and until arguments to the two list commands, it is possible to track changes to an archive since a certain date or between two dates.
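To make the verbs a little more concrete, here is a minimal sketch of how a harvester might assemble request URLs. The base URL (https://example.org/oai) and the helper name build_request are hypothetical and exist only for illustration; the verb and argument names are the ones defined by the protocol.

```python
from urllib.parse import urlencode

# The six OAI-PMH verbs, grouped as described above.
DESCRIPTIVE_VERBS = ("Identify", "ListMetadataFormats", "ListSets")
HARVESTING_VERBS = ("ListIdentifiers", "ListRecords", "GetRecord")


def build_request(base_url, verb, **args):
    """Return an OAI-PMH request URL (hypothetical helper, not a library API).

    'from' is a reserved word in Python, so callers pass it as 'from_';
    the trailing underscore is stripped before the URL is built.
    """
    if verb not in DESCRIPTIVE_VERBS + HARVESTING_VERBS:
        raise ValueError("unknown OAI-PMH verb: " + verb)
    params = {"verb": verb}
    for name, value in args.items():
        if value is not None:
            params[name.rstrip("_")] = value
    return base_url + "?" + urlencode(params)


# Assumed base URL; substitute the endpoint of a real archive.
BASE_URL = "https://example.org/oai"

# Everything that changed between two dates, in Dublin Core.
print(build_request(BASE_URL, "ListRecords", metadataPrefix="oai_dc",
                    from_="2020-01-01", until="2020-12-31"))
```

Dropping the until argument asks for everything changed since the from date, which is the usual pattern for incremental harvesting.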

Much more information about OAI-PMH can be found at http://www.openarchives.org/pmh/.

The Misconception

Having worked with OAI-PMH quite a bit over the last few years, I have been consistently surprised by one significant misconception about the protocol. More than once, I have encountered the belief that OAI-PMH supports only the Dublin Core metadata format. This is simply not true.

It IS true that the OAI-PMH specification requires metadata to be published in Dublin Core, but it does NOT dictate that metadata be published ONLY in the Dublin Core format.

Specifically, the OAI-PMH specification states "…OAI-PMH supports items with multiple manifestations (formats) of metadata. At a minimum, repositories must be able to return records with metadata expressed in the Dublin Core format, without any qualification. Optionally, a repository may also disseminate other formats of metadata."
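An archive advertises the formats it supports through the ListMetadataFormats verb, so it is easy to check whether a given archive goes beyond the minimum. The sketch below parses such a response with Python's standard library. The sample XML is a hand-written, trimmed illustration (a real response also carries responseDate and request elements), and the function names are my own, not part of any library.

```python
import xml.etree.ElementTree as ET

# Namespace used by OAI-PMH 2.0 responses.
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}

# Illustrative response only; not copied from any real archive.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListMetadataFormats>
    <metadataFormat>
      <metadataPrefix>oai_dc</metadataPrefix>
      <schema>http://www.openarchives.org/OAI/2.0/oai_dc.xsd</schema>
      <metadataNamespace>http://www.openarchives.org/OAI/2.0/oai_dc/</metadataNamespace>
    </metadataFormat>
    <metadataFormat>
      <metadataPrefix>mods</metadataPrefix>
      <schema>http://www.loc.gov/standards/mods/v3/mods-3-4.xsd</schema>
      <metadataNamespace>http://www.loc.gov/mods/v3</metadataNamespace>
    </metadataFormat>
  </ListMetadataFormats>
</OAI-PMH>"""


def metadata_prefixes(xml_text):
    """Return the metadataPrefix values listed in a ListMetadataFormats response."""
    root = ET.fromstring(xml_text)
    path = ".//oai:metadataFormat/oai:metadataPrefix"
    return [element.text for element in root.findall(path, OAI_NS)]


def richer_formats(prefixes):
    """Return every advertised prefix other than the mandatory oai_dc."""
    return [prefix for prefix in prefixes if prefix != "oai_dc"]


prefixes = metadata_prefixes(SAMPLE_RESPONSE)
print("All formats:", prefixes)
print("Beyond Dublin Core:", richer_formats(prefixes))
```

Any prefix returned by richer_formats can then be passed as the metadataPrefix argument of ListRecords or GetRecord to harvest records in that format instead of Dublin Core.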

The Reality

If one browses the web and examines the various archives publishing with the OAI-PMH protocol, it quickly becomes clear that many archives publish in Dublin Core and also in one or more additional metadata formats. In almost every case, these additional formats provide much richer metadata than is possible with Dublin Core (formats like RDF, METS, and MODS are not uncommon). Here are some examples:

Rice University Digital Scholarship – Dublin Core, RDF, METS, two others
Gateway to Oklahoma History – Dublin Core, RDF, one other
Smithsonian Digital Repository – Dublin Core, Qualified Dublin Core, MODS
Pensoft Publishers – Dublin Core, MODS
PubMed Central – Dublin Core, two formats derived from NLM DTDs created for journal metadata exchange

This, I think, is the point: Dublin Core exists as the lowest common denominator for metadata exchange with OAI-PMH, but most archives should (and do) provide something richer.

Conclusion

Support for multiple metadata formats is the distinction that some potential users and adopters of OAI-PMH miss. I have seen archives proclaim that they support OAI-PMH, only to find that they support the bare minimum (i.e., only Dublin Core). To them, I say, "That’s nice, but given the limitations of Dublin Core, please add support for a richer metadata format." And I have seen users dismiss OAI-PMH out of hand, complaining about its limited usefulness due to mangled Dublin Core metadata. To them, I say, "Look closer, and if the archives with which you are working truly support only Dublin Core, demand more."

OAI-PMH metadata formats… more than just Dublin Core.
