A Misconception about OAI-PMH Metadata Formats

OAI-PMH stands for Open Archives Initiative Protocol for Metadata Harvesting. In brief, OAI-PMH is a protocol that enables publishing and harvesting of metadata about objects (such as PDFs or JPGs) in an archive. In my experience, one aspect of this protocol is often misunderstood.

About OAI-PMH

OAI-PMH defines six different commands (called verbs). Three of these commands are used to publish 1) information about the archive itself, 2) the metadata formats in which data is published, and 3) the sets of data available in the archive. The remaining three commands are used for the actual publishing/harvesting of data: one command publishes a list of record identifiers, one publishes a list of metadata records, and one publishes a single metadata record. By supplying the appropriate arguments to the commands, it is possible to track changes to an archive since a certain date or between two dates.
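To make the six verbs concrete, here is a minimal Python sketch that builds one request URL per verb. The verb and argument names (Identify, ListMetadataFormats, ListSets, ListIdentifiers, ListRecords, GetRecord, metadataPrefix, from, until, identifier) come from the OAI-PMH specification. The endpoint, the record identifier, the dates, and the build_request helper are made up for illustration.

```python
from urllib.parse import urlencode

# Hypothetical endpoint, used only for illustration.
BASE_URL = "https://archive.example.org/oai"


def build_request(verb, **arguments):
    """Return an OAI-PMH request URL for the given verb and arguments.

    'from' is a reserved word in Python, so callers pass 'from_' and the
    trailing underscore is stripped before the URL is built.
    """
    params = {"verb": verb}
    for name, value in arguments.items():
        if value is not None:
            params[name.rstrip("_")] = value
    return BASE_URL + "?" + urlencode(params)


# Three verbs describe the archive itself.
identify_url = build_request("Identify")
formats_url = build_request("ListMetadataFormats")
sets_url = build_request("ListSets")

# Three verbs do the actual harvesting.
identifiers_url = build_request("ListIdentifiers", metadataPrefix="oai_dc")
records_url = build_request(
    "ListRecords",
    metadataPrefix="oai_dc",
    from_="2011-01-01",   # only records changed on or after this date
    until="2011-03-31",   # ...and on or before this date
)
record_url = build_request(
    "GetRecord",
    identifier="oai:archive.example.org:1234",  # hypothetical identifier
    metadataPrefix="oai_dc",
)
```

The from and until arguments on ListRecords (and ListIdentifiers) are what make it possible to track changes since a certain date or between two dates.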

Much more information about OAI-PMH can be found at http://www.openarchives.org/pmh/.

The Misconception

I have worked with OAI-PMH quite a bit over the last few years, and one thing that has consistently surprised me is a significant misconception about the protocol. More than once, I have encountered the belief that OAI-PMH supports only the Dublin Core metadata format. This is simply not true.

It IS true that the OAI-PMH specification requires metadata to be published in Dublin Core, but it does NOT dictate that metadata be published ONLY in the Dublin Core format.

Specifically, the OAI-PMH specification states "…OAI-PMH supports items with multiple manifestations (formats) of metadata. At a minimum, repositories must be able to return records with metadata expressed in the Dublin Core format, without any qualification. Optionally, a repository may also disseminate other formats of metadata."
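A repository announces the formats it supports in its response to the ListMetadataFormats verb. The Python sketch below pulls the metadataPrefix values out of such a response. The sample XML is hand-written for illustration and is not taken from any real archive. The list_metadata_prefixes helper is likewise hypothetical. The oai_dc prefix and the OAI-PMH 2.0 namespace come from the specification.

```python
import xml.etree.ElementTree as ET

# Namespace used by OAI-PMH 2.0 responses.
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}

# Hand-written sample response for a hypothetical archive that offers
# Dublin Core (oai_dc) plus MODS.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <responseDate>2011-05-12T00:00:00Z</responseDate>
  <request verb="ListMetadataFormats">https://archive.example.org/oai</request>
  <ListMetadataFormats>
    <metadataFormat>
      <metadataPrefix>oai_dc</metadataPrefix>
      <schema>http://www.openarchives.org/OAI/2.0/oai_dc.xsd</schema>
      <metadataNamespace>http://www.openarchives.org/OAI/2.0/oai_dc/</metadataNamespace>
    </metadataFormat>
    <metadataFormat>
      <metadataPrefix>mods</metadataPrefix>
      <schema>http://www.loc.gov/standards/mods/v3/mods-3-4.xsd</schema>
      <metadataNamespace>http://www.loc.gov/mods/v3</metadataNamespace>
    </metadataFormat>
  </ListMetadataFormats>
</OAI-PMH>"""


def list_metadata_prefixes(xml_text):
    """Return the metadataPrefix values found in a ListMetadataFormats response."""
    root = ET.fromstring(xml_text)
    path = ".//oai:metadataFormat/oai:metadataPrefix"
    return [element.text.strip() for element in root.findall(path, OAI_NS)]


prefixes = list_metadata_prefixes(SAMPLE_RESPONSE)
# Any prefix other than oai_dc is a richer (or at least different) format.
extra_formats = [prefix for prefix in prefixes if prefix != "oai_dc"]
```

For this sample, prefixes contains oai_dc and mods, so extra_formats is non-empty and the archive offers more than the required minimum.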

The Reality

If one browses the web and examines the various archives publishing with the OAI-PMH protocol, it quickly becomes clear that many archives publish in Dublin Core and also in one or more additional metadata formats. In almost every case, these additional formats provide much richer metadata than is possible with Dublin Core (formats like RDF, METS, and MODS are not uncommon). Here are some examples:

Rice University Digital Scholarship – Dublin Core, RDF, METS, two others
Gateway to Oklahoma History – Dublin Core, RDF, one other
Smithsonian Digital Repository – Dublin Core, Qualified Dublin Core, MODS
Pensoft Publishers – Dublin Core, MODS
PubMed Central – Dublin Core, two formats derived from NLM DTDs created for journal metadata exchange

This, I think, is the point… Dublin Core exists as the lowest common denominator for metadata exchange with OAI-PMH, but most archives should (and do) provide something richer.

Conclusion

The support for multiple metadata formats is the distinction that some potential users and adopters of OAI-PMH miss. I have seen archives proclaim that they support OAI-PMH, only to find that they support only the bare minimum (i.e., only Dublin Core). To them, I say "that’s nice, but given the limitations of Dublin Core, please add support for a richer metadata format". I have also seen users dismiss OAI-PMH out of hand, complaining about its limited usefulness due to mangled Dublin Core metadata. To them, I say "look closer, and if the archives with which you are working truly only support Dublin Core, demand more".

OAI-PMH metadata formats… more than just Dublin Core.

Open Letter to Microsoft Regarding the Skype Acquisition

When I started this blog, I decided that I’d blog about technologies, tools, and gadgets, but would try to avoid straight opinion pieces.  Well, I’m going to break that rule.

Like many others, I was surprised by the recent announcement that Microsoft is acquiring Skype.

I won’t pretend to understand the business and technical strategies that drove this deal, or what the implications might be.  Check out the usual tech news outlets and you’ll find that there are plenty of others doing just that.  The possibilities of the deal are exciting, and I look forward to witnessing the outcome.  But I’m not going to try to guess the outcome in advance.

A recent post to Twitter reads “Wonder what the #MS acquisition of #Skype means for its cross-platform availability.”  That got me to thinking.  Currently, Skype clients are available on many platforms, including Windows, OSX, Linux, iPhone, Android-based phones, and Symbian phones.  There are even Skype-enabled televisions.  Does an acquisition by Microsoft put this broad platform reach in jeopardy?

I work on a project that includes partners spread across the United States, England, Germany, Austria, France, Egypt, China, Australia, Brazil, and probably a few that I’ve forgotten.  These partners use a huge variety of technologies and platforms.  Off the top of my head, I can think of the Windows, OSX, and Linux operating systems; the MySQL, Microsoft SQL Server, and PostgreSQL database servers; and the IIS, Apache, and NGINX web servers.  Programming languages in use include C#, Java, and PHP (I’m sure there are others).  Additional technologies in use include Drupal, Fedora Commons, Gluster, and many more.  With this number of partners and technologies spread across the world, it should be no surprise that there is no consensus on which tools are “best.”  Each partner uses the tools that work best for them.

Similarly, each partner institution (and in some cases each individual person) initially had different preferences for instant communication tools.  Some preferred Windows Live Messenger; some liked iChat.  There were Google Chat advocates, and even some Yahoo! Messenger users.  Some of these tools are single-platform, and many do not talk to one another.

So how did we ultimately find a way to communicate with one another?  Skype.  The integration of voice, video, and chat was compelling.  The ability to call someone in another country for free was significant.  And, the variety of platforms supported allowed ALL of our partners to use the tool, regardless of their preferred computing platform.

Skype has become an invaluable tool.  We’ve come to rely on it so much that we were negatively affected earlier this year when Skype suffered a major outage.

So, Microsoft, I’m sure you see the various platforms and partners as an opportunity to sell more customers on the Windows platform.  Nothing wrong with that; it’s your job to find opportunities to push your products.  Except the opportunity you imagine doesn’t exist.  This is NOT an opportunity to push Windows.  No partner is going to switch platforms simply to use an instant communications tool, especially when other options exist, even if those options are technically inferior.

We can’t be the only existing Skype users in this situation.  Please, Microsoft, keep supporting ALL of the platforms on which Skype exists.  I was at the MIX conference where the first version of Silverlight was announced, and I remember Scott Guthrie saying “this was HARD” when Silverlight 1.0 running on OSX was demonstrated.  So cross-platform is hard, but you did it.  Do it again.

If cross-platform availability is easier to achieve by dropping the platform-specific client applications in favor of a slick web-based cross-browser cross-platform Skype application, so be it.  Actually, that would be great.  Just make sure it really IS cross-platform and cross-browser.  Not Silverlight-based, and none of this “HTML5 runs best in Internet Explorer on Windows” silliness.  Ensure that Skype continues to work the same everywhere that it does now.

C’mon Microsoft, you can do it.  Skype is a fantastic tool.  Keep it that way, and keep it cross-platform.