RDA and transforming to a new bibliographic framework

I haven’t had the opportunity to work much with RDA records yet; however, I’ve been following some e-mail lists, blogs, and other commentaries where people have been discussing their experiences with it. The Library of Congress, the National Library of Medicine (NLM), and the National Agricultural Library (NAL) organized testing to evaluate whether they will implement RDA.

Out of this testing experience (which is still being analyzed), the Library of Congress issued “Transforming our Bibliographic Framework: A Statement from the Library of Congress” on May 13. According to the statement, “Spontaneous comments from participants in the US RDA Test show that a broad cross-section of the community feels budgetary pressures but nevertheless considers it necessary to replace MARC 21 in order to reap the full benefit of new and emerging content standards.” Therefore, the Library of Congress is going to investigate, among other things, replacing MARC 21.

From what I have heard of the RDA testing, I think this makes sense. The general feeling I get is that RDA by itself is not enough of a change to make libraries expend the resources necessary to implement it. Sure, there are some improvements over AACR2, but from what I read there are also many things that are not improvements. This is especially true if you agree with the Taiga Forum 6’s 2011 Provocative Statement #2 that libraries will need to participate in radical cooperation. RDA offers a bit too much flexibility to ensure that bibliographic records created by one library will fit well for other libraries. For example, the Rule of 3 is gone, which on the surface is an improvement since it allows for more than three authors to be included as main or added entries. However, as discussions on the RDA-L list point out, RDA requires only the first author, and illustrators of children’s books, as main or added entries. Local choices are great if you are only working locally and not “radically cooperating.”

I won’t go through the list of complaints (and, to be fair, some compliments) about RDA I’ve seen, as you can find them yourselves. My takeaway, though, is that RDA on top of our existing bibliographic infrastructure is probably not going to make a monumental improvement for our patrons, while at the same time it will be costly to implement (especially retroactively). RDA might be better than AACR2, but is it enough better that migrating to it is worth the time and cost? I am not so sure. Maybe simple changes to AACR2 would be just as good and more practical?

Some people I talk to think moving to RDA is a necessary first step that will make more significant or radical changes easier in the future. I, however, have an underlying fear that if libraries implement RDA in the current environment, they will be stuck with it for a long time, and it will actually make it harder to implement something different later. I hope the others are right and I am wrong, since I believe that in the short to medium term, RDA will be implemented on top of our existing bibliographic infrastructure – for better or worse.

If we change to RDA and replace our underlying bibliographic infrastructure with something else, say something based on RDF or some other standard model for data interchange, we might actually get a significant change, one that helps expose our bibliographic data to the greater world of linked data while at the same time making it easier for libraries to take advantage of linked data.
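To make the RDF idea concrete, here is a minimal sketch of what a bibliographic record looks like as linked-data triples. The book and person URIs are hypothetical placeholders I made up for illustration; only the Dublin Core Terms namespace is a real published vocabulary, and this shows the data model, not any actual proposed MARC replacement.

```python
# A bibliographic record expressed as RDF-style subject-predicate-object
# triples, using plain Python tuples (no RDF library required).

DC = "http://purl.org/dc/terms/"    # Dublin Core Terms namespace (real)
BOOK = "http://example.org/book/1"  # hypothetical local identifier

triples = [
    (BOOK, DC + "title", '"The Example Book"'),
    (BOOK, DC + "creator", "http://example.org/person/42"),
    (BOOK, DC + "issued", '"1998"'),
]

def to_ntriples(triples):
    """Serialize triples in a simple N-Triples-like form."""
    lines = []
    for s, p, o in triples:
        # Quoted strings are literals; everything else is treated as a URI.
        obj = o if o.startswith('"') else "<%s>" % o
        lines.append("<%s> <%s> %s ." % (s, p, obj))
    return "\n".join(lines)

print(to_ntriples(triples))
```

Because the creator is a URI rather than a text string, it can be linked to records elsewhere that describe the same person, which is the core of the linked-data appeal.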

One thing that the Library of Congress needs to take into account in this process is the economic reality of implementing something new. I don’t see this specifically mentioned in the issues they plan on addressing. I assume it will be part of the underlying discussions, but I would like to see it more prominently mentioned. Part of this also involves vendors as well as open source developers of systems such as Evergreen and Koha. If LoC makes a change, it will affect libraries throughout the US (and probably the world). If the systems libraries use can’t function within this new bibliographic framework, it will be a difficult and extremely expensive transition.

I think this is something librarians, especially those in systems and cataloging, should follow closely. I know I will be doing so.

MARC is better than Dublin Core

During my presentation on Digital Preservation: Context & Content (slides) at ELAG 2011 last week, I made the statement that MARC is better than Dublin Core. This may have been a bit of a provocative statement, but I thought it was relevant to my presentation and the conference in general. I felt that someone had to say it, especially since there was a whole workshop on MARC Must Die and a number of other presentations were gleefully awaiting the day we are done with MARC. For example, with a great deal of support from the audience, Anders Söderbäck said, “we the participants of #elag2011 hold these truths to be self-evident, that MARC must die…”.

Probably not surprisingly, my statement set off a mini-barrage of messages on the conference Twitter feed. Since the conference was almost over (my presentation was the second to last) and the statement wasn’t core to what I was talking about, I didn’t have time to explain or expand on my position. I know that some of the people who responded to my statement on Twitter were not at the conference, and at least a few, I am pretty sure, weren’t watching the live stream. Because of this, I wanted to take this time to put the statement in context and explain why I think MARC is better than Dublin Core. I understand people may not agree with me, and this post won’t change that, but that doesn’t mean I need to join the bandwagon that wants to kill something that has been pretty successful for the last 40 or so years.

Before going any further, since I’m not sure it was clear to everyone commenting on Twitter, I should point out that by MARC I mean MARC 21 + AACR2 (which is the common usage of the term in the USA), but I imagine the same statements would likely apply to any version of MARC plus whatever set of rules you want to apply. Similarly, by Dublin Core, I mean Simple and/or Qualified Dublin Core along with the Dublin Core Metadata Element Set (DCMES) format (i.e. descriptive fields). I know that there are other aspects of the Dublin Core Metadata Initiative, but for the purposes of this discussion I don’t believe they are germane [1]. I am focusing on how Dublin Core can be used to describe objects. After all, that is why librarians use metadata – to describe things. No matter how easy it is for machines (or humans) to parse a metadata record, it would not be very useful if the standard does not make it possible to adequately describe, in a consistent way, whatever it is that one is trying to describe. I should also point out that, while I love theory and research, in this case I am mostly concerned with the practical.

The statement came out of my experiences thus far with using Dublin Core for digital preservation at Binghamton University. Before we started on this, I was familiar with Dublin Core but had never really worked closely with it on a large scale, so I didn’t have a strong opinion of it. I am not a cataloger, but as a systems librarian, I feel it is necessary to follow developments in cataloging, and I also have to work with MARC records on a fairly regular basis. Thus, I realize that MARC has its issues, but please don’t kill it until we have something better, and at this point, I don’t believe we do. [2]

In short, my problem with Dublin Core is that it does not allow for the granularity and consistency that I believe are necessary to adequately describe a mixed set of objects for long-term preservation and access. Mixed sets are important here: if you are doing a long-term preservation project that includes a disparate set of objects, I believe it is important that there is some consistency across collections. This is especially true if they are going to be managed or searched together. Librarians often comment on the need to break down silos, or at least tie them together for discovery. The metadata needs to be adequate to do this. Maybe if you are a national library you can have multiple digital preservation solutions, but at a mid-sized university library that approach is problematic and most likely not realistic. This is doubly so if you consider that one of the main components of preservation is ensuring access in the future (i.e. you are not talking about a dark archive). This is not really a new or unique criticism, but I think it is often overlooked and/or too easily dismissed. Even one of the people who objected to my saying MARC was better than Dublin Core, Corey Harper, admitted this was a valid criticism in his article, “Dublin Core Metadata Initiative: Beyond the Element Set,” published in the Winter 2010 issue of Information Standards Quarterly.
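A tiny example of the granularity problem I mean: in the Simple Dublin Core element set, every date lands in the same dc:date element, so an original publication date and a digitization date become indistinguishable without extra local conventions. This sketch (the values are made up) uses Python’s standard library to build such a record:

```python
# Builds a minimal Dublin Core record with two dates. Both end up in
# identical <dc:date> elements, so a consumer cannot tell which is the
# publication date and which is the scan date from the markup alone.
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)

record = ET.Element("record")
for value in ("1923", "2011-05-01"):  # original publication vs. digitization?
    e = ET.SubElement(record, "{%s}date" % DC_NS)
    e.text = value

xml = ET.tostring(record, encoding="unicode")
print(xml)
```

MARC, by contrast, can distinguish such dates by field and fixed-field position, which is the kind of consistency a mixed preservation collection needs.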

A couple of tweeters brought up DCAP (Dublin Core Application Profiles), which in theory could be used to allow for additional (or alternative) metadata fields to address some of my issues with how well Dublin Core describes particular objects. However, as Corey Harper mentioned in a tweet, “I understand that DCAP infrastructure lacking, but…” (ellipsis in the original). But the “but” is not something that can be ignored. If the infrastructure isn’t there, it is a big issue – practice over theory. Even if the infrastructure weren’t lacking, I am not sure how well it would address my criticisms. Even without DCAP I can add local qualifiers or elements for my application (and have, in fact, done so), but as the Dublin Core Metadata Initiative warns, “Nevertheless, designers should employ additional qualifiers with both caution and the understanding that interoperability could suffer as a result.” I don’t see how the use of multiple DCAPs would avoid leading to similar interoperability issues and a “Least Common Denominator” situation on the discovery end of things. Without discovery, you don’t have access, and without access you don’t have preservation.

Lastly, Michael Giarlo asked, “But then is anyone actually putting DCMES up against MARC? Seems a category error to me.” I don’t think it is a category error at all. Both are metadata formats/standards that libraries are using to describe objects in their collections. Perhaps one might argue the category is overly broad, but I think they are obviously in the same category. Comparing the two is only natural and, I think, quite useful. DCMES may be easier to teach and for computer programmers to program against, but in my experience it is nowhere near as useful when it comes to actually describing an item – which, as I said earlier, is the goal in the first place. Maybe some technologists value interoperability over description, but I am not ready to go there. We need something better, not something different.

As I said earlier in the post, I doubt this will change anyone’s mind, but hopefully it explains why I said that MARC is better than Dublin Core.

[1] Truthfully, I am a bit confused why this was an issue on Twitter. Wikipedia and even the official “Using Dublin Core” document Diane Hillmann created for DCMI just use the term “Dublin Core” to describe the metadata standard, so this is pretty common usage.

[2] I do not mean to imply that anyone is making the argument that Dublin Core should completely replace MARC, but the MARC must die contingent is relevant to this particular discussion of MARC versus Dublin Core. At some point maybe I’ll make a post about some of the more complete alternatives to MARC being discussed.

PaperSprint to be held in Paris

On the Association of Internet Researchers e-mail list there was an interesting announcement about a PaperSprint to be held at Fabelier in Paris, France.

Some of you that follow this blog may know what a Code Sprint is. For those that don’t, according to Wikipedia, “A sprint is a time-boxed period of software development focused on a given list of goals (but with variable scope). Sprints have become popular events among some Open Source projects.”

A PaperSprint, or at least as Fabelier describes it, is the same basic idea, except the end result is not code but an academic paper. The goal of this particular PaperSprint “is to explore the possibility of writing from scratch a (reasonably good) research paper within 4 hours” from start to finish. This sounds like an exciting experiment, and if it were closer, I’d try to participate. It will be really interesting to see how it turns out. If the results are positive, maybe it is something to try in the library and information science environment. Anyone interested?

University of Florida MARC Records available as CC0

Eric Lease Morgan and Peter E. Murray brought my attention to a new licensing method for MARC records implemented by the University of Florida. In short, the University of Florida is making their original MARC records available under the Creative Commons CC0 license, which is basically a “tool for freeing your own work of copyright restrictions around the world. You may use this tool even if your work is free of copyright in some jurisdictions, if you want to ensure it is free everywhere.” As Peter points out, this follows the University of Michigan, which has made their records available using CC0 as well.

This is an interesting development. Peter Murray ponders whether this is “redundant since some think that MARC records, as a recitation of facts, cannot be copyrighted anyways?” I am not sure what other countries’ copyright laws are, and while I believe the copyrightability of MARC records in the US is dubious at best, even if it is redundant around the globe, it is good to make it clear.

The University of Michigan makes their records available for download. At this point, I think the University of Florida only marks them in the catalog. It would be nice to see a downloadable version of Florida’s records as well, but I applaud this move with or without a nice zip file.

Is your research data server secure?

There is an article on Inside Higher Ed about a professor at the University of North Carolina getting demoted after a breach of security on a server she was using to store her research data. The data included 114,000 Social Security Numbers. Apparently the University was going to fire her, but a “faculty hearings committee” persuaded the University to demote her to associate professor and cut her pay in half instead.

The article doesn’t really go into enough detail for me to comment on to what degree she was at fault, but considering the outcome, it seems that she must have been at fault at some level; in fact, the faculty review board did not dispute that “[the researcher] was accountable for the breach according to existing university policy.” But based on this article, it seems that dismissal, or even the final outcome of keeping her tenure but cutting her pay in half, is a pretty drastic penalty unless she callously disregarded the security of the records or was collecting data she shouldn’t have been – neither of which seems to apply in this case.

However, it does bring up some interesting questions.

The researcher said she “did everything I knew to do, but I did not know how to secure a machine.” I am sure that is so, but if she was collecting this data, she should have known that she was taking responsibility for it, and if she didn’t know how to “secure a machine,” why did she put the data on a machine connected to the Internet?

The report partially addresses the issue of her not knowing how to secure a server by noting that the researcher hired a “university software programmer” to maintain the server, but that person wasn’t certified. Personally, I don’t think certification means much, so I wouldn’t fault the researcher for hiring someone without certification. That said, the fact that the person was a programmer does not necessarily mean they know about securing a server. I know some programmers that are great sysadmins and I know some that couldn’t administer a system to save their life. Without knowing the person’s background, it is hard to say if the person was or was not qualified. Honestly, I think there is probably more to this part of the story.

The defenders in the article point to “systemic institutional failure,” but that is a dangerous slippery slope. The only thing coming close to systemic failure I can see might be the “system” allowing researchers and research labs to run their own servers, or possibly approving the collection of the data in the first place. Do these supporters really want central IT controlling and locking down everything and/or approving what type of research data they can gather? I doubt it. However, with this freedom comes responsibility and accountability. This is not “systemic failure.” Assuming the server wasn’t properly secured, there was a failure, but it was not systemic (and if the server was properly secured, which sounds unlikely reading this article, then there was no human “failure” but either a technical one or “just” a crime).

The article points out that the researcher consistently gave the person she hired an “excellent” rating as a systems administrator (although given her own acknowledgement that she does not know how to secure a server, I am not sure how she could evaluate that – or, for that matter, what that rating means in UNC’s context). One thing that any manager needs to take into consideration when evaluating a person is that those evaluations may mean something down the road. The article makes no mention of what happened to the programmer.

The comments on the article are quite interesting. There are a number of responses on both sides. One of the most pertinent comments is by someone posting as Lala. Lala wrote:

Granted that UNC overreacted — dismissal is a bit too much in this case — but why would anyone put Social Security numbers in a dataset that is stored online? This is Data Security 101 stuff. The researcher certainly wasn’t qualified to judge the security of the system’s firewalls but she certainly was aware that she was storing easily-identifiable data in a potentially risky location.

I think this is a very good question. Unless you are absolutely sure you are doing everything you can to protect the data (which the researcher apparently wasn’t, based on her own admission that she didn’t know about computer security and the faculty review board’s conclusion), it seems like a very risky proposition to even collect the data in the first place, let alone put it on the Internet. Unfortunately, Social Security Numbers have been used as a unique identifier for years, but I know many institutions (and specifically many libraries) have gone to great lengths to no longer hold that information. Without being knowledgeable about this particular research, I can’t say why the researcher needed the SSNs or whether she considered using some other identifier. This is really what I think is the crux of the matter. Even well-secured servers can be hacked (ask Google, the United States Department of Defense, or Iran). The best way to make sure that private data such as Social Security Numbers aren’t hacked is to not have them in the first place.
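For what it’s worth, one common way to avoid storing the numbers at all, while still linking records that belong to the same person, is to replace each SSN with a keyed hash before the data ever reaches the server. This is only an illustrative sketch (the key and values are made up, and key management is its own problem), not a complete security solution:

```python
# Replace each raw SSN with a stable pseudonymous identifier (a keyed
# HMAC-SHA256 hash). The dataset can still link a subject's records
# together, but it no longer contains any actual SSN.
import hmac
import hashlib

SECRET_KEY = b"store-this-somewhere-safe"  # hypothetical; keep apart from data

def pseudonymize(ssn: str) -> str:
    """Return a stable pseudonymous ID derived from the SSN."""
    return hmac.new(SECRET_KEY, ssn.encode("utf-8"), hashlib.sha256).hexdigest()

record = {"subject_id": pseudonymize("123-45-6789"), "response": 42}
print(record["subject_id"])  # 64 hex characters, not an SSN
```

The same SSN always maps to the same identifier, so longitudinal linkage still works, but a stolen dataset without the key reveals no numbers.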

iPads in the University Classroom

The Wired Campus section of the Chronicle of Higher Education has a brief column about the use of iPads instead of textbooks in one class at Notre Dame. Really, there is not enough information in the article to say whether tablets like the iPad will be a good or bad replacement in the long term (i.e. no control group reported, maybe it is just this professor’s style that works, etc.). However, it does seem the experiment was successful, and with the proper apps (like ones that can annotate PDFs) a tablet is a viable option for delivering course materials. However, judging from this paragraph:

And when it came time for their computer-based final exam, 39 of the 40 students in class put away their iPads in favor of a laptop.

iPads don’t replace a laptop, so why not just deliver materials to laptops? Still, that said, it is good to see higher education looking into these new tools. We just have to remember that while some things are new and shiny, they might not actually be better for a particular purpose, so we need to keep a critical eye when considering and evaluating new technology.

U.S. Academic Libraries switching to Koha in 2010

One of the things that interested me in Marshall Breeding’s ILS Turnover report from Library Technology Guides was which libraries were switching to Koha. In particular, I was interested in which academic libraries switched to Koha in 2010. As commenters on my earlier blog post, “Thoughts on Library Technology Guides’ ILS Turnover Report,” noted, there are some questions about the data. In my opinion, most of the questions – at least those about numbers – are more problematic outside of the United States and a few other countries. For some of the reasons behind this, see Marshall Breeding’s comment on my last blog post about this report, where he discusses how he gathers the data used in it. For that reason, I decided to limit this post to U.S. academic libraries that switched to Koha in 2010.

According to my count [1], 15 U.S. academic libraries switched to Koha from another ILS, and one more, a trade school named Antonelli College, went from no ILS to using Koha. What I was interested in looking at were the profiles of the schools, in particular the number of volumes [2] and the type and size of patrons served. I was also interested in which libraries are listed as being independently supported. To a lesser degree, I wanted to see if there was anything particularly interesting in whom academic libraries were choosing to acquire Koha support from.

All of the U.S. academic libraries switching to Koha have fewer than 140,000 volumes (at least as far as I can tell) [3]. The two largest are in the New York City metro area and are getting support from LibLime. It is possible (likely?) that they are using Koha via the WALDO consortium, which has a partnership with PTFS/LibLime. The only other U.S. academic library to switch in 2010 that has more than 100,000 volumes is D’Youville College. D’Youville is listed as independent; however, the demo video of their new catalog on their Website shows that they are hosted via PTFS/LibLime as well, and they may possibly also be contracting through WALDO [4]. In other words, the larger U.S. academic libraries that moved to Koha in 2010 are doing so via LibLime/PTFS, and I am pretty sure they are using “LibLime Enterprise Koha” and not the Open Source version. According to the LibLime Website, LibLime Enterprise Koha’s enhancements include many acquisitions improvements and enhanced authority control. Here I need to plead ignorance of recent Koha developments in this area and of how “enhanced” LibLime Enterprise Koha really is, but from previous experience, these are areas where I am under the impression the Open Source version needs some development to attract larger academic libraries [5]. Many libraries still do not use acquisitions within the ILS or make extensive use of authority records (any use?), so these are not always a high priority when selecting an ILS in smaller libraries. However, when you start getting closer to medium-sized academic libraries, they become more of an issue. In other words, I am not surprised that the U.S. academic libraries that are switching to Koha are small academic libraries, and that the larger ones that are migrating are switching to LibLime Enterprise Koha. Although the largest of the bunch selected LibLime, ByWater did attract some schools with volume counts that were not much smaller. Goddard College, for example, has 97,000 volumes, and two others have about 75,000 records.

Besides D’Youville, the other library that is listed as independent is the University of Science and Arts of Oklahoma. It is the third largest in terms of volume count to make the switch in 2010. I wanted to check their catalog to see what it looked like, but it is currently unavailable. If they truly are independent, it would be interesting to hear about their experiences migrating to Koha.

The academic libraries that migrated serve a diverse range of schools. There are trade schools, 2-year community colleges, 4-year schools, graduate schools, and seminaries. Therefore, it doesn’t look like the type of college or university being served is a factor for those who have selected Koha.

Of the schools that switched to Koha, 4 were using Koha, 3 Unicorn, and 2 Horizon. Single schools had EOS.Web, Virtua, Winnebago Spectrum, Athena, and Millennium.

[1] Defining what is an academic library can be tricky sometimes. While it is easy to say Binghamton University Libraries, for example, is an academic library, there are places that fall into a gray area, like trade schools, advanced research institutes, etc. Also, if a school is based in the United States, but the library is in London as part of an undergraduate program, is it a U.S. academic library? (FWIW: In this case I said no.) So, you might count more or fewer libraries than I did. However, for purposes of this inquiry, I don’t think it is a factor, since the ones I didn’t include really weren’t “outliers” in terms of size or scope.

[2] I used a variety of methods to get volume counts. Mostly, though, I looked at what the libraries self-reported either on Library Technology Guides or somewhere else.

[3] There was one larger academic library to make the switch in the United Kingdom. Staffordshire University has approximately 180,000 volumes and switched to Koha with support from PTFS-Europe.

[4] This demonstrates some of the concerns members of the Koha community have with whether or not the self-reporting of Koha service providers is accurate.

[5] As I mentioned in the past, I support a Koha install for a small collection (< 1,000 records). I did look at some of these issues briefly while installing Koha and migrating items to the new install. I didn’t notice anything that made me think these features are no longer lacking compared to their proprietary counterparts, but I did not look closely, so I may be wrong, and I welcome any information showing they can do the same things, as streamlined, as something like Millennium, Voyager, or Aleph.

Academic Search Engine Spamming

Jonathan Rochkind had a really interesting blog post commenting on a recent article, “Academic Search Engine Spam and Google Scholar’s Resilience Against it,” published by Joeran Beel and Bela Gipp in the Journal of Electronic Publishing. The article (and Rochkind’s blog post) discuss how scholars could manipulate citation counts and visibility in Web-based academic search engines like Google Scholar. It is unclear what the risk-reward factor for this would be, but if it can be done, I am sure at least a few scholars will try it. However, it is also true, as Beel and Gipp point out, that citation gaming is not at all new. Some publishers and journals actively encourage people to cite from their journal(s), and there are citation circles and, of course, self-citing.

I am not really sure how much we should be worried about this – at least, how much we should worry about it MORE than we do the whole idea of using dubious measures such as citation counts in promotion and tenure decisions to begin with. As Rochkind sums it up:

Once you start to look too carefully, the whole academic publishing endeavor can start to seem like a somewhat arbitrary game played by agreed upon rules in order to justify tenure decisions, rather than attempt to share knowledge with ones peers or the world or in general. In this light though, the possibility of gaming Google Scholar is perhaps less alarming, as it’s really just business as usual.

Happy reading.

Thoughts on Library Technology Guides’ ILS Turnover Report

Marshall Breeding published his ILS Turnover report from Library Technology Guides, which lists what ILS products were replaced by libraries in 2010. I am not sure what you can gather from these stats, but they are still interesting to look at. There are a few things to keep in mind when looking at this report (and when looking at Library Technology Guides in general):

  1. A lot of the information is self-reported.
  2. Switch dates are based on contract signings and not implementation, so sometimes a library may have switched in 2010 but signed in 2009; likewise, a library may be reported as a 2010 switch but not have actually switched yet.
  3. Although Marshall Breeding tries to make this list as global as possible, it still has a heavy slant toward English-language libraries, and more specifically ones located in the United States.
  4. Consortia are funny things when it comes to these stats. Even one consortium changing to a different vendor can really affect the counts, even if it is just one contract switch.

Some things I found interesting:

  • 214 libraries migrated from the various SirsiDynix systems listed (Horizon (119), Unicorn (77), Dynix (18), Symphony (0)). Of them, only 34 migrated to SirsiDynix’s new system, Symphony (Horizon (25), Unicorn (0), Dynix (9)). All in all, 46 libraries migrated to Symphony in 2010. That is a net loss of 168 libraries. On the surface that does not look like good news for SirsiDynix. Of course, if the 12 new customers are larger, it might not be all bad, but it is still hard to see this as anything but SirsiDynix having not done well this past year in the ILS marketplace.
  • 20 libraries are already listing Ex Libris’s next-next generation ILS, Unified Resource Management, as their new ILS even though it is still in the early stages of development. All of them were already Ex Libris customers (Aleph (18), Voyager (2)). They are also all in Australia.
  • As those of you who follow Koha, an Open Source ILS, are probably aware, there has been some controversy over the last year or so involving LibLime and the company (PTFS) that bought them. Without rehashing it, let’s just say many members of the Koha community (especially those involved with development) didn’t see eye-to-eye with PTFS on a variety of issues. Because of this, I was wondering if anything would show up in ILS provider switching. PTFS (I am counting PTFS, PTFS-Europe, and LibLime together in this case, although that may or may not be fair – I don’t really know if there is a difference in support, etc. among the various listings) lost 16 customers. 13 of them switched from PTFS to Evergreen (many of them in what appears to be a consortial move), one switched to ByWater for Koha support, one stayed with Koha but is now running it independently, and another one switched to Horizon. Based on these numbers, I would say the controversy has not led to librarians choosing another Koha provider, at least not yet. Of course, maybe librarians would like to move but can’t just yet because of contract issues, so any migration may be more of a lagging indicator of dissatisfaction.
  • Talking about ByWater, 13 libraries reportedly switched from Koha-Independent to Koha via ByWater. I am not sure if this is an actual switch in service providers, or if maybe it is just a switch in reporting.
  • 139 libraries switched to the Open Source Evergreen ILS (Zero switched from Evergreen to something else). That seems to be good news for the future of Evergreen.
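
The SirsiDynix arithmetic in the first bullet can be double-checked in a few lines from the counts quoted out of the report:

```python
# Recomputing the SirsiDynix figures from the report's per-system counts.
left = {"Horizon": 119, "Unicorn": 77, "Dynix": 18, "Symphony": 0}
to_symphony_internal = {"Horizon": 25, "Unicorn": 0, "Dynix": 9}

total_left = sum(left.values())              # 214 libraries migrated away
stayed = sum(to_symphony_internal.values())  # 34 moved to Symphony in-house
total_to_symphony = 46                       # all Symphony migrations in 2010
new_customers = total_to_symphony - stayed   # 12 came from outside SirsiDynix
net_change = total_to_symphony - total_left  # -168 overall

print(total_left, stayed, new_customers, net_change)
```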

What does this all mean? Probably nothing, especially without looking closer at the individual circumstances, but still it is interesting to look at. I have been looking a little more deeply into some of the libraries that switched to Koha from another ILS. That will be a subject of a future post.

Mobile Library Services?

Lukas Koster over at Commonplace.net has an interesting post asking “Do we need mobile library services?” His answer is “Not really.” It is a very interesting take on mobile applications for libraries. The only application that has received significant use from his library’s mobile site is one that shows which desktop computers are free. I am not surprised by this. I never really saw the reason behind the hype for mobile in libraries. Sure, having a Web page that is readable and has hours and related information is nice, but beyond that I was, and remain, skeptical. Still, maybe Lukas, I, and others such as Aaron Tay who are not sure about the hype are in the minority, considering the success of the Handheld Librarian Conference (the 4th conference is happening in about 2 months).
