New Book: Digital Preservation for Libraries, Archives, and Museums

A few weeks ago a new book I co-authored with Heather Lea Moulaison was published by Rowman and Littlefield. The book is titled Digital Preservation for Libraries, Archives, and Museums. Initial reaction has been extremely positive. It is available through all of the major booksellers, such as Amazon, where at one point it was #7 in one of its categories! If you are interested in digital preservation, please consider purchasing the book or borrowing it from your local library. Below is the publisher’s description of the book:

Digital Preservation in Libraries, Archives, and Museums represents a new approach to getting started with digital preservation: that of what cultural heritage professionals need to know as they begin their work. For administrators and practitioners alike, the information in this book is presented readably, focusing on management issues and best practices. Although this book addresses technology, it is not solely focused on technology. After all, technology changes and digital preservation is aimed for the long term. This is not a how-to book giving step-by-step processes for certain materials in a given kind of system. Instead, it addresses a broad group of resources that could be housed in any number of digital preservation systems. Finally, this book is about “things (not technology; not how-to; not theory) I wish I knew before I got started.”

Digital preservation is concerned with the life cycle of the digital object in a robust and all-inclusive way. Many Europeans and some North Americans may refer to digital curation to mean the same thing, taking digital preservation to be the very limited steps and processes needed to insure access over the long term. The authors take digital preservation in the broadest sense of the term: looking at all aspects of curating and preserving digital content for long term access.

The book is divided into four parts based on the Digital Preservation Triad:

  1. Situating Digital Preservation,
  2. Management Aspects,
  3. Technology Aspects, and
  4. Content-Related Aspects.

The book includes a foreword by Michael Lesk, eminent scholar and forerunner in digital librarianship and preservation. The book features an appendix providing additional information and resources for digital preservationists. Finally, there is a glossary to support a clear understanding of the terms presented in the book.

Digital Preservation will answer questions that you might not have even known you had, leading to more successful digital preservation initiatives.

Checking if XML is well-formed using xmllint

I had to check if an XML document was well-formed today. I have a few nice graphical tools that can do this, but I wanted to do it in a script on my Linux box. I found out that xmllint does a good job of this. The command I used is:

xmllint --noout input.xml

If the XML document is not well-formed, the errors will go to standard error, and then I can figure out how to correct their cause.
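On a machine where xmllint is not available, a similar well-formedness check can be sketched with Python's standard library. This is only an illustration, not what the post used: it swaps xmllint for the `xml.etree.ElementTree` parser, and the function name `check_well_formed` is made up for the example.

```python
import xml.etree.ElementTree as ET


def check_well_formed(xml_text):
    """Return (True, None) if xml_text parses, else (False, error message).

    Hypothetical helper: it only tests well-formedness, not validity
    against a DTD or schema.
    """
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        # The parser's message includes the line and column of the problem.
        return False, str(err)
    return True, None


# Example: a document with an unclosed <b> element is reported as broken.
ok, message = check_well_formed("<a><b></a>")
if not ok:
    print("not well-formed:", message)
```

As with xmllint, the error message points at the place where parsing failed, which is usually enough to track down the cause.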

Symlinking MySQL data files

Due to space issues, I had to move my MySQL data files (/var/lib/mysql) on an Ubuntu box to another file system. I did that and created a symlink, but MySQL would not start. It turns out to be an easy fix. I found a post on MySQL Forums from someone who had a similar problem, and someone named Richard Guy posted my solution:

Did you fix the apparmor config file for mysqld and restart apparmor?

first you need to edit /etc/apparmor.d/usr.sbin.mysqld and add the new fully qualified (ie, no symbolic link) path(s). [you may want to leave the original /var/lib/mysql entries intact :-) ]

then “restart” apparmor:

sudo invoke-rc.d apparmor reload

for more info, see

I edited the file as appropriate, and MySQL started fine, so now I don’t have to worry about my disk space filling up again.
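The "fully qualified (ie, no symbolic link) path" mentioned in the forum reply can be found by resolving the symlink. The sketch below is a rough illustration in Python: the helper name `apparmor_rules_for` is hypothetical, and the rule format is an assumption modeled on the `/var/lib/mysql/ r,` and `/var/lib/mysql/** rwk,` entries found in the stock Ubuntu profile, so compare the output against your own /etc/apparmor.d/usr.sbin.mysqld before copying anything in.

```python
import os


def apparmor_rules_for(datadir):
    """Return candidate AppArmor rules for the real directory behind datadir.

    Assumption: the profile expects one rule for the directory itself and
    one for everything beneath it, as in the stock mysqld profile.
    """
    real = os.path.realpath(datadir)  # follows symlinks to the actual location
    return [f"{real}/ r,", f"{real}/** rwk,"]


# Print the lines that would need to be added for the relocated data files.
for rule in apparmor_rules_for("/var/lib/mysql"):
    print(rule)
```

If /var/lib/mysql is not a symlink on the machine where this runs, the output simply repeats the original path.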

Is that anonymous e-mail anonymous?

A friend on Facebook posted a link the other day to an article about University of Illinois President Michael Hogan’s chief of staff resigning after an anonymous e-mail was sent to the University Senates Conference from a Yahoo! e-mail account. I don’t know much about what is happening at the University of Illinois, but I was intrigued by the attempt at anonymous e-mail.

The article stated that a computer science professor, Roy Campbell, was able to determine that the e-mails may have been sent by someone in the president’s office. The initial article I read didn’t say how the computer science professor figured that out, so I thought he might have looked at the e-mail headers. I did some checking with e-mails sent to my personal e-mail account from people with Yahoo! addresses and found that, indeed, Yahoo! e-mail does include the sender’s IP address in the header (actual IP replaced by XXX.XXX.XXX.XXX):

Received: from [XXX.XXX.XXX.XXX] by via HTTP; Fri, 13 Jan 2012 12:11:28 PST
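Pulling that address out of a saved message can be sketched in a few lines of Python using the standard `email` module. This is only a rough illustration: the function name `received_ips` is made up, the sample message uses a placeholder host (mail.example.com) and a documentation-range address (192.0.2.10), and it assumes the sender's IPv4 address appears in square brackets as in the header above.

```python
import re
from email import message_from_string

# Matches a dotted-quad IPv4 address wrapped in square brackets.
BRACKETED_IPV4 = re.compile(r"\[(\d{1,3}(?:\.\d{1,3}){3})\]")


def received_ips(raw_message):
    """Return every bracketed IPv4 address found in the Received headers."""
    msg = message_from_string(raw_message)
    ips = []
    for header in msg.get_all("Received", []):
        ips.extend(BRACKETED_IPV4.findall(header))
    return ips


# Hypothetical message; the host name and address are placeholders.
sample = (
    "Received: from [192.0.2.10] by mail.example.com via HTTP; "
    "Fri, 13 Jan 2012 12:11:28 PST\n"
    "From: someone@example.com\n"
    "Subject: test\n"
    "\n"
    "body\n"
)
print(received_ips(sample))
```

Keep in mind that a sender can forge Received lines, so only the ones added by servers you trust are reliable evidence.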

However, I came across another article that had a little more information. While I don’t know that Dr. Campbell didn’t look at the headers (I imagine he did), he also found some clues as to where the e-mails were sent from because the person who sent them composed them in Microsoft Word and then pasted the content into Yahoo! Mail. A Chicago Tribune article quoted Dr. Campbell as saying, “One should also be careful writing anonymous email using (Microsoft) Word :-).”

I did some testing with cutting and pasting from Microsoft Word, and I wasn’t able to find any personally identifying information in the mark-up that comes across when you don’t send the e-mail as plain text via Yahoo!, but I am sure that, depending on your configuration and version of Word, it could happen.

I think the take-away from this story with regard to e-mail is that you should never assume any e-mail you send is truly anonymous. It is true that you can make it “more anonymous” and harder to trace depending on how you send it and what tools you use, but unless you go to great lengths and know what you are doing, someone with enough resources who wants badly enough to know where an e-mail came from can probably figure it out or come close enough. Maybe not enough for a court of law, but enough that you’ll probably wish you hadn’t sent it. While it was a computer science professor who first figured out the e-mail was probably not from someone on the committee, it really wouldn’t have taken a computer genius in this case to figure out where it may have come from.

WorldCat record use policy causes National Library of Sweden to end negotiations with OCLC

The National Library of Sweden has decided to end negotiations with OCLC about uploading their union catalog, Libris, into WorldCat as well as using WorldCat as a source of records in Libris. According to the announcement, Libris is and needs to remain an open database, and OCLC’s WorldCat Rights and Responsibilities for the OCLC Cooperative does not make that possible. The National Library also believes that the record use terms would make it impossible to contribute bibliographic data to Europeana and the European Library. As Karen Coyle mentions in her blog post about this decision, open data (or the lack of it) is not just an ideological stance: it “has real practical applications.” Whatever good the WorldCat record use policy has done, this is a real-world example of how it can also harm libraries (and in this case, has), including OCLC member libraries, which will not be able to access Libris records via WorldCat.

Library Journal contacted OCLC about the announcement, but they did not immediately respond to LJ’s request for comment.


Some of you probably have seen MIT’s announcement of MITx on December 19. Basically, “MITx will offer a portfolio of MIT courses through an online interactive learning platform.” It will “operate on an open-source, scalable software infrastructure” and offer many features that current learning management systems offer, as well as some other unique features. While the technology sounds interesting, I am most interested in the program itself, in particular the credentialing. MIT has been a leader in open education with its OpenCourseWare project, but adding a level of credentialing is a huge step. There isn’t a lot of information available yet, but basically if you want to learn, you can do that for free. If you want some form of credential, there will be a fee for that. The credential will be a certificate of completion that will be offered by a not-for-profit body within the Institute created to do such a thing. The body offering the credentials will be distinctly named to avoid any impression that MIT “proper” awarded the credential, and costs are yet to be determined.

MITx has yet to announce what classes will be available but they plan to start offering classes in Spring 2012. More information can be found on the MITx announcement FAQ. If they have something I am interested in and it fits my schedule, I may try to take a class and, if I do, I’ll probably pay for the credential.

What I did on my September Vacation

Last week I took off from work for some vacation, but I didn’t leave the library world behind. In fact, I co-presented a Webinar, “Cloud computing and libraries: The view from 10,000 feet,” with Dr. Heather Lea Moulaison that was put on by Education Institute (Canada) and the Neal-Schuman Professional Education Network (USA), talked to an LIS class at the University of Missouri (incidentally, I was very impressed by the students), and attended and co-presented a session with Dr. Moulaison at the LITA National Forum.

I skipped the last couple of LITA National Forums, as in the past I have not found them as useful for me as some other conferences I go to. With limited travel budgets, you need to look for value. LITA does not appear to be highly subsidized by sponsors and isn’t a cheap conference compared to other library conferences, and the content has been a little weak in my areas. However, when an opportunity emerged to present with my co-editor of Getting Started with Cloud Computing: A LITA Guide, Heather Lea Moulaison, in her home state, I figured, hey, why not? What else am I going to do with these vacation days? If I don’t use some, I’ll lose them, so I might as well hang out with some library peeps.

I am not going to review the whole conference, but I was happy to see what seemed like an increase in sessions that were more advanced (technology-wise). It isn’t that past Forums were bad; I just wasn’t the proper audience. Kudos to this year’s program planners. I’d like to see fewer long breaks, and it seemed odd that the posters were at the end of the day Saturday with no food or refreshments, but oh well. While I am on it, this isn’t just a LITA thing, but I think at most conferences sessions are too long. I’d much rather see two 25-minute presentations than one 50-minute one. I think this is where Code4Lib, with its 20-minute time slots, does a really good job. Library Journal has a good review of the 2011 LITA National Forum (and I’m not just saying that because they liked our presentation, although I’m pleased that they did).

The slides from our LITA presentation, Practical Approaches to Cloud Computing at YOUR Library, are available on CodaBox.

Webinar on Digital Preservation tomorrow

Tomorrow (Tuesday, September 20, 2011) I will be one of two people presenting a Library Journal Webinar called Low Maintenance, High Value: How Binghamton University Libraries Used Digital Preservation to Increase its Value on Campus. My co-presenter is Ido Peled, Rosetta Product Manager, Ex Libris Group. Ex Libris is also a cosponsor. The abstract of our talk is:

Is end-to-end Digital Preservation here today? Does it require an army of staff to manage? Is it a library function or a central IT function? Answer these questions and more while hearing Edward Corrado tell the story of turning the Binghamton University Libraries into the university’s identity and heritage storehouse.

Apparently you can register now and they will send you a link once the webcast is archived for your viewing pleasure.

Library Linked Data

Carl Grant has an excellent blog post about a vendor’s perspective on the case of the Library Linked Data Model. It is well worth a read if you are interested in Library Linked Data or how any other new idea/concept/product/service gets implemented by a vendor. Carl says that before vendors can invest (heavily) in Library Linked Data, they need to have some questions answered:

It includes a lack of clear understanding of what exactly are the problems being solved for the profession by this technology that can only be solved with the Library Linked Data model or that can’t be otherwise solved? Are these problems shared across the profession, across institutions? Is it agreed that the Library Linked Data model is the solution? If so, how many institutions, or even personal services, are in production status using this model to solve those problems?

These are interesting questions and ones I don’t have any answers for. The idea of Linked Data in the library world has been pushed around for a while, but it has only been recently that I have seen any working prototypes and implementations. While I am impressed with what some people have done and I understand some of the potential benefits, I don’t think any of the above questions have been answered. I’d really like to see some answers to the first one, especially what benefit our users will gain from it. I really want to be convinced that any significant investment in Library Linked Data will benefit our end users, and I don’t see it (yet). I have never had a student or professor come to me with a problem that linked data will solve more completely or more efficiently than other solutions. I imagine that will come with time, but until it does it is hard to make the case to go all-in on linked data.

There may be some benefits (mostly in the form of efficiency) from a staff point of view, but I am still not sure that at this point they outweigh the costs of implementation. Also, as Carl asks in his post (question #3), “How do we see this data being maintained?” Unless you can give me a clear plan that shows sustainability, again it is hard to get behind the linked data model.

What does this all mean? The proponents of Library Linked Data need to get out and show some real-world examples of how it will help end users and/or how it will create efficiencies that cannot be achieved by other solutions. For example, if you are talking about bibliographic and related data, how would linked data be better than OCLC’s centralized Web Scale Management Services or Ex Libris’s Alma (assuming for Alma that the community zone is populated with the appropriate data)?

Will these answers come? I believe so. The Library Linked Data Incubator Group is a good start (especially if they can provide examples as to how linked data will efficiently benefit end users in ways other technologies cannot), but it will be a while before we see any signs that the “Early Majority” are ready to jump on board.

New Book: Getting Started with Cloud Computing

If you are looking for some fun and educational reading, why don’t you pick up a copy or two of Getting Started with Cloud Computing: A LITA Guide? I’d give a review, but I am biased since I am one of the co-editors, along with Dr. Heather Lea Moulaison, so I’ll just say that I think the book came out great and the chapter authors did an excellent job. A million thanks to all of the authors and to Roy Tennant for writing the foreword. Neal-Schuman was great to work with as well.

Editing a book was a lot of work (more than I thought it would be, to be honest) but it was a rewarding experience and I learned a lot along the way – both about the topic, and about editing a book.

By the way, if you happen to be in Europe, don’t fret, you can head over to Facet Publishing and get the UK imprint of Getting Started with Cloud Computing.
