Ramblings about library technology, open source software, and other adventures!


Checking if XML is well formed using xmllint 2013 December 12

Filed under: technology — ecorrado @ 17:12:16

I had to check whether an XML document was well formed today. I have a few nice graphical tools that can do this, but I wanted to do it in a script on my Linux box. It turns out that xmllint does a good job of this. The command I used is:

xmllint --noout input.xml

If the XML document is not well formed, xmllint writes the errors to standard error, and then I can figure out how to correct whatever caused them.
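On a machine where xmllint is not installed, roughly the same check can be sketched with Python's standard library parser. This is an alternative technique, not what xmllint does internally, and the function names check_file and report are made up for illustration. It only tests well-formedness; it does not validate against a DTD or schema.

```python
import sys
import xml.etree.ElementTree as ET


def check_file(path):
    """Return None if the file at `path` parses as XML, else the parser's error message."""
    try:
        ET.parse(path)
    except ET.ParseError as err:
        return str(err)
    return None


def report(paths):
    """Print one line per malformed file to standard error; return the failure count."""
    failures = 0
    for path in paths:
        error = check_file(path)
        if error is not None:
            print(f"{path}: {error}", file=sys.stderr)
            failures += 1
    return failures
```

In a script, calling report(sys.argv[1:]) and exiting with a non-zero status when the count is above zero would let a cron job or shell wrapper react to a bad file.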


Symlinking MySQL data files 2012 January 19

Filed under: technology — ecorrado @ 18:01:30

Due to space issues, I had to move my MySQL data files (/var/lib/mysql) on an Ubuntu box to another file system. I did that and created a symlink, but MySQL would not start. It turned out to be an easy fix. I found a post on MySQL Forums from someone who had a similar problem, and someone named Richard Guy posted the solution I needed:

Did you fix the apparmor config file for mysqld and restart apparmor?

first you need to edit /etc/apparmor.d/usr.sbin.mysqld and add the new fully qualified (ie, no symbolic link) path(s). [you may want to leave the original /var/lib/mysql entries intact :-) ]

then “restart” apparmor:

sudo invoke-rc.d apparmor reload

for more info, see

I edited the file as appropriate and MySQL started fine, so now I don’t have to worry about my disk space filling up again.
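As a rough illustration of that edit, assuming the data was moved to a hypothetical /data/mysql (substitute the real, non-symlink path), the added entries inside the profile's braces would mirror the stock /var/lib/mysql ones. The permissions shown follow the pattern Ubuntu's profile normally uses for the data directory; check them against the existing lines in your own file rather than trusting this sketch.

```text
# Hypothetical new location of the MySQL data files (not from the original post).
/data/mysql/ r,
/data/mysql/** rwk,
```

After saving the file, reloading AppArmor as described in the quoted post makes the new paths take effect.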


Is that anonymous e-mail anonymous? 2012 January 17

Filed under: technology — ecorrado @ 22:01:57

A friend on Facebook posted a link the other day to an article about University of Illinois President Michael Hogan’s chief of staff resigning after an anonymous e-mail was sent to the University Senates Conference from a Yahoo! e-mail account. I don’t know much about what is happening at the University of Illinois, but I was intrigued by the attempt at anonymous e-mail.

The article stated that a computer science professor, Roy Campbell, was able to determine that the e-mails may have been sent by someone in the president’s office. The initial article I read didn’t say how he figured that out, so I thought he might have looked at the e-mail headers. I did some checking with e-mails sent to my personal e-mail account from people with Yahoo! addresses and found that, indeed, Yahoo! e-mail does include the sender’s IP address in the header (actual IP replaced by XXX.XXX.XXX.XXX):

Received: from [XXX.XXX.XXX.XXX] by via HTTP; Fri, 13 Jan 2012 12:11:28 PST
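To check a saved message of your own for the same thing, here is a minimal sketch that pulls bracketed IPv4 addresses out of the Received headers. It assumes the address appears in square brackets as in the header above. The function name received_ips and the sample message (with documentation-only address 192.0.2.10 and host web.example.com) are invented for illustration, and other mail providers may format this header differently.

```python
import re
from email import message_from_string

# Matches an IPv4-looking address in square brackets, as in "from [192.0.2.10]".
BRACKETED_IPV4 = re.compile(r"\[(\d{1,3}(?:\.\d{1,3}){3})\]")


def received_ips(raw_message):
    """Return bracketed IPv4 addresses found in Received headers, in header order."""
    msg = message_from_string(raw_message)
    ips = []
    for header in msg.get_all("Received", []):
        ips.extend(BRACKETED_IPV4.findall(header))
    return ips


# Made-up sample message; the address and host name are placeholders.
SAMPLE = (
    "Received: from [192.0.2.10] by web.example.com via HTTP; "
    "Fri, 13 Jan 2012 12:11:28 PST\n"
    "From: someone@example.com\n"
    "Subject: test\n"
    "\n"
    "body\n"
)

print(received_ips(SAMPLE))
```

The pattern does not check that each number is in the 0 to 255 range, so treat any match as a lead to verify rather than a confirmed address.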

However, I came across another article that had a little more information. While I don’t know that Dr. Campbell didn’t look at the headers (I imagine he did), he also found some clues as to where the e-mail was sent from because the person who sent it composed the e-mail using Microsoft Word and then pasted the content into Yahoo! Mail. A Chicago Tribune article quoted Dr. Campbell as saying “One should also be careful writing anonymous email using (Microsoft) Word :-).”

I did some testing with cutting and pasting from Microsoft Word, and I wasn’t able to find any personally identifying information in the mark-up that comes across when you don’t send the e-mail as plain text via Yahoo!. Still, I am sure that, depending on your configuration and version of Word, it could happen.

I think the take-away from this story with regard to e-mail is that you should never assume any e-mail you send is truly anonymous. It is true that you can make it “more anonymous” and harder to trace depending on how you send it and what tools you use. But unless you really go to great lengths and know what you are doing, someone who wants badly enough to know where an e-mail came from, and who has enough resources, can probably figure it out or come close enough. Maybe not enough for a court of law, but enough that you’ll probably wish you hadn’t sent it. While it was a computer science professor who first figured out the e-mail was probably not from someone on the committee, it really wouldn’t have taken a computer genius in this case to figure out where it may have come from.


WorldCat record use policy causes National Library of Sweden to end negotiations with OCLC 2011 December 22

Filed under: libraries,technology — ecorrado @ 19:12:42

The National Library of Sweden has decided to end negotiations with OCLC about uploading their union catalog, Libris, into WorldCat as well as using WorldCat as a source of records in Libris. According to the announcement, Libris is and needs to remain an open database, and OCLC’s WorldCat Rights and Responsibilities for the OCLC Cooperative does not make that possible. The National Library also believes that the record use terms would make it impossible to contribute bibliographic data to Europeana and the European Library. As Karen Coyle mentions in her blog post about this decision, open data (or the lack of it) is not just an ideological stance: it “has real practical applications.” Whatever good the WorldCat record use policy has done, this is a real-world example of how it can (and in this case, has) also harm libraries – including OCLC member libraries, who will not be able to access Libris records via WorldCat.

Library Journal contacted OCLC about the announcement, but they did not immediately respond to LJ’s request for comment.


MITx 2011 December 20

Filed under: general,technology — ecorrado @ 17:12:03

Some of you have probably seen MIT’s announcement of MITx on December 19. Basically, “MITx will offer a portfolio of MIT courses through an online interactive learning platform.” It will “operate on an open-source, scalable software infrastructure” and offer many features that current learning management systems offer, as well as some unique features. While the technology sounds interesting, I am most interested in the program itself, in particular the credentialing. MIT has been a leader in open education with its OpenCourseWare project, but adding a level of credentialing is a huge step. There isn’t a lot of information available yet, but basically if you want to learn, you can do that for free. If you want some form of credential, there will be a fee for that. The credential will be a certificate of completion offered by a not-for-profit body within the Institute created for that purpose. The body offering the credentials will be distinctly named to avoid confusion that MIT “proper” awarded the credential. Costs are yet to be determined.

MITx has yet to announce what classes will be available but they plan to start offering classes in Spring 2012. More information can be found on the MITx announcement FAQ. If they have something I am interested in and it fits my schedule, I may try to take a class and, if I do, I’ll probably pay for the credential.


What I did on my September Vacation 2011 October 5

Filed under: conferences,libraries,technology — ecorrado @ 17:10:33

Last week I took off from work for some vacation, but I didn’t leave the library world behind. In fact, I co-presented a Webinar, “Cloud computing and libraries: The view from 10,000 feet,” with Dr. Heather Lea Moulaison that was put on by Education Institute (Canada) and the Neal-Schuman Professional Education Network (USA); talked to an LIS class at the University of Missouri (incidentally, I was very impressed by the students); and attended and co-presented a session with Dr. Moulaison at the LITA National Forum.

I skipped the last couple of LITA National Forums because in the past I have not found them as useful for me as some other conferences I go to. With limited travel budgets, you need to look for value. LITA does not appear to be highly subsidized by sponsors, it isn’t a cheap conference compared to other library conferences, and the content has been a little weak in my areas. However, when an opportunity emerged to present with Heather Lea Moulaison, my co-editor on Getting Started with Cloud Computing: A LITA Guide, in her home state, I figured, hey, why not? What else am I going to do with these vacation days? If I don’t use some, I’ll lose them, so I might as well hang out with some library peeps.

I am not going to review the whole conference, but I was happy to see what seemed like an increase in sessions that were more advanced (technology-wise). It isn’t that past Forums were bad; I just wasn’t the proper audience. Kudos to this year’s program planners. I’d like to see fewer long breaks, and it seemed odd that the posters were at the end of the day Saturday with no food or refreshment, but oh well. While I am on it, this isn’t just a LITA thing, but I think at most conferences sessions are too long. I’d much rather see two 25-minute presentations than one 50-minute one. I think this is where Code4Lib, with its 20-minute time slots, does a really good job. Library Journal has a good review of the 2011 LITA National Forum (and I’m not just saying that because they liked our presentation, although I’m pleased that they did).

The slides from our LITA presentation, Practical Approaches to Cloud Computing at YOUR Library, are available on CodaBox.


Webinar on Digital Preservation tomorrow 2011 September 19

Filed under: libraries,technology — ecorrado @ 16:09:15

Tomorrow (Tuesday, September 20, 2011) I will be one of two people presenting a Library Journal Webinar called Low Maintenance, High Value: How Binghamton University Libraries Used Digital Preservation to Increase its Value on Campus. My co-presenter is Ido Peled, Rosetta Product Manager, Ex Libris Group. Ex Libris is also a cosponsor. The abstract of our talk is:

Is end-to-end Digital Preservation here today? Does it require an army of staff to manage? Is it a library function or a central IT function? Answer these questions and more while hearing Edward Corrado tell the story of turning the Binghamton University Libraries into the university’s identity and heritage storehouse.

Apparently you can register now and they will send you a link once the webcast is archived for your viewing pleasure.


Library Linked Data 2011 August 5

Filed under: libraries,technology — ecorrado @ 17:08:53

Carl Grant has an excellent blog post about a vendor’s perspective on the case of the Library Linked Data Model. It is well worth a read if you are interested in Library Linked Data or in how any other new idea/concept/product/service gets implemented by a vendor. Carl says that before vendors can invest (heavily) in Library Linked Data, they need to have some questions answered:

It includes a lack of clear understanding of what exactly are the problems being solved for the profession by this technology that can only be solved with the Library Linked Data model or that can’t be otherwise solved? Are these problems shared across the profession, across institutions? Is it agreed that the Library Linked Data model is the solution? If so, how many institutions, or even personal services, are in production status using this model to solve those problems?

These are interesting questions and ones I don’t have any answer for. The idea of Linked Data in the library world has been pushed around for a while, but it has only been recently that I have seen any working prototypes and implementations. While I am impressed with what some people have done and I understand some of the potential benefits, I don’t think any of the above questions have been answered. I’d really like to see some answers to the first one – especially what benefit our users will gain from it. I really want to be convinced that any significant investment in Library Linked Data will benefit our end users, and I don’t see it (yet). I have never had a student or professor come to me with a problem that linked data will solve more completely or more efficiently than other solutions. I imagine that will come with time, but until it does it is hard to make the case to go all-in on linked data.

There may be some benefits (mostly in the form of efficiency) from a staff point of view, but I am still not sure that at this point they outweigh the costs of implementation. Also, as Carl asks in his post (question #3), “How do we see this data being maintained?” Unless you can give me a clear plan that shows sustainability, again it is hard to get behind the linked data model.

What does this all mean? The proponents of Library Linked Data need to get out and show some real-world examples of how it will help end users and/or how it will create efficiencies that cannot be achieved with other solutions. For example, if you are talking about bibliographic and related data, how would linked data be better than OCLC’s centralized Web Scale Management Services or Ex Libris’s Alma (assuming for Alma that the community zone is populated with the appropriate data)?

Will these answers come? I believe so. The Library Linked Data Incubator Group is a good start (especially if they can provide examples of how linked data will efficiently benefit end users in ways other technologies cannot), but it will be a while before we see any signs that the “Early Majority” are ready to jump on board.


New Book: Getting Started with Cloud Computing 2011 July 26

Filed under: libraries,technology — ecorrado @ 19:07:23

If you are looking for some fun and educational reading, why don’t you pick up a copy or two of Getting Started with Cloud Computing: A LITA Guide? I’d give a review, but I am biased since I am one of the co-editors along with Dr. Heather Lea Moulaison, so I’ll just say that I think the book came out great and the chapter authors did an excellent job. A million thanks to all of the authors and to Roy Tennant for writing the foreword. Neal-Schuman was great to work with as well.

Editing a book was a lot of work (more than I thought it would be, to be honest), but it was a rewarding experience and I learned a lot along the way – both about the topic and about editing a book.

By the way, if you happen to be in Europe, don’t fret, you can head over to Facet Publishing and get the UK imprint of Getting Started with Cloud Computing.


RDA and transforming to a new bibliographic framework 2011 June 3

Filed under: libraries,technology — ecorrado @ 17:06:56

I haven’t had the opportunity to work much with RDA records yet; however, I’ve been following some e-mail lists, blogs, and other commentaries where people have been discussing their experiences with it. The Library of Congress, the National Library of Medicine (NLM), and the National Agricultural Library (NAL) organized testing to evaluate whether or not they will implement RDA.

Out of this testing experience (which is still being analyzed), the Library of Congress issued “Transforming our Bibliographic Framework: A Statement from the Library of Congress” on May 13. According to the statement, “Spontaneous comments from participants in the US RDA Test show that a broad cross-section of the community feels budgetary pressures but nevertheless considers it necessary to replace MARC 21 in order to reap the full benefit of new and emerging content standards.” Therefore, Library of Congress is going to investigate, among other things, replacing MARC 21.

From what I have heard of the RDA testing, I think this makes sense. The general feel I get is that RDA by itself is not enough of a change to make libraries expend the resources necessary to implement it. Sure, there are some improvements over AACR2, but there are also many things I have read about that are not improvements. This is especially true if you agree with the Taiga Forum 6’s 2011 Provocative Statement #2 that libraries will need to participate in radical cooperation. RDA offers a bit too much flexibility to ensure that bibliographic records created by one library will fit well for other libraries. For example, the Rule of 3 is gone, which on the surface is an improvement since it allows more than 3 authors to be included as main or added entries. However, as discussed on the RDA-L list, RDA requires only the first author and illustrators of children’s books as author main or added entry. Local choices are great if you are only working for the local and not “radically cooperating.”

I won’t go through the list of complaints (and, to be fair, some compliments) about RDA I’ve seen, as you can find them yourselves. I think my takeaway, though, is that RDA on top of our existing bibliographic infrastructure is probably not going to make a monumental improvement for our patrons, while at the same time it will be costly to implement (especially retroactively). RDA might be better than AACR2, but is it better enough that migrating to it is worth the time and costs? I am not so sure. Maybe simple changes to AACR2 would be just as good and more practical?

Some people I talk to think moving to RDA is a necessary first step that will make more significant or radical changes easier in the future. I, however, have an underlying fear that if libraries implement RDA in the current environment they will be stuck with it for a long time and it will actually make it harder to implement something different in the future. I hope the others are right and I am wrong, since I believe that in the short to medium term RDA will be implemented on top of our existing bibliographic infrastructure – for better or worse.

If we replace our underlying bibliographic infrastructure with something else, say maybe something based on RDF or some other standard model for data interchange, and change to RDA, we might actually get a significant change that will help expose our bibliographic data to the greater world of linked data while at the same time making it easier for libraries to take advantage of linked data.

One thing that the Library of Congress needs to take into account in this process is the economic reality of implementing something new. I don’t see this specifically mentioned in the issues they plan on addressing. I assume that it will be part of the underlying discussions, but I would like to see it more prominently mentioned. Part of this is also involving vendors as well as open source developers of systems such as Evergreen and Koha. If LoC makes a change, it will affect libraries throughout the US (and probably the world). If the systems libraries use can’t function within this new bibliographic framework, it will be a difficult and extremely expensive transition.

I think this is something librarians, especially those in systems and cataloging, should follow closely. I know I will be doing so.