Entries from February 2007 ↓

Code4lib, Day 1, Afternoon talks

Afternoon talks:

Fabian Tiburce, Peter Giansante, and Beth Jefferson.

Forget the lipstick, this pig needs social skills:

People don’t complain about the catalog, they just go elsewhere. The next generation catalog should be about discovering, not just about finding. People are interested in alternative ways of exploring content. They went on to talk about some of their technology and future opportunities.

Kevin Clarke:

The XQuery Expose: Practical experiences from a digital library

XQuery is an XML query language. The XQuery specification doesn’t include full-text searching and doesn’t currently support updating. The full-text issue should be kept in mind if you are going to use it (Kevin didn’t realize this at first). However, some of the engines do support this. XQuery is able to integrate XML data from a variety of sources. Kevin went on to give a good overview of what XQuery can and can’t do. I found it interesting that XQuery can handle both loose and more structured syntaxes. After the overview, Kevin went on to talk about how Princeton University is using XQuery. Things they liked about XQuery: the freedom (nothing between you and your data), the ease of use, and the fact that it is now an official W3C standard.

Richard Wallis (From Talis):

All of the protocols that libraries use, whether they are obscure or not, share a problem: they have an insular view of the world. They don’t play well with others. The solution, according to Talis, is an API (their example is the Bigfoot Store API). Richard demoed the API live and showed how different parts can be augmented. One of the things it can do is deep link into OPACs. Considering the network in the conference center, a live demo was a risky approach, but it seemed to work OK for him. Overall, it is an interesting project and makes you wonder why other ILS vendors aren’t doing things like this.

Richard explained what you get with the Bigfoot Store API. It includes:
Items: Query + XSLT output transform
Augment: Augment recognized elements as RSS 1.0 feeds
Facets: Request facets for a query
OAI-PMH (coming)
Ability to Custom Config (coming)
On demand Stores (coming)
Transform service

They demoed a WordPress minicat using their transform service. It looked really neat. I want to look at it more at http://tdn.talis.com

Code4lib 2007, Day 1, Morning talks

Andrew Nagy

Andrew discussed how he went about implementing a MARCXML database. He started out discussing some of the performance issues and scalability testing. It was interesting that MARCXML wasn’t designed well for indexing because the field elements all have the same name. Villanova modified MARCXML to use the field numbers as element names. This allowed the native XML databases to easily index the fields and drastically improved response times. At this point, teams implementing these XML databases need a lot of knowledge about how things work to make searching fast. The products will hopefully have better optimization in the future. However, Apache Solr has come to the rescue. Solr builds a Lucene index on XML documents. Using Solr made the performance much better.
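The kind of restructuring described above can be sketched roughly as follows. This is only a minimal Python illustration, not Villanova’s actual code: the `field_` and `sub_` element-name prefixes (needed because an XML element name cannot begin with a digit), the function name `retag`, and the sample record are all assumptions made for the example.

```python
import xml.etree.ElementTree as ET

MARC_NS = "http://www.loc.gov/MARC21/slim"

# Hypothetical, heavily trimmed MARCXML record used only for illustration.
SAMPLE = """<record xmlns="http://www.loc.gov/MARC21/slim">
  <leader>00000nam a2200000 a 4500</leader>
  <controlfield tag="001">12345</controlfield>
  <datafield tag="245" ind1="1" ind2="0">
    <subfield code="a">Example title</subfield>
  </datafield>
</record>"""


def retag(record_xml: str) -> ET.Element:
    """Return a new record whose element names carry the MARC tag number."""
    src = ET.fromstring(record_xml)
    out = ET.Element("record")
    for field in src:
        local_name = field.tag.split("}")[-1]  # strip the namespace part
        if local_name == "leader":
            ET.SubElement(out, "leader").text = field.text
            continue
        # <datafield tag="245"> becomes <field_245>, and so on.
        new_field = ET.SubElement(out, f"field_{field.get('tag')}")
        if local_name == "controlfield":
            new_field.text = field.text
            continue
        # Keep the indicators as attributes on the renamed element.
        for ind in ("ind1", "ind2"):
            if field.get(ind) is not None:
                new_field.set(ind, field.get(ind))
        # <subfield code="a"> becomes <sub_a>.
        for sub in field:
            ET.SubElement(new_field, f"sub_{sub.get('code')}").text = sub.text
    return out


converted = retag(SAMPLE)
print(ET.tostring(converted, encoding="unicode"))
```

With distinct element names, an XML database can build a separate index per field instead of having to filter one large `datafield` index by attribute value.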


Emily Lynema

Emily talked briefly about NCSU’s implementation of Endeca. I’m still amazed that NCSU is so excited about relevancy ranking. I’m not sure why SirsiDynix still doesn’t have this basic functionality. Admittedly, Voyager’s ranking is not as easy to adjust as it should be, but it has been in existence for about 15 years and works decently. Anyway, I’m off on a tangent… Emily’s main focus was CatalogWS. CatalogWS is basically a web services interface to their Endeca-based library catalog. The URL is:

http://www.lib.ncsu.edu/catalogws/?

She talked a little bit about OpenSearch, and showed library catalogs in A9.

One interesting thing NCSU did with the search box on their web site is making it search their web site and catalog at the same time. This seems better than one or the other.

Mike Rylander

Mike talked about the structure of PINES. He started off by talking about the structure of Evergreen development. They have four roles: an administrative lead, a project manager, developers/domain specialists, and the customers. Mike felt that technical decisions should be made by the developers, and that the project manager should be the go-between for the developers, the administrators, and the customers.

Once you have your teams in place, you need to define your goals. Long term goals need to be flexible and have general time-lines (not exact dates). The long term goals should define an over-arching framework and, to some degree, the overall functional design.

Medium term goals should have soft deadlines. Some things take much longer (or shorter) than estimated. At this point you are defining “full team functional deliverables.”

Some do’s and don’ts for developers:

Do: Communicate (talk about everything, talk early and often!)
Question every assumption that everyone makes
Build technical consensus

Don’t: Leave your ego at the door, but…
Let your ego rule you
take discussion for consensus
take management support for granted
forget to bring numbers when pushing back
be afraid to fight (if a simple disagreement will tear apart the project, you have bigger problems)

Administration Do’s
always listen to everything your developers say
advocate for customers (to the developers)
advocate for developers (to the customers)

Administrator’s don’ts
force hard long-term deadlines
make a bad working environment

Customer Do’s
be involved from the start

Customer Don’ts
make assumptions about functionality
avoid “touchy” subjects

Meetings: Meet as often as necessary, but not any more!
Do until done
only have the people needed at the meeting there
be ready to fight for what is right

Mike finished up talking about what to look for as far as “Results measurement” is concerned. His suggestions were:
were goals met on time?
functionality complete?
customers happy?
working code wins!

Tito Sierra: SmartSubjects

Tito talked about NCSU’s SmartSubjects. The program makes subject recommendations. He talked a little bit about how it works and what data is used to create this system. While a lot of the results are good, it isn’t perfect. Future plans include using this as a database advisor and gauging interest in this program outside of NCSU.

Code4Lib, Keynote, Day 1

Here are my notes from Karen Schneider’s keynote. Sorry for the typos, etc… I’m just trying to get these up quickly…

Karen started off by telling us a little bit about herself and what she thinks is important in her life. One interesting thing, which I can somewhat relate to, is that some of what is most important to us doesn’t get on lists of what is important because it is so integral to us that it doesn’t get mentioned.

Karen went on to discuss the “State of Emergency” in our profession. Four things she pointed out were: 1) we have given away our collections (in a very short period of time), 2) we don’t build or own the tools that manage them, 3) we provide complex, poorly-marketed systems, and 4) we function like a monopoly service when our competition is thriving right under our noses.

Next, Karen went on to discuss some of the things we can fix in libraries. The things she brought up were digital preservation, standards adoption, the sucky state of most library software, third-party issues, and scholarly awareness of key issues in LibraryLand. Karen also said we need to seize control of the tools that we rely on! Luckily, some people are already doing some nifty things in the library world, such as Scriblio, Umlaut, Evergreen, and Solr.

Karen pointed out that we are in a renaissance of library-built software and that this is a good thing because it begins to restore the balance of power, reinstates the direction of the profession, puts the emphasis back on the library as a memory org., and sends the message that we mean business. Karen also brought up the concept of the “resocialization of librarian artisans” that is now taking place thanks to the internet and #code4lib.

Karen then went on to discuss the big OSS library project for today: Evergreen. According to Karen (and probably almost everyone else here), Evergreen is big … really big. The timing for Evergreen is perfect because we are in an era of worrisome ILS vendor consolidation and, paradoxically, the centrality of the ILS is weakening (there is less risk involved now, which allows us to take risks). Karen finished up with some of the “don’ts” when marketing an Open Source project, with some useful over-generalizations (her words, not mine): 1) nobody cares about open source, 2) nobody cares about standards, 3) nobody cares about usability, and 4) nobody cares about Evergreen (they don’t care about software, period).

EndUser Voyager Hackfest Collaboration: Call for Participation

Here’s a copy of an e-mail about a Voyager hackfest that I sent out to a few mailing lists……

EndUser Voyager Hackfest Collaboration: Call for Participation

Hello Voyager Hackers,

New to EndUser in 2007 is a Voyager Hackfest. A hackfest is when a group of programmers and developers get together to work on a task. Hackfests have been a popular occurrence at computer-related conferences in years past and have been popularized in the library world by the Access conference [1] held in Canada each year. The 2002 Access Hackfest was described as a “collaborative effort to solve real world library problems using freely available tools” [2]. While many hackfests are competitions in which programmers compete against each other, the EndUser Voyager Hackfest will follow the Access conference tradition of collaboration.

The hackfest will be three hours and will take place on Saturday, April 28 from 9:00 AM to 12:00 noon. The meeting room will have Internet access. Participants must bring their own laptops if they wish to have computer access during the hackfest. Hackfest participants will be required to be registered for EndUser 2007. Because of the size of the room, the hackfest is limited to 20 people. Anyone who registers after the hackfest is full will be placed on a waiting list.

A mailing list will be set up soon where hackfest participants and other interested parties will be able to discuss the project(s) that will be hacked on at EndUser. If you have any questions, please contact Edward M. Corrado at corrado@tcnj.edu.

Please pass this notice on to anyone at your institution that you feel may be interested in participating.

The hackfest is session number 73 on the EndUser 2007 website – http://www.enduser2007.com

Sincerely,

Edward M. Corrado

[1] For more information about the history of the Access hackfest, please see the following Web Sites:

2002-2004: http://old.onebiglibrary.net/yale/curtis/hackfest/
2005: http://access2005.library.ualberta.ca/hackfest.php
2006: http://www.access2006.uottawa.ca/?cat=6

[2] Access 2002 Website: http://www.access.uwindsor.ca/units/access/main.nsf/hackfest?OpenForm

2006 Race Attendance Stats

While updating my 2007 race stats after going to Manzanita, I noticed I never summarized my 2006 stats. I also forgot to write about one race I went to at the end of the year. That race was a special event run on the same weekend as Super DIRT Week at the New Afton Speedway in New York. As I recall, it was an enjoyable race and one I would attend again. This wasn’t a new track for me, but it was only my second time at the track.

My 2006 race attendance stats counting the race at New Afton Speedway:
Races: 24
Tracks: 20 (13 new) (150 lifetime)
States: 14 (3 new) (35 + DC lifetime)

Copper on Dirt, Manzanita Speedway, Feb. 9-10, 2007

On February 9 and 10 I went to Manzanita Speedway to watch the Copper on Dirt. This was the first ever edition of the race, which featured the USAC National Midgets, USAC Silver Crown, and USAC/CRA Sprints. I found the event to be a huge success and I am sure they will have the event again. Sure, there were some bugs in the program, but overall I greatly enjoyed the racing. The track was kind of shot by the end of the races on Saturday night. They also could have run the show off a little quicker, but I think with a few tweaks this could be one of the top 10 dirt track events in the USA in the years to come. It was also nice to see Ricky Stenhouse Jr. win two of the events. I don’t think anyone (besides maybe Ricky’s family) would have called that one!

My stats for 2007 after two nights in Manzanita:

Races: 6
Tracks: 2 (0 new) (150 lifetime)
States: 2 (0 new) (35 lifetime)

Thinking about a backup mail server

I really like my LVS from Redwood Virtual. It gives me full control of what, for all practical purposes, appears to be a whole Linux server. Sure, it is a small one, but you get what you pay for. However, one thing that has happened once or twice in the years I’ve had my LVS is that, for some reason, it will go down and take a few days to come back up. The latest was last week, when a clerical error caused my LVS to be deactivated (this is why you couldn’t connect if you tried to read my blog last week). While Redwood made good on this, it was still down for a few days before it got fixed. They aren’t the fastest to respond to support requests, and I wasn’t in town, so it took me some time to get back to them with the information they needed. All in all, not a good situation for an e-mail server. Sure, I set up my DNS entry with Zoneedit to forward e-mail somewhere else when I noticed there was a problem, but I’m sure I missed some mail before I changed the DNS (although it was most likely spam). I guess I have a few options, but the ones I’m considering the most are 1) moving my mail server to another hosted site that is more suitable for mission-critical applications, 2) subscribing to a broadband Internet connection that will allow me to run my own mail server at home, or 3) finding another mail server and making it a backup mail server. I’m not really sure how that works with DNS and all, but I’m sure there is a way to do it. I think I’m leaning towards that third option, but will need to investigate how. A fourth option would be to write some sort of shell script that adds a mail forwarder to my Zoneedit DNS settings if it notices my mail server is down (and removes it when the server is working again). That seems too much like a kludge though.
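For reference, the backup-server option generally relies on MX record preferences: a domain can list more than one mail exchanger, and sending servers try the one with the lowest preference number first, falling back to the next one if it cannot be reached. The backup then typically queues the mail and passes it along once the primary is back. A minimal Python sketch of that ordering follows; the host names, preference values, and function names are hypothetical and are not the real ecorrado.us configuration or anything provided by Zoneedit.

```python
from typing import Callable, List, Optional, Tuple

MXRecord = Tuple[int, str]  # (preference, host); lower preference is tried first

# Hypothetical MX records for a domain with one primary and one backup server.
MX_RECORDS: List[MXRecord] = [
    (20, "backup-mail.example.net"),
    (10, "mail.example.org"),
]


def delivery_order(records: List[MXRecord]) -> List[str]:
    """Return hosts in the order a sending server would normally try them."""
    # Real mail servers may randomize among hosts with equal preference;
    # this sketch simply sorts by preference, then by host name.
    return [host for _, host in sorted(records)]


def pick_server(records: List[MXRecord],
                is_up: Callable[[str], bool]) -> Optional[str]:
    """Return the first reachable host, or None if every host is down."""
    for host in delivery_order(records):
        if is_up(host):
            return host
    return None


# Simulate the primary being down: delivery falls through to the backup.
primary_down = lambda host: host != "mail.example.org"
print(delivery_order(MX_RECORDS))
print(pick_server(MX_RECORDS, primary_down))
```

Because the fallback happens on the sending side, nothing has to notice the outage and change DNS, which is what makes this less of a kludge than the scripted approach.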

As I mentioned, despite this last episode, which caused ecorrado.us to be down for a few days, I’m still happy with Redwood Virtual and don’t plan on giving up on my little LVS. They have done well by me, and offer a really nice service for the price.

Linux Distro Recommendations

On the LOPSA-NJ e-mail list, someone asked for a recommendation as to what Linux distro they should use for a file and print server. The response was predictable. Use what I use! Don’t use that, use blah! Etc. I don’t think this kind of advice is the most productive. For most applications, the best distro is really the one you are familiar with. In some cases, if the package maintainer for the main service is exceptional, you might want to use that distro over your normal choice. Of course, the opposite is also true. Anyway, this was my response to the question about what Linux distribution to use for a file and print server:

Really, this comes down to a political or religious war. Some people
like Debian, some prefer CentOS, some prefer Suse (although less
and less these days with Novell’s potential deal with Microsoft),
some prefer… . Really it comes down to what you need to do and
what you are comfortable supporting. While Debian may be a better
distro to host Service A, CentOS might be better at Service B, but in
the end, they all can probably do what you want. Also some have
shorter release schedules (Fedora Core) than others (Debian). In
some cases shorter is better, in other cases the opposite is true. In
most cases for a file and print server, it probably doesn’t matter. In
the case of a file and print server, I believe any major distro will be
fine. If the person maintaining the server is more used to RedHat,
probably going with a RedHat variant (such as CentOS or Fedora
Core) might be the way to go because of familiarity. Personally, I
have used many different distros and have been using Mandriva on
servers most recently for file and print services with Windows and
Mac clients and have had absolutely no problems. Mandriva also
proved easy for the non-Linux user on site to do basic administrative
tasks. I’ve used a couple of the distros that others on this list have
recommended and seem to love, but had problems with them in the
past. Go figure. When it comes down to it, it is all personal
preference. That said, I’d recommend to any new-to-Linux person
the following: 1) Pick one of the “main-stream” binary distros that are
well supported. On x86-type platforms, this could mean one of the
following (I’m sure I’m missing one or two):

CentOS
Debian
Fedora Core
Mandriva
Ubuntu
RHEL (if you have the $$$)
Suse/OpenSuse

2) To narrow it down, if you know someone that is willing to help out
in a pinch, pick what they use! And 3) The next
thing I’d recommend is go to the meetings of your local Linux Users
Group, as there should be a number of people there that will be
willing to help you and give you some really good advice.

Good luck!

Edward M. Corrado
President, Linux Users Group/In Princeton, Inc.
http://lugip.org

Open Source Software in Libraries

It seems that Evergreen, combined with a general dissatisfaction with commercial library software vendors, has prompted some serious interest in collaborative Open Source Software development in New Jersey-Philadelphia area academic libraries. Two proposals that I know about (and am in some way involved in) are VALE-OLS and a yet-to-be-named group being organized via PALINET. The VALE-OLS project involves a New Jersey state-wide academic library consortium, VALE, for which some librarians (including me) have been authorized to investigate the feasibility of a shared Open Source Integrated Library System (ILS) for New Jersey academic libraries and write a white paper about it. PALINET organized a conference call that took place earlier today that discussed the idea of starting some sort of shared development office for library-related Open Source Software centered in the Philadelphia area. The PALINET group is not just looking to work on an ILS, but is also planning to help develop and support other library technologies such as link resolvers and meta search tools. Although these initiatives are still very much at the beginning of the planning/investigation stage, and a lot of questions will need to be answered and a lot of hurdles jumped before either of them even reaches infancy, they are great signs of progress.

There are other signs of interest as well. On a more global scale there is an IFLA pre-conference about Open Source Software in Africa coming up this summer (that I will be presenting at). There is the huge success of the code4lib conference. And more libraries are releasing Open Source Software, such as the LibraryFind meta-search software that Oregon State University announced yesterday.

On a more personal note, I have been invited to do three lectures/presentations on Open Source Software at state-wide library conferences. So far two of them are confirmed, and one is still pending, but even if that one doesn’t work out, it still shows there is significant interest in the topic. Besides those presentations, I will also be doing a talk about Open Source at a vendor-related conference this April. (BTW: Once the various conferences release their schedules, I’ll be sure to give everyone the details here on my blog).

While there has been slow movement in this area for at least the last 8 years, I think Open Source in libraries has finally started to pick up a good amount of steam. Advocates of Open Source in Libraries can thank Evergreen (and to a lesser extent, Koha) for that. The folks at Georgia PINES have proven that it can be done on a large scale, which is making not only programmers and other Open Source advocates in libraries take notice, but also making high-level library administrators take notice. I also should say that for academic libraries, at least, some credit needs to be given to the success of other Open Source initiatives in academia, such as the Moodle course management system.

I don’t know if either the VALE-OLS or PALINET open source initiatives will end up seeing the light of day, but just the fact that they are seriously being discussed shows that we might just be at a significant turning point. Open Source may or may not be sexy, but at least in libraries, it sure is exciting.