Code4lib 2007, Day 1, Morning talks
Andrew Nagy
Andrew discussed how he went about implementing a MARCXML database. He started out by discussing some of the performance issues and the scalability testing. It was interesting that MARCXML wasn’t designed well for indexing, because every field uses the same element name (datafield) and the tag number lives only in an attribute. Villanova modified MARCXML to use the field numbers as element names. This allowed the native XML databases to index the fields easily and drastically improved response times. Right now, teams implementing these XML databases need a lot of knowledge about how the products work internally to make searching fast. Hopefully the products will be better optimized in the future. However, Apache Solr has come to the rescue. Solr builds a Lucene index from XML documents, and using Solr made performance much better.
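Here is a minimal Python sketch of the kind of transformation described above. The talk didn’t say exactly how Villanova named its elements, so the `f245`/`sa` scheme below is a hypothetical naming convention (XML element names can’t begin with a digit, hence the prefixes). The function name `flatten_marcxml` is also made up for illustration.

```python
import xml.etree.ElementTree as ET

MARC_NS = "http://www.loc.gov/MARC21/slim"

def flatten_marcxml(record_xml: str) -> ET.Element:
    """Rewrite a standard MARCXML <record> so each field's tag becomes its element name.

    Hypothetical naming scheme: tag 245 -> <f245>, subfield code a -> <sa>.
    """
    src = ET.fromstring(record_xml)
    out = ET.Element("record")
    for field in src:
        local = field.tag.split("}")[-1]  # strip the MARCXML namespace
        if local == "leader":
            ET.SubElement(out, "leader").text = field.text
        elif local == "controlfield":
            ET.SubElement(out, "f" + field.get("tag")).text = field.text
        elif local == "datafield":
            new_field = ET.SubElement(out, "f" + field.get("tag"),
                                      ind1=field.get("ind1", " "),
                                      ind2=field.get("ind2", " "))
            for sub in field:
                ET.SubElement(new_field, "s" + sub.get("code")).text = sub.text
    return out

SAMPLE = """<record xmlns="http://www.loc.gov/MARC21/slim">
  <leader>00000nam a2200000 a 4500</leader>
  <controlfield tag="001">12345</controlfield>
  <datafield tag="245" ind1="1" ind2="0">
    <subfield code="a">Moby Dick /</subfield>
    <subfield code="c">Herman Melville.</subfield>
  </datafield>
</record>"""

# Standard MARCXML: finding the title means filtering generic elements by attribute.
ns = {"m": MARC_NS}
original = ET.fromstring(SAMPLE)
print(original.find("m:datafield[@tag='245']/m:subfield[@code='a']", ns).text)

# Flattened form: the title has its own element path, which an index can target directly.
flat = flatten_marcxml(SAMPLE)
print(flat.find("f245/sa").text)
```

Both lookups print `Moby Dick /`, but in the flattened record the field number is part of the element path itself, which is the property that let the native XML databases build per-field indexes.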
Emily Lynema
Emily talked briefly about NCSU’s implementation of Endeca. I’m still amazed that NCSU is so excited about relevancy ranking. I’m not sure why SirsiDynix still doesn’t have this basic functionality. Admittedly, Voyager’s ranking is not as easy to adjust as it should be, but it has existed for about 15 years and works decently. Anyway, I’m off on a tangent… Emily’s main focus was CatalogWS. CatalogWS is basically a web services interface to the Endeca version of their library catalog. The URL is:
http://www.lib.ncsu.edu/catalogws/?
She talked a little bit about OpenSearch, and showed library catalogs in A9.
One interesting thing NCSU did with the search box on their website is make it search their web site and their catalog at the same time. This seems better than searching one or the other.
Mike Rylander
Mike talked about the structure of PINES. He started off by talking about the structure of Evergreen development. They have four roles: an administrative lead, a project manager, developers/domain specialists, and the customers. Mike felt that technical decisions should be made by the developers, and that the project manager should be the go-between connecting the developers with the administrators and the customers.
Once you have your teams in place, you need to define your goals. Long-term goals need to be flexible and have general timelines (not exact dates). The long-term goals should define an overarching framework and should, to some degree, define the overall functional design.
Medium-term goals should have soft deadlines; some things take much longer (or shorter) than estimated. At this point you are defining “full team functional deliverables.”
Some do’s and don’ts for developers:
Do: communicate (talk about everything, talk early and often!)
question every assumption that anyone makes
build technical consensus
Don’t: leave your ego at the door, but…
let your ego rule you
take discussion for consensus
take management support for granted
forget to bring numbers when pushing back
be afraid to fight (if a simple disagreement will tear apart the project, you have bigger problems)
Administrator do’s
always listen to everything your developers say
advocate for customers (to the developers)
advocate for developers (to the customers)
Administrator don’ts
force hard long-term deadlines
make a bad working environment
Customer do’s
be involved from the start
Customer don’ts
make assumptions about functionality
avoid “touchy” subjects
Meetings: meet as often as necessary, but no more!
Do until done
only invite the people who are needed at the meeting
be ready to fight for what is right
Mike finished up talking about what to look for as far as “results measurement” is concerned. His suggestions were:
were goals met on time?
is functionality complete?
are customers happy?
working code wins!
Tito Sierra: SmartSubjects
Tito talked about NCSU’s SmartSubjects program. The program makes subject recommendations. He talked a little bit about how it works and what data is used to create it. While a lot of the results are good, it isn’t perfect. Future plans include using it as a database advisor and gauging interest in the program outside of NCSU.