blog.ecorrado.us

Ramblings about library technology, open source software, and other adventures!

 

Checking if XML is well-formed using xmllint
2013 December 12

Filed under: technology — ecorrado @ 17:12:16

I had to check if an XML document was well-formed today. I have a few nice graphical tools that can do this, but I wanted to do it in a script on my Linux box. I found out that xmllint does a good job of this. The command I used is:

xmllint --noout input.xml

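If the XML document is not well-formed, xmllint writes the errors to standard error (not standard out) and exits with a non-zero status, and then I can figure out how to correct the cause of the errors. If the document is well-formed, it prints nothing.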
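Since the goal was to do this from a script, here is a minimal sketch of how the check might be wrapped in a shell function. The function name check_xml and the ".errors" file suffix are my own assumptions for illustration, not anything xmllint provides; the only xmllint features it relies on are the --noout option and the non-zero exit status on a parse error.

```shell
#!/bin/sh
# check_xml FILE: return 0 if FILE is well-formed XML, 1 otherwise.
# (check_xml is a hypothetical helper name, not part of xmllint.)
check_xml() {
    # --noout suppresses the serialized document, so only parser errors
    # are produced. They arrive on standard error and are saved to a file
    # next to the input (FILE.errors is an assumed naming choice).
    if xmllint --noout "$1" 2>"$1.errors"; then
        rm -f "$1.errors"   # nothing to report for a well-formed file
        return 0
    fi
    echo "$1 is not well-formed; see $1.errors" >&2
    return 1
}
```

A script could then call something like: check_xml input.xml || exit 1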

 

3 Comments for this post

 
Roy Tennant Says:

Interestingly enough, xsltproc (part of libxslt, which is built on libxml2) can be used for this as well, although xsltproc expects an XSLT stylesheet as part of the call and will complain if it doesn’t see one:

% xsltproc sample.xml
compilation error: file sample.xml line 2 element cite
xsltParseStylesheetProcess : document is not a stylesheet

If the XML doc is malformed, it will instead complain about that:

% xsltproc brokensample.xml
brokensample.xml:17: parser error : Premature end of data in tag cite line 16

^
brokensample.xml:17: parser error : Premature end of data in tag cite line 2

^
cannot parse brokensample.xml

I’ve used it this way at times to verify well-formedness.

 
ecorrado Says:

Thanks for the tip, Roy. In this case I don’t have a stylesheet, just something that says what a Google Scholar holdings file should look like.

 
ecorrado Says:

Incidentally, I noticed that if I don’t add the noout option to xmllint, it will convert some non-Roman characters to their hexadecimal character references. This may prove useful at some point (although I’m not sure that it is the best tool for that).
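
For anyone who wants to see that conversion, here is a small sketch. It assumes a UTF-8 input file that carries no encoding declaration, which is the case where I would expect xmllint to fall back to character references; the sample document is made up for the demonstration. The --encode option is a documented xmllint option that asks for the output in a named encoding instead.

```shell
#!/bin/sh
# Made-up sample: a UTF-8 document with one non-ASCII character (e-acute)
# and no XML declaration, so no encoding is declared.
sample=$(mktemp)
printf '<a>\303\251</a>\n' > "$sample"

# Without --noout, xmllint serializes the parsed document. With no declared
# encoding, non-ASCII characters are expected to come out as hexadecimal
# character references.
default_out=$(xmllint "$sample")

# --encode UTF-8 asks xmllint to write the output as UTF-8 instead.
utf8_out=$(xmllint --encode UTF-8 "$sample")

printf '%s\n' "$default_out"
printf '%s\n' "$utf8_out"
rm -f "$sample"
```

Comparing the two outputs shows whether a given file is affected.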