Checking if XML is well-formed using xmllint

I had to check if an XML document was well-formed today. I have a few nice graphical tools that can do this, but I wanted to do it in a script on my Linux box. I found that xmllint does a good job of this. The command I used is:

xmllint -noout input.xml

If the XML document is not well-formed, the parser errors will go to standard error, and then I can figure out how to correct the cause of the errors.
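
Since the goal was to do this from a script, the exit status is more useful there than the messages themselves. Here is a minimal sketch, assuming xmllint is on the PATH and that it exits non-zero when a file fails to parse. The function name check_wellformed is just a name made up for illustration, not something xmllint provides.

```shell
#!/bin/sh
# check_wellformed is a hypothetical helper name, not part of xmllint.
# It returns 0 if every file given parses cleanly, and 1 otherwise.
check_wellformed() {
    status=0
    for file in "$@"; do
        # --noout is the same option as -noout above (xmllint accepts
        # either spelling). It suppresses the re-serialized document;
        # parser errors are still written to standard error.
        if xmllint --noout "$file"; then
            echo "OK: $file"
        else
            echo "NOT WELL-FORMED: $file"
            status=1
        fi
    done
    return "$status"
}
```

A script could then call something like `check_wellformed input.xml || exit 1` to stop as soon as a document fails the check.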

3 Comments

  1. Roy Tennant said,

    December 12, 2013 at 17:12:52

    Interestingly enough, xsltproc (part of libxslt, which builds on libxml2) can be used for this as well. xsltproc expects an XSLT stylesheet as part of the call and will complain if it doesn’t see one:

    % xsltproc sample.xml
    compilation error: file sample.xml line 2 element cite
    xsltParseStylesheetProcess : document is not a stylesheet

    But if the XML doc is malformed, it will instead complain about that:

    % xsltproc brokensample.xml
    brokensample.xml:17: parser error : Premature end of data in tag cite line 16

    ^
    brokensample.xml:17: parser error : Premature end of data in tag cite line 2

    ^
    cannot parse brokensample.xml

    I’ve used it this way at times to verify well-formedness.

  2. ecorrado said,

    December 12, 2013 at 17:12:55

    Thanks for the tip, Roy. In this case I don’t have a stylesheet, just something that says what a Google Scholar holdings file should look like.

  3. ecorrado said,

    December 12, 2013 at 17:12:31

    Incidentally, I noticed that if I don’t add the noout option to xmllint, it will convert some of these non-Roman characters to their hex character references. This may prove useful at some point (although I’m not sure that it is the best tool for that).