Formatting XML

Issue

In order to read and/or compare XML files, it can be handy to format them in a common way. This article describes one method for doing this. It relies on a free third-party product that needs to be downloaded first.

Explanation

XML files are read by programs using an XML “parser”. Whitespace between elements (spaces, tabs, line feeds, etc.) is normally not significant, so a program reading the XML ignores it. This means that when producing XML, a program may choose to format the XML in many ways, or even to omit formatting altogether.
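As a small illustration of this point, the Python sketch below parses the same made-up document in a compact and an indented layout and compares the results. The `structure` helper and the sample `order` document are hypothetical and exist only for this example. The helper also makes a simplifying assumption: it trims whitespace around element text and ignores text that follows a child element.

```python
import xml.etree.ElementTree as ET

def structure(elem):
    """Describe an element as nested data, ignoring whitespace-only text.

    Simplification: text is trimmed, and "tail" text that follows a child
    element (mixed content) is not compared at all.
    """
    text = (elem.text or "").strip()
    children = [structure(child) for child in elem]
    return (elem.tag, sorted(elem.attrib.items()), text, children)

# The same hypothetical document, written two ways.
compact = '<order id="7"><item>pen</item><item>ink</item></order>'
indented = """<order id="7">
  <item>pen</item>
  <item>ink</item>
</order>"""

same = structure(ET.fromstring(compact)) == structure(ET.fromstring(indented))
print(same)  # True
```

Both layouts produce the same elements, attributes, and text, which is why re-formatting a well-formed file does not normally change what a consuming program sees.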

The good news is that properly constructed XML can be re-formatted quite easily with the right software. There are many editors available that can do this (we like Notepad++ with the XML Tools plug-in). However, cleaning lots of files this way, as in hundreds or thousands at a time, is impractical.

 

Solution

Steps to clean a folder full of XML files.

  1. We use a free utility called XMLStarlet (http://xmlstar.sourceforge.net/) to process folders. Download it and put xml.exe in a folder that is on your system path. (Pd’ Programming employees: this utility is located on your U: drive.)
  2. Hint: In Windows Explorer, you can type “cmd” into the address bar (at the top of the window) and press Enter to open a command prompt at that folder location.
  3. To process all files “in place”, losing the original content and replacing it with newly formatted content, execute this command from within the folder containing the xml files:
    for %f in (*.xml) do (ren "%f" "x_%f" & xml.exe fo --indent-spaces 1 --omit-decl "x_%f" > "%f" & del "x_%f")
  4. To create new files in a subdirectory called “formatted”, create that directory and then execute this command from within the folder containing the xml files:
    for %f in (*.xml) do (xml.exe fo --indent-spaces 1 --omit-decl "%f" > "formatted\%f")
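
If XMLStarlet is not available, step 4 can be approximated with Python's standard library. The sketch below is not the method described above: it swaps in `xml.dom.minidom` in place of xml.exe, and the helper names (`strip_blank_text`, `format_xml`, `format_folder`) are made up for this example. It assumes one-space indentation and no XML declaration, to mirror `--indent-spaces 1 --omit-decl`. Its output may differ from XMLStarlet's in details, and comments or processing instructions outside the root element are dropped.

```python
from pathlib import Path
from xml.dom import minidom

def strip_blank_text(node):
    """Remove whitespace-only text nodes so re-indenting does not add blank lines."""
    for child in list(node.childNodes):
        if child.nodeType == child.TEXT_NODE and not child.data.strip():
            node.removeChild(child)
        else:
            strip_blank_text(child)

def format_xml(text, indent=" "):
    """Return the XML in `text` (str or bytes) re-indented, without a declaration."""
    doc = minidom.parseString(text)
    strip_blank_text(doc)
    # Serializing only the root element leaves out the <?xml ...?> declaration.
    return doc.documentElement.toprettyxml(indent=indent)

def format_folder(folder, out_name="formatted"):
    """Write a formatted copy of every *.xml file in `folder` to folder/out_name."""
    src = Path(folder)
    out = src / out_name
    out.mkdir(exist_ok=True)  # unlike step 4, the subdirectory is created if missing
    for path in sorted(src.glob("*.xml")):
        # Read bytes so the parser can honor any encoding named in the file.
        formatted = format_xml(path.read_bytes())
        (out / path.name).write_text(formatted, encoding="utf-8")

# Example call, run from the folder containing the XML files:
# format_folder(".")
```

The original files are left untouched, as in step 4. The output is always written as UTF-8, which is a choice made for this sketch and not something the commands above guarantee.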