To read or compare XML files, it can be handy to format them in a common way. This article describes one method for doing this. It relies on a free third-party utility that needs to be downloaded first.
XML files are read by programs using an XML “parser”. Most programs that consume XML ignore the whitespace (spaces, tabs, line feeds, etc.) that sits between elements, although whitespace inside text content can still matter. This means that when producing XML, a program may choose to format the XML in many ways, or even to omit formatting altogether.
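As a rough illustration of the point above, the following Python sketch parses the same document in two layouts and compares them. The sample documents and the `normalized` helper are made up for this example; the helper strips surrounding whitespace from text, which is an assumption about what counts as "formatting only".

```python
import xml.etree.ElementTree as ET


def normalized(elem):
    """Describe an element as nested tuples, ignoring surrounding whitespace.

    Hypothetical helper for illustration: text and tail are stripped, so
    indentation and line breaks between elements do not affect the result.
    """
    return (
        elem.tag,
        sorted(elem.attrib.items()),
        (elem.text or "").strip(),
        (elem.tail or "").strip(),
        [normalized(child) for child in elem],
    )


# The same made-up document, once on a single line and once indented.
COMPACT = '<order><item id="1">pen</item><item id="2">ink</item></order>'
PRETTY = """<order>
  <item id="1">pen</item>
  <item id="2">ink</item>
</order>"""

same_content = normalized(ET.fromstring(COMPACT)) == normalized(ET.fromstring(PRETTY))
same_bytes = COMPACT == PRETTY

# The parsed content matches even though the raw text does not.
print(same_content, same_bytes)
```

A plain text diff of the two strings would report every line as changed, which is why formatting files the same way before comparing them is useful.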
The good news is that properly constructed XML can be re-formatted quite easily with the right software. There are many editors available that can do this (we like Notepad++ with the XML Tools plug-in). However, “cleaning” lots of files in an editor, as in hundreds or thousands at a time, is impractical.
Steps to clean a folder full of XML files.
- We use a utility called XMLStarlet, available at http://xmlstar.sourceforge.net/, to process folders. Download it and put xml.exe in a folder that is in your system path. (Pd’ Programming employees: this utility is located on your U: drive)
- Hint: When using Windows Explorer to navigate your hard disk, you can type “cmd” into the address bar (at the top of the window) and press Enter to open a command prompt at that folder location.
- To process all files “in place”, losing the original content and replacing it with newly formatted content, execute this command from within the folder containing the XML files (the quotes keep file names with spaces intact):
for %f in (*.xml) do (ren "%f" "x_%f" & xml.exe fo --indent-spaces 1 --omit-decl "x_%f" > "%f" & del "x_%f")
- To create new files in a subdirectory called “formatted”, create that directory and then execute this command from within the folder containing the XML files:
for %f in (*.xml) do (xml.exe fo --indent-spaces 1 --omit-decl "%f" > "formatted\%f")
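
For readers who prefer a script, the "formatted" subdirectory step can be approximated in Python. This is only a sketch using the standard library rather than XMLStarlet, so its output may differ from what xml.exe produces. It assumes Python 3.9 or later (for `ET.indent`), and the default parser drops comments. The function name `format_folder` and the demo file are hypothetical.

```python
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path


def format_folder(source_dir, out_name="formatted", indent=" "):
    """Write an indented copy of every *.xml file in source_dir into a subfolder.

    Assumption: one space per level and no XML declaration, to mirror the
    intent of "--indent-spaces 1 --omit-decl". Originals are left untouched.
    """
    source = Path(source_dir)
    target = source / out_name
    target.mkdir(exist_ok=True)
    written = []
    for xml_path in sorted(source.glob("*.xml")):
        tree = ET.parse(xml_path)
        ET.indent(tree, space=indent)  # requires Python 3.9+
        # encoding="unicode" returns a str without an XML declaration
        text = ET.tostring(tree.getroot(), encoding="unicode")
        out_path = target / xml_path.name
        out_path.write_text(text + "\n", encoding="utf-8")
        written.append(out_path)
    return written


# Small demo in a temporary folder so no real files are modified.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "a.xml"
    sample.write_text("<order><item id='1'>pen</item></order>", encoding="utf-8")
    outputs = format_folder(tmp)
    demo_text = outputs[0].read_text(encoding="utf-8")
    print(demo_text)
```

To use it on real data, call `format_folder` with the path of the folder that contains the XML files. Writing to a separate subfolder keeps the originals available in case the result is not what was expected.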