XML is the format JSON replaced for most of the web, but it's still everywhere: SOAP payloads in financial services, SAP IDocs, Android manifests, Maven POMs, RSS and Atom feeds, SVG files, OpenDocument bundles, XMP metadata, sitemaps for search engines, and every Office .docx or .xlsx you've ever opened. Minifying XML is less glamorous than minifying JSON but more consequential in enterprise contexts, where an .xml file of a few megabytes is routine.
The tricky part is that XML has several places where whitespace is semantically meaningful, and a minifier that ignores them will quietly break documents. Let's go through what's safe, what isn't, and how to stay out of trouble.
What's Safely Removable
- Comments (<!-- ... -->). Not parsed as data.
- Whitespace between tags. Indentation and newlines between </author> and <pubDate> are decorative.
- Leading and trailing whitespace inside tag content, usually. Exceptions below.
That's roughly the same "safe zone" as HTML, with one important difference: XML has no equivalent of HTML's <pre> element. The preservation mechanism is explicit, and the minifier has to respect it.
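The safe removals listed above can be sketched with Python's standard xml.dom.minidom. This is a minimal illustration, not how our XML Minifier is implemented: the function names are made up for the example, it drops only comments and whitespace-only text nodes, and it leaves leading and trailing whitespace inside text content alone. It also ignores the preservation rules covered in the next section, and it would remove a significant space between two inline elements in mixed content.

```python
from xml.dom import minidom
from xml.dom.minidom import Node

# XML defines whitespace as exactly these four characters. A bare str.strip()
# would also remove characters such as U+00A0, which are data.
XML_WS = " \t\r\n"


def strip_safe_nodes(node):
    """Remove comments and whitespace-only text nodes below `node`, in place."""
    for child in list(node.childNodes):
        if child.nodeType == Node.COMMENT_NODE:
            node.removeChild(child)
        elif child.nodeType == Node.TEXT_NODE and not child.data.strip(XML_WS):
            node.removeChild(child)
        elif child.nodeType == Node.ELEMENT_NODE:
            strip_safe_nodes(child)


def minify(xml_text):
    """Return the root element serialized without comments or indentation."""
    doc = minidom.parseString(xml_text)
    strip_safe_nodes(doc)
    return doc.documentElement.toxml()


pretty = """<item>
  <!-- editorial note -->
  <author>Ada</author>
  <pubDate>2024-01-01</pubDate>
</item>"""

print(minify(pretty))
# <item><author>Ada</author><pubDate>2024-01-01</pubDate></item>
```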
Things You Must Preserve
1. xml:space="preserve"
XML has a built-in mechanism to opt specific elements out of whitespace normalization. If any element has xml:space="preserve", its descendant text nodes must be kept byte-for-byte. A common example:
<poem xml:space="preserve">
  Roses are red
    Violets are blue
      Indentation matters here
</poem>
Collapsing the whitespace inside that element ruins the document's meaning. Our XML Minifier detects xml:space="preserve" and leaves those subtrees alone.
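One way to honor xml:space while walking a parsed document is to carry a "preserve" flag down the tree. The sketch below uses Python's standard xml.dom.minidom. The function names are hypothetical and this is not the implementation of our tool; it shows the inheritance rule, where a descendant can switch back with xml:space="default".

```python
from xml.dom import minidom
from xml.dom.minidom import Node

XML_NS = "http://www.w3.org/XML/1998/namespace"  # namespace bound to the xml: prefix
XML_WS = " \t\r\n"


def strip_blank_text(node, preserve=False):
    """Drop whitespace-only text nodes, except inside xml:space="preserve" subtrees."""
    for child in list(node.childNodes):
        if child.nodeType == Node.TEXT_NODE:
            if not preserve and not child.data.strip(XML_WS):
                node.removeChild(child)
        elif child.nodeType == Node.ELEMENT_NODE:
            # The setting is inherited until a descendant overrides it.
            mode = child.getAttributeNS(XML_NS, "space")
            child_preserve = preserve if mode == "" else (mode == "preserve")
            strip_blank_text(child, child_preserve)


def minify_respecting_xml_space(xml_text):
    doc = minidom.parseString(xml_text)
    strip_blank_text(doc)
    return doc.documentElement.toxml()


doc = """<doc>
  <title>Poems</title>
  <poem xml:space="preserve">
    <line>Roses are red</line>
    <line>  Violets are blue</line>
  </poem>
</doc>"""

out = minify_respecting_xml_space(doc)
print(out)
```

The whitespace between <doc>, <title>, and <poem> is removed, while the line breaks and indentation inside <poem> come through untouched.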
2. CDATA sections
A CDATA block (<![CDATA[ ... ]]>) is XML's way of saying "treat this literally, don't interpret any XML inside." Most commonly used when you want to embed code, HTML, or other markup:
<script><![CDATA[
  if (x < 10) {
    console.log("small");
  }
]]></script>
That < inside the CDATA would otherwise confuse the XML parser. The minifier must preserve the entire CDATA span, including any whitespace inside: the content is opaque to XML.
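To make the failure mode concrete, the sketch below compares a naive text-level whitespace collapse with a parser-based pass that skips CDATA nodes. It uses Python's standard re and xml.dom.minidom. Both function names are invented for the example, and the parser-based version is only one possible approach.

```python
import re
from xml.dom import minidom
from xml.dom.minidom import Node

XML_WS = " \t\r\n"


def naive_minify(xml_text):
    """Text-level collapse with no awareness of CDATA (the failure mode)."""
    return re.sub(r"\s+", " ", xml_text)


def cdata_safe_minify(xml_text):
    """Drop whitespace-only text nodes; CDATA sections are kept as parsed."""
    doc = minidom.parseString(xml_text)

    def walk(node):
        for child in list(node.childNodes):
            if child.nodeType == Node.CDATA_SECTION_NODE:
                continue  # opaque content: keep every character
            if child.nodeType == Node.TEXT_NODE and not child.data.strip(XML_WS):
                node.removeChild(child)
            elif child.nodeType == Node.ELEMENT_NODE:
                walk(child)

    walk(doc)
    return doc.documentElement.toxml()


doc = """<page>
  <script><![CDATA[
if (x < 10) {
    console.log("small");
}
]]></script>
</page>"""

print(naive_minify(doc))       # the script is flattened onto one line
print(cdata_safe_minify(doc))  # the script keeps its line breaks and indentation
```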
3. Digitally signed documents (XML Digital Signature)
If the XML is signed with XMLDSig, the signer computed a digest over the canonicalized form of the signed content, and the verifier will re-canonicalize and re-hash it. Canonicalization (see XML C14N) normalizes only a few things, such as attribute order and whitespace inside tags; whitespace in element content, including the indentation between elements, is kept and hashed. Stripping it after signing changes the digest, so minifying a signed document silently breaks it.
Rule: if you need a signed, minified document, minify first, then sign. Never the other way around.
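The effect on a digest can be seen without any signing library. The sketch below uses xml.etree.ElementTree.canonicalize from the Python standard library (3.8+), which implements C14N 2.0. XMLDSig deployments more often use C14N 1.0 or Exclusive C14N, so treat this as a stand-in; the whitespace behavior shown is the same. The helper name c14n_digest and the sample document are made up for the example.

```python
import hashlib
import xml.etree.ElementTree as ET


def c14n_digest(xml_text):
    """SHA-256 over the canonical form, standing in for a signature digest."""
    canonical = ET.canonicalize(xml_data=xml_text)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


signed_as = "<order>\n  <id>42</id>\n  <total>9.99</total>\n</order>"
minified = "<order><id>42</id><total>9.99</total></order>"

# Whitespace between elements survives canonicalization, so the digests differ.
print(c14n_digest(signed_as) == c14n_digest(minified))  # False

# Whitespace inside tags is normalized away, so these two digests match.
print(c14n_digest("<order  ><id >42</id></order>")
      == c14n_digest("<order><id>42</id></order>"))  # True
```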
4. Attribute values
Everything inside attr="..." is data. Don't touch it. Our tool doesn't, but if you're rolling your own regex-based minifier, collapsing whitespace inside attribute values is a common bug.
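A small illustration of that bug, using Python's standard re and xml.dom.minidom. The sample element and its attribute names are invented. The regex line is the kind of shortcut a hand-rolled minifier might take; the parser-based pass is one possible way to avoid it.

```python
import re
from xml.dom import minidom

source = '<rule label="two  spaces" pattern="a  b"><when>x</when>  </rule>'

# Naive text-level collapse: it also rewrites the attribute values.
naive = re.sub(r"\s+", " ", source)

# Parser-based pass: only whitespace-only text nodes are removed.
doc = minidom.parseString(source)
root = doc.documentElement
for child in list(root.childNodes):
    if child.nodeType == child.TEXT_NODE and not child.data.strip(" \t\r\n"):
        root.removeChild(child)
safe = root.toxml()

print(naive)  # pattern="a b": the double space in the data is gone
print(safe)   # pattern="a  b": attribute values are unchanged
```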
5. Entity references
References like &amp;, &lt;, &#160;, or custom entities declared in a DTD must pass through exactly. The minifier is a text-level tool: it doesn't decode entities, which means weird but correct sequences like a three-character entity reference are kept intact.
XML Namespaces
XML namespace declarations (xmlns:foo="...") can look redundant when the same namespace is declared on multiple nested elements. A smart minifier could hoist them to the root — but "smart" is exactly where bugs creep in. Namespace handling interacts with processing pipelines, XSLT transforms, and signature canonicalization in ways a simple tool shouldn't mess with.
Our XML Minifier leaves namespaces alone. The SVG Minifier is more aggressive because SVG has well-defined conventions (for instance, the sodipodi namespace is only used by Inkscape), but general-purpose XML is treated conservatively.
Real-World Wins
RSS / Atom feeds
A feed polled by thousands of subscribers per day adds up. Minification shrinks the XML you send over the wire, and gzip on top shrinks it further. Bonus: many feed validators raise fewer whitespace complaints about a minified feed than about a pretty-printed one with inconsistent indentation.
Sitemaps
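Google caps each sitemap file at 50MB uncompressed or 50,000 URLs, whichever comes first. Minification only helps with the size cap: 100k URLs need at least two sitemap files whether they are pretty-printed or not. Where it pays off is with heavy entries (long URLs, or image, video, and hreflang annotations), which can hit 50MB well before 50,000 URLs; stripping indentation and newlines lets more of them fit in each file.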
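A rough back-of-the-envelope calculation shows how the two limits interact. The entry layout, the sample URL, and the function names below are assumptions for illustration; real entries vary in length.

```python
MAX_BYTES = 50 * 1024 * 1024  # the 50MB size cap (applies to the uncompressed file)
MAX_URLS = 50_000             # the URL-count cap

HEADER = ('<?xml version="1.0" encoding="UTF-8"?>'
          '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">')
FOOTER = "</urlset>"


def entry(url, lastmod, pretty):
    """One <url> entry, either indented or compact."""
    if pretty:
        return (f"  <url>\n    <loc>{url}</loc>\n"
                f"    <lastmod>{lastmod}</lastmod>\n  </url>\n")
    return f"<url><loc>{url}</loc><lastmod>{lastmod}</lastmod></url>"


def urls_per_file(entry_bytes):
    """How many entries of this size fit under both caps."""
    by_size = (MAX_BYTES - len(HEADER) - len(FOOTER)) // entry_bytes
    return min(MAX_URLS, by_size)


sample_url = "https://example.com/products/category/item-000001"  # hypothetical
for pretty in (True, False):
    size = len(entry(sample_url, "2024-01-01", pretty).encode("utf-8"))
    label = "pretty" if pretty else "minified"
    print(label, size, "bytes per entry,", urls_per_file(size), "URLs per file")
```

With short entries like this one, the URL-count cap is reached first in both layouts; the byte savings only change the answer once entries grow to a kilobyte or more each.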
SOAP
SOAP envelopes are notoriously verbose. Minifying at the transport layer (many SOAP clients do this automatically) saves real bandwidth on APIs with thousands of requests per minute. If the envelope has to be signed, minify first, then sign.
Android manifests / Maven POMs
These are committed to version control, not shipped over the wire. Minifying them is bad — you lose diff readability. Leave them pretty.
Pretty-Printing for Debugging
When you receive a minified XML response that looks like one long line of chevrons, paste it into our XML Minifier and hit Beautify. You'll get one element per line with consistent indentation, which makes it tractable to read. Comments that were stripped during the original minification aren't recoverable, but the structure comes back just fine.
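For scripted use, the same kind of re-indentation can be approximated with Python's standard xml.dom.minidom. This is a sketch with a made-up function name, not what the Beautify button runs. Note that it indents everything, including subtrees marked xml:space="preserve", so it is for reading, not for round-tripping.

```python
from xml.dom import minidom


def beautify(xml_text, indent="  "):
    """Re-indent a compact XML string, one element per line."""
    doc = minidom.parseString(xml_text)
    return doc.documentElement.toprettyxml(indent=indent)


print(beautify("<feed><entry><title>Hi</title></entry></feed>"))
```

Feeding it input that is already indented produces extra blank lines, because the existing whitespace text nodes are kept and new indentation is added around them.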
Gzip Still Reduces It More
Gzip absolutely dominates XML compression. A 1MB XML file often compresses to 80KB — a 12x reduction. Minifying before gzip saves maybe 5–10% on the final compressed size. Not much, but not nothing either, and the uncompressed parse time benefits remain.
If your application uses DOMParser or SAX on a multi-megabyte XML blob, the parser's work is roughly proportional to input bytes. Minification gives you fewer bytes to parse and a measurably faster parse on low-end hardware. It's the uncompressed-size argument all over again.
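A quick way to measure both effects on your own data is to compare raw and gzipped sizes side by side. The sketch below generates a synthetic feed with Python's standard gzip module; build_feed and its element names are invented for the example, and because the data is artificially repetitive the ratios it prints will not match a real document.

```python
import gzip


def build_feed(n, pretty):
    """Build a synthetic feed of n items, indented or compact, as UTF-8 bytes."""
    nl, pad = ("\n", "  ") if pretty else ("", "")
    items = []
    for i in range(n):
        items.append(
            f"{pad}<item>{nl}"
            f"{pad}{pad}<title>Post number {i}</title>{nl}"
            f"{pad}{pad}<link>https://example.com/posts/{i}</link>{nl}"
            f"{pad}</item>{nl}"
        )
    return f"<feed>{nl}{''.join(items)}</feed>".encode("utf-8")


for label, pretty in (("pretty", True), ("minified", False)):
    raw = build_feed(5000, pretty)
    packed = gzip.compress(raw)
    print(f"{label:9} raw={len(raw):>7}  gzip={len(packed):>7}")
```

Swapping build_feed for a function that reads a real file gives the numbers that matter for your own payloads.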
Tools and Libraries
- Node: xml-minifier, minify-xml. AST-aware options.
- Python: lxml with remove_blank_text=True on the parser gives minified-ish output.
- Java: Full XML processors like Xalan or Xerces have pretty/minify options on their Transformer output.
- XMLStarlet: Command-line xmlstarlet fo -o -c doc.xml compacts XML safely.
- Our browser tool: The XML Minifier for one-offs, no install needed.
Bottom Line
XML minification is more dangerous than JSON minification and safer than HTML minification, sitting in the middle of the risk spectrum. Strip comments, collapse whitespace between tags, but respect CDATA and xml:space="preserve", and stay away from anything that's been signed. The compressed savings are modest; the uncompressed savings matter for parser work and are real. Use our XML Minifier when you need a quick one-off compaction that won't break on the awkward parts.