XML Escape Checker
How it works
XML has five predefined character entities. They let characters that would otherwise be read as markup appear in text content and attribute values: & → &amp;, < → &lt;, > → &gt; (optional in text, except that the literal sequence ]]> must not appear outside a CDATA section), " → &quot; (in double-quoted attribute values), ' → &apos; (in single-quoted attribute values). An unescaped & or < in content makes the document not well-formed, and the parser rejects it.
**Why XML escaping matters** An unescaped ampersand (&) in text content causes a "malformed XML" parse error, because & always starts an entity or character reference; a literal ampersand must be written as &amp;. An unescaped < in text content is treated as the start of a tag. In attribute values, the enclosing quote character must be escaped: <elem attr="It's a "test""/> fails. Either escape the inner double quotes, <elem attr="It's a &quot;test&quot;"/>, or switch to single-quote delimiters and escape the apostrophe instead: <elem attr='It&apos;s a "test"'/>.
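The escaping rules above can be sketched in Python with the standard library's `xml.sax.saxutils.escape`. The helper names `escape_text` and `escape_attr` are illustrative, not part of any library; this is a minimal sketch, assuming the caller adds the double-quote delimiters around the attribute value.

```python
from xml.sax.saxutils import escape


def escape_text(value: str) -> str:
    # escape() replaces &, < and > by default, which covers text content.
    return escape(value)


def escape_attr(value: str) -> str:
    # Also escape both quote characters so the result is safe
    # inside either "..." or '...' attribute delimiters.
    return escape(value, {'"': "&quot;", "'": "&apos;"})


print(escape_text("Tom & Jerry <3"))
print('<elem attr="' + escape_attr('It\'s a "test"') + '"/>')
```

Escaping both quote characters in attributes is slightly more than the spec requires, but it means the value stays valid if the delimiter is later changed.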
**CDATA sections** CDATA (Character Data) sections allow embedding raw text without escaping: <description><![CDATA[Use <b>bold</b> & "quotes" freely]]></description>. CDATA content cannot contain the sequence ]]> (end marker). CDATA is often preferred over escaping for embedding HTML fragments or code samples in XML.
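When the raw text might itself contain ]]>, one common workaround is to split it across two adjacent CDATA sections so the end marker never appears whole. The sketch below assumes that approach; `to_cdata` is a hypothetical helper name, and the standard library's `xml.etree.ElementTree` is used only to confirm the round trip.

```python
import xml.etree.ElementTree as ET


def to_cdata(raw: str) -> str:
    # Split every "]]>" so that "]]" closes one section
    # and ">" opens the next one.
    safe = raw.replace("]]>", "]]]]><![CDATA[>")
    return "<![CDATA[" + safe + "]]>"


sample = 'Use <b>bold</b> & "quotes"; if (a[b[0]]>1) {}'
document = "<description>" + to_cdata(sample) + "</description>"

# The parser merges adjacent CDATA sections back into a single text value.
print(ET.fromstring(document).text == sample)
```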
**XML vs HTML escaping** HTML5 parsers are error-tolerant — unescaped & in HTML is often parsed without error. XML parsers are strict — any well-formedness violation causes a hard parse failure. When generating XML programmatically, always use an XML library's serialization (rather than string concatenation) to guarantee correct escaping. Common bugs arise from inserting user input directly into XML strings.
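As a small illustration of serialization versus concatenation, the sketch below uses Python's `xml.etree.ElementTree`. The element name `name`, the attribute `title`, and the helper `build_name_element` are made up for the example.

```python
import xml.etree.ElementTree as ET


def build_name_element(user_input: str) -> str:
    # The library escapes text and attribute values when serializing.
    elem = ET.Element("name", attrib={"title": user_input})
    elem.text = user_input
    return ET.tostring(elem, encoding="unicode")


user_input = 'Tom & "Jerry" <script>'

safe_xml = build_name_element(user_input)
print(safe_xml)

# The same value inserted by concatenation is not well-formed XML.
unsafe_xml = "<name>" + user_input + "</name>"
try:
    ET.fromstring(unsafe_xml)
except ET.ParseError as err:
    print("concatenated XML rejected:", err)
```

Parsing `safe_xml` again returns the original string for both the text and the attribute, which is the property string concatenation cannot guarantee.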
Frequently Asked Questions
- & → &amp; (must always be escaped in text and attribute values, because & starts entity references). < → &lt; (must be escaped in text content and attribute values, because it starts a tag). > → &gt; (the spec requires it only where text would otherwise contain the literal sequence ]]>; it is optional elsewhere, including attribute values, but safest to escape). " → &quot; (must be escaped in double-quoted attribute values). ' → &apos; (must be escaped in single-quoted attribute values). Rule of thumb: always escape & and < everywhere; escape > always; escape " in attributes delimited by double quotes; escape ' in attributes delimited by single quotes.
- CDATA (Character Data) sections allow embedding raw text without XML escaping: <description><![CDATA[Use <b>bold</b> & 'quotes' freely]]></description>. The only sequence forbidden inside CDATA is ']]>' (the CDATA end marker). Use CDATA for: embedding HTML fragments in XML, code samples containing < > & characters, JavaScript embedded in XHTML. Avoid CDATA for: values that might contain ']]>' (split the CDATA section or escape it), simple values (just escaping & < is cleaner). CDATA is valid XML; parsers convert it to text equivalent to the properly escaped version.
- Never build XML by string concatenation with user input: this opens the door to XML injection. Concatenating '<name>' + userInput + '</name>' produces malformed XML if userInput contains '<' or '&', and input such as '</name><admin>true</admin><name>' injects new elements. Safe approach: use an XML library's serialization. Python: xml.etree.ElementTree, where elem.text = user_input is escaped on serialization. Java: DocumentBuilder/Transformer or JAXB, which escape automatically. JavaScript: DOMParser/XMLSerializer, using createTextNode() for text content. String concatenation is safe only for hardcoded static XML fragments with no user input.
- HTML5 parsers are error-tolerant: an unescaped & in HTML is often parsed without error (treated as a literal ampersand if not followed by a valid entity name). XML parsers are strict: any well-formedness violation (including an unescaped &) is a hard parse failure, and the parser stops with an error. HTML has additional named entities (&nbsp;, &copy;, &mdash;, etc.) beyond the 5 XML predefined entities. XML only has the 5 predefined entities; all others must be declared in the DTD or written as numeric character references such as &#160;. Never assume HTML escaping rules apply to XML; always use proper XML serialization.
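The strictness described in these answers can be observed with a small well-formedness check. This is a minimal sketch using Python's `xml.etree.ElementTree`, not the checker behind this page; `check_well_formed` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET


def check_well_formed(xml_text: str):
    """Return (True, None) if xml_text parses, else (False, message)."""
    try:
        ET.fromstring(xml_text)
        return True, None
    except ET.ParseError as err:
        # ParseError.position is a (line, column) tuple.
        line, column = err.position
        return False, f"line {line}, column {column}: {err}"


samples = [
    "<p>Tom &amp; Jerry</p>",  # escaped ampersand: accepted
    "<p>Tom & Jerry</p>",      # bare ampersand: rejected
    "<p>a&nbsp;b</p>",         # HTML-only entity, no DTD: rejected
    "<p>a&#160;b</p>",         # numeric character reference: accepted
]
for sample in samples:
    print(sample, check_well_formed(sample))
```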