Which characters are invalid (unless encoded) in an XML attribute?


Xml Problem Overview


I can't believe I can't find this information easily accessible, so:

  1. Which characters cannot be incorporated in an XML attribute without entity-encoding them?

Obviously, you need to encode quotes. What about < and >? What else?

  2. Where exactly is the official list?

Xml Solutions


Solution 1 - Xml

Here is the definition of what is allowed in an attribute value.

AttValue ::= '"' ([^<&"] | Reference)* '"'  |  "'" ([^<&'] | Reference)* "'"

So, you can't have:

  • the same character that opens/closes the attribute value (either ' or ")
  • a naked ampersand (& must be &amp;)
  • a left angle bracket (< must be &lt;)
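
The three rules above can be sketched as a small escaping helper. This is a minimal illustration in Python, not part of the original answer; the names `escape_attr` and `attr` are hypothetical, and the sketch assumes the input contains only characters that are legal in XML to begin with.

```python
def escape_attr(value: str, quote: str = '"') -> str:
    """Escape a string for use inside an XML attribute delimited by `quote`."""
    if quote not in ('"', "'"):
        raise ValueError("quote must be ' or \"")
    # The ampersand is replaced first; otherwise the '&' of the entities
    # produced afterwards would be escaped a second time.
    value = value.replace("&", "&amp;").replace("<", "&lt;")
    # Only the delimiter in use has to be escaped; the other quote may stay.
    entity = "&quot;" if quote == '"' else "&apos;"
    return value.replace(quote, entity)


def attr(name: str, value: str) -> str:
    """Render name="value" using double quotes as the delimiter."""
    return f'{name}="{escape_attr(value)}"'
```

For example, `attr("title", 'R&D')` yields `title="R&amp;D"`. A right angle bracket is left alone here because the production does not exclude it.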

You should also not be using any characters that are outright illegal anywhere in an XML document (such as form feeds).
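
To make "not legal anywhere" concrete, here is a hedged sketch that flags such characters. It assumes XML 1.0 and uses the Char production from section 2.2 of the specification; the helper names are hypothetical. In XML 1.0 these characters cannot be rescued with a character reference either, so they have to be removed or transported some other way.

```python
import re
from typing import List

# Complement of the XML 1.0 "Char" production (section 2.2):
#   #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
_ILLEGAL_XML_CHARS = re.compile(
    r"[^\t\n\r\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def find_illegal_xml_chars(text: str) -> List[str]:
    """Return every character in `text` that XML 1.0 does not allow."""
    return _ILLEGAL_XML_CHARS.findall(text)


def strip_illegal_xml_chars(text: str) -> str:
    """Drop characters that XML 1.0 does not allow anywhere in a document."""
    return _ILLEGAL_XML_CHARS.sub("", text)
```

Whether to strip, replace, or reject such input is an application decision; stripping is shown only because it is the shortest to illustrate.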

Solution 2 - Xml

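As per the current recommendation, specifically the section on character data and markup, the characters that have predefined entities are the ampersand (&), left angle bracket (<), right angle bracket (>), and both the single quote (') and double quote ("). Of these, only the ampersand, the left angle bracket, and the quote character that delimits the value strictly have to be escaped inside an attribute value; escaping the others is allowed but not required.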
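
One way to check which of these five characters a parser actually rejects in raw form is to feed each one to it inside a double-quoted attribute. The sketch below uses Python's standard `xml.etree.ElementTree`; the element name `e`, the attribute name `a`, and the helper `is_well_formed` are arbitrary choices for illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(document: str) -> bool:
    """Return True if the parser accepts `document` as well-formed XML."""
    try:
        ET.fromstring(document)
        return True
    except ET.ParseError:
        return False


# Place each candidate character, unescaped, inside a double-quoted value.
candidates = ["&", "<", ">", '"', "'"]
results = {ch: is_well_formed(f'<e a="x{ch}y"/>') for ch in candidates}

for ch, ok in results.items():
    print(repr(ch), "accepted" if ok else "rejected")
```

With a double-quoted value, the parser should reject the raw ampersand, left angle bracket, and double quote, and accept the right angle bracket and single quote. Swapping the delimiter to single quotes reverses the roles of the two quote characters.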

Solution 3 - Xml

See 2.2 Characters in "Extensible Markup Language (XML) 1.0 (Third Edition)".

Note that, at least with .NET, if you use the XML APIs to work with XML, you won't have to worry about this. That is the reason not to treat XML as plain text.
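
The same idea applies outside .NET. As an illustration only (the answer itself refers to .NET), this Python sketch lets the standard `xml.etree.ElementTree` API do the escaping and then confirms that the value survives a round trip. The element and attribute names are made up for the example.

```python
import xml.etree.ElementTree as ET

original = 'Tom & "Jerry" <cartoon>'

# Set the attribute through the API instead of concatenating strings.
elem = ET.Element("item")
elem.set("title", original)

# Serialization escapes whatever needs escaping in the attribute value.
xml_text = ET.tostring(elem, encoding="unicode")
print(xml_text)

# Parsing the serialized text gives back the original, unescaped value.
round_tripped = ET.fromstring(xml_text).get("title")
assert round_tripped == original
```

The exact set of characters the serializer chooses to escape may be broader than the minimum the specification requires, which is harmless as long as the round trip holds.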

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author | Original Content on Stackoverflow
Question | Euro Micelli | View Question on Stackoverflow
Solution 1 - Xml | great_llama | View Answer on Stackoverflow
Solution 2 - Xml | codehead | View Answer on Stackoverflow
Solution 3 - Xml | John Saunders | View Answer on Stackoverflow