How do I remove the BOM character from my XML file?

Xml, Xslt, Unicode, Byte Order Mark

Xml Problem Overview


I am using XSL to control the output of my XML file, but a BOM character is being added at the start of the output.

Xml Solutions


Solution 1 - Xml

# vim file.xml
:set nobomb
:wq

Solution 2 - Xml

You just need to add this to your XSLT file:

<xsl:output method="text"
	    encoding="ASCII"/>

Solution 3 - Xml

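Just strip the BOM bytes from the start of the file using any hex editor: three bytes (EF BB BF) for UTF-8, or two bytes (FF FE or FE FF) for UTF-16.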
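If a hex editor is not at hand, the same byte-level fix can be scripted. The following is a minimal Python sketch, not part of the original answer; `strip_bom` and `strip_bom_in_file` are hypothetical helper names. Note that removing the BOM from a UTF-16 or UTF-32 file leaves its byte order unmarked, so this is mainly useful for UTF-8.

```python
import codecs

# Known BOM byte sequences. The 4-byte UTF-32 marks are checked before the
# 2-byte UTF-16 marks because BOM_UTF32_LE begins with BOM_UTF16_LE.
BOMS = (
    codecs.BOM_UTF8,
    codecs.BOM_UTF32_LE,
    codecs.BOM_UTF32_BE,
    codecs.BOM_UTF16_LE,
    codecs.BOM_UTF16_BE,
)

def strip_bom(data: bytes) -> bytes:
    """Return data without a leading BOM, or unchanged if none is present."""
    for bom in BOMS:
        if data.startswith(bom):
            return data[len(bom):]
    return data

def strip_bom_in_file(path: str) -> None:
    """Rewrite the file at path without its leading BOM (hypothetical helper)."""
    with open(path, "rb") as f:
        data = f.read()
    with open(path, "wb") as f:
        f.write(strip_bom(data))
```

Calling `strip_bom_in_file("file.xml")` would rewrite the file in place, so it is worth keeping a backup first.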

Solution 4 - Xml

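Removing the BOM character from a string with XSLT is pretty simple. The second argument of translate() is the BOM character itself (U+FEFF), written here as a character reference so that it stays visible:

<xsl:value-of select="translate(StringWithBOM,'&#xFEFF;','')"/>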
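To see what the XSLT call above is removing, it can help to look at how the BOM surfaces once the bytes are decoded. This is a small Python sketch for illustration only; `strip_bom_char` is a hypothetical helper, and unlike `translate()` (which removes every occurrence) it only drops a single leading U+FEFF.

```python
raw = b"\xef\xbb\xbf<root/>"  # UTF-8 BOM followed by a tiny document

# Decoding as plain UTF-8 keeps the BOM as the character U+FEFF.
with_bom = raw.decode("utf-8")

# The "utf-8-sig" codec drops a leading BOM while decoding.
without_bom = raw.decode("utf-8-sig")

def strip_bom_char(text: str) -> str:
    """Remove a single leading U+FEFF character, if there is one."""
    return text[1:] if text.startswith("\ufeff") else text
```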

Solution 5 - Xml

I was under the impression that XML is encouraged to be written in Unicode, in some Unicode encoding, and that certain Unicode encodings are specified to contain an initial byte-order mark. Without that byte-order mark, a file in such an encoding (UTF-16, for example) is no longer correctly encoded and therefore no longer correct XML; UTF-8 is the exception, since its byte-order mark is optional. XML processors are encouraged to be unforgiving, to fail immediately on the slightest error (such as an incorrect Unicode encoding). What kinds of XML processors are you looking to break?

Obviously, stripping a byte-order mark from a UTF-8 encoded document makes that document appear to be ASCII encoded (not Unicode), at least when it contains only ASCII characters, and some text processors are capable only of using ASCII encoded documents. Is this what you're working with?
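One caveat to the point about ASCII: a BOM-less UTF-8 file is byte-identical to ASCII only when every character in it falls in the ASCII range. A rough Python check, with the hypothetical helper name `looks_ascii_after_bom_removal` and made-up sample data, might look like this:

```python
import codecs

def looks_ascii_after_bom_removal(data: bytes) -> bool:
    """True if the bytes, minus a leading UTF-8 BOM, are all in the ASCII range."""
    if data.startswith(codecs.BOM_UTF8):
        data = data[len(codecs.BOM_UTF8):]
    return data.isascii()  # bytes.isascii() requires Python 3.7 or later

# Sample documents (illustrative data, not from the question).
plain = codecs.BOM_UTF8 + "<name>Zoe</name>".encode("utf-8")
accented = codecs.BOM_UTF8 + "<name>Zo\u00eb</name>".encode("utf-8")
```

Here `plain` would pass for ASCII once the BOM is gone, while `accented` would not, because the accented letter is encoded as two non-ASCII bytes.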

Solution 6 - Xml

What output encoding is your XSL set to use? What encoding is the input document? Where's the input coming from, and where was it saved/uploaded/downloaded in the meantime?

XML and XSL should default to using UTF-8 if nothing else is specified. But clearly, something's going wrong here.
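A quick way to answer the encoding questions above is to look at the first few bytes of both the input and the output file. The following Python sketch is one possible diagnostic, assuming the files are available locally; `describe_head` and `describe_file` are hypothetical names.

```python
import codecs

# (BOM bytes, label) pairs. The 4-byte UTF-32 marks come before the 2-byte
# UTF-16 marks because the UTF-32 LE mark begins with the UTF-16 LE mark.
KNOWN_BOMS = (
    (codecs.BOM_UTF8, "UTF-8"),
    (codecs.BOM_UTF32_LE, "UTF-32 LE"),
    (codecs.BOM_UTF32_BE, "UTF-32 BE"),
    (codecs.BOM_UTF16_LE, "UTF-16 LE"),
    (codecs.BOM_UTF16_BE, "UTF-16 BE"),
)

def describe_head(head: bytes) -> str:
    """Describe the first bytes of a file: hex dump plus any BOM found."""
    label = "no BOM"
    for bom, name in KNOWN_BOMS:
        if head.startswith(bom):
            label = name + " BOM"
            break
    return head[:4].hex() + " (" + label + ")"

def describe_file(path: str) -> str:
    """Read the first four bytes of the file at path and describe them."""
    with open(path, "rb") as f:
        return describe_head(f.read(4))
```

Running `describe_file` on the file before and after each step (transformation, upload, download) should show where the BOM first appears.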

One thing which might happen is, the XML is being served up by a web server which is set by default to serve in ISO-8859-1, a pretty good default ... pre-Unicode.

Slightly off-topic, but Joel's very instructive article about text encodings was an eye-opener to me. There are a lot of people out there who are otherwise very smart about programming, but who persist in thinking there's such a thing as "plain text" or calling their text "ASCII" or "ANSI". It's an issue you really need to get to grips with if you haven't yet.

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author | Original Content on Stackoverflow
Question | raluxgaza | View Question on Stackoverflow
Solution 1 - Xml | Benedikt Waldvogel | View Answer on Stackoverflow
Solution 2 - Xml | ken | View Answer on Stackoverflow
Solution 3 - Xml | Marko | View Answer on Stackoverflow
Solution 4 - Xml | dr_leevsey | View Answer on Stackoverflow
Solution 5 - Xml | yfeldblum | View Answer on Stackoverflow
Solution 6 - Xml | AmbroseChapel | View Answer on Stackoverflow