"Content is not allowed in prolog" when parsing perfectly valid XML on GAE

Tags: Java, Xml, Google App-Engine, Parsing, Stax

Java Problem Overview


I've been beating my head against this absolutely infuriating bug for the last 48 hours, so I thought I'd finally throw in the towel and try asking here before I throw my laptop out the window.

I'm trying to parse the response XML from a call I made to AWS SimpleDB. The response is coming back on the wire just fine; for example, it may look like:

<?xml version="1.0" encoding="utf-8"?> 
<ListDomainsResponse xmlns="http://sdb.amazonaws.com/doc/2009-04-15/">
    <ListDomainsResult>
        <DomainName>Audio</DomainName>
        <DomainName>Course</DomainName>
        <DomainName>DocumentContents</DomainName>
        <DomainName>LectureSet</DomainName>
        <DomainName>MetaData</DomainName>
        <DomainName>Professors</DomainName>
        <DomainName>Tag</DomainName>
    </ListDomainsResult>
    <ResponseMetadata>
        <RequestId>42330b4a-e134-6aec-e62a-5869ac2b4575</RequestId>
        <BoxUsage>0.0000071759</BoxUsage>
    </ResponseMetadata>
</ListDomainsResponse>

I pass in this XML to a parser with

XMLEventReader eventReader = xmlInputFactory.createXMLEventReader(response.getContent());

and call eventReader.nextEvent(); a bunch of times to get the data I want.

Here's the bizarre part -- it works great inside the local server. The response comes in, I parse it, everyone's happy. The problem is that when I deploy the code to Google App Engine, the outgoing request still works, and the response XML seems 100% identical and correct to me, but the response fails to parse with the following exception:

com.amazonaws.http.HttpClient handleResponse: Unable to unmarshall response (ParseError at [row,col]:[1,1]
Message: Content is not allowed in prolog.): <?xml version="1.0" encoding="utf-8"?> 
<ListDomainsResponse xmlns="http://sdb.amazonaws.com/doc/2009-04-15/"><ListDomainsResult><DomainName>Audio</DomainName><DomainName>Course</DomainName><DomainName>DocumentContents</DomainName><DomainName>LectureSet</DomainName><DomainName>MetaData</DomainName><DomainName>Professors</DomainName><DomainName>Tag</DomainName></ListDomainsResult><ResponseMetadata><RequestId>42330b4a-e134-6aec-e62a-5869ac2b4575</RequestId><BoxUsage>0.0000071759</BoxUsage></ResponseMetadata></ListDomainsResponse>
javax.xml.stream.XMLStreamException: ParseError at [row,col]:[1,1]
Message: Content is not allowed in prolog.
    at com.sun.org.apache.xerces.internal.impl.XMLStreamReaderImpl.next(Unknown Source)
    at com.sun.xml.internal.stream.XMLEventReaderImpl.nextEvent(Unknown Source)
    at com.amazonaws.transform.StaxUnmarshallerContext.nextEvent(StaxUnmarshallerContext.java:153)
    ... (rest of lines omitted)

I have double, triple, quadruple checked this XML for 'invisible characters' or non-UTF8 encoded characters, etc. I looked at it byte-by-byte in an array for byte-order-marks or something of that nature. Nothing; it passes every validation test I could throw at it. Even stranger, it happens if I use a Saxon-based parser as well -- but ONLY on GAE, it always works fine in my local environment.

It makes it very hard to trace the code for problems when I can only run the debugger on an environment that works perfectly (I haven't found any good way to remotely debug on GAE). Nevertheless, using the primitive means I have, I've tried a million approaches including:

  • XML with and without the prolog
  • With and without newlines
  • With and without the "encoding=" attribute in the prolog
  • Both newline styles
  • With and without the chunking information present in the HTTP stream

And I've tried most of these in multiple combinations where it made sense they would interact -- nothing! I'm at my wit's end. Has anyone seen an issue like this before that can hopefully shed some light on it?

Thanks!

Java Solutions


Solution 1 - Java

The encodings declared in your XML and your XSD (or DTD) are different.
XML file header: <?xml version='1.0' encoding='utf-8'?>
XSD file header: <?xml version='1.0' encoding='utf-16'?>

Another possible cause is that something comes before the XML declaration, i.e. you might have something like this in the buffer:

helloworld<?xml version="1.0" encoding="utf-8"?>  

or even a space or special character.

There are also special characters called byte order marks (BOMs) that could be in the buffer. Before passing the buffer to the parser, do this:

String xml = "<?xml ...";
xml = xml.trim().replaceFirst("^([\\W]+)<","<");
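
Note that the regex above only removes non-word characters, so a prefix such as helloworld would survive it. A minimal sketch of a broader variant, assuming the XML is already held in a String and that everything before the first '<' is unwanted (the class and method names are made up for illustration):

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

final class PrologCleaner {

    /** Returns the input with everything before the first '<' removed. */
    static String dropLeadingJunk(String xml) {
        int start = xml.indexOf('<');
        // start == 0: already clean; start == -1: no '<' at all, leave unchanged.
        return start <= 0 ? xml : xml.substring(start);
    }

    /** Parses the string with StAX and returns the root element's local name. */
    static String rootElementName(String xml) {
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(xml));
            try {
                while (reader.hasNext()) {
                    if (reader.next() == XMLStreamReader.START_ELEMENT) {
                        return reader.getLocalName();
                    }
                }
                throw new IllegalStateException("no root element found");
            } finally {
                reader.close();
            }
        } catch (XMLStreamException e) {
            throw new IllegalStateException("XML could not be parsed", e);
        }
    }

    public static void main(String[] args) {
        // A BOM character plus stray text in front of the declaration.
        String dirty = "\uFEFFhelloworld<?xml version=\"1.0\"?><ListDomainsResponse/>";
        System.out.println(rootElementName(dropLeadingJunk(dirty)));
    }
}
```

This is a blunt tool: it hides whatever is producing the junk, so it is worth logging the removed prefix while debugging.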

Solution 2 - Java

I ran into this issue after inspecting the XML file in Notepad++ and saving it, even though the file started with the UTF-8 declaration <?xml version="1.0" encoding="utf-8"?>

It was fixed by saving the file in Notepad++ with Encoding (tab) > Encode in UTF-8 selected (it had been Encode in UTF-8-BOM).

Solution 3 - Java

This error message is caused by invalid content at the very beginning of the XML document, for example a stray dot “.” before the first element.

Any characters before the “<?xml…” declaration will cause the “org.xml.sax.SAXParseException: Content is not allowed in prolog” error message.

To fix it, just delete any stray characters before the “<?xml”.

Ref: http://www.mkyong.com/java/sax-error-content-is-not-allowed-in-prolog/

Solution 4 - Java

I was facing the same issue. In my case the XML files were generated by a C# program and fed into an AS400 for further processing. After some analysis I found that I was writing the XML files as UTF-8 with a BOM, whereas Java on the AS400 expects "UTF-8 without BOM". So I had to write extra code similar to the following:

//create encoding with no BOM
Encoding outputEnc = new UTF8Encoding(false); 
//open file with encoding
TextWriter file = new StreamWriter(filePath, false, outputEnc);           

file.Write(doc.InnerXml);
file.Flush();
file.Close(); // save and close it

Solution 5 - Java

I got the same error message today. The solution was to change the document from UTF-8 with BOM to UTF-8 without BOM.
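
To check whether a document actually starts with a BOM before changing anything, the leading bytes can be inspected directly. A small sketch (the class name and the returned labels are made up for illustration; UTF-32 BOMs are deliberately ignored):

```java
import java.nio.charset.StandardCharsets;

final class BomSniffer {

    /** Names the BOM at the start of the data, or "none" if there is none. */
    static String detectBom(byte[] data) {
        if (startsWith(data, 0xEF, 0xBB, 0xBF)) return "UTF-8";
        if (startsWith(data, 0xFE, 0xFF)) return "UTF-16BE";
        if (startsWith(data, 0xFF, 0xFE)) return "UTF-16LE"; // UTF-32LE not distinguished
        return "none";
    }

    /** Compares the first bytes of data (as unsigned values) with the prefix. */
    private static boolean startsWith(byte[] data, int... prefix) {
        if (data.length < prefix.length) return false;
        for (int i = 0; i < prefix.length; i++) {
            if ((data[i] & 0xFF) != prefix[i]) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        byte[] plain = "<?xml version=\"1.0\"?><a/>".getBytes(StandardCharsets.UTF_8);
        byte[] withBom = "\uFEFF<a/>".getBytes(StandardCharsets.UTF_8);
        System.out.println(detectBom(plain));
        System.out.println(detectBom(withBom));
    }
}
```

For a file on disk, the same method can be fed with Files.readAllBytes(path).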

Solution 6 - Java

In my xml file, the header looked like this:

<?xml version="1.0" encoding="utf-16"?>

In a test file, I was reading the file bytes and decoding the data as UTF-8 (not realizing the header in this file was utf-16) to create a string.

byte[] data = Files.readAllBytes(Paths.get(path));
String dataString = new String(data, "UTF-8");

When I tried to deserialize this string into an object, I was seeing the same error:

javax.xml.stream.XMLStreamException: ParseError at [row,col]:[1,1]
Message: Content is not allowed in prolog.

When I updated the second line to

String dataString = new String(data, "UTF-16");

I was able to deserialize the object just fine. So as Romain had noted above, the encodings need to match.
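
An alternative that avoids guessing the charset is to hand the raw bytes to the parser and let it work out the encoding from the BOM and the XML declaration. A minimal sketch using the JDK's StAX API (the class and method names are made up, and the sample document is an assumption, not the file from this answer):

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

final class EncodingAwareParse {

    /** Parses raw bytes; the parser detects the encoding itself. */
    static String rootElementName(byte[] data) {
        try {
            // Passing an InputStream (not a String or Reader) leaves decoding to the parser.
            XMLStreamReader reader = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new ByteArrayInputStream(data));
            try {
                while (reader.hasNext()) {
                    if (reader.next() == XMLStreamReader.START_ELEMENT) {
                        return reader.getLocalName();
                    }
                }
                throw new IllegalStateException("no root element found");
            } finally {
                reader.close();
            }
        } catch (XMLStreamException e) {
            throw new IllegalStateException("XML could not be parsed", e);
        }
    }

    public static void main(String[] args) {
        // Java's UTF_16 charset encodes big-endian with a leading BOM.
        byte[] utf16 = "<?xml version=\"1.0\" encoding=\"utf-16\"?><Course/>"
                .getBytes(StandardCharsets.UTF_16);
        System.out.println(rootElementName(utf16));
    }
}
```

In the test above that would mean passing the result of Files.readAllBytes straight to the parser instead of building a String first.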

Solution 7 - Java

Removing the XML declaration solved it:

<?xml version='1.0' encoding='utf-8'?>

Solution 8 - Java

I was facing the same problem, "Content is not allowed in prolog", with my XML file.

Solution

Initially my root folder was named '#Filename'.

When I removed the first character '#', the error was resolved.

There is no need to rename the folder, though. Try it this way:

Instead of passing a File or URL object to the unmarshaller method, use a FileInputStream.

File myFile = new File("........");
Object obj = unmarshaller.unmarshal(new FileInputStream(myFile));

Solution 9 - Java

In the spirit of "just delete all those weird characters before the <?xml", here's my Java code, which works well with input via a BufferedReader:

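    BufferedReader test = new BufferedReader(new InputStreamReader(fisTest));
    test.mark(4);
    while (true) {
        int earlyChar = test.read();
        System.out.println(earlyChar); // debug: print the code of each leading character
        if (earlyChar == -1) {
            break; // end of stream reached without finding '<'
        }
        if (earlyChar == '<') {
            test.reset(); // rewind so the parser sees the '<' again
            break;
        }
        test.mark(4);
    }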

FWIW, the bytes I was seeing are (in decimal): 239, 187, 191, i.e. the UTF-8 byte order mark (EF BB BF in hex).
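
If the input is a byte stream rather than a reader, the same idea can be applied to just those three bytes. A sketch using PushbackInputStream, which skips a leading UTF-8 BOM and leaves everything else untouched (the class and method names are made up for illustration):

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.PushbackInputStream;
import java.io.UncheckedIOException;

final class BomSkippingStream {

    /** Wraps the stream and consumes a leading UTF-8 BOM (EF BB BF) if present. */
    static InputStream skipUtf8Bom(InputStream in) {
        try {
            PushbackInputStream pushback = new PushbackInputStream(in, 3);
            byte[] head = new byte[3];
            int n = 0;
            // read() may return fewer bytes than requested, so loop until 3 or EOF.
            while (n < 3) {
                int r = pushback.read(head, n, 3 - n);
                if (r < 0) break;
                n += r;
            }
            boolean bom = n == 3
                    && (head[0] & 0xFF) == 0xEF
                    && (head[1] & 0xFF) == 0xBB
                    && (head[2] & 0xFF) == 0xBF;
            if (!bom && n > 0) {
                pushback.unread(head, 0, n); // not a BOM: put the bytes back
            }
            return pushback;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Helper for the demo: first byte of the data after BOM skipping, or -1 if empty. */
    static int firstByteAfterBom(byte[] data) {
        try (InputStream in = skipUtf8Bom(new ByteArrayInputStream(data))) {
            return in.read();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] data = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, '<', 'a', '/', '>'};
        System.out.println(firstByteAfterBom(data)); // code of '<'
    }
}
```

The returned stream can then be passed to the parser in place of the original one.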

Solution 10 - Java

Unexpected reason: # character in file path

Due to some internal bug, the error "Content is not allowed in prolog" also appears when the file content itself is 100% correct but you supply a file path like C:\Data\#22\file.xml.

This may possibly apply to other special characters, too.

How to check: If you move your file into a path without special characters and the error disappears, then it was this issue.
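
If moving the file is not an option, one workaround is to open the file yourself and give the parser a stream, so the path never has to be interpreted by the XML library. This is a sketch under the assumption that the failure comes from the path being treated as a URI, where '#' starts a fragment; the class name, temp directory and sample document are made up for illustration:

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;

final class HashPathParse {

    /** Parses the file through an InputStream and returns the root element name. */
    static String rootElementName(Path xmlFile) {
        try (InputStream in = Files.newInputStream(xmlFile)) {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(in);
            return doc.getDocumentElement().getNodeName();
        } catch (IOException | SAXException | ParserConfigurationException e) {
            throw new IllegalStateException("could not parse " + xmlFile, e);
        }
    }

    /** Writes a small sample document into a temp folder whose name contains '#'. */
    static Path writeSample() {
        try {
            Path dir = Files.createTempDirectory("prolog-demo").resolve("#22");
            Files.createDirectory(dir);
            Path file = dir.resolve("file.xml");
            Files.write(file, "<?xml version=\"1.0\" encoding=\"utf-8\"?><ListDomainsResponse/>"
                    .getBytes(StandardCharsets.UTF_8));
            return file;
        } catch (IOException e) {
            throw new IllegalStateException("could not create sample file", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(rootElementName(writeSample()));
    }
}
```

The trade-off is that the parser no longer knows the document's location, so relative references inside the XML (for example to a DTD) would not resolve without setting a system ID separately.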

Solution 11 - Java

I had a tab character instead of spaces. Replacing the tab '\t' fixed the problem.

Cut and paste the whole doc into an editor like Notepad++ and display all characters.

Solution 12 - Java

In my instance of the problem, the solution was to replace German umlauts (äöü) with their HTML equivalents...

Solution 13 - Java

Below are possible causes of the “org.xml.sax.SAXParseException: Content is not allowed in prolog” exception.

  1. First check the file paths of schema.xsd and file.xml.
  2. The encodings in your XML and XSD (or DTD) should be the same.
    XML file header: <?xml version='1.0' encoding='utf-8'?>
    XSD file header: <?xml version='1.0' encoding='utf-8'?>
  3. Check whether anything comes before the XML declaration, e.g.: hello<?xml version='1.0' encoding='utf-16'?>

Solution 14 - Java

I zipped the XML on macOS and sent it to a Windows machine. The default compression changed the files, and the resulting encoding difference triggered this message.

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type         Original Author        Original Content on Stackoverflow
Question             Adrian Petrescu        View Question on Stackoverflow
Solution 1 - Java    Romain Hippeau         View Answer on Stackoverflow
Solution 2 - Java    techloris_109          View Answer on Stackoverflow
Solution 3 - Java    Sunmit Girme           View Answer on Stackoverflow
Solution 4 - Java    Saturn CAU             View Answer on Stackoverflow
Solution 5 - Java    matjung                View Answer on Stackoverflow
Solution 6 - Java    dfritch                View Answer on Stackoverflow
Solution 7 - Java    F.O.O                  View Answer on Stackoverflow
Solution 8 - Java    Ravi Kiran Gururaja    View Answer on Stackoverflow
Solution 9 - Java    Tamias                 View Answer on Stackoverflow
Solution 10 - Java   miroxlav               View Answer on Stackoverflow
Solution 11 - Java   SoloPilot              View Answer on Stackoverflow
Solution 12 - Java   MBaas                  View Answer on Stackoverflow
Solution 13 - Java   Avinash Dubey          View Answer on Stackoverflow
Solution 14 - Java   htafoya                View Answer on Stackoverflow