LibXML - ParseDoc fails


Gabriel(Posted 2008) [#1]
I'm loading my XML files from memory, so I put them into a string and then called TXMLDoc.ParseDoc() to parse them.

The debug log shows something like this:

Entity: line 1: parser error : Start tag expected, '<' not found
<?xml version="1.0" encoding="utf-8"?>

I don't know where the garbage is coming from. It's not there in my string when I send the string to the debug log. Is LibXML expecting a different string format or something? Or is there a trick to this?


plash(Posted 2008) [#2]
Yep. I had this problem a while back. You have to comment out the calls that free the string passed to ParseDoc().

Brucey, you haven't fixed this yet?


Gabriel(Posted 2008) [#3]
If you reported this a while back, it's probably my fault. I haven't updated for quite a while. I'll check for an updated version on his site. Thanks for the tip.

EDIT: Nope. There's a change to TXMLReader to do with string cleanup, but I don't think that gets called from ParseDoc.

I think I see your suggested fix though. He frees the string before calling txmldoc._Create(). I think I need to store that, free the string, and then return the doc.


Gabriel(Posted 2008) [#4]
Ok, well I've made the fix, but I'm still getting the same error, so I guess the fix wasn't necessary. I went and updated to 1.14 from the SVN just in case, but that didn't fix anything either.

I guess it must be a problem with _xmlConvertMaxToUTF8(text).toCString() because that's what prepares the string for parsing.


Brucey(Posted 2008) [#5]
Where did your xml come from? Is it possible you have a BOM in the first few bytes?

Those bytes (if it is a BOM) would be: EF BB BF


Brucey(Posted 2008) [#6]
Could be an issue related to my conversion of the string from Max 16-bit format to UTF-8, which mangles the BOM.
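
To illustrate how such a conversion could mangle a BOM: if the raw bytes EF BB BF end up in a Max string as one character per byte, each is a character code above 127, and a UTF-8 encoder writes each of those as two bytes. The sketch below is not the module's actual conversion code; HexByte and EncodeUTF8Hex are hypothetical helpers written only for this illustration, and the encoder only handles character codes below $800.

```blitzmax
SuperStrict

Framework brl.standardio
Import brl.retro

' Hypothetical helper: two-digit hex for a byte value (Hex returns 8 digits).
Function HexByte:String(value:Int)
	Return Hex(value)[6..]
End Function

' Hypothetical helper: UTF-8 bytes, shown as hex, for character codes below $800.
Function EncodeUTF8Hex:String(text:String)
	Local result:String
	For Local i:Int = 0 Until text.Length
		Local c:Int = text[i]
		If c < $80 Then
			' Plain ASCII stays a single byte.
			result :+ HexByte(c) + " "
		Else
			' Two-byte form: 110xxxxx 10xxxxxx
			result :+ HexByte($C0 | (c Shr 6)) + " " + HexByte($80 | (c & $3F)) + " "
		End If
	Next
	Return result.Trim()
End Function

' The three BOM bytes, held as three separate characters.
Print EncodeUTF8Hex(Chr(239) + Chr(187) + Chr(191))
```

This should print C3 AF C2 BB C2 BF: six bytes that are no longer a BOM, so the parser sees them as junk in front of the first '<'.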


Brucey(Posted 2008) [#7]
Snap :-p

SuperStrict

Framework bah.libxml
Import brl.standardio

Const BOM_UTF8:String = Chr(239) + Chr(187) + Chr(191)

Local s:String = "<?xml version=~q1.0~q?><root/>"

Local doc:TxmlDoc = TxmlDoc.ParseDoc(BOM_UTF8 + s)

...
Entity: line 1: parser error : Start tag expected, '<' not found
<?xml version="1.0"?><root/>



Brucey(Posted 2008) [#8]
Fix committed (rev 436)

Apologies for the delay in providing a fix...

btw, TxmlDoc.parseFile() supports "Incbin::....." filenames, if you want to skip the loading into a string first. This should also handle the BOM bytes, as libxml tests for those already.

The issue above was because of the byte-conversion I'm performing on the string to get it into UTF-8.
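
For anyone wanting to try the parseFile() route mentioned above, a minimal sketch might look like the following. The file name settings.xml is a placeholder, the file has to exist next to the source at compile time, and the lowercase "incbin::" prefix is an assumption based on the usual BlitzMax convention.

```blitzmax
SuperStrict

Framework bah.libxml
Import brl.standardio

' "settings.xml" is a placeholder name; it is embedded into the executable at compile time.
Incbin "settings.xml"

' Parse straight from the embedded file, without loading it into a string first.
Local doc:TxmlDoc = TxmlDoc.parseFile("incbin::settings.xml")

If doc Then
	Print "Parsed, root element: " + doc.getRootElement().getName()
	doc.free()
Else
	Print "Parse failed"
End If
```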


Gabriel(Posted 2008) [#9]
Thanks for that, Brucey. I'm actually reading the XML from an encrypted archive, so I stream it into a string, and parse it from there.
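
For anyone stuck on a version from before rev 436 and parsing from a string like this, one possible workaround is to strip the BOM before calling ParseDoc(). This is only a sketch: StripUTF8BOM is a hypothetical helper, not part of the module, and it assumes the data was read into the string one byte per character, so the BOM shows up as character codes 239, 187, 191.

```blitzmax
SuperStrict

Framework bah.libxml
Import brl.standardio

' Hypothetical helper: drop a leading EF BB BF if the string starts with it.
Function StripUTF8BOM:String(text:String)
	If text.Length >= 3 And text[0] = 239 And text[1] = 187 And text[2] = 191 Then
		Return text[3..]
	End If
	Return text
End Function

' Stand-in for text streamed out of the archive.
Local xml:String = Chr(239) + Chr(187) + Chr(191) + "<?xml version=~q1.0~q?><root/>"

Local doc:TxmlDoc = TxmlDoc.ParseDoc(StripUTF8BOM(xml))
If doc Then Print "Parsed OK"
```

Strings without a BOM pass through unchanged, so the helper can be applied to everything that comes out of the archive.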