libxml and German umlauts

Ratchet(Posted 2009) [#1]
Hi,
I have some strange problems reading German umlauts from XML files. It doesn't matter which encoding I use (German or UTF-8). Currently I use <?xml version="1.0" encoding="UTF8" ?> at the top of all my files. Strangely, it works for one file, but not for any of the others.
I'm reading the text with the getText method of the TxmlNode class.

Here is a sample from the working file (depending on your system, the German umlauts may not display correctly):
<?xml version="1.0" encoding="UTF8" ?>
<items>
	<item id="6">
		<name>Parfüm</name>
		<image>parfuem.png</image>
		<script>
			char.say("me","Warum schleppe ich dieses stinkende Zeug eigentlich immer noch mit mir rum?")
		</script>
	</item>
</items>



Sample from a file that doesn't work:
<?xml version="1.0" encoding="UTF8" ?>
<texts>
	<general>
		<text id="1">Many Umlaute: öäüÖÄÜ</text>
	</general>
</texts>


I'm using BM 1.32. Any ideas?
Maybe there is a way to use the HTML entities for umlauts? (&uuml;, &auml; ...)


Brucey(Posted 2009) [#2]
It depends if your file really is UTF-8 or not.

To begin with, the encoding attribute has to say "UTF-8", not "UTF8".
Then you need to ensure that those characters are actually stored in the file as UTF-8. That depends on the editor you use to save the file.

I'm not sure whether you can use HTML entities, but you can use a Unicode character reference, which I think for Ö is &#00D6;

BlitzMax internally uses 2-byte characters, and I think by default saves files as ISO-8859-1.
If you create Strings in BlitzMax and store them with libxml, they will be correct.
If you save a standard file with BlitzMax, it won't be UTF-8.

If your editor supports correct file encoding, it should not be a problem.
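
To illustrate the point about how the file is stored, here is a minimal sketch in Python (not BlitzMax, chosen only because it is easy to run). It shows how the same word is stored under UTF-8 and under ISO-8859-1, and why ISO-8859-1 bytes are rejected when a parser is told the file is UTF-8. The helper name is_valid_utf8 is made up for this example.

```python
# Illustration only: the same umlaut stored under two different encodings.
text = "Parfüm"

utf8_bytes = text.encode("utf-8")         # ü is stored as two bytes: 0xC3 0xBC
latin1_bytes = text.encode("iso-8859-1")  # ü is stored as one byte: 0xFC

print(utf8_bytes)    # b'Parf\xc3\xbcm'
print(latin1_bytes)  # b'Parf\xfcm'


def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8 (hypothetical helper)."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# A file that declares UTF-8 but contains ISO-8859-1 bytes fails this check.
print(is_valid_utf8(utf8_bytes))    # True
print(is_valid_utf8(latin1_bytes))  # False
```

A quick check like this on the raw bytes of a file can show whether the editor really saved it as UTF-8.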


Ratchet(Posted 2009) [#3]
&#00D6; doesn't work.
My XML files are written with TextMate. It supports saving as UTF-8. It's the default anyway.


Brucey(Posted 2009) [#4]
Hmm... it must be the int value, not hex, then... maybe &#214;
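
For reference, XML accepts a numeric character reference in two forms: decimal (&#214;) and hexadecimal with an x prefix (&#xD6;). Hex digits without the x are not well-formed, and HTML names such as &Ouml; are not predefined in XML. The sketch below checks this with Python's standard-library parser rather than libxml, so treat it as an illustration of the XML rules, not of the BlitzMax module. The helper name parse_text is made up for this example.

```python
import xml.etree.ElementTree as ET


def parse_text(xml_source):
    """Return the element text, or None if the XML is not well-formed."""
    try:
        return ET.fromstring(xml_source).text
    except ET.ParseError:
        return None


decimal_ref = parse_text("<text>&#214;</text>")   # decimal code point
hex_ref = parse_text("<text>&#xD6;</text>")       # hex form needs the 'x' prefix
malformed = parse_text("<text>&#00D6;</text>")    # hex digits without 'x'
html_entity = parse_text("<text>&Ouml;</text>")   # HTML name, undeclared in XML

print(decimal_ref, hex_ref, malformed, html_entity)  # Ö Ö None None
```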


Brucey(Posted 2009) [#5]
I'm not sure what your umlaut problem is, exactly.
Can you explain exactly where you are having an issue with the text?

I pasted your example into BBEdit, corrected the encoding attribute, saved it as UTF-8, and loaded it into this:
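SuperStrict

Framework bah.libxml
Import brl.standardio
Import brl.system

' Parse the XML file (saved as UTF-8, with encoding="UTF-8" in the declaration)
Local doc:TxmlDoc = TxmlDoc.parseFile("test.xml")

Local root:TxmlNode = doc.getRootElement()

Local children:TList = root.getChildren()

' Walk the <general> elements under the root <texts> element
For Local general:TxmlNode = EachIn children

	' Output the content of each <text> element
	For Local text:TxmlNode = EachIn general.getChildren()

		Print text.getText()  ' console output in the IDE may look corrupt
		Notify text.getText() ' the dialog displays the umlauts correctly

	Next

Next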

If you are outputting to the console (in the IDE), the Print output will appear corrupt. I don't know why; I guess the console has issues with it.
Notify converts the text properly (to UTF-8) and so displays it fine.
MaxGUI should also be able to display text fine, as well as wxMax.

The graphics modes should be okay too...


Blitzbat(Posted 2009) [#6]
Try wrapping it in CDATA tags. Maybe that will help.


Ratchet(Posted 2009) [#7]
Now I'm using the following codes and it works fine (&#<CODE>;)

Ä 196
Ö 214
Ü 220
ä 228
ö 246
ü 252
ß 223

Example:

&#196;

produces Ä.
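
The codes in the table above are simply the decimal Unicode code points of the characters, so the references can be generated instead of looked up. A minimal sketch in Python (not BlitzMax), assuming the text is available as a normal string. The helper name to_char_refs is made up for this example, and the round trip uses Python's standard-library XML parser, not libxml.

```python
import xml.etree.ElementTree as ET


def to_char_refs(text):
    """Replace every non-ASCII character with a decimal reference (&#N;)."""
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")


original = "Many Umlaute: öäüÖÄÜß"
escaped = to_char_refs(original)
print(escaped)

# Round trip: an XML parser turns the references back into the characters.
node = ET.fromstring('<text id="1">' + escaped + "</text>")
print(node.text == original)
```

Because the escaped text is pure ASCII, it reads the same no matter which encoding the editor saves the file in.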