My Korean XML cannot be parsed.
| ||
I have this structure in my level files:
<Hotspot>
  <PosX>358</PosX>
  <PosY>306</PosY>
  <Width>121</Width>
  <Height>242</Height>
  <Text>L ڮչ֩; ҨӾ յѸС ȊࠤȘࠤ.</Text>
  <Sound>null</Sound>
  <Type>1</Type>
  <LogicID>0</LogicID>
  <MoveTo>null</MoveTo>
  <MoveToMinigame>null</MoveToMinigame>
  <NavIcon>null</NavIcon>
</Hotspot>
The only special characters are in the <Text> element (even though the forum shows them here as garbled codes), but when I load the file with the bah.xml module I get the error "Document not parsed successfully." I assume it's about the encoding or something like that, but I have no experience with this kind of thing.
| ||
Try putting this at the top: <?xml version="1.0" encoding="utf-8"?>
[edit] By "at the top", I mean right at the very top of your XML file: the first line, before anything else.
| ||
I used the second one, with the ? at the end... still the same error.
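Why the declaration alone does not help: the encoding attribute only tells the parser how to read the bytes already in the file; it does not convert them. A minimal Python sketch, using the standard library's ElementTree purely as a stand-in for libxml, feeds it the byte sequence 0xB0 0xD4 0xC0 0xCC that libxml reports further down in this thread, first under a UTF-8 declaration and then after re-encoding the same text as real UTF-8. Treating those bytes as the Korean codepage CP949 is an assumption, and `parses` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Raw bytes as libxml reported them (a legacy Korean codepage, not UTF-8).
legacy_bytes = b"\xb0\xd4\xc0\xcc"
doc = b'<?xml version="1.0" encoding="utf-8"?><Name>' + legacy_bytes + b"</Name>"

def parses(data: bytes) -> bool:
    """Return True if ElementTree accepts the document, False on a parse error."""
    try:
        ET.fromstring(data)
        return True
    except ET.ParseError:
        return False

print(parses(doc))  # False: the declaration does not convert the bytes

# Assumption: the bytes are CP949. Decode them and store the same text as real UTF-8.
utf8_bytes = legacy_bytes.decode("cp949").encode("utf-8")
fixed = doc.replace(legacy_bytes, utf8_bytes)
print(parses(fixed))  # True: same declaration, but now the bytes match it
```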
| ||
Hmm... you might try UTF-16. I don't think I've ever done Korean, but I've done Japanese, and I'm sure I used UTF-8. :/
| ||
No luck:
levels/level0.xml:5: parser error : Input is not proper UTF-8, indicate encoding !
Bytes: 0xB0 0xD4 0xC0 0xCC
<Name>°ÔŔĚĆ®</Name>
^
<Name> is the first tag with special characters in my file. I also tried UTF-16, but the issue persists, and the debug console still reports the file as UTF-8.
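libxml's message points at the first byte sequence that is not valid UTF-8. A minimal Python sketch that performs the same check on the raw file contents and reports the line, the byte offset and the first few offending bytes; `first_invalid_utf8` is a hypothetical helper, and the line count assumes `\n` line endings (which also covers `\r\n`).

```python
def first_invalid_utf8(data: bytes):
    """Return (line, offset, offending bytes) for the first invalid UTF-8 sequence, or None."""
    try:
        data.decode("utf-8")  # strict decoding stops at the first bad byte
        return None
    except UnicodeDecodeError as err:
        line = data.count(b"\n", 0, err.start) + 1  # 1-based line number, like libxml
        return line, err.start, data[err.start:err.start + 4]

print(first_invalid_utf8(b"<Name>\xb0\xd4\xc0\xcc</Name>"))  # (1, 6, b'\xb0\xd4\xc0\xcc')
print(first_invalid_utf8("<Name>ok</Name>".encode("utf-8")))  # None
```

To check a level file, pass it the raw bytes, e.g. `first_invalid_utf8(open("levels/level0.xml", "rb").read())`.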
| ||
Bit stumped then! If nobody's solved it by tomorrow, I'll have a look at what I did for my Japanese stuff. |
| ||
I appreciate it. I really need help solving this one. Thanks!
| ||
What exactly does not work? I just took the content of <text>...</text> and placed it in my database.xml (replacing some other titles). Even the console output was nearly correct; only the last few characters kept showing up as unidentified characters. And this is what you are doing wrong: libxml says that you did not encode properly, and hence you did not encode properly. All non-ASCII characters must be encoded (this is done with &#CODENUMBER;). Do not forget to encode the "&" if it is used as a normal character ("derron is dumb & lazy" must be converted to "derron is dumb &amp; lazy").
bye Ron
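Doing the &#CODENUMBER; conversion by hand is impractical for a whole game, but it can be automated. A minimal Python sketch, assuming the text is already available as a proper Unicode string: it first escapes the XML-reserved characters with the standard library's `xml.sax.saxutils.escape`, then lets the ascii codec's `"xmlcharrefreplace"` error handler write every remaining non-ASCII character as a decimal character reference. The name `to_char_refs` is hypothetical.

```python
from xml.sax.saxutils import escape

def to_char_refs(text: str) -> str:
    """Escape XML-reserved characters, then write every non-ASCII character as &#NNNN;."""
    escaped = escape(text)  # &, <, > become &amp;, &lt;, &gt;
    # xmlcharrefreplace turns each character the ascii codec cannot encode into &#code;
    return escaped.encode("ascii", "xmlcharrefreplace").decode("ascii")

print(to_char_refs("derron is dumb & lazy"))  # derron is dumb &amp; lazy
print(to_char_refs("환영"))                   # &#54872;&#50689;
```

Because the result is plain ASCII, it reads the same whether the file is later saved as ASCII, ISO-8859-1 or UTF-8.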
| ||
Hi Derron, the actual XML does not look like the text you see in the forum. When I try to load the XML using the default "parseDoc" method, I get:
levels/level0.xml:5: parser error : Input is not proper UTF-8, indicate encoding !
Bytes: 0xB0 0xD4 0xC0 0xCC
<Name>°ÔŔĚĆ®</Name>
| ||
Wait, you're saying I must encode every special character? How should I do that in Korean?
| ||
Are you sure the XML document is saved as UTF-8, and ALSO that the BOM is stored in the XML file? http://en.wikipedia.org/wiki/Byte_order_mark
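The BOM is not something typed into the XML; it is an invisible marker of three bytes (EF BB BF) at the very start of the file, before the `<?xml` declaration. A minimal Python sketch for checking and adding it on the raw file bytes; the helper names are hypothetical, and whether BlitzMax/libxml actually needs the BOM is taken from this thread, not verified here.

```python
import codecs

# The UTF-8 byte order mark is the three bytes EF BB BF (codecs.BOM_UTF8).
def has_utf8_bom(data: bytes) -> bool:
    """True if the raw file contents start with a UTF-8 BOM."""
    return data.startswith(codecs.BOM_UTF8)

def add_utf8_bom(data: bytes) -> bytes:
    """Prepend a UTF-8 BOM unless one is already there."""
    return data if has_utf8_bom(data) else codecs.BOM_UTF8 + data

content = '<?xml version="1.0" encoding="utf-8"?><Text>환영</Text>'.encode("utf-8")
with_bom = add_utf8_bom(content)
print(with_bom[:3].hex())  # efbbbf: the BOM sits before the "<?xml" declaration
print(with_bom.decode("utf-8-sig") == content.decode("utf-8"))  # True: utf-8-sig strips it
```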
| ||
Notepad++ shows the document as "Encode in UTF-8". As for the BOM, I can't understand exactly what it's all about. Where should those characters be? :|
| ||
Guys, I just made a quick test. I pasted some text from Google Translate (환영) into my XML, and it shows up as actual Korean characters. The translation I received still looks like it does in my screenshots, but the XML with the characters from Google Translate works fine.
| ||
It's me again. I solved the issue. Steps:
1. Open the XML from my partners.
2. In Notepad++, change the character set to Korean (at this point all those black characters are gone and you can see proper Korean characters).
3. In Notepad++, choose "Encode to UTF-8".
4. Save, and it works.
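The Notepad++ steps above amount to "decode the bytes as the Korean codepage, then re-encode them as UTF-8". A minimal Python sketch of the same conversion for batch use; it assumes the partner files are in CP949 (the Windows Korean codepage, a superset of EUC-KR), and `convert_to_utf8` is a hypothetical helper name.

```python
from pathlib import Path

def convert_to_utf8(src: Path, dst: Path, source_encoding: str = "cp949") -> None:
    """Read src as Korean-codepage text and write the same text to dst as UTF-8."""
    # Like Notepad++ step 2: interpret the raw bytes as Korean (CP949 assumed).
    text = src.read_bytes().decode(source_encoding)
    # Like steps 3-4: store the same characters as UTF-8 and save.
    dst.write_bytes(text.encode("utf-8"))
```

For example, `convert_to_utf8(Path("levels/level0.xml"), Path("levels/level0_utf8.xml"))`, where the output name is only an example. If a converted file carries an XML declaration naming another encoding (say `encoding="EUC-KR"`), change it to utf-8 as well, since the bytes no longer match it.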
| ||
That is one of the issues with UTF-8 and BlitzMax. By default, MaxIDE saves pages that are not ISO-8859-1 as UTF-16LE. For "normal" characters (Western European, most non-Asian languages) UTF-8 is enough, but BlitzMax needs the BOM indicator to load it correctly. If you use characters like "äöü" (the German umlauts), you may run into problems, as they can also be found in the extended ASCII tables, so they get converted incorrectly and you end up with garbled text. To get rid of that trouble (at least in libxml), the HTML-entity style encoding (&#code;) should work. Keep in mind that the special characters still must be escaped properly: & (&amp;), ' (&apos;), " (&quot;), < (&lt;) and > (&gt;). That is why escaping the data when writing it out is the way to create valid XML files.
bye Ron
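The escaping Ron describes can be done with Python's standard `xml.sax.saxutils.escape`, which always handles &, < and >; an extra mapping adds the two quote characters so the output is also safe inside attribute values. A minimal sketch; `escape_xml` is a hypothetical wrapper name.

```python
from xml.sax.saxutils import escape

# escape() always replaces & < >; this mapping adds the two quote characters.
QUOTES = {'"': "&quot;", "'": "&apos;"}

def escape_xml(text: str) -> str:
    """Replace the five XML-reserved characters with their predefined entities."""
    return escape(text, QUOTES)

print(escape_xml("derron is dumb & lazy"))  # derron is dumb &amp; lazy
print(escape_xml("<a href='x'>\"hi\"</a>"))  # &lt;a href=&apos;x&apos;&gt;&quot;hi&quot;&lt;/a&gt;
```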
| ||
It's a problem when getting third parties to do translations: if they are using a localized version of Windows, they are more likely to be typing in their local codepage rather than in Unicode (UTF-8/16/etc.). So, as you've found, you need to assume the file you receive is in the local codepage and convert it as appropriate (say, into UTF-8). On other platforms that properly support Unicode natively (that would be Linux and OS X), you shouldn't have this problem.
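One way to follow that advice without double-converting files that are already fine: try strict UTF-8 first and fall back to the local codepage only if decoding fails. A minimal Python sketch; the CP949 fallback assumes Korean translators on Windows, `decode_translation` is a hypothetical name, and the check is only a heuristic, since a short legacy-encoded text can occasionally happen to be valid UTF-8.

```python
def decode_translation(data: bytes, fallback: str = "cp949") -> str:
    """Decode a translator's file, preferring UTF-8 and falling back to a legacy codepage."""
    try:
        # Strict UTF-8; "utf-8-sig" also drops a leading BOM if there is one.
        return data.decode("utf-8-sig")
    except UnicodeDecodeError:
        # Not valid UTF-8: assume the translator's local Windows codepage (heuristic).
        return data.decode(fallback)

text = decode_translation(b"<Name>\xb0\xd4\xc0\xcc</Name>")  # falls back to CP949
utf8_bytes = text.encode("utf-8")  # ready to be written out as a UTF-8 file
```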
| ||
@Derron: well, I cannot write an &# code for every character I need in my game... it would take years :)) @Brucey: I will convert the files this way, since it's the only solution for now. Thanks, all, for supporting me!
| ||
You still have to convert all of & ' < > and so on, or else you run into trouble, as they are reserved XML characters. bye Ron
| ||
The other alternative is to use a "standard" localisation format (like .po), for which editors are available.
| ||
That "standard localisation format" is useful for GUIs, but imagine a database containing some thousands of localized entries... I doubt .po and its editors are helpful then. It depends on the source of the textual data. bye Ron