BlitzMax And Unicode IO - Question.

Brucey(Posted 2009) [#1]
I'm just curious, but how many people are actually interested in Unicode IO support in BlitzMax, other than siread?

Do you think it doesn't matter at all, or is it like, a prerequisite for a programming language these days?


Just to clarify... by Unicode IO, I mean read and write access on a non-ASCII filesystem, where file and folder names contain characters with codes above 255 (or perhaps even above 127).
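To see why characters above 255 (or even above 127) break byte-oriented file IO, here is a minimal Python sketch. `narrow()` is a hypothetical stand-in for an IO layer that keeps only the low byte of each character; UTF-8, which Linux and macOS typically use for filenames, round-trips the same name losslessly.

```python
def narrow(s: str) -> bytes:
    """Hypothetical lossy 8-bit conversion: keep only the low byte of each character."""
    return bytes(ord(c) & 0xFF for c in s)

def widen(b: bytes) -> str:
    """Inverse of narrow(): turn each byte back into one character."""
    return "".join(chr(x) for x in b)

name = "Документы/セーブ.dat"  # hypothetical folder and file name

print(widen(narrow(name)) == name)                   # False: characters > 255 are mangled
print(name.encode("utf-8").decode("utf-8") == name)  # True: UTF-8 round-trips

# Even characters 128-255 differ: Latin-1 stores one byte, UTF-8 stores two.
print("é".encode("latin-1"))  # b'\xe9'
print("é".encode("utf-8"))    # b'\xc3\xa9'
```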


rs22(Posted 2009) [#2]
I think it would be a very useful thing to have. In my opinion, all languages should have Unicode IO support.


jkrankie(Posted 2009) [#3]
I'd like it. I've had a number of issues raised by customers running foreign-language operating systems.

Cheers
Charlie


ziggy(Posted 2009) [#4]
This is a must-have. Absolutely. The most spoken languages in the world are Chinese, then English, then Spanish. There are really few languages that can be covered without Unicode, and modern OSes give users a lot of freedom to name their files as they wish. Only an English speaker could think this is something to discuss... ;)


Tachyon(Posted 2009) [#5]
Yep, Unicode should be supported. I sell games in Europe and have had numerous complaints about foreign characters in usernames in relation to your volumes.mod. Also, with Book II I want to have Unicode text files for all the game data so that fan-made translations can be done.


plash(Posted 2009) [#6]
I'm all for it.

Hopefully we can get Mark to make the changes to Max's core official, along with all the other stuff you've modified.


Gabriel(Posted 2009) [#7]
Since you asked, I'd have to say that I think it's a prerequisite for a programming language these days. In fact, those are pretty much the exact words I would have used.


siread(Posted 2009) [#8]
Very excited about this! I have a bunch of fans translating NSS4, and Unicode support would be a godsend. :)


xlsior(Posted 2009) [#9]
I'm just curious, but how many people are actually interested in Unicode IO support in BlitzMax, other than siread?


It's pretty much a necessity. Without it, your app would blow up as soon as a user with an extended character in his username tried to save something to his Documents folder, even aside from the likely existence of other folder names with extended characters.


degac(Posted 2009) [#10]

I'm just curious, but how many people are actually interested in Unicode IO support in BlitzMax, other than siread?


I think everyone wants to use BlitzMax as a 'pro' language (*). After multithreading (beta, but in development), Unicode is another great addition.


* = despite BRL's/Mark's original plans, I think BlitzMax can do more than just 2D games... BlitzMax has the potential.

PS: at this point I think we could rename BlitzMax 'Brucey-Max': with the changes to BMK and support for so many modules, we have a fairly complete (multi)platform development language.


Jim Teeuwen(Posted 2009) [#11]
It should really be a standard in all modern programming platforms.
There are just too many situations where not having it can cause issues.

vote: +1


ImaginaryHuman(Posted 2009) [#12]
What is unicode? What's it for? Why would we want it? What are the issues of not using it?


Jim Teeuwen(Posted 2009) [#13]
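Unicode is a standard that gives every character a number (a "code point"), and it is stored using multi-byte character encodings.
The English language can mostly be represented with plain ASCII characters, meaning a single byte per character (ASCII itself only defines values 0-127; 8-bit code pages stretch that to 0-255).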

However, the vast majority of languages on this planet use characters that are not part of the ASCII standard. The most obvious ones are the Asian languages like Chinese and Japanese, but many European languages also use accented or other non-ASCII letters.

In order to accurately store and represent their text, they need more than 1 byte per character. This is where Unicode comes in: its encodings can spend multiple bytes on a single character.

It comes in various flavours. The most common is UTF-8, a variable-length encoding that uses either 1 or 2 bytes for a character, depending on what it needs. UTF-16 uses 2 bytes for every character in the Basic Multilingual Plane (U+0000 to U+FFFF) and 4 bytes, a so-called surrogate pair, for anything beyond that. Then there is the rarely used UTF-32, which always uses 4 bytes per character.

UTF-8 is the more common choice when saving text files (XML documents, for instance), because for mostly-ASCII text it keeps files smaller than UTF-16: UTF-8 uses 1 byte per character where possible, whereas UTF-16 always uses at least 2 bytes.
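A quick way to compare the encodings is to count bytes. This minimal Python sketch (the helper name is hypothetical) uses Python's built-in codecs, and the output shows that UTF-8 actually uses up to 4 bytes per character. It also shows that UTF-8 is not always smaller: CJK characters take 3 bytes in UTF-8 but 2 in UTF-16.

```python
def encoded_sizes(text: str) -> tuple[int, int, int]:
    """Return the byte length of `text` in UTF-8, UTF-16 and UTF-32.
    The -le codecs are used so Python does not prepend a byte-order mark."""
    return (
        len(text.encode("utf-8")),
        len(text.encode("utf-16-le")),
        len(text.encode("utf-32-le")),
    )

for ch in ["A", "é", "中", "😀"]:
    u8, u16, u32 = encoded_sizes(ch)
    print(f"U+{ord(ch):04X}: UTF-8={u8} UTF-16={u16} UTF-32={u32}")

# Mostly-ASCII markup is smaller in UTF-8...
print(encoded_sizes("<name>Zoë</name>"))  # (17, 32, 64)
# ...but CJK text is smaller in UTF-16.
print(encoded_sizes("中文文本"))  # (12, 8, 16)
```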

BlitzMax uses Unicode for all its string data by default.
So when you declare a string in BlitzMax, you basically declare an array of 2-byte characters. A single char in BlitzMax is 2 bytes in size, not 1.
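To see what those 2-byte characters hold, here is a minimal Python sketch (Python stands in for BlitzMax) that splits a string into 16-bit UTF-16 code units, assuming a BlitzMax string stores exactly these units. Characters up to U+FFFF fit in one unit; a character above that, such as an emoji, needs two.

```python
def utf16_units(text: str) -> list[int]:
    """Return the 16-bit UTF-16 code units for `text`, i.e. what an array of
    2-byte characters would hold (assumed to match a BlitzMax string)."""
    data = text.encode("utf-16-le")  # little-endian, no byte-order mark
    return [int.from_bytes(data[i:i + 2], "little") for i in range(0, len(data), 2)]

print([hex(u) for u in utf16_units("Hé中")])  # ['0x48', '0xe9', '0x4e2d']
print([hex(u) for u in utf16_units("😀")])    # ['0xd83d', '0xde00']
```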

As for why you would want it? Well, strictly speaking you don't have to. But if you don't, you are automatically barring a /lot/ of people from using your applications properly. Many people will want your app translated into their own language, and you can bet that only a tiny fraction of them speak a language that needs nothing beyond ASCII. In this day and age, when the entire world has become so very small, proper Unicode support really is not a luxury option anymore.

A practical example is a simple high-score table, where players enter their name along with their score. People will want to use their established handles, which more often than not contain funky and exotic characters that do not go over well in a strictly ASCII-oriented environment.

Even if you do not translate the app itself, it will likely be installed on a filesystem that has non-ASCII characters in its file and directory paths. If your app tries to load resource files from such a system without being able to handle these Unicode paths, it is going to fail miserably and you are going to get a lot of bad press from users :)
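As a minimal sketch of handling such paths, assuming a Python 3 runtime whose filesystem encoding can represent these characters (the usual case on current Windows, macOS and Linux), this writes a high-score line to a file inside a non-ASCII folder and reads it back. The folder name, file name and helper are hypothetical; the point is that both the path and the file contents go through Unicode-aware APIs, with the file encoding set to UTF-8 explicitly.

```python
import os
import tempfile

def save_and_load_score(folder_name: str, player: str, score: int) -> str:
    """Write a one-line score file inside a (possibly non-ASCII) folder and read it back."""
    with tempfile.TemporaryDirectory() as base:
        folder = os.path.join(base, folder_name)
        os.makedirs(folder)
        path = os.path.join(folder, "highscores.txt")
        # Encode the contents explicitly as UTF-8 rather than relying on the locale.
        with open(path, "w", encoding="utf-8") as f:
            f.write(f"{player}\t{score}\n")
        with open(path, encoding="utf-8") as f:
            return f.read()

print(save_and_load_score("Données_ユーザー", "Zoë", 1200), end="")
```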


Brucey(Posted 2009) [#14]
Actually, UTF-8 can use up to 4 bytes per character (apparently. I thought it was 3, but Wikipedia knows better) :-)

Both Mac and Linux support UTF-8 encoded file-systems.


(if it is 4 bytes, then my conversion code is borked.. ho hum)
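The four UTF-8 length classes can be sketched as a small encoder. This is a minimal Python sketch (hypothetical function name, not Brucey's conversion code) that picks the byte count from the size of the code point and checks itself against Python's built-in encoder:

```python
def utf8_encode(cp: int) -> bytes:
    """Encode a single Unicode code point as UTF-8 (1 to 4 bytes)."""
    if cp < 0 or cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError(f"not a valid Unicode scalar value: {cp:#x}")
    if cp < 0x80:     # up to 7 bits  -> 0xxxxxxx
        return bytes([cp])
    if cp < 0x800:    # up to 11 bits -> 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 0x10000:  # up to 16 bits -> 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (cp >> 12), 0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    # up to 21 bits (code points above U+FFFF) -> 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([0xF0 | (cp >> 18), 0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F), 0x80 | (cp & 0x3F)])

for cp in (0x41, 0xE9, 0x4E2D, 0x1F600):
    encoded = utf8_encode(cp)
    assert encoded == chr(cp).encode("utf-8")  # matches Python's own encoder
    print(f"U+{cp:04X} -> {len(encoded)} byte(s): {encoded.hex()}")
```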


Jim Teeuwen(Posted 2009) [#15]
Ah, well we learn something new every day :)


Brucey(Posted 2009) [#16]
A nice article by Joel on Unicode :-)


ImaginaryHuman(Posted 2009) [#17]
Thanks for the great explanation, JimT!


Jim Teeuwen(Posted 2009) [#18]
That's an interesting read, Brucey. Thanks.


Brucey(Posted 2009) [#19]
After ImaginaryHuman's question about what Unicode is, I've been reading a *lot* more about it.

It appears that my UTF-8 -> BBString (and back) conversion routines won't handle characters outside the first "plane" (the Basic Multilingual Plane), that is, anything above U+FFFF (65,535).
I'm working on a fix, but as it's so unlikely anyone is going to use characters from the other planes, I'm thinking you'd never realise it wasn't fully compliant anyway. :-)
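Handling the other planes in a string of 16-bit characters comes down to surrogate pairs: a code point above U+FFFF is stored as two 16-bit units. A minimal Python sketch of that split and the reverse join (the function names are hypothetical, and this is not Brucey's BBString code):

```python
def to_utf16_units(cp: int) -> list[int]:
    """Split one code point into 16-bit units; above U+FFFF use a surrogate pair."""
    if cp < 0x10000:
        return [cp]
    v = cp - 0x10000  # 20-bit value, split into two 10-bit halves
    return [0xD800 | (v >> 10), 0xDC00 | (v & 0x3FF)]

def from_utf16_units(units: list[int]) -> list[int]:
    """Rebuild code points from 16-bit units, joining surrogate pairs."""
    cps, i = [], 0
    while i < len(units):
        u = units[i]
        if 0xD800 <= u <= 0xDBFF and i + 1 < len(units) and 0xDC00 <= units[i + 1] <= 0xDFFF:
            cps.append(0x10000 + ((u - 0xD800) << 10) + (units[i + 1] - 0xDC00))
            i += 2
        else:
            cps.append(u)  # BMP character (an unpaired surrogate is passed through as-is)
            i += 1
    return cps

text = "A中😀"
units = [u for ch in text for u in to_utf16_units(ord(ch))]
print([hex(u) for u in units])  # ['0x41', '0x4e2d', '0xd83d', '0xde00']
assert "".join(map(chr, from_utf16_units(units))) == text
```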