Double byte chars

Jazzyhat

(Posted 2003) [#1]

I cannot draw double-byte characters (Chinese Big-5 encoding) on screen properly. Is this a limitation of Blitz?

cyberseth

(Posted 2003) [#2]

Use a bitmap font instead. The text command is very slow in comparison.

mearrin69

(Posted 2003) [#3]

While I agree with this in general, cyberseth, it's going to be a real pain to make a double-byte interpreter (to bitmap font) for whatever language you're interested in.

For instance, I've written my own ASCII to bitmap interpreter for use in my apps. It simply uses an array to equate ASCII code to the appropriate location in an animstrip of letters (and I've got a version that includes kerning info, etc. too).
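A minimal sketch of the kind of lookup described above, written in Python for illustration rather than Blitz. The strip layout (one fixed-size frame per printable ASCII character, starting at space) and the names `frame_for_code` and `layout_text` are assumptions, not the poster's actual code.

```python
# Hypothetical strip layout: one frame per printable ASCII character,
# all frames the same size, stored left to right in a single image.
FIRST_CODE = 32             # space
LAST_CODE = 126             # tilde
FRAME_W, FRAME_H = 16, 16   # assumed glyph cell size in pixels

# Lookup table: ASCII code -> frame index, -1 for characters not in the strip.
frame_for_code = [-1] * 256
for code in range(FIRST_CODE, LAST_CODE + 1):
    frame_for_code[code] = code - FIRST_CODE


def layout_text(text, x, y):
    """Return a (frame, draw_x, draw_y) tuple for each drawable character."""
    placements = []
    for ch in text:
        code = ord(ch)
        frame = frame_for_code[code] if code < 256 else -1
        if frame >= 0:
            placements.append((frame, x, y))
        x += FRAME_W  # fixed advance; a kerning table would adjust this step
    return placements
```

Each tuple would then be handed to whatever draws one frame of the strip at a given position. The fixed advance is the simplest choice; a version with kerning would replace `FRAME_W` with a per-character width lookup.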

This works great for English and other roman-alphabet languages but it would get messier when you want to do extended character sets.
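To give a rough idea of where the extra mess comes from, here is a sketch (in Python, for illustration) of splitting a Big-5 byte stream into character codes before any glyph lookup can happen. It assumes every byte in the range 0x81 to 0xFE is a lead byte that consumes the following byte; the function names and the glyph table are hypothetical.

```python
def split_big5(data):
    """Split Big-5 encoded bytes into a list of integer character codes.

    Assumption: any byte in 0x81..0xFE is a lead byte and consumes the byte
    after it; every other byte is a single-byte (ASCII) character.
    """
    codes = []
    i = 0
    while i < len(data):
        b = data[i]
        if 0x81 <= b <= 0xFE and i + 1 < len(data):
            # Combine lead and trail byte into one 16-bit code.
            codes.append((b << 8) | data[i + 1])
            i += 2
        else:
            codes.append(b)
            i += 1
    return codes


def glyph_index(code, table):
    """Look up a glyph frame for a code.

    `table` is a hypothetical dict (code -> frame) built when the font image
    is made. Returns -1 if the font has no glyph for this code.
    """
    return table.get(code, -1)
```

Unlike the ASCII case, the codes are sparse 16-bit values, so a dictionary (or a sorted table with a search) replaces the simple 256-entry array.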

Has anyone done some kind of double-byte work like this in Blitz before? I'd be interested to see what you've come up with.
M

Jazzyhat

(Posted 2003) [#4]

Thanks !

But you may not know that the Chinese character set contains at least 13400 chars....

But bitmap font seems to be the only way !

Jazzyhat

mearrin69

(Posted 2003) [#5]

Jazzyhat,
Ever hear of STC? Standard Telegraphic Code. It puts characters in a grid referenced by four digits. The first two are the 'page' and the second two are the row and column.

I'd maybe try to use that as an easy (hah) way to reference the character set. It's not the complete set of characters but it's enough to do most things and you could replace some of the entries with your own if you wanted. I think there are some blank cells.
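As a sketch of how such a grid reference could drive a bitmap font, the Python below splits a four-digit code into page, row and column exactly as described in this post, then turns it into a frame number. The 10x10 page layout and the function names are assumptions based only on that description, not on the code book itself.

```python
def stc_cell(code):
    """Split a four-digit code string into (page, row, column)."""
    if len(code) != 4 or not all(c in "0123456789" for c in code):
        raise ValueError("expected four digits, got %r" % (code,))
    page = int(code[:2])   # first two digits: page
    row = int(code[2])     # third digit: row on that page
    col = int(code[3])     # fourth digit: column on that page
    return page, row, col


def stc_frame(code):
    """Frame index in a font image holding 100 pages of 10x10 glyph cells."""
    page, row, col = stc_cell(code)
    return page * 100 + row * 10 + col
```

Under this layout the frame index is simply the code read as a number, so at most 10000 glyphs can be addressed, and the page/row/column split only matters for laying out the font image.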

You might be able to get it at a Chinese bookstore... I didn't buy mine; I think it's something I got in the Army.
M

Jazzyhat

(Posted 2003) [#6]

Thanks mearrin69 !! Your suggestion is something I had never thought about, but it may be possible !

Jazzyhat

Oldefoxx

(Posted 2003) [#7]

If you want to do effective coding in this area, then you need to consider Unicode. Unicode has not really caught on yet, because it is extremely wasteful of memory, and there is only partial support in the Windows operating system for it at present. But it seems certain to gain support over the long haul, and familiarity with it may serve you a good turn sometime in the future.

The big problem with Unicode is the fact that some language needs are going to be huge, such as with Chinese and Japanese, and there is no way to establish parity between dissimilar languages. Nobody has quite figured out how Unicode is going to work as a result. Is the operating system supposed to "see" a Chinese character in terms of its Unicode two-byte value, then figure out what that should translate to in some other language? And if that is not the goal, then why bother with trying to create a uniform code in the first place? And how does the basic character value maintain its many individual representations if we do not recognize different fonts as well?

Still and all, if this is where your interest lies, then I believe checking out what Unicode is and where it is going would put you in the main stream instead of on the fringe.

mearrin69

(Posted 2003) [#8]

I don't really know anything about Unicode but I do know Chinese - I think that using two bytes to represent the characters is all about getting enough unique ids to represent a useable number of characters.

There are something like 40k characters in the Chinese language...and you need to know about 10% of those for even a reasonable level of communications. You'll add a bunch more when you start talking about a particular topic - such as technology, warfare, plumbing, etc.
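The "enough unique ids" point can be checked with a little arithmetic. This is only an illustrative Python sketch using the character counts quoted in this thread (13400 and roughly 40k); the function names are made up for the example.

```python
def ids_available(num_bytes):
    """Number of distinct values that fit in num_bytes bytes."""
    return 256 ** num_bytes


def bytes_needed(num_chars):
    """Smallest whole number of bytes that gives each character its own id."""
    n = 1
    while ids_available(n) < num_chars:
        n += 1
    return n


def encoded_size(ch, encoding):
    """Bytes used by one character in a given Python codec."""
    return len(ch.encode(encoding))
```

One byte gives 256 ids and two bytes give 65536, so both 13400 and 40000 characters fit in two bytes, while neither fits in one. `encoded_size("\u4e2d", "big5")` and `encoded_size("\u4e2d", "utf-16-le")` both return 2 for that common character.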

As far as any kind of parity for exchanging info between languages - we can't even do that well between English/French/Spanish/German. I wish it were possible with Chinese and the other simpler (alphabet wise...I know, I know. Japanese has more complicated grammar than Chinese) Asian languages, but I doubt it'll be doable for some time.

I keep thinking about trying to do something in Chinese - hey, a whole bunch of people speak it after all - but whenever I do I just think of the technical junk to work out and shudder.
M

cbmeeks

(Posted 2003) [#9]

lol..that may be a challenge for Chinese.

Assuming you can fit any Chinese char into a 20x20 pixel image:

1 char = 400 bits * 16 bits (16-bit mode) = 6400 bits = 800 bytes.

13400 x 800 bytes = 10720000 bytes = 10468.75 megabytes = 10.223388671875 GIGABYTES.

Assuming you only need 10% of the 13400 characters, then you are looking at a GIGABYTE for the BMP images.

Good luck!

cb

John Pickford

(Posted 2003) [#10]

20 x 20?

Who uses sizes like that?

16 x 16 or 32 x 32

Your sums are wrong:

20x20 = 400 bits = 50 Bytes

13400 x 50 = 670000 Bytes or about 654K

Not a lot.

That's for a one bit font. Probably sufficient in 3D as you can multiply it with any colour or use it to mask a texture. 16bit would be about 10469K or 10 Megs.

10% would be 1 Megabyte of course.
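For what a one-bit font looks like in memory, here is a small Python sketch that packs a glyph into 8 pixels per byte and reads a pixel back. The '#'/'.' input format, the bit order (leftmost pixel in the high bit) and the function names are all assumptions chosen for the example.

```python
def pack_glyph(rows):
    """Pack a glyph given as strings of '#' and '.' into bytes, 8 pixels per byte."""
    bits = [1 if ch == "#" else 0 for row in rows for ch in row]
    packed = bytearray((len(bits) + 7) // 8)
    for i, bit in enumerate(bits):
        if bit:
            packed[i // 8] |= 0x80 >> (i % 8)  # high bit holds the first pixel
    return bytes(packed)


def pixel_set(packed, width, x, y):
    """Return True if pixel (x, y) is set in a glyph that is `width` pixels wide."""
    i = y * width + x
    return bool(packed[i // 8] & (0x80 >> (i % 8)))
```

A 20x20 glyph packed this way takes exactly 50 bytes, matching the figure above. The set pixels would then act as a mask: draw the chosen colour where `pixel_set` is true and nothing elsewhere.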

WolRon

(Posted 2003) [#11]

13400 x 800 bytes = 10720000 bytes = 10468.75 megabytes = 10.223388671875 GIGABYTES

Sorry but it's 10468 KILOBytes and 10.2 MEGABytes

cbmeeks

(Posted 2003) [#12]

You are correct..duh..that is KILObytes.

However, using 16-bit graphics, my original calculations are correct.

20x20 pixels = 400 pixels * 16 bits per pixel = 6400 BITS. 6400 / 8 = 800 bytes.

13400 characters at 800 bytes (once again, 16 bit mode) = 10720000 BYTES

10720000 Bytes = 10468.75 KILOBYTES
10468.75 Kilobytes = 10.223388671875 MEGABYTES

10% would roughly be 1 MEGABYTE. That is acceptable.
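The sums in this thread can be reproduced with a few lines of Python. This is just a sketch of the arithmetic, with a made-up function name, using 1024 bytes per kilobyte as the posts above do.

```python
def font_bytes(width, height, bits_per_pixel, num_chars):
    """Total bytes for num_chars uncompressed glyphs of the given size and depth."""
    bits_per_glyph = width * height * bits_per_pixel
    bytes_per_glyph = (bits_per_glyph + 7) // 8  # round up to whole bytes
    return bytes_per_glyph * num_chars


total_16bit = font_bytes(20, 20, 16, 13400)  # 10720000 bytes
total_1bit = font_bytes(20, 20, 1, 13400)    # 670000 bytes

kilobytes_16bit = total_16bit / 1024         # 10468.75
megabytes_16bit = kilobytes_16bit / 1024     # a little over 10.2
```

Changing the arguments shows how quickly the total moves: 16x16 glyphs at one bit per pixel come to 32 bytes each, so the same 13400 characters need 428800 bytes.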

Sorry...but I did that at work really fast...I'm really not that bad with numbers!

cb

Oldefoxx

(Posted 2003) [#13]

Actually, you probably don't have to worry about how much memory is going to be required for holding bit fonts. The future of memory growth will probably be exponential through a number of technical innovations that are surfacing.

The problem is going to be referencing them. How in the world are we to understand a system where the indexing has to relate to the imagery of a given symbol? I believe that bit maps won't succeed here.

And instead of a 20x20 grid and bitmaps, let's consider a system that responds to font and size issues. I guess you are looking at something like a TrueType object. It describes the general form of the object, one that can be manipulated to produce the equivalent of Bold, Italics, Block form, Cursive, etc. fonts, and one that can be stretched to be of any desired size. But it might need more than that, say the ability to return a meaning that is consistent with the inquiry: a German word or phrase if you query in German, an English word or phrase if in English, a spoken word or phrase on request.

Now you are talking about something that will have a HUGE demand on memory, processing power, and real-time adaptation. Perhaps there will be universal language storage banks that house this stuff, and you can go online to download any language subset that you want.

I think that the limitations of a bit-map approach, as fundamental as it seems at first, become readily apparent when you really get into the project. I think the future will belong to the person who looks beyond what he/she can do most easily and says instead: "Now what we REALLY need is ..." and goes from there.