Unhandled Exception:Unicode character out of UCS-2

Murilo(Posted 2010) [#1]
I'm writing an app that permits the dragging and dropping of files and folders. Everything appeared to be working nicely until I decided to drag the "developer" folder onto my app (to test recursion in large folders). It resulted in the following error:

Unhandled Exception:Unicode character out of UCS-2 range

Surely BlitzMax should be able to handle all OS X filenames? Am I missing something?

Thanks


Brucey(Posted 2010) [#2]
Unhandled Exception:Unicode character out of UCS-2 range

Oooh.. I wondered who'd be the first to see that :-p

But I'm surprised there are Unicode chars above 65k...


Murilo(Posted 2010) [#3]
It looks like it's an issue with a Japanese character:

"H.264(Blu-ray 用).setting"

Is there nothing I can do?


Brucey(Posted 2010) [#4]
Speak to Mark really nicely, perhaps? :-/

The problem lies with BlitzMax's newer string handling code, which better supports UTF-8.
Specifically, Mark added a cutoff so that each real "character" must fit into a single BlitzMax character. A BlitzMax character is UTF-16 or UCS-2, depending on how you look at it.
True UTF-16 would allow multiple UTF-16 entries to represent a single real character.
That means that a character with a code like 65289 should in reality be stored in TWO BlitzMax UTF-16 characters.
As you might imagine, that kind of throws things like LENGTH checks up in the air, because you'd need some UTF-16-savvy code to understand that 5 characters might actually be only 4 real characters.
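
To make the length problem concrete, here is a minimal C sketch that counts real characters in a UTF-16 buffer by treating each surrogate pair as one. It is not BlitzMax runtime code, and `utf16_count_code_points` is a made-up name used only for illustration:

```c
#include <stddef.h>
#include <stdint.h>

/* Count Unicode code points in a UTF-16 buffer of n code units.
   A high surrogate followed by a low surrogate counts as ONE character;
   an unpaired surrogate is simply counted on its own. */
size_t utf16_count_code_points(const uint16_t *units, size_t n)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        int is_high = units[i] >= 0xD800 && units[i] <= 0xDBFF;
        if (is_high && i + 1 < n &&
            units[i + 1] >= 0xDC00 && units[i + 1] <= 0xDFFF) {
            i++;  /* skip the low half of the pair */
        }
        count++;
    }
    return count;
}
```

Fed the five units 0x0041, 0xD83D, 0xDE00, 0x0042, 0x0043 it reports 4, because 0xD83D 0xDE00 together form a single character (U+1F600).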

BlitzMax uses UTF16/UCS-2 because Windows does by default.

Ideally, one might prefer UTF32. That would certainly avoid this particular issue.

As for a "workaround"... the code you'd be interested in is bbStringFromUTF8String() within blitz_string.c / brl.blitz.
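
For illustration only, a UTF-8 decoder with that kind of cutoff might look roughly like the C sketch below. This is NOT the actual bbStringFromUTF8String() source: the function name, the error codes and the buffer-based interface are all assumptions, and it skips checks (overlong forms, encoded surrogates) that a strict decoder would make.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical result codes for this sketch. */
enum { UTF8_OK = 0, UTF8_MALFORMED = -1, UTF8_OUT_OF_UCS2 = -2, UTF8_NO_SPACE = -3 };

/* Decode a NUL-terminated UTF-8 string into 16-bit units, refusing any
   code point above 0xFFFF instead of emitting a surrogate pair. */
int utf8_to_ucs2(const unsigned char *in, uint16_t *out, size_t cap, size_t *len)
{
    size_t n = 0;
    while (*in) {
        uint32_t c = *in++;
        int extra;                               /* continuation bytes expected */
        if (c < 0x80)                { extra = 0; }
        else if ((c & 0xE0) == 0xC0) { c &= 0x1F; extra = 1; }
        else if ((c & 0xF0) == 0xE0) { c &= 0x0F; extra = 2; }
        else if ((c & 0xF8) == 0xF0) { c &= 0x07; extra = 3; }
        else return UTF8_MALFORMED;              /* stray continuation or 0xF8..0xFF */
        while (extra-- > 0) {
            if ((*in & 0xC0) != 0x80) return UTF8_MALFORMED;
            c = (c << 6) | (uint32_t)(*in++ & 0x3F);
        }
        if (c > 0xFFFF) return UTF8_OUT_OF_UCS2; /* the cutoff */
        if (n >= cap) return UTF8_NO_SPACE;
        out[n++] = (uint16_t)c;
    }
    *len = n;
    return UTF8_OK;
}
```

A workaround along these lines would replace the `UTF8_OUT_OF_UCS2` branch with code that writes a surrogate pair, at the cost of string lengths no longer matching the number of real characters.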


marksibly(Posted 2010) [#5]
Hmm...

But 65289 *should* fit into 16 bits.
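
As a quick sanity check, the standard UTF-16 encoding rule can be sketched in C like this (`utf16_encode` is just an illustrative name, not something from the BlitzMax runtime):

```c
#include <stddef.h>
#include <stdint.h>

/* Encode one code point as UTF-16. Returns the number of 16-bit units
   written to out (1 or 2), or 0 if cp is not a valid Unicode scalar value. */
size_t utf16_encode(uint32_t cp, uint16_t out[2])
{
    if (cp >= 0xD800 && cp <= 0xDFFF) return 0;  /* reserved for surrogates */
    if (cp <= 0xFFFF) {                          /* fits in one 16-bit unit */
        out[0] = (uint16_t)cp;
        return 1;
    }
    if (cp <= 0x10FFFF) {                        /* needs a surrogate pair */
        cp -= 0x10000;
        out[0] = (uint16_t)(0xD800 + (cp >> 10));
        out[1] = (uint16_t)(0xDC00 + (cp & 0x3FF));
        return 2;
    }
    return 0;                                    /* beyond the Unicode range */
}
```

With this, 65289 (0xFF09) comes out as a single unit; only code points from 0x10000 upwards need two.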

> "H.264(Blu-ray 用).setting"

Is that HTML encoding?

Does the 用) bit 'mean' a single character or something? From here, it looks like 2 characters that should fit OK into 16-bit UCS-2.

Could be a bug in StringFromUTF8 I guess - wouldn't be the first!


Brucey(Posted 2010) [#6]
Oh... apologies... This is related to this bug for 1.37 :

Using the patch, this works :
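SuperStrict

' 65289 = U+FF09 (fullwidth right parenthesis), which fits in 16 bits
Local s:String = Chr(65289)

Print Asc(s)

' Encode to UTF-8, then read the raw bytes back as one 8-bit char each
Local s2:String = String.FromCString(s.ToUTF8String())

Print s2.length

' Decode those bytes as UTF-8 again; this should round-trip to 65289
Local s3:String = String.FromUTF8String(s2)

Print Asc(s3)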



Murilo(Posted 2010) [#7]
Thanks guys.

@Mark: Yeah - That's HTML encoding. When I pasted the filename into the text area, I could see the Japanese character.

I'm using 1.37 - I'll give the patch a whirl now...


Murilo(Posted 2010) [#8]
That patch worked perfectly! Cheers


Imphenzia(Posted 2010) [#9]
Since upgrading to 1.38 (from 1.36) my Raknet implementation also crashes out with the "Unhandled Exception:Unicode character out of UCS-2" being thrown, presumably in my line:

Local value:String = String.FromUTF8String(Self.rPacket.GetData())

I haven't found the issue yet but I'll keep troubleshooting it.

Update: Found it - I padded my string with CHR(255) which was not appreciated :) Now I just have to figure out why the first Raknet ID byte is messing up my string, I assume it's dropping out of alignment somehow...
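
In case it helps anyone else: bytes 0xC0, 0xC1 and 0xF5 to 0xFF never occur in well-formed UTF-8, so padding with 255 is bound to upset a UTF-8 decoder. A rough C sketch of a pre-check follows; `utf8_first_forbidden_byte` is a hypothetical helper, not part of RakNet or BlitzMax:

```c
#include <stddef.h>

/* Return the index of the first byte that can never occur in well-formed
   UTF-8 (0xC0, 0xC1, 0xF5..0xFF), or -1 if there is none.
   Passing this check does NOT prove the buffer is valid UTF-8. */
long utf8_first_forbidden_byte(const unsigned char *buf, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        unsigned char b = buf[i];
        if (b == 0xC0 || b == 0xC1 || b >= 0xF5)
            return (long)i;
    }
    return -1;
}
```

Running something like this over the packet data before decoding would point straight at the padding bytes (or at a stray ID byte, if it happens to fall in that range).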


degac(Posted 2010) [#10]
Hi, I've found a new problem now!
And this time it's in MaxIDE itself! It doesn't start anymore!

This morning I was testing some basic code (nothing string/UTF related) and MaxIDE worked fine.
Now double-clicking the icon does nothing. I opened a terminal and got the following error:
degac@degac-VirtualBox:~/Scrivania/BlitzMax$ ./MaxIDE
Unicode character out of UCS-2 range


Any hints?
I'm using Ubuntu 10.10 in VirtualBox and I really don't know what happened!
Some 'silent' upgrading in Ubuntu?


degac(Posted 2010) [#11]
I just re-downloaded BlitzMax 1.41 from the site, extracted it to another folder, copied the MOD folder across, and now it works...
Linux mystery...