wxScintilla with utf8 on windows and linux?

UNZ(Posted 2013) [#1]
Hi,

For cross-platform reasons I would like Scintilla to use UTF-8 encoding on all platforms.
But it seems that although UTF-8 is used on Linux, Scintilla uses ANSI encoding on Windows. I tried to find a solution, but GetCodePage() always returns wxSCI_CP_UTF8, and calling SetCodePage() with any value other than wxSCI_CP_UTF8 triggers a failed assertion.

Is this a bug?
What can I do to change the encoding to utf8 on windows?

example:


Save a file "test_in.txt" in UTF-8 encoding with your favourite editor.
The file should of course contain some special characters:
ä ö ß ê

If you run the program and enter the characters again, the output is ANSI-encoded and messed up.

BTW:
Some others and I would prefer Rem...EndRem blocks to be foldable in the BlitzMax lexer. Brucey, could you please change CheckBMFoldPoint() in LexMax.cxx to:



Pete Rigz(Posted 2013) [#2]
I just tested this on Windows 7 and it seems OK to me? The test_in.txt file, which I made with Notepad and which contains those accented characters, was saved exactly the same to test_out.txt.


Derron(Posted 2013) [#3]
As said... "ÄÜÖß" are available in the normal Windows "encoding", so if you create your file on Windows and load/save it there, you won't have problems.

I had problems with files created on Linux (with umlauts), as they turned into garbage when edited on Windows.

bye
Ron


Pete Rigz(Posted 2013) [#4]
Ahh I see what you mean, I just created the file on Mac, and indeed it is garbled up on windows :)
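
The garbling described above is what you would expect when UTF-8 bytes are read as if each byte were one Latin-1 character. A minimal C++ sketch of that misreading follows; the helper name latin1_bytes_to_utf8 is hypothetical and not part of wxMax or BlitzMax. It re-encodes the input as UTF-8 under the assumption that every input byte is a single Latin-1 character.

```cpp
#include <string>

// Hypothetical helper, for illustration only: treat every input byte as one
// Latin-1 character and re-encode the result as UTF-8.
std::string latin1_bytes_to_utf8(const std::string& bytes) {
    std::string out;
    for (unsigned char c : bytes) {
        if (c < 0x80) {
            out += static_cast<char>(c);                 // ASCII stays one byte
        } else {
            out += static_cast<char>(0xC0 | (c >> 6));   // lead byte 110000xx
            out += static_cast<char>(0x80 | (c & 0x3F)); // trail byte 10xxxxxx
        }
    }
    return out;
}
```

Feeding it the UTF-8 bytes of "ä" (C3 A4) yields the bytes C3 83 C2 A4, which display as "Ã¤". A file that really is Latin-1 (a single byte E4 for "ä") comes out correctly as C3 A4, which matches the observation that files created and edited on Windows look fine.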


Derron(Posted 2013) [#5]
As said... I think Windows creates "ANSI" or "ISO-8859-x" files while the others create UTF-8.

If you now check what happens when you write with a "BOM" (byte order mark), you may go crazy :p.


bye
Ron
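
For reference, a byte order mark is just a fixed byte sequence at the start of the file, so checking for one is cheap. The following is a small C++ sketch under that assumption; the names FileBom and detect_file_bom are made up for this example, and it only looks at the UTF-8 and UTF-16 marks (it does not try to tell UTF-32 apart from UTF-16).

```cpp
#include <cstddef>
#include <string>

// Hypothetical names, for illustration only.
enum class FileBom { None, Utf8, Utf16LE, Utf16BE };

// Inspect the first bytes of a file's contents for a byte order mark.
FileBom detect_file_bom(const std::string& bytes) {
    auto at = [&bytes](std::size_t i) {
        return static_cast<unsigned char>(bytes[i]);
    };
    if (bytes.size() >= 3 && at(0) == 0xEF && at(1) == 0xBB && at(2) == 0xBF)
        return FileBom::Utf8;      // EF BB BF
    if (bytes.size() >= 2 && at(0) == 0xFF && at(1) == 0xFE)
        return FileBom::Utf16LE;   // FF FE
    if (bytes.size() >= 2 && at(0) == 0xFE && at(1) == 0xFF)
        return FileBom::Utf16BE;   // FE FF
    return FileBom::None;          // no mark: the encoding must be guessed
}
```

A file without any mark returns FileBom::None, which is exactly the case where UTF-8 and Latin-1 files cannot be told apart by the first bytes alone.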


Pete Rigz(Posted 2013) [#6]
Well, there are string conversion functions in wxglue.cpp in the wx.mod/wx.mod folder:

// Converts a wxString to a BBString
BBString *bbStringFromWxString(const wxString &s ) {
#ifdef WIN32
	return bbStringFromShorts((const BBChar*)s.wc_str(wxConvISO8859_1), s.Length());
#else
#ifdef wxUSE_UNICODE_UTF8
	return bbStringFromInts((const int*)s.wc_str(wxConvISO8859_1), s.Length());
#else
	return bbStringFromInts((const int*)s.wc_str(wxConvISO8859_1).data(), s.Length());
#endif
#endif
}

// Converts a BBString to a wxString
wxString wxStringFromBBString(BBString * s) {
	return wxString( (char*)s->buf, wxMBConvUTF16(), s->length * 2 );
}


wxStringFromBBString is the function that the scintilla load and save methods call, so there's maybe some kind of hack that will fix it... or maybe not ;)


UNZ(Posted 2013) [#7]
Oh my gosh!
BlitzMax has functions String.FromUTF8String() and String.ToUTF8String() ?!
Seriously, why is this not documented! o.O
It's not even on the wiki: http://en.wikibooks.org/wiki/BlitzMax/Language/Strings

Anyway, loading utf8 text is less of a problem now.
The thing is how do I know which encoding is used for a file?
When should I use LoadText() and when String.FromUTF8String(LoadText() ) ?
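
One common heuristic for files without a BOM is to check whether the raw bytes are well-formed UTF-8, and to fall back to Latin-1 if they are not. Below is a C++ sketch of such a check; the function name looks_like_utf8 is hypothetical, and this is only a heuristic: pure ASCII passes under either encoding, and a Latin-1 file could in rare cases happen to be valid UTF-8 as well.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: returns true if `bytes` is well-formed UTF-8
// (rejects overlong forms, UTF-16 surrogates and values above U+10FFFF).
bool looks_like_utf8(const std::string& bytes) {
    const std::size_t n = bytes.size();
    std::size_t i = 0;
    while (i < n) {
        const unsigned char c = static_cast<unsigned char>(bytes[i]);
        std::size_t extra;   // continuation bytes expected after the lead byte
        unsigned long cp;    // code point being assembled
        unsigned long min;   // smallest code point allowed for this length
        if (c < 0x80)                { extra = 0; cp = c;        min = 0; }
        else if ((c & 0xE0) == 0xC0) { extra = 1; cp = c & 0x1F; min = 0x80; }
        else if ((c & 0xF0) == 0xE0) { extra = 2; cp = c & 0x0F; min = 0x800; }
        else if ((c & 0xF8) == 0xF0) { extra = 3; cp = c & 0x07; min = 0x10000; }
        else return false;   // stray continuation byte or invalid lead byte
        if (extra > n - i - 1) return false;   // sequence is cut off
        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char t = static_cast<unsigned char>(bytes[i + k]);
            if ((t & 0xC0) != 0x80) return false;   // not a continuation byte
            cp = (cp << 6) | (t & 0x3F);
        }
        if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return false;
        i += extra + 1;
    }
    return true;
}
```

With the test characters from the first post, the UTF-8 bytes for "ä ö ß ê" pass the check, while the same text saved as Latin-1 (E4, F6, DF, EA) fails it, because each of those bytes looks like a lead byte that is not followed by a continuation byte.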


UNZ(Posted 2013) [#8]
Ok,

A simple solution is to use the BlitzMax functions LoadText() and SaveText(), which take care of the correct encoding in most cases.
So you do scintilla.SetText(LoadText(path) ) for loading and SaveText(scintilla.GetText(), path) for saving.

But because LoadText() may guess the wrong encoding (e.g. UTF-8 without a BOM is interpreted as Latin-1), it is advisable to let the user load a file with an explicitly specified encoding.

You can do this for example with a modified version of LoadText:
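
The BlitzMax listing itself is not shown here. As a rough illustration of the idea only (this is not the poster's code, and it is written in C++ rather than BlitzMax), the sketch below decodes raw file bytes with an encoding chosen by the caller instead of guessing. The names TextEncoding and decode_text are hypothetical, only Latin-1 and UTF-8 are handled, and the UTF-8 branch is deliberately lenient: broken sequences become U+FFFD and overlong forms are not rejected.

```cpp
#include <cstddef>
#include <string>

// Hypothetical names, for illustration only.
enum class TextEncoding { Latin1, Utf8 };

// Decode raw file bytes into Unicode code points using the encoding the
// caller asked for, instead of guessing from the content.
std::u32string decode_text(const std::string& raw, TextEncoding enc) {
    const char32_t kReplacement = 0xFFFD;   // U+FFFD REPLACEMENT CHARACTER
    std::u32string out;
    if (enc == TextEncoding::Latin1) {
        // Latin-1 maps each byte to the code point with the same value.
        for (unsigned char c : raw) out += static_cast<char32_t>(c);
        return out;
    }
    std::size_t i = 0;
    // Skip an optional UTF-8 byte order mark (EF BB BF).
    if (raw.compare(0, 3, "\xEF\xBB\xBF") == 0) i = 3;
    while (i < raw.size()) {
        const unsigned char c = static_cast<unsigned char>(raw[i++]);
        std::size_t extra = 0;              // continuation bytes still expected
        char32_t cp = c;
        if ((c & 0xE0) == 0xC0)      { extra = 1; cp = c & 0x1F; }
        else if ((c & 0xF0) == 0xE0) { extra = 2; cp = c & 0x0F; }
        else if ((c & 0xF8) == 0xF0) { extra = 3; cp = c & 0x07; }
        else if (c >= 0x80)          { out += kReplacement; continue; }
        bool ok = true;
        for (; extra > 0; --extra) {
            if (i >= raw.size() ||
                (static_cast<unsigned char>(raw[i]) & 0xC0) != 0x80) {
                ok = false;                 // sequence cut off or broken
                break;
            }
            cp = (cp << 6) | (static_cast<unsigned char>(raw[i++]) & 0x3F);
        }
        out += ok ? cp : kReplacement;
    }
    return out;
}
```

The point of the explicit parameter is that the same bytes C3 A4 decode to the single character U+00E4 ("ä") with TextEncoding::Utf8 but to two characters with TextEncoding::Latin1, so the user's choice overrides whatever an automatic guess would have picked.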


bye


Derron(Posted 2013) [#9]
I think there are some posts in the forum about LoadText and UTF-8 in BlitzMax in general.

Glad you found a working solution for your program.


bye
Ron


Brucey(Posted 2013) [#10]
BlitzMax has functions String.FromUTF8String() and String.ToUTF8String()

And it took a while to convince Mark they would be useful too :-p