How to Covert Unicode to UTF8?

Space Fractal(Posted 2008) [#1]
I'm finally back after a break of some months, due to some personal attacks against me in one of my old threads, even though I had found a simple workaround for a BlitzMax bug.

I have this code to convert UTF-8 back to Unicode:

	Function Unicode$(UTF8$)
		Local RESULT$=""
		Local UTF$=""
		Local Length=0
		Local Last$=""
		For Local i=1 To Len(UTF8$)
			Local Char$=Mid$(UTF8$,i,1)
			Local B$=Right$(Bin$(Asc(Char$)),8)
			If Length>0
				UTF$=UTF$+Right$(B$,6)
				Length:-1
				If Left$(B$,2)<>"10"
					Length=0
					RESULT$=RESULT$+Last$
					Last$=""
				ElseIf Length=0
					RESULT$=RESULT$+Chr$(Bin2Int(UTF$))
				EndIf
			EndIf
			
			If Length=0
				If Left$(B$,1)="0"
					Result$=Result$+Char$; Length=0
				ElseIf Left$(B$,3)="110"
					Last$=CHAR$
					UTF$=Right$(B$,5)
					Length=1
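				ElseIf Left$(B$,4)="1110"
					' 3-byte lead (1110xxxx): keep the 4 payload bits
					UTF$=Right$(B$,4)
					Length=2
				ElseIf Left$(B$,5)="11110"
					' 4-byte lead (11110xxx): keep the 3 payload bits.
					' Note: the decoded value is above $FFFF, which does not fit in one 16-bit character.
					UTF$=Right$(B$,3)
					Length=3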
				EndIf
			EndIf
		Next
		Return RESULT$
	EndFunction
	
	Function Bin2Int:Int(Binary$)
		Local result=0
		Local D=1
		For Local i=Len(Binary$) To 1 Step -1
			If Mid$(Binary$,i,1)="1" Then result=result+d
			D=D+D
		Next
		Return result
	EndFunction


I found some native Visual Basic 6 code (I don't have VB6 myself) that can convert Unicode to UTF-8, but I'm not sure how the \ operator works in Visual Basic 6, so I commented out the lines that use it:

Function UTF8_Encode$(Value$)
	Local result$
	For Local i = 1 To Len(value$)
		Local char = Asc(Mid(Value$, i, 1))
		If char < 128
			Result$ = Result$ + Mid(value$, i, 1)
		ElseIf ((char > 127) And (char < 2048))
'			Result$ = Result$ + Chr$(((char \ 64) Or 192))
			Result$ = Result$ + Chr$(((char And 63) Or 128))
		Else
'			Result$ = Result$ + Chr$(((char \ 144) Or 234))
'			Result$ = Result$  + Chr$((((char \ 64) And 63) Or 128))
			Result$ = Result$ + Chr$(((char And 63) Or 128))
		EndIf
	Next
	Return result$
EndFunction


The original VB6 code (in the comments, not in the article itself) is here:
http://www.nonhostile.com/howto-convert-byte-array-utf8-string-vb6.asp
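
On the \ question: in Visual Basic 6, \ is integer division, so char \ 64 drops the low six bits, which is the same as char Shr 6 in BlitzMax. Two other things are worth checking when porting. BlitzMax's And/Or are conditional (logical) operators, so the bitwise & and | are needed here. The constants 144 and 234 in the commented-out lines also do not match the UTF-8 layout, where a three-byte sequence shifts by 12 bits (divide by 4096) and uses the marker 224. Below is a minimal sketch under those assumptions; the name UTF8_EncodeSketch is hypothetical, and it only covers character codes up to $FFFF.

```blitzmax
' Sketch only; UTF8_EncodeSketch is a hypothetical name, not part of the VB6 original.
' Handles character codes up to $FFFF (one 16-bit BlitzMax character at a time).
Function UTF8_EncodeSketch$(Value$)
	Local Result$
	For Local i:Int = 1 To Len(Value$)
		Local char:Int = Asc(Mid$(Value$, i, 1))
		If char < 128
			' 1 byte: 0xxxxxxx
			Result$ = Result$ + Chr$(char)
		ElseIf char < 2048
			' 2 bytes: 110xxxxx 10xxxxxx
			Result$ = Result$ + Chr$((char Shr 6) | 192)
			Result$ = Result$ + Chr$((char & 63) | 128)
		Else
			' 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
			Result$ = Result$ + Chr$((char Shr 12) | 224)
			Result$ = Result$ + Chr$(((char Shr 6) & 63) | 128)
			Result$ = Result$ + Chr$((char & 63) | 128)
		EndIf
	Next
	Return Result$
EndFunction
```

Each character of the returned string holds one UTF-8 byte value (0 to 255), so it still has to be written out byte by byte when passed to external code.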

NB: Can somebody fix the spelling in the title? I can't edit it myself (covert -> convert).


Brucey(Posted 2008) [#2]
A lot of my modules do UTF-8 to Max String conversions (and back). Have a look at libxml/gtkmaxgui/etc mods for examples.

wxMax lets wxWidgets take the stress of string conversion, which was nice for a change :-)


Space Fractal(Posted 2008) [#3]
I really want a native way, since I am very close with the code above.

BTW, I do not use MaxGUI, and I need to do the conversion before sending strings to some DLL files (a plugin system for jukebox software).

Hence I want some small native code. Is there a particular module and example I should look at?

Here is the working code, which I have also put in the Code Archives:

Function UTF8$(Unicode$)
	Local RESULT$=""
	
	For Local i=1 To Len(Unicode$)
		Local Char$=Mid$(Unicode$,i,1)
		If Asc(Char$)<128
			result$=result$+char$
		ElseIf Asc(Char$)>127 And Asc(Char$)<2048
			Local Bytes$=Right$(Bin$(Asc(Char$)),11)

			Local Byte1$="110"+Left$(Bytes$,5)
			Result$=Result$+Chr(Bin2Int(Byte1$))

			Local Byte2$="10"+Right$(Bytes$, 6)
			Result$=Result$+Chr(Bin2Int(Byte2$))
		Else
			Local Bytes$=Right$(Bin$(Asc(Char$)),16)

			Local Byte1$="1110"+Left$(Bytes$,4)
			Result$=Result$+Chr(Bin2Int(Byte1$))

			Local Byte2$="10"+Mid$(Bytes$, 5, 6)
			Result$=Result$+Chr(Bin2Int(Byte2$))

			Local Byte3$="10"+Right$(Bytes$, 6)
			Result$=Result$+Chr(Bin2Int(Byte3$))		
		EndIf
	Next
	Return Result$
EndFunction


I know it could be faster, but it (should) work.

Please note:
Bin2Int is needed; it can be found in the UTF-8 to Unicode example above (the other direction).
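
A quick way to check the two directions against each other is a round trip. The snippet below is only a sketch: it assumes UTF8$, Unicode$ and Bin2Int from the posts above are pasted into the same file, and the test string is an arbitrary choice.

```blitzmax
' Assumes UTF8$, Unicode$ and Bin2Int from the posts above are in the same file.
' Chr(248) is the Latin-1 letter o-with-stroke, which needs two UTF-8 bytes.
Local original$ = "Sm" + Chr(248) + "rrebr" + Chr(248) + "d"
Local encoded$ = UTF8$(original$)

Print "characters: " + Len(original$)
Print "bytes: " + Len(encoded$)

If Unicode$(encoded$) = original$
	Print "round trip OK"
Else
	Print "round trip failed"
EndIf
```

The encoded string should be longer than the original, since every character at or above 128 becomes two or three bytes.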


EDIT:
Hey, I should have looked in the Code Archives first, since Junkprogger has already created such a function, which appeared recently.