UTF8 Unicode Functions


virtlands(Posted 2013) [#1]
Hi, this is about an unusual topic: Unicode.

I wrote some hard-to-find Unicode functions.
Specifically, they show how to convert ordinary Unicode code point numbers into UTF-8.

The following B3D code requires a purchase of the FastText Library
to see it in action.
http://fastlibs.com/


If you already have that lib, you can test and run
the following functions:

;; Some important UNICODE to UTF8 conversion Functions:
;;


;;  You'll need to purchase your copy of FastText.DLL

Include "C:\Blitz3d Projects\FastText_unicode.BB"    

Global WhiteSquare_UTF8$ = Uni_to_UTF8$($25A1)   ;; U+25A1 is WHITE SQUARE (U+2581 is LOWER ONE EIGHTH BLOCK)

Type UTF8
     Field byte[4]
     Field s$
End Type 

UTF8sam.UTF8 = New UTF8   ;; UTF8 Sample

 Graphics 1024,480,32,2
 Local s$

 UniFont0=LoadFont("Tahoma",25)
 font0=LoadFont("arial.ttf",20)	
 SetFont UniFont0
	
 horiz = 60  ;; a horizontal placement	

 ;; This loop prints UTF8 (unicode) characters one-at-a-time. 
 ;; 	
 For u = $24E6 To $24FE
        Text horiz,40, Uni_to_UTF8$(u)
        horiz = horiz +20
 Next 	

 ;; This loop builds up a string first, and then prints them all at once
 ;;
 For u = $2730 To $274D
       s$ = s$ + Uni_to_UTF8$(u)
 Next      

 Text 60,70, s    


 WaitKey():End 


; This function converts a unicode value
; into a UTF8 byte string
;
Function Uni_to_UTF8$(u)
 Local b1,b2,b3,b4

 If (u < $80)    
     Return Chr$(u)    
 End If 

 If (u < $800)
     b1 = ((u Shr 6) And $1f) Or $c0
     b2 = (u And $3f) Or $80
     Return Chr$(b1)+Chr$(b2)   
 End If 

 If (u<$10000)
     b1 = ((u Shr 12) And $0f) Or $e0
     b2 = ((u Shr 6) And $3f) Or $80
     b3 = (u And $3f) Or $80
     Return Chr$(b1)+Chr$(b2)+Chr$(b3) 
 End If 

 If (u<$110000)
     b1 = ((u Shr 18) And $7) Or $f0
     b2 = ((u Shr 12) And $3f) Or $80
     b3 = ((u Shr 6) And $3f) Or $80
     b4 = (u And $3f) Or $80 
     Return Chr$(b1)+Chr$(b2)+Chr$(b3)+Chr$(b4)
 End If 

End Function 


;  creates a UTF8 type from a unicode input
;
Function Uni_to_UTF8t.UTF8(ut.UTF8,u) 
 Local b1,b2,b3,b4 

 If (u < $80)    
     Return SetUTF8(ut,u)  
 End If 

 If (u < $800)
     b1 = ((u Shr 6) And $1f) Or $c0
     b2 = (u And $3f) Or $80
     Return SetUTF8(ut,b1,b2)     
 End If 

 If (u<$10000)
     b1 = ((u Shr 12) And $0f) Or $e0
     b2 = ((u Shr 6) And $3f) Or $80
     b3 = (u And $3f) Or $80    
     Return SetUTF8(ut,b1,b2,b3)   
 End If 

 If (u<$110000)
     b1 = ((u Shr 18) And $7) Or $f0
     b2 = ((u Shr 12) And $3f) Or $80
     b3 = ((u Shr 6) And $3f) Or $80
     b4 = (u And $3f) Or $80      
     Return SetUTF8(ut,b1,b2,b3,b4)
 End If 

End Function 

Function SetUTF8.UTF8(ut.UTF8, b1=0,b2=0,b3=0,b4=0)
  Local s$  
  If ut=Null Then ut=New UTF8  

  ut\byte[1] = b1
  ut\byte[2] = b2
  ut\byte[3] = b3
  ut\byte[4] = b4

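  ;; Build the string from the bytes in use; a zero byte marks the end.
  ;; Chr$ is needed here, otherwise the byte is appended as a decimal number.
  For z=1 To 4
      If ut\byte[z]=0 Then Exit
      s = s + Chr$(ut\byte[z])
  Next 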
  ut\s = s
  Return ut
End Function 


; Blitz3D apparently cannot display Unicode code points directly; each one
; has to be converted into a UTF-8 byte string first. Once that is done,
; you can display virtually anything, though what actually shows up
; depends on the font you've loaded.
; Some fonts are known as 'unicode' fonts, and some are not.
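
To make the conversion concrete, here is a small worked example of the three-byte case, done by hand for U+2730. It is only a sketch that repeats the same shifts and masks as the function above; it uses nothing but built-in commands, so FastText is not needed to run it.

```blitzbasic
; Worked example: the three-byte case applied to U+2730 by hand.

Local u = $2730                          ; binary 0010 011100 110000

Local b1 = ((u Shr 12) And $0F) Or $E0   ; top 4 bits    -> 1110xxxx
Local b2 = ((u Shr 6) And $3F) Or $80    ; middle 6 bits -> 10xxxxxx
Local b3 = (u And $3F) Or $80            ; low 6 bits    -> 10xxxxxx

; Hex$ returns eight hex digits, so keep only the last two of each
Print Right$(Hex$(b1),2) + " " + Right$(Hex$(b2),2) + " " + Right$(Hex$(b3),2)
; prints: E2 9C B0

WaitKey()
End
```

The lead byte says how many bytes the sequence has, and every following byte starts with the bits 10, which is why each of them is Or'ed with $80.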

Here are two excellent Wikipedia pages for learning about Unicode:
The Unicode planes:
http://en.wikipedia.org/wiki/Plane_%28Unicode%29
UTF-8:
http://en.wikipedia.org/wiki/UTF-8

There are different kinds of Unicode encodings, as listed here:
UTF-8, UTF-16, UTF-32 & BOM

This is the webpage where I got the code listed in my program.
http://www.unicode.org/faq/utf_bom.html
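
Going the other way is a matter of reading the same bit layout in reverse. Below is a minimal sketch of a decoder, not something from the page above. The name UTF8_to_Uni and the -1 error value are my own assumptions, and it does not reject overlong sequences or surrogates, so treat it as a starting point only.

```blitzbasic
; Usage: decode the three bytes E2 9C B0 back into a code point
Print Hex$(UTF8_to_Uni(Chr$($E2)+Chr$($9C)+Chr$($B0)))   ; prints: 00002730
WaitKey()
End

; Hypothetical helper: returns the code point of the first UTF-8
; sequence in s$, or -1 if the string is empty or malformed.
Function UTF8_to_Uni(s$)
 Local b1, n, u, i, b

 If Len(s$) = 0 Then Return -1
 b1 = Asc(Mid$(s$,1,1))

 If b1 < $80 Then Return b1              ; 0xxxxxxx: plain ASCII

 If (b1 And $E0) = $C0                   ; 110xxxxx: 2-byte sequence
     n = 2
     u = b1 And $1F
 ElseIf (b1 And $F0) = $E0               ; 1110xxxx: 3-byte sequence
     n = 3
     u = b1 And $0F
 ElseIf (b1 And $F8) = $F0               ; 11110xxx: 4-byte sequence
     n = 4
     u = b1 And $07
 Else
     Return -1                           ; stray continuation or invalid lead byte
 End If

 If Len(s$) < n Then Return -1           ; sequence is cut short

 For i = 2 To n
     b = Asc(Mid$(s$,i,1))
     If (b And $C0) <> $80 Then Return -1   ; continuation bytes must be 10xxxxxx
     u = (u Shl 6) Or (b And $3F)           ; append the next 6 payload bits
 Next

 Return u
End Function
```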

( But B3D can probably only handle UTF8, I'm guessing...)

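There are likely over 100,000 Unicode characters; I haven't counted them, I'm
just guessing. (The code space itself runs from U+0000 to U+10FFFF, which is
1,114,112 code points, and that is why the functions above stop at $110000.)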

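When you run the above program, all it does is output two rows of sample
characters: U+24E6 to U+24FE drawn one at a time, then U+2730 to U+274D
drawn as a single string.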
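
If you don't have FastText, or the glyphs don't show up with your font, one way to check a conversion is to look at the raw bytes instead. This is just a sketch; HexBytes$ is a hypothetical helper name, and the test string is built by hand from the bytes I would expect for U+24E6.

```blitzbasic
; Usage: the expected UTF-8 bytes for U+24E6, built by hand
Print HexBytes$(Chr$($E2)+Chr$($93)+Chr$($A6))   ; prints: E2 93 A6
WaitKey()
End

; Hypothetical helper: lists the bytes of a string as two-digit hex values
Function HexBytes$(s$)
 Local i, result$

 For i = 1 To Len(s$)
     If i > 1 Then result$ = result$ + " "
     ; Hex$ returns eight digits, so keep only the last two
     result$ = result$ + Right$(Hex$(Asc(Mid$(s$,i,1))),2)
 Next

 Return result$
End Function
```

Passing it the result of Uni_to_UTF8$(u) from the listing above should show the same bytes for the same code point.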


FastText.dll is a specialized library that works through the "Text" command
and some others.

I wanted to do something with this Unicode code someday, but I never got around to it.

I thought I had the UTF16 and UTF32 functions somewhere, but I apparently
misplaced them.
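
For anyone who wants a starting point, here is a minimal sketch of the UTF-16 side. It is not the lost function; Uni_to_UTF16Hex$ is a name made up for this example, and it returns the 16-bit code units as hex text rather than as a string, so it makes no assumption about how Blitz3D or FastText would store them. UTF-32 needs no conversion at all, since each code point is simply one 32-bit value.

```blitzbasic
; Usage: one BMP character and one character outside the BMP
Print Uni_to_UTF16Hex$($2730)    ; prints: 2730
Print Uni_to_UTF16Hex$($1F600)   ; prints: D83D DE00
WaitKey()
End

; Hypothetical helper: returns the UTF-16 code units for code point u
; as 4-digit hex values separated by a space, or "" if u is not valid.
Function Uni_to_UTF16Hex$(u)
 Local v, hiUnit, loUnit

 If (u < 0) Or (u > $10FFFF) Then Return ""
 If (u >= $D800) And (u <= $DFFF) Then Return ""   ; surrogates are not characters

 If u < $10000
     Return Right$(Hex$(u),4)                      ; BMP: a single code unit
 End If

 v = u - $10000                                    ; 20 bits remain
 hiUnit = $D800 Or (v Shr 10)                      ; high surrogate: top 10 bits
 loUnit = $DC00 Or (v And $3FF)                    ; low surrogate: bottom 10 bits
 Return Right$(Hex$(hiUnit),4) + " " + Right$(Hex$(loUnit),4)
End Function
```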