RegExModule doesn't support Unicode characters


Pineapple(Posted 2014) [#1]
PCRE is supposed to support Unicode characters, but the module doesn't. The behavior ranges from misinterpreting the start argument of the find method to throwing exception -11 (PCRE_ERROR_BADUTF8_OFFSET).

Here's an example program to show the errors in action:

SuperStrict

Import bah.regex

' One or more Unicode letters followed by optional whitespace
Local expression:TRegEx=TRegEx.Create("[\pL]+[\s]*")

Local teststring$="Here are six unicode characters àéïõúÿ"

Print teststring

Local match:TRegExMatch,start%=0

' Walk the string one match at a time until nothing more is found
While 1
	match=expression.find(teststring,start)
	If match
		Print "'"+match.SubExp()+"' of length "+match.SubExp().length+" found at "+match.SubStart()
		' Resume just past the end of this match
		start=match.SubStart()+match.SubExp().length
	Else
		Exit
	EndIf
Wend


edit: Those count as Unicode, right? Actually I'm not sure. Whatever they are, they don't work.
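
For reference, PCRE documents error -11 (PCRE_ERROR_BADUTF8_OFFSET) as a start offset that does not point to the beginning of a UTF-8 character. PCRE takes that offset in bytes, while BlitzMax strings are indexed by 16-bit character, and each of à, é, ï, õ, ú and ÿ takes two bytes in UTF-8. One plausible cause is that a character index ends up being used as a byte offset. That assumes the module hands PCRE a UTF-8 copy of the string and forwards the start index unchanged, which nothing in this thread confirms. A minimal sketch of the difference follows; Utf8ByteOffset is a hypothetical helper, not part of bah.regex:

```blitzmax
SuperStrict

' Hypothetical helper (not part of bah.regex): returns how many UTF-8 bytes
' are needed to encode the first charCount characters of text.
Function Utf8ByteOffset:Int(text:String, charCount:Int)
	Local count:Int = Min(charCount, text.length)
	Local bytes:Int = 0
	For Local i:Int = 0 Until count
		Local code:Int = text[i]
		If code < $80
			bytes :+ 1	' ASCII
		ElseIf code < $800
			bytes :+ 2	' e.g. the accented Latin letters in the test string
		ElseIf code >= $D800 And code <= $DBFF
			bytes :+ 4	' high surrogate: the whole pair encodes to 4 bytes
		ElseIf code >= $DC00 And code <= $DFFF
			' low surrogate: already counted with its high surrogate
		Else
			bytes :+ 3	' rest of the Basic Multilingual Plane
		EndIf
	Next
	Return bytes
End Function

Local teststring:String = "Here are six unicode characters àéïõúÿ"

' Compare each character index with the byte offset PCRE would expect
For Local index:Int = 0 To teststring.length
	Print "character index " + index + " -> UTF-8 byte offset " + Utf8ByteOffset(teststring, index)
Next
```

The two columns agree while the text is plain ASCII and drift apart after the first accented letter, which is where a character index would land in the middle of a two-byte sequence.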


Brucey(Posted 2014) [#2]
You may want to try the latest source (from SVN/GitHub), which results in this on my Mac:
Here are six unicode characters àéïõúÿ
'Here ' of length 5 found at 0
'are ' of length 4 found at 5
'six ' of length 4 found at 9
'unicode ' of length 8 found at 13
'characters ' of length 11 found at 21
'àéïõúÿ' of length 6 found at 32



Pineapple(Posted 2014) [#3]
That worked! Thanks much for the quick response.