bah.RegEx glitch


xlsior(Posted 2009) [#1]
I've been playing around with the extremely useful bah.regex module, but noticed some odd behaviour: It appears that it does not properly distinguish between upper- and lowercase characters.

For example:

Import BaH.RegEx

Local demo:String = "<b>28 test</b>"
Print "Original : " + demo + "~n"

' Up to 10 opening tags, a 1-3 digit number, a space, then one upper-case letter
Local regex:TRegEx = TRegEx.Create("^(\<\w\>){0,10}(\d{1,3})( )([ABCDEFGHIJKLMNOPQRSTUVWXYZ])")
Local match:TRegExMatch = regex.Find(demo)
If match
	Print match.SubExp()
End If


This expression -should- look for the beginning of the line, then up to 10 HTML 'open' tags, a number of up to three digits, a space, and a single upper-case letter.

However, in this example it will also match the first 't' in 'test', even though it's a lower-case character and the regular expression is explicitly looking for upper-case characters. It shouldn't be returning a match at all with this string.

I tried using [[:upper:]] instead of the [ABCDEFGHIJKLMNOPQRSTUVWXYZ], with similar results.

Case is significant in the text I'm parsing through, so this is kind of throwing me a curveball.

Any ideas?


Brucey(Posted 2009) [#2]
By default case matching is insensitive.

So, you need to pass in a TRegExOptions object and override the default...
Import BaH.RegEx

Local demo:String = "<b>28 test</b>"
Print "Original : " + demo + "~n"

' Override the default so that matching is case-sensitive
Local options:TRegExOptions = New TRegExOptions
options.caseSensitive = True

Local regex:TRegEx = TRegEx.Create("^(\<\w\>){0,10}(\d{1,3})( )([ABCDEFGHIJKLMNOPQRSTUVWXYZ])", options)
Local match:TRegExMatch = regex.Find(demo)
If match
	Print match.SubExp()
End If
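
For comparison, here is a rough sketch of the same pattern run through Python's re module instead of bah.RegEx. It is a different regex engine (case-sensitive unless re.IGNORECASE is passed), so it only illustrates what the case option changes, not how the BlitzMax module works internally. The helper name first_match is made up for this example.

```python
import re

# The pattern from the thread, unchanged.
PATTERN = r"^(\<\w\>){0,10}(\d{1,3})( )([ABCDEFGHIJKLMNOPQRSTUVWXYZ])"


def first_match(text, case_sensitive=True):
    """Return the matched text, or None if the pattern does not match."""
    # Hypothetical helper: toggles case sensitivity via Python's own flag.
    flags = 0 if case_sensitive else re.IGNORECASE
    m = re.search(PATTERN, text, flags)
    return m.group(0) if m else None


demo = "<b>28 test</b>"
print(first_match(demo))                        # case-sensitive: lower-case 't' is rejected
print(first_match(demo, case_sensitive=False))  # case-insensitive: 't' is accepted
print(first_match("<b>28 Test</b>"))            # case-sensitive: upper-case 'T' is accepted
```

With case-insensitive matching the lower-case 't' satisfies the upper-case character class, which is the behaviour described in the first post. With case-sensitive matching the string only matches when the letter after the space really is upper case.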



Htbaa(Posted 2009) [#3]
I'm used to Perl's regular expressions, which are case-sensitive by default; the same goes for PHP's implementation of regular expressions. To make a regular expression case-insensitive I need to add an /i at the end of the pattern.

So, is this really expected behavior?


Brucey(Posted 2009) [#4]
Well, it's easy enough to reverse the default behaviour...

But according to the PCRE man page:

By default, matching is case insensitive.



So, my implementation is just following the documentation.


xlsior(Posted 2009) [#5]
Ah, thanks!

In the other places I've used regular expressions so far (e.g. with grep), matching has always been case-sensitive by default as well, so I didn't know I had to explicitly enable it here.


Htbaa(Posted 2009) [#6]
In that case let it stay that way.