bah.RegEx glitch
| ||
I've been playing around with the extremely useful bah.regex module, but noticed some odd behaviour: it does not appear to distinguish properly between upper- and lowercase characters. For example:

Import BaH.RegEx

Local demo:String = "<b>28 test</b>"
Print "Original : " + demo + "~n"

Local RegEx:TRegEx = TRegEx.Create("^(\<\w\>){0,10}(\d{1,3})( )([ABCDEFGHIJKLMNOPQRSTUVWXYZ])")
Local match:TRegExMatch = regex.Find(demo)
If match
	Print match.SubExp()
End If

This expression -should- look for the beginning of the line, then up to 10 HTML 'open' tags, a number of up to three digits, a space, and a single upper-case letter. However, in this example it also matches the first 't' in 'test', even though that is a lower-case character and the regular expression explicitly asks for upper-case characters. It shouldn't return a match at all for this string.

I tried using [[:upper:]] instead of [ABCDEFGHIJKLMNOPQRSTUVWXYZ], with similar results. Case is significant in the text I'm parsing, so this is throwing me a curveball. Any ideas?
| ||
By default, matching is case-insensitive. So you need to pass in a TRegExOptions object and override the default:

Import BaH.RegEx

Local demo:String = "<b>28 test</b>"
Print "Original : " + demo + "~n"

Local options:TRegExOptions = New TRegExOptions
options.caseSensitive = True

Local RegEx:TRegEx = TRegEx.Create("^(\<\w\>){0,10}(\d{1,3})( )([ABCDEFGHIJKLMNOPQRSTUVWXYZ])", options)
Local match:TRegExMatch = regex.Find(demo)
If match
	Print match.SubExp()
End If
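To see what the option changes, here is a minimal side-by-side sketch that runs the same pattern once with the default options and once with caseSensitive set. It only uses the calls already shown in this thread (TRegEx.Create, Find, SubExp, TRegExOptions.caseSensitive). The Report function is a hypothetical helper written for this illustration and is not part of BaH.RegEx.

```blitzmax
Import BaH.RegEx

' Hypothetical helper (not part of BaH.RegEx): runs one search and prints the outcome.
Function Report(label:String, re:TRegEx, text:String)
	Local match:TRegExMatch = re.Find(text)
	If match
		Print label + ": matched '" + match.SubExp() + "'"
	Else
		Print label + ": no match"
	End If
End Function

' Same sample text and pattern as in the posts above.
Local demo:String = "<b>28 test</b>"
Local pattern:String = "^(\<\w\>){0,10}(\d{1,3})( )([ABCDEFGHIJKLMNOPQRSTUVWXYZ])"

' Default options, as in the first post.
Report("default options", TRegEx.Create(pattern), demo)

' Explicitly case-sensitive, as in the fix above.
Local options:TRegExOptions = New TRegExOptions
options.caseSensitive = True
Report("caseSensitive = True", TRegEx.Create(pattern, options), demo)
```

Going by the behaviour described in this thread, the first call should report a match on the 't' and the second should report no match. Results may differ with other versions of the module.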
| ||
I'm used to Perl's regular expressions, which are case-sensitive by default, and the same goes for PHP's implementation of regular expressions. To make a regular expression case-insensitive there, I need to add /i at the end of the expression. So, is this really expected behaviour?
| ||
Well, it's easy enough to reverse the default behaviour... But according to the PCRE man page: By default, matching is case insensitive. So my implementation is just following the documentation.
| ||
Ah, thanks! Everywhere else I've used regular expressions so far (e.g. with grep), matching has always been case-sensitive by default as well, so I didn't know I had to enable it explicitly here.
| ||
In that case let it stay that way. |