Questions about bah.regex
| ||
I read through the help and the PCRE man page, but didn't quite understand everything...

What does TRegEx.Find do? It returns a match object, which has SubCount matches, but what are they? PCRE has "standard" and "alternative" matching methods, of which common.bmx seems to import the standard one. However, the man page says this about the standard method:

"If a leaf node is reached, a matching string has been found, and at that point the algorithm stops."

So why would it ever return more than one match?

Also, will the match always be the first from the left, the shortest, the longest or something else? If I need to find the longest match beginning at position 0 in the string, what should I do?
| ||
Well, I'm no expert... but I'll see what I can do :-)

> What does TRegEx.Find do?

It finds the first match in the string. On subsequent calls, it will find the next match, and so on.

> ...a match object, which has SubCount matches, but what are they?

A regular expression can be quite complex, finding not only the main match, but also all the sub-pattern matches which make up the whole. For a basic search you would not necessarily be interested in these, but for other searches you might want it to pre-split a date into its constituent parts, for example.

Here's a little example of subpatterns. The expression itself is taken from the docs:

    SuperStrict

    Framework BaH.Regex
    Import BRL.StandardIO

    Local pattern:String = "the ((red|white) (king|queen))"
    Local search:String = "the red king"

    Local regex:TRegEx = TRegEx.Create(pattern)

    Try
        ' First call takes the string to search
        Local match:TRegExMatch = regex.Find(search)

        While match
            ' Index 0 is the whole match, the rest are the bracketed subpatterns
            For Local i:Int = 0 Until match.SubCount()
                Print i + ": " + match.SubExp(i)
            Next

            ' Calling Find() with no argument moves on to the next match
            match = regex.Find()
        Wend
    Catch e:TRegExException
        Print "Error : " + e.toString()
        End
    End Try

It outputs this:

    0: the red king
    1: red king
    2: red
    3: king

which are the subpatterns as defined by the brackets ().

The bbdoc docs for BaH.Regex do go over how subpatterns and suchlike work, although I have to say they are a tad on the deep and technical side. You can also use things such as "lookahead assertions" and "lookbehind assertions", and a whole load of other meaty character combinations, to be very specific in your search parameters.

> So why would it ever return more than one match?

I can't say I've read much on the page you linked to, but it does appear to work. Changing the search string in the above example to

    Local search:String = "the red king was here, but the white bishop was nowhere to be found. Did you see the white king, perchance?"

results in the following output:

    0: the red king
    1: red king
    2: red
    3: king
    0: the white king
    1: white king
    2: white
    3: king

so obviously Find() was able to pick up two separate matching cases in the string, and break down the subpatterns at the same time.

> Also, will the match always be the first from left, the shortest, the longest or something else?

I think that depends on how you write the expression, but I'm not an expert. Obviously, if you do something like this - [A]+ - it will match *any* series of A.

....

Hope this helps a little bit for now?
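To make the date case mentioned above a bit more concrete, here is a minimal sketch that splits a date into its parts with subpatterns. The search string and the pattern are made up purely for illustration, and it only uses the calls already shown in the example above (TRegEx.Create, Find, SubExp):

```blitzmax
SuperStrict

Framework BaH.Regex
Import BRL.StandardIO

' Hypothetical input and pattern, chosen only for illustration.
Local search:String = "released on 2009-03-14"
Local pattern:String = "([0-9]{4})-([0-9]{2})-([0-9]{2})"

Local regex:TRegEx = TRegEx.Create(pattern)

Try
	Local match:TRegExMatch = regex.Find(search)

	If match Then
		' Index 0 is the whole match; 1 to 3 are the bracketed groups.
		Print "date:  " + match.SubExp(0)
		Print "year:  " + match.SubExp(1)
		Print "month: " + match.SubExp(2)
		Print "day:   " + match.SubExp(3)
	Else
		Print "no date found"
	End If
Catch e:TRegExException
	Print "Error : " + e.toString()
End Try
```

The groups are numbered by the position of their opening bracket, left to right, which is also why "red king" came before "red" in the output of the earlier example.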
| ||
Thanks! I totally misunderstood the part about subexpressions, and that's why I was confused. I thought the different subexpressions in a TRegExMatch object were related to different matches instead of patterns within the same match.

So, I guess what I needed to know was whether something like [a-z]+ always matches the longest string from the beginning. E.g. for search$ = "abcd efghi", will I always get the match for "abcd" instead of "abc" or "efghi" or "a", all of which match the pattern?
| ||
For your example, I would expect it to match twice: once for "abcd" and once for "efghi". "a" on its own wouldn't be returned in this case, since + is greedy by default and takes the whole run of characters, so the full "abcd" is the match.
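A quick way to check this, and the earlier "match beginning at position 0" question, is a sketch along these lines. It assumes nothing beyond the calls used earlier in the thread. The ^ anchor is standard PCRE syntax rather than anything specific to BaH.Regex. Going by PCRE's documented behaviour, the first loop should report "abcd" and then "efghi", and the anchored pattern should report "abcd" only.

```blitzmax
SuperStrict

Framework BaH.Regex
Import BRL.StandardIO

Local search:String = "abcd efghi"

Try
	' Unanchored: each Find() call moves on to the next run of letters.
	Local regex:TRegEx = TRegEx.Create("[a-z]+")
	Local match:TRegExMatch = regex.Find(search)
	While match
		Print "run: " + match.SubExp(0)
		match = regex.Find()
	Wend

	' Anchored with ^: only a run starting at position 0 can match.
	Local anchored:TRegEx = TRegEx.Create("^[a-z]+")
	Local first:TRegExMatch = anchored.Find(search)
	If first Then
		Print "at start: " + first.SubExp(0)
	Else
		Print "nothing at position 0"
	End If
Catch e:TRegExException
	Print "Error : " + e.toString()
End Try
```

One caveat: "longest" only holds for a single greedy quantifier like this. With alternation, PCRE's standard algorithm takes the first alternative that works, so a pattern such as a|ab finds "a" in "ab" rather than the longer "ab".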