BaH.RegEx - Regular Expressions
| ||
Has been updated to PCRE 7.4, although I've just noticed they released 7.6 the other day, so expect another update in the not so distant future :-p

For the uninitiated, Regular Expressions are an extremely powerful method of searching and replacing in strings.

Here's a complex example, just to show you what it can do: Imagine you wanted to be able to parse a string for dates in the format dd/mm/yyyy or dd/mm/yy, and replace those entries in the string with the year only, inside quotes. Writing a program to do that by hand would be quite involved. Now let's look at such a program using Regular Expressions:

    SuperStrict

    Framework BaH.RegEx
    Import BRL.StandardIO

    Local date:String = "The dates are: 12/30/1969, 06/04/1974 and 15/08/1980"
    Print "Original : " + date + "~n"

    Local regex:TRegEx = TRegEx.Create("(\d\d)[-/](\d\d)[-/](\d\d(?:\d\d)?)")
    Print regex.replaceAll(date, "~q\3~q")

would result in the output:

    Original : The dates are: 12/30/1969, 06/04/1974 and 15/08/1980

    The dates are: "1969", "1974" and "1980"

The difficult part is learning how to write the regular expression, but there are good tutorials online, as well as an in-depth guide included with the module.

Have Fun ! :-)
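For anyone who wants to see what each capture group in that pattern picks up, here is a minimal sketch in Python rather than BlitzMax, using Python's standard `re` module instead of BaH.RegEx. The pattern and sample string are taken from the post above; the names `DATE_PATTERN` and `years_only` are made up for illustration.

```python
import re

# Same pattern as in the post: two digits, a separator, two digits, a separator,
# then a two- or four-digit year. Group 3 is the year.
DATE_PATTERN = re.compile(r"(\d\d)[-/](\d\d)[-/](\d\d(?:\d\d)?)")


def years_only(text: str) -> str:
    """Replace each matched date with its year (group 3) in double quotes."""
    return DATE_PATTERN.sub(r'"\3"', text)


sample = "The dates are: 12/30/1969, 06/04/1974 and 15/08/1980"

# Show the three captured groups for every match.
for match in DATE_PATTERN.finditer(sample):
    print(match.group(0), "->", match.groups())

print(years_only(sample))
```

Note that the pattern only checks the shape of the date, not whether the day and month values are valid, which is why 12/30/1969 is matched as well.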
| ||
Thanks -- your regex module has definitely come in handy in the past. Any specific changes/improvements in this new version? |
| ||
By the way: Windows XP's zipfolders had problems extracting the files from your zip file, although WinRAR worked OK.
| ||
"Windows XP's zipfolders had problems extracting the files"
Thanks for that... They were compressed on Mac (using the built-in context menu), and unzipped fine here on Win2k with Winzip, so I hadn't noticed.
"Any specific changes/improvements in this new version?"
Just the update to the core library, in which they tend to fix bugs, improve speed, and generally make it better here and there. No changes to the Max side of things.
| ||
"Thanks for that... They were compressed on Mac (using the built-in context menu), and unzipped fine here on Win2k with Winzip so I hadn't noticed."
I have a feeling that XP might have an issue with both a folder and a file with the same name in the main folder... (regex.mod) I just noticed that some of your other mods have the same problem.
| ||
"XP might have an issue with both a folder and a file with the same name in the main folder"
Except there aren't any files called regex.mod :-) I've just tried unzipping the regex_1_03_src.zip file in Win2k using the built-in "Extract Here" context menu, and it appeared to work as expected.
| ||
When I double-click on the regex_1_03_src.zip file in XP on my computer, it opens a compressed folder containing both a folder named regex.mod and a file named regex.mod. When I try to extract the file, it tells me 'the system cannot find the file specified'... Yet I do see it. :-?
| ||
Use Winzip
@Brucey - Thanks for the update! :)
| ||
Thanks for this module Brucey (and all the rest :) ). BTW, I noticed a bug lately: when doing replaceAll on non-ASCII text, the result is garbled. IIRC it seemed to be because PCRE gives you indexes into the UTF-8 buffer, but you use those indexes to index the BlitzMax string. Hence the mismatch.
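To illustrate the kind of mismatch described here, a minimal sketch in Python (not the module's actual code, and not BlitzMax): matching against the UTF-8 bytes yields byte offsets, which stop lining up with character indexes as soon as a multi-byte character appears before the match. The helper name `byte_to_char_offset` and the sample text are assumptions made for this example.

```python
import re


def byte_to_char_offset(utf8: bytes, byte_offset: int) -> int:
    """Convert an offset into a UTF-8 buffer to an index into the decoded string.

    Assumes byte_offset falls on a character boundary, as match offsets do here.
    """
    return len(utf8[:byte_offset].decode("utf-8"))


text = "Café: 15/08/1980"          # 'é' takes two bytes in UTF-8
utf8 = text.encode("utf-8")

# Matching on the byte buffer reports byte offsets.
match = re.search(rb"\d\d/\d\d/\d{4}", utf8)
byte_start, byte_end = match.span()

# Translate the byte offsets back into character indexes.
char_start = byte_to_char_offset(utf8, byte_start)
char_end = byte_to_char_offset(utf8, byte_end)

print(repr(text[byte_start:byte_end]))  # byte offsets used as indexes: shifted slice
print(repr(text[char_start:char_end]))  # converted offsets: the intended match
```

Each multi-byte character before the match shifts the byte offsets by one or more positions, so pure ASCII text would not show the problem at all.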
| ||
Koriolis, that's possible. It didn't use to support UTF-8 the way it does now. And I see that the newest version is even more Unicode friendly. I'll need to revamp the string conversion - hopefully that won't be a huge job!
| ||
I found a bug. I'm using the SVN version by the way.

    Import bah.regex

    Local strs:String[] = ["test1", "test2", "test3", "test4"]
    Local text:String = "Just a test1 string with test, test2, test3 and ofcourse test 4"

    For Local str:String = EachIn strs
        Try
            Local regex:TRegEx = TRegEx.Create(str)
            text = regex.Replace(text, "`" + str + "`")
            regex = Null
        Catch ex:TRegExException
            Notify ex.toString()
        Catch ex:Object
            Notify ex.ToString()
        End Try
    Next

    Print text

It gives an Unhandled Memory Exception error. I scanned the module and couldn't see the data member pcre (Byte Ptr) being freed anywhere. I'm not sure if the problem is there though. When I remove the Try/Catch blocks, it sends me to regex.bmx on line 190.
| ||
Sadly (or not!), this example works for me (on Mac). I'll give it a try elsewhere and see if I can make it break. EDIT : working on Linux too. |
| ||
"BTW, I noticed a bug lately: when doing replaceAll on a non-ascii text, everything is screwed."
Bit of a late fix.... but I think it's working properly now with non-ASCII text. Well, the offsets seem to be correct now. May still require some work though.
| ||
Have you tried my bug on Windows XP or Vista? |
| ||
Yep, and have just applied a fix :-) |
| ||
I'll try it soon. Nice to see that it's fixed. (It is, right?)
| ||
I hope so :-) Certainly, it crashed on XP as described before I changed the code, and now it doesn't... |
| ||
Updated to PCRE 7.8, and fixed another UTF-8 issue. UTF-8 support seems to be quite solid now.
| ||
Finally had some time to test it. It works for me now. Thanks! |