BaH.RegEx - Regular Expressions

Brucey(Posted 2008) [#1]
BaH.RegEx has been updated to PCRE 7.4, although I've just noticed that 7.6 was released the other day, so expect another update in the not-so-distant future :-p

For the uninitiated, Regular Expressions are an extremely powerful method of searching and replacing in strings.

Here's a complex example, just to show you what it can do :

Imagine you wanted to parse a string for dates in the format dd/mm/yyyy or dd/mm/yy, and replace each of those entries in the string with just the year, inside quotes. Writing a program to do that by hand would be quite involved.

Now let's look at such a program using Regular Expressions :
SuperStrict

Framework BaH.RegEx
Import BRL.StandardIO

Local date:String = "The dates are: 30/12/1969, 06/04/1974 and 15/08/1980"
Print "Original : " + date + "~n"

Local regex:TRegEx = TRegEx.Create("(\d\d)[-/](\d\d)[-/](\d\d(?:\d\d)?)")

Print regex.replaceAll(date, "~q\3~q")

would result in the output :
Original : The dates are: 30/12/1969, 06/04/1974 and 15/08/1980

The dates are: "1969", "1974" and "1980"
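
For anyone who wants to poke at the pattern without BlitzMax to hand, here is a rough equivalent in Python's standard re module. It is only a sketch for experimenting with the expression, not code from the module, and the sample string is made up for illustration. The pattern itself is the one from the example above: two digits, a "-" or "/" separator, two digits, another separator, then a two- or four-digit year captured as group 3.

```python
import re

# Same expression as in the BlitzMax example above.
#   (\d\d)            group 1: day
#   [-/]              separator, either "-" or "/"
#   (\d\d)            group 2: month
#   [-/]              separator
#   (\d\d(?:\d\d)?)   group 3: year, two digits plus an optional two more
DATE_PATTERN = re.compile(r"(\d\d)[-/](\d\d)[-/](\d\d(?:\d\d)?)")

# Hypothetical sample text, chosen to show both separators and a short year.
sample = "Released 25/01/2008, patched 03/02/08 and 14-09-2008"

# \3 in the replacement refers to group 3, i.e. the year only.
years_only = DATE_PATTERN.sub(r'"\3"', sample)

print(years_only)
```

Note that the pattern only checks the shape of the date, not whether the day and month are valid.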


The difficult part is learning how to write the regular expression, but there are good tutorials online, as well as an in-depth guide included with the module.

Have Fun ! :-)


xlsior(Posted 2008) [#2]
Thanks -- your regex module has definitely come in handy in the past.

Any specific changes/improvements in this new version?


xlsior(Posted 2008) [#3]
By the way: Windows XP's zipfolders had problems extracting the files from your zip file, although WinRAR worked OK.


Brucey(Posted 2008) [#4]
Windows XP's zipfolders had problems extracting the files

Thanks for that... They were compressed on Mac (using the built-in context menu), and unzipped fine here on Win2k with Winzip so I hadn't noticed.

Any specific changes/improvements in this new version?

Just the update to the core library, in which they tend to fix bugs, improve speed, and generally make it better here and there.
No changes to the Max-side of things.


xlsior(Posted 2008) [#5]
Thanks for that... They were compressed on Mac (using the built-in context menu), and unzipped fine here on Win2k with Winzip so I hadn't noticed.


I have a feeling that XP might have an issue with both a folder and a file with the same name in the main folder... (regex.mod)

I just noticed that some of your other mods have the same problem.


Brucey(Posted 2008) [#6]
XP might have an issue with both a folder and a file with the same name in the main folder

Except there aren't any files called regex.mod :-)

I've just tried unzipping the regex_1_03_src.zip folder in Win2k using the built-in "Extract Here" context menu, and it appeared to work as expected.


xlsior(Posted 2008) [#7]
When I double-click on the regex_1_03_src.zip file in XP on my computer, it opens a compressed folder containing both a folder named regex.mod and a file named regex.mod

When I try to extract the file, it tells me 'the system cannot find the file specified'... Yet I do see it. :-?


North(Posted 2008) [#8]
Use Winzip

@Brucey - Thanks for the update! :)


Koriolis(Posted 2008) [#9]
Thanks for this module Brucey (and all the rest :) ).
BTW, I noticed a bug lately: when doing replaceAll on non-ASCII text, everything is screwed. IIRC it seemed to be due to the fact that PCRE gives you indexes that relate to the UTF-8 buffer, but you use these indexes to index the BlitzMax string. Hence the mismatch.
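
The mismatch described above can be shown in a few lines. This is a Python sketch of the general problem, not the module's actual code. The helper name byte_offset_to_char_index is made up for illustration, and Python indexes strings by code point, which matches BlitzMax's 16-bit characters only for text inside the Basic Multilingual Plane.

```python
def byte_offset_to_char_index(utf8: bytes, byte_offset: int) -> int:
    """Convert an offset into a UTF-8 buffer to an index into the decoded string.

    Hypothetical helper: decodes everything before the offset and counts
    the characters. Simple rather than fast.
    """
    return len(utf8[:byte_offset].decode("utf-8"))


text = "café test1"
utf8 = text.encode("utf-8")  # "é" takes two bytes in UTF-8

# A byte-oriented matcher reports where the match starts in the UTF-8 buffer.
byte_start = utf8.find(b"test1")

# The position in the string itself is one less, because of the 2-byte "é".
char_start = text.find("test1")

# Using the byte offset directly on the string picks the wrong slice.
wrong = text[byte_start:byte_start + 5]

# Converting the offset first gives the intended slice.
fixed_start = byte_offset_to_char_index(utf8, byte_start)
right = text[fixed_start:fixed_start + 5]

print(byte_start, char_start, repr(wrong), repr(right))
```

With pure ASCII text the two offsets are identical, which is why this kind of bug only shows up once accented or other multi-byte characters appear before the match.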


Brucey(Posted 2008) [#10]
Koriolis, that's possible. It didn't use to support UTF-8 the way it does now. And I see that the newest version is even more Unicode-friendly.
I'll need to revamp the string conversion - hopefully that won't be a huge job!


Htbaa(Posted 2008) [#11]
I found a bug. I'm using the SVN version by the way.

Import bah.regex

Local strs:String[] = ["test1", "test2", "test3", "test4"]
Local text:String = "Just a test1 string with test, test2, test3 and ofcourse test 4"

For Local str:String = EachIn strs
	Try
		Local regex:TRegEx = TRegEx.Create(str)
		text = regex.Replace(text, "`" + str + "`")
		regex = Null
	Catch ex:TRegExException
		Notify ex.toString()
	Catch ex:Object
		Notify ex.ToString()
	End Try
Next

Print text


It gives an Unhandled Memory Exception error. I scanned the module and couldn't see the data member pcre (Byte Ptr) being freed anywhere. I'm not sure if the problem is there, though.

When removing the Try/Catch blocks it sends me to regex.bmx on line 190.


Brucey(Posted 2008) [#12]
Sadly (or not!), this example works for me (on Mac).

I'll give it a try elsewhere and see if I can make it break.

EDIT : working on Linux too.


Brucey(Posted 2008) [#13]
BTW, I noticed a bug lately: when doing replaceAll on a non-ascii text, everything is screwed.

Bit of a late fix.... but I think it's working properly now with non-ASCII text. Well, the offsets seem to be correct now. May still require some work though.


Htbaa(Posted 2008) [#14]
Have you tried my bug on Windows XP or Vista?


Brucey(Posted 2008) [#15]
Yep, and have just applied a fix :-)


Htbaa(Posted 2008) [#16]
I'll try it soon. Nice to see that it's fixed. (It is, right?)


Brucey(Posted 2008) [#17]
I hope so :-)

Certainly, it crashed on XP as described before I changed the code, and now it doesn't...


Brucey(Posted 2008) [#18]
Updated to PCRE 7.8, and fixed another UTF-8 issue.

UTF-8 support seems to be quite solid now.


Htbaa(Posted 2008) [#19]
Finally had some time to test it. It works for me now. Thanks!