RE2 regular expression library

BlitzMax Forums/BlitzMax Programming/RE2 regular expression library

Otus(Posted 2010) [#1]
In case anyone is interested, I have wrapped the Google RE2 library for BlitzMax. It is under the new BSD (permissive) license, so I cannot post it in the code archives. The download is here.

Brucey has already wrapped the excellent PCRE library, but apparently there are some corner cases, where PCRE takes exponential time. RE2 gives the guarantee of linearly bounded runtime. I also expose a simpler API that I feel is more in BlitzMax style. RE2 supports almost all the same features (with the notable exception of backreferences).

From my minimal testing, compiling a RegExp object is 3-5x slower than with PCRE and matching with very small patterns and targets is also slightly slower. However, with long patterns and strings they are pretty much even, RE2 slightly faster.

I guess the case where you really should use this instead of PCRE is if you accept untrusted patterns, since the exponential runtime allows easy denial of service attacks.


Jim Teeuwen(Posted 2010) [#2]
PCRE is indeed not the most efficient parser. Specifically when using certain backreference constructs. You can watch the universe die and be born again before the pattern match finishes :p

RE2 does not support backreferences for that very reason as far as I am aware. At least the RE2 implementation in the Go language doesn't.


Otus(Posted 2010) [#3]
PCRE is indeed not the most efficient parser. Specifically when using certain backreference constructs. You can watch the universe die and be born again before the pattern match finishes :p

No back references needed:
SuperStrict

Framework BRL.StandardIO
Import Otus.RE2

Local b:Byte[300]
For Local i:Int = 0 Until 100
	b[2*i] = Asc("a")
	b[2*i+1] = Asc("?")
	b[i+200] = Asc("a")
Next

Local pattern:String = String.FromBytes(b, 300)
Local target:String = String.FromBytes(Byte Ptr(b)+200, 100)

Local t:Int = MilliSecs()
If Not RegExIsFullMatch(target, pattern) RuntimeError("Failed!")
Print MilliSecs()-t

Takes a few milliseconds with RE2. Would take on the order of 10^100 years with PCRE, if you had a high enough stack space limit.
(Black holes are estimated to evaporate in 10^100 years - long after all nucleons have decayed.)


Jim Teeuwen(Posted 2010) [#4]
Interesting. I was under the impression that Mono used a somewhat more reliable regex library than PCRE, but apparently it's the same deal.
You code throws it into a 500 billion year loop :p

The regex lib in Go has no problem with the above (completes program in 1.021 seconds)


xlsior(Posted 2010) [#5]
Can't compile it on my PC (windows)


Compiling:arena.cpp
Compiling:bitstate.cpp
Compiling:compile.cpp
Compiling:dfa.cpp
Compiling:filtered_re2.cpp
Compiling:hash.cpp
Compiling:mimics_pcre.cpp
Compiling:nfa.cpp
Compiling:onepass.cpp
Compiling:parse.cpp
C:/Code/BlitzMax/mod/otus.mod/re2.mod/re2/parse.cpp: In function `re2::UGroup* re2::LookupGroup(const re2::StringPiece&, re2::UGroup*, int)':
C:/Code/BlitzMax/mod/otus.mod/re2.mod/re2/parse.cpp:1012: error: no match for 'operator==' in '((+(((unsigned int)i) * 20u)) + groups)->re2::UGroup::name == name'
/mingw/lib/gcc/mingw32/3.4.5/../../../../include/objbase.h:79: note: candidates are: re2::BOOL re2::operator==(const re2::GUID&, const re2::GUID&)
C:/Code/BlitzMax/mod/otus.mod/re2.mod/re2/parse.cpp: In function `re2::UGroup* re2::LookupUnicodeGroup(const re2::StringPiece&)':
C:/Code/BlitzMax/mod/otus.mod/re2.mod/re2/parse.cpp:1034: error: no match for 'operator==' in 'name == "Any"'
/mingw/lib/gcc/mingw32/3.4.5/../../../../include/objbase.h:79: note: candidates
are: re2::BOOL re2::operator==(const re2::GUID&, const re2::GUID&)
Build Error: failed to compile C:/Code/BlitzMax/mod/otus.mod/re2.mod/re2/parse.cpp




Otus(Posted 2010) [#6]
Hmm, I don't get that error. Which MinGW version are you using?


Scaremonger(Posted 2014) [#7]
I know this is an old post: The library compiles fine, but importing the library crashes the app.