RegEx Mod Help with Custom Tag & Bracket Linefeed

BlitzMax Forums/Brucey's Modules/RegEx Mod Help with Custom Tag & Bracket Linefeed

RustyKristi(Posted 2016) [#1]
Usually it is easy to get the value between brackets with regex, but I need to extract the content of a bracketed block that spans several lines and is introduced by a custom tag:

MyTag {
"Content Here"
}

I'm looking to get Content Here, without the quotes.


Derron(Posted 2016) [#2]
Are you talking about bah.regex?

Looking at the code it might be like:
local regex:TRegex = ... 'might contain "namedExpression"
local regexMatch:TRegexMatch = regex.Find() 

local contentBegin:int = regexMatch.SubStart(0) 'start offset of the whole match
local contentEnd:int = regexMatch.SubEnd(0) 'end offset of the whole match
local content:string = ...
'or
local content:string = regexMatch.SubExpByName("namedExpression")


Example for content extraction:
https://github.com/maxmods/bah.mod/blob/master/regex.mod/tests/test_01.bmx

Example for "ByName":
https://github.com/maxmods/bah.mod/blob/master/regex.mod/tests/test_09.bmx


If the regex gets too complicated for you: split your content into "wrapper blocks" and run another regex on those blocks.
This is how many website-scrapers (thinking of Kodi/XBMC-addons) fetch their information from non-API-enabled-websites.
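A minimal sketch of this two-stage idea, written in Python rather than BlitzMax because the pattern syntax carries over largely unchanged. The names BLOCK, QUOTED and extract_blocks are made up for illustration and are not part of bah.regex; the sketch also assumes that blocks are never nested.

```python
import re

# Stage 1: isolate each "Tag { ... }" wrapper block (assumes no nested braces).
BLOCK = re.compile(r'(\w+)\s*\{([^{}]*)\}')
# Stage 2: pull the quoted value out of one block body.
QUOTED = re.compile(r'"([^"]*)"')

def extract_blocks(text):
    """Return (tag, value) pairs for every block that contains a quoted value."""
    results = []
    for block in BLOCK.finditer(text):
        tag, body = block.group(1), block.group(2)
        inner = QUOTED.search(body)
        if inner:
            results.append((tag, inner.group(1)))
    return results

sample = 'MyTag {\n"Content Here"\n}\nOther {\n"More"\n}\n'
print(extract_blocks(sample))
```

Keeping the two patterns separate means the inner one can be replaced (for example to read several values per block) without touching the block matcher.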


bye
Ron


Midimaster(Posted 2016) [#3]
Why not in "classic" BlitzMax?

Local a$="MyTag{" + Chr(13) + Chr(34) +"Content Here" +Chr(34) + "}"

Print "RESULT=" + Between( a$ , "{"+Chr(34) , Chr(34)+"}" )


Function Between$(Text$, Starts$, Ends$)
	Local Result$, From%, Too%
	Text=EliminateLineFeed(Text)
	From=Text.Find( Starts)
	If From=-1 Return ""

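	'take the first closing marker after the start, so that later blocks are not swallowed
	Too=Text.Find( Ends, From+Len(Starts))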
	If Too=-1 Return ""

	Result=Mid(Text,From+Len(Starts)+1, too-from-Len(Starts))
	Return Result
End Function


Function EliminateLineFeed$(Text$)
	Return	Text.Replace(Chr(13),"")
End Function



Derron(Posted 2016) [#4]
@ Midimaster
I assume MyTag etc. might change.

Else - for this "fixed code block" you might just do a

(untested)
local firstQuotePos:int = a.Find("~q")
local lastQuotePos:int = a.FindLast("~q")

'the alternative is shorter and should work too as "not found" results in -1
'if firstQuotePos <> lastQuotePos and firstQuotePos <> -1 and lastQuotePos <> -1 

if firstQuotePos < lastQuotePos 
  'Mid is 1-based and takes (start, length); skip the opening quote itself
  print "content: " + Mid(a, firstQuotePos+2, lastQuotePos-firstQuotePos-1)
endif



bye
Ron


RustyKristi(Posted 2016) [#5]
Thanks guys. Yes it's bah.regex.

Derron wrote: "I assume MyTag etc. might change."


Actually, I want to find multiple matches, and each match has to check for MyTag together with the paired brackets beside it.

If this is difficult with regex, I guess I'll just check line by line and look at the lines below each tag.
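For comparison, a rough sketch of that line-by-line fallback, in Python rather than BlitzMax. The function name scan_lines is hypothetical, and the sketch assumes the layout from the first post: the tag and the opening bracket share one line, the quoted value sits on its own line, and the closing bracket stands alone.

```python
def scan_lines(text, tag):
    """Collect quoted values from every 'tag {' ... '}' block, line by line."""
    values = []
    inside = False
    for line in text.splitlines():  # splitlines copes with LF and CRLF
        stripped = line.strip()
        if not inside:
            # A block opens on a line such as: MyTag {
            if (stripped.startswith(tag) and stripped.endswith("{")
                    and stripped[len(tag):-1].strip() == ""):
                inside = True
        elif stripped == "}":
            inside = False
        elif len(stripped) >= 2 and stripped[0] == '"' and stripped[-1] == '"':
            values.append(stripped[1:-1])  # drop the surrounding quotes
    return values
```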


Brucey(Posted 2016) [#6]
I haven't had time to try anything out, but here's something that may be useful: https://nikic.github.io/2011/12/10/PCRE-and-newlines.html

The key to multiline, I believe, is to get regex to handle the string as a single block of text - rather than the default behaviour which is to work on a line-by-line basis.

Also, in TRegExOptions you can play around with "dotMatchAll" and "targetIsMultiline".
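A minimal sketch of the single-block approach, in Python rather than BlitzMax. Python's re.DOTALL flag is used here as a stand-in for what "dotMatchAll" presumably does (let "." match newlines); that mapping is an assumption, and PATTERN and find_tagged are illustrative names only.

```python
import re

# re.DOTALL lets "." match newline characters, so the quoted content may span lines.
# \s already matches newlines, which covers the breaks around the brackets.
PATTERN = re.compile(r'(?P<tag>\w+)\s*\{\s*"(?P<content>.*?)"\s*\}', re.DOTALL)

def find_tagged(text):
    """Return (tag, content) for every Tag { "content" } block in text."""
    return [(m.group("tag"), m.group("content")) for m in PATTERN.finditer(text)]

sample = 'MyTag {\n"Content Here"\n}\nName {\r\n"Rusty"\r\n}\n'
for tag, content in find_tagged(sample):
    print(tag, "=", content)
```

The lazy ".*?" stops at the first quote that is followed by a closing bracket, so several blocks in one string yield separate matches.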


Midimaster(Posted 2016) [#7]
@Derron

I do not understand your criticism... My code also works when "MyTag" changes. It only looks for the limiting markers <{"> and <"}>. The a$="..." line is only a test sample.

And I think RustyKristi is not only searching for quotation marks, but for the combination of quotation marks and brackets. Therefore you need a cutting algorithm that also handles limiting markers longer than one character.

And I think he wanted to cut out the linefeeds too.


Derron(Posted 2016) [#8]
That was not meant as harsh criticism.


Regarding linefeeds: there is CRLF and there is LF. You might consider handling both.
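One common way to deal with this is to normalise the line endings once, before any searching. A short Python sketch (normalize_newlines is a made-up helper name):

```python
def normalize_newlines(text):
    """Convert CRLF and lone CR line endings to LF."""
    # Replace CRLF first so a Windows line ending does not turn into two breaks.
    return text.replace("\r\n", "\n").replace("\r", "\n")
```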


@ krusty
Regex might be slower than a tight, specialized string extraction. But if you only do this while loading, the regex route might be easier to extend.

Bye
Ron


RustyKristi(Posted 2016) [#9]
Yes, I figured. Thanks, I will try the linefeed thing now. Thank you guys.

@MidiMaster

I would like to try out your code, but I don't see where the raw source string goes in?


Midimaster(Posted 2016) [#10]
a$ is only the sample "input string". Replace a$ in the call to Between with your own text:
Global text$=LoadText("...")
Print "RESULT=" + Between( text , "MyTag{"+Chr(34) , Chr(34)+"}" )


If you are searching for several command words, wrap Between in a third function:
Global Text$=LoadText("...")
Print "MYTAG=" + SearchFor(Text$, "MyTag")
Print "NAME=" + SearchFor(Text$, "Name")

Function SearchFor$(Text$, CommandWord$)
	Return Between( Text , CommandWord+"{"+Chr(34) , Chr(34)+"}" )
End Function


It is no problem to extend the EliminateLineFeed function to handle CRLF as well:
Function EliminateLineFeed$(Text$)
	Text=Text.Replace(Chr(10),"")
	Return Text.Replace(Chr(13),"")
End Function


@Derron
Sorry, I did not want to offend. I did not mean that your answer was "harsh" when I wrote "criticism" (I meant it in the sense of "constructive criticism"). My main intention was to say "I do not understand"!


RustyKristi(Posted 2016) [#11]
Thanks MidiMaster! I will try out your updated example.