RegEx Mod Help with Custom Tag & Bracket Linefeed
| ||
Usually it is easy to get the value between brackets with regex, but I need to extract the contents of a bracketed block that follows a custom tag and is spread over several lines:

    MyTag { "Content Here" }

I am looking to get Content Here, without the quotes. |
| ||
You are talking about bah.regex? Looking at the code, it might be something like:

    local regex:TRegex = ... 'might contain "namedExpression"
    local regexMatch:TRegexMatch = regex.Find()
    'SubStart/SubEnd are positions in the target string, hence ints
    local contentBegin:int = regexMatch.SubStart(0)
    local contentEnd:int = regexMatch.SubEnd(0)
    local content:string = ...
    'or
    local content:string = regexMatch.SubExpByName("namedExpression")

Example for content extraction: https://github.com/maxmods/bah.mod/blob/master/regex.mod/tests/test_01.bmx
Example for "ByName": https://github.com/maxmods/bah.mod/blob/master/regex.mod/tests/test_09.bmx

If the regex gets too complicated for you: split your content into "wrapper blocks" and run another regex on those blocks. This is how many website scrapers (thinking of Kodi/XBMC add-ons) fetch their information from websites that offer no API.

bye Ron |
| ||
Why not in "classic" BlitzMax?

    Local a$="MyTag{" + Chr(13) + Chr(34) +"Content Here" +Chr(34) + "}"
    Print "RESULT=" + Between( a$ , "{"+Chr(34) , Chr(34)+"}" )

    Function Between$(Text$, Starts$, Ends$)
        Local Result$, From%, Too%
        Text=EliminateLineFeed(Text)
        From=Text.Find( Starts)
        If From=-1 Return ""
        Too=Text.FindLast( Ends)
        If Too=-1 Return ""
        Result=Mid(Text,From+Len(Starts)+1, too-from-Len(Starts))
        Return Result
    End Function

    Function EliminateLineFeed$(Text$)
        Return Text.Replace(Chr(13),"")
    End Function
|
| ||
@ Midimaster I assume MyTag etc. might change. Otherwise, for this "fixed code block", you might just do (untested):

    local firstQuotePos:int = a.Find("~q")
    local lastQuotePos:int = a.FindLast("~q")
    'the alternative is shorter and should work too as "not found" results in -1
    'if firstQuotePos <> lastQuotePos and firstQuotePos <> -1 and lastQuotePos <> -1
    if firstQuotePos < lastQuotePos
        'Find() is 0-based, Mid() is 1-based and takes a length as third parameter
        print "content: " + Mid(a, firstQuotePos+2, lastQuotePos-firstQuotePos-1)
    endif

bye Ron |
| ||
Thanks guys. Yes, it's bah.regex. "I assume MyTag etc. might change." Actually, I want to find multiple matches, and each match has to check for MyTag with the paired brackets next to it. If this is difficult with regex, I guess I'll just check line by line and then look at the lines below each tag. |
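For the "multiple matches" case, a plain string scan might look roughly like the sketch below. FindAllBlocks and SkipWhitespace are hypothetical helper names, not something from this thread or from any module, and the sample string is made up. It only uses the standard String methods (Find, FindLast, slices) and assumes the blocks are not nested.

```blitzmax
SuperStrict

' Made-up sample input: two MyTag blocks and one block with another tag.
Local sample:String = "MyTag~n{~n~qContent Here~q~n}~nOther~n{~n~qSkip~q~n}~nMyTag { ~qSecond~q }"

For Local content:String = EachIn FindAllBlocks(sample, "MyTag")
	Print content
Next

' Hypothetical helper: returns the quoted content of every block that
' directly follows the given tag (only whitespace allowed in between).
Function FindAllBlocks:String[](text:String, tag:String)
	Local results:String[]
	Local pos:Int = 0
	While True
		Local tagPos:Int = text.Find(tag, pos)
		If tagPos = -1 Then Exit
		pos = tagPos + tag.Length
		' the next non-whitespace character must be the opening bracket
		Local openPos:Int = SkipWhitespace(text, pos)
		If openPos >= text.Length Or text[openPos] <> Asc("{") Then Continue
		Local closePos:Int = text.Find("}", openPos)
		If closePos = -1 Then Exit
		' first quote after the bracket, last quote before the closing bracket
		Local firstQuote:Int = text.Find("~q", openPos)
		Local lastQuote:Int = text[..closePos].FindLast("~q")
		If firstQuote <> -1 And firstQuote < lastQuote
			results = results + [text[firstQuote + 1..lastQuote]]
		EndIf
		pos = closePos + 1
	Wend
	Return results
End Function

' Hypothetical helper: advances pos past spaces, tabs, CR and LF.
Function SkipWhitespace:Int(text:String, pos:Int)
	While pos < text.Length And text[pos] <= 32
		pos :+ 1
	Wend
	Return pos
End Function
```

Because whitespace is skipped rather than removed, this handles both CRLF and LF without changing the text. Note that it matches the tag as a plain substring, so a tag like "MyTag" would also be found inside "NotMyTag".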
| ||
I haven't had time to try anything out, but here's something that may be useful: https://nikic.github.io/2011/12/10/PCRE-and-newlines.html The key to multiline, I believe, is to get regex to handle the string as a single block of text - rather than the default behaviour which is to work on a line-by-line basis. Also, in TRegExOptions you can play around with "dotMatchAll" and "targetIsMultiline". |
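As an untested sketch of the regex route: the pattern below is my own assumption, not something posted in this thread, and the calls (TRegEx.Create, Find, SubExp, TRegExException) are the bah.regex API as I understand it from the module's tests, so please check them against your version. Since \s already matches line breaks, this particular pattern should not need "dotMatchAll" at all; that option only matters once you use "." to span lines.

```blitzmax
SuperStrict

Import BaH.RegEx

' Made-up sample input with one multi-line and one single-line block.
Local sample:String = "MyTag~n{~n~qContent Here~q~n}~nMyTag { ~qSecond~q }"

' Group 1 is the tag name, group 2 the text between the quotes.
' ~q is BlitzMax's escape for a double quote; backslashes need no escaping.
Local regex:TRegEx = TRegEx.Create("(\w+)\s*\{\s*~q([^~q]*)~q\s*\}")

Try
	Local match:TRegExMatch = regex.Find(sample)
	While match
		' only report blocks that belong to the tag we are interested in
		If match.SubExp(1) = "MyTag" Then Print match.SubExp(2)
		' calling Find() without a target continues after the last match
		match = regex.Find()
	Wend
Catch e:TRegExException
	Print "Regex error: " + e.toString()
End Try
```

Capturing the tag name as its own group keeps the pattern reusable when MyTag changes; you only change the comparison, not the expression.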
| ||
@Derron I do not understand your criticism... My code also works when "MyTag" changes. It only looks for the limiting markers like <{"> and <"}>. The a$="..." line is only a test sample. And I think RustyChristi is not only searching for quotation marks, but for the combination of quotation marks and brackets. Therefore you need a cutting algorithm that also handles limiting markers longer than one character. And I think he wanted to cut out the linefeeds too. |
| ||
That was not meant as harsh criticism. Regarding linefeeds: there is CRLF and LF, so you might consider handling both. @ krusty Regex might be slower than a tight, specialized string extraction. But if you only do this during loading, the regex route might be more flexible to expand. Bye Ron |
| ||
Yes, I figured. Thanks, I will try the linefeed thing now. Thank you guys. @MidiMaster I would like to try out your code, but I don't see where the raw source string goes in? |
| ||
a$ is the sample for the "input string". Replace a$ in the call to the BETWEEN function with your text:

    Global Text$=LoadText("...")
    Print "RESULT=" + Between( Text , "MyTag{"+Chr(34) , Chr(34)+"}" )

If you are searching for several command words, wrap BETWEEN in a third function:

    Global Text$=LoadText("...")
    Print "MYTAG=" + SearchFor(Text$, "MyTag")
    Print "NAME=" + SearchFor(Text$, "Name")

    Function SearchFor$(Text$, CommandWord$)
        Return Between( Text , CommandWord+"{"+Chr(34) , Chr(34)+"}" )
    End Function

It is no problem to extend the ELIMINATE function to CRLF as well:

    Function EliminateLineFeed$(Text$)
        Text=Text.Replace(Chr(10),"")
        Return Text.Replace(Chr(13),"")
    End Function

@Derron Sorry, I did not want to offend. When I wrote "criticism", I did not mean that your answer was "harsh" (I meant it in the sense of "constructive criticism"). My main intention was "do not understand"! |
| ||
Thanks MidiMaster! I will try out your updated example. |