Something along the lines of a lexical analyzer
Well, as the title states, I have come up with something along the lines of a lexical analyzer. It works on Linux (Ubuntu 11.10, 64-bit). For those of you who do not know what a lexical analyzer is: it breaks source code up into chunks (a.k.a. tokens) and hands them to a parser to parse. A good example is Flex (which generates a lexical analyzer) together with Bison (which generates a parser). They work together flawlessly, but I personally HATE C. I hope my code doesn't suck too bad for 2 AM on a weeknight (yes, I didn't comment it, I just wanted to get it done)...

    Const CNULL:Int = -1
    Const CSTATEMENT:Int = 0
    Const CINTEGER:Int = 1
    Const CADD:Int = 2
    Const CSUB:Int = 3

    Type Token
        Field typ:Int
        Field txt:String

        Function Create:Token(typ1:Int,txt1:String)
            Local toret:Token = New Token
            toret.typ = typ1
            toret.txt = txt1
            Return toret
        End Function
    End Type

    Function Lex:Token(statement:String)
        DebugLog "Lex"
        Local tokens:Token[] = New Token[3]
        For Local x:Int = 0 To 2 Step 1
            tokens[x] = Token.Create(CNULL,"")
        Next
        Tokenize22(statement,tokens)
        For Local x:Int = 0 To 2 Step 1
            DebugLog tokens[x].typ
            DebugLog tokens[x].txt
        Next
        If tokens[0].typ = CNULL Then
            llerror("Lexical Analyzer Error (Internal)!")
        EndIf
        If tokens[0].typ <> CINTEGER Then
            llerror("Invalid Token!")
        EndIf
        If tokens[1].typ = CNULL Then
            Return tokens[0]
        EndIf
        If tokens[2].typ = CNULL Then
            llerror("Bad Number Of Tokens!")
        EndIf
        If tokens[2].typ = CSTATEMENT Then
            Local ttok:Token = Lex(tokens[2].txt)
            tokens[2].typ = CINTEGER
            tokens[2].txt = ttok.txt
            Return Token.Create(CINTEGER,Parse(tokens[0].typ,tokens[0].txt,tokens[1].typ,tokens[2].txt))
        EndIf
    End Function

    Function Parse:String(typ:Int,txt1:String,modifier:Int,txt2:String)
        Local toret:String
        Select typ
            Case CINTEGER
                Local txt1_1:Int = Int(txt1)
                Local txt2_1:Int = Int(txt2)
                Select modifier
                    Case CADD
                        toret = String(txt1_1 + txt2_1)
                    Case CSUB
                        toret = String(txt1_1 - txt2_1)
                    Default
                        llerror("Unknown Type of Modifier For Stuff To Parse!")
                End Select
            Default
                llerror("Unknown Type of Stuff To Parse!")
        End Select
        Return toret
    End Function

    Function Tokenize22(statement:String,tokens:Token[] Var)
        Local mytokens:Byte = 0
        Local inint:Byte = 0
        For Local x:Int = 1 To Len(statement) Step 1
            If inint = 1 Then
                If Mid(statement,x,1) <> " " Then
                    tokens[mytokens].txt :+ Mid(statement,x,1)
                    Continue
                Else
                    inint = 0
                    mytokens :+ 1
                    Continue
                EndIf
            EndIf
            If mytokens = 2 Then
                tokens[2].typ = CSTATEMENT
                tokens[2].txt = Mid(statement,x)
                Return
            EndIf
            Select Mid(statement,x,1)
                Case " "
                    Continue
                Case "+"
                    tokens[mytokens].typ = CADD
                    mytokens :+ 1
                Case "-"
                    tokens[mytokens].typ = CSUB
                    mytokens :+ 1
                Case "1","2","3","4","5","6","7","8","9","0"
                    inint = 1
                    tokens[mytokens].typ = CINTEGER
                    tokens[mytokens].txt = Mid(statement,x,1)
                Default
                    llerror("Unknown Character!")
            End Select
        Next
    End Function

    Function llerror(text:String)
        RuntimeError text
    End Function

I am open to any suggestions!
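A couple of suggestions, for what they are worth. As posted, Tokenize22 appends every non-space character to the integer it is reading, so "1+2" written without spaces ends up as a single integer token. Lex also evaluates the rest of the statement first (through recursion), so "5 - 2 - 1" comes out as 5 - (2 - 1) = 4 rather than 2. The sketch below is one possible alternative, not a drop-in replacement: it scans the whole statement into a list in one pass and then folds it left to right. All names in it (TTok, Scan, Evaluate, the TK_ constants) are made up for the example.

```blitzmax
SuperStrict

' Token kinds for this sketch (hypothetical names, not from the post above).
Const TK_INT:Int = 1
Const TK_ADD:Int = 2
Const TK_SUB:Int = 3

Type TTok
	Field kind:Int
	Field text:String
End Type

' Scan the whole statement in one pass and return every token found.
Function Scan:TList(src:String)
	Local toks:TList = New TList
	Local i:Int = 0
	While i < src.Length
		Local ch:Int = src[i]
		If ch = Asc(" ") Then
			i :+ 1
			Continue
		EndIf
		Local t:TTok = New TTok
		If ch >= Asc("0") And ch <= Asc("9") Then
			' Consume digits only, so "1+2" splits into three tokens.
			Local start:Int = i
			While i < src.Length
				If src[i] < Asc("0") Or src[i] > Asc("9") Then Exit
				i :+ 1
			Wend
			t.kind = TK_INT
			t.text = src[start..i]
		Else
			If ch = Asc("+") Then
				t.kind = TK_ADD
			ElseIf ch = Asc("-") Then
				t.kind = TK_SUB
			Else
				RuntimeError "Unknown character: " + Chr(ch)
			EndIf
			t.text = Chr(ch)
			i :+ 1
		EndIf
		toks.AddLast(t)
	Wend
	Return toks
End Function

' Fold the tokens left to right: integer, operator, integer, operator, ...
Function Evaluate:Int(src:String)
	Local result:Int = 0
	Local pendingOp:Int = TK_ADD
	Local expectInt:Int = True
	For Local t:TTok = EachIn Scan(src)
		If expectInt Then
			If t.kind <> TK_INT Then RuntimeError "Expected an integer, got '" + t.text + "'"
			If pendingOp = TK_ADD Then result :+ Int(t.text) Else result :- Int(t.text)
		Else
			If t.kind = TK_INT Then RuntimeError "Expected an operator, got '" + t.text + "'"
			pendingOp = t.kind
		EndIf
		expectInt = Not expectInt
	Next
	If expectInt Then RuntimeError "Statement is empty or ends with an operator"
	Return result
End Function

Print Evaluate("5 - 2 - 1") ' prints 2
Print Evaluate("1+2")       ' prints 3
```

Keeping the scanner and the evaluator separate also means the token list could later be handed to a real parser instead of being folded on the spot.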
Looks like good fun. Have you thought about how to extend it into a more generic framework? (I'm too lazy to write efficient lexer code by hand, so my own scanners invariably end up making heavy use of regular expressions.) What would be really cool is if it were possible to devise some kind of parser framework (think Spirit, rather than a parser generator like Bison) for BlitzMax. While I have a few ideas, the lack of much in the way of metaprogramming options makes it hard to come up with something that's both suitably expressive and fast (I wonder if using something like C macros counts as "cheating" when writing BlitzMax code?).
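To make the parser-framework idea a little more concrete, here is a rough sketch of what a Spirit-flavoured approach might look like without any metaprogramming: grammars are assembled at run time from small parser objects. Everything here (TParser, TAnyOf, TSeq, TMany and the AnyOf/Seq/Many helpers) is invented for the illustration. It only recognizes input, with no semantic actions or error reporting, and every step costs a virtual method call, so it shows the expressiveness side of the trade-off rather than the speed side.

```blitzmax
SuperStrict

' A parser tries to match src starting at pos.
' It returns the position after the match, or -1 on failure.
Type TParser Abstract
	Method Parse:Int(src:String, pos:Int) Abstract
End Type

' Matches exactly one character out of a set.
Type TAnyOf Extends TParser
	Field chars:String
	Method Parse:Int(src:String, pos:Int)
		If pos >= src.Length Then Return -1
		If chars.Find(Chr(src[pos])) < 0 Then Return -1
		Return pos + 1
	End Method
End Type

' Matches each part in order; fails as soon as one part fails.
Type TSeq Extends TParser
	Field parts:TParser[]
	Method Parse:Int(src:String, pos:Int)
		For Local p:TParser = EachIn parts
			pos = p.Parse(src, pos)
			If pos < 0 Then Return -1
		Next
		Return pos
	End Method
End Type

' Matches the inner parser zero or more times; never fails.
Type TMany Extends TParser
	Field inner:TParser
	Method Parse:Int(src:String, pos:Int)
		Repeat
			Local nxt:Int = inner.Parse(src, pos)
			If nxt <= pos Then Return pos ' failed, or matched nothing
			pos = nxt
		Forever
	End Method
End Type

' Small factory helpers so grammars read a bit like grammar rules.
Function AnyOf:TParser(chars:String)
	Local p:TAnyOf = New TAnyOf
	p.chars = chars
	Return p
End Function

Function Seq:TParser(parts:TParser[])
	Local p:TSeq = New TSeq
	p.parts = parts
	Return p
End Function

Function Many:TParser(inner:TParser)
	Local p:TMany = New TMany
	p.inner = inner
	Return p
End Function

' True (1) only if the parser consumes the whole input.
Function Matches:Int(p:TParser, src:String)
	Return p.Parse(src, 0) = src.Length
End Function

' Grammar: expr := ws number (ws op ws number)* ws
Local digit:TParser = AnyOf("0123456789")
Local number:TParser = Seq([digit, Many(digit)])
Local op:TParser = AnyOf("+-")
Local ws:TParser = Many(AnyOf(" "))
Local tail:TParser = Many(Seq([ws, op, ws, number]))
Local expr:TParser = Seq([ws, number, tail, ws])

Print Matches(expr, "1 + 22-3") ' prints 1: the whole input is a valid expression
Print Matches(expr, "1 +")      ' prints 0: an operand is missing
```

Returning a value from each parser instead of just a position would be the obvious next step, and probably where the lack of generics or templates starts to hurt.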