Something along the lines of a lexical analyzer

Luke111(Posted 2011) [#1]
Well, I have come up with something along the lines of a lexical analyzer, as the title states. It works on Linux (Ubuntu 11.10, 64-bit).

For those of you who do not know what a lexical analyzer is, it breaks source code up into chunks (a.k.a. tokens) and hands them to a parser; for example, "1 + 2" becomes an integer token, an add token, and another integer token. A good example of the pair is Flex (which generates a lexical analyzer) and Bison (which generates a parser). They work together flawlessly, but I personally HATE C.

I hope my code doesn't suck too badly for 2 AM on a weeknight (yes, it's uncommented; I just wanted to get it done)...

Const CNULL:Int = -1
Const CSTATEMENT:Int = 0
Const CINTEGER:Int  = 1
Const CADD:Int = 2
Const CSUB:Int = 3

Type Token
	Field typ:Int
	Field txt:String
	Function Create:Token(typ1:Int,txt1:String)
		Local toret:Token = New Token
		toret.typ = typ1
		toret.txt = txt1
		Return toret
	End Function
End Type

' Lex: tokenizes a statement, then folds it down to a single CINTEGER token,
' recursing into any trailing sub-statement.
Function Lex:Token(statement:String)
	DebugLog "Lex"
	Local tokens:Token[] = New Token[3]
	For Local x:Int = 0 To 2 Step 1
		tokens[x] = Token.Create(CNULL,"")
	Next
	Tokenize22(statement,tokens)
	For Local x:Int = 0 To 2 Step 1
		DebugLog tokens[x].typ
		DebugLog tokens[x].txt
	Next
	If tokens[0].typ = CNULL Then
		llerror("Lexical Analyzer Error (Internal)!")
	EndIf
	If tokens[0].typ <> CINTEGER Then
		llerror("Invalid Token!")
	EndIf
	If tokens[1].typ = CNULL Then
		Return tokens[0]
	EndIf
	If tokens[2].typ = CNULL Then
		llerror("Bad Number Of Tokens!")
	EndIf
	If tokens[2].typ = CSTATEMENT Then
		Local ttok:Token = Lex(tokens[2].txt)
		tokens[2].typ = CINTEGER
		tokens[2].txt = ttok.txt
		Return Token.Create(CINTEGER,Parse(tokens[0].typ,tokens[0].txt,tokens[1].typ,tokens[2].txt))
	EndIf
End Function

' Parse: converts both operand strings to Ints, applies the modifier (CADD or
' CSUB) and returns the result as a String.
Function Parse:String(typ:Int,txt1:String,modifier:Int,txt2:String)
	Local toret:String
	Select typ
		Case CINTEGER
			Local txt1_1:Int = Int(txt1)
			Local txt2_1:Int = Int(txt2)
			Select modifier
				Case CADD
					toret = String(txt1_1 + txt2_1)
				Case CSUB
					toret = String(txt1_1 - txt2_1)
				Default
					llerror("Unknown Type of Modifier For Stuff To Parse!")
			End Select
		Default
			llerror("Unknown Type of Stuff To Parse!")
	End Select
	Return toret
End Function

' Tokenize22: fills tokens[0..2]. Digits build up a CINTEGER token, + and -
' become CADD/CSUB tokens, and whatever is left after the second token is
' stored whole in tokens[2] as a CSTATEMENT for Lex to recurse on.
Function Tokenize22(statement:String,tokens:Token[] Var)
	Local mytokens:Byte = 0
	Local inint:Byte = 0
	For Local x:Int = 1 To Len(statement) Step 1
		If inint = 1 Then
			If Mid(statement,x,1) <> " " Then
				tokens[mytokens].txt :+ Mid(statement,x,1)
				Continue
			Else
				inint = 0
				mytokens :+ 1
				Continue
			EndIf
		EndIf
		If mytokens = 2 Then
			tokens[2].typ = CSTATEMENT
			tokens[2].txt = Mid(statement,x)
			Return
		EndIf
		Select Mid(statement,x,1)
			Case " "
				Continue
			Case "+"
				tokens[mytokens].typ = CADD
				mytokens :+ 1
			Case "-"
				tokens[mytokens].typ = CSUB
				mytokens :+ 1
			Case "1","2","3","4","5","6","7","8","9","0"
				inint = 1
				tokens[mytokens].typ = CINTEGER
				tokens[mytokens].txt = Mid(statement,x,1)
			Default
				llerror("Unknown Character!")
		End Select
	Next
End Function

Function llerror(text:String)
	RuntimeError text
End Function
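
A minimal way to try it out, assuming all of the code above is in one source file, would be something like:

Local result:Token = Lex("1 + 2")
Print result.txt                ' should print 3

' Lex recurses on everything after the first operator, so chains group to the
' right: "5 - 2 + 1" is evaluated as 5 - (2 + 1).
Print Lex("5 - 2 + 1").txt      ' should print 2, not 4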


I am open to any suggestions!


Yasha(Posted 2011) [#2]
Looks like good fun. Have you thought about how to extend it into a more generic framework? (I'm too lazy to write efficient lexer code by hand, so my own scanners invariably end up making heavy use of regular expressions.)
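
For instance, one direction for making it more generic (purely a sketch: it assumes the Token type, the C* constants and llerror from the code above are in scope, and LexAll is just a made-up name) is to drop the fixed three-slot array and return a TList of tokens, which also copes with input written without spaces, like "12+34-5":

Function LexAll:TList(statement:String)
	Local tokens:TList = CreateList()
	Local x:Int = 1
	While x <= Len(statement)
		Select Mid(statement,x,1)
			Case " "
				x :+ 1
			Case "+"
				tokens.AddLast(Token.Create(CADD,"+"))
				x :+ 1
			Case "-"
				tokens.AddLast(Token.Create(CSUB,"-"))
				x :+ 1
			Case "0","1","2","3","4","5","6","7","8","9"
				' Collect every consecutive digit into one integer token.
				Local num:String = ""
				While x <= Len(statement) And Instr("0123456789",Mid(statement,x,1)) > 0
					num :+ Mid(statement,x,1)
					x :+ 1
				Wend
				tokens.AddLast(Token.Create(CINTEGER,num))
			Default
				llerror("Unknown Character!")
		End Select
	Wend
	Return tokens
End Function

' Example: prints the type code and text of each token in "12 + 34 - 5".
For Local t:Token = EachIn LexAll("12 + 34 - 5")
	Print String(t.typ) + " " + t.txt
Next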

What would be really cool is if it were possible to devise some kind of parser-framework (think Spirit, rather than a parser-generator like Bison) for BlitzMax. While I have a few ideas, the lack of much in the way of metaprogramming options makes it hard to come up with something that's both suitably expressive and fast (I wonder if using something like C macros counts as "cheating" when writing BlitzMax code?).
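
Just to make that concrete, here is one very rough shape such a framework could take with nothing but plain types and virtual methods. TParser, TLit and TSeq are invented names, there is no error reporting, and a real framework would also need choice, repetition and semantic actions, so treat it purely as a sketch of the style rather than a design:

Type TParser Abstract
	' Try to match src at (zero-based) position pos; on success advance pos and return True.
	Method Match:Int(src:String, pos:Int Var) Abstract
End Type

' Matches one fixed literal string.
Type TLit Extends TParser
	Field lit:String
	Function Create:TLit(lit:String)
		Local p:TLit = New TLit
		p.lit = lit
		Return p
	End Function
	Method Match:Int(src:String, pos:Int Var)
		If src[pos..pos + lit.length] = lit Then
			pos :+ lit.length
			Return True
		EndIf
		Return False
	End Method
End Type

' Matches its sub-parsers one after another, rewinding pos if any of them fails.
Type TSeq Extends TParser
	Field parts:TParser[]
	Function Create:TSeq(parts:TParser[])
		Local p:TSeq = New TSeq
		p.parts = parts
		Return p
	End Function
	Method Match:Int(src:String, pos:Int Var)
		Local start:Int = pos
		For Local part:TParser = EachIn parts
			If Not part.Match(src, pos) Then
				pos = start
				Return False
			EndIf
		Next
		Return True
	End Method
End Type

' Example: a parser for the literal sequence "1", "+", "2".
Local parts:TParser[] = New TParser[3]
parts[0] = TLit.Create("1")
parts[1] = TLit.Create("+")
parts[2] = TLit.Create("2")
Local expr:TParser = TSeq.Create(parts)

Local pos:Int = 0
If expr.Match("1+2", pos) Then Print "matched up to position " + pos

The expressiveness side seems reachable this way; whether it can be made fast enough without dropping to macros or generated code is the harder part.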