BlitzMax Lexer Module
This code has been declared by its author to be Public Domain code.
This source code is now available under the zlib license at http://github.com/nilium/cower.bmxlexer

This doesn't mean you have to go there to get the source, but this code has some bugs in it, and in the interest of migrating people away from this copy and toward the code that's under version control, please go to the above URL.

I originally wrote this in Ruby, but there is a rather annoying issue with writing any code in Ruby: using it anywhere else is an immense pain. If you've ever had to work with the C API to embed Ruby in something, you're probably aware of this. You may also be insane if you're going "I did it and I thoroughly enjoyed the experience." I can't help those people; they're clearly lost causes.

Anyhow, I ported the code to C, and overall I think it's an improvement because it's a little less messy. There aren't a lot of comments. Most of them are in the BlitzMax code, just because BlitzMax sucks at actually working with C code and sometimes I need to make a note about what type something really is.

The C API is private in this, mostly because I think most BlitzMax users would find it terrifying even though it's relatively simple. The BlitzMax API is fairly simple too; I don't think I need to explain what each method does or what the fields of something are. If it has an _ before it, you don't touch it.

If you need to parse BlitzMax code, this is probably a decent starting point: you don't have to concern yourself with the annoying string-parsing crap you'd otherwise have to do, and can just focus on structure and chunks of code. If you want to tweak the lexer to match certain other things, it's probably fairly easy to do, and it could be a decent starting point for something else. Most of what you'd change would likely be covered by the token singles/pairs arrays and changing those to match your own preferences. Case sensitivity options are in there, so you could work that in as well.
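To make the "singles" idea concrete, here is a minimal C sketch of a keyword table with a per-entry case-sensitivity flag. The real tables in lexer.c aren't shown on this page, so `token_single_t`, `k_singles`, `word_matches`, `classify_word`, and the flag itself are hypothetical; only the enum values are copied from the TToken constants.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* A few kinds, with values copied from the TToken constants. */
typedef enum {
    TOK_INVALID = 0,
    TOK_ID = 1,
    TOK_END_KW = 2,
    TOK_FUNCTION_KW = 3,
    TOK_PROTOCOL_KW = 70
} token_kind_t;

/* One entry of a hypothetical "singles" table. */
typedef struct {
    const char *text;   /* keyword spelling */
    token_kind_t kind;  /* kind to emit on a match */
    int case_sensitive; /* nonzero: exact spelling required */
} token_single_t;

static const token_single_t k_singles[] = {
    { "end",      TOK_END_KW,      0 },
    { "function", TOK_FUNCTION_KW, 0 },
    { "protocol", TOK_PROTOCOL_KW, 0 },
};

/* Compare the source range [begin, end) against a NUL-terminated keyword. */
static int word_matches(const char *begin, const char *end,
                        const char *keyword, int case_sensitive)
{
    size_t len = (size_t)(end - begin);
    if (strlen(keyword) != len)
        return 0;
    for (size_t i = 0; i < len; ++i) {
        unsigned char a = (unsigned char)begin[i];
        unsigned char b = (unsigned char)keyword[i];
        if (case_sensitive ? a != b : tolower(a) != tolower(b))
            return 0;
    }
    return 1;
}

/* Classify an identifier-shaped word as a keyword kind, or TOK_ID. */
token_kind_t classify_word(const char *begin, const char *end)
{
    assert(begin != NULL && end >= begin);
    for (size_t i = 0; i < sizeof k_singles / sizeof k_singles[0]; ++i) {
        const token_single_t *s = &k_singles[i];
        if (word_matches(begin, end, s->text, s->case_sensitive))
            return s->kind;
    }
    return TOK_ID;
}
```

BlitzMax keywords are case-insensitive, so every entry here uses 0; setting an entry's flag to 1 would make that word match only its exact spelling, which is one way a per-token case option could be worked in.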
On a side note about "additional things": this will recognize certain keywords that are not keywords in BlitzMax, including Protocol, EndProtocol (and its spaced variant, End Protocol), and Implements. It's not hard to remove these, but I've left them in partly because I use that code and partly because they illustrate how you can create additional tokens fairly easily. However, bear in mind that I've only supported combining ordered pairs of tokens (e.g. End followed by Protocol becoming EndProtocol). Anything beyond that isn't really needed.

Anyhow, the C side of things is in lexer.h and lexer.c; the BlitzMax wrapper module is below.
```blitzmax
SuperStrict

Module Cower.BMXLexer

ModuleInfo "Name: BlitzMax Lexer"
ModuleInfo "Description: Wrapped lexer for BlitzMax source code"
ModuleInfo "Author: Noel Cower"
ModuleInfo "License: Public Domain"

Import "lexer.c"

Private

Extern "C"
	Function lexer_new@Ptr(source_begin@Ptr, source_end@Ptr)
	Function lexer_destroy(lexer@Ptr)
	Function lexer_run:Int(lexer@Ptr)
	Function lexer_get_error$z(lexer@Ptr)
	Function lexer_get_num_tokens:Int(lexer@Ptr)
	Function lexer_get_token:Int(lexer@Ptr, index%, token@Ptr)
'	Function lexer_copy_tokens@Ptr(lexer@Ptr, num_tokens%Ptr) 'unused
	Function token_to_string@Ptr(tok@Ptr)
	Function free(b@Ptr)
End Extern

Public

Type TToken
	Field kind%          ' token_kind_t
	Field _from:Byte Ptr ' const char *
	Field _to_:Byte Ptr  ' const char *
	Field line%          ' int
	Field column%        ' int
	Field _cachedStr$=Null

	Method ToString$()
		If _cachedStr = Null Then
			Local cstr@Ptr = token_to_string(Self)
			_cachedStr = String.FromCString(cstr)
			free(cstr)
		EndIf
		Return _cachedStr
	End Method

'#region token_kind_t
	Const TOK_INVALID% = 0
	Const TOK_ID% = 1
	Const TOK_END_KW% = 2
	Const TOK_FUNCTION_KW% = 3
	Const TOK_ENDFUNCTION_KW% = 4
	Const TOK_METHOD_KW% = 5
	Const TOK_ENDMETHOD_KW% = 6
	Const TOK_TYPE_KW% = 7
	Const TOK_EXTENDS_KW% = 8
	Const TOK_ABSTRACT_KW% = 9
	Const TOK_FINAL_KW% = 10
	Const TOK_NODEBUG_KW% = 11
	Const TOK_ENDTYPE_KW% = 12
	Const TOK_EXTERN_KW% = 13
	Const TOK_ENDEXTERN_KW% = 14
	Const TOK_REM_KW% = 15
	Const TOK_ENDREM_KW% = 16
	Const TOK_FLOAT_KW% = 17
	Const TOK_DOUBLE_KW% = 18
	Const TOK_BYTE_KW% = 19
	Const TOK_SHORT_KW% = 20
	Const TOK_INT_KW% = 21
	Const TOK_STRING_KW% = 22
	Const TOK_OBJECT_KW% = 23
	Const TOK_LOCAL_KW% = 24
	Const TOK_GLOBAL_KW% = 25
	Const TOK_CONST_KW% = 26
	Const TOK_VARPTR_KW% = 27
	Const TOK_PTR_KW% = 28
	Const TOK_VAR_KW% = 29
	Const TOK_NULL_KW% = 30
	Const TOK_STRICT_KW% = 31
	Const TOK_SUPERSTRICT_KW% = 32
	Const TOK_FRAMEWORK_KW% = 33
	Const TOK_MODULE_KW% = 34
	Const TOK_MODULEINFO_KW% = 35
	Const TOK_IMPORT_KW% = 36
	Const TOK_INCLUDE_KW% = 37
	Const TOK_PRIVATE_KW% = 38
	Const TOK_PUBLIC_KW% = 39
	Const TOK_OR_KW% = 40
	Const TOK_AND_KW% = 41
	Const TOK_SHR_KW% = 42
	Const TOK_SHL_KW% = 43
	Const TOK_SAR_KW% = 44
	Const TOK_MOD_KW% = 45
	Const TOK_NOT_KW% = 46
	Const TOK_WHILE_KW% = 47
	Const TOK_WEND_KW% = 48
	Const TOK_ENDWHILE_KW% = 49
	Const TOK_FOR_KW% = 50
	Const TOK_NEXT_KW% = 51
	Const TOK_UNTIL_KW% = 52
	Const TOK_TO_KW% = 53
	Const TOK_EACHIN_KW% = 54
	Const TOK_REPEAT_KW% = 55
	Const TOK_FOREVER_KW% = 56
	Const TOK_IF_KW% = 57
	Const TOK_ENDIF_KW% = 58
	Const TOK_ELSE_KW% = 59
	Const TOK_ELSEIF_KW% = 60
	Const TOK_THEN_KW% = 61
	Const TOK_SELECT_KW% = 62
	Const TOK_CASE_KW% = 63
	Const TOK_DEFAULT_KW% = 64
	Const TOK_ENDSELECT_KW% = 65
	Const TOK_SELF_KW% = 66
	Const TOK_SUPER_KW% = 67
	Const TOK_PI_KW% = 68
	Const TOK_NEW_KW% = 69
	Const TOK_PROTOCOL_KW% = 70
	Const TOK_ENDPROTOCOL_KW% = 71
	Const TOK_AUTO_KW% = 72
	Const TOK_IMPLEMENTS_KW% = 73
	Const TOK_COLON% = 74
	Const TOK_QUESTION% = 75
	Const TOK_BANG% = 76
	Const TOK_HASH% = 77
	Const TOK_DOT% = 78
	Const TOK_DOUBLEDOT% = 79
	Const TOK_TRIPLEDOT% = 80
	Const TOK_AT% = 81
	Const TOK_DOUBLEAT% = 82
	Const TOK_DOLLAR% = 83
	Const TOK_PERCENT% = 84
	Const TOK_SINGLEQUOTE% = 85
	Const TOK_OPENPAREN% = 86
	Const TOK_CLOSEPAREN% = 87
	Const TOK_OPENBRACKET% = 88
	Const TOK_CLOSEBRACKET% = 89
	Const TOK_OPENCURL% = 90
	Const TOK_CLOSECURL% = 91
	Const TOK_GREATERTHAN% = 92
	Const TOK_LESSTHAN% = 93
	Const TOK_EQUALS% = 94
	Const TOK_MINUS% = 95
	Const TOK_PLUS% = 96
	Const TOK_ASTERISK% = 97
	Const TOK_CARET% = 98
	Const TOK_TILDE% = 99
	Const TOK_GRAVE% = 100
	Const TOK_BACKSLASH% = 101
	Const TOK_SLASH% = 102
	Const TOK_COMMA% = 103
	Const TOK_SEMICOLON% = 104
	Const TOK_PIPE% = 105
	Const TOK_AMPERSAND% = 106
	Const TOK_NEWLINE% = 107
	Const TOK_ASSIGN_ADD% = 108
	Const TOK_ASSIGN_SUBTRACT% = 109
	Const TOK_ASSIGN_DIVIDE% = 110
	Const TOK_ASSIGN_MULTIPLY% = 111
	Const TOK_ASSIGN_POWER% = 112
	Const TOK_ASSIGN_SHL% = 113
	Const TOK_ASSIGN_SHR% = 114
	Const TOK_ASSIGN_SAR% = 115
	Const TOK_ASSIGN_MOD% = 116
	Const TOK_ASSIGN_XOR% = 117
	Const TOK_ASSIGN_AND% = 118
	Const TOK_ASSIGN_OR% = 119
	Const TOK_ASSIGN_AUTO% = 120
	Const TOK_DOUBLEMINUS% = 121
	Const TOK_DOUBLEPLUS% = 122
	Const TOK_NUMBER_LIT% = 123
	Const TOK_HEX_LIT% = 124
	Const TOK_BIN_LIT% = 125
	Const TOK_STRING_LIT% = 126
	Const TOK_LINE_COMMENT% = 127
	Const TOK_BLOCK_COMMENT% = 128
	Const TOK_EOF% = 129
	Const TOK_LAST%=TOK_EOF
	Const TOK_COUNT%=TOK_LAST+1
'#endregion
End Type

Type TLexer
	Field _lexer@Ptr ' lexer_t
	Field _run:Int = False
	Field _cstr_source@Ptr
	Field _length%
	Field _tokens:TToken[]
	Field _error:String = Null

	Method InitWithSource:TLexer(source$)
		Assert _cstr_source=Null Else "Lexer already initialized"
		_cstr_source = source.ToCString()
		_length = source.Length
		_lexer = lexer_new(_cstr_source, _cstr_source+_length)
		Return Self
	End Method

	Method Delete()
		If _cstr_source Then
			MemFree(_cstr_source)
		EndIf
		If _lexer Then
			lexer_destroy(_lexer)
		EndIf
	End Method

	Method Run:Int()
		Assert _run = False Else "Lexer has already run"
		_run = True
		Local r% = lexer_run(_lexer)
		If r <> 0 Then
			_error = lexer_get_error(_lexer)
		EndIf
		Return (r=0)
	End Method

	Method _cacheTokens()
		If _tokens = Null Then
			_tokens = New TToken[lexer_get_num_tokens(_lexer)]
			For Local init_idx:Int = 0 Until _tokens.Length
				_tokens[init_idx] = New TToken
				lexer_get_token(_lexer, init_idx, _tokens[init_idx])
			Next
		EndIf
	End Method

	Method GetToken:TToken(index%)
		_cacheTokens()
		Return _tokens[index]
	End Method

	Method GetTokens:TToken[]()
		_cacheTokens()
		Return _tokens
	End Method

	Method NumTokens:Int()
		If _tokens Then
			Return _tokens.Length
		EndIf
		Return lexer_get_num_tokens(_lexer)
	End Method

	Method GetError$()
		Return _error
	End Method
End Type
```
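The ordered-pair combining mentioned in the description (End followed by Function becoming EndFunction, End Protocol becoming EndProtocol) can be sketched as a single pass over the token array. This assumes a C token struct whose layout mirrors TToken's first five fields, as the type comments in the wrapper suggest. `token_t`, `token_pair_t`, `k_pairs`, `combine_pairs`, and `token_copy_text` are hypothetical names, not what lexer.c actually uses; `token_copy_text` is only a guess at the ownership contract implied by ToString calling free() on the result of token_to_string.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Token kinds copied from the TToken constants (only the ones used here). */
enum {
    TOK_END_KW = 2,
    TOK_FUNCTION_KW = 3,
    TOK_ENDFUNCTION_KW = 4,
    TOK_METHOD_KW = 5,
    TOK_ENDMETHOD_KW = 6,
    TOK_PROTOCOL_KW = 70,
    TOK_ENDPROTOCOL_KW = 71
};

/* Hypothetical C struct mirroring TToken's first five fields
   (kind, _from, _to_, line, column); lexer.h is not shown. */
typedef struct {
    int kind;         /* token_kind_t */
    const char *from; /* first byte of the token's source text */
    const char *to;   /* one past the last byte */
    int line;
    int column;
} token_t;

/* "first" immediately followed by "second" folds into "combined". */
typedef struct {
    int first;
    int second;
    int combined;
} token_pair_t;

static const token_pair_t k_pairs[] = {
    { TOK_END_KW, TOK_FUNCTION_KW, TOK_ENDFUNCTION_KW },
    { TOK_END_KW, TOK_METHOD_KW,   TOK_ENDMETHOD_KW },
    { TOK_END_KW, TOK_PROTOCOL_KW, TOK_ENDPROTOCOL_KW },
};

/* Merge each matching adjacent pair into one token, compacting the array
   in place. Only ordered pairs are handled, never longer runs.
   Returns the new token count. */
size_t combine_pairs(token_t *toks, size_t count)
{
    size_t out = 0;
    for (size_t i = 0; i < count; ++i) {
        toks[out] = toks[i];
        if (i + 1 < count) {
            for (size_t p = 0; p < sizeof k_pairs / sizeof k_pairs[0]; ++p) {
                if (toks[i].kind == k_pairs[p].first &&
                    toks[i + 1].kind == k_pairs[p].second) {
                    toks[out].kind = k_pairs[p].combined;
                    toks[out].to = toks[i + 1].to; /* span both words */
                    ++i;                           /* skip the second token */
                    break;
                }
            }
        }
        ++out;
    }
    return out;
}

/* Copy a token's source text into a new NUL-terminated string.
   The caller owns the result and must free() it. */
char *token_copy_text(const token_t *tok)
{
    assert(tok != NULL && tok->to >= tok->from);
    size_t len = (size_t)(tok->to - tok->from);
    char *s = malloc(len + 1);
    if (s != NULL) {
        memcpy(s, tok->from, len);
        s[len] = '\0';
    }
    return s;
}
```

In this sketch "adjacent" means adjacent in the token array, so if newline tokens (TOK_NEWLINE) are kept in the array, an End at the end of one line and a Function at the start of the next are not merged.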
Comments
bmk doesn't seem to want to compile it. There is no error message because bmk is sh*t. Removing the module stuff and compiling as an exe results in:
lexer.c:(.text+0x35a): undefined reference to `asprintf'
What OS are you using? Edit: It looks like asprintf is a GNU/BSD extension. MinGW apparently lacks it for some reason, but whatever. Easy enough to fix...
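Since asprintf is an extension rather than standard C, the usual fix is a small wrapper built on vsnprintf: measure once, allocate, then format again. The sketch below is one common way to write it; `portable_asprintf` is a hypothetical name, not the author's lexer_asprintf. It assumes a C99-conforming vsnprintf (one that returns the required length when given a NULL buffer and size 0) plus va_copy, which is worth checking on older Windows C runtimes.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for asprintf(). On success, *out receives a
   malloc'd string and the return value is its length (excluding the NUL).
   On failure, returns -1 and sets *out to NULL. */
int portable_asprintf(char **out, const char *fmt, ...)
{
    va_list args, args_copy;
    assert(out != NULL && fmt != NULL);
    *out = NULL;

    va_start(args, fmt);
    va_copy(args_copy, args); /* a va_list can't be reused after vsnprintf */

    /* First pass: measure how many characters the output needs. */
    int len = vsnprintf(NULL, 0, fmt, args);
    va_end(args);
    if (len < 0) {
        va_end(args_copy);
        return -1;
    }

    char *buf = malloc((size_t)len + 1);
    if (buf == NULL) {
        va_end(args_copy);
        return -1;
    }

    /* Second pass: format for real into the exactly-sized buffer. */
    int written = vsnprintf(buf, (size_t)len + 1, fmt, args_copy);
    va_end(args_copy);
    if (written < 0) {
        free(buf);
        return -1;
    }
    *out = buf;
    return written;
}
```

As with asprintf, the caller free()s the result. Unlike the glibc asprintf, which leaves the output pointer undefined on failure, this version sets it to NULL.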
Should be fixed now.
I've updated this to fix an amazingly stupid bug in lexer_asprintf. Also an... oddity... in the function for checking singles. I'm still not sure how to explain that one.
Hey Nilium, small (?) request from my side. As this is a useful module (it's used by Maximus), could you put it on GitHub?
Sure, I'll throw it up there now. The only downside is that I haven't been using any version control for it, so previous versions will be lost. Edit: Additionally, on GitHub this will be covered by a license other than public domain (zlib).
At least now some version history can be made :-). Doesn't GitHub allow Public Domain? Anyhow, much appreciated.
This source code is now available under the zlib license at http://github.com/nilium/cower.bmxlexer

"Doesn't GitHub allow Public Domain?"

It does, but I'd rather have the zlib license attached to it if I'm moving it elsewhere. Either that or BSD, but I picked zlib for this.