C to monkey2 converter 95% ready


GC-Martijn(Posted 2015) [#1]
Edit: I can't post the full topic in one go, so I'll split it across multiple posts.


In my previous topic here http://www.monkey-x.com/Community/posts.php?topic=9953&post=108371
I said that one of monkey2's strengths is that it can be extended with third-party C modules,
without any problems and with one click.

What I made is a C-to-Monkey2 parser (alpha version 00.1).
There are some small issues to resolve before it's 100% working, and I need some info from Mark.

The steps are simple:
1. Download a C lib, for example the latest SDL2: https://www.libsdl.org/download-2.0.php
2. Use the parser (not ready yet) (point it at the tar/zip file).
3. Give it the monkey path, or copy the output files manually to the modules dir.

Example output for the SDL2 C lib:

1.) I didn't check whether all functions/structs/enums are 100% OK.
2.) I only parsed symbols with the SDL_ prefix.


GC-Martijn(Posted 2015) [#2]
deleted because this was a big old output


GC-Martijn(Posted 2015) [#3]
deleted because this was a big old output


GC-Martijn(Posted 2015) [#4]
deleted because this was a big old output


GC-Martijn(Posted 2015) [#5]
The remaining 5% that isn't working is this:

1. Can't use all reserved words...
I don't have the time right now to think about how to handle this; I could rename them to _something,
but then I think the struct won't work anymore:
Struct SOMETHING
	Field start:Int
	'Field end:Int  <=== end is a reserved word in monkey
End


2. A struct using a struct as a Field.
I haven't tested this yet, but I guess it's not working (yet):
Struct StuctA
	Field start:Int
End
Struct StructB
	Field bla:StructA
End


3. There were some duplicated Fields inside a struct with the same name but different data types.
I haven't checked the actual C code yet to see whether this is even possible in C:
Struct StructB
	Field bla:StructB
	Field bla:Int
End
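
As far as I know, duplicate member names inside one C struct don't even compile; what a header parser can run into instead is an anonymous union, which looks like two differently-typed members in the same spot. A hypothetical C sketch:

struct Bad {
	int bla;
	float bla;    /* error: duplicate member 'bla' - this won't compile */
};

struct Event {
	int type;
	union {
		int ivalue;    /* these two share one storage location, */
		float fvalue;  /* so a naive parser may see 'duplicates' */
	};                 /* C11 anonymous union */
};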



I need Mark's info about this:
1.) Are you going to make a C converter? (Then I can stop :))
2.) I don't know if this is possible, but could Struct and Enum be made dynamic, without the need to list all the possible fields?

Then there would be no problem with reserved names, and it would be easy to parse C libs.

For example, this is the new current output:
Struct SDL_Surface
	Field flags:Int
	Field format:SDL_PixelFormat
	Field w:Int
	Field pitch:Int
	Field pixels:void
	Field userdata:void
	Field locked:Int
	Field lock_data:void
	Field clip_rect:SDL_Rect
	Field map:Void
	Field refcount:Int
End


In your example code you do this
Struct SDL_Surface
	Field format:SDL_PixelFormat Ptr
End


I want this instead:
Struct SDL_Surface
End



marksibly(Posted 2015) [#6]
> 1.) are you going to make a C converter ? (then I can stop :))

Well, I was thinking about it, but may not have to now! I had a quick look at SWIG, but it's pretty complex. Maybe later...

> 'Field end:Int <=== end is a reserved word in monkey

You'll need to do something like:

Field end_:Int="end"

So the compiler knows the 'real' name of end_.

> Field format:SDL_PixelFormat

You need the 'Ptr' at the end here because this is exactly what 'format' is - a pointer to a SDL_PixelFormat struct, not an instance of the struct.


therevills(Posted 2015) [#7]
Is MX2 going to support DLLs? If so, would this mean a C to MX2 converter wouldn't be needed?


marksibly(Posted 2015) [#8]
Depends what you mean by 'support DLLs'.

If you mean 'load openall.dll' or similar AND you have access to the 'static link lib' for the dll, then yes, no problem. However, some kind of mx2 wrapper will still be needed so mx2 knows what the dll contains, and the easiest way to do this is probably just to run a c2mx2 converter on the lib's header files.

If you mean 'load openall.dll' and you don't have access to the static link lib, then you'll need to 'roll your own' using a bunch of GetProcAddress calls or similar. A c2mx2 util could also probably generate this for you based on the lib header files.
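
To illustrate the 'roll your own' route, here's a rough C sketch using GetProcAddress on Windows (the dll name and function are just examples, not a definitive recipe):

#include <windows.h>
#include <stdio.h>

/* The signature we expect the dll to export. */
typedef int (*SDL_Init_t)(unsigned int flags);

int main(void) {
	HMODULE dll = LoadLibraryA("SDL2.dll");
	if (!dll) { printf("could not load dll\n"); return 1; }

	/* Look the function up by name at runtime. */
	SDL_Init_t init = (SDL_Init_t)GetProcAddress(dll, "SDL_Init");
	if (init) init(0x20); /* 0x20 = SDL_INIT_VIDEO */

	FreeLibrary(dll);
	return 0;
}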

If you mean 'load dlls written in monkey', this is also doable but is in the 'later' pile. However, I am keeping it in mind.


Richard Betson(Posted 2015) [#9]
Neat stuff. :) +1


GC-Martijn(Posted 2015) [#10]
I have < 3 hours of time this week, so I guess next week I'll have a valid converter (98% ready).

@marksibly
The first thing I did was try SWIG, but that was overkill and didn't work the way I think it needs to work ;)
Field end_:Int="end"
Yeah, I'm happy with this.



GC-Martijn(Posted 2015) [#11]
@marksibly

I'm stuck because monkey2 enums have this bug:
Enum SDL_TextureModulate
	SDL_TEXTUREMODULATE_COLOR=1,
	SDL_TEXTUREMODULATE_ALPHA=-2,
	SDL_TEXTUREMODULATE_NONE=0
End

Enum initializer must be an int literal

Could you fix that so negative ints are OK for Enums?

There are a few things to tackle, but then the C to MX2 converter is ready.
I'm now looking at what to do with typedefs, for example this C code:
typedef void (SDLCALL * SDL_AudioCallback) (void *userdata, Uint8 * stream,int len);


Bug 2 (not really a bug, but just so you know):
Struct SDL_Surface
	'Field format_:SDL_PixelFormat Ptr="format"
	Field format:SDL_PixelFormat Ptr
End

When I use the first line, format_, to avoid forbidden names, the program doesn't detect it when using this:

Local format:=surface[0].format


In this case it's not a forbidden name, but if it were, there would be a problem.


marksibly(Posted 2015) [#12]
> typedef void (SDLCALL * SDL_AudioCallback) (void *userdata, Uint8 * stream,int len);

It won't work in the current release, but the mx2 equivalent of this is:

Alias SDL_AudioCallback:Void( userdata:Void Ptr,stream:Int Ptr,len:Int )

This declares a new 'type' that can be used with vars etc, eg:

Function SDL_StartAudio:Void( callback:SDL_AudioCallback ) 'just an example, no idea if this func exists!


GC-Martijn(Posted 2015) [#13]
@marksibly

I created my first git repo here :) Maybe someone wants to help :)
I really don't know what will happen when someone does, haha, but I guess it will work after the sync button :)

https://github.com/gcmartijn/C2MX2

The first goal is to get the full SDL2 lib working; after that it will be 'easy' to create many C bindings for Monkey2.

Example output:
https://github.com/gcmartijn/C2MX2/tree/master/example/sdl3

(I use the name sdl3 to test it in monkey without overwriting the sdl2 lib.)

The enums-with-a-negative-number problem still exists, but I don't know if the enum needs a value or if it's a placeholder.
But maybe you can see other things that need to be fixed?

I have a background in neither C nor monkey2 :)
Some quick questions:
- Does the Const need to have the default C value? (I saw you didn't do that.)
- Does the enum need to have the default C value? (Negative numbers give an error now.)
The main rule in the converter is: "if it's not a monkey2 type (int/float/string/void) then it's a Ptr".


marksibly(Posted 2015) [#14]
Looking good!

> does the Cons need to have the default C value (I saw you din't do that)

You mean 'Const'?

No, if it's Extern, Const is really just a symbol that gets dumped into the c++ code. Since the c++ code includes the sdl.h files, the const values are already defined there so we don't need to define them again.
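
(In C++ terms, a rough sketch of why - this assumes the SDL headers are on the include path:)

/* The wrapper's generated C++ includes the real header,
   so constants like SDL_INIT_VIDEO already have values there. */
#include <SDL.h>
#include <stdio.h>

int main(void) {
	/* No need to restate the value on the wrapper side: */
	printf("SDL_INIT_VIDEO = 0x%X\n", (unsigned)SDL_INIT_VIDEO);
	return 0;
}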

> does the enum need to have the default C value (negative numbers gives a error now)

I haven't attempted to get extern enums working yet. You'll notice that in my little sdl2 wrapper I just converted all enums to consts. This 'sort of' works, because C enums aren't really scoped - you can just use the name of the enum directly, pretty much as if it were a const. More work to be done here, but in general enums probably won't need initializer values either, as long as they're declared 'extern'.

To get something 'good enough' going, you could drop 'enum' altogether and convert enum members to plain consts, eg: this...

Enum SDL_TextureModulate
	SDL_TEXTUREMODULATE_NONE=0,
	SDL_TEXTUREMODULATE_ALPHA=2,
	SDL_TEXTUREMODULATE_COLOR=1
End


...becomes this...

Const SDL_TEXTUREMODULATE_NONE:Int
Const SDL_TEXTUREMODULATE_ALPHA:Int
Const SDL_TEXTUREMODULATE_COLOR:Int


I think it'd be a good idea to retain the ability to emit proper enums though, and to determine enum/const initializers where possible for the future. If we want it to be possible to ship modules without C/C++ source code (eg: just a blah.lib and a blah.mx2), then we WILL need to know the values of all consts and enums.

> the main rule in the converter is : "if its not a monkey2 type (int/float/string/void) then its a Ptr

That won't work everywhere, eg:

typedef struct SDL_Surface
{
    //lots chopped out
    SDL_PixelFormat *format;
    void *pixels;
    SDL_Rect clip_rect;
} SDL_Surface;


Here, 'format' is a pointer (because there's a '*' before it) while 'clip_rect' is not (because there's no '*'). Also, monkey2 types may need a Ptr, eg: the 'pixels' field above. Here's the above in monkey2:

Struct SDL_Surface
   Field format:SDL_PixelFormat Ptr
   Field pixels:Void Ptr
   Field clip_rect:SDL_Rect
End



Danilo(Posted 2015) [#15]
Any news about MX2 basic data types? (Byte, Short, Ascii, Unicode, Char, Long, Int, Int8, Int16, Int32, Int64, Quad, Float, Double, ...)

It is especially important to have fixed 'Int32' and 'Int64' types. These could be 'Long' and 'Quad', for example.
And one 'Int' type that is 32-bit when compiled with 32-bit C++ compilers, and 64-bit with 64-bit compilers.

WinAPI uses many DWORD and LONG parameters that are always 32-bit, even with a 64-bit compiler.
On Mac and Linux they often have different sizes in 32-bit and 64-bit compilation mode.


marksibly(Posted 2015) [#16]
I haven't given this too much thought as yet, but as far as I can tell most 64 bit OSes use either the LP64 or LLP64 data model, see:

https://en.wikipedia.org/wiki/64-bit_computing#64-bit_data_models

In these models, ints are still 32 bit. Only pointers (and either 'longs' or 'long longs') are 64 bit. This has been my experience messing around on 64 bit Mac/Windows/Linux recently.
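
(A quick C sanity check, if you want to see this for yourself - the comments show what each model typically reports:)

#include <stdio.h>

int main(void) {
	/* LP64 (64-bit Mac/Linux): int=4, long=8, long long=8, void*=8 */
	/* LLP64 (64-bit Windows):  int=4, long=4, long long=8, void*=8 */
	printf("int:       %zu\n", sizeof(int));
	printf("long:      %zu\n", sizeof(long));
	printf("long long: %zu\n", sizeof(long long));
	printf("void*:     %zu\n", sizeof(void *));
	return 0;
}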

So I think we should keep it simple:

Byte = 8 bits
Short = 16 bits
Int = 32 bits
Long = 64 bits
Float = 32 bits
Double = 64 bits
Char = (byte, short or int depending on a build setting?)

(There may be more/longer types eventually, but this is a good starting point).

So Int and Long are our 'fixed' 32 and 64 bit types, and there are no 'unfixed' types.

The size of int only really affects how data is stored. Bytes and shorts are always 'promoted' to ints when math is performed (which is a NOP if they're already in a local/register) so the choice of 8, 16, 32 or 64 bit ints only affects how much memory objects/arrays consume and struct layout. Which is of course important if you're interacting with an extern API.

If we ever want to support 'HAL Computer Systems port of Solaris to SPARC64' (never say never!) which has 64 bit ints, there will hopefully be an int32 c/c++ typedef available for us to use as a monkey 'int'. Failing that, the translator could do 32 bit int loads/stores manually. Math would occur in 64 bit and ints not already in registers would require promotion along with bytes/shorts, but that shouldn't be a problem.
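
(And a concrete C example of that promotion behavior - byte-sized operands widen to int before any math:)

#include <stdio.h>

int main(void) {
	signed char b = 100;  /* 8-bit storage */
	/* Both operands are promoted to int before the multiply,
	   so the intermediate result isn't clipped to 8 bits: */
	int r = b * 2;
	printf("%d\n", r);    /* 200 */
	return 0;
}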


Samah(Posted 2015) [#17]
@marksibly: Char = (byte, short or int depending on a build setting?)

I think you're best making char 16-bit for better wide character support. Also, have you decided on whether it will support explicitly unsigned types?


ImmutableOctet(SKNG)(Posted 2015) [#18]
I agree with Samah, +1 for Char being 16-bit (at least as the default). As for consistent sizes, I assume you'll be supporting C++11 and onward, if not C++14 and onward. If that's the case, you could use the 'cstdint' header. Though, depending on your settings with g++, that could cause problems. Plus, support might be a problem on some hardware platforms. Perhaps the best option is to go with the usual semi-loose type definitions?

Something like this:
* 'Char' is system or compiler defined.
* 'Byte' is at least 8 bit.
* 'Short' is at least 16 bit.
* 'Int' is at least 32-bit. (And commonly defined as such)
* 'Long' is normally at least 64-bit, but can be configured to be smaller on systems that don't support it...? (May just be easier to say it's not always portable)

At least with C++11, 'long long' is defined as at least 64-bit. Though, the traditional size of 'int' has been semi-defined as 32-bit, even though it's supposed to be at least 16. And of course, 'long' in C++ is usually 32-bit on Windows, but all the other vendors are fine making it 64-bit.

There's also the question of extensions/external types. It would be nice to be able to declare an 'Alias' as external/native. That way, we could theoretically have types like 'double double'. By the way, any thoughts on a definitive size-type? One that could be signed or unsigned, but would be the most effective for storing something's size. Basically, I'm asking if we'll see 'size_t'.


marksibly(Posted 2015) [#19]
> Also, have you decided on whether it will support explicitly unsigned types?

I'm still firmly in the 'unsigned is evil' camp, as it has all sorts of weird effects, eg:

If x-1>x '...if x is unsigned, this can be true!
While i>=0 '...if i is unsigned, this loop will never exit!

...and many, many more. And you can end up with unsigned expressions if ANY of the subexpressions are unsigned, so it can be quite easy for an expression to become 'infected' with unsigned-ness without you knowing it. I stopped using unsigned ages ago, and have NEVER missed it.
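
(A minimal C sketch of the two examples above, if you want to see them in action:)

#include <stdio.h>

int main(void) {
	unsigned int x = 0;
	/* 0u - 1 wraps around to UINT_MAX, so x - 1 > x holds: */
	if (x - 1 > x) printf("x-1 > x is true for unsigned x == 0\n");

	/* An unsigned counter is always >= 0, so this loop would never
	   exit on its own; 'steps' caps it for demonstration: */
	unsigned int i = 3, steps = 0;
	while (i >= 0 && steps < 8) { i--; steps++; }
	printf("i wrapped around to %u\n", i);
	return 0;
}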

There is probably only one situation I can think of where signed/unsigned might be useful, and that's for controlling how bytes/shorts are promoted to ints.

For example, DataBuffer.PeekByte() in monkey1 sign extends the byte when you load it. This effectively makes the data in a databuffer signed. If you want to treat it as unsigned, then you need to use PeekByte(blah) & 255.
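
(In C terms, the sign-extending load versus the '& 255' trick looks like this:)

#include <stdio.h>

int main(void) {
	unsigned char data[] = { 0xFF };  /* a raw byte from a buffer */

	/* Sign-extending load: 0xFF widens to -1. */
	int asSigned = (signed char)data[0];

	/* Masking keeps the unsigned value 255. */
	int asUnsigned = data[0] & 255;

	printf("signed: %d, unsigned: %d\n", asSigned, asUnsigned);
	return 0;
}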

IMO, this is an acceptable solution, but here are alternatives that don't involve going 'full on' unsigned, including:

* Change byte/short to be unsigned instead of signed. You'd still need to use '& 255' when dealing with 'signed' data so this just flips the problem, while introducing an 'exception' to the language. 'All data is signed' is IMO better than 'All data is signed except for bytes/shorts'. Some byte data is signed, some is unsigned, so I don't think there's a 'right' approach here, in which case KISS wins. But then, perhaps this is the one case where the 'extra bit of storage' argument makes sense?

* Allow unsigned for just byte/short. An extremely practical approach, but I suspect it'd be controversial - "if I can have unsigned bytes/shorts, why not ints?!?".

The current situation is still 'all data is signed, and if you're working with low level stuff you may have to use & now and then...' but I'm still thinking this stuff over.


marksibly(Posted 2015) [#20]
> I think you're best making char 16-bit for better wide character support.

16 bit chars are OK for most apps, but for full unicode support we need at least the option of 32 bit chars don't we?


Samah(Posted 2015) [#21]
@marksibly: I'm still firmly in the 'unsigned is evil' camp...

Enforced signedness is one of the things I hate about Java. It means that reading a byte from a stream must return something greater than 8 bits to prevent negative values. In some terrible design decision they went for 32-bit, and streams are big-endian by default.

If x-1>x '...if x is unsigned, this can be true!
While i>=0 '...if i is unsigned, this loop will never exit!

In each of these examples the developer would see the variable declaration and know immediately whether it was unsigned. If it's a return value from another method, they really should be reading the documentation... XD

@marksibly: There is probably only one situation I can think of where signed/unsigned might be useful, and that's for controlling how bytes/shorts are promoted to ints.

Such as supporting both big- and little-endianness.

@marksibly: ...sign extends the byte when you load it...

That's fine for 8-bit, but how do you extend a 64-bit integer to ignore the sign?

@marksibly: Allow unsigned for just byte/short.

This is probably the best compromise, as it removes the need for stupid workarounds as outlined above, and small unsigned values are much more useful than large ones.

@marksibly: 16 bit chars are OK for most apps, but for full unicode support we need at least the option of 32 bit chars don't we?

Then you should probably allow for Long Char as 32-bit, or they can just use Int.


Nobuyuki(Posted 2015) [#22]
Why exactly are Chars needed as a type primitive? I'm only hearing about the disadvantages. If they do end up being used, however, wide chars should use 32 bits by default, because 16-bit wide chars require surrogate pairs to handle the Unicode supplementary planes. This is 2015, not 1994, so not being able to handle that case is kinda unacceptable.


GC-Martijn(Posted 2015) [#23]
I think the first question is: who is monkey2 for?
- for beginners/basic = keep it simple: Int, Float
- for beginners and pros = use extra types: Byte, Short, Int, Long, Float, Double
- only for pros = maybe the other things


dawlane(Posted 2015) [#24]
> 16 bit chars are OK for most apps, but for full unicode support we need at least the option of 32 bit chars don't we?

True. Somewhere in the Unicode specification it tells you why wchar_t shouldn't be used in portable code.
Unicode Standard


Gerry Quinn(Posted 2015) [#25]
Just thinking aloud...

Could many of these problems be solved by adding some functions like Databuffer.PeekByteUnsigned:Int( address:Int )?


tiresius(Posted 2015) [#26]
I thought wchar_t in C++ was what we should be using, and what Mark was hinting at making Monkey's Char behave like.
So confused... :-/


Pharmhaus(Posted 2015) [#27]

> If x-1>x '...if x is unsigned, this can be true!
> While i>=0 '...if i is unsigned, this loop will never exit!


Would it be possible to change trans so that a conversion from int to uint and vice versa needs to be stated explicitly?
It is still better to me than no unsigned at all and it would be way easier to spot errors like these.


Danilo(Posted 2015) [#28]
'x-1' is valid with both signed and unsigned number types. Both wrap around, so I don't see a problem here. It is known behavior.

If you make Unicode strings with 32-bit characters, you need to convert them for every system API function call (back and forth), because most
systems don't use 32-bit chars directly. Linux is the only one, AFAIK.
I think UTF-16 is the most used today. It saves space and memory compared to UTF-32. For English texts ASCII is usually enough,
and with UTF-32 you always quadruple the required string space/memory. With UTF-16 it only doubles,
most of the time. See also UTF-16: Usage.

UTF-16 can use multiple words to encode one character, for some extended languages.
There is a limited subset of UTF-16 called UCS-2, which is limited to one word (2 bytes)
per character. It supports the Basic Multilingual Plane only.
That means one character is always 2 bytes long. UCS-2 can already display many languages, but not all.

For all characters, UTF-16 should be the best choice for most platforms. But if you fully support UTF-16, there must be functions
to differentiate string.ByteLen() and string.Len(). With the limited UCS-2 set, ByteLen is always string.Len() * 2.

In ASCII compilation mode, additional support for UTF-8 could be helpful, but that's library functions,
not a built-in type. UTF-8 is compatible with 7-bit ASCII and uses 1 to 4 bytes to encode all Unicode characters.
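
To make the size trade-off concrete, here is how one ASCII character and one supplementary-plane character (U+1F600) encode - a small C sketch:

#include <stdio.h>

int main(void) {
	/* 'A' (U+0041): 1 byte in UTF-8, 2 in UTF-16, 4 in UTF-32. */
	unsigned char  a8[]  = { 0x41 };
	unsigned short a16[] = { 0x0041 };
	unsigned int   a32[] = { 0x00000041 };

	/* U+1F600 is outside the BMP, so UCS-2 can't represent it:
	   4 bytes in UTF-8, a surrogate pair (4 bytes) in UTF-16. */
	unsigned char  e8[]  = { 0xF0, 0x9F, 0x98, 0x80 };
	unsigned short e16[] = { 0xD83D, 0xDE00 };
	unsigned int   e32[] = { 0x0001F600 };

	printf("'A':     UTF-8=%zu UTF-16=%zu UTF-32=%zu bytes\n",
	       sizeof(a8), sizeof(a16), sizeof(a32));
	printf("U+1F600: UTF-8=%zu UTF-16=%zu UTF-32=%zu bytes\n",
	       sizeof(e8), sizeof(e16), sizeof(e32));
	return 0;
}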


Nobuyuki(Posted 2015) [#29]
I don't think that the fact some OS's still use ancient UCS-2 encoding is a good enough reason to deliberately gimp wide chars. But then again, I'm still wondering why char types are necessary. Is this too silly a question to be addressed? As far as I can tell, they're the same thing as int values, just with some different handling in c-style syntax, which mx2 isn't in the first place. Do chars offer some intrinsic value to the language other than some syntax tricks which can offset a whole mess of target-specific baggage?


Danilo(Posted 2015) [#30]
For system API programming and importing many different (C/C++/system) libraries, it makes sense to think about it, IMO.
It is also important for low-level pointer access (which is basically what the systems and libs do):
*charPointer + SizeOf(Char) to access the next character, or variable-length encodings, where the next character could be at
*charPointer + 1, *charPointer + 2, *charPointer + 3, or *charPointer + 4.
Most underlying things (library internals, behind the scenes) are interaction with the system libraries. So, if the language uses
32-bit strings internally, and almost all other systems require UTF-16 encoded strings, you have to convert language strings
to system strings back and forth all the time. That applies to all system calls: CreateWindow, GetText, GetWindowTitle, etc...
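
A small C sketch of that difference (fixed-width stepping versus decoding a UTF-8 lead byte; the helper function is just for illustration):

#include <stdio.h>

/* Byte length of a UTF-8 sequence, read from its lead byte. */
static int utf8_len(unsigned char lead) {
	if (lead < 0x80) return 1;           /* 0xxxxxxx */
	if ((lead & 0xE0) == 0xC0) return 2; /* 110xxxxx */
	if ((lead & 0xF0) == 0xE0) return 3; /* 1110xxxx */
	return 4;                            /* 11110xxx */
}

int main(void) {
	/* UTF-32: the next character is always one element further. */
	unsigned int utf32[] = { 'H', 'i', 0 };
	for (unsigned int *p = utf32; *p; p++) printf("U+%04X ", *p);
	printf("\n");

	/* UTF-8: the step size depends on the lead byte. */
	unsigned char utf8[] = { 'H', 'i', 0xF0, 0x9F, 0x98, 0x80, 0 };
	for (unsigned char *p = utf8; *p; p += utf8_len(*p))
		printf("%d-byte char ", utf8_len(*p));
	printf("\n");
	return 0;
}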

I've been using Unicode and 64-bit compilation for many, many years. 64-bit started with XP Professional for me, round about 10 years ago.
With PureBasic, I make my code compilable in Ascii/Unicode and 32-bit/64-bit mode most of the time.
It's just a compiler switch, and should just work, most of the time. And I'm doing custom work using PB (side job),
where I always try to make sure it works with Ascii/Unicode and x86/x64, and Windows+Linux+MacOS,
and any combination of those. I think my customers are happy with the work, according to the feedback. ;)
Good C++ code can just be recompiled using a different compiler switch, and will work in ASCII/Unicode mode and in 32-bit + 64-bit compilation mode.

But at the same time, I'm still aware that it's a very complicated topic nonetheless, especially when it comes to different underlying operating systems,
and things like supporting right-to-left languages. Even with programming language systems that actively support Unicode,
there are almost always some problems... when it comes to conversions between encodings,
and when it comes to certain languages (Vietnamese, Indian languages, Arabic, etc.).

Using 32-bit chars isn't such a big problem at all. SizeOf(Char) is always 4 (bytes), and that's very nice, even for pointer access.
It's just a waste of memory and space - and on many systems you always have to use conversion functions to exchange strings with (give to and get from)
the underlying system. Every system function call requires a string conversion in this case. If the system (like Windows) does not
understand 32-bit null-terminated strings, you have to convert them to UTF-16, for example. That's slow and occupies system resources if
you need to convert every simple string. MX2 isn't a closed system in itself. Internally, 95%+ is interaction with the underlying systems,
so I consider this to be important to think about.


dawlane(Posted 2015) [#31]
> I think UTF-16 is the most used today.

I think that UTF-8 would be the most used when you throw in text files and the Internet.
When using Qt and QChar the internal encoding is UTF-16, but saved text is converted to UTF-8 or whatever you choose. GTK expects UTF-8 strings.
Here's a little blog that I stumbled across about unicode and applications, with the pros and cons.


DruggedBunny(Posted 2015) [#32]

> Would it be possible to change trans so that a conversion from int to uint and vice versa needs to be stated explicitly?
> It is still better to me than no unsigned at all and it would be way easier to spot errors like these.


Still think this would be a good way of handling both signed and unsigned types...


GC-Martijn(Posted 2015) [#33]
Not to be rude haha but this topic is about the C binding creator ;) :)

@marksibly
I'm now at the point where the debugger doesn't give me information about what went wrong.
When I run this code:
Using sdl3
Global window:SDL_Window Ptr
Local init:=SDL_Init( SDL_INIT_VIDEO )
window=SDL_CreateWindow( "SDL2 Window",SDL_WINDOWPOS_UNDEFINED,SDL_WINDOWPOS_UNDEFINED,640,480,SDL_WINDOW_OPENGL )
While window<>Null
	'main_loop()
	'SDL_Delay( 10 )
Wend


using this file:
https://github.com/gcmartijn/C2MX2/tree/master/example/sdl3

Ted will say:
Parsing...
Semanting...
2 error(s):
Expression is not a type expression
Expression is not a type expression
Done.


I would love to know what line that is.

Edit: I know it's something inside the sdl3.monkey2 file, but what?


Danilo(Posted 2015) [#34]
> Not to be rude haha but this topic is about the C binding creator ;) :)

Sorry, but how do you want to create such a conversion tool if decisions like
the basic supported data types in MX2 haven't been made yet? It does not make sense!

MX2 is still in the pre-planning stage. The most basic things are not even planned yet,
and you want to code a tool for this theoretical language that's not even fully planned... !?


GC-Martijn(Posted 2015) [#35]
Yep, as you can see it's almost working, and C has a small set of types. It's only a small function to create the bindings.

For now a sint/int/uint is just an Int.
If monkey2 gets sint, then it's one line to change.


Danilo(Posted 2015) [#36]
An almost working thing, for a theoretical language that hasn't even been fully planned yet? :D


GC-Martijn(Posted 2015) [#37]
Yep, did you check the sdl2 example inside monkey2?
Then you can see that you could create a game right now using only those C SDL2 bindings.

SDL2 has everything inside it to create a game.

I'm only waiting for some monkey2 things.
I agree and understand that monkey2 is not ready, but using extern libs is almost done,
and easy to modify.