Competition Time! (Cash Prize!) :-D


Brucey(Posted 2014) [#1]
Hello Blitzers!

I would really (really!) like a fully-working GLMax2D/GLGraphics module which supports OpenGL 2+ and OpenGLES 2 - i.e. one which does away with glBegin() and glEnd(), instead using arrays and shaders.

OpenGLES 2 compatibility gives us direct support for rendering on Raspberry Pi, iOS, Android and Angle (the GL->D3D wrapper that is used by Chrome and Firefox on Windows to render WebGL)

Since I am rubbish at coding up such things myself, I have decided to open up a competition, in the hope that someone will help by writing one for the community :-)

In order to add a little incentive, I am offering 100 GBP (that's pounds Sterling. Actual Dollar values may vary at the time of conversion) to the successful winner, to be paid via PayPal on completion and validation of the project.

The resulting work will be open sourced under the zlib/libpng license so that everyone can use it as they want.

I know that some of you in the community think writing such a thing is very, very easy, but you know, it's all very well saying "yeah, I could knock one together over the weekend", without actually knocking one together over the weekend.

Words are just vapour-ware after all.

You can collaborate if you like and I can divide the prize amount between you fairly if you wish.

So, in summary, the community would like :

* A new set of compatible modules (i.e. with the same exposed APIs) as BRL.GLMax2D and BRL.GLGraphics, but using much more modern OpenGL APIs
* It should be functionally compatible with OpenGL 2+ and OpenGLES 2.
* zlib licensed
* Running on the 3 core platforms (Windows, OS X, Linux) - If you only have one to build on, I can test on all three for you.
* Something as efficient as the current code would be nice. Something faster would of course be better ;-)

The competition will run to the end of December 2014, or before if someone provides a completed project in that time. After the expiry date, I will sadly have to revoke the prize and feel lost and empty.


Feel free to post questions, comments, ideas, etc. here (or via email if you wish; the address is in my profile).

Thanks to everyone's support over the years.

Go Community!!


LT(Posted 2014) [#2]
Very nice of you to offer a cash reward, but I thought you already had something very close to this..?


Brucey(Posted 2014) [#3]
I thought you already had something very close to this..?

No, it's not quite there, and the stuff I was hacking was somewhat messy.
Feel free to take what's already been done and get it fully working.

I'm just after a final, working product.


LT(Posted 2014) [#4]
Okay, but it would help to know what specifically is missing. The only part I'm aware of is that textures do not work properly with multi-frame images.


JoshK(Posted 2014) [#5]
So you need this, working with OpenGLES 2?:
http://www.blitzmax.com/bmdocs/command_list_2d_cat.php?show=Graphics%20-%20Max2D

The hard part is actually initializing an OpenGL context on each OS. The existing GLMax2D will work with OpenGL ES with just a few basic shaders and some small changes.

On Android, for example, I had a huge Java code file with all these C callbacks. I have no idea how you would get something like that running with your setup.


LT(Posted 2014) [#6]
What's wrong with making the SDL context the default?


Brucey(Posted 2014) [#7]
The hard part is actually initializing an OpenGL context on each OS.

SDL can handle the context management on the other platforms.


LT(Posted 2014) [#8]
Does anything need to change in GLGraphics? I can go through the list of commands in GL2Max2D and make sure they all work.


JoshK(Posted 2014) [#9]
OpenGL ES 2.0 always uses shaders, and I don't think it has fixed-function stuff at all. It's easiest to implement an OpenGL 2.1 renderer using the same approach, and then just make a few small changes for ES. This can be done by using some preprocessor macros in the shaders so the same shader can be used for either.
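
A rough sketch of the "same shader for both" idea, written in Python only to show the string handling. The names FRAGMENT_BODY, build_shader_source, u_texture0, v_color and v_texcoord are hypothetical and do not come from GL2Max2D. It relies on the GL_ES macro, which GLSL ES predefines, and assumes GLSL 1.20 for OpenGL 2.1 and GLSL ES 1.00 for OpenGL ES 2.0.

```python
# Hypothetical helper: none of these names come from GL2Max2D.
FRAGMENT_BODY = """
#ifdef GL_ES
precision mediump float;   // GLSL ES fragment shaders need a default float precision
#endif
uniform sampler2D u_texture0;
varying vec4 v_color;
varying vec2 v_texcoord;
void main() {
    gl_FragColor = texture2D(u_texture0, v_texcoord) * v_color;
}
"""


def build_shader_source(body: str, is_gles: bool) -> str:
    """Prepend the #version line matching the target API."""
    # GLSL ES 1.00 pairs with OpenGL ES 2.0; GLSL 1.20 pairs with OpenGL 2.1.
    version = "#version 100" if is_gles else "#version 120"
    return version + "\n" + body.lstrip("\n")


desktop_source = build_shader_source(FRAGMENT_BODY, is_gles=False)
gles_source = build_shader_source(FRAGMENT_BODY, is_gles=True)
```

Only the first line differs between the two results; the #ifdef block is skipped on desktop GL because GL_ES is not defined there.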

Debugging OpenGLES is painful, so it's best to test with GL on a PC and then add some if statements for the bits that act differently on ES.


LT(Posted 2014) [#10]
This is Brucey's GL2Max2D thread. The new module is already using shaders, but it's incomplete.

http://www.blitzmax.com/Community/posts.php?topic=103193

Besides using shaders instead of fixed function and arrays instead of glBegin and glEnd, I'm not sure of all the requirements.


JoshK(Posted 2014) [#11]
I will do it because GTKMaxGUI saved my ass, and I use a lot of your modules, and MaxIDE 64 for Linux is really nice:
-Timeframe is probably ~3 months.
-I won't deal with SDL, context initialization, or buffer swapping, as these vary from platform to platform.
-I will only implement OpenGL 2.1, the right way, and then make changes for OpenGL ES based on what I think needs to be done. No testing on OpenGLES will be performed.
-The stuff I do will be fairly forward-compatible, and it will all use the fast-track path, but maintain maximum flexibility. No texture atlases, but it will use VBOs.

I already have this stuff written in Leadwerks 3.0 so it's not hard. Does your compiler support IncBin? I'd prefer to pack those shaders into the build, since they are pretty simple.


Wiebo(Posted 2014) [#12]
If I had any clue about OpenGL I would help, but alas!!


Derron(Posted 2014) [#13]
@ Others:

Do not stop tinkering around with OGL/EGL even if JoshK wrote he would "do it". It is a competition, and it would not be good for BMX NG if people stopped coding just because someone was the first to write "I'll do it".

@JoshK
If you are thinking of doing it regardless of the money, I would enjoy seeing that code in a GitHub project - just to see it evolving (and maybe people would be able to assist with things, such as writing tests).
I know you might be afraid that "people could steal my work", but if you are doing it for the "thanks", that should not be a problem.

Of course the above is only valid under the assumption that you are doing it for the thanks, not for the 100 GBP (I doubt that is the reason to do it).


bye
Ron


LT(Posted 2014) [#14]
I got started on it yesterday, but not for the money. I'm not interested in duplicating effort, but three months seems like a long time for this. The existing module is pretty close (already 2.1, afaik) and just needs some clean up work.


zoqfotpik(Posted 2014) [#15]
As is mentioned elsewhere I would be very interested in a hook for a postprocessing pixel shader. I think this is one of those things that newer games are doing (like those blurry watercolor looking backgrounds, though that isn't really postprocessing.)

I don't just want to throw out a feature wish that isn't really directly related to the core issue but it is at least tangentially related.


Pingus(Posted 2014) [#16]
I hardly understand what you guys are doing here, but if the aim is to port BlitzMax code to run on the Android platform, I think it is worth quite a few bucks. Rather than a 'contest', why not set up a Kickstarter (or similar) whose proceeds would be redistributed among the people who worked on it?


LT(Posted 2014) [#17]
Pingus, it's just one part of that, not worthy of a Kickstarter by itself.


Pingus(Posted 2014) [#18]
OK, but wouldn't the whole thing be worth a Kickstarter? The sub-parts would then be funded as well?


Kryzon(Posted 2014) [#19]
Remember the PayPal taxes on those 100 GBP.


LT(Posted 2014) [#20]
Should this be a drop in replacement for GLMax2D or should there be a flag to use the old GLMax2D functionality? I have something that is mostly working - Digesteroids and Breakout are 100% - just have to finish the DrawOval and DrawPoly functions and clean up a little more.


Derron(Posted 2014) [#21]
If it is a "drop in replacement" there is no need for a "flag" - just use either the new or the old module to have a specific behaviour.

Can't wait to test it.


bye
Ron


Brucey(Posted 2014) [#22]
It needs to be a drop-in replacement. So when I DrawText or whatever in my app, it works as before, except of course using the more modern stuff.

Obviously one could write something much more efficient if they were to drop direct compatibility with Max2D - forcing the user to do things in certain ways - and in the future I think we should aim for such a module. But for now, I think it's important that this can be used as a direct replacement.


LT(Posted 2014) [#23]
Technically, it's a drop-in replacement either way. What I meant was that I could have a Global called GL_USE_FIXED or something that would cause it to use the old functionality. That way, there would be no need for two separate modules.

However, it's easier if I don't bother, so I won't. ;)

if they were to drop direct compatibility with Max2D
Yeah, I have a separate module for my engine that does things differently. I won't be using Max2D at all, but anything I can do to help this process along... :)


LT(Posted 2014) [#24]
Speaking of efficiency...

I'm not actually sure if it's better to batch the primitives, which requires that the vertex colors be sent to the shader, or to simply pass the color and alpha and draw each primitive separately. The former is the current setup, but the latter was done in the original GLMax2D and it allows for the use of things like GL_TRIANGLE_STRIP and GL_TRIANGLE_FAN.


Derron(Posted 2014) [#25]
Best is to aim for 100% compatibility to BlitzMax Max2D.

Efficiency improvements could be done later (or using an extending module - MaxExt2D).


bye
Ron


LT(Posted 2014) [#26]
Compatibility is not the problem. Batching requires sending everything as PLAIN_TRIANGLE and passing the vertex colors. That's a lot of extra vertices being sent to the graphics card in exchange for fewer draw calls. It is admittedly simpler, though.


Brucey(Posted 2014) [#27]
I'm not actually sure if it's better to batch the primitives, which requires that the vertex colors be sent to the shader, or to simply pass the color and alpha and draw each primitive separately.

I've no idea. Which is why we're here in this thread ;-)

Ideally, if the user were to make several draws of the same kind in succession, the "engine" would be able to batch those together, thereby in theory, making things work faster.
Obviously, if he/she were to draw different, incompatible things in succession, then one wouldn't gain the "batching advantage".

Whether or not it is easy to code such a thing into the "engine", I've no idea.


LT(Posted 2014) [#28]
Well, the simplest is to send PLAIN_TRIANGLES so every primitive is kept discrete (can't use TRIANGLE_STRIP or TRIANGLE_FAN as part of a batch). And also to send vertex colors, even for DrawImage, so that's what I'll do for the time being.

However, I've just found that this module doesn't play nicely with my engine, even though the old GLMax2D does. So I'll have to look into that before I can call it a true replacement. :(

EDIT: It's very likely the shader initialization is conflicting with my engine's. I'm not sure if this can work as a drop-in replacement without separating that part.


Derron(Posted 2014) [#29]
@LT

Any progress? Just find it nice to read about progress, problems, success stories (you know...keeping up the excitement).


bye
Ron


LT(Posted 2014) [#30]
Any progress?
I haven't worked on it since Saturday morning...got sidetracked with getting my engine updated to GL 2.0 (to make sure they would play nicely together). The issues turned out to be minor ones in the module - had to do a side-by-side comparison with the old GLMax2D to figure that out.

Anyway, what I have is just an update of Brucey's GL2Max2D module, but it should work as advertised and it will be cool to see Digesteroids working on mobile devices. Still have a few things to do, but I'll try to get it posted in a day or two. Incidentally, my version extends the old GLMax2D and can be switched to use the old functionality very easily. I needed that for testing and it seems to me it's not doing any harm, but it's fine with me if you want to just drop it altogether.


Derron(Posted 2014) [#31]
Glad to hear that there is no serious showstopper lying in the middle of the todo-road.

I also can't wait to see BlitzMax(NG) extending to other platforms (graphically).


bye
Ron


LT(Posted 2014) [#32]
Here's the new brl.GL2Max2D. I've tested it with the Breakout and Digesteroids samples.

I'm not sure what to do with the DrawPixmap() functionality, so I've left it, for now. DrawOval() and DrawPoly() use TRIANGLE_FAN instead of batching.

Set GLMAX2D_USE_LEGACY = True, if you want to use old functionality.

**** LAST EDIT 11/10/14 ****



A simple test program.



juankprada(Posted 2014) [#33]
Just tested it in my own game. It works right now, but it is very, very slow. I'll be looking at the source to see where it can be optimized. Thanks, LT.


LT(Posted 2014) [#34]
Hmm, DrawImage() is not using batches right now, so that could be changed easily enough by switching to PLAIN_TRIANGLE. However, it would have to keep track of the texture id and use that as a Flush() criterion. Currently, extra vertices are being sent to the card to provide color information. Another option is to remove that and send a Uniform color, but that also would have to be used as a Flush() criterion.
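
A toy sketch of the "Flush() criterion" idea, assuming the batch key is just (texture id, blend mode). QuadBatcher and its members are hypothetical names, not the module's, and the draw call is only counted rather than issued.

```python
class QuadBatcher:
    """Toy batcher: collects quads and records the draw calls a renderer would issue."""

    def __init__(self):
        self.key = None        # (texture_id, blend_mode) of the pending batch
        self.pending = []      # quads waiting to be drawn
        self.draw_calls = []   # one (key, quad_count) entry per simulated draw call

    def flush(self):
        if self.pending:
            # A real implementation would upload the vertices and draw them here.
            self.draw_calls.append((self.key, len(self.pending)))
            self.pending = []

    def draw_quad(self, texture_id, blend_mode, quad):
        key = (texture_id, blend_mode)
        if key != self.key:
            # State changed: everything queued so far has to be drawn first.
            self.flush()
            self.key = key
        self.pending.append(quad)


batcher = QuadBatcher()
for texture_id in (1, 1, 1, 2, 1, 1):
    batcher.draw_quad(texture_id, "ALPHABLEND", quad=(0, 0, 16, 16))
batcher.flush()
```

Six quads end up in three draw calls here, because the switch to texture 2 and back forces two extra flushes. That is also why draw order limits how much batching can gain.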

Digesteroids runs quite fast on my fairly old computer (albeit with a pretty fast graphics card). How does it run for you?


Brucey(Posted 2014) [#35]
Would it be better if there were different sets of arrays/shaders/etc for the different things to draw things, which could just be batched together as required, and then on the Flip, all pushed through together by running the different shader programs?

Perhaps some different Objects that collate everything during all the draws?

Although I suppose draw order would be a problem then... hmm.


Derron(Posted 2014) [#36]
Maybe have a look at how cocos2d etc. do it with their new render pipelines... and how they order draw calls.

Id observation will be a first step in optimization...at least the code will be useable to test bmx ng on other platforms.

Bye
Ron


Brucey(Posted 2014) [#37]
at least the code will be useable to test bmx ng on other platforms

That would depend on what platforms you are talking about.
For example, GL_UNPACK_ROW_LENGTH is not supported on OpenGLES 2, so one would need a different UploadTex() implementation to start with.


LT(Posted 2014) [#38]
I suppose draw order would be a problem then
Priority numbers (or layer) could be another criterion. The Flush needs to be separate from Flip so that it will play nicely with other renderers.

Making sure DrawImage uses batches will probably help. How often does anyone use plain colored rectangles and polygons in conjunction with textured sprites? Not often, I'll bet. My main concern is how this will run on mobile devices, for which I've heard that draw calls are rather costly.

It would be nice to know how Digesteroids performs on various devices.


Derron(Posted 2014) [#39]
See above about unsupported commands in EGL. You might need to replace GL_UNPACK_ROW_LENGTH with other approaches (there are many answers for this on Stack Overflow).

Bye
Ron


LT(Posted 2014) [#40]
so one would need a different UploadTex() implementation
I've yet to find a comprehensive resource that tells me exactly what is and what is not available for specific GL versions. I left that function as is, since I have no way to test alternatives.


Brucey(Posted 2014) [#41]
Fairly comprehensive :

https://github.com/bmx-ng/pub.mod/blob/master/opengles.mod/extern.bmx

:-)


LT(Posted 2014) [#42]
Well, that reference is handy, thanks!

Even after reading about GL_UNPACK_ROW_LENGTH (unavailable) and GL_UNPACK_ALIGNMENT (available), their purpose is still sort of unclear to me. I found that using one or the other made no difference, or even removing them altogether. :/


Derron(Posted 2014) [#43]
Think as soon as you have to use "glPixelStoreX" you will need that function.

Eg. for getting portions of a texture ("subtexturing")
http://stackoverflow.com/questions/205522/opengl-subtexturing

Another one:
http://stackoverflow.com/questions/9483945/looking-for-alternative-to-gltexsubimage2d-with-data-offset-support

So this might be used for sprite atlases or so?



bye
Ron


LT(Posted 2014) [#44]
No, it's not for sprite atlases - it's more low-level than that. It affects functions like glReadPixels, but how is not entirely clear to me. Using one, the other, or neither makes no difference on my machine. My guess is that it has an effect on performance, though.

It occurs to me that the texture load functionality on mobile platforms should probably just use SDL's texture functions.

EDIT: The textures are already powers of two, might have something to do with why it SEEMS to work regardless.


zzz(Posted 2014) [#45]
Looks like it's for defining row sizes that differ from image sizes, i.e. the same as pixmap pitch. So pow2 textures probably happened to match whatever increments the pixmap uses for row length.
EDIT: Row length would be offset from first pixel in one row to first pixel in the next row, and alignment would be for what alignment (1,2,4,8) the first pixel in the next row will be at.
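
A small sketch of that pitch/alignment relationship, assuming 8-bit colour components. row_stride and repack_tight are hypothetical helpers; the second shows one way to avoid GL_UNPACK_ROW_LENGTH altogether by copying the rows into a tightly packed buffer before uploading.

```python
def row_stride(width_px: int, bytes_per_pixel: int, alignment: int = 4) -> int:
    """Bytes between row starts that GL assumes for a given GL_UNPACK_ALIGNMENT."""
    unpadded = width_px * bytes_per_pixel
    # Round up to the next multiple of the alignment (1, 2, 4 or 8).
    return (unpadded + alignment - 1) // alignment * alignment


def repack_tight(pixels: bytes, width_px: int, height_px: int,
                 bytes_per_pixel: int, pitch: int) -> bytes:
    """Copy rows stored `pitch` bytes apart into a buffer with no padding."""
    row_bytes = width_px * bytes_per_pixel
    if pitch < row_bytes:
        raise ValueError("pitch is smaller than one row of pixels")
    return b"".join(pixels[y * pitch: y * pitch + row_bytes]
                    for y in range(height_px))


# A 3x2 RGB image stored with a 12-byte pitch (9 bytes of pixels + 3 of padding).
padded = bytes(range(24))
tight = repack_tight(padded, width_px=3, height_px=2, bytes_per_pixel=3, pitch=12)
```

The tight buffer could then be uploaded with GL_UNPACK_ALIGNMENT set to 1. For power-of-two RGBA textures the row size is already a multiple of 4, which fits the observation that the settings seemed to make no difference.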


LT(Posted 2014) [#46]
This might work as a replacement for UploadTex and GL_UNPACK_ROW_LENGTH.



EDIT: The version with UNPACK_ROW_LENGTH should only be used with higher GL versions. Probably should remove it from this module.


zzz(Posted 2014) [#47]

Would it be better if there were different sets of arrays/shaders/etc for the different things to draw things, which could just be batched together as required, and then on the Flip, all pushed through together by running the different shader programs?

Perhaps some different Objects that collate everything during all the draws?

Although I suppose draw order would be a problem then... hmm.



There are a few ways to (theoretically) improve rendering speed for quads: either using index resets to batch draw quads using triangle strips, which should result in faster drawing on the GPU, or using a premade elements array to reduce data transfer. I haven't tried either of them, so I have no idea if it's actually worth implementing or not.

EDIT: Or just utilize both. Interesting enough to test it out I guess :) (Apparently ES won't do restart indices, but using glDrawElements gave about 15% increased performance for quad rendering on my system)


LT(Posted 2014) [#48]
using index resets to batch draw quads using triangle strips
That's what it is doing now; results in a draw call per image.


GaryV(Posted 2014) [#49]
I will do it because GTKMaxGUI saved my ass, and I use a lot of your modules, and MaxIDE 64 for Linux is really nice:
-Timeframe is probably ~3 months.
-I won't deal with SDL, context initialization, or buffer swapping, as these vary from platform to platform.
-I will only implement OpenGL 2.1, the right way, and then make changes for OpenGL ES based on what I think needs to be done. No testing on OpenGLES will be performed.
-The stuff I do will be fairly forward-compatible, and it will all use the fast-track path, but maintain maximum flexibility. No texture atlases, but it will use VBOs.

I already have this stuff written in Leadwerks 3.0 so it's not hard. Does your compiler support IncBin? I'd prefer to pack those shaders into the build, since they are pretty simple.


I look forward to seeing what you turn out. I am sure it will be a great addition.


zzz(Posted 2014) [#50]
What I meant was using glPrimitiveRestartIndex along with whatever else it requires to set it up. It would basically allow you to draw a lot of triangle strip quads with a single glDrawElements call. The ES version Brucey wants to support won't do it though.

I put the test code for the second suggestion in the wrong mod, but it's straightforward enough. I made a completely separate set of arrays, but besides the elements array it's probably better to just reuse the ones the batcher in gl2max2d already uses.



			Case PRIMITIVE_PACKED_QUAD
				glDrawElements( GL_TRIANGLES, elem_index, GL_UNSIGNED_INT, quad_element )




Derron(Posted 2014) [#51]
it should result in 1 drawcall per texture used.

Couldnt "triangle strips" get used to send multiple "rectangular" shapes in one call? so 2 triangles form a rectangle ... to move to the next, you use a "zero area"-triangle (some kind of a "line") to move to the next 2-times-triangle-rectangle and so on.

As long as they share the same texture, this should avoid to call that whole thing multiple times.


Hmm as I have no clue about OGL I assume that something like "DrawArray(triangles)" exists and is already somehow optimized.

Also it might not be as fast as possible as you set the used shader to "0" - which according to
https://github.com/mattdesl/lwjgl-basics/wiki/ShaderProgram-Utility

should not be needed in current implementations ... but I do not worry as long as it just "works" ... improvement could be done later on.



bye
ron


zzz(Posted 2014) [#52]
Well yes, but I think we were both on the track of reducing data transfer to the GPU. Since the quads will be disjoint if using strips or whatever anyway, I don't think there will be any difference in rendering speed compared to the example I posted.

If I understand the strips correctly it would take an additional two vertices to move to the next quad, which isn't really desirable, even if the GPU probably wouldn't bother rendering that part at all.

EDIT: Well one vertex, but that's still 5 instead of 4 vertices per quad, and using triangle strips won't be faster than plain triangles. (Would it be possible to have 3 vertex quads, and have some clever shader or vertex code figure out where the fourth one should be?)


LT(Posted 2014) [#53]
Couldnt "triangle strips" get used to send multiple "rectangular" shapes in one call?
Without a glPrimitiveRestartIndex call, no.

I don't see how extra vertices would help. The point of a strip is a continuous set of triangles. You can't have two separate quads using strips (without an index reset).

In any case, sending them as PLAIN_TRIANGLE will work, but it will require sending more vertices. Using glDrawElements can reduce that a bit; it's what I use in my own engine. Also, tracking changes like color and alpha and simply passing those values instead of passing vertex colors should make it faster.


Derron(Posted 2014) [#54]
The point of a strip is a continuous set of triangles.


Might be the case... but it can and is used to have multiple "quads" (two triangles of course) in one call -- they then are connected via "zero area" triangles. They call it "degenerate triangle".
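
A minimal sketch of the degenerate-triangle trick, using plain integers in place of vertices. stitch_strips and visible_triangles are hypothetical helpers, and the stitching shown assumes every strip has an even number of vertices (true for 4-vertex quads), so the winding order is preserved.

```python
def stitch_strips(strips):
    """Join triangle strips by repeating the last and first vertex at each seam."""
    out = []
    for strip in strips:
        if out:
            out.append(out[-1])    # repeat the last vertex of the previous strip
            out.append(strip[0])   # repeat the first vertex of the next strip
        out.extend(strip)
    return out


def visible_triangles(strip):
    """List the strip's triangles, skipping zero-area ones (a repeated vertex)."""
    tris = []
    for i in range(len(strip) - 2):
        tri = strip[i:i + 3]
        if len(set(tri)) == 3:
            tris.append(tuple(tri))
    return tris


quads = [[0, 1, 2, 3], [4, 5, 6, 7]]
stitched = stitch_strips(quads)
triangles = visible_triangles(stitched)
```

With this scheme two quads take 10 strip vertices and still produce only the four real triangles. Each seam costs two extra vertices, which is the overhead to weigh against plain triangles or an index array.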

Nonetheless I do not want to disturb your tinker time .. sorry for the noob posting some rubbish here.


bye
Ron


Brucey(Posted 2014) [#55]
Perhaps we can introduce something like LoadAtlasImage(), where we already have a similar LoadAnimImage().
It could take an array of coords for each sub image, and you draw with it in the usual way, passing in 'frame' for the particular sub image you want to draw?
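
As a sketch of what such a loader might compute per frame (LoadAtlasImage does not exist yet, and atlas_uvs is a hypothetical helper), each pixel rectangle in the atlas can be turned into normalized texture coordinates once at load time:

```python
def atlas_uvs(rects, atlas_width, atlas_height):
    """Convert pixel rectangles (x, y, w, h) into (u0, v0, u1, v1) texture coordinates."""
    return [
        (x / atlas_width, y / atlas_height,
         (x + w) / atlas_width, (y + h) / atlas_height)
        for (x, y, w, h) in rects
    ]


# Three sub images on a 256x128 atlas; the list index plays the role of 'frame'.
frames = atlas_uvs([(0, 0, 64, 64), (64, 0, 64, 64), (128, 64, 128, 64)], 256, 128)
```

Drawing frame n would then use frames[n] for the quad's texture coordinates while binding the same texture every time, which is what lets consecutive draws share a batch.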

Freetype-gl has a nice shader-based atlas implementation that we could maybe borrow from ?

Although I'm not sure how you apply origin, translation, scale and rotation via the modelview matrix... (which I assume is the place you are meant to apply such things?)


Derron(Posted 2014) [#56]
Keeping things working as they do with vanilla BlitzMax also means not introducing new commands.

The engine "itself" should recognize if you kindly ask to draw from the same texture multiple times in a row.


Others plan to do batching this way:
http://www.cocos2d-x.org/wiki/Cocos2d_v30_renderer_pipeline_roadmap
(pay attention to the "reference"-links)

Seems they also just use "IDs" to decide whether this starts something new or could be done with the "previous" command. Of course "id" is a mixture of blendmodes, textureIDs etc. so they call it "key".


bye
Ron


LT(Posted 2014) [#57]
Well, yes, using some kind of key will make sense in future versions. Also, it is possible to use a single array to store all of the vertex data for all of the primitives and pass offsets into the draw function. I'm not sure that would be faster, though.


Brucey(Posted 2014) [#58]
There are no GL_QUADS in ES, and I think one probably shouldn't be using glBegin and glEnd either - they definitely don't exist in ES.


Brucey(Posted 2014) [#59]
I made a completely separate set of arrays

Arrays are very inefficient - in comparison to accessing, for example, a Float Ptr by index.


Brucey(Posted 2014) [#60]
@ render pipeline.

Essentially you need to throw away the old GL 1.2 code (all that stuff that uses Begin and End), and replace it with something else entirely (Shaders? I assume that's the best way to do everything?)


LT(Posted 2014) [#61]
Arrays are very inefficient - in comparison to accessing, for example, a Float Ptr by index.
You pass a pointer into DrawElements, also. Something related to UpdateBuffers was causing my (Windows) engine to grind to a halt. I replaced the DrawArrays functions with DrawElements and now it plays much nicer. I've been using a pre-built index array for doing geometry picks for some time.

NOTE: I don't know what performance will be like on mobile devices, but once the SDL context and compile for Android options are available, I'll be happy to test on my Nexus 7.


LT(Posted 2014) [#62]
New version is now available - UPDATED SOURCE ABOVE. Changed DrawArrays to DrawElements and implemented batching for textured tris. Blend and texture states are now saved and used as batching criteria. Also changed the texture upload function so that it doesn't use GL_UNPACK_ROW_LENGTH.


Derron(Posted 2014) [#63]
@batching criteria

At a later stage I think the "target" is another criterion (render to texture).


@updated source
maybe add a "edit DATE"-line in that post.


bye
Ron


Brucey(Posted 2014) [#64]
New version is now available

Cool. We're making some progress :-)

Now that I've got input (keyboard and mouse) working on the Pi (seems to be useful if you want to interact with it!), I can quit the app without having to do a reboot...

So, what we have rendering so far is the line, the oval and the polygon. No cross, text or rectangles.
(I'd do a screenshot but we are rendering from the console, rather than X11, so no access to the usual grabbers).

Any ideas re missing things? :o)


LT(Posted 2014) [#65]
No idea. The missing rectangle is especially puzzling considering that it uses the exact same rendering method as the oval. I wonder if the rectangle would draw instead if you reversed the order...

EDIT: The only difference I can see with the rectangle is that the floats are defined inside of an array using # instead of .0 ...


Derron(Posted 2014) [#66]
Will check tomorrow how RasPI-emulation works atm (virtualbox without arm, and QEMU with arm emulation) ... maybe it works even with another GPU getting emulated (works if same errors as brucey happen).


bye
Ron


Brucey(Posted 2014) [#67]
The missing rectangle is especially puzzling considering that it uses the exact same rendering method as the oval

No, DrawPoly and DrawOval are working.
Plot, DrawRect and DrawImage are not rendering anything. Those DrawXXX functions appear to use glDrawElements(). Dunno if that has anything to do with it. I've tried re-ordering the indices, to no avail - although it still draws everything in OS X.

Still, it's nice to see *something*.

I also have been testing zzz's version of the module, which he sent me, and it renders everything as expected on the desktop, but I couldn't get anything to work at all on the Pi.

The hardest part is just getting everything set up *right*. Once we've done that, it's pretty much plain sailing ;-)


Brucey(Posted 2014) [#68]
The other (minor) issue is that of the screen resolution. The Pi will *only* open a single, boot specified screen size for rendering. In my case, it's a standard 1920x1080 screen.
Whatever you ask via "Graphics x, y" it will only open that sized screen.
I suppose what you want is some kind of automatic scaling so that it *looks* like you are getting what you are asking for?

btw, I had to change SetResolution() to the following, because GL_PROJECTION is not available :
	Method SetResolution( width#, height# )

		u_pmatrix.SetOrthographic( 0, width, 0, height, -1, 1 )

	End Method

whereas yours was :
	Method SetResolution( width#, height# )

		glMatrixMode( GL_PROJECTION )
		glLoadIdentity()
		glOrtho( 0, width, height, 0, -1, 1 )
		glMatrixMode( GL_MODELVIEW )

	End Method
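
For reference, a sketch of the matrix that glOrtho builds, which is what a shader-based replacement has to supply itself as a uniform. ortho and transform are hypothetical helpers; the argument order follows glOrtho (left, right, bottom, top, near, far) and says nothing about how SetOrthographic orders its parameters.

```python
def ortho(left, right, bottom, top, near, far):
    """Return the glOrtho matrix as 16 floats in column-major order."""
    rl, tb, fn = right - left, top - bottom, far - near
    return [
        2.0 / rl, 0.0, 0.0, 0.0,
        0.0, 2.0 / tb, 0.0, 0.0,
        0.0, 0.0, -2.0 / fn, 0.0,
        -(right + left) / rl, -(top + bottom) / tb, -(far + near) / fn, 1.0,
    ]


def transform(m, x, y, z=0.0):
    """Apply a column-major 4x4 matrix to the point (x, y, z, 1)."""
    return tuple(m[i] * x + m[4 + i] * y + m[8 + i] * z + m[12 + i]
                 for i in range(3))


# Same arguments as the fixed-function call above for an 800x600 screen.
projection = ortho(0, 800, 600, 0, -1, 1)
top_left = transform(projection, 0, 0)
bottom_right = transform(projection, 800, 600)
```

Because bottom = height and top = 0, pixel (0, 0) lands at the top-left corner of clip space (-1, 1), which is the Max2D convention of y growing downwards.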



LT(Posted 2014) [#69]
Hmm, maybe the Pi doesn't like glDrawElements, but the setup is pretty standard and I'd expect it to work. :/


LT(Posted 2014) [#70]
I had to change SetResolution() to the following
I think I just commented it out temporarily when I was testing. 'Meant to put it back.

NOTE: The old Type is still there with its own SetResolution method.


Brucey(Posted 2014) [#71]
Fixed DrawRect and DrawImage :o)

Came across this post.
So I changed the QUAD_INDS and friends to use Short instead of Int, and text and rects are rendering now. Yay!

I dropped BATCHSIZE to 32767 - that's the max size for a Short isn't it?
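
Some background that may explain the fix: core OpenGL ES 2.0 only accepts GL_UNSIGNED_BYTE and GL_UNSIGNED_SHORT index types in glDrawElements, and 32-bit indices need the OES_element_index_uint extension. An unsigned 16-bit index can address vertices 0 to 65535. The sketch below uses a hypothetical quad_indices helper (the thread does not show whether BATCHSIZE counts quads, vertices or indices) to build a premade quad index array within that limit.

```python
from array import array


def quad_indices(quad_count: int) -> array:
    """Build two triangles (0,1,2) and (2,3,0) per quad as unsigned 16-bit indices."""
    max_quads = 65536 // 4   # 4 vertices per quad, largest 16-bit index is 65535
    if quad_count > max_quads:
        raise ValueError("too many quads for 16-bit indices")
    indices = array("H")     # "H" = unsigned short
    for q in range(quad_count):
        base = q * 4
        indices.extend((base, base + 1, base + 2, base + 2, base + 3, base))
    return indices


two_quads = quad_indices(2)
largest_batch = quad_indices(16384)
```

Under these assumptions a single batch tops out at 16384 quads, since the last quad's fourth vertex is index 65535.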

Now we just need to get Plot to... plot. Still no little cross.


LT(Posted 2014) [#72]
I've tried re-ordering the indices
A quick way to be sure is to use glDisable( GL_CULL_FACE )...in case you haven't done that. I believe it is off, by default, but who knows on the Pi.


LT(Posted 2014) [#73]
Fixed DrawRect and DrawImage
Sweeeet. :) I'm very curious to see if Digesteroids will be fast enough to be playable.


Derron(Posted 2014) [#74]
@ missing plot

Maybe "gl_PointSize" needs to get defined in the shader?



bye
Ron


Brucey(Posted 2014) [#75]
I'm very curious to see if Digesteroids will be fast enough to be playable.

Heh... more or less!
It's all working, anyway. Even the sound - albeit a bit crackly :-)

Framerate... not sure, a wee bit slower than full speed in-game.
The Instructions screen is very laggy - on account of DrawText being *very* (very!) inefficient. This is where my new FreeType-GL module will come in useful, as text is rendered through its own custom shader.

So, some work to make things more efficient, and we'll be there I think.

All-in-all, awesome!


LT(Posted 2014) [#76]
on account of DrawText being *very* (very!) inefficient
Oh yeah, I noticed that each character uses its own texture id, which makes the batching useless. :/

Good news, in any case!


Brucey(Posted 2014) [#77]
Texture Atlas is the way to go. Should make everything very zippy.


LT(Posted 2014) [#78]
It can help DrawText, but anything else will need new commands.


Brucey(Posted 2014) [#79]
anything else will need new commands.

Since DrawImage is the one that's going to do 99% of the work, that's where I thought we may be able to do something without changing too much?

As I mentioned before, perhaps we could simply add a new Load command, leaving the rest as is, and then you use DrawImage with the frame parameter - as you do with an anim image. Except in our case it would be rendering from a proper atlas.
Perhaps it's more complicated than I think it sounds like it should be? *shrug*

But if we need to add something to make things work *better*, then we need to add something. No big deal.
If everything else works the way it did before, then no-one loses out?


LT(Posted 2014) [#80]
It's possible, but no frame number should be required in DrawImage (for compatibility). The image should know which atlas it belongs to and render accordingly.

In any case, I'm not sure you're going to see much improvement in "Rasperoids" even with the atlas. It's already batching textured quads - I'd be surprised if the frame rate improves by more than 10% or so.


Derron(Posted 2014) [#81]
@ Drawing Points

Did you try to add the "gl_PointSize"-definition in the shader? Seems to be needed somehow. According to some stackoverflow postings


http://stackoverflow.com/questions/24055683/drawing-gl-points

http://stackoverflow.com/questions/24715097/gles-2-0-draws-gl-points-as-squares-on-android

This seems to be needed by default on OpenGL ES. On desktop OpenGL you then need GL_POINT_SMOOTH to get the points "rounded" again.


Sorry if you already tried that and it was useless.


@Raspberry
I tried "qemu" to emulate the raspi - compilation works, but the emulation lacks the Raspberry's GPU, so I cannot run anything that uses OpenGL (/dev/vchiq is missing).
A pity.



bye
Ron


GaryV(Posted 2014) [#82]
Josh: Any progress to report?


Brucey(Posted 2014) [#83]
Did you try to add the "gl_PointSize"-definition in the shader?

That's sorted it thanks :-)
Adding "gl_PointSize = 1.0;" renders the cross now.


Derron(Posted 2014) [#84]
Cool thing ... did you try "atlasing" things?

(I mean drawing portions of a texture ?)

Another thing to test (albeit I think they will work) are the blend modes (lightblend, shadowblend ...).

When drawing from atlas, check if rotation/scaling does odd things.


bye
Ron


zzz(Posted 2014) [#85]
It should be perfectly possible to have an automated atlas system that just puts new images into sheets (or whatever you'd call them) as they are loaded. The problem with that, though, is that performance might be unreliable for the dev.

Consider some project where your two most-used images sit nicely on the same sheet. If you'd then add something else in between, this might push one of these images onto another sheet, which could (depending on how much you actually draw) be quite noticeable, since you'd be back to constantly flushing because of changing texture ids.

It could probably be solved by tracking usage of each texture, but it would either have to cause random hiccups as atlases are rearranged on the fly, or require additional commands that give the user more control.
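
The "pushed onto another sheet" effect can be shown with a deliberately naive packer, sketched in Python. It assumes images are placed left to right in load order on a single row per sheet; a real atlas packer would be two-dimensional, and `assign_sheets` is a hypothetical name.

```python
def assign_sheets(image_widths, sheet_width):
    """Place images in load order; start a new sheet when the current one is full."""
    sheets = []
    sheet, used = 0, 0
    for w in image_widths:
        if used + w > sheet_width:
            sheet += 1
            used = 0
        sheets.append(sheet)
        used += w
    return sheets

# Two heavily used 256px images share sheet 0.
print(assign_sheets([256, 256], 512))       # [0, 0]
# Loading a 128px image in between pushes the second one onto sheet 1.
print(assign_sheets([256, 128, 256], 512))  # [0, 0, 1]
```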


Derron(Posted 2014) [#86]
I wouldn't manipulate textures at all.

IF someone wants an engine to do the batching, they should use a custom "TBatchImage" which auto-reorganizes.

Another option is grouping ... items that are grouped together could be handled automatically by the engine (eg TBatchGroup.Add(image1) and so on).

The author/coder knows best in which order he draws his sprites - and how these sprites are arranged on textures (I, for example, have all my figure sprites on one texture, all gui elements on another etc - not always perfect, but it reduces texture switches somewhat).


Bringing in "automatism" or "intelligence" needs one of two things:
- simplifying things (only batch if things happen to reuse the same resources)
- help from the coder (BatchGroup-Manager etc.)


The first thing is done using the "key"-approach (every combination of state that cannot share a batch yields a unique key, which tells you whether a new batch group has to start).
After this step you will have multiple objects ordered by this key - within the elements of the same key you will have to group by "manually assigned batch groups".
Of course you could do it vice versa: first sort by "groups" and within those groups sort by key, but I think this adds more overhead - while it gives the coder more influence.
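
A small Python sketch of the key-then-group ordering described above. The dictionary fields, the file names and `batch_by_key` are all hypothetical; the key here is simply (texture, blend mode). Note that sorting changes the draw order, which matters wherever sprites overlap.

```python
from itertools import groupby

def sort_key(cmd):
    # Primary: the state key; secondary: the manually assigned batch group.
    return (cmd["key"], cmd["group"])

def batch_by_key(commands):
    """Order draw commands by state key, then by group, and split into batches."""
    ordered = sorted(commands, key=sort_key)
    return [(k, [c["name"] for c in items]) for k, items in groupby(ordered, key=sort_key)]

commands = [
    {"name": "ship",   "key": ("sprites.png", "alpha"), "group": 0},
    {"name": "button", "key": ("gui.png", "alpha"),     "group": 0},
    {"name": "rock",   "key": ("sprites.png", "alpha"), "group": 0},
    {"name": "glow",   "key": ("sprites.png", "light"), "group": 1},
]

for key, names in batch_by_key(commands):
    print(key, names)
```

Drawn in submission order, these four commands would need four batches; ordered by key they need three.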

bye
Ron


zzz(Posted 2014) [#87]
Yeah, but all that requires additional commands. I was pondering if it would be worthwhile to put in some texture batching behind the scenes, but it could result in some unpredictable performance gains. (Which would feel more like random performance loss really)

@Brucey
Regarding the resolution issue. Do you have fbset or something similar available on your pi? (have no idea about what you are running on it, or much else regarding those things really :) )

Doing the scaling "manually" in the driver is probably a bad idea, since the pi seems to have a pretty poor fillrate. Especially if it requires filling a full-HD-sized buffer.


Derron(Posted 2014) [#88]
@additional commands
As said, first do the simple things (auto texture batching).

Once everything has settled down, one could extend the render pipeline with new commands (these would then be needed for all renderers - I don't know what happens to DX, or whether it gets replaced with "angle" then).


fbset should be available on the raspi.


bye
Ron


Brucey(Posted 2014) [#89]
Regarding the resolution issue. Do you have fbset or something similar available on your pi?

There's a binary called "tvservice", which you can use to query/change the video mode of the gpu. If you run this with the appropriate arguments, you can change modes to something suitable for your game.
Interestingly, the app is open source, so it's not impossible to imagine re-using the Broadcom APIs that it calls in your own code so that tvservice is no longer required to do the switching.
Although I believe fbset is still required afterwards to tell the console that the screen resolution has changed.


Brucey(Posted 2014) [#90]
I've been thinking about all the drawing stuff in Max2D, and how it does all the rotation/scaling/translating.

Wouldn't it be more efficient for the GPU to do this through matrices as raw data is passed to it?
I've no idea really how shaders work though, so I don't know if, when you push some vectors, you can also push the current rotate/scale/transform values too?

:o)


juankprada(Posted 2014) [#91]
As far as I know there are a couple of ways to do that. One would be matrix manipulation of the model. But I think (I am very ignorant here) batch rendering would then depend on the model matrix too, otherwise all objects in the batch would be rotated/scaled/translated the same. Another would be to compute the final vertex positions from the rotation and scaling before sending them to the GPU (I use the second approach).
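
A minimal Python sketch of the second approach (vertices transformed on the CPU before upload). It assumes the quad's local origin is its top-left corner and ignores any image handle; `transform_quad` is a hypothetical name, not a Max2D function.

```python
import math

def transform_quad(width, height, x, y, rotation_deg, scale_x, scale_y):
    """Scale, rotate and translate the four corners of a quad on the CPU."""
    angle = math.radians(rotation_deg)
    cos_a, sin_a = math.cos(angle), math.sin(angle)
    corners = [(0, 0), (width, 0), (width, height), (0, height)]
    out = []
    for cx, cy in corners:
        sx, sy = cx * scale_x, cy * scale_y        # scale
        rx = sx * cos_a - sy * sin_a               # rotate
        ry = sx * sin_a + sy * cos_a
        out.append((x + rx, y + ry))               # translate
    return out

# A 2x1 quad rotated by 90 degrees and moved to (10, 20).
for vx, vy in transform_quad(2, 1, 10, 20, 90, 1, 1):
    print(round(vx, 6), round(vy, 6))
```

Because the vertices arrive at the GPU already transformed, quads with different rotations can still share one draw call.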


Derron(Posted 2014) [#92]
A short "research" on this topic said manipulating the vertex positions is way slower than manipulating the matrix of the batch renderer.

But I think all of them are faster if they use some kind of "batching" (and the app makes use of it).

bye
Ron


zzz(Posted 2014) [#93]
I agree with juankprada on this one. It would introduce another flush state (or a ton more of data to move) which would probably break the auto-batcher more or less. At least in every scenario I can think of where I would benefit from the batching as it is now.

The shaders can be supplied data either on a per-primitive (ie vertices) or a per-drawcall basis. Even though the four verts in a quad use the same transform values, they would have to be sent to the gpu once for every vertex. Do it on a per-drawcall basis instead and you must flush the batcher every time the transform values change.

Think of, for example, a particle system using quads. In the current mod code it will most likely be rendered with a single drawcall, but if you use individual rotation or scaling etc (which would still use one drawcall in the current code) the worst case scenario would become one drawcall per particle. Which would mean we are back where we started performance-wise.


LT(Posted 2014) [#94]
Passing a matrix per quad is pretty expensive. Particle systems avoid it, if they can. In a 2d system, sending three values like position x and y and rotation r is not so bad. Add a fourth for distance, if you like.

Like zzz said, it has to be done for every vertex, which is what makes it so inefficient. This gets crazy fast with geometry shaders, alas...
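
To put rough numbers on the trade-off from the last few posts, here is a back-of-envelope sketch in Python. The assumptions are mine: 2 floats per vertex position, 4 vertices per quad, UVs and colours ignored, a 4x4 matrix of 16 floats, and the worst case where every quad has its own transform. The scheme names are made up.

```python
FLOATS_PER_POSITION = 2   # x, y
VERTS_PER_QUAD = 4

def cost(n_quads, scheme):
    """Return (floats sent to the GPU, draw calls) for n uniquely transformed quads."""
    base = n_quads * VERTS_PER_QUAD * FLOATS_PER_POSITION
    if scheme == "cpu":            # vertices transformed on the CPU
        return base, 1
    if scheme == "attrib_xyr":     # x, y, r repeated for every vertex
        return base + n_quads * VERTS_PER_QUAD * 3, 1
    if scheme == "attrib_matrix":  # full matrix repeated for every vertex
        return base + n_quads * VERTS_PER_QUAD * 16, 1
    if scheme == "uniform":        # x, y, r set per draw call, flush on every change
        return base + n_quads * 3, n_quads
    raise ValueError(scheme)

for scheme in ("cpu", "attrib_xyr", "attrib_matrix", "uniform"):
    print(scheme, cost(1000, scheme))
```

For 1000 particles this gives 8000 floats in one draw call for CPU transforms, 20000 floats with x/y/r per vertex, 72000 floats with a matrix per vertex, and 11000 floats but 1000 draw calls with per-drawcall values.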


Derron(Posted 2014) [#95]
When having a look how other frameworks handle it:
https://github.com/libgdx/libgdx/blob/master/gdx/src/com/badlogic/gdx/graphics/g2d/SpriteBatch.java

(they also have a CpuSpriteBatch.java)

It seems they just use CPU-Matrix for transformations.


According to some forums, DX10 GPUs might have problems with geometry shaders (to the point of being slower than with the "old approach").


As the common case is NOT a particle system, I would prefer some kind of manual "batchSprite" object and options.
The default usage will batch as many things until a state change (then flushing and waiting for the next thing) but the advanced usage might then be to call a custom command ("EnableGPUTransformationMode(true)") - so it is up to the user if he wants to use a mode which might be slower for many texture bindings/draw calls.


bye
Ron


juankprada(Posted 2014) [#96]
LibGDX doesn't use a matrix to rotate/scale/translate sprites. If you take a look at line 223 you will see the method that actually adds a texture and vertices to the batch. You will notice there that rotation is done per vertex and the vertices are sent to the GPU already "rotated" (line 270)/"scaled" (line 243)/"translated" (lines 235 and 236) but without matrix manipulation. The model matrix is always the identity matrix. I know that because I replicated their spritebatch in Java with a different opengl wrapper (JOGL instead of LWJGL).


Derron(Posted 2014) [#97]
https://github.com/libgdx/libgdx/blob/master/gdx/src/com/badlogic/gdx/graphics/g2d/CpuSpriteBatch.java

Isn't the CpuSpriteBatch working with a matrix?

I did not check which variant is used when.


Bye
Ron


Brucey(Posted 2014) [#98]
This is why it's better for people who know how stuff works to come up with ideas :-)


So, as it stands, are we happy with the way the shader-based module is currently working?

Is there anything obvious, that if it were to be implemented, would provide an order-of-magnitude improvement to renderings?

I'm personally happy with the state of things, as it's rendering stuff correctly on previously unavailable platforms (eg. OpenGL ES 2.0 targets)


Derron(Posted 2014) [#99]
The mobile targets especially will show performance bottlenecks.

"Graphics intense" (read "many sprites") apps might be problematic - in that case we should think about additional "helper" classes - to enforce specific caching behaviour etc. I did not check if "basic caching" is done already - as it was suggested before - so there is a potential spot to optimize without much needed "intervention" on your side.


bye
Ron


zzz(Posted 2014) [#100]
It's probably pretty good to go.

[edit]

The one thing that might be worth looking into is the flush/batch drawing code (ie where the batcher code decides if it needs to draw or not). Most of it is my own code, but I never put much thought into it as it was just for testing. Could probably be a bit more lean :)

We're also uploading the projection matrix on every drawcall (unless I had a brainfart while peeking through the code). It's pretty big, and only needs to be uploaded when we switch shaderprograms or change resolution.

I wouldn't expect an order-of-magnitude performance improvement from any of it, but it should still be worth looking into.
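
A hedged Python sketch of skipping the redundant projection uploads. `ProjectionCache` and its `upload` callback are hypothetical stand-ins for whatever the module calls to set the uniform. Since OpenGL keeps uniform values per program object, the sketch remembers the last uploaded size for each program rather than only for the most recent one.

```python
class ProjectionCache:
    """Upload the projection only when a program has not yet seen the current size."""

    def __init__(self, upload):
        self._upload = upload      # callable standing in for the real uniform upload
        self._uploaded = {}        # program id -> (width, height) last uploaded

    def apply(self, program, width, height):
        size = (width, height)
        if self._uploaded.get(program) == size:
            return False           # nothing changed, skip the upload
        self._upload(program, width, height)
        self._uploaded[program] = size
        return True

uploads = []
cache = ProjectionCache(lambda p, w, h: uploads.append((p, w, h)))
for _ in range(3):
    cache.apply(1, 800, 600)       # three draw calls, same program and size
cache.apply(2, 800, 600)           # a second shader program
cache.apply(1, 1024, 768)          # resolution change
print(len(uploads))                # 3
```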


LT(Posted 2014) [#101]
Other than zzz's suggestion of limiting the passing of the projection matrix, I don't know how else to make it faster and maintain the command set. Keep in mind that we are setting array data on every DrawThis() call. A better system would be to create persistent objects that only modify the array when they are changed, but that would require a different set of functions.
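
The persistent-object idea could look roughly like this, sketched in Python. `SpriteBuffer` and `Sprite` are hypothetical classes, not part of Max2D; a plain list stands in for the vertex array, and rotation and scaling are left out to keep it short.

```python
class SpriteBuffer:
    """Stand-in for a vertex array: 4 vertices (x, y) per sprite."""

    def __init__(self):
        self.data = []
        self.writes = 0            # how often a sprite's slot was rewritten

    def allocate(self):
        offset = len(self.data)
        self.data.extend([0.0] * 8)
        return offset

class Sprite:
    def __init__(self, buffer, x, y, w, h):
        self.buffer = buffer
        self.offset = buffer.allocate()
        self.x, self.y, self.w, self.h = x, y, w, h
        self.dirty = True          # nothing written to the array yet

    def move_to(self, x, y):
        if (x, y) != (self.x, self.y):
            self.x, self.y = x, y
            self.dirty = True

    def sync(self):
        """Rewrite this sprite's vertices only if it changed since the last sync."""
        if not self.dirty:
            return
        x, y, w, h = self.x, self.y, self.w, self.h
        self.buffer.data[self.offset:self.offset + 8] = [x, y, x + w, y, x + w, y + h, x, y + h]
        self.buffer.writes += 1
        self.dirty = False

buf = SpriteBuffer()
sprite = Sprite(buf, 0, 0, 2, 2)
sprite.sync()
sprite.sync()                      # unchanged, so no second write
sprite.move_to(5, 5)
sprite.sync()
print(buf.writes)                  # 2
```

A static sprite then costs one array write in total instead of one per frame, at the price of a different command set from DrawImage.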