BufferedGLMax2D

BlitzMax Forums/BlitzMax Programming/BufferedGLMax2D

N	(Posted 2009) [#1]

BufferedGLMax2D is my Max2D implementation that crams data into vertex arrays and then draws using those. Nothing is drawn until either Flip or GrabPixmap is called (the latter is not implemented). I don't currently have speed tests, I don't know yet if this implementation is faster than Brl.GlMax2D. I should hope it is, especially when dealing with rendering complex scenes, but that has yet to be seen. There's no glBegin/glEnd in sight. The module also requires cower.renderbuffer, since I've decided to split the code for that off into a new project, so grab that too.

So far, most of the driver is implemented, there are still a few things left out - DrawOval, GrabPixmap, DrawPixmap - mostly the useless functions that I don't care about. Comments are few and far inbetween and cryptic if not completely nonsensical.

The module should be able to replace GLMax2D for most uses right now… unless you happen to use DrawOval or GrabPixmap, in which case you should re-evaluate your life decisions if it came to that. I haven't personally encountered any bugs in testing it, I have no doubt there are bugs, it's probably not thread-safe and I have no intention of making it thread-safe because it's all inherently unsafe in the first place.

So, if you griped about GLMax2D's use of glBegin/glEnd or anything else, do hack on BufferedGLMax2D. It's open source under the MIT license, it's on Github so you can easily fork it and contribute however you may. However, the difference between this and GLMax2D is you can make changes and they will get added to the code if I think they're good enough. So, if you start whining about how this doesn't have something or other and it's within your ability to change that and you don't, I will personally add clauses to the license to restrict your use of the module.

This post feels so disorganized.

By the way, if you find a bug and can't fix it, but can reproduce it please add an issue to the project on github. It doesn't take long, it's free, etc. Keeping the issues on the project page lets me mark issues complete when making commits and gives me a way to keep discussion of each individual issue separate from another, making it a heck of a lot easier to parse any information about a problem. As usual, include operating system, graphics chipset, and if possible whether your GPU has discrete video memory or if it's shared with system memory (this tends to not affect anything, but just to be safe).

Note: This isn't a modified GLMax2D - do not move it to Module Tweaks. Cheers. Oh, and this isn't really a showcase thing, so don't move it there either - it's incomplete, this is more of a "look at this, please contribute" sort of deal.

Nate the Great

(Posted 2009) [#2]

:( my internet has been acting up lately, I get an "This page has become unresponsive" error :/

N	(Posted 2009) [#3]

Works fine for me, and my connection is about on par with a 14.4kbit/s modem right now.

Nate the Great

(Posted 2009) [#4]

well mine is like 1 or 2 mbps so idk :/ do I need git to see it? could that be my problem? regardless sounds interesting.

Pete Rigz

(Posted 2009) [#5]

hmm, very interesting :) will give this a proper look tomorrow, I didn't get very far with a very quick test:

SuperStrict

Import cower.bufferedglmax2d

SetGraphicsDriver BufferedGLMax2DDriver()

Graphics 800, 600

Cls

Flip

WaitKey()

It throws an EXCEPTION_ACCESS_VIOLATION and highlights line 157 of the module:

state.Bind()

I have an nVidia 8800gt running windows 7 64bit. Looking forward to trying this out in more detail with TimelineFX.

N	(Posted 2009) [#6]

Unusually, but it seems my use of an array of callbacks has really, really pissed off BlitzMax. Might be a bug with the compiled code, who knows. Either way, it seems the address of the state variable was being changed - how the heck that works is beyond me, but whatever, I have a reasonable idea of how to fix it now.

dmaz	(Posted 2009) [#7]

I did the same thing a long while back and found the speed gains negligible in the tests I did. my hope that older drivers and hardware would show larger gains but that didn't materialize. Now, some people swear that this should give you large gains... maybe that would be the case with tens of thousands of vertexes? I got the greatest speed gain (for 2d)from keeping texture swapping to a minimum.

more power to you though as I hope I just did something incorrectly.

N	(Posted 2009) [#8]

Fixed the above bug to the best of my knowledge. Dinner is in order.

more power to you though as I hope I just did something incorrectly.

Considering most folks using BlitzMax are making corny little match-3 games (or whatever the trend is now), I expect any speed gain would be negligible. The most gain would probably be with heavy use of particle systems and other effects where you'll probably render the same image many, many times over per frame, allowing you to have minimal state change and probably some gain by batch-rendering all of the particles at once.

The main benefit this module gives me is that the renderer is a lot more flexible in some respects than the existing GLMax2D implementation. I can render 3D objects, complex shapes, just about anything I want without worrying too much about screwing up state because I can go through the TRenderBuffer object (if I make it public). It makes some of the work I'm doing in my spare time a lot easier as a result (the render buffer stuff was from a previous project and was a lot simpler back then, but the way it works hasn't changed that much).

DavidDC

(Posted 2009) [#9]

Sounds great Noel, thanks for making it available.

Nate the Great

(Posted 2009) [#10]

The most gain would probably be with heavy use of particle systems and other effects

thats where I need the most speed! in my game flood, it displays 3000 particles sometimes and it takes several millisecs of precious cpu time with max2d... would be interesting to do a speed test to prove yours is faster

N	(Posted 2009) [#11]

If you get around to testing it, I'd be interested to see how that turns out. It might be feasible to have a certain threshold where the renderer switches over to vertex buffer objects instead of vertex arrays as well, possibly gaining more speed... but then I don't think you'd really see much of a speed increase in doing that.

Nate the Great

(Posted 2009) [#12]

its kinda hard when that page wont load ;p

but when it does ill definitely try.

N	(Posted 2009) [#13]

Yeah, far as I can tell you're the only one having problems seeing it.

Nate the Great

(Posted 2009) [#14]

Ill try safari...

edit: works in safari! stupid google chrome addon I didnt know about was giving an error on firefox and ie... how on earth did it get there? idk

N	(Posted 2009) [#15]

That's... bizarre. I didn't know Chrome could use addons yet - the Mac version has them hidden for now.

Nate the Great

(Posted 2009) [#16]

impressive but im a little confused as to how to use it, a short example would be great and I dont know how navigate github :/

edit: no I wasnt using addons in chrome, it was something that made sites that only work for chrome work for ie and firefox so I can stick with 2 browsers lol

N	(Posted 2009) [#17]

Click the "download" button next to the project name, grab zip file, extract source code in BlitzMax/mod/cower.mod/bufferedglmax2d.mod, run 'bmk makemods cower.bufferedglmax2d', and in your code import the module and call "SetGraphicsDriver(BufferedGLMax2DDriver())".

Nate the Great

(Posted 2009) [#18]

so can I just use the draw image and all those commands as normal after I do that?

N	(Posted 2009) [#19]

Yes. It's just a Max2D driver, not a new API.

xlsior

(Posted 2009) [#20]

Interesting.

Tommo

(Posted 2009) [#21]

A real time texture packer can help speed up max2d too, especially DrawText with imagefont.

I use a binary tree structure for this, it looks like this:

And the generated texture is like this (9 textures(256x256) generated):

Because you don't have a unlimited single texture, so a texture pack group is used.
When you add a new texture, it will try insert the texture into available packs (older first, to reduce space wasting), if no space in all packs, a new texture pack will get created.

I've integrated this into my customed GLMax2D driver. It works fine so far.

N	(Posted 2009) [#22]

That's too intelligent for me. I'll let someone else concern themselves with hacking that in.

Edit: Ok, just fiddled around with packing - it's simple if you don't try to do anything intelligent. Would work well enough for reducing the creation of multiple textures where possible.

MGE	(Posted 2009) [#23]

Please create a benchmark test, drawing 2000 tiles (or allowing the user to press a key to add more, etc, etc.) in each version so we can see if there is any speed increase. Thanks for taking this on, this is the test alot of us would love to see running...once and for all. ;)

N	(Posted 2009) [#24]

What's stopping you from writing the test?

Grey Alien

(Posted 2009) [#25]

Sounds pretty cool, I throw around a lot of particles in my "corny little match-3 games" so a faster render is always welcome. It would be neat to have some nice buffering like this in DX9 too.

Tommo

(Posted 2009) [#26]

Would work well enough for reducing the creation of multiple textures where possible.

And it also uses less memory than plain max2d images.
The only drawback is that you can't use GL_REPEAT flag with packed texture, but I think there's no problem for max2d.

Here is the code of the tree I'm using.
You can use sNode.draw()(it's commented out) to see the result visually.

Strict

Module tommo.stree

Import brl.linkedlist

Type sNode
	GLOBAL EDGE_TOLERENCE = 10
	function setEdgeTolerence(t:Int)
		EDGE_TOLERENCE = t
	End Function
	
	Field parent:sNode
	Field x:Float, y:Float
	Field w:Float, h:Float
	Field dia:Float
	
	Field childA:sNode
	Field childB:sNode
	
	Field isSplited
	
	Field rotated:Int
	Field childOccupied:Int
	Field isOccupied:Int
	
	Method init(x:Float, y:Float, w:Float, h:Float)
		Self.w = w
		Self.h = h
		Self.x = x
		Self.y = y
		dia = w * w + h * h
	End Method
	
	Method splitH:sNode(splitW)
		
		If splitW > w Return Null
		If splitW = w Return Self
			
		
		Assert Not isSplited
		
		childA = New sNode
		childA.init(Self.x, Self.y, splitW, h)
		childA.parent = Self
		
		Local ww = w - splitW
		childB = New sNode
		childB.init(Self.x + splitW, Self.y, ww, h)
		childB.parent = Self
		
		If ww < EDGE_TOLERENCE
			childB.isOccupied = 3
		End If
		
		isSplited = True
		Return childA
	End Method
	
	Method splitV:sNode(splitH)
		If splitH > H Return Null
		If splitH = h Return Self
		
		Assert Not isSplited
		
		childA = New sNode
		childA.init(Self.x, Self.y, w, splitH)
		childA.parent = Self
		
		childB = New sNode
		Local hh = h - splith
		
		childB.init(Self.x, Self.y + splitH, w, hh)
		childB.parent = Self
		If hh < EDGE_TOLERENCE
			childB.isOccupied = 3
		End If
		
		isSplited = True
		Return childA
	End Method
	
	Method updateChildOccupied()
		If issplited
			Local i = childA.isOccupied + childB.isOccupied
			Select i
			Case 2, 4
				Self.setOccupied(True)
				
			Case 0, 3
				Self.isSplited = False
				Self.setOccupied(False)
				
			Case 1
				Self.setOccupied(False)
				
			EndSelect
		End If
	End Method
	
	Method setOccupied(s:Int = True)
		If isOccupied = s Return
		isOccupied = s
		If Not parent Return
		parent.updateChildOccupied()
	End Method
	
	Method similar:Float(w:Int, H:Int)
		Local p:Float
		If Self.w = w
			p = w
		End If
		Return p
	End Method
	
	Method findNode:sNode(w:Int, h:Int)
		If Self.isOccupied Return Null
		If Not (Self.w >= w And Self.h >= h) Return Null
		
		If isSplited
			Local n1:sNode = childA.findNode(w, h)
			Local n2:sNode = childB.findNode(w, h)
			
			If n1 And n2
				If n1.dia < n2.dia
					Return n1
				Else
					Return n2
				EndIf
			EndIf
			
			If n1 Return n1
			If n2 Return n2
			
		Else
			Return Self
		EndIf
		
		Return Null
	End Method
	
'	Method draw()
'	
'		If isOccupied
'			If rotated
'				SetColor(0, 255, 0)
'			Else
'				SetColor(255, 0, 0)
'			EndIf
'			SetAlpha(0.5)
'			
'			If isOccupied = 3
'				SetColor(0, 0, 255)
'			End If
'			
'			DrawRect(x, y, w, h)
'		EndIf
'		
'		SetColor(255, 255, 255)
'		SetAlpha(0.5)
'		drawBox(x, y, x + w, y + h)
'		
'		If isSplited
'			childA.draw()
'			childB.draw()
'		End If
'		
'		Function drawBox(x1, y1, x2, y2)
'			DrawLine(x1, y1, x2, y1)
'			DrawLine(x2, y1, x2, y2)
'			DrawLine(x2, y2, x1, y2)
'			DrawLine(x1, y2, x1, y1)
'		End Function
'	End Method
	
End Type

Type sTree
	Field root:sNode
	Field occupiedNodes:TList = New TList
	Field w:Int, h:Int
	Method insert:sNode(w:Int, h:Int, allowRotate:Int = True)
		
		Local N:sNode = root.findNode(w, h)
  	Local n1:sNode
		Local rot:Int
		
		If allowRotate And w <> h
			n1 = root.findNode(h, w)
		  If n1
				If (n And (n1.dia < n.dia)) Or (Not n)
				 	n = n1
					Local c = w
					w = h
					h = c
					rot = True
				EndIf
			EndIf
		End If
		
		If n
			If n.w > n.h
				n = n.splitH(w)
				n = n.splitV(h)
			Else
				n = n.splitV(h)
				n = n.splitH(w)
			EndIf
			
			n.setOccupied(True)
			occupiednodes.AddFirst(n)
			If rot n.rotated = True
			Return n
		EndIf
		
		Return Null
	End Method
	
	Method init(w:Int, h:Int)
		Self.w = w
		Self.h = h
		Self.clear()

	End Method
	
	Method clear()
		root = New sNode
		root.init(0, 0, w, h)
		occupiednodes = New TList
	End Method
	
	Method reinsert()
		Function compare:Int(n1:Object, n2:Object)
			Local a1:Float = sNode(n1).dia
			Local a2:Float = sNode(n2).dia
			If a1 > a2 Return 1
			If a1 < a2 Return - 1
			Return 0
		End Function
		Local l:TList = occupiednodes
		
		l.Sort(False, compare)
		
		Self.clear()
		
		For Local n:sNode = EachIn l
			If n.rotated Self.insert(n.h, n.w) Else Self.insert(n.w, n.h)
		Next
	End Method
End Type

Sorry for lacking of comment.:)

N	(Posted 2009) [#27]

Looks sort of similar to the code I just put together to test stuff:

Strict

'Import Cower.BufferedGLMax2D
'SetGraphicsDriver(BufferedGLMax2DDriver())

Function Swap(A% Var, B%Var)
	Local C% = A
	A = B
	B = C
End Function

Type Pack
	Field width:Int, height:Int
	Field filled:Int = False
	Field right:Pack
	Field bottom:Pack
	Field parent:Pack
	Field cr%,cg%,cb%
	
	Method New()
		cr = Rand(128,255)
		cg = Rand(128,255)
		cb = Rand(128,255)
	End Method
	
	Method GetHeight:Int()
		Local h:Int
		Local i:Pack = Self
		While i
			h :+ i.height
			i = i.bottom
		Wend
		Return h
	End Method
	
	Method GetWidth:Int()
		Local w:Int
		Local i:Pack = Self
		While i
			w :+ i.width
			i = i.right
		Wend
		Return w
	End Method
	
	Method Area%()
		Return width*height
	End Method
	
	Method Root:Pack()
		Local r:Pack = Self
		While r.parent
			r = r.parent
		Wend
		Return r
	End Method
	
	Method Fill%(w%, h%)
		If filled Then
			Local r:Int = False
			If right And bottom Then
				Local ra:Int, ba:Int
				ra = right.Area()
				ba = bottom.Area()
				
				If ra < ba Or bottom.width = w Then
					r = bottom.Fill(w,h)
					If Not r Then r = right.Fill(w,h)
				Else
					r = right.Fill(w, h)
					If Not r Then r = bottom.Fill(w, h)
				EndIf
			ElseIf right
				r = right.Fill(w,h)
			ElseIf bottom
				r = bottom.Fill(w,h)
			EndIf
			Return r
		EndIf
		
		If width < w Or height < h Then
			Return False
		EndIf
		
		right = New Pack
		bottom = New Pack
		If height-h < width-w Then
			right.width = width-w
			right.height = height
			bottom.width = w
			bottom.height = height-h
		Else
			right.width = width-w
			right.height = h
			bottom.width = width
			bottom.height = height-h
		EndIf
		
		If right.width = 0 Or right.height = 0 Then
			right = Null
		Else
			right.parent = Self
		EndIf
		
		If bottom.width = 0 Or bottom.height = 0 Then
			bottom = Null
		Else
			bottom.parent = Self
		EndIf
		
		width = w
		height = h
		filled = True
		
		Return True
	End Method
	
	Method XY%(x% Var, y% Var)
		Local p:Pack = parent
		Local lp:Pack = Self
		x = 0
		y = 0
		While p
			If p.right = lp Then
				x :+ p.width
			Else
				y :+ p.height
			EndIf
			lp = p
			p = p.parent
		Wend
	End Method
End Type

Local p:Pack = New Pack

Graphics 800, 600, 0, 0

p.width = GraphicsWidth()
p.height = GraphicsHeight()

Local misses:Int = 100
Local t:Int = Millisecs()

Function DrawAndRecurse(x%, y%, p:Pack)
	If Not p Then Return
	
	SetBlend(AlphaBlend)
	
	SetAlpha(.8)
	If p.filled Then
		SetColor(p.cr, p.cg, p.cb)
	Else
		SetColor(32,32,32)
	EndIf
	DrawRect(x, y, p.width, p.height)
	
	SetColor(255, 255, 255)
	SetAlpha(1)
	DrawLine(x,y, x+p.width,y)
	DrawLine(x+p.width,y, x+p.width,y+p.height)
	DrawLine(x,y+p.height, x+p.width,y+p.height)
	DrawLine(x,y, x,y+p.height)
	
	DrawAndRecurse(x+p.width, y, p.right)
	DrawAndRecurse(x, y+p.height, p.bottom)
End Function

Local go:Int = False

'buildopt:gui
While Not AppTerminate()
	Cls
	
	go = go Or KeyHit(KEY_SPACE)
	
	DrawAndRecurse(0, 0, p)
	If Millisecs()-t > 25 And go Then
		p.Fill(2^Rand(4,7), 2^Rand(4,7))
		t = Millisecs()
	EndIf
	
	Flip
Wend

(Press space)

Mine's not pretty code, nor even immediately useful, but it gives me a decent idea of how to go about doing it. I like the rotation bit in your code, that's not something I'd considered, so I'll probably borrow that idea.

Tommo

(Posted 2009) [#28]

Looks neat. :)

If you plan to go with non 2^n sized texture, you'd better set a minimal split size (like EDGE_TOLERENCE in my code). To abandon the areas (by mark it as used) which are too small can save a lot of searching time, because these small pieces are always located very deep in the tree.

MGE	(Posted 2009) [#29]

"What's stopping you from writing the test? "

If I was the author of the module, I would make available a benchmark so developers could see any potential benefit before taking the time to download, code, test, retest, etc.

N	(Posted 2009) [#30]

So, what's stopping you? If I were the author, and I am, and I wouldn't be interested in writing benchmarks when I've got the module itself to work on - you can see above there's still work to do, including the possible addition of packing multiple images into single textures.

Benchmarks seem to interest you. How about you write the benchmarks? You should be able to write a working one with current Max2D implementations and use it with this as well to determine whether or not something is implemented properly and if there's a speed benefit.

My point is that this is a free and open-source implementation and something I am currently doing alone. If you want something that is not currently in my list of goals, and benchmarks/example code (I'm not sure what example code anyone would want though - there is only one function of interest) are not. Like I say in the initial post, hack it yourself and I'll look at adding any patches submitted. See a typo in the code and fix it? Send me a patch, I'll check it, add it if it passes. Want to add comments around the code? Send patch, I check, I add if it passes.

Want to add a benchmarks/ directory to the module? Go on github, fork the module, make your changes/additions, and send me a pull request so I know to look. Hopefully I've made it abundantly clear that if you want something and I'm not working on that, you should be doing it yourself for the benefit of you and everyone else.

MGE	(Posted 2009) [#31]

Isn't the entire idea of your project based on one possible statistic which is "increased performance"? I would think a benchmark would be your first order of business.

And I'll be the first to say "wow...hell yah..thank you..where's the tip jar" if the end result is stable and improves graphics rendering performance.

So yah...get a benchmark going that proves your tech, and you'll probably be the envy of the entire forum! And if you don't realize it, this is one of the forum's most anticipated requests so fingers are crossed you pull it off! ;)

slenkar

(Posted 2009) [#32]

any chance of a setimagevertices command? to position image vertices?

Ill put it in myself if I can, but I thought id ask first if its a simple job for you.

Brucey

(Posted 2009) [#33]

If I was the author of the module ...

Ah, but Noel is.
Do you really want to be walking down that dark path?

Best to just leave it and move on, MGE... :-)

N	(Posted 2009) [#34]

Isn't the entire idea of your project based on one possible statistic which is "increased performance"?

Quoting myself:

The main benefit this module gives me is that the renderer is a lot more flexible in some respects than the existing GLMax2D implementation.

So no, you go get a benchmark going and you'll probably be the envy of the entire forum (that sounds like a horrible punishment).

ImaginaryHuman

(Posted 2009) [#35]

I think I probably would include a demo/test also. Nilium your posts sound kind of bitter. I'm not sure why you're sharing this with the public because what you've said has undertones of not really caring about what's important to other people. Your initial post almost sounds like `here, figure it out yourselves, it's your fault if it doesn't work`. I agree with MGE that if you're going to offer something helpful to people it would be further helpful to make it easier for them to know if it's worth using, rather than like `bite me`.

On another note I have found that vertex arrays can be twice as fast as immediate mode rendering when you're doing a larger number of objects, and especially when you're doing lots of particles. But it seems to depend on the gpu/driver.

MGE	(Posted 2009) [#36]

No thanks, I don't have time either. Perhaps someone else will take the time to download, install, compile your code, test, create a benchmark test so we can all see how it performs.

N	(Posted 2009) [#37]

I think I probably would include a demo/test also.

Here's a demo/test:

Import Cower.BufferedGLMax2D
SetGraphicsDriver(BufferedGLMax2DDriver())

Put it in your code, see if it works, test/demo complete. This isn't a new API, it's a Max2D implementation, you don't need demos to see what this does, because nothing will visibly change (unless you use DrawOval/GrabPixmap/DrawPixmap).

Nilium your posts sound kind of bitter.

Frustrated would be better. The code is open source for the purpose of being able to modify it and contribute to it. MGE wants something (benchmarks), which isn't what I'm interested in doing, so I ask why he won't do them. The code is there, you want a change? Make it. You want a test? Write it.

What I would like to see is all the people requesting something they think will make everything better and help others to make an effort to actually do it themselves. Don't just sit on your hands and go "this sure sounds like a nice idea."

I'm not sure why you're sharing this with the public because what you've said has undertones of not really caring about what's important to other people.

I care about what's important to other people, but I would like much more to see people help themselves. The code's open source for a reason, I want to see people working to improve things and sharing those changes openly to benefit everyone else, as I've tried to do in releasing the code for the module. It's not that I'm going to go "bite me" to people who want something, but I don't want to just be the suggestion box that everyone lines up to and goes "I want shaders" or "I want benchmarks." The code is right there, go do it! You're programmers for Christ's sake, what's stopping you?

After you've decided that you're capable of doing something and start writing code and making changes, I will come in and help. I'm not just saying "you're on your own," ask questions about what something does if you need to, if something seems to not work, mention it and maybe file an issue. This is a work in progress, so the main point I'm trying to get across is that I want to see open-source work in this community (I'm an idealist, I would like to believe that you would sacrifice your time for the benefit of others). I don't want to be the only one writing code while everyone else does nothing while expecting everything.

Brucey

(Posted 2009) [#38]

I don't want to be the only one writing code while everyone else does nothing while expecting everything.

But I like sitting on my hands, doing nothing.

btw, you aren't doing yourself any favours :-)
I'd suggest you might want to market yourself a little differently, but I know, you're a programmer, for Christ's sake, it's not your job to market yourself. :-p

Just say something like, "Well, I'm busy at the moment, but you can simply try it out by dropping it in place of the usual module, and give it a go. Let me know if you see any improvements."

Alienating the potential audience may not be what you really want? or perhaps it is... :-)

GaryV

(Posted 2009) [#39]

Noel: No good deed goes unpunished ;)

theHand

(Posted 2009) [#40]

Improvements are always welcome; that's just their way of saying thank you. Open source! :)

Pete Rigz

(Posted 2009) [#41]

All working fine now here! Here's an extremely crude bench test thingy:

SuperStrict

Import cower.bufferedglmax2d

SetGraphicsDriver BufferedGLMax2DDriver()
'SetGraphicsDriver GLMax2DDriver()
'SetGraphicsDriver D3D7Max2DDriver()
'SetGraphicsDriver D3D9Max2DDriver()

Graphics 800, 600

Local pixmap:TPixmap = CreatePixmap(128, 128, PF_RGBA8888)
For Local x:Int = 0 To pixmap.width - 1
	For Local y:Int = 0 To pixmap.height - 1
		WritePixel(pixmap, x, y, (255 Shl 24) | (x Shl 16) | (y + 128 Shl 8) | x + y)
	Next
Next

Local image:TImage = LoadImage(pixmap)

Global FPSLASTUPDATETIME:Int
Local time:Int = MilliSecs()

SetBlend ALPHABLEND

While Not KeyDown(KEY_ESCAPE)
	Cls
	For Local c:Int = 1 To 10000
		SetScale 0.1, 0.1
		SetAlpha 0.75
		DrawImage(image, Rnd(GraphicsWidth()), Rnd(GraphicsHeight()))
	Next
	
	time = MilliSecs()
	SetScale 2, 2
	
	DrawText "FPS: " + FPS(), 0, 0
	
	Flip 0
	FPSLASTUPDATETIME = FPSLASTUPDATETIME + (MilliSecs() - time)
Wend

Function FPS:Float(frequency:Int = 200)
	Global FPSFRAMECOUNT:Int
	Global FPSLASTCOUNT:Float
	Local time:Int = MilliSecs()
	FPSFRAMECOUNT=FPSFRAMECOUNT+1
	Local elapsed:Float = time - FPSLASTUPDATETIME
	If elapsed>=frequency
		FPSLASTCOUNT=FPSFRAMECOUNT/elapsed*1000.0
		FPSFRAMECOUNT=0
		FPSLASTUPDATETIME=time
	EndIf
	Return FPSLASTCOUNT
End Function

rem out the drivers as necessary. Not the greatest test by any stretch of the imagination but gives a rough idea :). I got:

BufferedGLMax2DDriver(): 100fps
GLMax2DDriver() : 160fps
D3D7Max2DDriver() : 83fps
D3D9Max2DDriver() : 137fps

brl.gl still fastest for me, it will be interesting if this can be optimised further, as it seems to me that vertex arrays would be faster in theory. Admittedly though, my knowledge of opengl has a lot of holes :)

plash

(Posted 2009) [#42]

Using Pete Rigz code:
BGLM2D: ~95fps
GLM2D: ~51fps
D3D7: ~37fps
D3D9: ~37fps

N	(Posted 2009) [#43]

I managed to get a 100fps speed increase over GLMax2d's count in release mode after making some changes to both modules. I'll push the commits later once I clean some of the changes up.

Debug mode suffers, however, at 20fps less than GLMax2D. Not sure why, but I'm willing to accept a lower FPS in debug mode if it means performance is improved in release (where it would hopefully count for something).

N	(Posted 2009) [#44]

Some changes pushed that should show a speed increase. I think the 100fps was lost somewhere and somehow - not sure, but at the very least it's now faster on my system, although only in release mode (I can only guess that mine makes more function/method calls, resulting in more strain coming from the debugger).

Dreamora

(Posted 2009) [#45]

The benchmark showed what I've seen back in 1.3.2 days when I created a buffered driver. If the cpu is weak enough so the drivers can't do optimization on the sent data, then the buffered driver is able to perform better.
But if the cpu is somewhere current, the buffered driver can only lose as the stuff is handled twice.
Potentially important to mention that I'm on an NVIDIA card and current drivers ( win7 x64, card is a gtx280, just to give others the possibility to estimate performance of cards inbetween)
But you did a great job on this module :)
Will later edit in benchmarks from osx where the drivers are known to be a magnitude crappier and see what impact it has.

BGLM2D: ~333FPS
GLM2d: ~375FPS
D3D7: ~135FPS
D3D9: ~230FPS

Here are also numbers for the updated benchmark below that does not try to kill the driver with the state settings in the 10000 draw loop, which shows that the driver is only one component in performant rendering.

BGLM2D: ~475FPS
GLM2d: ~545FPS
D3D7: ~170FPS
D3D9: ~230FPS

SuperStrict

Import cower.bufferedglmax2d

'SetGraphicsDriver BufferedGLMax2DDriver()
SetGraphicsDriver GLMax2DDriver()
'SetGraphicsDriver D3D7Max2DDriver()
'SetGraphicsDriver D3D9Max2DDriver()

Graphics 800, 600

Local pixmap:TPixmap = CreatePixmap(128, 128, PF_RGBA8888)
For Local x:Int = 0 To pixmap.width - 1
	For Local y:Int = 0 To pixmap.height - 1
		WritePixel(pixmap, x, y, (255 Shl 24) | (x Shl 16) | (y + 128 Shl 8) | x + y)
	Next
Next

Local image:TImage = LoadImage(pixmap)

Global FPSLASTUPDATETIME:Int
Local time:Int = MilliSecs()

SetBlend ALPHABLEND

While Not KeyDown(KEY_ESCAPE)
	Cls		
	SetScale 0.1, 0.1
	SetAlpha 0.75
	For Local c:Int = 1 To 10000
		DrawImage(image, Rnd(GraphicsWidth()), Rnd(GraphicsHeight()))
	Next
	
	time = MilliSecs()
	SetScale 2, 2
	
	DrawText "FPS: " + FPS(), 0, 0
	
	Flip 0
	FPSLASTUPDATETIME = FPSLASTUPDATETIME + (MilliSecs() - time)
Wend

Function FPS:Float(frequency:Int = 200)
	Global FPSFRAMECOUNT:Int
	Global FPSLASTCOUNT:Float
	Local time:Int = MilliSecs()
	FPSFRAMECOUNT=FPSFRAMECOUNT+1
	Local elapsed:Float = time - FPSLASTUPDATETIME
	If elapsed>=frequency
		FPSLASTCOUNT=FPSFRAMECOUNT/elapsed*1000.0
		FPSFRAMECOUNT=0
		FPSLASTUPDATETIME=time
	EndIf
	Return FPSLASTCOUNT
End Function

EDIT: OK here now OSX 10.6.2 data (mbp with 8600GT)

BGLM2D: ~340FPS
GLM2d: ~270FPS

N	(Posted 2009) [#46]

Interesting that Mac OS performs better while Windows (seems to be specific to Windows Vista and up?) doesn't. I wonder what the cause of that is...

Edit: On the upside, I am very, very happy to see both OpenGL drivers significantly outperforming the Direct3D drivers.

Tommo

(Posted 2009) [#47]

Result here (Win32 Release mode):

BGL: ~230 fps
GL: ~178 fps

D3D7: ~45 fps
D3D9: ~45 fps

plash

(Posted 2009) [#48]

Using Dreamora's code under Ubuntu 9.10 (release):
BGL: ~175-190fps
GL: ~135fps

MGE	(Posted 2009) [#49]

So far so good! Does setting a different blend mode or drawing images from other textures, before every DrawImage cause any problems?

ImaginaryHuman

(Posted 2009) [#50]

One thing to bear in mind is that when you draw with a vertex array you can't change textures mid-draw, so if you want to draw lots of individual images from different textures it won't be able to be any faster. But if you are re-drawing the same image or portions of the same texture it should be good.

N	(Posted 2009) [#51]

When I get the image packing for textures implemented, that should save on texture changes by quite a bit as well. I have some ideas for sorting/merging batches based on texture/whether or not alpha is used (the latter is not as useful as it sounds, since most people don't seem to switch to solid blending when alpha is unneeded, but is still required) and then having OpenGL sort the rendered primitives using simple depth testing.

MGE	(Posted 2009) [#52]

IH - Perhaps some kind of code based solution of a "StartBatch", "EndBatch" function could be used then. You start/end every time you switch textures.?

N	(Posted 2009) [#53]

The renderbuffer code already handles stuff like that to minimize state changes and so as much as possible (without sorting the data) can be rendered in a single call to MultiDrawArrays. More can be done, so the current system is doing pretty well given that it's not as optimized as it could be.

ImaginaryHuman

(Posted 2009) [#54]

Does the MultiDrawArrays let you apply a different texture to each array range?

N	(Posted 2009) [#55]

No, my point was that if you draw the same image a bunch of times in a row without changing blend mode or something, that's one call to MultiDrawArrays.

Robert Cummings

(Posted 2009) [#56]

Not sure if this is helpful but aligning along a boundary sure sped up my ogl iphone stuff up:

A structure similar to:

struct Vertex
{
GLshort x , y , z;
GLshort pad1; //padding for optimised alignment
GLubyte r,g,b,a;
GLshort u,v;
};

Sorry if its something you've covered.

N	(Posted 2009) [#57]

Not sure what you're talking about, considering it's not applicable- additionally, interleaved arrays are sort of bad, and it appears you're using them.

Dreamora

(Posted 2009) [#58]

On the iphone interleaved arrays are much more usefull than on the desktop due to the different hardware (OpenGL ES, tile based, ...), especially as there is no cpu and driver to help in the back.

But the other reason that padding there helps is the 32bit alignement of the things on the hardware. Again a thing where the driver on desktops jumps in.

N	(Posted 2009) [#59]

Ah, my apologies, then - I'm not familiar with OpenGL ES or developing for the iPhone beyond simple GUI applications.

N	(Posted 2009) [#60]

Just wrote/pushed the texture packing code (despite this being finals week - I'm really bad at focusing on what I'm supposed to be doing). For an interesting test, take this modified copy of Dreamora's code:

SuperStrict

'buildopt:release
'buildopt:gui

Import cower.bufferedglmax2d

SetGraphicsDriver BufferedGLMax2DDriver()
'SetGraphicsDriver GLMax2DDriver()
'SetGraphicsDriver D3D7Max2DDriver()
'SetGraphicsDriver D3D9Max2DDriver()

Graphics 800, 600

Local pixmap:TPixmap = CreatePixmap(128, 128, PF_RGBA8888)
For Local x:Int = 0 Until pixmap.width
	For Local y:Int = 0 Until pixmap.height
		WritePixel(pixmap, x, y, (255 Shl 24) | (x Shl 16) | (y + 128 Shl 8) | x + y)
	Next
Next

Local images:TImage[64]
For Local i:Int = 0 Until images.Length
	images[i] = LoadImage(pixmap)
Next

Global FPSLASTUPDATETIME:Int
Local time:Int = MilliSecs()

SetBlend ALPHABLEND

While Not KeyDown(KEY_ESCAPE)
	Cls		
	SetScale 0.1, 0.1
	SetAlpha 0.75
	For Local c:Int = 1 To 10000
		DrawImage(images[Rand(0, images.Length-1)], Rnd(GraphicsWidth()), Rnd(GraphicsHeight()))
	Next
	
	time = MilliSecs()
	SetScale 2, 2
	
	DrawText "FPS: " + FPS(), 0, 0
	
	Flip 0
	FPSLASTUPDATETIME = FPSLASTUPDATETIME + (MilliSecs() - time)
Wend

Function FPS:Float(frequency:Int = 200)
	Global FPSFRAMECOUNT:Int
	Global FPSLASTCOUNT:Float
	Local time:Int = MilliSecs()
	FPSFRAMECOUNT=FPSFRAMECOUNT+1
	Local elapsed:Float = time - FPSLASTUPDATETIME
	If elapsed>=frequency
		FPSLASTCOUNT=FPSFRAMECOUNT/elapsed*1000.0
		FPSFRAMECOUNT=0
		FPSLASTUPDATETIME=time
	EndIf
	Return FPSLASTCOUNT
End Function

The only changes are that it will load 64 textures and randomly draw them each frame (also some syntactical thingies that bugged me). Chances are GLMax2D will be a bit crippled when you run it with that driver, by the way. Your results will vary, but with my driver, I get a ~300 FPS increase over GLMax2D.

Just for the hell of it, could someone run that with the D3D drivers as well?

Dreamora

(Posted 2009) [#61]

MBP with 8600GT:

BGLM2D: ~130fps
GLM2D: ~30fps

so a performance raise of over 400%

plash

(Posted 2009) [#62]

Win32, with ATI Radeon 2600:

BGLM2D: ~150fps
GLM2D: ~30fps
D3D7: ~20fps
D3D9: ~17fps (fairly solid)

N	(Posted 2009) [#63]

Results on my MBP (using the 9400M, not the 9600M GT) were about ~305 FPS for BGLM2D and ~40fps for GLMax2D, so I'm getting over 750% on my end.

Edit: Direct3D results are hilarious.

xlsior

(Posted 2009) [#64]

bglm2d: ~242
glmx2d: ~40
d3d7: ~55
d3d9: ~88

Interesting.

Unlike most of the other people that posted numbers here, I'm getting a higher FPS in DX7 and DX9 than in the unbuffered GLM2D driver... Yet the buffered one blows then all out of the water.

Windows 7 x64, ATI Radeon HD4670

plash

(Posted 2009) [#65]

Unlike most of the other people that posted numbers here, I'm getting a higher FPS in DX7 and DX9 than in the unbuffered GLM2D driver...

Has to do with the video card mostly, I think.

EDIT: Well, that, and I think Win7 is optimized for DirectX somehow.

N	(Posted 2009) [#66]

A buffered D3D9 driver would probably beat the buffered GL driver for Windows, but I'll leave the task of writing that nightmare to someone else (mainly because I don't have any code editors for Windows that are as good as TextMate anymore - don't suggest alternatives, they won't be good enough). A buffered D3D7 driver is probably hopeless, and I'm not entirely sure why the D3D7 implementation is even still around. Probably all of the paranoid people who think it's critical to have for older systems.

Jur	(Posted 2009) [#67]

bglm2d: ~36
glml2d: ~43
dx7 and dx9 : ~1

Vista, Amd X2, NVidia 7300

N	(Posted 2009) [#68]

Something is wrong with your system if you're getting rates like that - especially with D3D9 under Vista. Either that or you're compiling in debug mode.

Tommo

(Posted 2009) [#69]

Win32 Release

bgl: ~187
gl: ~38

dx9: ~22
dx7: ~22

Nice result, isn't it? :)

MGE	(Posted 2009) [#70]

Will this work with the new draw commands?

DrawSubImage
DrawSubImageRect

Jur	(Posted 2009) [#71]

Yep, I compiled in debug mode.

Here are the release mode results:

bglm2d: ~144
glm2d: ~147
dx7 and dx9: ~1

Looks that I have installed one interesting driver... great OGL (for my gfx card) and bad DX.

Vista, Amd X2, NVidia 7300

Armitage 1982

(Posted 2009) [#72]

I'm sorry but I would like to try this but I get this compiling error :

bmk makemods cower.bufferedglmax2d
Compiling:bufferedglmax2d.bmx
Compile Error: Overriding method differs by type
[C:/BlitzMax/mod/cower.mod/bufferedglmax2d.mod/bufferedglmax2d.bmx;73;2]
Build Error: failed to compile C:/BlitzMax/mod/cower.mod/bufferedglmax2d.mod/buf
feredglmax2d.bmx

I'm using BlitzMax 1.34 and I have downloaded both modules :
mod/cower.mod/renderbuffer.mod
mod/cower.mod/bufferedglmax2d.mod

EDIT Hum... I suppose it's for 1.36 only :(

Tommo

(Posted 2009) [#73]

The Max2D has been modified since 1.35 for virtual resolution and subimage support.
If you have to stick with 1.34, you can tweak the code easily. They are just some small changes.

Armitage 1982

(Posted 2009) [#74]

Yeah
(I really should upgrade to 1.36 but I read so many bugs everywhere...)

I replace the draw method at 73;2

Method Draw(x0#, y0#, x1#, y1#, tx#, ty#, sx#, sy#, sw#, sh#)

with :

Method Draw(x0#, y0#, x1#, y1#, tx#, ty# )
	local sx#=0.0
	local sy#=0.0
	local sw#=_texture._pwidth
	local sh#=_texture._pheight

I suppose it's ok...

The example provided here work well (a little faster than normal GLMAX2D)

I had to change any TGLImageFrame casting with TGLBufferedImageFrame
and of course every reference to UV in _texture

But what is exactly _gseq (GraphicsSeq) and how to use glBindTexture now since I need a reference to the texture ?
Is this _texture._owner._name ?

GrabImage isn't available anymore too since grabPixmap isn't implemented :(
(But I know that render to texture isn't correctly supported on every graphic card, what's the solution then ?)

Anyway, even without including GrabImage in my game (and maybe because of my poor hack on Draw()) many things are incorrectly rendered and my fps drop from 330 to 260:-(

Maybe there is things to do with setTransformation() ? everything was incorrectly scaled with.

N	(Posted 2009) [#75]

From what I could see, GraphicsSeq is a way of determining when the driver/resolution/context/etc. has changed in such a way that your resources will be lost. Could be wrong, but this stuff isn't documented, so not a lot to do about that other than trust that using it the same way the default implementations would is correct.

To set the texture, you have to use TRenderState.SetTexture(_texture._Name()). You absolutely should not call glBindTexture. If you want to render something using OpenGL, you should first call driver._buffer.Render(); driver._buffer.Reset(), or use the driver's TRenderBuffer to do the drawing (the latter will be faster).

If you're calling glBindTexture or changing GL state without reverting it to its previous state, and/or (if you're using glBegin/glEnd to draw stuff) not telling the driver's RenderBuffer instance to draw and reset itself, or you've modified other code that breaks something, things will be drawn incorrectly. This is how things work with GLMax2D, more or less (the texture packing changes things a bit), and it's not going to change for the buffered driver.

I don't plan on supporting render to texture/framebuffers since 1) Max2D does not work well with them and 2) I see no reason to. The issue isn't hardware support - I haven't seen any modern hardware that doesn't support render to texture in some form. I would highly discourage any use of GrabPixmap because that will be a bottleneck regardless of the driver you use.

SetViewport is also not implemented, which depending on your use of it may also introduce artifacts. You decided not to show a picture to illustrate exactly what is incorrect, so I can't really do much more than guess.

Armitage 1982

(Posted 2009) [#76]

Thanks
It probably come from the SetViewport since I use it often (artifacts indeed).
Probably from a wrong direct call to OpenGL too :)

Well I'm not using RTT because I only grabImage some specific constructed texture at the level loading.
No need for real time RTT for now, but I will probably search a solution for that.

Pete Rigz

(Posted 2009) [#77]

Got around to taking another look at this and threw another test at it. I made a particle effect to give it plenty of different state changes to think about and whatever. Here is the download link: http://www.rigzsoft.co.uk/files/GFXTest.zip It inlcudes an exe for windows and the source, but you'll need my timelinefx module to run it. I'll upload a mac version in a bit.

Here are my results:

brl.opengl: 155
brl.dx7: 81
brl.dx9: 132
cower.bufferedgl: 44

I'm quite surprised by those results, but quite a bit a bit slower. I guess it could be that if you want to take advantage of the buffered mode, a little rethinking about how you render stuff is in order? The engine isn't doing anything special, just setting states then drawing each particle as necessary. Hmm, will test on the mac, maybe it's a windows 7 thing.

N	(Posted 2009) [#78]

No, doesn't seem to be a Windows 7 thing. Curious.. I don't have time to look at this now, but I'll have to keep it in mind. In the meantime, if anyone wants to cram some profiling code into the module, that'd be handy.

N	(Posted 2010) [#79]

I'm not exactly fond of bumping my old posts, but this may or may not warrant it.

I've updated the renderbuffer module with a new implementation that isn't actually written in BlitzMax - it's C++ now. The reason for this is mainly because dealing with a lot of data in BlitzMax doesn't work in practice. There is an upper limit to how much you can throw at BlitzMax before either the GC or the tons of allocations made will bog it down (this is why the last implementation of renderbuffer avoided arrays like the plague, because the GC would take a huge toll on the speed). So, for the most part, the renderbuffer module is C++. There are some types that wrap the C++ API, and that's it. The TRenderBuffer wrapper only gets allocated once. There may be some overhead with it being a wrapper and making function calls that way, I don't know for certain since I didn't feel like swapping out the existing API for the C++.

Also added is scissor testing to the renderbuffer code, but that's not really significant in any way. It's faster, or should be faster. I'm debating about whether or not I'll remove the use of the STL altogether right now, since there's probably some overhead associated with using std::vector (the typename says deque in the source code, you can browse through the commits to see why that is). I don't know for certain yet, but it helped get the implementation done quickly, so that's what's important. I have a lot of bad things to say about the way the the new operator works in C++, but that's for another time and only after a lot of testing (my fixation on malloc is probably unhealthy).

There are a few bug fixes and what-not. I guess those would just be expected. Anyhow, if you're still even vaguely interested in this, give it a try. There ought to be an improvement with Rigz's test, although how much of an improvement you'll see will obviously vary between systems. There's a bit more parity between implementations in that particular case, but I'm not really sure how to address the situation there in the first place.

xlsior

(Posted 2010) [#80]

Nice!

N	(Posted 2010) [#81]

Added GrabPixmap/DrawPixmap and fixed another bug. Nothing really interesting right now. Edit: Also tweaked some other things, at least in the TimelineFX test and on my system, both drivers seem to be more or less even. Whether or not this will always be the case is, as usual, indeterminable without actually seeing the module in use somewhere.

LAB[au]

(Posted 2010) [#82]

Altough the renderbuffer and bufferedglmax2d modules compile fine, I have this error while importing the module in the test code:

 Building test
Compiling:test.bmx
flat assembler  version 1.68  (1947950 kilobytes memory)
3 passes, 3792 bytes.
Linking:test.exe
C:/Program Files/BlitzMax/mod/cower.mod/renderbuffer.mod/renderbuffer.release.win32.x86.a(renderbuffer.cpp.release.win32.x86.o):renderbuffer.cpp:(.text+0x103e): undefined reference to `_imp____glewLockArraysEXT'
C:/Program Files/BlitzMax/mod/cower.mod/renderbuffer.mod/renderbuffer.release.win32.x86.a(renderbuffer.cpp.release.win32.x86.o):renderbuffer.cpp:(.text+0x1081): undefined reference to `_imp____glewUnlockArraysEXT'
C:/Program Files/BlitzMax/mod/cower.mod/renderbuffer.mod/renderbuffer.release.win32.x86.a(renderbuffer.cpp.release.win32.x86.o):renderbuffer.cpp:(.text+0x114d): undefined reference to `_imp____glewMultiDrawArrays'
Build Error: Failed to link C:/Program Files/BlitzMax/mod/cower.mod/bufferedglmax2d.mod/test.exe
Process complete

Using latest Blitzmax and recommanded MinGW, I have a fairly old graphic card (NVidia GeForce 7600GT).

Any idea?

Thanks

N	(Posted 2010) [#83]

I'm afraid I've tried dealing with this bug before and don't have a fix. You'll have to work on fixing it yourself since I'm really far too busy right now.

Armitage 1982

(Posted 2012) [#84]

I know it's old but I step into the same issue and was wondering if nobody ever found a solution for this before ?
I don't even know where comes the _imp____glewLockArraysEXT and _imp____glewMultiDrawArrays I run into.

Would be useful to have something like this for particles engine rendering.

AdamRedwoods

(Posted 2012) [#85]

he may be using the glew library, which isn't with mingw. you can try finding a prebuilt glew.dll (may not be the real filename) somewhere and that may work.

Derron

(Posted 2012) [#86]

Armitage - have you found a solution?
Does the module work for you in threaded mode (it crashes the app on my linux build)

Until it compiles in windows:

?Win32
SetGraphicsDriver GLMax2DDriver()
?not Win32
SetGraphicsDriver BufferedGLMax2DDriver()
?

bye
Ron

Armitage 1982

(Posted 2012) [#87]

I stop trying this early and never test anything in threaded mode yet.
I changed so many things here and there that I'm no more able to use most of the tweak found for the rendering part of Max2D.
I wonder if learning OpenGL and forget about Max2D is not the best solution for speed improvement.