Quick access to pixels - PBO


Casaber(Posted 2016) [#1]
I finally got some time on my hands, so I spent it trying to implement a PBO in BMax.
It turned out slow. I'm not sure why; I must have missed something in the implementation, because it's not easy to remember the OpenGL details correctly. I need to work on it some more.
But it's a good update and a step forward, so I thought I'd share it.

' Init Graphics
Global w:Int = DesktopWidth() ; h:Int = 1080 ; HideMouse
SetGraphicsDriver GLGraphicsDriver() ; Graphics w,h,32,60 ; glewinit

' Init Variables
Local pixels:Int[w*h] , pointer:Byte Ptr , pbo:Int

' Init PBO. Note: this uses a single PBO, which makes the greatest difference. Two PBOs give a bit more of a boost, and three PBOs do not seem to give much additional speed at all.
glGenBuffers 1,Varptr pbo
glbindbuffer GL_PIXEL_UNPACK_BUFFER,pbo
' glBufferData GL_PIXEL_UNPACK_BUFFER,w*h*4,Varptr pixels[0],GL_STREAM_DRAW ' You could use other hints such as GL_STATIC_DRAW, GL_DYNAMIC_DRAW.
glBufferData GL_PIXEL_UNPACK_BUFFER,w*h*4,Null                 ,GL_DYNAMIC_DRAW ' Upload empty buffer to prevent stall later.

Repeat

		' ---------------------------------------------------------------
		' TRIGGER DATA TRANSFER
		  glbindbuffer GL_PIXEL_UNPACK_BUFFER,pbo
		' glBufferData GL_PIXEL_UNPACK_BUFFER,w*h*4,Pointer(0),GL_STREAM_DRAW
		  gldrawPixels w,h,GL_BGRA,GL_UNSIGNED_BYTE,Null ' This returns immediately; it triggers an asynchronous DMA transfer from the PBO.
		' With a PBO bound, the last argument is a byte offset into the buffer rather than a pointer. BMax does not seem to accept an integer there, so Null (offset 0) is used.
		' ---------------------------------------------------------------

		' Do a frame worth of work here while data is transferred.
		  Delay 1 ; Flip 1

		' ---------------------------------------------------------------
		' WAY 1, ACCESSING DATA using pointer
		' glBufferData GL_PIXEL_UNPACK_BUFFER,w*h*4,Null,GL_STATIC_DRAW
		' pointer = glmapbuffer(GL_PIXEL_UNPACK_BUFFER,GL_WRITE_ONLY) ' GL_WRITE_ONLY is one of the available access modes.
		' a = a + 1 ; b = 64 * Sqr(a)*Cos(a) ; For y=0 Until h ; For x=0 Until w ; c = x + b * b Shr 8 * x+yy ; pointer[x*4 + y*4*w] = c ; Next ; Next
		' glunmapbuffer GL_PIXEL_UNPACK_BUFFER
		' ---------------------------------------------------------------
		
		' ---------------------------------------------------------------
		' WAY 2, ACCESSING DATA using GLBUFFERSUBDATA
	 	  a = a + 1 ; b = 64 * Sqr(a)*Cos(a) ; For y=0 Until h ; For x=0 Until w ; c = x + b * b Shr 8 * x+yy ; pixels[x + y*w] = c ; Next ; Next
		  For x=0 To 511 ; For y = 0 To 511 ; pixels [z+x + y*w] = 655350 ; Next ; Next ; z=z+1
		  glbuffersubdata GL_PIXEL_UNPACK_BUFFER,0,w*h*4,Varptr pixels[0]
		' ---------------------------------------------------------------

 		glbindbuffer GL_PIXEL_UNPACK_BUFFER,0
Until KeyHit(KEY_ESCAPE)
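
The Init PBO comment above mentions that two PBOs give a bit more of a boost. A minimal, untested sketch of that ping-pong arrangement follows: each frame draws from one PBO while the CPU uploads the next frame into the other. The 640x480 window, the names drawIndex and fillIndex, and the simple colour fill are illustrative assumptions and not part of the program above.

```blitzmax
SuperStrict

' Assumed window size for the sketch; the program above uses the desktop width.
Local w:Int = 640, h:Int = 480

SetGraphicsDriver GLGraphicsDriver()
Graphics w, h
glewInit

Local pixels:Int[w * h]      ' CPU-side frame, starts out all zero (black)
Local pbo:Int[2]             ' two buffer object names

' Create both PBOs and give each one initial (black) contents.
glGenBuffers 2, Varptr pbo[0]
For Local i:Int = 0 To 1
	glBindBuffer GL_PIXEL_UNPACK_BUFFER, pbo[i]
	glBufferData GL_PIXEL_UNPACK_BUFFER, w * h * 4, Varptr pixels[0], GL_STREAM_DRAW
Next

Local frame:Int = 0
Repeat
	Local drawIndex:Int = frame Mod 2         ' PBO shown this frame
	Local fillIndex:Int = (frame + 1) Mod 2   ' PBO that receives the next frame

	' Draw from the PBO that was filled on the previous frame.
	glBindBuffer GL_PIXEL_UNPACK_BUFFER, pbo[drawIndex]
	glDrawPixels w, h, GL_BGRA, GL_UNSIGNED_BYTE, Null

	' Build the next frame on the CPU (placeholder pattern).
	For Local i:Int = 0 Until w * h
		pixels[i] = (i + frame) & $FFFFFF
	Next

	' Upload it into the other PBO, which the draw above is not reading.
	glBindBuffer GL_PIXEL_UNPACK_BUFFER, pbo[fillIndex]
	glBufferSubData GL_PIXEL_UNPACK_BUFFER, 0, w * h * 4, Varptr pixels[0]
	glBindBuffer GL_PIXEL_UNPACK_BUFFER, 0

	Flip 1
	frame :+ 1
Until KeyHit(KEY_ESCAPE)
```

Whether this is actually faster than the single-PBO version depends on the driver, so it is worth timing both.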



dw817(Posted 2016) [#2]
Hi Casaber:

It's definitely running smoothly. Have you benchmarked it against the classics: WritePixel(), Plot, setting array.pixels[] and company?


Casaber(Posted 2016) [#3]
Simple test to see the difference

Normal writepixels

' Init Graphics
Global w:Int = DesktopWidth() ; h:Int = 1080 ; HideMouse
SetGraphicsDriver GLGraphicsDriver() ; Graphics w,h,32,60 ; glewinit

' Init Variables
Local pixels:Int[w*h]

Repeat
		For temp=1 To 5000000 ; Next ' Payload

		  a = a + 1 ; b = 64 * Sqr(a)*Cos(a) ; For y=0 Until h ; For x=0 Until w ; c = x + b * b Shr 8 * x+yy ; pixels[x + y*w] = c ; Next ; Next
		  For x=0 To 511 ; For y = 0 To 511 ; pixels [z+x + y*w] = 655350 ; Next ; Next ; z=z+1
		  gldrawPixels w,h,GL_BGRA,GL_UNSIGNED_BYTE,pixels ' Write the full screen 4 times
		  gldrawPixels w,h,GL_BGRA,GL_UNSIGNED_BYTE,pixels
		  gldrawPixels w,h,GL_BGRA,GL_UNSIGNED_BYTE,pixels
		  gldrawPixels w,h,GL_BGRA,GL_UNSIGNED_BYTE,pixels
		  Delay 1 ; Flip 1

Until KeyHit(KEY_ESCAPE)


Here's the same using a PBO; there should be a visible boost.
This example is geared towards a 2010 iMac, but you can try it, and if you have a more powerful machine just increase the payload and the number of screen writes in both versions
until the normal pixel write starts to struggle. Personally I get almost a 2x boost with this single PBO. Most of all, there's a much better chance of staying smooth, as the
OS won't interfere as much. No jitter.

A bonus is that you can use this with threads; updating graphics in a separate thread suits this perfectly. This is valuable, and I'm sure a lot of you know what I'm talking about.
It will probably be the perfect match for Monkey to get smooth graphics.

' Init Graphics
Global w:Int = DesktopWidth() ; h:Int = 1080 ; HideMouse
SetGraphicsDriver GLGraphicsDriver() ; Graphics w,h,32,60 ; glewinit

' Init Variables
Local pixels:Int[w*h] , pointer:Byte Ptr , pbo:Int

' Init PBO
glGenBuffers 1,Varptr pbo
glbindbuffer GL_PIXEL_UNPACK_BUFFER,pbo
' glBufferData GL_PIXEL_UNPACK_BUFFER,w*h*4,Varptr pixels[0],GL_STREAM_DRAW ' You could use other hints such as GL_STATIC_DRAW, GL_DYNAMIC_DRAW.
glBufferData GL_PIXEL_UNPACK_BUFFER,w*h*4,Null                 ,GL_DYNAMIC_DRAW ' Upload empty buffer to prevent stall later.

Repeat

		' ---------------------------------------------------------------
		' TRIGGER DATA TRANSFER
		  glbindbuffer GL_PIXEL_UNPACK_BUFFER,pbo
		' glBufferData GL_PIXEL_UNPACK_BUFFER,w*h*4,Pointer(0),GL_STREAM_DRAW
		  gldrawPixels w,h,GL_BGRA,GL_UNSIGNED_BYTE,Null ' This returns immediately; it triggers an asynchronous DMA transfer from the PBO.
		' With a PBO bound, the last argument is a byte offset into the buffer rather than a pointer. BMax does not seem to accept an integer there, so Null (offset 0) is used.
		' ---------------------------------------------------------------

		' Do a frame worth of work here while data is transferred.
		For temp=1 To 5000000 ; Next ' Payload
		  Delay 1 ; Flip 1

		' ---------------------------------------------------------------
		' WAY 1, ACCESSING DATA using pointer
		' glBufferData GL_PIXEL_UNPACK_BUFFER,w*h*4,Null,GL_STATIC_DRAW
		' pointer = glmapbuffer(GL_PIXEL_UNPACK_BUFFER,GL_WRITE_ONLY) ' GL_WRITE_ONLY is one of the available access modes.
		' a = a + 1 ; b = 64 * Sqr(a)*Cos(a) ; For y=0 Until h ; For x=0 Until w ; c = x + b * b Shr 8 * x+yy ; pointer[x*4 + y*4*w] = c ; Next ; Next
		' glunmapbuffer GL_PIXEL_UNPACK_BUFFER
		' ---------------------------------------------------------------
		
		' ---------------------------------------------------------------
		' WAY 2, ACCESSING DATA using GLBUFFERSUBDATA
	 	  a = a + 1 ; b = 64 * Sqr(a)*Cos(a) ; For y=0 Until h ; For x=0 Until w ; c = x + b * b Shr 8 * x+yy ; pixels[x + y*w] = c ; Next ; Next
		  For x=0 To 511 ; For y = 0 To 511 ; pixels [z+x + y*w] = 655350 ; Next ; Next ; z=z+1
		  glbuffersubdata GL_PIXEL_UNPACK_BUFFER,0,w*h*4,Varptr pixels[0]
		  glbuffersubdata GL_PIXEL_UNPACK_BUFFER,0,w*h*4,Varptr pixels[0]
		  glbuffersubdata GL_PIXEL_UNPACK_BUFFER,0,w*h*4,Varptr pixels[0]
		  glbuffersubdata GL_PIXEL_UNPACK_BUFFER,0,w*h*4,Varptr pixels[0]
		' ---------------------------------------------------------------

 		glbindbuffer GL_PIXEL_UNPACK_BUFFER,0
Until KeyHit(KEY_ESCAPE)



Casaber(Posted 2016) [#4]
If you want, make a benchmark. I think this is mighty interesting for everyone.

Especially for all the money-making app programmers who want smooth graphics.
This could be your ticket.


Casaber(Posted 2016) [#5]
I'm not at all happy with WAY 1 though, so don't be alarmed about that one if you try it; it has some bug in it. It's dead slow and doesn't even look the way it should, but I'm closing down for today. I've been debugging for hours; it's one of those stupid mistakes where you need a break to see it.
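
One possible cause of the visual difference in WAY 1, offered as a guess rather than a confirmed diagnosis: pointer is declared as a Byte Ptr, so pointer[...] = c stores only the low 8 bits of c, where WAY 2 stores a full 32-bit pixel. The sketch below shows the difference on a small array standing in for the mapped buffer; the names buffer, bytes and ints are made up for the illustration.

```blitzmax
SuperStrict

Local color:Int = $3366CC    ' example pixel value
Local buffer:Int[2]          ' stands in for the mapped PBO memory, starts at zero

' Byte Ptr write, as in WAY 1: only the low byte of color is stored.
Local bytes:Byte Ptr = Byte Ptr(Varptr buffer[0])
bytes[0] = color

' Int Ptr write: the whole 32-bit value is stored.
Local ints:Int Ptr = Int Ptr(Varptr buffer[0])
ints[1] = color

Print "Byte Ptr write: $" + Hex(buffer[0])
Print "Int Ptr write:  $" + Hex(buffer[1])
```

With an Int Ptr the index also changes from x*4 + y*4*w to x + y*w, because indexing then counts pixels instead of bytes. The slowness may be a separate issue: mapping a buffer that the GPU is still reading can make the driver wait, which is what the commented-out glBufferData call with Null just before glmapbuffer is usually meant to avoid.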


dw817(Posted 2016) [#6]
Hi Casaber. Have a good rest. I've been busy working on this.

Sat down and wrote a pretty comprehensive Benchmark. Now, because you are using GLGraphicsDriver(), no normal plotting commands are available, so you can't see the frames per second in realtime.

Yet, I am calculating it. If you hit [ESC] you can see the final results.

With this code, we can finally see what is what !
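
The idea of counting frames and reporting the rate at exit can be sketched as follows. This is not the posted benchmark, only a minimal illustration: the Delay stands in for drawing a frame and the fixed count of 50 frames stands in for waiting on [ESC]; both are assumptions.

```blitzmax
SuperStrict

Local frames:Int = 0
Local startMs:Int = MilliSecs()

Repeat
	Delay 5              ' placeholder for drawing one frame and calling Flip
	frames :+ 1
Until frames >= 50       ' a real program would loop Until KeyHit(KEY_ESCAPE)

' Average rate over the whole run, computed once at the end.
Local elapsedMs:Int = MilliSecs() - startMs
Local fps:Double = 0
If elapsedMs > 0 Then fps = frames * 1000.0 / elapsedMs

Print "Frames: " + frames + "  Elapsed: " + elapsedMs + " ms  FPS: " + fps
```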

Here was one of the messages I got, still trying to post this one bit of code.



Definitely something screwy with the server here.


dw817(Posted 2016) [#7]


Going to have to give up; every time I try to paste the whole thing and hit UPDATE I get this:



More than 14 times today so far!

I have no idea what's going on. Find the source HERE:

https://www.mediafire.com/?jmqb5s742xx3x9z


Derron(Posted 2016) [#8]
Your post might screw the server. There are some BlitzMax codes which you cannot post here without it erroring out with a 500...
Do this three times (literally) and the server crashes... which leads to a downtime of 20-30 minutes.

It's surely known, but I doubt that the webmasters really care (else we would have JS-based syntax highlighting, dynamically adjusted widths of codeboxes... standard stuff for more than 5 years now).

Bye
Ron


dw817(Posted 2016) [#9]
Lovely. Well, now I know. I get one 500 page and I'll be posting my code to Mediafire instead, Ron.

Thanks for the info ...


kfprimm(Posted 2016) [#10]
dw817, if it's just snippets, use GitHub. The gist section is perfect.

Sign up for an account so you can edit them later.

https://gist.github.com/


Derron(Posted 2016) [#11]
pastebin.com will do too (if you do not care for syntax highlighting).

@github
You could even create a simple "mytests" project, and within "issues" (create new issue) you could drop media files, which are then uploaded automatically. After this upload you get a usable http(s) link for the media. You could use this to publish the media needed for the snippets you provide. (Saves the hassle of using imgur, abload, ...)


bye
Ron


dw817(Posted 2016) [#12]
Okay, I'm on GitHub. Here is the link:

https://gist.githubusercontent.com/dw817/e75ebd49c9a3ab822a0d/raw/4152dc1297b0b7344f857d1e07d6baf3e7ce40cf/Benchmark%2520Fastest%2520Pixel%2520%28BlitzMAX%29

I'm curious now. Can you post ANY text there ? What is the limit on length ? Suppose it was just UUEncode ? Would GitHub complain about that ?


Casaber(Posted 2016) [#13]
Dw817, okay, you need text? I'll put it together with my VBO shader, because that one can use the GLMax2D driver easily.

I think it would be the perfect mix: PBO combined with VBOs and shaders. That would allow drawing primitives, quads and images and still give quick pixel access (both CPU and GPU via shaders) everywhere.

The important bit is the pixels though. Unfortunately that benchmark crashed my iMac when I downloaded it; I need to look into it.


kfprimm(Posted 2016) [#14]
https://help.github.com/articles/what-is-my-disk-quota/

They are pretty lax. I'm sure you'll get an email if you get excessive.

BTW, if you call it "Benchmark Fastest Pixel.bmx" instead of "Benchmark Fastest Pixel (BlitzMAX)" you should get syntax highlighting.

Also, link to the gist page instead of the raw file. That way others might fork and improve it.

https://gist.github.com/dw817/e75ebd49c9a3ab822a0d


grable(Posted 2016) [#15]
I tried your benchmark, Dw817, but I'm not seeing stellar results.
And I have a beast of a machine and GPU, though it might be the resolution that kills it.
RES: 1920x1080

typ1  typ2  FPS
0     0     1.177
0     1     10.1
0     2     12.88
0     3     13.33
1     0     1.166
1     1     10
1     2     12.88
1     3     13.33

EDIT: Just to add, I don't really like the windowed "fullscreen". It doesn't play well with the taskbar on Windows 10; I suspect its size is hardcoded?


dw817(Posted 2016) [#16]
Well Grable. Here's the point. I don't think it gets any faster than this. If you want to write a benchmark program to check your own resolution and you can get faster than 28.5714 frames per second (which is what I got), then that would be of considerable interest.

Your resolution is set pretty high. Knock it down to 1024x768 (where I tested mine) and see if that helps to increase the FPS.

And yes, plotting random dots all over the screen will ALWAYS slow down a system. That's why I wrote this program, to see which method does it the quickest.

And if you know of a method faster than Casaber's GlBufferSubData(), I am certainly willing to look at it.


TomToad(Posted 2016) [#17]
If all you want is random dots, you could use shaders. Run this and choose any picture. Use the up/down cursor keys to vary how much picture vs random dot shows.



grable(Posted 2016) [#18]
At 1024x768 I got as high as 35, which seems to be about the maximum I can get, seeing as the loop that sets random pixels takes roughly 30 milliseconds to complete.
And there is only 1 fps difference between TPixmap.Pixels and the glBufferSubData thing too...

I tried an old DirectX 7 sample I had and it was even slower :/

It seems that the only way to even approach 60 fps would be to use the GPU for everything, as the CPU just can't keep up at higher resolutions.

TomToad's recent foray into shaders shares no such limitation :)
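
To separate the cost of filling the pixels from the cost of uploading them, the fill loop can be timed on its own. A minimal sketch, assuming a plain Int array and Rand for the random dots (the benchmark's own loop may differ):

```blitzmax
SuperStrict

Local w:Int = 1024, h:Int = 768
Local pixels:Int[w * h]

' Time only the CPU-side fill, with no drawing or upload involved.
Local t0:Int = MilliSecs()
For Local i:Int = 0 Until w * h
	pixels[i] = Rand(0, $FFFFFF)
Next
Local fillMs:Int = MilliSecs() - t0

Print "Fill time: " + fillMs + " ms"
If fillMs > 0 Then Print "FPS ceiling from the fill alone: " + (1000.0 / fillMs)
```

If the fill alone takes around 30 ms, no upload method can push the total past roughly 33 fps, which would explain why the methods end up so close together.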


Casaber(Posted 2016) [#19]
That benchmark is not accurate. Usually I get 60 fps even on 1024x1024 pixmaps; it might stutter at times, but it's mainly 60 fps.

Here in the benchmark I get 5-20 fps, and 40 fps tops on anything. It's not easy to make a good benchmark; that's why I usually use my eyes.
I think I should build a number-crunching benchmark, though I doubt I will use it much.

Right now I'm trying all kinds of full HD screen fillers, and BMax on the CPU is capable of keeping up with most shaders on a 2012 machine.
1.5+ GHz and DDR3 is a must though, or you will be shooting in the dark and hoping for the best.


dw817(Posted 2016) [#20]
Casaber, you may have a faster computer. Either that, or write the model that shows different FPS numbers. This is the best I can do (my code above) with my current knowledge of programming.

It is also possible that the random number generator is slowing it down. If you want to bench test your graphics, you should take into account that someone somewhere is going to place non-uniform data into your screen. It can't always be a simple formula or calculation.

You also have the added advantage of skipping the routine that displays the FPS as your graphics mode won't allow me to use DrawText(), so you really are getting a bit more juice in your FPS calculation.

And I wouldn't complain. So far your routine is definitely the fastest one out there and you are to be commended on this ! :D

"TomToads recent foray into shaders shares no such limitation :)"

TomToad, if you would kindly post a model where I can use a For/Next loop to stuff random pixels into your 'screen' and display it, I can certainly add your method to the foray and determine its speed compared w the others.


Casaber(Posted 2016) [#21]
dw817 Boy do I have something for you next time !! :)

About the comparison of the normal pixel writer and the PBO: both should work on an average machine, as long as it has DDR3 or better memory for main RAM.
Computers with integrated graphics are forced to have fast RAM, so integrated chips are always PERFECT for software rendering.

Maybe DDR2 memory will do, but not at high resolutions. That's the reason why some Samsung phones have really bad pixel access (they need the bus and memory speed; there's nothing wrong with their GPU). Samsung and other brands quickly changed to DDR3 on their new mobiles.
The iPhone used software pixel blitting to get its smooth scrolling at the beginning, and it still uses a mix of hardware and software. Android copied that from Jelly Bean onward, I'm sure. A hardware/software mix is the way to go. :) And that's what I have for you ;) You'll love it.

I have no doubt that this is the way to go, at least until Vulkan. From there on I guess there may have to be a drastic change, or not.

I just need to get some bugs out of the way, and perhaps make a nice demo this time? I'm lazy about those things. I want to get to the essentials as quickly as possible.

So I have CPU rendering + GPU shader going on at the same time, with text scaling and rotation; text and primitives work.
I guess I could write a quad for you as well, and then all the holes you mentioned would be filled. I need to dig out the OpenGL bible for the bugs first, though.

I really need to try to get some kind of demo of this. I'm not sure what to do.


dw817(Posted 2016) [#22]
Looking forward to seeing it, Casaber !

I'm still experimenting with writing a 6-bit encryption routine. I have need of it in the 750k Carryall program.