Fastest plot function. Help!

BinaryBurst(Posted 2011) [#1]
I need a function that does only one thing: plot a pixel on the screen.
It should look like this:
Function plot(x,y,r,g,b)
	'code
End Function


The problem is that it has to be ultra-fast: around 60,000,000*60 plots per second (that's on my 1.6 GHz system).
So if you can find any sort of super-duper fast plot function, that would be just awesome! (It can also be in C++, because that works just fine with Blitz.) Thanks :D

Last edited 2011


GfK(Posted 2011) [#2]
>>The problem is that it has to be ultra-fast: around 60,000,000*60 plots per second (that's on my 1.6 GHz system).
>>So if you can find any sort of super-duper fast plot function, that would be just awesome! (It can also be in C++, because that works just fine with Blitz.) Thanks :D

Short answer: Aaaahahahahahaha!

Long answer: You're asking for 3.6 billion plots per second, so, no. If you need that many plots, you're doing something wrong.


BinaryBurst(Posted 2011) [#3]
OK then, as fast as it can be... it just has to be as fast as possible. :D


BinaryBurst(Posted 2011) [#4]
Try this:
t=MilliSecs()
For i=1 To 360000000	'360 million increments, one per would-be plot
	a:+1
Next
Print MilliSecs()-t

I get 677ms

So it's kinda possible. :)

Last edited 2011


xcessive(Posted 2011) [#5]
That's more pixels than there are on my screen. You're a complete derp.


ima747(Posted 2011) [#6]
Insulting someone does nothing but degrade the community.

The base problem is that presumably you want to accomplish something beyond the scope of any reasonable modern system. You could certainly make a computer perform that many operations in a second, but it would likely require multiple processor cores... and an operation is not a plot: you need MANY operations to plot a single point. Then you have to get all that data to the graphics card, which a) is a huge bottleneck and b) can ONLY be done on the main thread under most architectures...

This also doesn't take into account the overhead of your program coming up with those plot points in the first place.

If you want as fast as possible, start with the simplest solution and then look for ways to optimize it. You are likely to find that if your best-case goal is already faster than a single core can even add numbers, you're going to be so far off your target that it's not worth pursuing, and you should really look for another way of attacking the core problem.

As noted before, that's more points than there are on a standard monitor, in my case by a factor of roughly 300. There's no need to plot more points than can be seen, so logical optimization #1 is to find a way to cull plots that don't need displaying and not bother trying to plot them. If you can group things so you can batch-dismiss plots, you get a highly optimized culling routine that trims your data set; see the sketch below.
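
For example, a minimal cull is just a bounds check before you pay for the write at all (hypothetical sketch; PlotCulled is a made-up name):
' hypothetical sketch: skip plots that fall outside the visible area
Function PlotCulled(pix:TPixmap, x:Int, y:Int, argb:Int)
	If x<0 Or y<0 Or x>=pix.width Or y>=pix.height Then Return
	WritePixel pix,x,y,argb
End Function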

Speaking of your data set: if you are representing each plot as ARGB+XY in ints (a packed colour int plus two coordinate ints, 12 bytes per plot) and you need 360 million of them, that's roughly 4gb of ram... just think how long it will take to churn through a data set that large, and again, this is not factoring in per-object overhead etc. Presumably you're not storing the data set but generating it dynamically, which means less ram but WAY more processor time to generate it... which means less time available to actually plot the points...

You haven't addressed in this thread the base reason for needing to plot that many points, but I think that would be critical. Practically speaking, no matter where the numbers come from or where they go, you will never get close to real-time performance crunching that many of them. So you need to work out whether WAAAY slower is still viable for your project, or whether the concept is fundamentally untenable on current commercial hardware.


BinaryBurst(Posted 2011) [#7]
hmmm... :) Thanks


Jesse(Posted 2011) [#8]
I went into the core of the module and was able to extract this from the DX7 module:
Strict

SetGraphicsDriver D3D7Max2DDriver()

Graphics 800,600

' dig the D3D device out of the Max2D driver
Global gfx:Tmax2dGraphics = Tmax2dGraphics.Current()
Global driver:TD3D7Max2DDriver = TD3D7Max2DDriver(gfx.driver())
Global device:IDirect3DDevice7 = driver.device
driver.SetActiveFrame Null

Local m:Int = MilliSecs()
For Local y = 1 To 400
	For Local x = 1 To 400
		driver.cverts[0]=x+.5001
		driver.cverts[1]=y+.5001
		' one DrawPrimitive call per point
		device.DrawPrimitive(D3DPT_POINTLIST,D3DFVF_XYZ|D3DFVF_DIFFUSE,driver.cverts,1,0)
	Next
Next
Print "400x400 points: "+(MilliSecs()-m)+" ms"
Flip

This code is about 1/9 (roughly 11%) faster than the high-level plotting because it reduces the number of function calls, and it is probably as fast as it gets when programming in BlitzMax.

I didn't bother to check the DX9 module because it's a bit more complicated to get at its core, and I'm not interested in digging any further.

As you will be able to see, even 400x400 takes quite a while to process, and I highly doubt it's even possible to get the 3.6 billion pixels per second you are looking for on any graphics card.

[edit]
Tested on my MacBook under Windows.

Last edited 2011


Oddball(Posted 2011) [#9]
Your best bet would be to plot to a pixmap and then draw that to the screen, but you'll still be short of the speed you need.
SuperStrict

Graphics 800,600

Global pm:TPixmap=CreatePixmap(800,600,PF_RGB888)
Global time:Int, count:Int, x:Int, y:Int

Repeat
	count=0
	time=MilliSecs()
	' write pixels flat out for one second, wrapping across the screen
	Repeat
		WritePixel pm,x,y,$FF0000
		count:+1
		x:+1
		If x>=800
			x=0
			y:+1
			If y>=600 Then y=0
		EndIf
	Until MilliSecs()-time>=1000
	
	' create a fresh image from the pixmap each frame so the new pixels get uploaded
	DrawImage LoadImage(pm,0),0,0
	
	DrawText "Number of pixels plotted in 1 sec: "+count,0,0
	
	Flip
Until AppTerminate() Or KeyHit(KEY_ESCAPE)


Last edited 2011


Armitage 1982(Posted 2011) [#10]
Maybe you could explain what you are trying to achieve?
One nice thing about graphics is that, most of the time, you can fake the result you're aiming for.


ImaginaryHuman(Posted 2011) [#11]
Nah, you can do faster than individual plots.

But you need to get into vertex arrays: create a vertex array of GL_POINTS, for example, squirt it over the bus in one call, and it'll render a lot more points in less time than individual plots, each of which costs a glVertex()-style function call. Something like the sketch below.
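
Roughly like this (untested sketch, assuming Pub.OpenGL and the BRL GL graphics driver; the counts are arbitrary):
' rough sketch: submit many points with a single vertex-array call
SetGraphicsDriver GLGraphicsDriver()
Graphics 800,600

Const N:Int=100000
Local verts:Float[N*2]
For Local i:Int=0 Until N
	verts[i*2]=Rnd(0,800)
	verts[i*2+1]=Rnd(0,600)
Next

' pixel-aligned orthographic projection
glMatrixMode(GL_PROJECTION)
glLoadIdentity()
glOrtho(0,800,600,0,-1,1)
glMatrixMode(GL_MODELVIEW)
glLoadIdentity()

While Not KeyHit(KEY_ESCAPE) And Not AppTerminate()
	glClear(GL_COLOR_BUFFER_BIT)
	glEnableClientState(GL_VERTEX_ARRAY)
	glVertexPointer(2,GL_FLOAT,0,Varptr verts[0])
	glDrawArrays(GL_POINTS,0,N)	'one call submits all N points
	glDisableClientState(GL_VERTEX_ARRAY)
	Flip
Wend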

An even faster way is to store the position data for the points in a texture and use a shader to draw it - way way faster.


AdamRedwoods(Posted 2011) [#12]

>>An even faster way is to store the position data for the points in a texture and use a shader to draw it - way way faster.

Snap!


BinaryBurst(Posted 2011) [#13]
Wow, thank you all! :D


xcessive(Posted 2011) [#14]
Had a thought -- using specialised shaders, you could write something massively parallel for the GPU. A GPU has tens to hundreds of cores, and if you can work out a good way to do this in a parallel fashion you might just make it work. There is, for example, real-time ray tracing, which has to do a ray calculation for EVERY PIXEL ON THE SCREEN using these techniques -- and it's still real time. The key is to make sure the application scales with more cores. A good example is the parallel merge sort that runs in O(log^2 n) time on a GPU, since it creates a merge thread for every pair to be sorted/merged.


ImaginaryHuman(Posted 2011) [#15]
I just said that ;-) .... there are particle systems that run entirely on the GPU with millions of particles.


H&K(Posted 2011) [#16]
Let's be serious here for a moment: 60 million pixels every 60th of a second is so many that there must be a problem with the underlying idea/concept.
People can make semi-useful comments along the lines of "well, only do 30 frames a second", or those above, but the basic problem is that whatever you are trying to do isn't doable. Maybe you can fudge some way with the shader idea to draw 60 million per frame (I'm not good enough with shaders to say you couldn't), but you would still have to pass 60 million discrete points, and calculate movement for 60 million points.

Basically, your idea is overly ambitious, and the best help you can get at the moment is to outline it and ask for feedback on scope and implementation.

However... very few people are willing to put a good idea to their peer group, and being part of that peer group I can say there IS often a genuine reason for that: if it is a good idea, you will have clones of it all over the place as we all practise/theorise on it.

When it comes down to it, you need to give better info than "I need 60 million plots per frame", but balance that against the justifiable fear that, although we here probably wouldn't "steal" your idea... we might sort of program our own versions.


ImaginaryHuman(Posted 2011) [#17]
You could feasibly do 60,000,000 plots per second at 1024x768 and about 75Hz, with every pixel on the screen plotted once per frame (1024x768 is ~786k pixels, times 75 is ~59 million)... but SIXTY times that is like drawing sixty full 1024x768 images per frame, which would need a seriously beefy high-end GPU just to do filled quads, let alone individual pixels with particle animation.


zzz(Posted 2011) [#18]
As others have already asked: why do you need this kind of performance? You can get to ~25 megapixels of screen area with an Eyefinity setup, so I'm very interested in what you are going to do with this. (Your ordinary 1080p display is ~2 megapixels, for comparison.)

The tiny benchmark you posted pretty much proves it's impossible on your system. A tight loop that doesn't have to touch anything but registers leaves you only ~300ms of headroom per second. Expanding that code to manipulating and dispatching ~15gb of data off-die per second just won't work out.


BinaryBurst(Posted 2011) [#19]
I just need as many particles as possible. :D That's all.
I just want to write everything into a pixmap (pixel by pixel) and then draw that single pixmap, because I know for sure that plotting on the screen with the Plot(x,y) function is 100 times slower.

Last edited 2011


ImaginaryHuman(Posted 2011) [#20]
DrawPixmap is 10 times slower than DrawImage, btw. See the timing sketch below.

But I'm not sure whether writing pixmap pixels + DrawImage is slower than a vertex array of points.
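
Easy enough to time on your own machine; something like this rough sketch (untested, and the timings mostly measure command submission, so results will vary with drivers):
' rough sketch: time DrawPixmap against DrawImage for the same 256x256 source
Graphics 800,600

Local pix:TPixmap=CreatePixmap(256,256,PF_RGBA8888)
ClearPixels(pix,$FFFF0000)	'opaque red
Local img:TImage=LoadImage(pix)	'uploaded to the card once

Local t:Int=MilliSecs()
For Local i:Int=1 To 100
	DrawPixmap pix,0,0	'pixel data pushed across the bus every call
Next
Print "100x DrawPixmap: "+(MilliSecs()-t)+" ms"

t=MilliSecs()
For Local i:Int=1 To 100
	DrawImage img,0,0	'drawn from the cached texture
Next
Print "100x DrawImage: "+(MilliSecs()-t)+" ms"
Flip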


BinaryBurst(Posted 2011) [#21]
Well... after some research I got this:
(my sys specs: 1.6GHz, Intel GMA with 128mb VRAM, 2gb RAM)

Global gw=1200,gh=780
Local a[gw,gh]
Local pix:TPixmap=CreatePixmap(gw,gh,pf_rgba8888)
ClearPixels(pix)

'simulate writing the screen 12 times (same cell each iteration; -1 = white $FFFFFFFF)
t=MilliSecs()
For i=1 To gw*gh*12
	a[30,30]=-1 'white
Next
Print "write screen 12 times: "+(MilliSecs()-t)+" ms"

'copy the pixels to the pixmap, one row at a time
t=MilliSecs()
For y=0 Until pix.height	'Until already stops before height
	Local p:Int Ptr=Int Ptr (pix.pixels+y*pix.pitch)
	For x=0 Until pix.width
		p[x]=a[x,y]
	Next
Next
Print "copy pixels: "+(MilliSecs()-t)+" ms"

Graphics gw,gh
Cls
Delay(200)
'draw the pixmap
t=MilliSecs()
DrawPixmap(pix,0,0)
Flip
Print "draw pixmap: "+(MilliSecs()-t)+" ms"

For gw=1200, gh=780 I get 29, 13, 36 ms over three runs.
For gw=320, gh=240 I get 2, 0, 2 ms over three runs.

The second case means you could rewrite the whole screen about 84 times per frame at 60 fps. So... what do you think? :D

Last edited 2011


Nate the Great(Posted 2011) [#22]
I'm not sure if it's a problem with your code or my laptop (a pretty crappy one that I'm on right now), but it refuses to draw the pixmap at the end and always throws an Access Violation exception... Anyway, I always use GL points when I want to draw lots of pixels or particles on screen very quickly. I used them to make a small tank game where every pixel could be destroyed/created on an 800x600 screen, and it worked really well for that.