WritePixelFast... way too slow

Mr Brine(Posted 2003) [#1]
Hi Blitzers!


I've been doing some programming with the WritePixelFast command and basically it seems pretty slow: drawing 19,200 dots uses about 10% of my processor time (running at 50 fps) on my P4 2.0 GHz with a GeForce FX 5600 graphics card, which seems pretty excessive to me. Basically, I was wondering whether it's possible to write directly to video memory. I'm fairly certain it isn't possible with just Blitz+, but would it be possible to write a DLL in C++ that provides some kind of replacement WritePixelFast function? Has anybody done anything like this before, and how successful was it (in terms of speed increase or decrease)? Any help would be greatly appreciated!

Ta

Mr Brine


Kevin_(Posted 2003) [#2]
Are you locking your graphics buffer first? Try this and see what happens. Also try it in full-screen mode and compare the differences.
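
For anyone unsure what locking looks like in practice, here is a minimal sketch. The resolution and the dot pattern are assumptions made for illustration, not taken from the original poster's program. The idea is to lock the buffer once around the whole batch of WritePixelFast calls rather than once per pixel:

```blitzbasic
; Minimal sketch: lock the back buffer once per frame around all
; WritePixelFast calls. Resolution and dot pattern are assumptions.
Graphics 640,480,32
SetBuffer BackBuffer()

While Not KeyHit(1)
	Cls
	LockBuffer BackBuffer()
	; 160*120 = 19200 dots, one every 4 pixels
	For y=0 To 119
		For x=0 To 159
			WritePixelFast x*4,y*4,$FFFFFFFF,BackBuffer()
		Next
	Next
	; unlock before using any other drawing command (Text, Flip, ...)
	UnlockBuffer BackBuffer()
	Flip
Wend

End
```

Adding a mode argument to the Graphics call (1 for full screen, 2 for windowed) is one way to compare the two cases mentioned above.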

Regards


Kevin_(Posted 2003) [#3]
http://www.blitzbasic.com/Community/posts.php?topic=26589


injeevious(Posted 2003) [#4]
I've also tried to get this command working at an acceptable speed and found it laughable that a raycaster can run smoothly on a 25 MHz 386 (Wolfenstein 3D), yet my 550 struggles with my raycaster (and a lot of other people's) when ANY pixel-writing command is used, even when the images are stored in a memory bank instead of being read with ReadPixelFast. Until this is fixed I'm moving to C++.


Paradox7(Posted 2003) [#5]
Please Mark, give us more 2D power! fudge 3D! :P

Pixel Power to the max! Give us more speed, ungodly speed :D

Ah well, may never happen, as they no longer care one dang bit about 2D, it's all about 3D these days, for shame :(


skidracer(Posted 2003) [#6]
The LockedPixels command was supposed to keep you flatlanders happy. The bottleneck in the following is actually the slow PokeShort command in the UpdateBank function (move the call to UpdateBank outside the main loop to see the difference).

; fast bank to backbuffer refresh
; by skidracer

Const DWIDTH=640
Const DHEIGHT=480

Global tick

Function CopyBankToBuffer(bank,buffer)
; lock the image for byte transfer
	LockBuffer buffer
; test locked imagebuffer is 16 bits per pixel
	If LockedFormat(buffer)<>1
		UnlockBuffer buffer
		Notify("this test is 16 bit display mode only")
		End
	EndIf
; cache buffer variables	
	imagebank=LockedPixels(buffer)
	pitch=LockedPitch(buffer)
; copy bank to image line by line	
	For y=1 To DHEIGHT
		CopyBank bank,srcoff,imagebank,destoff,DWIDTH*2
		srcoff=srcoff+DWIDTH*2
		destoff=destoff+pitch
	Next
	UnlockBuffer buffer
End Function

Function UpdateBank(bank)
	argb=tick
	tick=tick+1
	o=0
	For y=1 To DHEIGHT
		For x=1 To DWIDTH
			argb=x+y+argb
			PokeShort bank,o,argb 
			o=o+2
		Next
	Next
End Function

Graphics DWIDTH,DHEIGHT,16

bank=CreateBank(DWIDTH*DHEIGHT*2)

While Not KeyHit(1)
; calc fps
	fcount=(fcount+1)
	If fcount=20
		t=MilliSecs()
		fps#=20000.0/(t-ftime)
		ftime=t
		fcount=0
	EndIf
; update screen
	UpdateBank(bank)
	CopyBankToBuffer(bank,BackBuffer())
	Text 0,0,"fps="+fps
	Flip 0
Wend

End



Bremer(Posted 2003) [#7]
In my demo called Internal Disaster I do a Keftale routine that does 307,200 WritePixelFast calls each frame at 40 fps on my 2.4 GHz machine with a GeForce MX. So it's not that slow. Here is a link so you can see it in action.

http://zac-interactive.com/demos/internal-disaster-demo.zip

If you feel like spending a few minutes then check my website for the other demos I have made in 2D, one of which has a 512x512 pixel realtime rotozoom that runs about 38fps on my machine.

Here's the website http://www.zac-interactive.com


Mr Brine(Posted 2003) [#8]
Thanks to everyone who posted a reply!

I tried skidracer's routine out on my computer and it seems way faster than the routine I was using, drawing 307,200 pixels in less than 9% of available processor time!!! WOW :-) (without the UpdateBank function) or 27% (with the UpdateBank function). This is a serious speed improvement. The only problem is that my game is running in 32-bit color mode. But I'm sure with a little tinkering I'll solve this problem, so big up skidracer!!!!! You're the dude!!!

Thanks

Mr Brine


skidracer(Posted 2003) [#9]
here's the 32 bit version:

; fast bank to backbuffer refresh
; by skidracer

Const DWIDTH=640
Const DHEIGHT=480

Global tick

Function CopyBankToBuffer(bank,buffer)
; lock the image for byte transfer
	LockBuffer buffer
; test locked imagebuffer is 32 bits per pixel
	If LockedFormat(buffer)<>4 
		UnlockBuffer buffer
		Notify("this test is 32 bit display mode only")
		End
	EndIf
; cache buffer variables	
	imagebank=LockedPixels(buffer)
	pitch=LockedPitch(buffer)
; copy bank to image line by line	
	For y=1 To DHEIGHT
		CopyBank bank,srcoff,imagebank,destoff,DWIDTH*4		
		srcoff=srcoff+DWIDTH*4
		destoff=destoff+pitch
	Next
	UnlockBuffer buffer
End Function

Function UpdateBank(bank)
	argb=tick
	tick=tick+1
	o=0
	For y=1 To DHEIGHT
		For x=1 To DWIDTH
			argb=x+y+argb
			PokeInt bank,o,argb 
			o=o+4
		Next
	Next
End Function

Graphics DWIDTH,DHEIGHT,32

bank=CreateBank(DWIDTH*DHEIGHT*4)

While Not KeyHit(1)
; calc fps
	fcount=(fcount+1)
	If fcount=20
		t=MilliSecs()
		fps#=20000.0/(t-ftime)
		ftime=t
		fcount=0
	EndIf
; update screen
	UpdateBank(bank)
	CopyBankToBuffer(bank,BackBuffer())
	Text 0,0,"fps="+fps
	Flip 
Wend

End



Mr Brine(Posted 2003) [#10]
Thanks skidracer, if I ever finish me game I'll be sure to credit you for your help!


_Skully(Posted 2003) [#11]
Hey, learn something new every day... I didn't even know about those LockedPitch etc. commands... sweet!

Nice Amiga style Demo Skidracer.. I used to love downloading those ;)

Skully


Bremer(Posted 2003) [#12]
Great code Skidracer.


Warren(Posted 2003) [#13]
"I've also tried to get this command working at an acceptable speed and found it laughable that a raycaster can run smoothly on a 25 MHz 386 (Wolfenstein 3D), yet my 550 struggles with my raycaster (and a lot of other people's) when ANY pixel-writing command is used, even when the images are stored in a memory bank instead of being read with ReadPixelFast. Until this is fixed I'm moving to C++."

If you need raw speed, what are you doing using a BASIC dialect anyway? Right tool for the right job, dude.


BlitzSupport(Posted 2003) [#14]
Whoa, Simon -- nice!


FlameDuck(Posted 2003) [#15]
"If you need raw speed, what are you doing using a BASIC dialect anyway?"
Does that even make sense to you?


Warren(Posted 2003) [#16]
"Does that even make sense to you?"

Yes. Why? Are you going to argue that C++ and other natively compiled languages are slower than BASIC?


FlameDuck(Posted 2003) [#17]
"Yes. Why? Are you going to argue that C++ and other natively compiled languages are slower than BASIC?"
Surely it depends on the *compiler*, not the language. I will argue that equally badly written compilers will produce equally slow code, regardless of the language. The reason C is perceived as a 'fast' language is that, for the better part of 20 years, everyone who's written a thesis in compiler design has done so using C (and more specifically the UNIX cc or GNU gcc) as a point of reference, not because the language is inherently 'faster'.

Oh, and by the way, every modern BASIC I can think of is 'natively compiled'.


TeaVirus(Posted 2003) [#18]
Actually, if you use an array the FPS is almost as fast as when you take UpdateBank() out of the loop.

; fast bank to backbuffer refresh 
; by skidracer 

Const DWIDTH=800 
Const DHEIGHT=600 

Global tick 

Graphics DWIDTH,DHEIGHT,32 

Dim image(DWIDTH,DHEIGHT) 

While Not KeyHit(1) 
	; calc fps 
	fcount=(fcount+1) 
	If fcount=20 
		t=MilliSecs() 
		fps#=20000.0/(t-ftime) 
		ftime=t 
		fcount=0 
	EndIf 
	; update screen 
	UpdateArray() 
	CopyArrayToBuffer(BackBuffer()) 
	Text 0,0,"fps="+fps 
	Flip 
Wend 

End 

Function CopyArrayToBuffer(buffer) 
	; lock the image for byte transfer 
	LockBuffer buffer 
	; test locked imagebuffer is 32 bits per pixel 
	If LockedFormat(buffer)<>4 
		UnlockBuffer buffer 
		Notify("this test is 32 bit display mode only") 
		End 
	EndIf 
	; cache buffer variables 
	imagebank=LockedPixels(buffer) 
	pitch=LockedPitch(buffer) 
	; copy array to image pixel by pixel, using pitch as the row stride 
	
	For y=0 To DHEIGHT-1 
		yoff=y*pitch 
		For x=0 To DWIDTH-1 
			PokeInt imagebank,yoff+(x*4),image(x,y) 
		Next 
	Next 
	
	UnlockBuffer buffer 
End Function 

Function UpdateArray() 
	argb=tick 
	tick=tick+1 
	For y=0 To DHEIGHT-1 
		For x=0 To DWIDTH-1 
			argb=x+y+argb 
			image(x,y)=argb 
		Next 
	Next 
End Function 



skidracer(Posted 2003) [#19]
Arrays it is then! Now all we need is a userlib version of CopyArrayToBuffer...


BlitzSupport(Posted 2003) [#20]
Hmm, that array version is 30 FPS slower here (55 vs 85)... I feel compelled to state that I'm temporarily (honest) using a GF2MX here. The CPU's an Athlon 2600...


TeaVirus(Posted 2003) [#21]
BlitzSupport:

I've just noticed that in the array version I posted I left the resolution set to 800x600, whereas the original was 640x480, which could account for the difference you are seeing. That said, when I tested I got about 26 FPS with banks at 640x480 and 38 FPS with arrays at 800x600. This is on a laptop with a Rage Mobility and a 1 GHz P3.


Simon S(Posted 2003) [#22]
Good stuff. It should definitely be in the manual as well. No sense in hiding Blitz+'s best features, is there?

I'm trying to figure out how to convert x and y to the location I need to plot into the bank. I thought it would be very easy, but I was wrong. I must be missing something obvious.

Can anyone clue me in and put me out of my misery?


TeaVirus(Posted 2003) [#23]
Simon S:

It's in this bit of code:

	imagebank=LockedPixels(buffer) 
	pitch=LockedPitch(buffer) 
	
	; copy bank to image line by line 
	For y=0 To DHEIGHT-1 
		yoff=y*pitch 
		For x=0 To DWIDTH-1 
			PokeInt imagebank,yoff+(x*4),image(x,y) 
		Next 
	Next 


For the "Y increment" you need to use the pitch of the screen buffer (LockedPitch(buffer)) instead of GraphicsWidth()*4, because the two values can differ. This is the problem I ran into when first experimenting with LockedPixels.


skidracer(Posted 2003) [#24]
If you are poking directly into the buffer's bank you need to take the LockedPitch into account:

PokeInt(lockedbank,y*pitch+x*4,argb)

If you are poking into the user bank then use

PokeInt(bank,(y*DWIDTH+x)*4,argb)

For 16 bit change the *4's to *2.
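
Putting those offset formulas together, a small helper might look like the sketch below. The function name PlotLocked and its parameter list are made up for illustration and are not part of BlitzPlus. It assumes the buffer is already locked, that only the 16-bit (format 1) and 32-bit (format 4) layouts tested for elsewhere in this thread need handling, and that the caller passes a colour value already packed for the buffer's format.

```blitzbasic
; Hypothetical helper (not a BlitzPlus command): write one pixel
; into an already locked buffer. pixels, pitch and fmt come from
; LockedPixels, LockedPitch and LockedFormat.
Function PlotLocked(pixels,pitch,fmt,x,y,argb)
	Select fmt
		Case 1 ; 16 bits per pixel, 2 bytes each
			PokeShort pixels,y*pitch+x*2,argb
		Case 4 ; 32 bits per pixel, 4 bytes each
			PokeInt pixels,y*pitch+x*4,argb
	End Select
End Function

Graphics 640,480,32
buffer=BackBuffer()

LockBuffer buffer
pixels=LockedPixels(buffer)
pitch=LockedPitch(buffer)
fmt=LockedFormat(buffer)
; draw a short diagonal line as a smoke test
For i=0 To 199
	PlotLocked(pixels,pitch,fmt,i,i,$FFFFFF)
Next
UnlockBuffer buffer

Flip
WaitKey
End
```

A function call per pixel adds overhead of its own, so in an inner loop it is probably better to inline the Poke and cache y*pitch per row, as the array version above does.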


BlitzSupport(Posted 2003) [#25]
Sorry, Bryan, I didn't notice that it was in 800 x 600 -- doh! In fact, the two methods appear to be exactly the same speed here (hovering between 84-85 FPS)...


Mr Brine(Posted 2003) [#26]
You could speed it up a little more by replacing the *4's with "x Shl 2" and the *2's with "x Shl 1", where x is the value you want to multiply.


Mark Tiffany(Posted 2003) [#27]
On arrays, how about using fixed arrays? (square brackets, not curvy ones) Does that make it any faster?

I've never actually seen a massive speed increase with converting to SHL. I've suspected Blitz might optimise that out for you...


TeaVirus(Posted 2003) [#28]
Aren't Blitz arrays only available from within a type? If so, I think it would be quite a bit slower.


Simon S(Posted 2003) [#29]
And as Mark T mentions, I'm sure Mark Sibly said the compiler automatically makes all power of 2 multiplications and divisions into SHL SHR operations.

Can anyone else confirm this?


Mr Brine(Posted 2003) [#30]
I did some 'shift vs. multiplication/division' tests and I'd have to agree that Blitz optimizes for powers of 2. I performed a math operation on a variable 50,000 times, using a For...Next loop. The test was run at 50 fps. Please note there's another thread, 'force refresh rate', that disputes the way I force the fps. Anyhow, on with the results:

cw = cw / 3 (13%)
cw = cw / 4 (5%)
cw = cw shr 2 (5%)
cw = cw sar 2 (5%)
cw = cw * 5 (5%)
cw = cw shl 2 (5%)
cw = cw * 3 (6%)
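
A rough way to repeat this kind of comparison is to time each operation over many iterations with MilliSecs(). The sketch below is not the test program used for the figures above (those are percentages measured at a fixed frame rate); the iteration count and variable names are arbitrary choices, and timings this coarse should only be read as a relative indication.

```blitzbasic
; Rough timing sketch: multiply by 4 versus shift left by 2.
; ITERATIONS is an arbitrary choice; raise it if both times read 0.
Const ITERATIONS=10000000

t=MilliSecs()
For i=1 To ITERATIONS
	cw=i*4
Next
mulTime=MilliSecs()-t

t=MilliSecs()
For i=1 To ITERATIONS
	cw=i Shl 2
Next
shlTime=MilliSecs()-t

Notify "i*4: "+mulTime+" ms, i Shl 2: "+shlTime+" ms"
End
```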


Paradox7(Posted 2003) [#31]
So all this time, changing powers of 2 over to Shl/Shr, etc., was just a big waste of time? And it made the code a little uglier and a little more cryptic, all for nothing!?! Uh oh.


Anthony Flack(Posted 2003) [#32]
Yup.

Well, no, actually I think sometimes SHR and SHL make good sense - if you're in a binary kind of mindset at the time. But not as an optimisation.


Imphenzia(Posted 2003) [#33]
Hmm, I'm not sure I follow. I've tried the LockedPixels and LockedPitch in BlitzPlus and it still doesn't come close to the performance of WritePixelFast on Locked buffers in Blitz2D. Where am I going wrong?


Matty(Posted 2006) [#34]
Yes I know this topic is 2 years old but I have to say:

THIS IS FANTASTIC - I've had BlitzPlus for a while, and have only started using it recently, having become interested in 2D and losing interest in 3D, but I never understood the limited documentation for LockedPixels (i.e. the lack of examples).

In my current isometric engine, which uses a z-buffer, I could handle 30,000 pixels drawn per frame, whereas with this method I can redraw a changing 800x600 screen 10 times per frame at a decent frame rate (50 fps). Boy, I feel silly for not finding this before.
That is 4,800,000 pixels per frame that can be drawn: 160 times the number of pixels from before.

Certainly changes things for my program quite a bit...