WritePixelFast... way too slow
| ||
Hi Blitzers! I've been doing some programming with the WritePixelFast command and basically it seems to be pretty slow. Drawing 19,200 dots uses about 10% of my processor time (running at 50 fps) on my P4 2.0 GHz with a GeForce FX 5600 graphics card, which to me seems pretty excessive. I was wondering if it's possible to write directly to video memory? I'm fairly certain it isn't possible with just Blitz+, but I was thinking it might be possible to write a DLL in C++ that provides some kind of replacement WritePixelFast function. Basically I want to know whether it would be possible, whether anybody has done anything like this before, and how successful it was (in terms of speed increase, or decrease). Any help would be greatly appreciated! Ta, Mr Brine |
| ||
Are you locking your graphics buffer first? Try this and see what happens. Also try it in full-screen mode and compare the difference. Regards |
| ||
http://www.blitzbasic.com/Community/posts.php?topic=26589 |
| ||
I've also tried to get this command working at an acceptable speed and found it laughable that a raycaster can run smoothly on a 25 MHz 386 (Wolfenstein 3D), yet my 550 struggles with my raycaster (and a lot of other people's) when ANY pixel-writing command is used, even when the images are stored in a memory bank instead of being read with ReadPixelFast. Until this is fixed I'm moving to C++. |
| ||
Please Mark, give us more 2D power! Fudge 3D! :P Pixel power to the max! Give us more speed, ungodly speed :D Ah well, it may never happen, as they no longer care one dang bit about 2D; it's all about 3D these days. For shame :( |
| ||
The LockedPixels command was supposed to keep you flatlanders happy. The bottleneck with the following is actually the slow PokeShort command in the UpdateBank function (move the call to UpdateBank outside the main loop to see the difference).

; fast bank to backbuffer refresh
; by skidracer

Const DWIDTH=640
Const DHEIGHT=480

Global tick

Function CopyBankToBuffer(bank,buffer)
    ; lock the image for byte transfer
    LockBuffer buffer
    ; test locked imagebuffer is 16 bits per pixel
    If LockedFormat(buffer)<>1
        UnlockBuffer buffer
        Notify("this test is 16 bit display mode only")
        End
    EndIf
    ; cache buffer variables
    imagebank=LockedPixels(buffer)
    pitch=LockedPitch(buffer)
    ; copy bank to image line by line
    For y=1 To DHEIGHT
        CopyBank bank,srcoff,imagebank,destoff,DWIDTH*2
        srcoff=srcoff+DWIDTH*2
        destoff=destoff+pitch
    Next
    UnlockBuffer buffer
End Function

Function UpdateBank(bank)
    argb=tick
    tick=tick+1
    o=0
    For y=1 To DHEIGHT
        For x=1 To DWIDTH
            argb=x+y+argb
            PokeShort bank,o,argb
            o=o+2
        Next
    Next
End Function

Graphics DWIDTH,DHEIGHT,16

bank=CreateBank(DWIDTH*DHEIGHT*2)

While Not KeyHit(1)
    ; calc fps
    fcount=(fcount+1)
    If fcount=20
        t=MilliSecs()
        fps#=20000.0/(t-ftime)
        ftime=t
        fcount=0
    EndIf
    ; update screen
    UpdateBank(bank)
    CopyBankToBuffer(bank,BackBuffer())
    Text 0,0,"fps="+fps
    Flip 0
Wend

End
|
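For anyone puzzling over why the copy loop advances the source and destination by different amounts, here is a rough, language-neutral sketch in Python rather than Blitz. It assumes the bank is tightly packed (one row is width times bytes per pixel) while the locked buffer's rows sit `pitch` bytes apart, which may include padding. The function name `copy_bank_to_buffer`, the tiny 4x3 image, and the 4 bytes of padding are made up for illustration; nothing here calls Blitz itself.

```python
def copy_bank_to_buffer(bank, buffer, width, height, bytes_per_pixel, pitch):
    """Copy a tightly packed pixel bank into a buffer whose rows are `pitch` bytes apart."""
    row_bytes = width * bytes_per_pixel
    if pitch < row_bytes:
        raise ValueError("pitch must cover at least one full row of pixels")
    src_off = 0
    dest_off = 0
    for _ in range(height):
        # One row at a time: the source advances by the packed row size,
        # the destination advances by the pitch (row size plus any padding).
        buffer[dest_off:dest_off + row_bytes] = bank[src_off:src_off + row_bytes]
        src_off += row_bytes
        dest_off += pitch


# Hypothetical tiny image: 4x3 pixels, 2 bytes per pixel, 4 padding bytes per buffer row.
WIDTH, HEIGHT, BPP = 4, 3, 2
PITCH = WIDTH * BPP + 4
bank = bytearray(range(WIDTH * HEIGHT * BPP))
buffer = bytearray(PITCH * HEIGHT)
copy_bank_to_buffer(bank, buffer, WIDTH, HEIGHT, BPP, PITCH)
```

If the pitch happens to equal the packed row size, the two offsets stay in step and the whole thing degenerates to one straight copy; the per-row loop only matters when they differ.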
| ||
In my demo called Internal Disaster I do a Keftale routine that does 307,200 WritePixelFast (WPF) calls each frame at 40 fps on my 2.4 GHz machine with a GeForce MX. So it's not that slow. Here is a link so you can see it in action: http://zac-interactive.com/demos/internal-disaster-demo.zip If you feel like spending a few minutes, check my website for the other demos I have made in 2D, one of which has a 512x512 pixel realtime rotozoom that runs at about 38 fps on my machine. Here's the website: http://www.zac-interactive.com |
| ||
Thanks to everyone who posted a reply! I tried skidracer's routine on my computer and it seems way faster than the routine I was using: drawing 307,200 pixels takes less than 9% of available processor time!!! WOW :-) (without the UpdateBank function) or 27% (with the UpdateBank function). This is a serious speed improvement. The only problem is that my game runs in 32-bit colour mode. But I'm sure with a little tinkering I'll solve this, so big up skidracer!!!!! You're the dude!!! Thanks, Mr Brine |
| ||
Here's the 32-bit version:

; fast bank to backbuffer refresh
; by skidracer

Const DWIDTH=640
Const DHEIGHT=480

Global tick

Function CopyBankToBuffer(bank,buffer)
    ; lock the image for byte transfer
    LockBuffer buffer
    ; test locked imagebuffer is 32 bits per pixel
    If LockedFormat(buffer)<>4
        UnlockBuffer buffer
        Notify("this test is 32 bit display mode only")
        End
    EndIf
    ; cache buffer variables
    imagebank=LockedPixels(buffer)
    pitch=LockedPitch(buffer)
    ; copy bank to image line by line
    For y=1 To DHEIGHT
        CopyBank bank,srcoff,imagebank,destoff,DWIDTH*4
        srcoff=srcoff+DWIDTH*4
        destoff=destoff+pitch
    Next
    UnlockBuffer buffer
End Function

Function UpdateBank(bank)
    argb=tick
    tick=tick+1
    o=0
    For y=1 To DHEIGHT
        For x=1 To DWIDTH
            argb=x+y+argb
            PokeInt bank,o,argb
            o=o+4
        Next
    Next
End Function

Graphics DWIDTH,DHEIGHT,32

bank=CreateBank(DWIDTH*DHEIGHT*4)

While Not KeyHit(1)
    ; calc fps
    fcount=(fcount+1)
    If fcount=20
        t=MilliSecs()
        fps#=20000.0/(t-ftime)
        ftime=t
        fcount=0
    EndIf
    ; update screen
    UpdateBank(bank)
    CopyBankToBuffer(bank,BackBuffer())
    Text 0,0,"fps="+fps
    Flip
Wend

End
|
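The practical difference between the 16-bit and 32-bit versions is how many bytes each pixel takes and how the colour is packed into them. The Python sketch below is only an illustration, and it assumes two things the thread does not confirm: that the 16-bit mode uses a 5-6-5 red/green/blue layout, and that the 32-bit mode stores 8 bits per channel with red in the high bits. The helper names `pack_rgb565` and `pack_xrgb8888` are invented for this example.

```python
import struct


def pack_rgb565(r, g, b):
    """Pack 8-bit channels into one 16-bit value (assumed 5-6-5 layout)."""
    return ((r >> 3) << 11) | ((g >> 2) << 5) | (b >> 3)


def pack_xrgb8888(r, g, b):
    """Pack 8-bit channels into one 32-bit value (assumed 8-8-8 layout, top byte unused)."""
    return (r << 16) | (g << 8) | b


# A 16-bit pixel occupies 2 bytes in the bank, a 32-bit pixel occupies 4,
# which is why the offsets step by 2 in one listing and by 4 in the other.
orange16 = struct.pack("<H", pack_rgb565(255, 128, 0))
orange32 = struct.pack("<I", pack_xrgb8888(255, 128, 0))
```

Whatever the exact layout turns out to be on a given card, checking LockedFormat before writing, as both listings do, is what keeps the packing and the byte stride in agreement.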
| ||
Thanks skidracer. If I ever finish my game I'll be sure to credit you for your help! |
| ||
Hey, learn something new every day... I didn't even know about those LockedPitch etc. commands... sweet! Nice Amiga-style demo, skidracer. I used to love downloading those ;) Skully |
| ||
Great code Skidracer. |
| ||
"I've also tried to get this command working at an acceptable speed and found it laughable that a raycaster can run on a 25mhz(386) smoothly (wolfenstien 3d), yet my 550 struggles on my raycaster (and a lot of other peoples) when ANY pixel writing command is used, even when the images are stored in a memory bank instead of readpixelfast. Until this is fixed I'm moving to c++."

If you need raw speed, what are you doing using a BASIC dialect anyway? Right tool for the right job, dude. |
| ||
Whoa, Simon -- nice! |
| ||
"If you need raw speed, what are you doing using a BASIC dialect anyway?"

Does that even make sense to you? |
| ||
"Does that even make sense to you?"

Yes. Why? Are you going to argue that C++ and other natively compiled languages are slower than BASIC? |
| ||
"Yes. Why? Are you going to argue that C++ and other natively compiled languages are slower than BASIC?"

Surely it depends on the *compiler*, not the language. I will argue that equally badly written compilers will compile equally slow code, regardless of the language. The reason C is perceived as a 'fast' language is that, for the better part of 20 years, everyone who's written a thesis on compiler design has done so using C (and more specifically the UNIX cc or GNU gcc) as a point of reference, not because the language is inherently 'faster'. Oh, and by the way, every modern BASIC I can think of is 'natively compiled'. |
| ||
Actually, if you use an array the FPS is almost as fast as when you take UpdateBank() out of the loop.

; fast bank to backbuffer refresh
; by skidracer

Const DWIDTH=800
Const DHEIGHT=600

Global tick

Graphics DWIDTH,DHEIGHT,32

Dim image(DWIDTH,DHEIGHT)

While Not KeyHit(1)
    ; calc fps
    fcount=(fcount+1)
    If fcount=20
        t=MilliSecs()
        fps#=20000.0/(t-ftime)
        ftime=t
        fcount=0
    EndIf
    ; update screen
    UpdateArray()
    CopyArrayToBuffer(BackBuffer())
    Text 0,0,"fps="+fps
    Flip
Wend

End

Function CopyArrayToBuffer(buffer)
    ; lock the image for byte transfer
    LockBuffer buffer
    ; test locked imagebuffer is 32 bits per pixel
    If LockedFormat(buffer)<>4
        UnlockBuffer buffer
        Notify("this test is 32 bit display mode only")
        End
    EndIf
    ; cache buffer variables
    imagebank=LockedPixels(buffer)
    pitch=LockedPitch(buffer)
    ; copy array to image pixel by pixel
    For y=0 To DHEIGHT-1
        yoff=y*pitch
        For x=0 To DWIDTH-1
            PokeInt imagebank,yoff+(x*4),image(x,y)
        Next
    Next
    UnlockBuffer buffer
End Function

Function UpdateArray()
    argb=tick
    tick=tick+1
    For y=0 To DHEIGHT-1
        For x=0 To DWIDTH-1
            argb=x+y+argb
            image(x,y)=argb
        Next
    Next
End Function
|
| ||
Arrays it is then! Now all we need is a userlib version of CopyArrayToBuffer... |
| ||
Hmm, that array version is 30 FPS slower here (55 vs 85)... I feel compelled to state that I'm temporarily (honest) using a GF2MX here. The CPU's an Athlon 2600... |
| ||
BlitzSupport: I've just noticed that in the array version I posted I left the resolution set to 800x600, whereas the original was 640x480, which could account for the difference you are seeing. That said, when I tested I got about 26 FPS with banks at 640x480 and 38 FPS with arrays at 800x600. This is on a laptop with a Rage Mobility and a 1 GHz P3. |
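Since the two tests ran at different resolutions, raw FPS is not a like-for-like comparison; pixels written per second is a fairer yardstick. The Python snippet below is just that arithmetic applied to the figures quoted above (26 FPS at 640x480 and 38 FPS at 800x600). The function name `pixels_per_second` is made up for this example.

```python
def pixels_per_second(width, height, fps):
    """Total pixels written per second when every frame redraws the full screen."""
    return width * height * fps


banks_throughput = pixels_per_second(640, 480, 26)   # bank version, 640x480
arrays_throughput = pixels_per_second(800, 600, 38)  # array version, 800x600

# How many times more pixels per second the array run pushed on that laptop.
ratio = arrays_throughput / banks_throughput
```

On those numbers the array run moved a bit more than twice as many pixels per second, so the gap on that machine is larger than the FPS figures alone suggest. It says nothing about other machines, as the later replies show.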
| ||
Good stuff. It should definitely be in the manual as well. No sense in hiding Blitz+'s best features, is there? I'm trying to figure out how to convert x and y to the location I need to plot into the bank. I thought it would be very easy, but I was wrong. I must be missing something obvious. Can anyone clue me in and put me out of my misery? |
| ||
Simon S: It's in this bit of code:

imagebank=LockedPixels(buffer)
pitch=LockedPitch(buffer)
; copy array to image pixel by pixel
For y=0 To DHEIGHT-1
    yoff=y*pitch
    For x=0 To DWIDTH-1
        PokeInt imagebank,yoff+(x*4),image(x,y)
    Next
Next

For the "Y increment" you need to use the pitch of the screen buffer (LockedPitch(buffer)) instead of GraphicsWidth()*4, as these two values can differ. This is the problem I ran into when first experimenting with LockedPixels. |
| ||
If you are poking directly into the buffer's bank you need to take the LockedPitch into account:

PokeInt(lockedbank,y*pitch+x*4,argb)

If you are poking into the user bank then use:

PokeInt(bank,(y*DWIDTH+x)*4,argb)

For 16 bit, change the *4's to *2 (and PokeInt to PokeShort, as in the 16-bit listing). |
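To make the two offset formulas concrete, here is a small Python sketch of the same arithmetic. The function names `buffer_offset` and `bank_offset` and the padded pitch of 2624 bytes are made up for illustration; on a real machine the pitch is whatever LockedPitch reports.

```python
def buffer_offset(x, y, pitch, bytes_per_pixel):
    """Byte offset of pixel (x, y) in a locked buffer whose rows are `pitch` bytes apart."""
    return y * pitch + x * bytes_per_pixel


def bank_offset(x, y, width, bytes_per_pixel):
    """Byte offset of pixel (x, y) in a tightly packed user bank `width` pixels wide."""
    return (y * width + x) * bytes_per_pixel


# 32-bit, 640 pixels wide: with no row padding the two formulas agree.
packed = bank_offset(10, 2, 640, 4)
unpadded = buffer_offset(10, 2, 640 * 4, 4)

# With a hypothetical padded pitch they diverge from the second row onward.
padded = buffer_offset(10, 2, 2624, 4)

# 16-bit: same formulas with 2 bytes per pixel.
packed16 = bank_offset(10, 2, 640, 2)
```

The divergence in the padded case is exactly why using the screen width times 4 as the row step can scribble pixels in the wrong place on some cards.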
| ||
Sorry, Bryan, I didn't notice that it was in 800 x 600 -- doh! In fact, the two methods appear to be exactly the same speed here (hovering between 84-85 FPS)... |
| ||
You could speed it up a little more by replacing the *4's with "x shl 2" and the *2's with "x shl 1", where x is the value you want to multiply. |
| ||
On arrays, how about using fixed arrays? (square brackets, not curvy ones) Does that make it any faster? I've never actually seen a massive speed increase with converting to SHL. I've suspected Blitz might optimise that out for you... |
| ||
Aren't Blitz arrays only available from within a type? If so, I think it would be quite a bit slower. |
| ||
And as Mark T mentions, I'm sure Mark Sibly said the compiler automatically makes all power of 2 multiplications and divisions into SHL SHR operations. Can anyone else confirm this? |
| ||
I did some tests of shift vs. multiplication/division, and I'd have to agree that Blitz optimizes for powers of 2. I performed a math operation on a variable 50,000 times using a For...Next loop. The test was performed at 50 fps. Please note there's another thread, 'force refresh rate', that disputes the way I force the fps. Anyhow, on with the results (percentage of processor time):

cw = cw / 3 (13%)
cw = cw / 4 (5%)
cw = cw shr 2 (5%)
cw = cw sar 2 (5%)
cw = cw * 5 (5%)
cw = cw shl 2 (5%)
cw = cw * 3 (6%)
|
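As a reminder of why a compiler can swap these operations freely, here is a short Python sketch of the underlying identities. It says nothing about what Blitz's compiler actually emits. It only shows that multiplying by a power of 2 matches a left shift, that dividing a non-negative value matches a right shift, and that negative values are where an arithmetic shift (sar), a logical shift (shr) and a truncating division part ways. The helper `logical_shr32` is invented here to mimic a 32-bit logical shift.

```python
def logical_shr32(value, shift):
    """Logical right shift on a 32-bit two's-complement value (zero-fills from the left)."""
    return (value & 0xFFFFFFFF) >> shift


def truncating_div(value, divisor):
    """Integer division that rounds toward zero, as x86 integer division does."""
    return int(value / divisor)


# Multiplication by 4 and a left shift by 2 always agree.
mul_matches = all((v << 2) == v * 4 for v in range(-1000, 1000))

# For non-negative values, division by 4 and either right shift agree.
div_matches = all((v >> 2) == truncating_div(v, 4) == logical_shr32(v, 2)
                  for v in range(0, 1000))

# For negative values the three differ.
sar_result = -7 >> 2                  # arithmetic shift rounds toward minus infinity
div_result = truncating_div(-7, 4)    # truncating division rounds toward zero
shr_result = logical_shr32(-7, 2)     # logical shift yields a large positive number
```

So for pixel offsets, which are never negative, the shift and the multiply are interchangeable, and readability can decide which one to write.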
| ||
So all this time, changing powers of 2 over to shl/shr, etc., was just a big waste of time? And it made the code a little uglier and a little more cryptic, all for nothing!?! Uh oh |
| ||
Yup. Well, no, actually I think sometimes SHR and SHL make good sense - if you're in a binary kind of mindset at the time. But not as an optimisation. |
| ||
Hmm, I'm not sure I follow. I've tried the LockedPixels and LockedPitch in BlitzPlus and it still doesn't come close to the performance of WritePixelFast on Locked buffers in Blitz2D. Where am I going wrong? |
| ||
Yes, I know this topic is 2 years old, but I have to say: THIS IS FANTASTIC. I've had BlitzPlus for a while and have only started using it recently, having become interested in 2D and lost interest in 3D, but I never understood the limited documentation for LockedPixels (i.e. the lack of examples). In my current isometric engine, which uses a z-buffer, I could handle 30,000 pixels drawn per frame, whereas with this method I can draw a changing 800x600 screen 10 times per frame at a decent frame rate (50 fps). Boy, I feel silly for not finding this before. That is 4,800,000 pixels per frame that can be drawn, 160 times the number of pixels from before! Certainly changes things for my program quite a bit... |