Fade Screen speed challenge!

Grey Alien(Posted 2005) [#1]
OK I have this code to fade the current screen (which is 800x600x32bits) on my PC.



The problem is that it is really slow (takes 1/4 second or so) with debug On or Off (i.e. windowed or fullscreen), even in a compiled .exe. My PC is a P4 3.2GHz with 1GB RAM and a Radeon 9800XT, so it ain't slow.

I wanted to call this function in a loop from 255 to 0 to get a full screen colour fade as the user changes screens in my game (simple huh?).

I tried copying the backbuffer to a tempimage, using setbuffer to point to the tempimage and performing the fade on that but it was just as slow. Then I realised that BackBuffer is probably the same dynamic video memory as any image made with CreateImage anyway, doh.

My challenge for you guys (and girls?) is to make it faster or show me another way!

Also as a matter of interest, if you have drawn a screen onto the backbuffer, and then used Flip, how can you make a copy of what is now displaying on-screen? I say this because after Flip, the backbuffer is no longer what you just drew, it is the previous frame! Sounds like frontbuffer used to be supported but is no more. I know the obvious solution is to copy the backbuffer BEFORE calling Flip, but I, uh, don't want to due to the way my code is structured.

Sorry for the long post, but I eagerly await your replies.


Grey Alien(Posted 2005) [#2]
Hmm, what about using gamma settings? Or would I have to set a new value for every possible colour, i.e. 256*256*256?!
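
For what it's worth, a gamma ramp is usually one lookup table per channel, so it would be 256 entries each for red, green and blue rather than one entry per colour. A rough sketch in C (not Blitz, so it can be compiled and checked on its own); `build_fade_ramp` is a made-up name, and whether a given card or driver honours the ramp is an assumption:

```c
#include <assert.h>
#include <stdint.h>

/* Fill a 256-entry ramp that maps each input channel value to a
   darker output value. level = 255 leaves colours unchanged,
   level = 0 maps everything to black. (Hypothetical helper.) */
void build_fade_ramp(uint8_t ramp[256], int level)
{
    assert(level >= 0 && level <= 255);
    for (int i = 0; i < 256; i++) {
        ramp[i] = (uint8_t)((i * level) / 255);
    }
}
```

For a plain fade to black the same table would be used for all three channels, so a whole fade step costs 256 multiplies instead of one per pixel.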


Bremer(Posted 2005) [#3]
One way of gaining some speed would be to do one loop before the fading which reads the pixels of the backbuffer into an array, and then read the pixels from that array, as it's faster than doing ReadPixel each update.

If you want to save your backbuffer, then you can either use CopyRect or GrabImage.


WolRon(Posted 2005) [#4]
ReadPixelFast is SLOW. Don't use it.


sswift(Posted 2005) [#5]
This should work:

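; Assumed setup (not in the original post): 800x600x32, as described in post #1.
Const ScreenWidth  = 800
Const ScreenHeight = 600

Graphics ScreenWidth, ScreenHeight, 32
SetBuffer BackBuffer()

; Draw the screen you want to fade here.

.Main

	Current_Time = MilliSecs()

	Repeat

		; Work out how much time passed since the last frame.
		Last_Time    = Current_Time
		Current_Time = MilliSecs()
		Time_Passed  = Current_Time - Last_Time
		Seconds_Passed# = Time_Passed / 1000.0

		; Take two seconds to fade to black.
		Fade(2.0, Seconds_Passed#)

		Flip True

	Until KeyHit(1)

End

; Darkens the current buffer in place by the share of the fade time
; that elapsed since the last call.
Function Fade(Time_To_Fade#, Time_Elapsed#)

	Local X, Y
	Local RGB
	Local R, G, B
	Local Multiplier#

	Multiplier# = 1.0 - (Time_Elapsed# / Time_To_Fade#)
	If (Multiplier# < 0) Then Multiplier# = 0

	LockBuffer
	For Y = 0 To ScreenHeight-1
		For X = 0 To ScreenWidth-1

			RGB = ReadPixelFast(X, Y)

			; Split the packed colour into its three channels.
			R = (RGB Shr 16) And 255
			G = (RGB Shr 8 ) And 255
			B = (RGB       ) And 255

			R = R * Multiplier#
			G = G * Multiplier#
			B = B * Multiplier#

			; Pack the darkened channels back into one value.
			RGB = (R Shl 16) Or (G Shl 8) Or B

			WritePixelFast X, Y, RGB

		Next
	Next
	UnlockBuffer

End Function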



First, that function will be faster because the code is more optimized, and second, it will be faster because it skips frames.

There is no need to draw every change in shade if the screen is going to fade out in half a second, and your monitor won't even be able to display more than perhaps 75 changes in shade a second because that's how many times the screen buffers get flipped.

But this really isn't the best way to do it. I would make a copy of the image you want to fade, and then always work off the colors in that, multiplying them by a value that I interpolate from 1 to 0 over the specified period of time.

But this should be suitable for your needs.

Btw, I'm sure the code archives or Blitzcoder probably had a function to do this already. :-)
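
A minimal sketch of that copy-and-interpolate approach, in C rather than Blitz so it compiles standalone. `fade_level` and `fade_from_source` are hypothetical names, pixels are assumed to be 0x00RRGGBB, and the multiplier is kept as an integer in 256ths instead of a float:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Multiplier in 256ths: 256 at the start of the fade, 0 at the end. */
int fade_level(int elapsed_ms, int duration_ms)
{
    assert(duration_ms > 0);
    if (elapsed_ms <= 0) return 256;
    if (elapsed_ms >= duration_ms) return 0;
    return 256 - (elapsed_ms * 256) / duration_ms;
}

/* Scale every pixel of the untouched copy `src` by level/256 and
   store the result in `dst`. `src` is never modified, so each frame
   starts again from the original colours. */
void fade_from_source(const uint32_t *src, uint32_t *dst,
                      size_t count, int level)
{
    assert(level >= 0 && level <= 256);
    for (size_t i = 0; i < count; i++) {
        uint32_t r = (src[i] >> 16) & 255;
        uint32_t g = (src[i] >> 8) & 255;
        uint32_t b = src[i] & 255;

        r = (r * (uint32_t)level) >> 8;
        g = (g * (uint32_t)level) >> 8;
        b = (b * (uint32_t)level) >> 8;

        dst[i] = (r << 16) | (g << 8) | b;
    }
}
```

Each frame you would call `fade_level` with the milliseconds since the fade began, then `fade_from_source` from the saved copy into the buffer being shown, so rounding errors never pile up from frame to frame.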


One more thing!


If you copy the image to a set of three arrays before you start working on it, then you can grab the R G B values for each pixel from those arrays and multiply those instead, and it should be MUCH faster, because as someone said, readpixel is slow.

Since you're working at a fixed resolution, it's feasible for you to use arrays like this.
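
Roughly what the three-array idea looks like, sketched in C so it can be checked on its own (this is not the Blitz code itself). `split_channels` and `fade_channels` are invented names, and the 0x00RRGGBB layout and the level in 256ths are assumptions:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One-off pass before the fade starts: unpack 0x00RRGGBB pixels
   into three separate channel arrays. */
void split_channels(const uint32_t *pixels, size_t count,
                    uint8_t *r, uint8_t *g, uint8_t *b)
{
    assert(pixels != NULL && r != NULL && g != NULL && b != NULL);
    for (size_t i = 0; i < count; i++) {
        r[i] = (uint8_t)((pixels[i] >> 16) & 255);
        g[i] = (uint8_t)((pixels[i] >> 8) & 255);
        b[i] = (uint8_t)(pixels[i] & 255);
    }
}

/* Per-frame pass: no shifting or masking on the read side, just one
   multiply per channel and a repack. level is in 256ths (0..256). */
void fade_channels(const uint8_t *r, const uint8_t *g, const uint8_t *b,
                   uint32_t *out, size_t count, int level)
{
    assert(level >= 0 && level <= 256);
    for (size_t i = 0; i < count; i++) {
        uint32_t fr = ((uint32_t)r[i] * (uint32_t)level) >> 8;
        uint32_t fg = ((uint32_t)g[i] * (uint32_t)level) >> 8;
        uint32_t fb = ((uint32_t)b[i] * (uint32_t)level) >> 8;
        out[i] = (fr << 16) | (fg << 8) | fb;
    }
}
```

The split happens once; only `fade_channels` runs every frame, and it never touches video memory for reading.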


Grey Alien(Posted 2005) [#6]
Glad you rose to the challenge! The code I posted was a simplified segment to get across the point about speed (or lack of!). I was already calling my routine 5 times with fade values from 0 to 250 (step 50) rather than one frame per single shade change, and I was using a copy of the BackBuffer as the "source" to fade onto the real BackBuffer each time, so I am glad you concur, sswift :-) Your optimisations are good. I was waiting for someone to do some binary shifting; I didn't know it could be done in BB, and it takes me back to my assembly days. Also the interpolation is a nice touch, as I was worried that the fade would run at different speeds on different machines.

It seems like a good idea to use an array if ReadPixelFast is so slow! Although the slowest part may be creating the array in the first place. This may show as a small pause before the fade actually begins. The only problem with arrays is I have to define a large amount of memory that never gets freed until the program exits because you can't dynamically free arrays can you? What about using a bank? Maybe this would be even faster (or as fast) and it can be freed up later.

WolRon ... is there anything faster than ReadPixelFast then? I presume not. What about WritePixelFast, is that slow too, even on a locked buffer? Some tests I did seemed to have positive results. Is reading just a lot slower than writing? You'd think it would be the other way round, but then I guess video memory is designed for rapid output rather than input.

Oh yeah, I checked Blitzcoder already and couldn't find anything. There was some fading code that my routine was based on, but my aim was to be able to ultimately just say FadeScreen() and for it to do it quickly and smoothly.

*** Anyone still tempted to answer my second question about how to get a copy of the frontbuffer (previous backbuffer) after calling flip?


sswift(Posted 2005) [#7]
"It seems like a good idea to use an array if ReadPixelFast is so slow! Although the slowest part may be creating the array in the first place. This may show as a small pause before the fade actually begins. The only problem with arrays is I have to define a large amount of memory that never gets freed until the program exits because you can't dynamically free arrays can you? What about using a bank? Maybe this would be even faster (or as fast) and it can be freed up later."

It's under two megabytes (800 x 600 pixels at four bytes each). Most people have 256 MB minimum. I think you'll live. :-)

You could use a bank, but they are not as fast as arrays.

Actually I think there is a way to redimension an array in Blitz now that I think of it... But I think you can't then access X,Y with it, you will have to do Y*Width + X, which actually would be another speed boost because you can move the Y*Width outside the X loop. Heck, you can get rid of it entirely if you just increment the offset every pixel now that I think of it. But one multiply every row isn't a big problem.
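
To make the indexing point concrete, here is a small C sketch (C only so it can be compiled and checked; the same arithmetic applies to a one-dimensional Blitz array). The function names are made up, and halving each channel stands in for whatever per-pixel work you do:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Flat index of pixel (x, y) in a row-major array. */
size_t pixel_index(int x, int y, int width)
{
    return (size_t)y * (size_t)width + (size_t)x;
}

/* Hoisted version: one multiply per row, outside the X loop. */
void halve_hoisted(uint32_t *pixels, int width, int height)
{
    assert(pixels != NULL && width > 0 && height > 0);
    for (int y = 0; y < height; y++) {
        size_t row = (size_t)y * (size_t)width;
        for (int x = 0; x < width; x++) {
            /* Shift right, then clear the bit that leaked in from
               the neighbouring channel. */
            pixels[row + x] = (pixels[row + x] >> 1) & 0x7F7F7F;
        }
    }
}

/* Running-offset version: no multiply at all, the offset is simply
   incremented once per pixel. */
void halve_running(uint32_t *pixels, int width, int height)
{
    assert(pixels != NULL && width > 0 && height > 0);
    size_t offset = 0;
    for (int y = 0; y < height; y++) {
        for (int x = 0; x < width; x++) {
            pixels[offset] = (pixels[offset] >> 1) & 0x7F7F7F;
            offset++;
        }
    }
}
```

Both loops visit the pixels in the same order and produce the same result; the second just never computes Y*Width.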

As for the pause... copying the screen once to an array at the start is probably not going to cause a visible pause.

As for something faster than readpixelfast, there is in Blitzplus, but to use it you have to find out the color depth, and the "pitch" of the screen, and it's just a nightmare to use, and the speed boost is like 33% in 24bit color mode I think. Don't ask me how to use it. :-)

Oh, and it is my understanding that Blitzplus has no frontbuffer you can access. There is only a backbuffer, and frontbuffer() accesses it.


aab(Posted 2005) [#8]

"As for something faster than readpixelfast, there is in Blitzplus, but to use it you have to find out the color depth, and the "pitch" of the screen"

Do you mean LockedPixels()?

LockedFormat just tells you what layout to read, e.g.:
3 = 24 bit, so every 3 bytes of LockedPixels hold a blue byte, a green byte and a red byte.
With a 16 bit format you can use binary masks to get R, G and B from the RGB value, e.g. for a 5-5-5 layout:
g = (RGB_Short And %0000001111100000) Shr 5
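
The same masking, spelled out for all three channels in a C sketch (C so it can be compiled and checked standalone). The `Rgb` struct and function names are invented for illustration; the 5-5-5 and 5-6-5 bit layouts are the usual 16 bit ones, and which one your card reports is something you'd check via LockedFormat:

```c
#include <assert.h>
#include <stdint.h>

typedef struct { int r, g, b; } Rgb;

/* 5-5-5 layout: 0RRRRRGGGGGBBBBB (each channel 0..31). */
Rgb unpack_555(uint16_t p)
{
    Rgb c;
    c.r = (p >> 10) & 31;
    c.g = (p >> 5) & 31;
    c.b = p & 31;
    return c;
}

/* 5-6-5 layout: RRRRRGGGGGGBBBBB (green gets 6 bits, 0..63). */
Rgb unpack_565(uint16_t p)
{
    Rgb c;
    c.r = (p >> 11) & 31;
    c.g = (p >> 5) & 63;
    c.b = p & 31;
    return c;
}

/* Repack 5-5-5 channels after they have been scaled. */
uint16_t pack_555(Rgb c)
{
    assert(c.r >= 0 && c.r <= 31);
    assert(c.g >= 0 && c.g <= 31);
    assert(c.b >= 0 && c.b <= 31);
    return (uint16_t)((c.r << 10) | (c.g << 5) | c.b);
}
```

Note the green mask differs between the two layouts, so code written for one will give the wrong green on the other.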


sswift(Posted 2005) [#9]
LockedFormat tells you the color format.

LockedPixels gives you a pointer to a bank where you can read said pixels from.

LockedPitch tells you how many bytes are in each row. This number will depend on the number of pixels and how many bytes each pixel takes up.

But there might be more to it than that. It might be possible to have a pitch which is larger than the number of bytes needed to represent one row of pixels at your current image size. I don't know if that is the case or not.

I'm also pretty sure that it is fastest to peek longints from this buffer. So to achieve max speed you'd need to peek multiple longints and save the values to create every 2nd and 3rd pixel or something.

Like I said, it's complicated. :-)
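
If the pitch can be larger than the visible row, stepping rows by the pitch instead of by width * bytes-per-pixel covers both cases. A small C sketch of that addressing for the 24 bit case (C so it compiles on its own; `read_pixel24` and `write_pixel24` are hypothetical helpers, and the blue, green, red byte order is taken from the post above):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read pixel (x, y) from a 24-bit buffer as 0x00RRGGBB. Each row
   starts `pitch` bytes after the previous one, which may be more
   than width * 3 if the rows are padded. */
uint32_t read_pixel24(const uint8_t *base, int pitch, int x, int y)
{
    assert(base != NULL && pitch > 0 && x >= 0 && y >= 0);
    const uint8_t *p = base + (size_t)y * (size_t)pitch + (size_t)x * 3;
    return ((uint32_t)p[2] << 16) | ((uint32_t)p[1] << 8) | (uint32_t)p[0];
}

/* Write 0x00RRGGBB to pixel (x, y), bytes stored blue, green, red. */
void write_pixel24(uint8_t *base, int pitch, int x, int y, uint32_t rgb)
{
    assert(base != NULL && pitch > 0 && x >= 0 && y >= 0);
    uint8_t *p = base + (size_t)y * (size_t)pitch + (size_t)x * 3;
    p[0] = (uint8_t)(rgb & 255);
    p[1] = (uint8_t)((rgb >> 8) & 255);
    p[2] = (uint8_t)((rgb >> 16) & 255);
}
```

With a 2-pixel-wide image and a pitch of 8, for example, pixel (1, 1) lives at byte offset 11, and the two padding bytes at the end of each row are never touched.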


Qube(Posted 2005) [#10]
readpixelfast, writepixelfast.. kind of an oxymoron when expecting to do full screen operations.

I wish there was a faster way in blitz to do full screen pixel manipulation.


sswift(Posted 2005) [#11]
In C when you want to manipulate the screen, you don't manipulate the screen buffer directly, you manipulate a bank of memory. But in Blitz, manipulating a bank of memory seems to be relatively slow, because Poke and Peek seem to be function calls, and not converted directly to memory accesses. And that is why Poke and Peek aren't really any faster than WritepixelFast either. WritePixelFast after all is really writing to a buffer in memory, which is then sent to the video card later for rendering, as far as I know. So they both just access a buffer in memory. But it's the way that they do it that is slower than doing the same thing with an array.

You might be able to redimension an array in Blitz to take advantage of the array speed... I think you can do it in functions, but maybe those have to be a fixed size per function... Hm... But even if you can you still have to copy all your images to arrays to manipulate them like this unless you were able to store all your images in arrays.

So without arrays being able to be redimensioned, and without making ALL your images in array format, then you're stuck. And don't forget either that writepixelfast converts on the fly to the current bitdepth. You'd have to do that yourself too.


WolRon(Posted 2005) [#12]
"You'd think it would be the other way round, but then I guess video memory is designed for rapid output rather than input."

Bingo. Never read from video memory unless absolutely necessary.


Bremer(Posted 2005) [#13]
In my experience it's better to create an image as large as the screen, write to that, and do one DrawBlock instead of using Cls when doing pixel stuff. You can use PeekInt and PokeInt with BlitzPlus, and with B3D using the userlib that Andreas Blixt has posted in the BlitzCoder showcase. It's faster than using ReadPixelFast and WritePixelFast.


Grey Alien(Posted 2005) [#14]
zawran, any chance you can post a link? I found this one which sounds similar but the .rar that downloads seems to be corrupt.
http://www.blitzcoder.com/cgi-bin/showcase/showcase_showentry.pl?id=grable07182003232217&comments=no
The posts imply you may be able to Poke and Peek an image buffer with Ints, which sounds great (if it's that simple ;-) ).

"You could use a bank, but they are not as fast as arrays."
OK this saves me some testing time, thanx. Although it does seem a bit dumb that direct memory access is slower than arrays; after all, what is a CPU for!

"Actually I think there is a way to redimension an array in Blitz now that I think of it... But I think you can't then access X,Y with it, you will have to do Y*Width + X, which actually would be another speed boost because you can move the Y*Width outside the X loop. Heck, you can get rid of it entirely if you just increment the offset every pixel now that I think of it. But one multiply every row isn't a big problem."
Yup, I thought the same thing: increment a counter for each pixel read, and store the values in a linear array. In Delphi, instead of writing var = var + 1 you can write Inc(var), which compiles to assembly that is nice and fast!

I have heard something about redimensioning arrays but I can't think where. A bummer for me is that you can't make a Type with a dimension as a field but on BlitzCoder I read about an undocumented feature called BlitzArray (has anyone used this?). It seems that you CAN use this as part of a Type which means you could make a new Image Type that stores the pixels in a BlitzArray. (I hope Blitz frees it properly!) Then you could write your own handling functions and improve on the "Fast" pixel routines. I'll look into it and post any results back.

P.S. can anyone tell me how to reference some text from another post in that grey box?


sswift(Posted 2005) [#15]

"You could use a bank, but they are not as fast as arrays."
"OK this saves me some testing time, thanx."



Always test! I am not infallible. :-) I could be mistaken, or my results could have only applied to my own PC/video card.


As for types..

Type Blah
	Field Foo[4]
End Type

There's your array. But you can't adjust the size.

You could however have multiple types for different array sizes.

Type SixForty
Field Pixels[640*480]
End Type

Type EightHundred
Field Pixels[800*600]
End Type


But you'd have to have two or more copies of every part of your code which actually needs to access the screen if you did that.

I use bracketed arrays all the time in types, they work just fine.


Bremer(Posted 2005) [#16]
bankhandle = lockedpixels()

The above will give you the handle to the image buffer that you have locked, and with this you can use peek and poke directly with it. This can be done with B3D as well, take a look at this showcase from Andreas:

http://www.blitzcoder.com/cgi-bin/showcase/showcase_showentry.pl?id=andreas_blixt02172004221424&comments=no


Grey Alien(Posted 2005) [#17]
sounds great zawran but all Andreas' links are broke (probably due to his domain expiring!)


Bremer(Posted 2005) [#18]
I am on vacation, otherwise I could have hosted the files. Hopefully someone else has them.


Grey Alien(Posted 2005) [#19]
zawran, you must be a forum addict if you are on vacation (not at home?) ;-)


Grey Alien(Posted 2005) [#20]
OK here is my final code, which is LOADS faster than my start code. It works best over a duration of 1 second for a smooth transition. Note you can press any key to abort the fade.

The array of pixels makes a huge difference. There is a small delay as it is filled but this really isn't noticeable. This code is still slow in windowed mode due to WritePixelFast (haha) but in full screen mode it rocks. Thanks everyone esp. sswift.



To use it, call ccSetupFadeScreen() first, then all you need to do is call ccFadeScreen(x) where x is the number of seconds that you wish the fade to last. Don't forget to set the screen width and height constants up properly!


Grey Alien(Posted 2005) [#21]
btw, "cc" stands for CommonCode as I am building up a giant include file of reusable commands and tricks. I'll post the whole thing one day to the showcase.

Also the above "may" work faster if you don't bother using the fstemp image and write to the backbuffer directly, but then again it may not if the video memory is slow, I dunno.


sswift(Posted 2005) [#22]
You know, if you just use DrawBlock, then you won't need to set the Cls color or clear the screen before you draw. Right now you are basically drawing your image minus all the black pixels, on top of a black background which fills the black pixels in again.
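
Roughly what the two kinds of draw do per pixel, as a C sketch (illustrative only; `draw_masked` and `draw_block` are invented names, and black as the mask colour is assumed):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Masked copy: source pixels equal to mask_colour are skipped, so
   whatever was already in the destination shows through. */
void draw_masked(uint32_t *dst, const uint32_t *src, size_t count,
                 uint32_t mask_colour)
{
    assert(dst != NULL && src != NULL);
    for (size_t i = 0; i < count; i++) {
        if (src[i] != mask_colour) {
            dst[i] = src[i];
        }
    }
}

/* Block copy: every source pixel overwrites the destination, so
   there is no need to clear the destination first. */
void draw_block(uint32_t *dst, const uint32_t *src, size_t count)
{
    assert(dst != NULL && src != NULL);
    for (size_t i = 0; i < count; i++) {
        dst[i] = src[i];
    }
}
```

With the masked version the black pixels only end up black because the background was cleared to black beforehand; the block version gets the same picture in one pass.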

Also you may get a speed increase if instead of having one array, you have three, and when you initially start the fade you split the RGB components up into these arrays. Then when you're actually fading you won't have to do that conversion for every pixel.

That will waste an additional four megs of ram, but what else were you planning to do with it? :-)


Bremer(Posted 2005) [#23]
When I get home, and if I remember, I'll host the files for the B3D lockedpixels stuff and post an example of how to directly access the imagebuffers, which will then work with both B+ and B3D.


Grey Alien(Posted 2005) [#24]
thanks zawran.

Also sswift, the DrawBlock is a good idea, as I tried to optimise out the Cls earlier but couldn't do it! Hmm 5.5MB, do I care about it, maybe not!

I'd love to put the pixel manipulation code in a function but it would crawl. Is there no way to make an inline function in Blitz? i.e. the function compiles to code in the calling function, you know what I mean. (probably not I guess)


sswift(Posted 2005) [#25]
No, there's no way to do inline functions.


aab(Posted 2005) [#26]
You could try accessing the locked pixels, then changing them in a userlib function for direct memory access.
passing the bank handle in would supply it as linear array in another (faster) language.
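
As a rough idea of what the C side of such a function could look like for a 32 bit buffer: `fade_buffer32` is a hypothetical name, the 0x00RRGGBB layout and the level in 256ths are assumptions, and how it would be exported from a DLL and declared to Blitz is left out here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fade a 32-bit pixel buffer in place. `pixels` points at the start
   of the locked memory, `pitch` is the number of bytes per row and
   `level` is the brightness in 256ths (256 = unchanged, 0 = black).
   The top (alpha/unused) byte of each pixel is cleared. */
void fade_buffer32(void *pixels, int width, int height, int pitch, int level)
{
    assert(pixels != NULL && width > 0 && height > 0);
    assert(pitch >= width * 4);
    assert(level >= 0 && level <= 256);

    for (int y = 0; y < height; y++) {
        uint32_t *row = (uint32_t *)((uint8_t *)pixels + (size_t)y * (size_t)pitch);
        for (int x = 0; x < width; x++) {
            /* Scale red and blue together in one multiply, green in
               another; the masks remove bits that spill between
               channels after the shift. */
            uint32_t rb = row[x] & 0xFF00FF;
            uint32_t g  = row[x] & 0x00FF00;
            rb = ((rb * (uint32_t)level) >> 8) & 0xFF00FF;
            g  = ((g  * (uint32_t)level) >> 8) & 0x00FF00;
            row[x] = rb | g;
        }
    }
}
```

Because the loop only steps through `width` pixels per row and jumps by `pitch` between rows, any padding at the end of a row is left alone.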


Zster(Posted 2005) [#27]
Well, you can fake inline functions by writing your code in a separate file like "fade.bb" and then writing 'Include "fade.bb"' each time you want to bring in the code. Of course it couldn't actually be a function, as this would result in multiple definitions, so you'll have to write your code smartly. As far as I know Blitz will include that piece of code at each spot when it compiles, which is similar to the C++ preprocessor. I seriously doubt the function overhead is high enough to consider doing it here.


Bremer(Posted 2005) [#28]
Here is the link to the lockedpixels lib that Andreas Blixt did.

http://zac-interactive.dk/temp/lockedpixels.zip

I haven't had time to make an example, but there are a couple in the zip that Andreas did.


Grey Alien(Posted 2005) [#29]
Zster, that is an interesting idea. Actually the function needs to be called 480,000 times per frame, so the overhead could be worth considering!

Thanks zawran, hope you had a nice holiday. Where did you go and where u from? btw "passtime"? is this a pun or spelt wrong?


Bremer(Posted 2005) [#30]
My holiday was great. I am originally from Denmark, but have lived and worked in the USA, and now Sweden. I have just resigned my job here though and will be moving back to Denmark at the beginning of April.

"passtime" is how you spell it, as far as I know, its the time you have available outside of work or school.

I am almost done with a redesigned website, where I will be posting a showcase for the demos, tools and games I have coded so far with blitz. The plan is to have some 2d gfx effect tutorials done as well some time in future for people to learn from, if I find the time :)

I will see if I can't get an example posted sometime tomorrow evening, as I am too tired tonight.


Grey Alien(Posted 2005) [#31]
I have been to Copenhagen, it was great and the primitive robot controlled underground is impressive if you sit right at the front! Also went to the . uh . sex museum to find out how you humans do it.

I believe you may mean "pastime" with only one s (although 2 is more logical).

Look forward to the 2D graphics examples.


Bremer(Posted 2005) [#32]
Yes you are probably right about the "pastime" thing, English isn't my first language :)

It's 3am and I couldn't sleep, so here is an example of doing PokeInt directly to the backbuffer and imagebuffers. Using PokeInt directly on the backbuffer doesn't work on all graphics cards in my experience, so it's better to use imagebuffers and do DrawImage or DrawBlock for this if you want to be sure.



[edit]
This will work with B3D as well using that lockedpixels lib that Andreas made which I linked to previously.


Grey Alien(Posted 2005) [#33]
Thanks dude, I'll give it a go. I find it is best not to program directly before sleep, otherwise I'll stay awake thinking of cool things (maybe this is good, but not if you have to get up in the morning, ah when I was on the dole and a bachelor ...). I normally read sci-fi to wind down, or "Earth's funny old fashioned ideas" as we aliens think of it.


WillKoh(Posted 2005) [#34]
I hope no one minds that I raise this topic again a bit after it ended. Some forums are very particular about such things. Anyway, how would one go about writing a function in C or assembler that accesses (sp? ugh.. what a word that is) the memory directly? I think I know the languages as such but not how the hardware is designed or what Windows will allow you to do...


rogue(Posted 2005) [#35]
I ran the ccFadeScreen routine on my machine and got about 10 fps. I rewrote the code to do the fade in C and now get about 100 fps. I only tested it in 32 bit graphics mode, but once I make sure that the routines can handle 15, 16 and 24 bits as well I will package it up and put it on my web site for everyone to use.

BTW, I modified the way it handles the time so that you can specify the amount of time you want it to fade. The routine calculates a multiplier based on the time for the first frame and the total time you want so that it can smoothly fade over that period of time.

I have two routines: fadeIn() and fadeOut().

Hopefully people can use the routines.


rogue(Posted 2005) [#36]
The source code for the DLL and the sample BlitzPlus demo are available on our website at:

http://www.homebrewsoftware.com/Download/HsFadeForBlitzPlus.zip

The project for the DLL is a Visual C 6.0 project. You can make your DLL using GCC if you don't have VC 6.0.

Enjoy!

- Ken Rogoway
Owner, Homebrew Software
http://www.homebrewsoftware.com


AdrianT(Posted 2005) [#37]
OOPS SORRY THOUGHT I WAS IN B3D FORUMS

get someone to do it in max with B3d pipeline, save an animated B3d where the visibility track is animated to your fade.

You can create a sequence of transitions, for different menu items, and then create sequences from the animated b3d file and set up actions in code to play the frames in the order you require.

Allows for a lot of creativity if you want it, and pretty easy to set up. You can animate anything you like in a flash kind of way WYSIWYG with camera and lights and all the coder does is tell the extensions what camera to use and what anim sequence to play back when.

you can also create hidden pick quads that you use for mouse roll over and click, just like you would for a web page.


Grey Alien(Posted 2005) [#38]
Rogue: did u run ccFadeScreen with debug mode on because when it is in Windowed mode it sucks? Full-screen is really fast.


rogue(Posted 2005) [#39]
Grey:

Yes I was running debug and you are right, your code is much faster in non-debug. The DLL I wrote works in both windowed and full-screen modes at very fast rates. I'll have to benchmark it in both modes. However, since your Blitz code is plenty fast in release and it is always best to use a high level language when possible, it makes good sense for people to use your routine.

I was just trying to help out since there had been a couple of comments about running it in a userlib or with a faster language.


Grey Alien(Posted 2005) [#40]
Rogue: Thanks, it was worth you doing it :-) You could post the DLL for others to have maximum choice?


rogue(Posted 2005) [#41]
Thanks. The link to the DLL is above in an earlier post. People are free to use it any way they want.


WolRon(Posted 2006) [#42]
Thanks for the code Grey Alien. It works great.


Grey Alien(Posted 2006) [#43]
no probs, hope you got the code that was posted halfway down the page as it was the fastest. Warning, it will be a teeny bit slow on v. old PCs.

btw, my earlier post about it being slow in windowed mode is wrong, it is just as fast in windowed mode. It's just not fast with Debug on (of course) and debug was running in windowed mode and I confused the two, I know better 10 months on ;-)