B+, Emulators, and ALPHA Sprites!


cbmeeks(Posted 2003) [#1]
Yeah, that's a weird topic I know..lol

Anyway, here is my thinking and please correct me if I am wrong. (only in theory now)

We all know that B3D (or 3D hardware in general) is much faster at creating transparencies, rotation, scaling, etc than 2D/software.

However, is 2D/software fast enough anyway for smaller sprites?

Let me back up. The other day I was playing Super Metroid on my laptop via the ZSNES emulator (don't worry, I own the game so my rom is legal). I was amazed at how SMOOTH the scrolling was! Even full screen! Also, in the game, there are sections where there appear to be clouds floating around and anything under the clouds is slightly altered in color. Transparency at its greatest.

Then, I got to thinking. ZSNES more than likely just uses DirectDraw. I don't think it uses any 3D. My laptop is only an 800 MHz P3 with a crappy 32 MB video card (not meant for gaming). Yet, everything was so smooth and nice! I was getting full screen scrolling, great sound, SMOOTH graphics (WITH ALPHA) on a "crappy" laptop using 2D??? So, why can't B+ (or BB) do some of the ALPHA tricks in 2D?? I realize you will never get 60,000 sprites at 500 FPS but come on. Why couldn't I make a game that was 640x480 (same res as the ZSNES) with maybe a layer with transparency? Or maybe a dozen small sprites with transparency?

Does this seem reasonable? I know there are some ALPHA commands out there for Blitz but are any of them any good? Has anyone done anything like this in 2D? Bottom line is this. My goal is to have maybe 4 layers scrolling (platform game) with maybe one of those layers a simple "transparent" layer. Or, have 3 layers with about 10 sprites that appear to be transparent all using B+.

I think it should be possible. I could maybe even include a "Turn ALPHA OFF" command for really slow computers.

What do you programmers think?

Thanks

cb


Imphenzia(Posted 2003) [#2]
I wish this was possible too... Alpha transparency in BB is beyond reach for real time use at the moment =(


fredborg(Posted 2003) [#3]
I think the Extended BB .dll allows for alpha transparency, etc. I'll dig out the thread...hang on...Here you go :)


Bremer(Posted 2003) [#4]
It's possible, especially if you just want to use transparency on 10-12 sprites every now and then. In my demo vectorized2 I have a big 3D cube done in Blitz+ that rotates and is twisted and transparent at the same time. When you consider how much time the calculations and drawing of the cube itself take, you can see that if you were just drawing a screen full of squares you would have ample time to move a good number of smaller transparent images around the screen without too many problems. Everything must be done on a locked buffer with WritePixelFast, but it can be done. Here is a link to that demo:

http://www.blitzcoder.com/cgi-bin/showcase/showcase_showentry.pl?id=zawran10302003103604&comments=no

Or you can check my demos at zac-interactive.com

To get the best result, have a routine that takes the sprite and puts it into an array, and then as you move the sprite around, read from the array so that you only have to do one ReadPixelFast for each pixel, because the biggest slowdown is with the ReadPixel, not as much with the WritePixel.


Anthony Flack(Posted 2003) [#5]
That's the thing - DON'T use readpixel. Reads from video memory are slow. It's perfectly possible to get nice full-screen alpha stuff in software, but programs that achieve this do so by keeping everything in system memory, and then dumping the entire buffer into screen memory when it's done.


(don't worry, I own the game so my rom is legal).


Yeah I was so worried. Can't have people admitting to having an illegal copy of something they stopped selling years ago...


cbmeeks(Posted 2003) [#6]
Yeah, I think you two are correct. Reading pixels is SLOOW. Hmmm...what do you guys think about doing everything in an array or bank? This would be like using system memory to do complex blits and then copying that to the backbuffer??

I may experiment with that. What do you guys think?

Listen, here is what I think...back years and years ago, I could take my 7.14 MHz Amiga 500, read from video memory, and plot sprites at 60 fps. SURELY our 500+ MHz machines nowadays can do it....lol

cb


********** EDIT ****************

Ok, I just coded a short example for array blitting. I am not too impressed. I haven't tried hard but I don't think it could get much faster. Normally, I get 100 FPS (vsync) with a standard cls/flip loop but using this code, I only get 47 fps. Jeesh.

I might forget the alpha blitting for a while. Or just stick with other people's code for the handful of sprites that I will need alpha-blitted.

-cb

;This is a test of advanced blits using system memory
;cbmeeks


Const SCR_WIDTH = 640
Const SCR_HEIGHT = 480
Global Timer, FPS_Real, FPS_Temp,FPS


;screen array
Dim Screen(SCR_WIDTH,SCR_HEIGHT)

;graphics
Graphics SCR_WIDTH,SCR_HEIGHT,16,1
SetBuffer BackBuffer()

;clear arrays
Clear(0,128,0)


;main loop
Repeat

	;copy arrays to backbuffer
	LockBuffer
	For y=0 To SCR_HEIGHT-1
		For x=0 To SCR_WIDTH-1
			WritePixelFast x,y,Screen(x,y)
		Next
	Next
	UnlockBuffer
	
	DisplayFPS(0,0)
	
	Flip
	
Until KeyDown(1)



Function Clear(r,g,b)
	Local x,y
	For y=0 To SCR_HEIGHT-1
		For x=0 To SCR_WIDTH-1
			Screen(x,y)=GetRGB(r,g,b)
		Next
	Next
End Function


Function GetRGB% ( Red% , Green% , Blue% )  ; Combines Red, Green and Blue values into one RGB value
	Return Red Shl 16 + Green Shl 8 + Blue
End Function


Function DisplayFPS(x#,y#)
	Color 255,255,255
	If Timer + 1000 <= MilliSecs() Timer = MilliSecs() : FPS_Real = FPS_Temp : FPS_Temp = 0
	FPS_Temp = FPS_Temp + 1 : Text x#,y#,"FPS: " + FPS_Real
End Function




Bremer(Posted 2003) [#7]
If you are only doing a few sprites here and there, you don't have to WritePixelFast (wpf) the entire screen. Just put the color values of the sprites into arrays and then only read the pixels that are in the actual location you are drawing to. If you are drawing a 32x32 pixel sprite then that is only 1024 ReadPixelFast (rpf) calls plus another 1024 wpf calls to write it back. If you have some kind of masking color you might even avoid a bunch of those as well with an if-check. A fullscreen pass as in the code above is 307,200 rpf+wpf's, which of course is somewhat on the slow side. But you can have plenty of smaller sprites moving around no problem.

You could do something like this:

; transp# is the sprite's opacity: 0.0 = invisible, 1.0 = fully opaque.
; It has to be a float, otherwise 1-transp can only ever be 0 or 1.
Function drawTransp(x,y,transp#)
transpn# = 1.0-transp
LockBuffer BackBuffer()
For yy=0 To 39
	For xx=0 To 39
		; skip off-screen pixels before reading or writing the locked buffer
		If x+xx > -1 And x+xx < 640 And y+yy > -1 And y+yy < 480 Then
		mask = spritemaskdata(xx,yy)
		If mask <>$ff0000 Then
		rgbs = spritedata(xx,yy)
		rgbd = ReadPixelFast(x+xx,y+yy,BackBuffer())
		; blend each channel: source * transp + destination * (1 - transp)
		r = (((rgbs Shr 16) And $ff) * transp) + (((rgbd Shr 16) And $ff) * transpn)
		g = (((rgbs Shr 8) And $ff) * transp) + (((rgbd Shr 8) And $ff) * transpn)
		b = ((rgbs And $ff) * transp) + ((rgbd And $ff) * transpn)
		If r > 255 Then r = 255
		If g > 255 Then g = 255
		If b > 255 Then b = 255
		If r < 0 Then r = 0
		If g < 0 Then g = 0
		If b < 0 Then b = 0
		rgbf = r Shl 16 + g Shl 8 + b
		WritePixelFast x+xx,y+yy,rgbf
		End If
		End If
	Next
Next 
UnlockBuffer BackBuffer() 
End Function


And before you use it put the rgb values of the sprite and mask into the arrays set up for it.
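
As a rough sketch of that setup step, the arrays could be filled once at load time by reading every pixel of the sprite and mask images. The function name LoadSpriteArrays, the blank placeholder images and the fixed 40x40 size are assumptions made for illustration; they are not from the post above. In 16-bit modes the colours read back may be rounded, so an exact compare against $ff0000 may need adjusting.

```blitzbasic
; Sketch only: LoadSpriteArrays and the 40x40 placeholder images are assumptions.
Dim spritedata(39,39)
Dim spritemaskdata(39,39)

Graphics 640,480,0,2
sprite = CreateImage(40,40)	; stand-in for a loaded sprite image
mask = CreateImage(40,40)	; stand-in for a loaded mask image
LoadSpriteArrays(sprite,mask)
End

; Reads each image once so the draw routine never has to read the sprite again.
Function LoadSpriteArrays(sprite,mask)
	Local xx,yy
	LockBuffer ImageBuffer(sprite)
	For yy=0 To 39
		For xx=0 To 39
			; ReadPixelFast returns ARGB, so strip the top byte to keep plain RGB
			spritedata(xx,yy) = ReadPixelFast(xx,yy,ImageBuffer(sprite)) And $ffffff
		Next
	Next
	UnlockBuffer ImageBuffer(sprite)
	LockBuffer ImageBuffer(mask)
	For yy=0 To 39
		For xx=0 To 39
			spritemaskdata(xx,yy) = ReadPixelFast(xx,yy,ImageBuffer(mask)) And $ffffff
		Next
	Next
	UnlockBuffer ImageBuffer(mask)
End Function
```

The slow reads then happen only once per sprite, and the per-frame cost is just the background read and the write.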


Phil Newton(Posted 2003) [#8]
If this is B+, try using the new lockedpixel etc commands.

I'll try and knock up an example to see if it's faster.


Imphenzia(Posted 2003) [#9]
I just tried to poke the R G B data into three different memory banks, then I peeked the values out of the banks and used WritePixelFast to blit them with alpha alteration to the screen. This worked quite well.

I achieved around 30,000-35,000 alpha shaded pixels plus the drawing of the background in 640x480x32 per frame at 85Hz. That's around 2.7 million alpha pixels per second running on a 2.6GHz P4 with Radeon 9700 Pro.

It could probably be tweaked however.

Downsides:
> Uses a lot of memory
> only allows alpha shading of background and not sprites on top.
> takes time to create the memory banks
> probably more downsides too =)
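
A minimal sketch of the bank idea described above, under some assumptions: one byte per pixel in each of three banks, a gradient standing in for the real background, and a fixed 50% blend with white. The bank layout and all names here are hypothetical, not the code that produced the figures quoted.

```blitzbasic
; Sketch only: bank layout, names and the fixed 50% blend are assumptions.
Const W = 640
Const H = 480
Graphics W,H,32,2
SetBuffer BackBuffer()

; one byte per pixel in each bank, all held in system memory
bankR = CreateBank(W*H)
bankG = CreateBank(W*H)
bankB = CreateBank(W*H)

; fill the banks with a gradient standing in for the background
For y=0 To H-1
	For x=0 To W-1
		i = y*W+x
		PokeByte bankR,i,x*255/(W-1)
		PokeByte bankG,i,y*255/(H-1)
		PokeByte bankB,i,128
	Next
Next

; blend a white 100x100 square over the stored background at 50%
LockBuffer BackBuffer()
For y=100 To 199
	For x=100 To 199
		i = y*W+x
		; the background comes from the banks, so no ReadPixelFast is needed
		r = (PeekByte(bankR,i)+255) Shr 1
		g = (PeekByte(bankG,i)+255) Shr 1
		b = (PeekByte(bankB,i)+255) Shr 1
		WritePixelFast x,y,(r Shl 16) Or (g Shl 8) Or b
	Next
Next
UnlockBuffer BackBuffer()
Flip
WaitKey
End
```

This also shows where the listed downsides come from: three bytes of bank per pixel, and the banks only know about the background, not about sprites drawn afterwards.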


cbmeeks(Posted 2003) [#10]

If this is B+, try using the new lockedpixel etc commands.



I saw that command but don't quite understand it. What's it for?

cb


AndyBoy_UK(Posted 2003) [#11]
Does that extended library (or any other for that matter) allow me to have a 2d image and then apply an alpha mask to it (so some parts of the image are more transparent than other sections) - for 2d shadow effects, etc?

Cheers,
A


cbmeeks(Posted 2003) [#12]
I don't think it does but I could be wrong.

Oh, and I found out what the lockedpixels do. In fact, I converted it to use lockedpixels and it almost doubled in speed! However, still too slow for full screen.

But, I think I will write a version for drawing sprites.

I really want to use it to create a "cloud" layer over my maps. That is going to be difficult.

-cb

;This is a test of advanced blits using system memory
;cbmeeks


Const SCR_WIDTH = 640
Const SCR_HEIGHT = 480
Global Timer, FPS_Real, FPS_Temp,FPS
Global bank
Const FORMAT_RGB565=1
Const FORMAT_XRGB1555=2
Const FORMAT_RGB888=3
Const FORMAT_XRGB8888=4



;graphics
Graphics SCR_WIDTH,SCR_HEIGHT,16,1
SetBuffer BackBuffer()


;timer

;main loop
Repeat


	LockBuffer
	bank = LockedPixels()

	For y=0 To SCR_HEIGHT-1
		offset=y*LockedPitch()
	
		Select LockedFormat()
		Case FORMAT_RGB565
			For x=0 To 319
				PokeInt bank,offset+x*4,$f800f800
			Next
		Case FORMAT_XRGB1555
			For x=0 To 319
				PokeInt bank,offset+x*4,$7c007c00
			Next
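		Case FORMAT_RGB888
			; 24-bit pixels are 3 bytes each, so step by 3 and write bytes;
			; the blue, green, red byte order here is assumed
			For x=0 To 639
				PokeByte bank,offset+x*3,$00
				PokeByte bank,offset+x*3+1,$00
				PokeByte bank,offset+x*3+2,$ff
			Next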
		Case FORMAT_XRGB8888
			For x=0 To 639
				PokeInt bank,offset+x*4,$00ff0000
			Next
		End Select
	Next

	UnlockBuffer

	DisplayFPS(0,0)
	
	Flip
	
Until KeyDown(1)



Function GetRGB% ( Red% , Green% , Blue% )  ; Combines Red, Green and Blue values into one RGB value
	Return Red Shl 16 + Green Shl 8 + Blue
End Function


Function DisplayFPS(x#,y#)
	Color 255,255,255
	If Timer + 1000 <= MilliSecs() Timer = MilliSecs() : FPS_Real = FPS_Temp : FPS_Temp = 0
	FPS_Temp = FPS_Temp + 1 : Text x#,y#,"FPS: " + FPS_Real
End Function



Imphenzia(Posted 2003) [#13]
cb, what FPS do you get on your machine? Your first example runs at 10FPS on mine, the second one at 4FPS??
(P4 2.6GHz, Radeon 9700)

I suspect it may have something to do with the version of B+? I'm using a quite old one, 1.34, that I received for testing CTCC in.


Who was John Galt?(Posted 2003) [#14]
Andyboy - the dlls in that link do indeed allow you to have alpha maps. Download it and take a look at the demos.


cbmeeks(Posted 2003) [#15]
[reply]
cb, what FPS do you get on your machine? Your first example runs at 10FPS on mine, the second one at 4FPS??
(P4 2.6GHz, Radeon 9700)
[/reply]

What?? If anything the second should be faster...

Using VSYNC, I get 47 fps on the first and 95 on the second. When I turn VSYNC off, I get over 200 on the second one.

When I do a:

repeat
    cls
    flip false
until keydown(1)


I get almost 1000 fps!

Clearly, plotting the screen pixel by pixel is much slower. Normally, you would never do this anyway. Much faster to copy rows of pixels or lines.

cb


Imphenzia(Posted 2003) [#16]
cbmeeks >> I am extremely interested in where things are going wrong on my machine! I can't get any performance out of B+ and writepixelfast at all. What version are you using??

Edit:
Hmm, I switched off debug mode and it went from 4FPS to 550 FPS ?!? I didn't know the debug mode made THAT much difference??


cbmeeks(Posted 2003) [#17]
I am using 1.37 the newest.

Hey, try this code out. I am curious what FPS you get.

Global Timer, FPS_Real, FPS_Temp,FPS

Graphics 640,480,16,1
SetBuffer BackBuffer()

Repeat
	Cls
	DisplayFPS(0,0)
	Flip False
Until KeyDown(1)

Function DisplayFPS(x#,y#)
	Color 255,255,255
	If Timer + 1000 <= MilliSecs() Timer = MilliSecs() : FPS_Real = FPS_Temp : FPS_Temp = 0
	FPS_Temp = FPS_Temp + 1 : Text x#,y#,"FPS: " + FPS_Real
End Function


I get about 1000 FPS on my P3 800 laptop.

cb


Imphenzia(Posted 2003) [#18]
3820 FPS in BlitzPlus
5200 FPS in Blitz2D


Paradox7(Posted 2003) [#19]
8070 FPS in Blitz2D

Athlon 2000+

Which is a 1.67 GHz

... although maybe a pointless test :P


Paradox7(Posted 2003) [#20]
I can't see why Blitz can't have fast enough methods to at least do alpha at a good speed; you shouldn't need a 3D card for that.

I know the main killer is ReadPixelFast. I've done many tests with alpha, and the fewer ReadPixelFasts you do, the faster it runs, no matter if you have double the WritePixelFasts, so the wpf is really fast, but the rpf is really slow :( But as stated, reading from video memory is slow. If only we could create a buffer in system memory that we could do all our drawing commands to just as easily as we do the backbuffer

maybe instead of SetBuffer Backbuffer()
it can be, SetBuffer SysMemoryBuffer()

but after that, you would use it exactly the same as you would any other buffer: DrawImage to it, WritePixel, whatnot, and the Flip would still flip that SysMemoryBuffer to the front screen. Basically nothing would change; all the changes would be internal ones that Mark would have to set up, so that everything draws to system memory instead of video memory.

The only thing is I don't know how fast it would be for the FLIP to take the image in System Memory, and Put it in The FrontBuffer()

If its fast, then all would be great! Then you could do really fast readpixels from system memory, do alphas, do all kinda magical pixel stuff now its all in sys memory :D

Although if the flip from System Memory to Front Buffer is slow, then its pointless :(


MSW(Posted 2003) [#21]
Whoah...hang on guys, think this through....the SNES was only capable of displaying something like 64 colors at one time...meaning it's a palette-based graphics system with each pixel being only 4 bits in length (the GPU then uses the current palette as a look-up table to define the color visible on screen)

You aren't seeing true alpha blitting of sprites and such...it's a simple hack-like trick of setting up the color palette in such a way as to use logical operators (OR, AND, XOR) when blitting the sprite to the background...say the background pixel is 0010 (or palette index 2), then you use the OR operator with a sprite pixel value of 0001...the result (0010 combined with 0001 using OR) is 0011 (value of 3). Say the color of palette index 1 (the sprite) is bright red and index 2 (the background) is black...if you had made the color of palette index 3 a dark red (3 is the result of the OR operation on both background and sprite) then it would seem that the sprite was alpha blitted onto the background.
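
The OR example above can be sketched in a few lines of Blitz. The palette array and its three colour values are made up to match the example (bright red, black, dark red); nothing here is taken from an actual SNES palette.

```blitzbasic
; Sketch only: the palette contents are made-up values following the example above.
Dim palette(15)
palette(1) = $ff0000	; sprite colour: bright red
palette(2) = $000000	; background colour: black
palette(3) = $800000	; index 3 = 1 Or 2: dark red, the "blended" colour

background = %0010	; palette index 2
sprite = %0001		; palette index 1
result = background Or sprite	; bits combine to %0011

Print "Combined palette index: " + result
Print "Colour shown: $" + Right$(Hex$(palette(result)),6)
WaitKey
End
```

No blending arithmetic happens per pixel; the "transparency" is entirely in how palette entry 3 was chosen.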

When you get into "true" color modes (16-bit, 24-bit, 32-bit, etc.) these sorts of palette tricks don't work the same way...however you can do a "virtual" palette type thing by creating an array to hold the palette values and doing all your blitting in software...then instead of transferring the software-blitted pixels directly, you use the value of each software pixel as an index into the palette array, and write that color value to the screen buffer...this can increase the speed of your software blitting because you don't need to have each pixel be a direct 16-bit or 32-bit color value.
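
A rough sketch of that "virtual palette" idea, assuming a 256-entry greyscale palette and a 320x200 index buffer filled with a test pattern. The names palette and indexbuf, the sizes and the pattern are all assumptions for illustration.

```blitzbasic
; Sketch only: the index buffer, palette values and names are assumptions.
Const W = 320
Const H = 200
Dim palette(255)
Dim indexbuf(W-1,H-1)

Graphics 640,480,0,2
SetBuffer BackBuffer()

; build a greyscale palette of full RGB values
For i=0 To 255
	palette(i) = (i Shl 16) Or (i Shl 8) Or i
Next

; "software blit": all drawing works on small palette indices in system memory
For y=0 To H-1
	For x=0 To W-1
		indexbuf(x,y) = (x Xor y) And 255
	Next
Next

; transfer: look each index up in the palette and write the real colour
LockBuffer BackBuffer()
For y=0 To H-1
	For x=0 To W-1
		WritePixelFast x,y,palette(indexbuf(x,y))
	Next
Next
UnlockBuffer BackBuffer()
Flip
WaitKey
End
```

Changing entries in palette() and repeating only the transfer loop would recolour the whole picture, which is roughly how palette rotation effects could be imitated.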


Paradox7(Posted 2003) [#22]
Yeah, I do miss OR, AND, XOR, including cool stuff like palette rotations, etc. If only Blitz could do 8-bit, load palettes, and have OR, AND, XOR image commands. That alone would speed up graphics that don't require 16-bit colors.

But, as is, would a system like I described above work? Would flipping from a system memory buffer to the front buffer be fast enough? Because if so, doing all graphics in system memory with pixel commands would be much faster.


MSW(Posted 2003) [#23]
Depends on a number of factors...color depth and image resolution being the keys. An 800 by 600 32-bit color buffer uses approx 1.83MB per full update of the frame, a whole lot of data to transfer from system to video memory...trying to do that at 60 frames per second requires about 110MB of data to transfer from system to video memory each second...not really possible even on modern hardware since the system to video memory pathway lacks the bandwidth to do this...

However a 320 by 200 16-bit image transferred 60 times a second requires something like 7MB of bandwidth per second, which is more reasonable.
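
The arithmetic behind those figures is width x height x bytes per pixel x frames per second: 800 x 600 x 4 x 60 = 115,200,000 bytes per second (about 110MB), and 320 x 200 x 2 x 60 = 7,680,000 bytes per second (a little over 7MB). A small sketch to try other modes, where BytesPerSecond is a hypothetical helper name:

```blitzbasic
; Sketch only: BytesPerSecond is a hypothetical helper, not a Blitz command.
Function BytesPerSecond(width,height,bits,fps)
	; bytes in one frame multiplied by frames per second
	Return width*height*(bits/8)*fps
End Function

; divide by 1048576.0 to show the result in megabytes
Print "800x600x32 at 60 fps: " + BytesPerSecond(800,600,32,60)/1048576.0 + " MB per second"
Print "320x200x16 at 60 fps: " + BytesPerSecond(320,200,16,60)/1048576.0 + " MB per second"
WaitKey
End
```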


MSW(Posted 2003) [#24]
Erm...what about createimage()?

Instead of using a "true" system memory buffer...use a system memory "image" that you draw to, then when finished use the Blitz CopyRect to transfer it to the video buffer?
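
A minimal sketch of that suggestion, assuming a 640x480 work image that is redrawn and copied every frame. Whether Blitz actually keeps the image in system memory is not confirmed here, and the variable name work is an assumption.

```blitzbasic
; Sketch only: whether this image lives in system memory is an assumption.
Graphics 640,480,16,1
work = CreateImage(640,480)	; off-screen image used as the drawing surface

Repeat
	; do all drawing on the image instead of the backbuffer
	SetBuffer ImageBuffer(work)
	Cls
	Color 255,255,255
	Rect 100,100,50,50,1

	; copy the finished frame to the backbuffer in one call, then show it
	SetBuffer BackBuffer()
	CopyRect 0,0,640,480,0,0,ImageBuffer(work),BackBuffer()
	Flip
Until KeyDown(1)
End
```

Timing this loop with the DisplayFPS function from earlier in the thread would show whether the single CopyRect per frame is cheap enough to be worth it.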


Who was John Galt?(Posted 2003) [#25]
Just read that BlitzGL can also do 2D alpha sprites....