fastest HDR Filter

jfk EO-11110(Posted 2005) [#1]
Hi. I wonder what is the fastest HDR (aka Bloom aka Gloom) Filter emulation for Blitz3D?

Currently I am using a quad that covers the screen in blend mode 3, carrying a blurred 128-texel texture that was made using RenderWorld with a small CameraViewport.

Rendering takes several milliseconds; CopyRect is pretty quick, somewhere around 1 millisecond. Blurring takes a few milliseconds too, depending on the number of passes. I tried hard to optimize the blurring, but it still takes too much time IMHO.

Currently I do something like:

rgb(i,j)=((rgb(i-1,j) and $FEFEFF)shr 1) + ((rgb(i,j) and $FEFEFF)shr 1)
This will mix 2 pixels 50:50. Since it doesn't need to split RGB, it's pretty quick compared to some other blur methods. I do this in 4 directions, and the result is nicely blurred.
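The packed 50:50 mix above can be checked outside Blitz3D. The following is a minimal Python sketch (not Blitz3D code; the function names `mix_half` and `mix_exact` are made up for illustration) that compares the mask-and-shift trick against a per-channel average.

```python
def mix_half(a: int, b: int) -> int:
    """Average two packed 0xRRGGBB pixels without unpacking the channels."""
    # Clearing each channel's lowest bit keeps it from sliding into the
    # neighbouring channel when the whole word is shifted right by one.
    mask = 0xFEFEFE
    return ((a & mask) >> 1) + ((b & mask) >> 1)


def mix_exact(a: int, b: int) -> int:
    """Reference version: split into channels, average each, repack."""
    out = 0
    for shift in (16, 8, 0):
        ca = (a >> shift) & 0xFF
        cb = (b >> shift) & 0xFF
        out |= ((ca + cb) // 2) << shift
    return out


print(hex(mix_half(0xFF8000, 0x0080FF)))   # 0x7f807f
print(hex(mix_exact(0xFF8000, 0x0080FF)))  # 0x7f807f
print(hex(mix_half(0xFFFFFF, 0xFFFFFF)))   # 0xfefefe, one step darker per channel
```

The packed version can come out one step lower per channel than the exact average (when both inputs have the low bit set), so many passes darken the image very slightly. For 24-bit input the mask $FEFEFF gives the same result as $FEFEFE, because the lowest blue bit is shifted out anyway.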

But maybe there are faster methods?

I'd also like to know how to fix the flickering that is caused by this method.
BTW, I cannot use more blur passes and a bigger texture; it would be too slow. This is not for a fancy FX demo with a couple of tris in a scene, running on a 3 GHz machine, but for the average PC running a full game level.


Sledge(Posted 2005) [#2]
Faster? Don't blur the image manually; render a smaller texture. Maybe additionally use, say, three dimmer quads about the origin of the one you're currently using.

The method you cite shouldn't flicker - quad too close to the camera?

EDIT: When you say "blur" do you mean "mix frames"? I'd probably be sending the last - rather than current - frame to the quad and that would be that. Extra frames would equal extra quads... I wouldn't do pixel operations on the textures/images.


jfk EO-11110(Posted 2005) [#3]
Thanks. Rendering a smaller texture will make it flicker more. The flickering is caused by the lack of antialiasing, so the mini-render is never fully accurate, even when blurred. Contours of objects will jump, and so will the blurred "smooth" areas.

I was also thinking about using more dimmer quads, but since I would need to alpha them, the renderer would also have to go through a number of alpha objects, and that IS slow, especially when they cover the entire screen.

Ok, now I had this idea:

1) Render the scene as usual, without the gloom quad.
2) Use some tricks to create a mini-render without actually rendering the scene.
3) Use this mini-render as a texture on a quad.
4) Render the quad far away from the scene with an extra camera, but (and that's the point) without erasing the background first. I'm not sure right now if blend mode 3 will also add to 2D background pixels; I will check this out.


Sledge(Posted 2005) [#4]

jfk EO-11110 wrote: "but (and that's the point) without erasing the background first"



A mid-alpha background quad, with the correct camera cls mode, will give a very interesting effect - I use this as a cheap fullscreen blur (which occurs behind all rendered objects) and I suppose you could build a smaller render up into a bloom (to be plastered on top of the scene) in a similar fashion. The only drawback that springs to mind is that delta-timing/frame-skipping tends to make it look like a horrifically obvious series of persistent images rather than the smoothly blended ideal.


jfk EO-11110 wrote: "The flickering is caused by the lack of antialiasing"


That makes sense - I had not considered high-contrast scenes... a considered palette can hide a multitude of sins. Perhaps EntityColor or even SetGamma could even out the lighting on that render.


sswift(Posted 2005) [#5]
A good method of blurring is to use a variable size window and step across the image vertically and horizontally, adding each new pixel that enters the window (which is only one pixel tall when doing the horizontal blur) and subtracting the one that just exited. You need variable size, because the window will be less than whatever your window size is when you are at the edge of the image.

Doing this requires only one add, subtract, and divide per pixel, and two passes, one for each direction, for a box blur.
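As a rough illustration of the sliding-window idea, here is a Python sketch (not Blitz3D; `box_blur_row` and `box_blur` are hypothetical names, and the image is a plain list of rows of grey values rather than a Blitz buffer). The window is `2*radius+1` pixels wide and shrinks at the image edges, as described above.

```python
def box_blur_row(row, radius):
    """Box-blur one row with a running sum; the window shrinks at the edges."""
    n = len(row)
    out = [0.0] * n
    # Prime the window for x = 0: it covers indices 0..radius, clipped to the row.
    hi = min(radius, n - 1)
    total = sum(row[:hi + 1])
    count = hi + 1
    for x in range(n):
        out[x] = total / count          # one divide per pixel
        enter = x + 1 + radius          # pixel entering the window for x + 1
        leave = x - radius              # pixel leaving the window for x + 1
        if enter < n:
            total += row[enter]         # one add
            count += 1
        if leave >= 0:
            total -= row[leave]         # one subtract
            count -= 1
    return out


def box_blur(image, radius):
    """Two passes: blur every row, then every column of the result."""
    rows = [box_blur_row(r, radius) for r in image]
    cols = [box_blur_row(list(c), radius) for c in zip(*rows)]
    return [list(r) for r in zip(*cols)]


print(box_blur_row([0, 0, 9, 0, 0], 1))  # [0.0, 3.0, 3.0, 3.0, 0.0]
```

The cost per pixel does not depend on the radius, which is the point of the method: a wide blur costs the same as a narrow one.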

For a Gaussian blur, you need three passes in each direction.
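To see why repeated box passes approach a Gaussian, one can blur a single bright pixel three times and look at the resulting profile. This is a small Python sketch under that assumption (names `box_pass` and `approx_gaussian` are illustrative; the box pass is written the slow, obvious way to keep the example short).

```python
def box_pass(row, radius):
    """One plain box-blur pass over a row; the window is clipped at the edges."""
    n = len(row)
    out = []
    for x in range(n):
        lo, hi = max(0, x - radius), min(n - 1, x + radius)
        window = row[lo:hi + 1]
        out.append(sum(window) / len(window))
    return out


def approx_gaussian(row, radius, passes=3):
    """Apply the box pass several times; three passes give a bell-like kernel."""
    for _ in range(passes):
        row = box_pass(row, radius)
    return row


impulse = [0.0] * 13
impulse[6] = 1.0
profile = approx_gaussian(impulse, 1)
# Weights around the centre, scaled by 27: 1, 3, 6, 7, 6, 3, 1
print([round(v * 27) for v in profile[3:10]])
```

One pass gives a flat kernel, two give a triangle, and the third rounds it into the bell shape. For a 2D image the same three passes would be run horizontally and then vertically.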

However, blurring in software is going to be slow no matter what. You really should do it in hardware. And for that, I have a function in the code archives which does a hardware blur. Just click my username and then code archive entries to find it. It's not perfect, but it might be adequate for this.

Knowing now though how to do a box blur and a gaussian blur with that window method, I wonder if I could have made a proper hardware blur. Hm... I'll have to think about that.


sswift(Posted 2005) [#6]
Hm... yes, it looks like you could do a real gaussian blur in hardware using a similar technique to the flawed one I used. You would have to choose window sizes which are powers of 2 though. So you could do a 2 pixel box/cubic/gaussian, or a 4 or an 8... But not a 5 or 6. The greater the window size however, the more color accuracy you will lose. I am not sure if this would affect the quality of the resulting blur or not.


jfk EO-11110(Posted 2005) [#7]
Sledge, thanks. There are no delta timing issues since I have no tween frames; the gloom texture is taken every frame. I have tested the method I described and it really works, although it is, all in all, about 4% slower than the current method.

sswift - I'm not exactly sure if you are talking about the Blitz window size - there's no way for me to use a square window size since the game is running in fullscreen and usually in a 4:3 ratio. I tried to find a blur function in your code archive entries, unfortunately without success. EDIT: just found it, blurring a texture. I will have to test whether this is faster. It could easily be faster, even with the overhead (or what looks like it), because my current blur function takes 7ms for one pass on my machine (additional passes only take about 1ms each, since the ReadPixelFast and WritePixelFast work is done only once, no matter how many passes).

Ok, here are the two methods that are neck and neck right now in terms of speed. If anybody can speed them up by a relevant amount, do it!

The first method works like this:
It takes a small render with the gloom quad hidden, 128*96 pixels. This is then blurred and copied to the texture that is used by the gloom quad. Then the gloom quad is unhidden and a full-resolution game render is performed. Blend mode 3 on the gloom quad, which is stuck to the camera in front of everything else, will now produce the gloom effect.
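What the gloom quad does to each screen pixel can be sketched in a few lines of Python (not Blitz3D). This assumes blend mode 3 behaves as plain additive blending, with the quad's EntityAlpha scaling the added amount and the result saturating at 255; that is an assumption about the renderer, and `add_blend` is a made-up name.

```python
def add_blend(scene, glow, alpha=0.95):
    """Assumed additive blend: out = min(255, scene + alpha * glow) per channel."""
    return tuple(min(255, int(s + alpha * g)) for s, g in zip(scene, glow))


# A dark part of the blurred texture adds nothing to the scene pixel.
print(add_blend((100, 100, 100), (0, 0, 0)))        # (100, 100, 100)
# A bright part pushes the pixel towards white and clamps there.
print(add_blend((200, 200, 200), (200, 200, 200)))  # (255, 255, 255)
```

Under that assumption only the bright areas of the blurred mini-render leak light onto their surroundings, which is what gives the glow.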

In a Scene with 500 cubes onscreen I get about 9ms per frame with no gloom and about 24ms with the gloom turned on (hit space to toggle).
Graphics3D 1024,768,32,1
SetBuffer BackBuffer()

camera=CreateCamera()

light=CreateLight()
RotateEntity light,45,45,45
TranslateEntity camera,0,0,-45
CameraRange camera,.001,100

anz=500
Dim s(anz)
For i=0 To anz
 s(i)=CreateCube()
 PositionEntity s(i),Rnd(-20,20),Rnd(-20,20),Rnd(-20,20)
 EntityColor s(i),Rand(255),Rand(255),Rand(255)
Next

gloo_w#=128
gloo_h#=gloo_w#*(Float(GraphicsHeight())/Float(GraphicsWidth())); 0.75
Dim proc_array(gloo_w#,gloo_h#)
tex=CreateTexture(gloo_w#,gloo_w#,256); Or 16 Or 32)
quad=CreateQuad()
PositionEntity quad,EntityX(camera,1),EntityY(camera,1),EntityZ(camera,1),1
EntityTexture quad,tex
EntityFX quad,1
EntityBlend quad,3
TranslateEntity quad,-(1.0/1024.0),(1.0/1024.0),1.0
;EntityColor quad,0,255,0


gloo_wrel#=GraphicsWidth()/gloo_w
gloo_hrel#=GraphicsHeight()/gloo_h

; ************************************ MAIN ***********************************
While KeyDown(1)=0

 For i=0 To anz
  TurnEntity s(i),1,2,3
 Next


 If KeyHit(57)
 do_gloom=do_gloom +1
 If do_gloom>1 Then do_gloom=0

  If do_gloom=0 Then 
   HideEntity quad
  EndIf

  If do_gloom=1 Then 
   ShowEntity quad
   EntityAlpha quad,0.95 ;1.0
  EndIf
 EndIf

 If do_gloom=1
  CameraViewport camera,0,0,gloo_w#,gloo_h#
  HideEntity quad
  RenderWorld()
  quick_blur(gloo_w#,gloo_h#,2)
  ShowEntity quad
  CopyRect 0,0,gloo_w#,gloo_h#,0,(gloo_w#-gloo_h#)/2,BackBuffer(),TextureBuffer(tex)
  CameraViewport camera,0,0,GraphicsWidth(),GraphicsHeight()
 EndIf


 RenderWorld()
 old_tt=tt
 tt=MilliSecs()
 Text 0,0,tt-old_tt
 Text 0,16,do_gloom
 Flip 0
Wend

End


Function quick_blur(w,h,passes=1)
 If passes<1 Then passes=1
 passes=passes-1
 SetBuffer BackBuffer()
 LockBuffer(BackBuffer())
 For j=0 To h-1
  For i=0 To w-1
   proc_array(i,j)=ReadPixelFast(i,j) ;And $FFFFFF
  Next
 Next
 For dbass=0 To passes
  For j=0 To h-1
   For i=1 To w-1
    proc_array(i,j)=((proc_array(i,j)And $FEFEFF)Shr 1)+((proc_array(i-1,j)And $FEFEFF)Shr 1)
   Next
   For i=w-2 To 0 Step -1
    proc_array(i,j)=((proc_array(i,j)And $FEFEFE)Shr 1)+((proc_array(i+1,j)And $FEFEFE)Shr 1)
   Next
  Next
  For i=0 To w-1
   For j=1 To h-1
    proc_array(i,j)=((proc_array(i,j)And $FEFEFF)Shr 1)+((proc_array(i,j-1)And $FEFEFF)Shr 1)
   Next
   For j=h-2 To 0 Step -1
    proc_array(i,j)=((proc_array(i,j)And $FEFEFE)Shr 1)+((proc_array(i,j+1)And $FEFEFE)Shr 1)
   Next
  Next
 Next
 
 For j=0 To h-1
  For i=0 To w-1
   WritePixelFast i,j,proc_array(i,j)
  Next
 Next
 UnlockBuffer(BackBuffer())
End Function


Function CreateQuad(parent=0)
  ; creates a quad, facing to the right side
  mesh=CreateMesh(parent)
  surf=CreateSurface(mesh)
  v0=AddVertex(surf, -1.0,   1.0,0, 0,0 )
  v1=AddVertex(surf,  1.0,   1.0,0, 1,0 )
  v2=AddVertex(surf,  1.0,  -1.0,0, 1,1 )
  v3=AddVertex(surf, -1.0,  -1.0,0, 0,1 )
  AddTriangle(surf,v0,v1,v2)
  AddTriangle(surf,v0,v2,v3)
  UpdateNormals mesh
  Return mesh
End Function



Now method 2, which I described earlier, actually works like this:
Render the scene in full game resolution, as usual. Then save a copy of the backbuffer in an imagebuffer. Then use a special trick to quickly scale the backbuffer down to 128*96 pixels. This trick uses CopyRect. It's fast, but not very accurate. However, after blurring, the result is about the same as with the small extra render of method 1. Ok, then blur these 128*96 pixels on the backbuffer and move them to the texture that is used by the gloom quad. Then restore the backbuffer from the imagebuffer saved before. Now hide the main game camera and unhide a special gloom camera that is located far away from any geometry of the scene. The gloom quad is also located there, aligned to the gloom camera. The gloom camera uses a cls mode that won't clear the background, so we still have the last full render as the background. The gloom cam will now render only one quad, on top of the old background.
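The CopyRect scaling trick amounts to keeping every Nth column and then every Nth row, squeezed into the top-left corner of the same buffer. Here is a simplified Python model of that idea (not Blitz3D; the buffer is a list of rows, `decimate_in_place` is a hypothetical name, and the source position is computed as `int(i * step)` rather than accumulated in a float as in the listing).

```python
def decimate_in_place(buf, small_w, small_h):
    """Shrink buf into its own top-left corner by copying columns, then rows."""
    full_h = len(buf)
    full_w = len(buf[0])
    step_x = full_w / small_w
    step_y = full_h / small_h
    # Pass 1: destination column i receives source column int(i * step_x).
    # The source is never left of the destination, so nothing is read after
    # it has been overwritten.
    for i in range(small_w):
        src = int(i * step_x)
        for y in range(full_h):
            buf[y][i] = buf[y][src]
    # Pass 2: the same for rows; only the first small_w columns matter now.
    for j in range(small_h):
        src = int(j * step_y)
        buf[j][:small_w] = buf[src][:small_w]
    return [row[:small_w] for row in buf[:small_h]]


screen = [[y * 8 + x for x in range(8)] for y in range(8)]
small = decimate_in_place(screen, 4, 4)
print(small[1])  # [16, 18, 20, 22]: every second pixel of source row 2
```

No pixels are averaged here; each output pixel is simply one of the input pixels, which is why the method is fast but not very accurate.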

This method is slightly slower than the first one, but it has some potential, because one of the two renders has a very easy job (only one quad). On the other hand, both renders are fullscreen renders, whereas the first method uses a 128*96 pixel render for the second one.
My machine takes about 9ms without gloom and 25ms with gloom; as you can see, this is only one millisecond more than the other method.
Graphics3D 1024,768,32,1
SetBuffer BackBuffer()


gloomcam=CreateCamera()
CameraClsMode gloomcam,0,1
CameraRange gloomcam,1,10
PositionEntity gloomcam,30000,0,30000
HideEntity gloomcam

camera=CreateCamera()

light=CreateLight()
RotateEntity light,45,45,45
TranslateEntity camera,0,0,-45
CameraRange camera,.001,100


anz=500
Dim s(anz)
For i=0 To anz
 s(i)=CreateCube()
 PositionEntity s(i),Rnd(-20,20),Rnd(-20,20),Rnd(-20,20)
 EntityColor s(i),Rand(255),Rand(255),Rand(255)
Next




gloo_w#=128
gloo_h#=gloo_w#*(Float(GraphicsHeight())/Float(GraphicsWidth())); 0.75
Dim proc_array(gloo_w#,gloo_h#)
tex=CreateTexture(gloo_w#,gloo_w#,256); Or 16 Or 32)
quad=CreateQuad()
PositionEntity quad,EntityX(gloomcam,1),EntityY(gloomcam,1),EntityZ(gloomcam,1),1
EntityTexture quad,tex
EntityFX quad,1
EntityBlend quad,3
TranslateEntity quad,-(1.0/1024.0),(1.0/1024.0),1.0
EntityParent quad, gloomcam
;EntityColor quad,0,255,0


gloo_wrel#=GraphicsWidth()/gloo_w
gloo_hrel#=GraphicsHeight()/gloo_h

gloombuffer=CreateImage(GraphicsWidth(),GraphicsHeight())

; ************************************ MAIN ***********************************
While KeyDown(1)=0

 For i=0 To anz
  TurnEntity s(i),1,2,3
 Next


 If KeyHit(57)
  do_gloom=do_gloom +1
  If do_gloom>1 Then do_gloom=0

  If do_gloom=0 Then 
   HideEntity quad
   HideEntity gloomcam
  EndIf

  If do_gloom=1 Then 
   ShowEntity quad
   EntityAlpha quad,0.95 ;1.0
  EndIf
 EndIf


 RenderWorld()

 If do_gloom=1
  CopyRect 0,0,GraphicsWidth(),GraphicsHeight(),0,0,BackBuffer(),ImageBuffer(gloombuffer)
  i=0
  ifloat#=0
  For ifloat=0 To GraphicsWidth() Step 0.0001
   CopyRect ifloat,0,1,GraphicsHeight(),i,0,BackBuffer(),BackBuffer()
   ifloat=ifloat+gloo_wrel#
   i=i+1
  Next
  i=0
  ifloat#=0
  For ifloat=0 To GraphicsHeight() Step 0.0001
   CopyRect 0,ifloat,GraphicsWidth(),1,0,i,BackBuffer(),BackBuffer()
   ifloat=ifloat+gloo_hrel#
   i=i+1
  Next
  quick_blur(gloo_w#,gloo_h#,2)
  CopyRect 0,0,gloo_w#,gloo_h#,0,(gloo_w#-gloo_h#)/2,BackBuffer(),TextureBuffer(tex)
  CopyRect 0,0,GraphicsWidth(),GraphicsHeight(),0,0,ImageBuffer(gloombuffer),BackBuffer()

  HideEntity camera
  ShowEntity gloomcam
  RenderWorld()
  ShowEntity camera
  HideEntity gloomcam
 EndIf


 old_tt=tt
 tt=MilliSecs()
 Color 255,255,255
 Text 0,0,tt-old_tt
 Text 0,16,do_gloom
 Flip 0
Wend

End


Function quick_blur(w,h,passes=1,buffer=0)
 If passes<1 Then passes=1
 passes=passes-1
 If buffer=0 Then buffer = BackBuffer()
 SetBuffer Buffer
 LockBuffer(Buffer)
 For j=0 To h-1
  For i=0 To w-1
   proc_array(i,j)=ReadPixelFast(i,j) ;And $FFFFFF
  Next
 Next
 For dbass=0 To passes
  For j=0 To h-1
   For i=1 To w-1
    proc_array(i,j)=((proc_array(i,j)And $FEFEFF)Shr 1)+((proc_array(i-1,j)And $FEFEFF)Shr 1)
   Next
   For i=w-2 To 0 Step -1
    proc_array(i,j)=((proc_array(i,j)And $FEFEFE)Shr 1)+((proc_array(i+1,j)And $FEFEFE)Shr 1)
   Next
  Next
  For i=0 To w-1
   For j=1 To h-1
    proc_array(i,j)=((proc_array(i,j)And $FEFEFF)Shr 1)+((proc_array(i,j-1)And $FEFEFF)Shr 1)
   Next
   For j=h-2 To 0 Step -1
    proc_array(i,j)=((proc_array(i,j)And $FEFEFE)Shr 1)+((proc_array(i,j+1)And $FEFEFE)Shr 1)
   Next
  Next
 Next
 
 For j=0 To h-1
  For i=0 To w-1
   WritePixelFast i,j,proc_array(i,j)
  Next
 Next
 UnlockBuffer(Buffer)
End Function


Function CreateQuad(parent=0)
  ; creates a quad, facing to the right side
  mesh=CreateMesh(parent)
  surf=CreateSurface(mesh)
  v0=AddVertex(surf, -1.0,   1.0,0, 0,0 )
  v1=AddVertex(surf,  1.0,   1.0,0, 1,0 )
  v2=AddVertex(surf,  1.0,  -1.0,0, 1,1 )
  v3=AddVertex(surf, -1.0,  -1.0,0, 0,1 )
  AddTriangle(surf,v0,v1,v2)
  AddTriangle(surf,v0,v2,v3)
  UpdateNormals mesh
  Return mesh
End Function


You might think there is no reason to copy the backbuffer to an imagebuffer, then scale the backbuffer, then restore it from the imagebuffer, instead of simply scaling the imagebuffer.

The reason is that, for some reason, doing the scaling in the backbuffer is much faster than in the imagebuffer. Even with the additional restoring of the full backbuffer it's still faster.

So how could I make this faster? Please note, using a smaller screen resolution or gloom texture is not an option. Also, when using only one blur pass instead of 2, you will see the flicker effect I mentioned.


jfk EO-11110(Posted 2005) [#8]
Well, I have to say, when you increase the number of cubes to a value that brings your graphics card to its knees, you'll see method 2 is nonetheless a lot faster! Simply because a lot of geometry takes time to render, no matter how small the CameraViewport is.


jfk EO-11110(Posted 2005) [#9]
Now I just tried to use the bank hack by Tom. Redirecting a bank handle to an imagebuffer allows access to the pixels via PeekInt etc. This way I was hoping to avoid the time-consuming ReadPixelFast and WritePixelFast calls used in the blur function.
It really works; I was able to access the pixels using PeekInt and PokeInt. Unfortunately the whole process has about the same speed as method 2.

BTW, when you use the bank hack and want to access the pixels via the bank handle, you have to LockBuffer the ImageBuffer of the image that is the target of the redirected bank. If you don't, some data is missing.

Well, now it seems using the bank hack won't speed things up, so I won't post this method.


jfk EO-11110(Posted 2005) [#10]
Ok, using 500 cubes didn't show the differences clearly; here are some more significant statistics using 1500 cubes instead:

method 1:
gloom off: 25ms
gloom on : 57ms

method 2:
gloom off: 25ms
gloom on : 46ms

sswift's hw-accelerated method:
gloom off: 25ms
gloom on : 63ms
(After simply pasting the function into the test app it also didn't scale the texture right; I guess I have to fiddle a bit until it fits the screen.) I think it's still the overhead of creating a camera and having 4 sprites. (Of course I used the lowest blur quality and radius: 1,1.) EDIT: ok, after resetting the texture scale and position to 1.0 the texture fits the screen again. But it seems this blur method is not very useful for the gloom effect, for several reasons.
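For comparison, the cost of the gloom pass alone is the difference between the "on" and "off" frame times. A small Python sketch using only the figures quoted above (the helper name `gloom_overhead` is made up):

```python
def gloom_overhead(off_ms, on_ms):
    """Extra frame time spent on the gloom effect, in milliseconds."""
    return on_ms - off_ms


# (gloom off, gloom on) frame times in ms, 1500 cubes, as posted above.
timings_ms = {
    "method 1": (25, 57),
    "method 2": (25, 46),
    "sswift hw blur": (25, 63),
}

for name, (off, on) in timings_ms.items():
    print(f"{name}: +{gloom_overhead(off, on)} ms, {1000 / on:.1f} fps with gloom")
```

That works out to 32 ms, 21 ms and 38 ms of overhead respectively, so on this machine method 2 spends roughly two thirds of the time method 1 does on the effect itself.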


jfk EO-11110(Posted 2005) [#11]
Now it seems method 2, the fastest so far, causes trouble with patterns like brick walls. Since it does not really scale the image but just uses every Nth pixel, moire interference patterns appear on the gloom texture. A bit like silk on TV. (Smoothing cannot fix this.)
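The effect can be reproduced on a single row of pixels. This Python sketch (not Blitz3D; function names are illustrative) shrinks a fine stripe pattern by a factor of 8, once by taking every 8th pixel and once by averaging each block of 8. Block averaging is shown only as the usual reference for a "real" scale, not as something the test app does.

```python
def point_sample(row, factor, offset=0):
    """Keep every Nth pixel, as the CopyRect trick does."""
    return [row[x] for x in range(offset, len(row), factor)]


def box_average(row, factor):
    """Average each block of N pixels into one output pixel."""
    return [sum(row[x:x + factor]) / factor
            for x in range(0, len(row) - factor + 1, factor)]


# Fine stripes: one bright pixel out of every three.
stripes = [255 if x % 3 == 0 else 0 for x in range(48)]

print(point_sample(stripes, 8))      # [255, 0, 0, 255, 0, 0]: a false coarse pattern
print(point_sample(stripes, 8, 1))   # [0, 255, 0, 0, 255, 0]: it jumps after a 1-pixel shift
print(box_average(stripes, 8))       # values stay between 63.75 and 95.625
```

The point-sampled result swings across the full 0 to 255 range and moves when the scene shifts by one pixel, so a later blur only smears a pattern that is already wrong. The averaged result stays near the true mean brightness of the stripes.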

Unfortunately this wasn't obvious in the test app.
So I need a better way to scale. Message to myself: gotta buy a cell phone, so people don't think I'm a freak when I'm talking to myself :)


jfk EO-11110(Posted 2005) [#12]
Latest news: I have a new method that uses a 3rd camera to scale the backbuffer. It's the ScaleImageFast function that I posted in the archives some time ago. I get about 50ms, which is almost as fast as method 2, but this time with relatively smooth scaling. The only problem is: since the texture that is used to "scale" the image needs to use flag 256 to allow fast CopyRect transfer, mipmapping is turned off. With mipmapping I'd get a much more smoothly scaled image, but it seems I can't get both.

Probably I will use a 256-texel texture, 4 or more blur passes, and then simply tell the player to turn off the HDR FX if he feels the framerate is too low. Well, I'm not happy with this. Then again, my graphics card is worse than what's listed in the category "beginner machine" of some game test magazines. Probably I shouldn't care so much about low-end playability.


Sledge(Posted 2006) [#13]

jfk EO-11110 wrote: "So I need a better way to scale. Message to myself: gotta buy a cell phone, so people don't think I'm a freak when I'm talking to myself :) ... The only problem is: since the texture that is used to "scale" the image needs to use flag 256 to allow fast copyrect transfer, mipmapping is turned off. With Mipmapping I'd get a much more smoothly scaled image, but it seems I can't get both."



Dude it's New Year - people are drunk. I have been throwing some ideas around for this, and how to effect a fast scale seems destined to be a barrier. My thinking is to carry the scene on a texture and scale it by re-rendering only a single quad that carries it, after which recapture.


jfk EO-11110(Posted 2006) [#14]
Hi Sledge. No problem. It looks so silly when a thread is full of my postings only.

I tried that method too. The backbuffer is copied to a texturebuffer that is used by a sprite, then the sprite is scaled as required. It works, and it's pretty fast. But since the texturebuffer needs to use flag 256, the texture isn't mipmapped, and the result is once again the flickering interference I mentioned, due to non-smooth scaling.

On the other hand, the so-called "method 1" (taking an additional mini-render of the scene) will probably give some very decent, non-flickering results, depending on the user's rendering settings. If he has turned on antialiasing and HiQuality, the mini-render should be pretty good. Unfortunately this also means you don't know exactly how it's going to look.


jhocking(Posted 2006) [#15]
HDR isn't the same as bloom.


Bouncer(Posted 2006) [#16]
On my machine, my bloomfilter (the one in my sig, which uses modified SSwift's hw blur) is faster than your method... and I have a laptop with radeon xpress 200m, but a pretty fast processor... so on a decent vidcard it's a lot faster than using software blur.


Pinete(Posted 2006) [#17]
wow!
It seems impressive!
could you share your bloom, Bouncer?

regards!


gosse(Posted 2006) [#18]
Uhh, it's right in his sig, with the source.


Mustang(Posted 2006) [#19]
HDR

High Dynamic Range. This is a lighting procedure designed to emulate the way that light levels in the real world vary over an enormous range. This is mostly achieved by the use of floating point textures and render targets (as well as using the appropriate lighting algorithm); integer formats do not offer anywhere near the same range of values. Although visually better, the use of floating point formats can result in a large performance impact on some graphics adapters.


BLOOM (GLOW)

A bloom is a lenticular halo. Blooms are another type of effect seen around lights. While coronas form the fuzzy glow you see around a light at night, or the rays which seem to shoot out from the light of the sun, blooms are a color-banded halo which is usually visible surrounding the corona. They are only seen when a person's pupils are dilated enough.

(And lens flares are a natural phenomenon that occurs when light is refracted many times by some sort of lens, causing several images of geometric shapes to appear from a single light source. Given that fact, you'll find that lens flares are often overused and appear where they ought not to in 3D graphics. In other words, you need a compound lens system for a lens flare. They aren't seen purely with the naked eye.)


Shifty Geezer(Posted 2006) [#20]
Actually Integers can handle HDR very well if you use them right. The upcoming Heavenly Sword title for PS3 features a new colour-space created with a conversion step at the end of the pixel shaders, that offers improved quality (less banding and other artefacts in most situations according to the devs) at half the bandwidth consumption of FP16. Traditional FP solutions provide a rather false representation of HDR, as they provide high range for RGB components, whereas HDR specifically relates to a wider range of intensities, and not hues.

I wouldn't be surprised if alternative INT HDR methods become more commonplace seeing as they trade a little shader power for considerable bandwidth gains. Though of course this isn't any help to Blitz3D!


Warren(Posted 2007) [#21]
Bouncer

I was trying to throw your bloom code into a project of mine and the bloom sprite shows up but it's a tiny square in the middle of the window. It looks like it has the right contents but it's sized incorrectly.

I didn't mess with the default values in BloomFilter.bb so I'm confused as to what I might be doing wrong...


Danny(Posted 2007) [#22]
Hi Warren, check the other thread on misalignment for a zoom-fix ;)


bytecode77(Posted 2007) [#23]
That's my version:
http://dc.chat-blitz.de/upload/files/Devils%20Child/HDR.zip

It uses sswift's blurtexture function.


Pinete(Posted 2007) [#24]
Thanks for sharing Devil!!
:)

regards!


QuickSilva(Posted 2007) [#25]
Slightly off topic but still bloom related. Is it possible to use the DrawImage command to place a shape, image etc... and still have it bloomed like a 3D object such as the teapot in Bouncer's example?

This has puzzled me for a while now. I think it should be possible in theory, but I just cannot get my head around how it could be done. I do not really want to resort to using sprites as I want to retain the collision commands of the standard DrawImage functions, but I do not mind having a mixture of 2D and 3D, although I know this can cause issues on some cards.

Any help would be great.

Jason.


IPete2(Posted 2007) [#26]
Devils,

Beauty mate!

IPete2.