fastest way to convert a bank to a texture?

Serpent(Posted 2010) [#1]
Hello everyone. This crosses over with another topic that I made earlier, but it is a completely separate thing. I have a screenshot function which returns a bank of BGR values for the pixels of the image. I was wondering what the fastest possible way of converting this data into a texture would be. Using WritePixelFast, I can get the time down to just under a second. Ideally, however, this process should be close to instant.

Does anyone know a better way to do this?


jfk EO-11110(Posted 2010) [#2]
The bottleneck with texture buffers is the DirectX texture upload handler. You may use texture flag 256, which will speed certain things up significantly. The best you can do, IMHO, is to CopyRect from the BackBuffer to a texture buffer with the 256 flag set. Not sure if it's a problem to have the BGR values on the backbuffer.

Flag 256 should be used carefully, as it makes some other things slower. Run the following code to see its pros and cons:

Graphics3D 800,600,32,2
SetBuffer BackBuffer()

tex1=CreateTexture(256,256,0) 
tex2=CreateTexture(256,256,256) 
tex3=CreateTexture(256,256,0) 
tex4=CreateTexture(256,256,256) 


Text 0,0, "testing readpixelfast without flag 256"
t1=MilliSecs()
SetBuffer TextureBuffer(tex1)
LockBuffer()
For i=0 To 1000000
 argb=ReadPixelFast(10,10)
Next
UnlockBuffer()
SetBuffer BackBuffer()
t2=MilliSecs()
Text 0,16, (t2-t1)+" ms"


Text 0,32, "testing readpixelfast with flag 256"
t1=MilliSecs()
SetBuffer TextureBuffer(tex2)
LockBuffer()
For i=0 To 1000000
 argb=ReadPixelFast(10,10)
Next
UnlockBuffer()
SetBuffer BackBuffer()
t2=MilliSecs()
Text 0,48, (t2-t1)+" ms"


Text 0,64, "testing writepixelfast without flag 256"
t1=MilliSecs()
SetBuffer TextureBuffer(tex1)
LockBuffer()
For i=0 To 1000000
 WritePixelFast 10,10,0
Next
UnlockBuffer()
SetBuffer BackBuffer()
t2=MilliSecs()
Text 0,80, (t2-t1)+" ms"


Text 0,96, "testing writepixelfast with flag 256"
t1=MilliSecs()
SetBuffer TextureBuffer(tex2)
LockBuffer()
For i=0 To 1000000
 WritePixelFast 10,10,0
Next
UnlockBuffer()
SetBuffer BackBuffer()
t2=MilliSecs()
Text 0,112, (t2-t1)+" ms"


n=200

Text 0,128, "testing copyrect, flagwise 0 to 256"
t1=MilliSecs()
For i=0 To n
 CopyRect 0,0,256,256,0,0,TextureBuffer(tex1),TextureBuffer(tex2)
Next
t2=MilliSecs()
Text 0,144, (t2-t1)+" ms"


Text 0,160, "testing copyrect, flagwise 0 to 0"
t1=MilliSecs()
For i=0 To n
 CopyRect 0,0,256,256,0,0,TextureBuffer(tex1),TextureBuffer(tex3)
Next
t2=MilliSecs()
Text 0,176, (t2-t1)+" ms"


Text 0,192, "testing copyrect, flagwise 256 to 256"
t1=MilliSecs()
For i=0 To n
 CopyRect 0,0,256,256,0,0,TextureBuffer(tex2),TextureBuffer(tex4)
Next
t2=MilliSecs()
Text 0,208, (t2-t1)+" ms"


Text 0,224, "testing copyrect, flagwise 256 to 0"
t1=MilliSecs()
For i=0 To n
 CopyRect 0,0,256,256,0,0,TextureBuffer(tex2),TextureBuffer(tex1)
Next
t2=MilliSecs()
Text 0,240, (t2-t1)+" ms"


Text 0,256, "testing copyrect, flagwise backbuffer to 256"
t1=MilliSecs()
For i=0 To n
 CopyRect 0,0,256,256,0,0,BackBuffer(),TextureBuffer(tex2)
Next
t2=MilliSecs()
Text 0,272, (t2-t1)+" ms"


Text 0,288, "testing copyrect, 256 to backbuffer"
t1=MilliSecs()
For i=0 To n
 CopyRect 0,0,256,256,0,0,TextureBuffer(tex2),BackBuffer()
Next
t2=MilliSecs()
Text 0,304, (t2-t1)+" ms"



Text 0,320, "testing copyrect, flagwise backbuffer to 0"
t1=MilliSecs()
For i=0 To n
 CopyRect 0,0,256,256,0,0,BackBuffer(),TextureBuffer(tex1)
Next
t2=MilliSecs()
Text 0,336, (t2-t1)+" ms"



Text 0,352, "testing copyrect, 0 to backbuffer"
t1=MilliSecs()
For i=0 To n
 CopyRect 0,0,256,256,0,0,TextureBuffer(tex1),BackBuffer()
Next
t2=MilliSecs()
Text 0,368, (t2-t1)+" ms"

Flip
WaitKey()
End


Serpent(Posted 2010) [#3]
Nice test program there, thanks jfk. Okay, using CopyRect between two buffers in VRAM should be extremely fast (and is). The problem is that before I can CopyRect anything, I have to go through the value of each pixel in the bank and WritePixelFast it to a buffer.

Basically, I was wondering if anyone knows of a way to transfer the pixel values straight from a bank to a texture - from one memory location to another - to save the 700ms my screenshot function takes to write every pixel to a buffer.

Thanks for this example though, jfk. It really shows the advantages of storing textures in VRAM - <1ms for a VRAM CopyRect vs. 70ms for a non-VRAM texture.
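
The per-pixel work described above amounts to widening each 3-byte BGR triple from the bank into one 4-byte pixel. A minimal sketch in C, assuming the bank holds tightly packed B, G, R bytes with no row padding and that the destination wants 32-bit ARGB values; the function name bgr_to_argb and both layout assumptions are illustrative and not taken from the screenshot function:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: widen tightly packed B,G,R bytes into 32-bit ARGB
   values (0xAARRGGBB) with alpha forced to 255. Assumes no row padding. */
void bgr_to_argb(const uint8_t *bgr, uint32_t *argb, size_t pixels)
{
    for (size_t i = 0; i < pixels; ++i) {
        uint32_t b = bgr[3 * i + 0];
        uint32_t g = bgr[3 * i + 1];
        uint32_t r = bgr[3 * i + 2];
        argb[i] = 0xFF000000u | (r << 16) | (g << 8) | b;
    }
}
```

Doing this in one pass over a plain memory block avoids a function call per pixel, but it does not remove the upload cost when the texture buffer is unlocked.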


Leon Drake(Posted 2010) [#4]
savebank then loadimage?


Serpent(Posted 2010) [#5]
Leon, this would work - but you have to put in header information for the bitmap. However, I'm looking for a way to skip the whole 'save as file' then 'load file' thing as there should be a way to copy the data from a bank directly to the memory location of a blitz image.

I'm experimenting with 'RtlMoveMemory' (Kernel32.dll), but I am having considerable problems.


Serpent(Posted 2010) [#6]
Does anybody know the format in which Blitz images are stored in memory? Or, better still, textures or buffers?


Yasha(Posted 2010) [#7]
Take a look here.

However, I did some experimenting with that method a while back (using FastPointer), and discovered that it really doesn't seem to like the idea of you trying to access unlocked buffers. As far as I can tell, locking the buffer (and then unlocking it) copies the data into main memory (ie. the image data for an unlocked, usable buffer does not reside in main memory) and once it's been locked, writing to it doesn't seem to be significantly faster than just poking bytes; the speed hit appears to come from uploading it again when it's unlocked.


Ross C(Posted 2010) [#8]
Isn't there a way of pointing a variable to the location of the memory where the bank is stored?


Charrua(Posted 2010) [#9]
FastPointer?

http://fastlibs.com/libraries.php

Juan


Serpent(Posted 2010) [#10]
Thanks Yasha - useful info.
Charrua - thanks for posting in both crossed over topics
Ross - That's exactly what I'm trying to do


Charrua(Posted 2010) [#11]
Yasha posted the link that I was talking about!
Thanks

Juan


Serpent(Posted 2010) [#12]
Finally, after literally thousands of MAVs and code modifications, I am having some success with moving a block of memory from a bank to a texture:




There are a few issues with this.

First of all, from what I can tell, the format of a 32-bit buffer in memory requires data in ARGB form (well at least according to this website). However, when I write the colour values to the buffer, they must be written in BGRA format.

Secondly, as you can see by running the code, the green stops just past four fifths of the way down the screen. I am unsure as to why.

If anyone can shed light on these issues it would be really helpful.


Charrua(Posted 2010) [#13]
ARGB is a 32-bit variable, and due to "little endian" storage of variables in memory the individual bytes are stored in what most of us would think of as reversed order. When you access them byte by byte you get the lowest byte first, like when you print a document with 4 sheets of paper and the last page comes out on top.

So the 4 bytes of an ARGB value are stored (little end first) as B, G, R, A.

(Any variable of more than 1 byte is stored this way on all Intel compatibles.)
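
A minimal C sketch of that byte ordering, assuming a little-endian CPU such as x86; the function name argb_bytes_in_memory is made up for illustration:

```c
#include <stdint.h>
#include <string.h>

/* Copy the in-memory bytes of a 32-bit ARGB value (0xAARRGGBB) into out[4],
   lowest address first. On a little-endian CPU the order is B, G, R, A. */
void argb_bytes_in_memory(uint32_t argb, uint8_t out[4])
{
    memcpy(out, &argb, sizeof argb);
}
```

For the value 0xFF112233 this puts 0x33 (blue) at the lowest address and 0xFF (alpha) at the highest, which is why pixel data poked byte by byte has to be written in BGRA order.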

I can't run the demo, though. I suppose MoveMemoryIntObj is a form of RtlMoveMemory, but the declaration isn't included.

Juan


Serpent(Posted 2010) [#14]
Oh sorry, I forgot to post the .decls file. I have the different forms of RtlMoveMemory in the .decls because if you make one of the parameters in the declaration an object (by adding a * to the end rather than %), you can pass a Blitz bank as that parameter and Blitz will pass the memory location of the bank's data to the function. Basically, it's for moving memory to and from banks with ease.

.decls file:



Charrua(Posted 2010) [#15]
I do the same (many decls for one function, as needed).
I don't have time to test now, probably tonight.

Juan


Yasha(Posted 2010) [#16]
Slightly offtopic but I'd just like to point out an insanely, brilliantly useful thing that not a lot of people seem to realise:

You can also use the * datatype specifier for passing Blitz custom type objects to DLLs, which has the same effect as passing a pointer to an identically-constructed C struct. If you're dealing with data in a fixed format this is both faster and tidier (on both sides) than passing a bank, although it's not type-checked on the Blitz end.

eg.
Blitz3D:
;decls
.lib "forExample.dll"

myExampleFunction(obj*) : "_myExampleFunction@4"

;body

Type myType
    Field a
    Field b#
End Type

Local obj.myType=New myType
obj\a = 5
obj\b = 7.2

myExampleFunction(obj)

End

C:
typedef struct _myType {
    int a;
    float b;
} myType;

extern void _stdcall myExampleFunction(myType * obj)
{
    myIntFunction(obj->a);
    myFloatFunction(obj->b);
}


If you then store that pointer, because it points to the same object as the Blitz handle, you can change the field values and update the DLL without the need for another function call! Super useful!
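
The stored-pointer idea can be sketched on the C side alone. This is only an illustration under assumptions: rememberObject and readStoredA are hypothetical names, the struct mirrors the example type above, and the pointer is only valid for as long as the object it points to is still alive.

```c
/* Same field layout as the example type above (int followed by float). */
typedef struct myType {
    int a;
    float b;
} myType;

/* Hypothetical DLL-side storage for the pointer received from the caller. */
static myType *stored = 0;

/* Keep the pointer instead of copying the fields. */
void rememberObject(myType *obj)
{
    stored = obj;
}

/* Reads the field through the stored pointer, so it always sees the
   caller's latest value without needing another hand-over call. */
int readStoredA(void)
{
    return stored ? stored->a : 0;
}
```

Because the DLL keeps the address rather than a copy, a later change to the field on the caller's side is visible the next time the DLL reads through the pointer.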


Charrua(Posted 2010) [#17]
Man, you are crazy!
as always, useful info

Juan


puki(Posted 2010) [#18]
What were the timings in millisecs for "jfk's" code? I've not run it and I've not read the thread as image stuff doesn't interest me; however, we found direct access to all images and buffers back in 2003. "jfk" was very interested in it at that point in time, so I am not sure if what he is talking about is in relation to "MrCredo's" findings back in 2003. Back then a 100*ReadPixelFast with a 400x197 jpg was dropped from 9800ms to 320ms - that time got lower and lower with optimisation, though. The lowest reported result I am aware of was 48ms.

What I mention may be of no use as I haven't really read the thread, but I thought I'd mention it (in case it jogs memories).


Serpent(Posted 2010) [#19]
puki - the thread sounds interesting. Do you have a link to it or something? I'll try to find it.

Yasha - yeah passing types as objects into functions is really useful. The one problem with this in blitz though is that it doesn't support a range of data types - this has made everything a lot more complicated for me.


Serpent(Posted 2010) [#20]
Everything seems to be working, but buffers are stored with some kind of interlacing in memory which will be difficult to work out. It would be greatly appreciated if anyone could offer help, but nobody seems to have bothered with something as unnecessarily complicated and, on the whole, practically useless as this :P


jfk EO-11110(Posted 2010) [#21]
I don't recall anything by MrCredo, but I remember I posted some code in the archives that patched a bank, making it think its address space was within an existing image buffer. This way I was able to poke right into an image. Unfortunately, for some reason this wasn't faster than ReadPixelFast with LockBuffer. I dropped the idea when I realized I couldn't overcome the upload bottleneck this way.

BTW it makes a huge difference whether you want to poke to an image buffer or to the content of a texture. This is partly, but not only, because of mipmapping, which (afair) holds several copies of various sizes in memory, if I get this right. Those structures may also not be exactly the same on all machines and systems. E.g. mipmapping may be organized rather individually by the card. Someone correct me if I'm wrong.

If you're still after a solution then consider: yes, WritePixelFast to a texture buffer is slow, but WritePixelFast to the backbuffer may be much faster (make a test!). If so, you only need to copy it from the backbuffer to the 256-flag texture, and that's fast.

You might also consider using RenderWorld for some 2D pixel manipulation, e.g. to darken/brighten or mix things etc.

I recently had to mix 2 textures. Renderworld with two semitransparent quads, far away from the main scene and a special camera, it was done in under 1 ms.

Puki:
testing readpixelfast without flag 256
34 ms
testing readpixelfast with flag 256
665 ms
testing writepixelfast without flag 256
38 ms
testing writepixelfast with flag 256
38 ms
testing copyrect, flagwise 0 to 256
300 ms
testing copyrect, flagwise 0 to 0
211 ms
testing copyrect, flagwise 256 to 256
40 ms
testing copyrect, flagwise 256 to 0
3550 ms
testing copyrect, flagwise backbuffer to 256
44 ms
testing copyrect, 256 to backbuffer
41 ms
testing copyrect, flagwise backbuffer to 0
3583 ms
testing copyrect, 0 to backbuffer
309 ms




Serpent(Posted 2010) [#22]
jfk - I have actually done all of my testing with the backbuffer. I have also tested it with a texturebuffer, and I think that everything runs at the same speed - in fact it should because both are just buffers, identically structured in memory.
The information on RenderWorld is useful - I never realised it could actually be a fast operation.
The idea of patching a bank sounds interesting. I'll look into it further if what I'm trying now doesn't work. Perhaps one could even patch a texture so that the pointer to the actual pixels is the data from a bank! This would be useful for my purposes.

About mipmapping - I haven't seen anything about it to be honest, and no indication that it would be a default thing in most drivers, but then again I barely understand what it is and my reasoning is that if it isn't an option in my nVidia Control Panel, then it isn't something to worry about. Either way, Blitz allows direct reading and writing to buffers - Read/WritePixelFast. If default mipmapping would affect memory operations on the texture's data, then it would also affect these memory operations on the buffers. I would think that either this has to be purposely programmed in to happen, or Blitz has somehow ensured that no default mipmapping has occurred. As far as I know, this shouldn't be a problem but if anyone knows more about this it would be good to clarify it.


jfk EO-11110(Posted 2010) [#23]
Mipmapping is on by default. As somebody stated, when you UnlockBuffer a texture buffer, the entire texture is uploaded again. In this process the mipmap copies of the original texture are renewed as well, I guess. When you WritePixelFast, you don't poke into the VRAM, at least that's what I think. It is however a fact that the layout of textures in VRAM is much more complicated than that of images.


Serpent(Posted 2010) [#24]
Okay I don't actually know what mipmapping is after all :P

But anyway, when you lock the buffer I think it is copied from its complicated form in VRAM to a simple format in memory. Otherwise all of my direct memory copying - which has actually been working! - would just cause errors or result in an unintelligible image.


Ross C(Posted 2010) [#25]
jfk, what do you mean when you refer to:


flagwise 256 to 256



Actually, all the descriptions with the word "flagwise"?


jfk EO-11110(Posted 2010) [#26]
"Flagwise 256 to 256" means I am doing a CopyRect from a texture that was created or loaded with flag 256 to another texture that was also created or loaded with flag 256. Contrast this with "Flagwise 0 to 256", where the first texture had no explicit flag parameter, or zero, or at any rate not 256. Just read this part:

tex1=CreateTexture(256,256,0)
tex2=CreateTexture(256,256,256)
tex3=CreateTexture(256,256,0)
tex4=CreateTexture(256,256,256)

so copyrect from tex2 to tex4 would be "Flagwise 256 to 256"
(Don't be confused by width and height, that is also 256 everywhere)

I have no idea if "Flagwise" is a real English word.

Serpent: Mipmapping (afaik) means there are several copies of the original texture in VRAM, at smaller sizes. The renderer then picks texture data for far-away pixels from a smaller copy of the texture, which smoothly blurs distant parts and prevents the pattern artefacts that can occur, and look bad, without mipmapping.
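
To get a feel for how many extra copies that means, here is a small C sketch that walks a full mip chain by halving each side down to 1. It assumes the common halving scheme; how Blitz3D or the driver actually lays the copies out in VRAM is not covered by this thread, and the function names are made up for illustration.

```c
/* Number of levels in a full mip chain, counting the original texture. */
int mip_levels(int width, int height)
{
    int levels = 1;
    while (width > 1 || height > 1) {
        if (width > 1) width /= 2;
        if (height > 1) height /= 2;
        ++levels;
    }
    return levels;
}

/* Total texel count over all levels of the chain. */
long mip_total_texels(int width, int height)
{
    long total = 0;
    for (;;) {
        total += (long)width * height;
        if (width == 1 && height == 1) break;
        if (width > 1) width /= 2;
        if (height > 1) height /= 2;
    }
    return total;
}
```

For the 256 x 256 textures used in the test program this gives 9 levels and 87381 texels in total, about a third more than the 65536 texels of the original alone.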

You can compare a render with and without mipmapping by adding a ClearTextureFilters right before you load a texture with default flags (no flags at all).

Yes, it may be possible to access the VRAM directly, as you did, but you also reported that it worked only partially. With mipmapping you would have to poke the changes into all copies of the texture, which is rather complicated. Of course, you can turn it off. However, I doubt that VRAM can be accessed directly at the same speed as the RAM used for EXEs etc. On a PC everything has an address, even the ports, but not every address can be accessed as fast as DDR or whatever RAM chips. You should run some speed tests and compare.

If you manage to access texturebuffers in a fast way, let me know. I tried that too, but failed. Copyrect from Backbuffer to Texturebuffer with Flag 256 set is still the fastest way IMHO. (BTW Flag 256 turns off Mipmapping as well, someone correct me if I'm wrong)


Serpent(Posted 2010) [#27]
Thanks for clarifying mipmapping jfk.

First of all, when I've said before that I've had mixed success, this wasn't poking into VRAM. In fact, I'm not even sure how you can.

I'll try to list some of the testing I've done with RtlMoveMemory soon. I'm swamped with work at the moment.


DareDevil(Posted 2010) [#28]
Hi all
I have changed the code to test performance



This code implements 2 functions:
1° All-Blitz functions
2° Blitz + C++ DLL

Please test this

this is a first step for post process ;)

http://digilander.libero.it/eyeandlight/Demo.zip

Thanks, bye

Excuse my English :|


Serpent(Posted 2010) [#29]
I haven't posted anything for a while because I've been busy lately. Hopefully this thread can be revived.

DareDevil: Thanks for tidying up my code and adding the FPS counter etc. Your test includes an external image post-processing DLL, but I haven't bothered to use that function as it doesn't relate to my problem. Without the post-processing function, I had speeds of around 45 FPS - which is pretty good. Thanks for posting this.

More importantly however, the default size in this example led me to a series of tests that have shown a strange problem that occurs when using differing resolutions.

The reason that I have not actually been successful is that, when testing the MoveMemory copy from the bank to the texture, depending on the width and height, the copy may sometimes only copy around 4/5 of the pixels! I'll write a simple resolution testing program soon and see which resolutions fail and which succeed - strangely with certain 'neat' resolutions (including 1024 by 768) the results seem to be fine, which is why DareDevil's code works fine.


Serpent(Posted 2010) [#30]
This thread is well and truly dead, but everything is actually working now!

Here's an explanation of what I did - if you want the working screenshot function then just scroll down.

Previously, I've experienced major problems copying data straight from a bank into a buffer. Strangely however, when DareDevil posted his code, I noticed something different. The resolution of 1024 x 768 worked!
The first thing you notice about this is that it is a power of 2 width. After more testing I found a couple of resolutions that worked as well.
I wrote a few simple programs to test this further, i.e. systematically going through every resolution from 100 by 100 to 1680 by 1050. After a few days of running they hadn't finished all of the resolutions, but I could see clear patterns emerging.

I tested the BackBuffer in 2D and 3D graphics modes, ImageBuffers in 2D and 3D graphics modes, and TextureBuffers (in the 3D graphics mode).
The first thing that I noticed was that height does not matter at all. The width determines whether the resolution works or not.
With '2D' buffers (i.e. the BackBuffer in a 2D graphics mode and all ImageBuffers), the width needs to be a multiple of 16 to work.
With '3D' buffers (i.e. the BackBuffer in a 3D graphics mode and TextureBuffers), the width needs to be a power of 2.

I have written the screenshot function so that it works with any buffer, regardless of size.
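
The thread does not establish why only certain widths work. One plausible reading, offered purely as an assumption, is that buffer rows are padded so that the distance between row starts (the "pitch") is larger than width * 4 bytes for other widths; a single flat copy would then drift out of step and fill only part of the buffer. Under that assumption, a width-independent copy goes row by row. A minimal C sketch, with copy_rows and its pitch parameters being hypothetical names rather than anything from the screenshot function:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Copy `height` rows of `row_bytes` bytes each. src_pitch and dst_pitch are
   the byte distances between the starts of consecutive rows (assumed values;
   a padded buffer has a pitch larger than row_bytes). */
void copy_rows(uint8_t *dst, size_t dst_pitch,
               const uint8_t *src, size_t src_pitch,
               size_t row_bytes, size_t height)
{
    for (size_t y = 0; y < height; ++y)
        memcpy(dst + y * dst_pitch, src + y * src_pitch, row_bytes);
}
```

When both pitches equal row_bytes this degenerates to one contiguous copy, which would match the case where a single RtlMoveMemory call over the whole image works.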

I'll post this in the code archives later, once I've optimised it further and allowed for capturing any rectangle of the screen (rather than just the top left corner). Right now on my PC this takes a 1680 by 1050 screenshot in around 60 ms and an 800 by 600 screenshot in around 20 ms. Larger resolutions take too long for serious realtime use, but unless you're trying to do video capture or something it is more than good enough.


gdi32 .decls file:
.lib "gdi32.dll"
BitBlt%(hDestDC%,X%,Y%,nWidth%,nHeight%,hSrcDC,XSrc,YSrc,dwRop)
CreateCompatibleDC%(hdc%)
CreateCompatibleBitmap%(hdc%,Width%,Height%)
SelectObject%(obj%,selobj%)
DeleteDC(hdc%)
DeleteObject(hdc%)
GetDIBits%(hDC%,hbmp%,uStartScan%,uScanLines%,lpvBits*,lpbi*,uUsage%)
GetDC%(hWnd%)
ReleaseDC%(hWnd%,hDC%)


kernel32 .decls file:
.lib "Kernel32.dll"
MoveMemoryIntObj(Destination%,Source*,Length%) : "RtlMoveMemory"
MoveMemoryObjInt(Destination*,Source%,Length%) : "RtlMoveMemory"



Screenshot Function: