View/Rewrite LoadTexture's Code


Doktor Zeus(Posted 2010) [#1]
Ok, I've been hammering away at code for several days now, and I've come to the conclusion that there is no way to use Blitz's standard internal commands to create the results I want. The sad thing is the alteration I want to make is actually pretty small and simple. I want to rewrite Blitz's "LoadTexture" command so that it can load directly from a file stream instead of needing to receive a single, standard encoded file. Using Blitz's internal commands to read the pixels myself is far too slow for the job.

See, the problem I'm getting is this. I can save a texture to a stream by reading each pixel's RGBA value and saving it as an integer, but this suffers from two problems. Firstly, it is very slow when compared to LoadTexture. The fastest routine I've found that performs the function is three times slower, and my specialised routine that does everything I need is nine times slower (which doesn't even make sense seeing as it's actually a much simpler version). Either way, both methods are unsuitable, and both methods also result in very large files. Ideally what I'd like to do is chain a series of PNG format images together into one big file with a simple index file header, then load each image individually from its index.

I know I could use an animtexture, but the point is I'm trying to reduce memory overheads and loading time for the operation by only loading and holding in memory the data I need, besides which it'd still be a PNG file which anyone could open and view.

Now, is there any way I can view and directly alter or amend the LoadTexture command in its natural state so that it can take its data from specific parts of the file? Maybe using DLLs? Any resources on creating DLLs for Blitz would also be useful.
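
For illustration, the "one big file with a simple index header" idea could be read back roughly like this. This is only a sketch: the pack layout (an entry count followed by offset/size pairs) and the function name LoadPackEntry are assumptions made up for the example, not an existing format. It only gets the raw bytes of one entry into a bank; it does not decode PNG data.

```blitzbasic
; Hypothetical pack layout (an assumption, not an existing format):
;   Int count
;   count x (Int offset, Int size)   ; offsets measured from the start of the file
;   entry data
; Returns a bank holding the raw bytes of one entry, or 0 on failure.
Function LoadPackEntry(path$, index)
	Local file = ReadFile(path$)
	If file = 0 Then Return 0

	Local count = ReadInt(file)
	If index < 0 Or index >= count
		CloseFile file
		Return 0
	EndIf

	; Each index record is two 4-byte ints, stored after the 4-byte count.
	SeekFile file, 4 + index * 8
	Local offset = ReadInt(file)
	Local size = ReadInt(file)

	; Read only this entry, so nothing else is held in memory.
	Local bank = CreateBank(size)
	SeekFile file, offset
	ReadBytes bank, file, 0, size
	CloseFile file
	Return bank
End Function
```

The caller is responsible for freeing the returned bank with FreeBank once the data has been used.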


Yasha(Posted 2010) [#2]
There is no easy way to do this.

Firstly, no, there is no way to view or amend the LoadTexture function. It's precompiled, and built into the Blitz3D compiler, so no source exists in the commercial distribution for you to modify.

I can think of two possible solutions:


1) Write a completely independent LoadTexture function that creates a functioning Blitz3D texture object, or at any rate a DirectDrawSurface that can then be copied using Direct3D drawing functions onto a "real" Blitz3D texture. Personally I would call this ridiculously hard, but not impossible (MikhailV seems to have written his own loaders for FastExtension, so it can clearly be done).

There's also room for an intermediate solution here where you just create the DDS and then copy it to a native B3D texture, rather than recreating all of Blitz3D's existing functionality... maybe similar to this?


2) Load the data as you would a data file, into a bank. Create a new texture; lock the buffer. Now with some clever application of Windows API functions (RtlMoveMemory) you ought to be able to copy the data from bank to buffer in a single go. Unlock buffer, free bank.

Writing directly to buffers is no faster than WritePixelFast when done pixel-by-pixel, but you might see a big speed boost if you do it in a block.


2) is a much easier solution - but it might not provide much of a speed boost.

EDIT: Based on the data given in the other thread, the real speed hit comes from the ReadBytes function, as RtlMoveMemory is too quick to measure, and WritePixelFast is also pretty quick. No idea why simply loading data is so much slower than using the builtin load command, but that's where the slowdown is.
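
As a point of comparison, the pixel-by-pixel version of option 2 might look like the sketch below. It assumes the bank holds one 32-bit ARGB value per pixel, row by row, and that width and height are powers of two so CreateTexture returns a texture of exactly that size; the name BankToTexture is made up for the example.

```blitzbasic
; Sketch: build a texture from a bank of 32-bit ARGB pixels, one pixel at a time.
; Assumes the bank contains width * height * 4 bytes, stored row by row.
Function BankToTexture(bank, width, height)
	Local tex = CreateTexture(width, height, 1 + 2) ; 1 = colour, 2 = alpha
	Local buf = TextureBuffer(tex)

	LockBuffer buf
	For y = 0 To height - 1
		For x = 0 To width - 1
			; Each pixel is 4 bytes, so the byte offset is the pixel index * 4.
			WritePixelFast x, y, PeekInt(bank, (y * width + x) * 4), buf
		Next
	Next
	UnlockBuffer buf

	Return tex
End Function
```

The block copy with RtlMoveMemory replaces the two nested loops with a single call, which is where any speed gain would come from.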


Kryzon(Posted 2010) [#3]
1) Write a completely independent LoadTexture function that creates a functioning Blitz3D texture object, or at any rate a DirectDrawSurface that can then be copied using Direct3D drawing functions onto a "real" Blitz3D texture. Personally I would call this ridiculously hard, but not impossible (MikhailV seems to have written his own loaders for FastExtension, so it can clearly be done).

Yeah, if you can interface with the texture's data buffer, that should be possible.

Here is the layout of Blitz's structures in memory (check the Texture): Структуры блица в памяти 2 ("Blitz structures in memory 2").

Courtesy of MikhailV.


Serpent(Posted 2010) [#4]
Yasha's option 2 is clearly the best. RtlMoveMemory will copy everything over incredibly fast. If you can get the data into a bank, the code will look like this:

LocBnk = CreateBank(76)  ;Bank used to get location of data in memory
MoveMemoryObjInt(LocBnk,TextureBuff,76)
Loc = PeekInt(LocBnk,72)
FreeBank LocBnk
LockBuffer TextureBuff
MoveMemoryIntObj(Loc, ImgData, BankSize(ImgData))
UnlockBuffer TextureBuff

Where ImgData is the bank and TextureBuff is the texture buffer.

Userlibs:
.lib "Kernel32.dll"
MoveMemoryIntObj(Destination%,Source*,Length%) : "RtlMoveMemory"
MoveMemoryObjInt(Destination*,Source%,Length%) : "RtlMoveMemory"



jfk EO-11110(Posted 2010) [#5]
You forget one thing: writing to Videoram is much slower than writing to ordinary ram. In fact, some sort of upload is performed. Some time ago I hacked Blitz to use a bank as an image. Using a normal bank is fast, but as soon as the bank pointed to video RAM, it became as slow as WritePixelFast. Still fast, but not really fast compared to moving bulk memory.

What you can do is: use the Texture Flag 256 and Copyrect from Backbuffer or so to it, this is really fast. I've posted a Speed test for the Flag 256 with various Sources and Destinations some time ago, please use the search link.

No idea how to copy a stream to, e.g., the BackBuffer. I guess writing pixels to the BackBuffer is faster than to a TextureBuffer, is it?


Serpent(Posted 2010) [#6]
Sorry - my code above should be changed so that the LockBuffer is at the start of the code:

LockBuffer TextureBuff
LocBnk = CreateBank(76)  ;Bank used to get location of data in memory
MoveMemoryObjInt(LocBnk,TextureBuff,76)
Loc = PeekInt(LocBnk,72)
FreeBank LocBnk
MoveMemoryIntObj(Loc, ImgData, BankSize(ImgData))
UnlockBuffer TextureBuff


The code I posted above actually wouldn't have worked unless the LockBuffer is moved before the memory location of the buffer is found.
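
Wrapped up as a function with a size check, the corrected snippet could look something like this. It is a sketch under the same assumptions as the code above: the buffer handle points at a structure whose data pointer sits at offset 72, the texture is 32-bit, and its rows are stored without padding. The function name CopyBankToTexture is made up; MoveMemoryObjInt and MoveMemoryIntObj are the userlib declarations from my earlier post.

```blitzbasic
; Sketch: copy a whole bank into a texture with one RtlMoveMemory call.
; Returns False if the bank size does not match the texture (to avoid overrunning the buffer).
Function CopyBankToTexture(bank, tex)
	; Assumes 4 bytes per pixel and no padding between rows.
	Local needed = TextureWidth(tex) * TextureHeight(tex) * 4
	If BankSize(bank) <> needed Then Return False

	Local buf = TextureBuffer(tex)
	LockBuffer buf                    ; must be locked before the address is read

	Local info = CreateBank(76)       ; copy of the buffer's internal structure
	MoveMemoryObjInt(info, buf, 76)
	Local addr = PeekInt(info, 72)    ; offset 72 = pixel data pointer (as found above)
	FreeBank info

	MoveMemoryIntObj(addr, bank, needed)
	UnlockBuffer buf
	Return True
End Function
```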

You forget one thing: writing to Videoram is much slower than writing to ordinary ram.


LockBuffer moves the buffer information from V-RAM to normal RAM, right? And UnlockBuffer does the opposite. Please let me know if I'm wrong. This eliminates the slow V-RAM issues:

This operation is actually incredibly fast. The upload/download from video RAM is performed quickly in LockBuffer and UnlockBuffer - in my 1024x768x32 tests a Lock/UnlockBuffer pair took less than 1ms. The RtlMoveMemory call is also incredibly fast because it is moving normal RAM to normal RAM. In fact, I think the entire operation comes out to under 1ms.


No idea how to copy a stream to, e.g., the BackBuffer. I guess writing pixels to the BackBuffer is faster than to a TextureBuffer, is it?


Because LockBuffer moves the buffer info from V-RAM to RAM, it doesn't matter which buffer you use - they should all take just as long. Writing the image to the BackBuffer then using CopyRect would only slow down the operation.



Either way, I'd say the code executing in under 1ms is an added bonus - the main point is that it allows you to copy data directly from a bank into an image :). You should be able to modify this to copy the data from streams rather than banks, but I'm not sure whether a pointer to a stream is simply a pointer to a null-terminated string or something else...


jfk EO-11110(Posted 2010) [#7]
LockBuffer moves the buffer information from V-RAM to normal RAM

What do you mean by "buffer information"? As far as I know, a texture must be uploaded to VRam by DirectX, if you want to alter parts of it, the entire Texture needs to be re-uploaded. At least as long as the Flag 256 isn't set.

I don't think locking a buffer makes access to it faster, just safer. Correct me if I'm wrong.


Yasha(Posted 2010) [#8]
the entire Texture needs to be re-uploaded


Locking the buffer creates a copy of the texture in memory that you can alter; unlocking it re-uploads that to VRAM. So you get some advantage out of using a locked buffer and Read/WritePixelFast because it doesn't have to re-upload the whole texture after each individual pixel operation the way it does with Read/WritePixel (which is why you're not allowed to draw with a locked buffer; the changes haven't been applied to the actual texture yet).

Thus, creating a buffer in normal memory with LockBuffer, and copying the data in a single memory copy operation with RtlMoveMemory, then re-uploading it with UnlockBuffer, should be faster as it also cuts out the individual data poking operations.

I think that's the idea. At any rate it works pretty well.


jfk EO-11110(Posted 2010) [#9]
Should? :) Why did no one try this yet? Can't be that hard. Ehrm, my excuse is I haven't got admin rights right now, and I'm afraid I have to add it to a decls file first. Maybe later. The idea is good, of course.


Serpent(Posted 2010) [#10]
What do you mean by "buffer information"?


Yeah I wasn't too clear about this sry - simply the 4-byte values for the colours of the pixels.

Why did no one try this yet?


I have already used a very similar approach to copy images of the screen into blitz buffers (i.e. take screenshots). As a part of my testing, I was copying image data to banks and then into buffers, so unless I've mistyped the above code, I'm certain that it works.

One thing that I neglected to mention above though was that in order for the copy to work, the destination buffer needs to have a width that is a multiple of 16. That is, for an image buffer or the backbuffer/frontbuffer of a 2D graphics mode. If it is a TextureBuffer or the backbuffer/frontbuffer of a 3D graphics mode, the width needs to be a power of 2.

The simple way around the above problem (which I used) is, if the destination buffer doesn't fit the criteria I listed, simply create a new image (with a width that is a multiple of 16), copy the data to that, and then copyrect the pixels from that image to the buffer you want it to end up in. This can be seen in my screenshot functions in the code archives. However, I think I did something wrong regarding 3D buffers in the code (which I've neglected to update so far).
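
The width rules above can be checked with a couple of small helpers. This is just a sketch; the function names are made up for the example, and the rules themselves are only what I found in my own tests.

```blitzbasic
; Round a width up to the next multiple of 16 (image buffers, 2D back/front buffers).
Function PadWidth16(w)
	Return ((w + 15) / 16) * 16   ; integer division drops the remainder
End Function

; Round a width up to the next power of 2 (texture buffers, 3D back/front buffers).
Function NextPow2(w)
	Local p = 1
	While p < w
		p = p Shl 1
	Wend
	Return p
End Function

; Usage: create a staging image whose width fits the rule, block-copy into it,
; then CopyRect the visible part to the real destination buffer.
; staging = CreateImage(PadWidth16(width), height)
; ... block copy into ImageBuffer(staging) ...
; CopyRect 0, 0, width, height, 0, 0, ImageBuffer(staging), destBuffer
; FreeImage staging
```

Note that the source bank then needs its rows laid out at the padded width, not the original one, for the block copy to line up.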


I think the issues with buffer widths result from some sort of interlacing at irregular widths, but I've just taken the easy way out.


Edit: @jfk you mentioned your store in V-RAM (256 flag) tests above. As a matter of fact, that is very useful for minimising the time taken by the copyrect that you might need to use.