Large Files

TaskMaster(Posted 2009) [#1]
Can BlitzMax handle reading/writing large files? I am talking about files whose size goes beyond the Integer limit. It looks to me like all of BlitzMax's file handling uses Integers, which causes it to fail when dealing with really large files. I am trying to copy a 5GB file using BlitzMax and it is failing and acting quite weird.


MGE(Posted 2009) [#2]
Can you somehow split the process into smaller chunks?


TaskMaster(Posted 2009) [#3]
I am just trying to copy one large file from one place to another. If all of the stream seeking commands operate with Integers, there will not be a way to move to a point in the file past the Integer barrier.


TaskMaster(Posted 2009) [#4]
If I were to go through all of the bmx files that do the streams and file handling and switch the variables to long, would it work? Or is there something somewhere I do not have access to that affects this process?


TaskMaster(Posted 2009) [#5]
I wrote this whole backup application, but never tested any files larger than 2GB while developing it. Now that I am trying to actually use it, I have run into this roadblock. Can anyone give me an idea of how I could copy a file larger than 2GB without shelling out to use the copy command? :)

Some way to use streams, or maybe call some C code to copy large files for me?

Even simple things like the FileSize command just returns an Integer that has rolled over to a negative number.
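The rollover described above can be reproduced outside BlitzMax. The following is a minimal C sketch, not BlitzMax's actual FileSize code; the helper name as_int32 is made up for illustration. It shows what a signed 32-bit field ends up holding when it only keeps the low 32 bits of a larger file size.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (name chosen for illustration): keep only the low
   32 bits of a 64-bit size and read them back as a signed 32-bit value,
   which is what happens when a large size is stored in a 32-bit Int. */
int32_t as_int32(int64_t size) {
    assert(size >= 0);  /* file sizes are never negative */
    uint32_t low = (uint32_t)((uint64_t)size & 0xFFFFFFFFu);
    /* Manual two's-complement conversion, so the result does not depend
       on implementation-defined narrowing casts. */
    if (low >= 0x80000000u) {
        return (int32_t)((int64_t)low - 4294967296LL);
    }
    return (int32_t)low;
}
```

With this helper, a 3GB size (3221225472 bytes) comes back as -1073741824, while a 5GB size (5368709120 bytes) comes back as 1073741824. A wrapped size is therefore not always negative, which makes the failure harder to spot.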

Edit: It seems the CopyFileEx WinAPI function can do it, and it has a callback function, so I would be able to get progress from it as it copies. Can anybody point me in the right direction to implement this in BlitzMax?

I also found this which can read file sizes larger than 2GB:
http://www.blitzmax.com/codearcs/codearcs.php?code=1688

So, part of the problem is solved.

Thanks for any help that can be offered. This is driving me nuts as I wrote this backup app and tried to implement it as my backup solution in our office today and as soon as it hit a 3GB zip file, it failed miserably.


TaskMaster(Posted 2009) [#6]
Feel like I'm talking to myself. :)

I figured it out. Hopefully this will help somebody else at some point.

Here is the code that does it:

Extern "win32"
Function CopyFileExW:Int(SrcFile:Byte Ptr, DestFile:Byte Ptr, CallBack:Byte Ptr, Data:Int Ptr, Cancel:Int Ptr, Flags:Int)
End Extern

Local src:String = "srcfile.ext"
Local dst:String = "destfile.ext"
CopyFileExW(src.ToWString(), dst.ToWString(), Callback, Null, Null, 0)

Function Callback(lFileSize:Long, lBytesTransfered:Long, lStreamSize:Long, lStreamBytesTransfered:Long, iStreamNumber:Int, iCallBackReason:Int, iSrcHandle:Int, iDestHandle:Int, data:Int)

debuglog lBytesTransfered

End Function



Russell(Posted 2009) [#7]
The filesize limit also depends on what file system you're using. NTFS, for example, doesn't have a filesize limit (supposedly).

Russell


TaskMaster(Posted 2009) [#8]
Well, all is not exactly well. That callback works fine on my notebook running Vista, but it fails miserably on my desktop running XP.

For now, I have disabled the callback.

Does anybody know how to make that callback work, or can anyone at least give me some ideas on how I need to declare it and call it?

Thanks for any help.


tonyg(Posted 2009) [#9]
Out of interest what code were you using in native Bmax?


Perturbatio(Posted 2009) [#10]
I'm sure you could probably manage it with Pub.Stdc and fread/fwrite


TaskMaster(Posted 2009) [#11]
That code I posted is exactly what I was using to test it.

Here is a slightly modified version, as I have been playing with it trying to get it to work.

SuperStrict

Extern "win32"
  Function CopyFileExW:Int(SrcFile$W, DestFile$W, CallBack:Byte Ptr, Data:Int Ptr, Cancel:Int Ptr, Flags:Int)
End Extern

Local src:String = "1.txt"
Local dst:String = "2.txt"
CopyFileExW(src, dst, CallBack, Null, Null, 0)
DebugLog "hello"
End

Function Callback:Int(lFileSize:Long, lBytesTransfered:Long, lStreamSize:Long, lStreamBytesTransfered:Long, iStreamNumber:Int, iCallBackReason:Int, iSrcHandle:Int, iDestHandle:Int, data:Int)
  DebugLog lBytesTransfered
  Return 0
End Function


I get slightly different results now. The copy happens, the callback gets called, but then the program freezes as though the CopyFileEx never returns.

If I do not give a Callback, like this:

CopyFileExW(src, dst, Null, Null, Null, 0)

Then the file is copied and the program ends like it should.


tonyg(Posted 2009) [#12]
Sorry, I thought your original post suggested you tried Bmax (copyfile or streams or something) and got a problem and *then* tried CopyFileExW.


TaskMaster(Posted 2009) [#13]
Oh I did. BlitzMax's CopyFile cannot handle any filesize above the 2GB barrier. It uses streams and Integers. And since Integers cannot be larger than 2GB, it fails.

Well, I shouldn't say "it fails", but once it does the copy, the CopyFile command never returns. So, the application hangs up.


tonyg(Posted 2009) [#14]
I can see that copyfile uses copystream but not sure what you mean about it using integers.
Doesn't it just copy bytes?
I might be missing something though.


Perturbatio(Posted 2009) [#15]
"I might be missing something though."


I *think* he means that you can't specify an offset to read from greater than an int is capable of storing.


tonyg(Posted 2009) [#16]
... but I thought the intent was to copy the file (copyfile or copystream) and I can't see where an integer comes into it.
Doesn't matter I suppose just interested.


TaskMaster(Posted 2009) [#17]
I am looking into the BlitzMax side again. The whole problem on the BlitzMax side could have come from the FileSize function not working, which caused something else of mine to fail, and not from CopyFile.

But, I just did a CopyFile on a 3.22GB file and it copied, but the CopyFile command never returned, so the Blitz app hung.


TaskMaster(Posted 2009) [#18]
Just did a little more testing, and from what I can see, the CopyFile command (and more specifically, the CopyStream Function) just continues right on reading past EOF when copying a file larger than the 2GB Integer barrier.

Function CopyStream( fromStream:TStream,toStream:TStream,bufSize=4096 )
	Assert bufSize>0
	Local buf:Byte[bufSize]
	While Not fromStream.Eof()
		toStream.WriteBytes buf,fromStream.Read( buf,bufSize )
	Wend
End Function


It never seems to find fromStream.Eof. And the bufSize variable just starts getting larger and larger.

I am going to post this into the BlitzMax bugs section.
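One way to avoid depending on an Eof() check at all is to stop when a read returns no more bytes. The following is a minimal C sketch of that idea using fread/fwrite, as suggested earlier in the thread. It is not a fix for BlitzMax's CopyStream, and the names copy_file_chunked and progress_fn are made up for illustration. The running total is kept in a 64-bit integer, so it does not wrap at 2GB.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical progress hook: receives the running 64-bit byte count. */
typedef void (*progress_fn)(int64_t bytes_copied);

/* Copy src to dst in fixed-size chunks. Returns the number of bytes
   copied, or -1 on error. The loop ends when fread returns 0, so it
   never asks for the file size and never seeks. */
int64_t copy_file_chunked(const char *src, const char *dst,
                          progress_fn on_progress) {
    assert(src != NULL && dst != NULL);

    FILE *in = fopen(src, "rb");
    if (in == NULL) {
        return -1;
    }
    FILE *out = fopen(dst, "wb");
    if (out == NULL) {
        fclose(in);
        return -1;
    }

    unsigned char buf[65536];
    int64_t total = 0;
    int ok = 1;
    size_t n;

    while ((n = fread(buf, 1, sizeof buf, in)) > 0) {
        if (fwrite(buf, 1, n, out) != n) {
            ok = 0;  /* short write: disk full or I/O error */
            break;
        }
        total += (int64_t)n;
        if (on_progress != NULL) {
            on_progress(total);
        }
    }
    if (ferror(in)) {
        ok = 0;  /* the loop ended on a read error, not end of file */
    }

    fclose(in);
    if (fclose(out) != 0) {
        ok = 0;  /* flushing buffered data failed */
    }
    return ok ? total : -1;
}
```

Passing NULL as the third argument skips progress reporting. On some 32-bit platforms, opening a file over 2GB with plain fopen may itself need large-file support to be enabled at build time (for example _FILE_OFFSET_BITS=64 with glibc), so treat this as a sketch under those assumptions.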


Streaksy(Posted February) [#19]
Out of everything I wish this was fixed. I want to read/write to big package files. It's pretty normal these days. Copying and everything should be basic stuff. The file-size limit is embarrassing. I made a forum post about it years ago and it didn't help. I think someone linked me to a large file library which just didn't work. It's 2017 and I still can't find a way to get bmax to handle large files. And 2g isn't even large these days. It's SUCH a limitation.


Henri(Posted February) [#20]
Hi,

not sure, but this issue might be fixed in Brucey's updated pub & brl modules found here: https://github.com/maxmods. Be sure to backup the old ones just in case.

By looking at the CopyStream function, the problem seems to be the bufSize param, which is an Int that holds a maximum value of 2147483647 (about 2.1 billion, so just under 2 gigabytes); beyond that the number can be anything.
Not sure if this is the only problem though.

-Henri


cps(Posted February) [#21]
This works on small files (it copies 'map1.jpg' to 'new2.jpg'), but I can't find a large file to try it on. Have fun cps



EdzUp MkII(Posted February) [#22]
You would have to write your own C copy module to get round this.

Could you not just read x bytes from the source file and write them to the target file without seeking, or am I missing something here?

Also, why such a big file? In all my years of coding I've never had files in the gigabytes.


grable(Posted February) [#23]
@cps
Reading 1 byte at a time is not recommended, especially for large files.
And it does not work on large files, I'm afraid, since the underlying APIs don't support >4gb, and TStream uses seeking to get the size, which it then uses to check eof.

You can implement your own using something like the stdio 64bit file APIs though. All you need is fopen64, ftelli64 & fseeki64, no need for C.
(i variants are Windows specific, for linux there is ftello64 & lseek64, but they use off64_t for offsets so I don't know if they are compatible with bmx Long)
But you have to abandon TStream and write your own using Longs instead, since TStream uses Ints all over the place.

EDIT: I spoke too soon, turns out Long return values from external functions are still not handled correctly in some cases. So ftelli64 needs a C forwarder to get the full value.
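The forwarder idea from the EDIT above can be sketched in C as follows: instead of returning a 64-bit value, each function writes it through a pointer, so the caller never relies on a Long return value. The names seek64, tell64 and size64 are made up for illustration, and the non-Windows branch assumes a POSIX system where fseeko/ftello are available with a 64-bit off_t.

```c
#define _FILE_OFFSET_BITS 64     /* ask glibc for a 64-bit off_t */
#define _POSIX_C_SOURCE 200809L  /* expose fseeko/ftello in strict modes */
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical wrapper: seek with a 64-bit offset. Returns 0 on success. */
int seek64(FILE *f, int64_t offset, int whence) {
#ifdef _WIN32
    return _fseeki64(f, offset, whence);
#else
    return fseeko(f, (off_t)offset, whence);
#endif
}

/* Hypothetical wrapper: store the current 64-bit position in *result
   (-1 on failure) instead of returning it. */
void tell64(int64_t *result, FILE *f) {
    assert(result != NULL);
#ifdef _WIN32
    *result = _ftelli64(f);
#else
    *result = (int64_t)ftello(f);
#endif
}

/* Hypothetical helper: store the file size in *result (-1 on failure)
   and put the stream back where it was. */
void size64(int64_t *result, FILE *f) {
    int64_t here;
    assert(result != NULL);
    tell64(&here, f);
    if (here < 0 || seek64(f, 0, SEEK_END) != 0) {
        *result = -1;
        return;
    }
    tell64(result, f);
    seek64(f, here, SEEK_SET);  /* restore the original position */
}
```

On the BlitzMax side, the caller would pass the address of a Long variable to receive the value; how that is declared in an Extern block is not shown here and would need checking against the BlitzMax documentation.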


grable(Posted February) [#24]
Here's something I had lying around, which I abandoned because I couldn't get Long return values to work. But with a small C stub it does :)
Just don't pass this to anything else, use it on its own. And use LongPos(), LongSize() and LongSeek() instead of the regular ones.

Stream64.c
#include <stdint.h>
#include <stdio.h>

void ftelli64( int64_t* result, FILE* f) {
	*result = _ftelli64(f);
}
Stream64.bmx



Streaksy(Posted February) [#25]
Grable: I must have a bad mingw. I've had problems setting it up from day one. I could only get one strange version to work. I get this error when I try to run that blitz code:

Building Stream64
Compiling:Stream64.c
gcc: installation problem, cannot exec `cc1': No such file or directory
Build Error: failed to compile E:/BMax Projects/Stream64/Stream64.c
Process complete


Derron(Posted February) [#26]
- use BMX-NG and you get "long"-file-support with TStream and the likes
or
- use a proper vanilla BMX installation with the installed MinGW-libs copied to BMX and the modules recompiled

With Brucey's BMK you could even get rid of /BlitzMax/libs and therefore store the MinGW you want in /BlitzMax/MinGW32, where it is automatically used (regardless of other MinGW installations you might have done before).


bye
Ron


grable(Posted February) [#27]
@Streaksy
Yeah, your MinGW is probably broken. As a last resort you can try compiling the C file on its own via GCC and import the object file instead. But it likely won't work either :/

@Derron
The Open Source version of BlitzMax is also nice, its bmk doesn't use /BlitzMax/libs either and uses GCC for linking. And one can use the latest TDM-GCC-64 without any trouble.
So for those having trouble with MinGW, that might be a solution.

I even backported the GCC linking to vanilla bmk ;)


cps(Posted February) [#28]
@grable Thanks for the heads up. Not too sure I'll ever want to copy such large files, but if I do I'll not be left wondering why it won't work. Cheers, have fun cps