Verify Function, anyone got one?


ThePict(Posted 2011) [#1]
My game-related data files are downloaded from the web at the start of the game, then sometimes altered during play, but not always.

I had an idea today about downloading the datafile and copying a backup to the local drive. Then, if the main datafile is altered, a quick Verify function comparing the main file and the backup would tell me if any changes had been made. If none, there is no need to upload the file, saving time, bandwidth and roaming charges if you're out and about.

I thought I'd ask the forum if anyone has written a bb func to compare two files and return True if they are identical, before I go off re-inventing the wheel.


Warner(Posted 2011) [#2]
Maybe you could give the file a version number or so, and then compare the two numbers? I.e. check their "last modified" date. I believe that is a very common method. Otherwise, I've often heard the term 'checksum'. I'm not completely sure what it is, but it must have a Wikipedia page. It is used to verify that a file transfer has transferred all data properly - somewhat along the lines of: if both checksums match, it is rather safe to assume that both files' contents are the same.


jfk EO-11110(Posted 2011) [#3]
A simple checksum test would work this way:
Read the file byte by byte, adding the byte values to a variable, then "cut off" everything above the 16th bit (a = a And $FFFF). The result is the 16-bit checksum. Do it with both files and compare the checksums; if they are the same, the files are identical (99.999999 percent for sure).

a=a+readbyte(file)
a=a and $ffff
...
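
A minimal sketch of that idea as a self-contained function (the function name and file names below are just placeholders):

; 16-bit additive checksum of a file, as described above.
; Returns -1 if the file could not be opened.
Function Checksum16%(path$)
	Local file = ReadFile(path$)
	If file = 0 Then Return -1
	Local a% = 0
	While Not Eof(file)
		a = a + ReadByte(file)   ; add each byte
		a = a And $FFFF          ; keep only the low 16 bits
	Wend
	CloseFile file
	Return a
End Function

; Example (placeholder file names): if the checksums match, the files almost certainly match
;If Checksum16("main.dat") = Checksum16("backup.dat") Then Print "No changes detected"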


ThePict(Posted 2011) [#4]
Yes, I thought about a checksum-type thing, but on a different tack.
My datafiles are straight ASCII text files, quite readable with Notepad, and thus could be compared line by line with ReadLine(f1) = ReadLine(f2).
Cumbersome and clunky, but it would detect any tampering.

I'd hoped to encrypt the datafiles with a simple key, thereby making them binary files, less tamperable and certainly not readable with Notepad. Not sure if my compare-line-by-line would still work. Could be worth experimenting with...
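
A rough sketch of that line-by-line comparison, assuming both files are plain text (the function and file variable names are placeholders):

; Compare two text files line by line. Returns True only if every line
; matches and both files end at the same point.
Function LinesMatch%(pathA$, pathB$)
	Local f1 = ReadFile(pathA$)
	Local f2 = ReadFile(pathB$)
	If f1 = 0 Or f2 = 0
		If f1 Then CloseFile f1
		If f2 Then CloseFile f2
		Return False
	End If
	Local same% = True
	While (Not Eof(f1)) And (Not Eof(f2))
		If ReadLine(f1) <> ReadLine(f2)
			same = False
			Exit
		End If
	Wend
	; if one file still has lines left over, they differ in length
	If Eof(f1) <> Eof(f2) Then same = False
	CloseFile f1
	CloseFile f2
	Return same
End Function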


Yasha(Posted 2011) [#5]
Note that comparing the files line by line requires them to both be present, so you'd be negating the whole saving-bandwidth advantage (how big are these data files?). A checksum can be worked out once in advance, and costs next to nothing to download.

There's apparently (haven't looked at it) an MD5 function here: http://www.blitzbasic.com/codearcs/codearcs.php?code=278 (Wikipedia suggests MD5 should not be used for anything important)


ThePict(Posted 2011) [#6]
I'd be comparing the files locally on the user's computer to check for changes prior to uploading updated files - if there have been no changes, there's no need to update the remote file.
Had a look at the MD5 link - all Greek to me, I'm afraid.
I'll continue my experiments with my Verify function to compare two 'local' files for identicalness. I'll roll out the results in the codearcs section.


Yasha(Posted 2011) [#7]
If that's the route you want to take, I have a couple of suggestions:

1) Compare file size before you compare file content - the majority of changes to a human-readable text file will change its size too, so you never need to open the file to see that it's different. Big performance improvement.

2) Don't bother with ReadLine. Use ReadBytes to dump the binary contents of both files into a bank, then iterate over its length, comparing each four-byte chunk as an integer (you could compare the individual bytes, but that just means four times as many jump and compare instructions for the loop).

Suggestion 2) has one slight problem: if Mac or Linux users edit your files, they may use the wrong line-ending characters and create "differences" where none appear to exist to the reader... but I don't know if ReadLine accounts for this anyway (to be honest, very few if any false positives of this sort will occur... it's probably a small enough number that it's more efficient to ignore them).
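
A possible sketch of suggestions 1) and 2) combined: compare sizes first, then dump both files into banks and compare int-sized chunks, finishing any leftover bytes one at a time (the function and variable names are placeholders):

; Returns True if both files have the same size and the same bytes.
Function FilesIdentical%(pathA$, pathB$)
	Local size% = FileSize(pathA$)
	If size <> FileSize(pathB$) Then Return False   ; different sizes - no need to open anything
	If size <= 0 Then Return True                   ; both empty (or both missing)

	Local bankA = CreateBank(size)
	Local bankB = CreateBank(size)
	Local f1 = ReadFile(pathA$)
	Local f2 = ReadFile(pathB$)
	ReadBytes bankA, f1, 0, size
	ReadBytes bankB, f2, 0, size
	CloseFile f1
	CloseFile f2

	Local same% = True
	Local pos% = 0
	; compare four bytes at a time while at least one whole int remains
	While pos <= size - 4
		If PeekInt(bankA, pos) <> PeekInt(bankB, pos)
			same = False
			Exit
		End If
		pos = pos + 4
	Wend
	; compare any remaining 1-3 bytes individually
	While same And pos < size
		If PeekByte(bankA, pos) <> PeekByte(bankB, pos) Then same = False
		pos = pos + 1
	Wend

	FreeBank bankA
	FreeBank bankB
	Return same
End Function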


jfk EO-11110(Posted 2011) [#8]
If you read the bank byte by byte or int by int, both are very fast, assuming the file is not in the gigabyte range. But using ints requires the size to be divisible by 4, complicating things a little bit.


_PJ_(Posted 2011) [#9]
I use something a little more complex, but the basics are here:

	; Wrapped as a function here so the snippet is self-contained; the function
	; name and the parameter sfp_CheckFile$ (the path of the file to check) are placeholders.
	Function SampleChecksum%(sfp_CheckFile$)
		Local nc_Byte%
		Local n4_Size% = FileSize(sfp_CheckFile)
		Local n4_End% = n4_Size - 1
		If n4_Size <= 0 Then Return -1   ; missing or empty file
		
		Local nc_Start% = False
		Local hst_ReadFile = ReadFile(sfp_CheckFile)
		
		; a one-byte file contributes its single byte
		If (n4_Size = 1)
			nc_Byte = ReadByte(hst_ReadFile)
		End If
		
		; step size between sampled bytes, also folded into the checksum
		Local nc_Chunks% = Log(n4_Size) Mod 256
		If nc_Chunks < 1 Then nc_Chunks = 1   ; avoid a zero step (and an endless loop) on tiny files
		nc_Byte = (nc_Byte + nc_Chunks) Mod 256
		
		Local n4_IterByte% = nc_Start
		
		; sample bytes at intervals through the file, folding each into a single byte
		While (n4_IterByte < n4_End)
			SeekFile(hst_ReadFile, n4_IterByte)
			nc_Byte = nc_Byte + ReadByte(hst_ReadFile)
			nc_Byte = nc_Byte Mod 256
			If (n4_End - n4_IterByte) > nc_Chunks
				n4_IterByte = n4_IterByte + nc_Chunks
			Else
				n4_IterByte = n4_IterByte + 1
			End If
		Wend
		CloseFile hst_ReadFile
		
		Return nc_Byte
	End Function
	


Basically, this builds a single-byte checksum by testing bytes at key points throughout the file.
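
For reference, a possible way to call the wrapped snippet above (the function name and file names are placeholders):

; compare the sampled checksums of the downloaded file and the local backup
If SampleChecksum("main.dat") = SampleChecksum("backup.dat")
	Print "Probably unchanged - no upload needed"
End If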