Text To Checksum (Source Included)

BlitzMax Forums/BlitzMax Programming/Text To Checksum (Source Included)

dw817(Posted 2015) [#1]
As part of a compression and verification algorithm I need for my engine, I want to have a string be converted to a single 16-bit number from 1 to 65535 where zero is no entry.

With these in place, we can start our code:
Strict
Local t$,i,cs,ok
Print
Repeat
  ok=1
  t$=Input$("Enter sample text to create checksum: ")
  If Len(t$)>16
    Print"too big, please keep text under 16-characters."
    ok=0
  EndIf
Until ok
Print
t$=Lower$(t$)
For i=1 To Len(t$)
  cs=cs+Asc(Mid$(t$,i,1))*i
Next
Print"Checksum # is "+cs
End


My question is, is there a better and more accurate way of getting a 16-bit checksum from a 15 or 16-character text ?

How would you code and improve it ?


Brucey(Posted 2015) [#2]
Here's one based on Fletcher-16 ( https://en.wikipedia.org/wiki/Fletcher%27s_checksum )

SuperStrict

Framework brl.standardio

Local txt:String = "Some Text"

ShowCheckSum("Some Text")
ShowCheckSum("some text")
ShowCheckSum("Small")
ShowCheckSum("Much longer text")
ShowCheckSum("1111111111111111")

Function CalcChecksum:Int(txt:String)

	Local sum1:Short
	Local sum2:Short
	
	For Local index:Int = 0 Until txt.length
		sum1 = (sum1 + txt[index]) Mod 255
		sum2 = (sum2 + sum1) Mod 255
	Next

	Return (sum2 Shl 8) | sum1

End Function

Function ShowCheckSum(txt:String)
	Print "Checksum for '" + txt + "' = " + CalcChecksum(txt)
End Function


Note that it can return '0' as a valid checksum. But you work around that if you need to.


Bobysait(Posted 2015) [#3]
the fletcher method can be improved by "moding" only on the return
(I didn't benched the mod function from blitzmax, but I remember the blitz3d one is pretty slow)




dw817(Posted 2015) [#4]
... interpreting. Looks to me like you are changing my step of multiplying by the index with adding the total to a new total.
cs=cs+Asc(Mid$(t$,i,1))
cs2=cs2+cs

Where cs2 mod 65536 is the total.

Actually I think that does add a very definite accuracy to the final checksum. Nicely done, Brucey and Bobysait. I'll experiment with this.

Experimenting ... yep, excellent. I tried "apple" and "bople." I get a single digit of difference though if you add only the ASCII by itself these come out the same. Good job guys !

Even with 100 characters of a string "~" the highest number I get is 63600. No need for a 16-character limit apparently. I'll be using MAP variables to store the return string.

For those curious, the completely accurate way of generating a checksum might be to multiply the totals each of the ASCII characters by themselves, but that generates too large a sum in a very short time that not even long integer or REAL is capable of handling it ... unless you do it this way:

Strict
Local t$,i,cs,ok
Print
Repeat
  ok=1
  t$=Input$("Enter text to create checksum for: ")
  If Len(t$)>15
    Print"too big, please keep text under 15-characters."
    ok=0
  EndIf
Until ok
Print
t$=Lower$(t$)
cs=Asc(Mid$(t$,1,1))
For i=2 To Len(t$)
  cs=cs*Asc(Mid$(t$,i,1))
  cs=cs Mod 65536
Next
Print"Checksum # is "+cs
End


My question is, is this more accurate or less accurate than adding an index twice as you - both Brucey and Bobysait have written above ?


Bobysait(Posted 2015) [#5]
it depends on the input, but your method can be very inaccurate :
In many cases (not that much, but really noticeable) it will return "0" (actually, at any time cs is 65536 it will ends with "0")

by the way, with blitzmax, you can use strings as byte pointer (faster than using Asc/Mid...), but if you want to use Asc, your initialisation use a useless Mid() -> Asc return the ascii code of the first char of the string (whatever the length of the string). -> cs = asc(t)

Anyway, there is still this "0" behaviour that ruins your algorithm.


Brucey(Posted 2015) [#6]
Good job guys !

To be honest, I just googled "16-bit checksum", and converted the C-code on the page referenced in my original post, to BlitzMax....


dw817(Posted 2015) [#7]
Bobysait:

* I was going to add code at the end:
if cs=0 then cs=1
but I wanted to show it was a true 16-bit checksum with no fudging of the numbers.

And yes, you are likely correct. NOW if I could just multiply the lot well past a trillion and ONLY mod 65536 for the end, it might be about the most accurate checksum of all. Sadly I don't think there is a way BlitzMAX can handle very VERY large numbers.

And that's understandable as seldom they would ever be needed.

Brucey:

* Now that I'm aware BlitzMAX can run C-code:

http://www.blitzbasic.com/Community/post.php?topic=69277&post=775359

... is there a section I can read (URL) to see many different source code examples of C-code used and implemented in Blitzmax to create a variety of unique effects and abilities ?

At one point I coded in C++ exclusively. It's interesting to note I could start some of this again, yet still maintain the strong variable classes in BlitzMAX.


Brucey(Posted 2015) [#8]
You may find a one or two of lines of C and C++ code integrated with BlitzMax at https://github.com/maxmods
and more specifically at https://github.com/maxmods/bah.mod
Also of interest may be https://github.com/bmx-ng

There are others if you search the forum.


dw817(Posted 2015) [#9]
Hi Brucey:

* I am seeing your programs.

https://github.com/maxmods/bah.mod

Do you have a page which describes what each does ?

Also I noted you are saying MOD 255 instead of MOD 256 above. Will that make much of a difference retrieving values 0-254 instead ?


Brucey(Posted 2015) [#10]
I am seeing your programs.

They aren't programs. They are modules that plug into BlitzMax intended to make it more useful. For example, regex.mod adds support for regular expressions (via pcre), libxml.mod adds an xml processor (via libxml), etc...
Do you have a page which describes what each does ?

There was, but since the migration from googlecode to github, it needs to be recreated. You can check here for an out-of-date list.

255 is $FF.
256 is $100, which is outside your 16-bit range.


dw817(Posted 2015) [#11]
Right, but you're doing a MOD, Brucey. A MOD always gets the number minus one of the selection.

n mod 256 will always retrieve a value 0-255, perfect for a byte. I just thought it was a typo for mod 255 which yields 0-254. It's not a big.

In truth, the checksum does get a bit doctored.

a=n mod 256
b=n/256
if a=0 then a=1
if a=13 then a=14
if a=94 then a=95
if b=0 then b=1
if b=13 then b=14
if b=94 then b=95
r=a+b*256

The reason being I am using INSTR to search for these critical elements yet I still need to have a valid 2-character identification mark enclosed and not always in the same position.

Checking your list. ... .. . ? Wow you've been busy.

Does your BASS or FMOD play MIDI and can retrieve the playing position and length ?

Now I am certainly all set with Casaber's wonderful media player and routines. I'm quite familiar with MCI. Yet - it never hurts to learn more techniques if possible.