String compression advice

MadJack(Posted 2009) [#1]
If I had a very long string containing numbers + alphabetic characters
e.g.
+1200121050023/0200134000233430+1200121450012

what would be a good method to compress the string into a smaller string?


Gabriel(Posted 2009) [#2]
Can we assume this string is a packet of network data? So the speed of compression would be critical?

If so, and if memory serves, I'm pretty sure I remember Antony Wells tested zip compression for realtime networking and found that to be pretty performant. I can't offer any personal experience, as I've never gone beyond four players over a network (KSPool) so compression was never necessary for me.


MadJack(Posted 2009) [#3]
"Can we assume this string is a packet of network data? So the speed of compression would be critical?"


Actually no - I'm toying with the idea of storing custom map data in a string short enough to be copied/pasted in emails - so the speed of de/compression is not an issue.


Warner(Posted 2009) [#4]
Wasn't there a ZIP userlib for Blitz? (Edit: oh, Gabriel already mentioned it, sorry.) The link can be found here:
http://www.blitzbasic.com/Community/posts.php?topic=63176
If you can convert your string into a bank, you could use CompressBank, then, after emailing, use UnCompressBank, and turn the bank into a string again.
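
As a rough illustration of that round trip (compress, send as text, decompress), here is a minimal sketch in Python rather than Blitz3D. It uses Python's standard zlib module as a stand-in for the userlib's CompressBank/UnCompressBank, whose exact signatures are not shown in this thread. The Base64 step is an added assumption: compressed bytes are binary, so they need a text-safe encoding before they can be pasted into an email. The function names to_pasteable and from_pasteable are made up for the example.

```python
import base64
import zlib


def to_pasteable(text):
    """Compress an ASCII string and return a text-safe Base64 string."""
    raw = zlib.compress(text.encode("ascii"), 9)  # 9 = best compression
    return base64.b64encode(raw).decode("ascii")


def from_pasteable(blob):
    """Reverse to_pasteable: Base64-decode, then decompress."""
    return zlib.decompress(base64.b64decode(blob)).decode("ascii")


if __name__ == "__main__":
    map_data = "+1200121050023/0200134000233430+1200121450012" * 50
    blob = to_pasteable(map_data)
    print(len(map_data), "->", len(blob))
    assert from_pasteable(blob) == map_data
```

General-purpose compression like this tends to pay off on long, repetitive data. On inputs as short as the 45-character example, the zlib header and the Base64 expansion can outweigh any savings.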


Ross C(Posted 2009) [#5]
Compression, if it's just numbers like that and a couple of symbols, should be fairly easy, man. You have 0,1,2,3,4,5,6,7,8,9,+,-,/,* (14 different characters, based on the string you posted). So each character should take 4 bits at most: 4 bits give you 16 different states, so codes 0 to 13 cover the characters, 14 is spare, and 15 can be the termination code. 4 divides nicely into 16 (the bits in a 16-bit character) and gives you 4 symbols per character.

So, you should be able to shrink the above to about a quarter of its length.

In your code you will need an array that maps each 4-bit value to its character, so:


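; String array ($): a plain integer array would turn "+" into 0
Dim com_array$(15)

; Codes 0 to 9 map to the digits "0" to "9"
For i = 0 To 9
   com_array$(i) = Str(i)
Next

; Codes 10 to 13 map to the symbols; 14 is unused, 15 is the terminator
com_array$(10) = "+"
com_array$(11) = "-"
com_array$(12) = "*"
com_array$(13) = "/"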



Then you can divide the string into character chunks of 4:

+1200121050023/0200134000233430+1200121450012

the first chunk would be:

"+120"

take each character in turn and convert it into bits, using your array...

1010 + 0001 + 0010 + 0000

That gives you a 16-bit character code (1010000100100000), which you can join together with the codes for the rest of the chunks. However, in hindsight, I don't know if you can copy and paste 16-bit character codes (UTF-16), or whether Blitz3D supports them... So you might have to opt for 8-bit ASCII codes, which means you'll only be able to halve your string size, which is still pretty good, I reckon.
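
The 8-bit variant described above (two 4-bit codes per byte) can be sketched as follows. This is a Python illustration, not Blitz3D and not a finished implementation. It assumes the same code order as com_array (digits, then +, -, *, /) and uses 15 as the terminator to pad odd-length input. The names ALPHABET, pack and unpack are invented for the example.

```python
ALPHABET = "0123456789+-*/"  # codes 0..13, same order as com_array
TERMINATOR = 15              # padding / end marker; code 14 is unused


def pack(text):
    """Pack two 4-bit character codes into each byte."""
    codes = [ALPHABET.index(ch) for ch in text]  # ValueError on other chars
    if len(codes) % 2:
        codes.append(TERMINATOR)  # pad odd-length input to a whole byte
    return bytes((codes[i] << 4) | codes[i + 1]
                 for i in range(0, len(codes), 2))


def unpack(data):
    """Reverse pack: split each byte into two codes, stop at the terminator."""
    out = []
    for byte in data:
        for code in (byte >> 4, byte & 0x0F):
            if code == TERMINATOR:
                return "".join(out)
            out.append(ALPHABET[code])
    return "".join(out)


if __name__ == "__main__":
    s = "+1200121050023/0200134000233430+1200121450012"
    packed = pack(s)
    print(len(s), "characters ->", len(packed), "bytes")
    assert unpack(packed) == s
```

With this mapping, "+120" packs to the two bytes A1 and 20 in hex, which matches the bit pattern 1010 0001 0010 0000. The packed bytes are arbitrary 8-bit values, so pasting them into an email would still need a text-safe encoding on top.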


MadJack(Posted 2009) [#6]
RossC

Heh - I guess I could have thought this through for myself, but this is why I post questions to the forum. Thanks for the pointer, I'll develop it further.


Andy(Posted 2009) [#7]

"If I had a very long string containing numbers + alphabetic characters
e.g.
+1200121050023/0200134000233430+1200121450012

what would be a good method to compress the string into a smaller string?"



I see numbers and '/' and '+', but what other alphabetic characters are there?

Generally you model the compression technique on the data.
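
One way to act on that is to survey which symbols the data actually uses before choosing a bit width. The sketch below is in Python for illustration, and the helper name bits_per_symbol is made up. It only looks at the single sample string from the first post, so its result is an assumption about the real map data, not a fact about it.

```python
def bits_per_symbol(symbol_count):
    """Smallest number of bits that can tell symbol_count symbols apart."""
    return max(1, (symbol_count - 1).bit_length())


if __name__ == "__main__":
    sample = "+1200121050023/0200134000233430+1200121450012"
    symbols = sorted(set(sample))
    bits = bits_per_symbol(len(symbols))
    total_bits = len(sample) * bits
    print("symbols used:", "".join(symbols))
    print("bits per symbol:", bits)
    print("packed size in bytes:", (total_bits + 7) // 8)
```

The sample uses only eight distinct symbols (the digits 0 to 5 plus + and /), so 3 bits per symbol would cover it. If the full data can contain other digits, operators or letters, the alphabet and the bit width grow accordingly.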


Andy(Posted 2009) [#8]
>MadJack<

This might be useful for you.
http://www.blitzbasic.com/codearcs/codearcs.php?code=2405