
This code has been declared by its author to be Public Domain code.

LZMA Compression by Otus (2008)
This LZMA compression module works like Pub.zlib but uses the public domain LZMA SDK (http://www.7-zip.org/sdk.html) instead of zlib for compression, hopefully giving a better compression ratio. See http://en.wikipedia.org/wiki/LZMA for information about the algorithm.

If you don't want to go through the hassle of installing, you can download a zipped module version here: http://jan.varho.org/blog/programming/blitz/lzma-module-version-1-01/

Installation:

1 - Save the two listings below as lzma.bmx and LzmaEnc.c

2 - Download the LZMA SDK from http://www.7-zip.org/sdk.html

3 - Copy the "C" directory from the SDK next to lzma.bmx and rename it "lzmasdk"

4 - Build modules and docs.

'lzma.bmx
SuperStrict

Rem
bbdoc: Lzma compression
End Rem
Rem
Module Otus.Lzma

ModuleInfo "Version: 1.0"
ModuleInfo "Author: Igor Pavlov (7-zip.org)"
ModuleInfo "License: Public domain"
ModuleInfo "Credit: BlitzMax interface by Jan Varho"
ModuleInfo "History: 1.01 Release"
ModuleInfo "History: Fixed interface to exactly match zlib"
ModuleInfo "History: Removed redundant wrapper"
ModuleInfo "History: Upgraded SDK to 4.65"
End Rem
Import "LzmaEnc.c"
Import "lzmasdk/LzmaUtil/Lzma86Dec.c"
Import "lzmasdk/Alloc.c"
Import "lzmasdk/Bra86.c"
Import "lzmasdk/LzmaEnc.c"
Import "lzmasdk/LzmaDec.c"
Import "lzmasdk/LzFind.c"

Extern

Rem
bbdoc: Uncompress a block of data
End Rem
Function LzmaUncompress( dest:Byte Ptr, destLen:Int Var, src:Byte Ptr, srcLen:Int Var ) = "Lzma86_Decode"

Rem
bbdoc: Compress at the compression level given using a specified dictionary size
about:
Compression level should be in the range 1-9 with 9 the maximum compression.

Valid dictionary sizes are between 2^12 and 2^27 bytes. A power of two is recommended.
The default (used in LzmaCompress and LzmaCompress2) is 2^24 bytes (16 MB).
End Rem
Function LzmaCompress3( dest:Byte Ptr, destLen:Int Var, src:Byte Ptr, srcLen:Int, level:Int, dictSize:Int = LZMA_DICT_SIZE ) = "_LzmaCompress"

End Extern

' Dictionary size in bytes (16MB)
Const LZMA_DICT_SIZE:Int = $1000000

Rem
bbdoc: Compress a block of data at default compression level
end rem
Function LzmaCompress( dest:Byte Ptr, destLen:Int Var, src:Byte Ptr, srcLen:Int )
	LzmaCompress3( dest, destLen, src, srcLen, 5, LZMA_DICT_SIZE )
End Function

Rem
bbdoc: Compress a block of data at the compression level given
about:
Compression level should be in the range 1-9 with 9 the maximum compression.
End Rem
Function LzmaCompress2( dest:Byte Ptr, destLen:Int Var, src:Byte Ptr, srcLen:Int, level:Int )
	LzmaCompress3( dest, destLen, src, srcLen, level, LZMA_DICT_SIZE )
End Function

// LzmaEnc.c
// Wrapper for the Encode function without filtering

#include "lzmasdk/LzmaUtil/Lzma86Enc.c"

void _LzmaCompress( Byte *dest, size_t *destLen, const Byte *src, size_t srcLen,
    int level, UInt32 dictSize )
{
	Lzma86_Encode( dest, destLen, src, srcLen, level, dictSize, SZ_FILTER_NO );
}
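
The dictionary-size limits given in the LzmaCompress3 documentation (2^12 to 2^27 bytes, a power of two recommended) could be checked before the value reaches the wrapper. The following is only a sketch under that assumption: the helper names are hypothetical and are not part of the LZMA SDK or of this module.

```c
#include <assert.h>
#include <stdint.h>

/* Limits taken from the doc comment in lzma.bmx: 2^12 .. 2^27 bytes. */
#define DICT_SIZE_MIN ((uint32_t)1 << 12)
#define DICT_SIZE_MAX ((uint32_t)1 << 27)

/* Hypothetical helper: 1 if the size lies within the documented range. */
static int dict_size_in_range(uint32_t size)
{
    return size >= DICT_SIZE_MIN && size <= DICT_SIZE_MAX;
}

/* Hypothetical helper: 1 if exactly one bit is set. */
static int is_power_of_two(uint32_t size)
{
    return size != 0 && (size & (size - 1)) == 0;
}

/* Hypothetical helper: clamp into the documented range, then round
   down to the nearest power of two. */
static uint32_t normalize_dict_size(uint32_t size)
{
    uint32_t p = DICT_SIZE_MIN;

    if (size < DICT_SIZE_MIN)
        return DICT_SIZE_MIN;
    if (size > DICT_SIZE_MAX)
        return DICT_SIZE_MAX;

    /* size <= 2^27 here, so p << 1 never overflows 32 bits. */
    while ((p << 1) <= size)
        p <<= 1;

    assert(is_power_of_two(p) && dict_size_in_range(p));
    return p;
}
```

With these helpers, the default LZMA_DICT_SIZE ($1000000, i.e. 2^24) passes both checks unchanged, while an arbitrary value such as 3000000 would be rounded down to 2^21.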

Comments

Otus (2008)
Simple test app.

'Tests that lzma module works.

SuperStrict

Framework BRL.StandardIO

Import "lzma.bmx"

Const DATA_BYTES% = 2560000


Print "Generating "+DATA_BYTES+" bytes of sequential data..."

Local rsize% = DATA_BYTES
Local raw:Byte[rsize]
Local rbuf:Byte Ptr = raw

For Local i% = 0 Until DATA_BYTES
	rbuf[i] = i
Next

Print "Done."


Print "Compressing data using default level..."

Local csize% = DATA_BYTES
Local comp:Byte[csize]

LzmaCompress comp, csize, raw, rsize

Print "Done: "+csize+" bytes."


Print "Compressing data using maximum compression level..."

csize = DATA_BYTES

LzmaCompress2 comp, csize, raw, rsize, 9

Print "Done: "+csize+" bytes."


Print "Uncompressing data..."

Local dsize% = DATA_BYTES
Local dec:Byte[dsize]

LzmaUncompress dec, dsize, comp, csize

Print "Done: "+dsize+" bytes."


Print "Verifying integrity..."

If dsize <> DATA_BYTES Then Print "Failed!" ; End

Local dbuf:Byte Ptr = dec

For Local i% = 0 Until dsize
	If Byte(dbuf[i] - i) Then Print "Failed!" ; End
Next

Print "Done."



xlsior (2008)
nice!


plash (2008)
Does LZMA compress faster/smaller than zip?


xlsior (2008)
Smaller than zip, apparently. From the wiki link:

The Lempel-Ziv-Markov chain-Algorithm (LZMA) is an algorithm used to perform data compression. It has been under development since 1998[1] and is used in the 7z format of the 7-Zip archiver. This algorithm uses a dictionary compression scheme somewhat similar to LZ77 and features a high compression ratio (generally higher than bzip2


I know that the 7zip native format tends to be smaller than plain vanilla .zip on average, so this one should/could be too.


Otus (2008)
Yes, smaller. LZMA is used extensively in, for example, live Linux distributions precisely because of its better compression ratio. It should also have relatively fast decompression, though I haven't profiled this implementation. I haven't tested this extensively yet, but it seems like image data compresses 10-20% better than with zlib.

I can post benchmarks later.


ImaginaryHuman (2010)
This looks cool and I'd like to use it but I'm getting compilation errors. I even tried reinstalling the sdk part with version 4-whatever.

Building test
Compiling:test.bmx
flat assembler version 1.68 (1680888 kilobytes memory)
3 passes, 2335 bytes.
Linking:test.exe
C:/Users/Admin/Documents/DocumentsByMe/Other/BlitzMax1.36/mod/otus.mod/lzma.mod/lzma.release.win32.x86.a(LzmaEnc.c.release.win32.x86.o):LzmaEnc.c:(.text+0x3d): undefined reference to `LzmaEncProps_Init'
C:/Users/Admin/Documents/DocumentsByMe/Other/BlitzMax1.36/mod/otus.mod/lzma.mod/lzma.release.win32.x86.a(LzmaEnc.c.release.win32.x86.o):LzmaEnc.c:(.text+0x19c): undefined reference to `LzmaEncode'
Build Error: Failed to link C:/Users/Admin/Documents/DocumentsByMe/Other/BlitzMax1.36/mod/Otus.mod/Lzma.mod/test.exe
Process complete


ImaginaryHuman (2010)
Hmm, actually your code above does work, but the code on your blog/website/google code gives the above errors.

I'm also not sure your module is compressing properly... in the test it compresses 2.5 megabytes down to 678 bytes!

Generating 2560000 bytes of sequential data...
Done.
Compressing data using default level...
Done: 678 bytes.

And if I put random numbers in the array rather than increasing integers, it gets even smaller, down to 14 bytes!

Generating 2560000 bytes of sequential data...
Done.
Compressing data using default level...
Done: 14 bytes.

This cannot be correct at all. Does anyone have this working properly?


Otus (2010)
Hi,

I emailed you, but here's the answer for others:

678 bytes is correct. 14 bytes is an error: incompressible data may need a larger target buffer (larger csize) than the input size.
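
A rough way to size the target buffer, sketched in C. Two assumptions are baked in and should be checked against your SDK version: the 14-byte figure is my reading of the Lzma86 header (1 filter byte, 5 LZMA property bytes, 8 bytes of uncompressed size), which would also fit the 14-byte result reported above, and the srcLen / 3 + 128 margin is a commonly used rule of thumb rather than a documented guarantee. Both helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed size of the header written before the LZMA stream. */
#define HEADER_BYTES_ASSUMED 14

/* Hypothetical helper: a conservative capacity for the destination
   buffer, so that incompressible input still fits. */
static size_t lzma_dest_capacity(size_t srcLen)
{
    size_t cap = srcLen + srcLen / 3 + 128 + HEADER_BYTES_ASSUMED;

    assert(cap > srcLen); /* guards against size_t overflow */
    return cap;
}

/* Hypothetical heuristic: a reported length no larger than the header
   suggests that no compressed payload was actually written. */
static int looks_like_failed_encode(size_t srcLen, size_t reportedLen)
{
    return srcLen > 0 && reportedLen <= HEADER_BYTES_ASSUMED;
}
```

In the test app this would mean allocating comp with something like the capacity above instead of DATA_BYTES, and treating a csize of 14 as a failure rather than a result.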

I don't have the linking problem on XP or Ubuntu.


ImaginaryHuman (2010)
Hey, yes, thanks, I needed a bigger buffer because the compressed size was a bit bigger than the original - which is typical for random data.

Okay... so now here is another issue. I've been using the lzma compression (above) on a variety of files, and I find that LzmaUncompress sometimes reports the length of the decompressed data as 1 byte longer than it should be. It doesn't happen with all files. I was trying to figure out why my decompressed data wasn't working properly, and it was because there was an extra 0 byte on the end of it, due to the length reported by LzmaUncompress.

I don't know if that means it's a bug in LZMA or in your wrapper? All of the bytes in the decompressed file are the same as the original file, it's just the length can be slightly off. Thankfully it's only off by an additional byte, occasionally, which isn't too difficult to work around. It just means you have to include a list of the original file sizes as part of the app and use that to determine how much data is output from the decompression.
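
One possible alternative to shipping a separate list of file sizes, sketched in C: if the compressed block starts with the Lzma86 header as I understand it (1 filter byte, 5 LZMA property bytes, then the uncompressed size as 8 little-endian bytes), the original length can be read back from the block itself and compared with what LzmaUncompress reports. The offsets are assumptions taken from my reading of the SDK headers, and read_stored_size is a hypothetical name; the SDK appears to offer the same thing as Lzma86_GetUnpackSize.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SIZE_OFFSET_ASSUMED 6   /* 1 filter byte + 5 LZMA property bytes */
#define HEADER_BYTES_ASSUMED 14 /* SIZE_OFFSET_ASSUMED + 8 size bytes */

/* Hypothetical helper: reads the uncompressed size stored in the
   header. Returns 0 on success, -1 if the buffer is too short. */
static int read_stored_size(const unsigned char *src, size_t srcLen,
                            uint64_t *size)
{
    uint64_t value = 0;
    unsigned i;

    assert(src != NULL && size != NULL);
    if (srcLen < HEADER_BYTES_ASSUMED)
        return -1;

    /* Assemble the 64-bit value from 8 little-endian bytes. */
    for (i = 0; i < 8; i++)
        value |= (uint64_t)src[SIZE_OFFSET_ASSUMED + i] << (8 * i);

    *size = value;
    return 0;
}
```

If the stored size and the reported length disagree, trusting the stored size would drop the stray trailing byte without any external bookkeeping. This does not explain where the extra byte comes from.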

Any ideas what's going on there?

Also I am trying to reinstall the module and I am now getting the above errors for every version, even the version that is quoted above, version 1.01 zip, 1.02 zip, and direct from the 7z sdk. What's going on?

