FREEZE / MELT COMPRESSION PROGRAM This version is tested under SunOS 4.1.2, Xenix 2.3.2, MS-DOS. The following preprocessor symbols control the compilation of Freeze package (these can be added by hand into config.h after running ./configure): o COMPAT Turns on backwards compatibility with Freeze 1.0 (have you ever heard about it? :-) ) o FASTHASH Uses another hash function which gives some speedup, thus reducing the compression rate (!) o TEXT_DEFAULT (For MS-DOS only!) Define, if you freeze text files more often than binary ones. In this case don't forget to use "-i" to freeze binary files!! o TINY For segmented architectures - allows to build freeze in small model (data segment is less than 64K). o SMALL Decreases the amount of memory but without 64K restriction o DEFFILE Full name of the file with freeze's Huffman values; it is the only symbol defined in the Makefile to allow correlation with installation place. Other preprocessor symbols (DEBUG, GATHER_STAT) are for internal use only. The format of frozen (2.X) file is incompatible with that of frozen (1.0), but if this package is compiled with -DCOMPAT switch, you will able to unpack frozen (1.0) files, if you have them. ------------------- COMPILING ----------------------- Type "./configure", or "sh configure". If it runs successfully, you will see messages "creating config.h" and "creating Makefile". You should check them and probably you may have to make some corrections. After that simply type "make". ------------------- LINT ---------------------------- Some lint's complaints about `used/declared inconsistently' are (in my case) due to inconsistencies of /usr/include/* and /usr/lib/llib*ln. It isn't dangerous. ------------------- BUGS ---------------------------- Please send descriptions of found bugs, incompatibilities, etc. to leo@ipmce.su. ------------ SPEED & COMPRESSION RATE --------------- Compression rate depends somewhat of the hash table size and may vary with different static Huffman tables. Note: if you see 16-bits Compress works nearly as Freeze (on some big files), this means the maximum is gained, so LHA and ARJ won't do their job better by more than 1-1.5%. There are some files (I have one) that freeze compresses better than ARJ 2.20. Please! If your computer supports the string operations, try to write "asm" instructions (GNU style) which realize the memory comparison "(s1, s2, maxlength) --> matched length" in the minimum number of clock cycles. If a noticeable (5% or more) speedup is gained, please send me a message. --------------- POSSIBLE IMPROVEMENTS --------------- The high-level routines (freeze, melt) are almost independent from low-level routines (Get_Next_Match, Insert/Delete_Node, Encode/Decode_Char/Position), so if you want the speed and/or compression rate `a la vogue' you may replace the low-level routines with the homebrew (e. g.) ones and enjoy the results. (I tried to implement splay trees instead of Huffman ones and instead of static table for position information, but this gives nothing, alas.) --------- CALGARY COMPRESSION CORPUS RESULTS -------- 41127 bib.F 340447 book1.F 229188 book2.F 68610 geo.F 155157 news.F 10551 obj1.F 86216 obj2.F 19924 paper1.F 32439 paper2.F 54993 pic.F 14180 progc.F 17136 progl.F 11771 progp.F 22903 trans.F Average bits/byte on the standard set (except paper3-6) = 1104642 * 8 / 3141622 = 2.813 --------------- BACKGROUND INFORMATION ----------------- The format of a frozen file is as follows: (version 1.0 had only 2-bytes header) offset type value comment 0 byte 037 2 byte magic header 1 byte 0237 (version 1.0 - 0236) 2 short X (little endian) 4 byte Y X = 0 e e e e e d d d d c c c b b a \ > [a-f] are binary digits Y = 0 0 f f f f f f / a - number of 1-bit static Huffman codes in the `matching positions' table (see freeze.1) bb - number of 2-bit codes, etc. The numbers of 7- and 8-bits codes are evaluated from the conditions: sum(codes) = 62(dec), max code = 1111111(bin). E.g. values are: 0 1 1 1 4 10 27 18, what means: no 1-bit codes, one 2-bit, 3-bit and 4-bit codes, etc., so Huffman codes are: 00, 010, 0110, 01110, 01111, 10000, .... , 11111111. General format of frozen file is: magic header - table description - stream of bits The stream of bits is considered as a sequence of variable length dynamic Huffman codes (if their values are in the range of 0-255, they mean single bytes, special value of 256 means EOF, and further values mean the lengths of matched string.) If we have the value greater than 256, we get a static Huffman code from the stream (his value is 6 higher bits of matched string's position in the buffer), and then we get 7 bits literally. Because buffer length is 8192 bytes and maximum match length is 256 bytes, the position of matched string cannot be greater than 8192-256, that's why there is only (8192-256)/2^7 = 62 static codes. * * * The default table is tuned for both C texts and executable files (as in LHARC). If you freeze any other files (databases, images, fonts, etc.) you can calculate the matching positions distribution using the `statist' program, which calculates and displays the mentioned distribution for the given file. It is useful for large (100K or more) files. Though the built-in position table is polyvalent, the tuning can increase the compression rate up to one additional percent. (Or even more, if the matching strings distribution is very bizarre!) Usage: statist < sample_file ; you can also see the intermediate values and watch their changes by pressing INTR key when you wish. Note: If you use "gensample | statist", remember that INTR influence BOTH processes !! You may create the /etc/default/freeze (or rather /usr/local/lib/freeze.cnf, which is now the default, NOTE IT!) file (in MS-DOS it is FREEZE.CNF in the directory of FREEZE.EXE), which has the following format: name = ``statist's output (8 numbers)'', e.g.: ---------- cut here ----------- # This is freeze's defaults file gif = 0 0 0 0 2 60 0 0 # This is NOT! optimal data # for GIF files doc=0 0 1 2 7 16 36 0 # The sample was gcc.lp greedy2=0,1,1,2,5,8,11,34 # Both space- and comma-separated # lists are allowed # End of file ---------- cut here ----------- If you find values which are better THAN DEFAULT both for text (C programs) and binary (executable) files, please send them to me. Important note: statist.c is NOT a part of freeze package, it is an additional feature. !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! There is yet another additional feature since this version: showhuf - shows Huffman table in a frozen file. It can be used to find files frozen with freeze 1.0, too.