Chinaunix首页 | 论坛 | 博客
  • 博客访问: 430234
  • 博文数量: 114
  • 博客积分: 3361
  • 博客等级: 中校
  • 技术积分: 1060
  • 用 户 组: 普通用户
  • 注册时间: 2010-05-18 13:14
文章分类

全部博文(114)

文章存档

2012年(1)

2011年(84)

2010年(29)

分类: LINUX

2011-02-21 19:03:19

摘自

GZIP vs BZIP2

GNU zip (also known as GZIP) is a software application with the purpose to compress files. It was originally intended to replace the compress program used in the early systems – to be used in the GNU Project (a free software project).

BZIP2 is an open source lossless compression algorithm – basically, a class of data compression algorithms that makes it possible for the original data of a compressed file to be completely reconstructed from the compressed data.

GZIP is based on an algorithm known as DEFLATE. This is also a lossless data compression algorithm. It uses both the LZ77 algorithm and Huffman coding. Essentially, GZIP refers to the file format of the same name. This format is a 10-byte header which contains a magic number (which means a numerical or text value that never changes and is used to signify a file format or protocol, an unnamed numerical value that never changes, or distinct that cannot be mistaken for anything else), extra headers that may or may not actually be necessary (original file name, for example), a body that contains a DEFLATE -compressed payload (which is the data that the headers carry), and an 8-byte footer which contains a CRC-32 checksum, as well as the actual length of the original uncompressed data.

There are a variety of compression techniques that the BZIP2 format uses, which are stacked atop one another in several layers. They occur in a very distinctive order: Run-length encoding (which is any sequence of four to 255 duplicate symbols that is replaced by the first four symbols, and a length of coding that repeats between 0 and 251), Burrows-Wheeler transform (which is the reversible block-sort that makes up the very core of the BZIP2), Move to front (leaves the size of the processed block unaltered), Run-length encoding (which consists of long strands of symbols – usually zeros – that constantly repeat in the output, and are replaced by both the symbol and a sequence of two codes), Huffman coding (which is a process that replaces fixed length symbols of 8-bit bytes with changing length codes), Multiple Hoffman coding (which consist of multiple Hoffman tables of identical size), Unary base 1 encoding, Delta encoding, and Sparse bit array.

Summary:

1. GZIP is a free application used to compress files; BZIP2 is an open source lossless data compression algorithm that makes it possible to retrieve the original data of a compressed file.

2. GZIP consists of a 10-byte header, optional headers, a body, and an 8-byte footer; BZIP2 consists of no fewer than nine layers of compression techniques.


阅读(588) | 评论(0) | 转发(0) |
给主人留下些什么吧!~~