Collisions in the MD5 cryptographic hash function

dorainm's blog

Exploits

As we will explain below, the algorithm of Wang and Yu can be used to create files of arbitrary length that have identical MD5 hashes, and that differ only in 128 bytes somewhere in the middle of the file. Several people have used this technique to create pairs of interesting files with identical MD5 hashes:

- Magnus Daum and Stefan Lucks have created two PostScript files with identical MD5 hash, of which one is a letter of recommendation, and the other is a security clearance.
- Eduardo Diaz has described a scheme by which two programs could be packed into two archives with identical MD5 hash. A special "extractor" program turns one archive into a "good" program and the other into an "evil" one.
- In 2007, Marc Stevens, Arjen K. Lenstra, and Benne de Weger used an improved version of Wang and Yu's attack, known as the chosen prefix method, to produce two executable files with the same MD5 hash but different behaviors. Unlike the old method, where the two files could only differ in a few carefully chosen bits, the chosen prefix method allows two completely arbitrary files to have the same MD5 hash, by appending a few thousand bytes at the end of each file. (Added Jul 27, 2008).

An evil pair of executable programs

The following is an improvement of Diaz's example, which does not need a special extractor. Here are two pairs of executable programs (one pair runs on Windows, one pair on Linux).

Windows version:
- hello.exe. MD5 Sum: cdc47d670159eef60916ca03a9d4a007
- erase.exe. MD5 Sum: cdc47d670159eef60916ca03a9d4a007
Linux version (i386):
- MD5 Sum: da5c61e1edc0f18337e46418e48c1290
- MD5 Sum: da5c61e1edc0f18337e46418e48c1290

These programs must be run from the console. Here is what happens if you run them:

C:\TEMP> md5sum hello.exe
cdc47d670159eef60916ca03a9d4a007
C:\TEMP> .\hello.exe
Hello, world!

(press enter to quit)
C:\TEMP>

C:\TEMP> md5sum erase.exe
cdc47d670159eef60916ca03a9d4a007
C:\TEMP> .\erase.exe
This program is evil!!!
Erasing hard drive...1Gb...2Gb... just kidding!
Nothing was erased.

(press enter to quit)
C:\TEMP>

How it works

The above files were generated by exploiting two facts: the block structure of the MD5 function, and the fact that Wang and Yu's technique works for an arbitrary initialization vector. To understand what this means, it is useful to have a general idea of how the MD5 function processes its input. This is done by an iteration method known as the Merkle-Damgård method. A given input file is first padded so that its length is a multiple of 64 bytes. It is then divided into individual 64-byte blocks M₀, M₁, ..., M_{n-1}. The MD5 hash is computed by computing a sequence of 16-byte states s₀, ..., s_n, according to the rule s_{i+1} = f(s_i, M_i), where f is a certain fixed (and complicated) function. Here, the initial state s₀ is fixed, and is called the initialization vector. The final state s_n is the computed MD5 hash.
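The iteration described above can be sketched in Python. This is an illustrative reimplementation following the public MD5 specification (RFC 1321), not code from the evilize package; the names `compress`, `pad`, and `md5`, and the optional `iv` parameter, are choices made for this sketch. The `iv` parameter is there only to show that the same iteration can be started from any 16-byte state.

```python
import hashlib
import math
import struct

# Per-step rotation amounts and sine-derived constants from RFC 1321.
SHIFTS = [7, 12, 17, 22] * 4 + [5, 9, 14, 20] * 4 + [4, 11, 16, 23] * 4 + [6, 10, 15, 21] * 4
K = [int(abs(math.sin(i + 1)) * 2**32) & 0xFFFFFFFF for i in range(64)]
IV = (0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476)  # standard s_0
MASK = 0xFFFFFFFF


def rotl(x, n):
    """Rotate a 32-bit word left by n bits."""
    return ((x << n) | (x >> (32 - n))) & MASK


def compress(state, block):
    """f(s, M): fold one 64-byte block into a 16-byte state (four 32-bit words)."""
    m = struct.unpack("<16I", block)
    a, b, c, d = state
    for i in range(64):
        if i < 16:
            f, g = (b & c) | (~b & d), i
        elif i < 32:
            f, g = (d & b) | (~d & c), (5 * i + 1) % 16
        elif i < 48:
            f, g = b ^ c ^ d, (3 * i + 5) % 16
        else:
            f, g = c ^ (b | ~d), (7 * i) % 16
        f = (f + a + K[i] + m[g]) & MASK
        a, d, c = d, c, b
        b = (b + rotl(f, SHIFTS[i])) & MASK
    return tuple((x + y) & MASK for x, y in zip(state, (a, b, c, d)))


def pad(data):
    """Append 0x80, zero bytes, and the 64-bit bit length (multiple of 64 bytes)."""
    padded = data + b"\x80"
    padded += b"\x00" * ((56 - len(padded)) % 64)
    return padded + struct.pack("<Q", (8 * len(data)) & 0xFFFFFFFFFFFFFFFF)


def md5(data, iv=IV):
    """Iterate s_{i+1} = f(s_i, M_i) over all padded blocks; return s_n as hex."""
    state = iv
    padded = pad(data)
    for offset in range(0, len(padded), 64):
        state = compress(state, padded[offset:offset + 64])
    return struct.pack("<4I", *state).hex()


message = b"Hello, world!"
print(md5(message) == hashlib.md5(message).hexdigest())
```

The final line compares the sketch against Python's built-in `hashlib.md5` as a sanity check.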

The method of Wang and Yu makes it possible, for a given initialization vector s, to find two pairs of blocks M,M' and N,N', such that f(f(s, M), M') = f(f(s, N), N'). It is important that this works for any initialization vector s, and not just for the standard initialization vector s₀.

Combining these observations, it is possible to find pairs of files of arbitrary length, which are identical except for 128 bytes somewhere in the middle of the file, and which have identical MD5 hash. Indeed, let us write the two files as sequences of 64-byte blocks:

M₀, M₁, ..., M_{i-1}, M_i, M_{i+1}, M_{i+2}, ..., M_n,

M₀, M₁, ..., M_{i-1}, N_i, N_{i+1}, M_{i+2}, ..., M_n.

The blocks at the beginning of the files, M₀, ..., M_{i-1}, can be chosen arbitrarily. Suppose that the internal state of the MD5 hash function after processing these blocks is s_i. Now we can apply Wang and Yu's method to the initialization vector s_i, to find two pairs of blocks M_i, M_{i+1} and N_i, N_{i+1}, such that

s_{i+2} = f(f(s_i, M_i), M_{i+1}) = f(f(s_i, N_i), N_{i+1}).

This guarantees that the internal state s_{i+2} after the (i+2)nd block will be the same for the two files. Finally, the remaining blocks M_{i+2}, ..., M_n can again be chosen arbitrarily. Since both files process these identical blocks starting from the same state, all later states, and therefore the final MD5 hashes, are equal.
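The prefix/collision/suffix construction can be tried out on a toy hash. The sketch below is not MD5 and does not implement Wang and Yu's algorithm: `toy_f` is a made-up compression function with a 2-byte state, so that a collision can be found by brute force in a fraction of a second. It also uses a single colliding block rather than a pair of blocks, and it skips padding (both files have the same length, so padding would be identical anyway). All names are hypothetical.

```python
import hashlib

BLOCK = 64  # block size in bytes, as in MD5


def toy_f(state, block):
    """Toy compression function with a 2-byte state (an assumption, NOT MD5)."""
    return hashlib.sha256(state + block).digest()[:2]


def toy_hash(blocks, iv=b"\x00\x00"):
    """Merkle-Damgard style iteration over a list of 64-byte blocks."""
    state = iv
    for block in blocks:
        state = toy_f(state, block)
    return state


def find_collision(state):
    """Brute-force two distinct blocks that map `state` to the same next state."""
    seen = {}
    counter = 0
    while True:
        block = counter.to_bytes(BLOCK, "big")
        out = toy_f(state, block)
        if out in seen:
            return seen[out], block
        seen[out] = block
        counter += 1


# Arbitrary prefix and suffix blocks, shared by both files.
prefix = [b"P" * BLOCK, b"Q" * BLOCK]
suffix = [b"X" * BLOCK, b"Y" * BLOCK, b"Z" * BLOCK]

s_i = toy_hash(prefix)        # state after the common prefix
m, n = find_collision(s_i)    # collision for this particular state

file_a = prefix + [m] + suffix
file_b = prefix + [n] + suffix

print(file_a != file_b, toy_hash(file_a) == toy_hash(file_b))
```

The collision is searched for starting from `s_i`, not from the standard initial state, which is the reason the attack has to work for an arbitrary initialization vector.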

So how can we use this technique to produce a pair of programs (or PostScript files) that have identical MD5 hash, yet behave in arbitrarily different ways? This is simple. All we have to do is write the two programs like this:

Program 1: if (data1 == data1) then { good_program } else { evil_program }
Program 2: if (data2 == data1) then { good_program } else { evil_program }

and arrange things so that "data1" = M_i, M_{i+1} and "data2" = N_i, N_{i+1} in the above scheme. This can even be done in a compiled program, by first compiling it with dummy values for data1 and data2, and later replacing them with the properly computed values.
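The selection logic of the two programs can be sketched as follows. The byte strings `DATA1` and `DATA2` are placeholders standing in for the colliding block pairs (they do not actually collide), and `good_program`, `evil_program`, and `run` are hypothetical names chosen for this illustration.

```python
# Placeholders for the 128-byte regions M_i, M_{i+1} and N_i, N_{i+1}.
# Real values would come from the collision search; these do NOT collide.
DATA1 = bytes(range(128))
DATA2 = bytes(reversed(range(128)))


def good_program():
    return "Hello, world!"


def evil_program():
    return "This program is evil!!!"


def run(embedded, reference=DATA1):
    """Both programs carry the same code and the same reference copy of data1.

    They differ only in the embedded 128-byte region, which decides the branch.
    """
    if embedded == reference:
        return good_program()
    return evil_program()


print(run(DATA1))  # program 1 embeds data1
print(run(DATA2))  # program 2 embeds data2
```

Everything outside the embedded region is byte-for-byte identical in the two programs, so they fit the prefix/collision/suffix layout described above.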

Do it yourself: the "evilize" library

Here, you can download the software that I used to create MD5-colliding executable files.

Download: evilize-0.1.tar.gz.

This software is based on Patrick Stach's original implementation of Wang and Yu's algorithm.

Quick usage instructions:

Note for Windows users: the instructions below are for Unix/Linux. On Windows, you may have to append ".exe" to the names of executable files. Also, to use "make", you must have the GNU tools installed and working.

1. Unpack the archive and build the library and tools:
```
    tar zxf evilize-0.1.tar.gz
    cd evilize-0.1
    make
```
This creates the programs "evilize", "md5coll", and the object file "goodevil.o".
2. Create a C program with multiple behaviors. Instead of the usual top-level function main(), write two separate top-level functions main_good() and main_evil(). See the file hello-erase.c for a simple example.
3. Compile your program and link against goodevil.o. For example:
```
    gcc hello-erase.c goodevil.o -o hello-erase
```
4. Run the following command to create an initialization vector:
```
    ./evilize hello-erase -i
```
5. Create an MD5 collision by running the following command (but replace the vector on the command line with the one you found in step 4):
```
    ./md5coll 0x23d3e487 0x3e3ea619 0xc7bdd6fa 0x2d0271e7 > init.txt
```
Note: this step can take several hours.
6. Create a pair of good and evil programs by running:
```
    ./evilize hello-erase -c init.txt -g good -e evil
```
Here "good" and "evil" are the names of the two programs generated, and "hello-erase" is the name of the program you created in step 3.
NOTE: steps 4-6 can also be done in a single step, as follows:
```
    ./evilize hello-erase -g good -e evil
```
However, I prefer to do the steps separately, since step 5 takes so long.
7. Check the MD5 checksums of the files "good" and "evil"; they should be the same.
8. Run the programs "good" and "evil"; they should exhibit the two different behaviors that you programmed in step 2.
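The checksum comparison can also be scripted. The following is a minimal sketch using Python's standard `hashlib` module; the helper names `md5_of_file`, `files_differ`, and `is_colliding_pair` are made up for this example and are not part of the evilize package.

```python
import hashlib


def md5_of_file(path, chunk_size=65536):
    """Return the hex MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def files_differ(path_a, path_b):
    """True if the two files do not have identical contents."""
    with open(path_a, "rb") as a, open(path_b, "rb") as b:
        return a.read() != b.read()


def is_colliding_pair(path_a, path_b):
    """True if the files have the same MD5 hash but different contents."""
    return md5_of_file(path_a) == md5_of_file(path_b) and files_differ(path_a, path_b)
```

Assuming the generated files are in the current directory, `is_colliding_pair("good", "evil")` should return `True` for a successfully generated pair.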
