1. k-gram Based Software Birthmarks ---- by Ginger Myles and Christian Collberg, 2005.
k-gram birthmarks uniquely identify a program through its instruction sequences.
For an effective birthmarking technique, it is highly likely that two programs, or program parts, p and q, are copies of each other if they have the same birthmark.
They propose the use of opcode-level k-grams as a software birthmarking technique. This technique computes the set
of unique opcode sequences of length k for a set of modules.
k-grams have previously been used to detect similarity between documents and programs at the source code level (e.g. Moss), but not at the opcode level.
In those systems, a hash of each k-gram is computed and a subset of the hashes is selected as the document fingerprint.
In addition, these systems do not consider semantics-preserving transformations (e.g. instructions could be reordered or
bogus code could be added that is never actually executed) or the effects of decompilation on the formatting of the source code. For example, Collberg et al. showed that, given the source code of a Java application, simply compiling and then decompiling it will cause Moss to report 0% similarity between the original and the decompiled source code.
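The source-level fingerprinting step described above (hash every k-gram, keep a subset) can be sketched roughly as follows. This is only an illustration, not Moss's actual algorithm: the "keep hashes divisible by p" rule, the SHA-1 based hash, and the names `kgrams`, `stable_hash` and `fingerprint` are all assumptions made for the example.

```python
import hashlib

def kgrams(tokens, k):
    """Return every contiguous window of k tokens, in document order."""
    return [tuple(tokens[i:i + k]) for i in range(len(tokens) - k + 1)]

def stable_hash(gram):
    """Hash a k-gram deterministically (built-in hash() of str is salted per process)."""
    digest = hashlib.sha1(" ".join(gram).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")

def fingerprint(tokens, k=3, p=4):
    """Keep only the hashes divisible by p. This selection rule is an assumption."""
    return {h for h in (stable_hash(g) for g in kgrams(tokens, k)) if h % p == 0}

# Source code tokens of a tiny, made-up snippet.
source = "int total = a + b ; return total ;".split()
all_hashes = {stable_hash(g) for g in kgrams(source, 3)}
selected = fingerprint(source)
print(len(selected), "of", len(all_hashes), "hashes kept as the fingerprint")
```

Because the k-grams here are built from source tokens, renaming identifiers or reformatting the code changes them, which is consistent with the decompilation result mentioned above.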
A k-gram is a contiguous substring of length k which can be comprised of letters, words, or in our case opcodes. The
k-gram birthmark is based on static analysis of the executable program. For each method in a module we compute the set of unique k-grams by sliding a window of length k over the static instruction sequence as it is laid out in the
executable.
The birthmark for the module is the union of the birthmarks of each method in the module. The order of the k-grams within the set is unimportant, as is the frequency of occurrence of each k-gram. By using the unique k-grams without their associated frequency, the birthmark is less susceptible to semantics-preserving transformations. For example, an obfuscation which duplicates basic blocks will increase the frequency of the k-grams in those blocks, but those k-grams still appear in the set exactly as before. Additionally,
because the birthmark is independent of the order of the methods in the module or the modules within the program,
the technique can be used at the module or program level.
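A minimal sketch of the k-gram birthmark as described in these notes: slide a window of length k over each method's opcode sequence, keep the unique k-grams, and take the union over all methods. The opcode lists and the function names `method_kgrams` and `module_birthmark` are made up for illustration, and the "obfuscation" is a hand-written example rather than the output of a real tool.

```python
def method_kgrams(opcodes, k):
    """Unique k-grams from sliding a window of length k over one method."""
    return {tuple(opcodes[i:i + k]) for i in range(len(opcodes) - k + 1)}

def module_birthmark(methods, k=3):
    """Union of the per-method k-gram sets; order and frequency are ignored."""
    birthmark = set()
    for opcodes in methods:
        birthmark |= method_kgrams(opcodes, k)
    return birthmark

block = ["iload", "iconst_1", "iadd", "istore"]
other = ["aload", "invokevirtual", "pop", "return"]

original = module_birthmark([block, other])
reordered = module_birthmark([other, block])             # methods swapped
duplicated = module_birthmark([other, block + block])    # basic block duplicated

print(original == reordered)    # method order does not matter
print(original <= duplicated)   # every original k-gram survives duplication
```

Note that duplicating a block can also create a few new k-grams that straddle the boundary between the two copies, so in this sketch the original birthmark is contained in the obfuscated one rather than equal to it.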
2. DroidMOSS ---- by Xuxian Jiang
Only opcodes are used, because operands are easy for repackagers to modify or rename.
Ad SDK libraries are removed, because they act as noise during feature extraction.
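The two preprocessing choices noted above can be sketched like this. It is not DroidMOSS's actual implementation: the input format (class name mapped to disassembled instruction lines), the ad-library prefixes, and the names `opcode_of` and `extract_opcodes` are all assumptions for the example.

```python
# Hypothetical package prefixes standing in for ad SDK libraries.
AD_LIBRARY_PREFIXES = ("com/google/ads/", "com/admob/")

def opcode_of(instruction):
    """Keep only the mnemonic; registers, constants and names are dropped."""
    return instruction.split(None, 1)[0]

def extract_opcodes(classes):
    """Concatenate the opcodes of all classes that are not in an ad library."""
    opcodes = []
    for class_name in sorted(classes):
        if class_name.startswith(AD_LIBRARY_PREFIXES):
            continue  # ad SDK code is shared by many apps and only adds noise
        opcodes.extend(opcode_of(ins) for ins in classes[class_name])
    return opcodes

app = {
    "com/example/Main": [
        "const/4 v0, 0x1",
        "invoke-virtual {v0}, Lcom/example/Main;->run()V",
        "return-void",
    ],
    "com/admob/Banner": ["const/4 v1, 0x0", "return-void"],
}
# Same code after a made-up repackaging that renames the class and registers.
repackaged = {
    "a/b/C": [
        "const/4 v5, 0x1",
        "invoke-virtual {v5}, La/b/C;->a()V",
        "return-void",
    ],
}
print(extract_opcodes(app) == extract_opcodes(repackaged))
```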
3. Detecting Theft of Java Application via a Static Birthmark Based on Weighted Stack Patterns ---- by Hyunil LIM
They statically identify the stack patterns by analyzing the Java bytecodes stored in a class file. The similarity between two class files is calculated by matching the set of stack patterns and weight values which balance the effect of each bytecode.
Java bytecodes use the operand stack as a workspace and share operands with each other through it.
Because the interdependence of the bytecodes must be retained to preserve the semantics, a good way of designing a birthmark is to use sequences of bytecodes that share their operands through the operand stack.
Every bytecode in a program has its own stack status during execution, and that status can be calculated statically. Here, the stack status of a bytecode in a class file means the stack depth after that bytecode has been executed.
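To make "stack status" concrete, here is a small sketch that computes the stack depth after each bytecode for straight-line code. It only covers a handful of opcodes and does not attempt the paper's stack patterns or weights; the table `STACK_EFFECT` and the function `stack_depths` are names invented for this example. Branches are left out for simplicity.

```python
# Net stack effect (values pushed minus values popped) of a few JVM opcodes.
STACK_EFFECT = {
    "iconst_1": +1,
    "iload_1": +1,
    "iload_2": +1,
    "dup": +1,
    "iadd": -1,      # pops two ints, pushes one
    "imul": -1,      # pops two ints, pushes one
    "istore_3": -1,
    "pop": -1,
    "ireturn": -1,
}

def stack_depths(opcodes):
    """Return the operand stack depth after each opcode of straight-line code."""
    depth = 0
    depths = []
    for op in opcodes:
        depth += STACK_EFFECT[op]
        if depth < 0:
            raise ValueError(f"stack underflow at {op}")
        depths.append(depth)
    return depths

code = ["iload_1", "iload_2", "iadd", "istore_3"]
print(list(zip(code, stack_depths(code))))
# [('iload_1', 1), ('iload_2', 2), ('iadd', 1), ('istore_3', 0)]
```

A run of bytecodes that starts and ends at depth 0, like the one above, is a group of instructions that depend on each other through the stack.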
4. Detecting Java Theft Based on Static API Trace Birthmark ---- by Heewan Park
They propose a static API trace birthmark for Java programs. The static API trace birthmark is the set of all possible run-time API traces of each Java method. Unlike existing birthmarks, the static API trace birthmark does not simply
extract adjacent API sequences but analyzes the control flow of methods and generates the possible run-time API traces.
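The difference between adjacent API sequences and possible run-time traces can be illustrated with a toy control flow graph. This is a sketch under strong assumptions: the CFG is acyclic (how the paper handles loops is not covered in these notes), and the block names, API names and the function `api_traces` are made up for the example.

```python
def api_traces(cfg, api_calls, entry):
    """Collect the API call sequence along every entry-to-exit path of an acyclic CFG."""
    traces = set()

    def walk(block, trace):
        trace = trace + tuple(api_calls.get(block, ()))
        successors = cfg.get(block, ())
        if not successors:          # exit block: one complete trace
            traces.add(trace)
            return
        for nxt in successors:
            walk(nxt, trace)

    walk(entry, ())
    return traces

# if/else shaped method: entry -> (then | else) -> exit
cfg = {"entry": ["then", "else"], "then": ["exit"], "else": ["exit"], "exit": []}
api_calls = {
    "entry": ["String.length"],
    "then": ["PrintStream.println"],
    "else": ["Math.abs"],
    "exit": ["System.exit"],
}

for trace in sorted(api_traces(cfg, api_calls, "entry")):
    print(" -> ".join(trace))
```

Reading the calls in layout order would give a single sequence containing both `PrintStream.println` and `Math.abs`, which no actual execution of this method produces; following the control flow yields the two traces that can really occur.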
5. Detecting Code Theft via a Static Instruction Trace Birthmark for Java Methods ---- by Heewan Park
This one also uses a static instruction trace, but it can additionally detect similar algorithms. I don't need it.
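6. Detecting Common Modules in Java Packages Based on Static Object Trace Birthmark ---- by Heewan Park
Same author again, this time using object instructions for detection. Korean researchers are really relentless: they have been working on the same thing for 5 years...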
7. Analyzing Stack Flows to Compare Java Programs ---- by Hyunil LIM
Too impractical.
Countermeasures:
1. Evade Behavior-Based ---- by Zhi Xin
2. Polymorphic Attacks against Sequence-based Software Birthmarks