Chinaunix首页 | 论坛 | 博客
  • 博客访问: 3298455
  • 博文数量: 258
  • 博客积分: 9440
  • 博客等级: 少将
  • 技术积分: 6998
  • 用 户 组: 普通用户
  • 注册时间: 2009-05-03 10:28
个人简介

-- linux爱好者,业余时间热衷于分析linux内核源码 -- 目前主要研究云计算和虚拟化相关的技术,主要包括libvirt/qemu,openstack,opennebula架构和源码分析。 -- 第五届云计算大会演讲嘉宾 微博:@Marshal-Liu

文章分类

全部博文(258)

文章存档

2016年(1)

2015年(4)

2014年(16)

2013年(22)

2012年(41)

2011年(59)

2010年(40)

2009年(75)

分类: LINUX

2010-03-26 21:31:48

  • (approx. 4K) (Just fixed so that it can be used with ocaml-cil package from Fedora 9).
  • (approx. 4K)

Introduction

is a library for doing static analysis on large C projects. It is interesting because unlike other tools around, it has actually been applied to large projects in the real world (such as the Linux kernel), it supports most gcc extensions, C99, and many subtleties of the C language. Another thing which makes it interesting is that it is distributed under a free license (specifically, BSD).

In this document I describe how I used it to analyze .

Static analysis

Static analysis is the process of loading the source code for a program into memory, then analyzing or transforming it in various ways. The most common programs around which do static analysis are compilers: they both analyze the program for warnings, and transform it (optimization, conversion to machine code).

Compilers such as gcc nowadays do sophisticated analyses on programs -- reachability, dataflow, pointer aliasing, and so on. However they are not customizable. You can't stop the compiler during its analysis phase and add your own constraints. Some of the things you might want to add are:

  • Check that my functions release any memory they allocate, even on error paths.
  • Check that locks are acquired in the same order, and locks are released on error paths.
  • Check that functions A() and B() are always called in that order, and never B(); A();.
  • Check that I always set errno whenever I return an error indication from a library function.

Static analysis versus testing

Static analysis is concerned with the program at compile time. Testing observes the program's behaviour at runtime. As such static analysis complements testing, because each can find bugs in the code that the other misses.

Writing static analysis rules

Because the rules you want to apply are likely to be quite specific to your C program or library, you will necessarily end up having to write up these rules in some sort of programming language. For the Stanford Checker, rules were written in a special language called "MC". For CIL you have to write them up in .

C++

CIL does not support C++. There are other solutions which work for C++, including / .

However ...

Static analysis of C code is useful, but if you are thinking of starting a new project, choose a sensible language! Don't start new projects in C or C++, or dynamic languages like Perl and Python which leave errors in code for customers to find.

Coverity & the Stanford Checker

A leading commercial (and very proprietary) tool for static analysis of C programs is , which started life as the Stanford Checker. Early in 2001 postings started to appear on LKML detailing bugs in the Linux kernel which had been found through static analysis (for example: and ).

About CIL

It is highly recommended that you read the first few chapters of the CIL documentation.

CIL reads C source code after it has gone through the C preprocessor. It performs some transformations on the code to make analysis simpler, and it turns the C source code into memory structures which describe the name, type and contents of every variable and function definition (and everything else) in the program.

CIL provides a drop-in compatible replacement for gcc called cilly which you can use in Makefiles (eg. make CC=cilly). It also provides library functions that you can use to import preprocessed C files and iterate over their structural contents.

CIL provides libraries for doing common transformations and analyses on programs. Some of the more important ones are:

  • control flow graph (CFG) analysis
  • data flow analysis
  • points-to pointer analysis
  • reachability analysis
  • availability analysis
  • deadcode analysis
  • liveness analysis
  • simplification to three-address code

Note two things: Firstly compilers like GCC already do these sorts of transformations and analyses, however not in a way that you can use or extend. Secondly these are only preludes to writing your own program-specific analysis programs. Typically you will use the library to load the source and do at least CFG analysis on it, before you start to do your own analysis.

Loading a test program into CIL

For working code, please refer to the section at the top of this document.

Before embarking on libvirt, I wrote a few lines of test code in C, split across a couple of files, and loaded those into CIL using the library functions. I preprocessed them using /bin/cpp first, as required by CIL. The OCaml code to load the C code is just:

let files = [ "test1.i"; "main.i" ]
let files = List.map (fun filename -> (Frontc.parse filename) ()) files

The two files are then merged together in memory (kind of like linking), and unused prototypes and definitions are removed. The unused stuff mainly comes from libc header files.

let file = Mergecil.merge files "test"
let () = Rmtmps.removeUnusedTemps file

Finally I call the CIL function to perform control flow graph (CFG) analysis which will later allow me to produce some pretty pictures of the functions:

let () = Cfg.computeFileCFG file

Now as a test I wrote some analysis code. This code simply iterates over the top level expressions in the program, finds function definitions, and prints out the control flow graphs of the functions as GraphViz dot-files. It ignores anything which isn't a function definition:

  List.iter (
function
| GType _ -> () (* typedef *)
| GCompTag _ -> () (* struct/union *)
| GCompTagDecl _ -> () (* forward prototype of struct/union *)
| GEnumTag _ -> () (* enum *)
| GEnumTagDecl _ -> () (* forward prototype of enum *)
| GVarDecl _ -> () (* variable/function prototype *)
| GVar _ -> () (* variable definition *)
| GFun (fundec, loc) -> (* function definition *)

printf "%s:%d: %s has %d arg(s)\n"
loc.file loc.line fundec.svar.vname (List.length fundec.sformals);

(* Write control flow graph to a GraphViz dot-file: *)
Cfg.printCfgFilename (fundec.svar.vname ^ ".dot") fundec

| GAsm _ -> () (* global asm statement *)
| GPragma _ -> () (* toplevel #pragma *)
| GText _ -> () (* verbatim text, comments *)
) file.globals;

When you run this program (make run-ciltest1), you see:

test1.c:11: test1_add has 2 arg(s)
test1.c:22: test1_inc has 1 arg(s)
test1.c:36: test1_print has 1 arg(s)
test1.c:48: test1_last_error has 0 arg(s)
test1.c:56: test1_print_last_error has 0 arg(s)
main.c:10: main has 0 arg(s)

and here are some of the GraphViz dot-files rendered next to their corresponding functions:

Control flow graph Function
int
test1_inc (int a)
{
if (a >= 0)
return a+1;
else
return -1;
}
void
test1_print_last_error (void)
{
const char *err = last_err;
last_err = NULL;
if (err)
fprintf (stderr, "test1: %s\n", err);
}
int
test1_add (int a, int b)
{
if (a >= 0 && b >= 0)
return a+b;
else {
last_err = "Negative value";
return -1;
}
}

In the corresponding graph for this function we see two if nodes. This is because CIL has simplified the if (...&&...) expression into two if statements, to make automatic analysis easier. The code is equivalent to the original, and C compilers commonly perform the same step.

Loading libvirt into CIL

For working code, please refer to the section at the top of this document.

Before we can perform any analysis, we first have to load the libvirt C code into CIL. I found it simpler to avoid the cilly drop-in replacement for gcc, and instead just compile libvirt like this:

make CC="gcc "

This builds libvirt in the ordinary way, but leaves around the preprocessed C files (*.i), and these can be directly loaded into CIL using the library functions.

There are some fragments of shell script in the Makefile which are just there to build libvirt with the special CC variable, find the list of *.i files, and get the list of exported symbols (the entry points into libvirt). Interested readers can consult the Makefile rule run-cilimportvirt for details.

In any case, once these housekeeping tasks are done, the whole of libvirt was loaded into CIL, barring one problem: CIL doesn't yet implement , so I had to modify libvirt's build to add -D_Bool=unsigned. Still, a single change isn't bad considering that libvirt is a large body of code written by a whole variety of authors which uses C99 and gcc extensions throughout.

Control flow graphs of real functions in libvirt are considerably more complicated than my test functions. For example on the right is the CFG of function xenHypervisorMakeCapabilitiesXML. The Postscript file produced by GraphViz actually crashes most file viewers.

Simple analysis — extract exported function types

As a simple example of a static analysis, I used CIL along with my list of exported function names to extract the argument type(s), return type and point of definition of each exported function. The output looks like this:

virterror.c:70: virGetLastError exported function with 0 arg(s)
virErrorPtr (void)
virterror.c:88: virCopyLastError exported function with 1 arg(s)
int (virErrorPtr to )
virterror.c:105: virResetError exported function with 1 arg(s)
void (virErrorPtr err )
virterror.c:126: virResetLastError exported function with 0 arg(s)
void (void)

[...]

libvirt.c:56: virInitialize exported function with 0 arg(s)
int (void)
libvirt.c:336: virGetVersion exported function with 3 arg(s)
int (unsigned long *libVer , char const *type , unsigned long *typeVer )
libvirt.c:443: virConnectOpen exported function with 1 arg(s)
virConnectPtr (char const *name )
libvirt.c:461: virConnectOpenReadOnly exported function with 1 arg(s)
virConnectPtr (char const *name )
libvirt.c:478: virConnectClose exported function with 1 arg(s)
int (virConnectPtr conn )
libvirt.c:504: virConnectGetType exported function with 1 arg(s)
char const *(virConnectPtr conn )
libvirt.c:534: virConnectGetVersion exported function with 2 arg(s)
int (virConnectPtr conn , unsigned long *hvVer )
libvirt.c:566: virConnectGetHostname exported function with 1 arg(s)
char *(virConnectPtr conn )

阅读(2768) | 评论(0) | 转发(0) |
给主人留下些什么吧!~~