The art of software debugging - zhucj - ChinaUnix blog


2008-08-05 07:48:26

The Art of Software Debugging

-- Harness the changes of the six vital energies, ride the true order of heaven and earth, and so roam the infinite

I. Introduction

This article discusses software debugging in some detail. As everyone knows, a professional programmer must master the skills of debugging. Many people who know one or more programming languages regard debugging as just a trick they use now and then while programming. However, things are not so simple.

Today, advanced programming languages spring up like bamboo shoots after a spring rain. This is good news for software engineering; nevertheless, as a computer science student, I suffer greatly from the way code is encapsulated. For example, when I want to output some variables in my C program, I call the C library function "printf". What happens after that call? Are the variables sent immediately to the terminal I am watching? Or are they stored in a buffer and printed on the terminal later?

And there are other questions. When we call "fprintf", we must name the output stream, such as stdout or stderr. Suppose we call "fprintf" as in the following code:

    ...
    fprintf(stdout, "Hello, STDOUT\n");
    fprintf(stderr, "Hello, STDERR\n");
    ...

Which statement is printed on the terminal first? Think about it carefully. When we call "fprintf" on stderr to report that our program has reached an erroneous state, we (or at least I) want that message to take priority over the messages printed on stdout, because something has gone badly wrong by the time that code runs. So what happens when the code above is executed on a platform such as Windows or Linux? Try it!
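The effect of buffering on output order can be made visible with a small experiment. The sketch below is only an illustration under stated assumptions: the function name demo_buffering_order is hypothetical, and instead of the real stdout and stderr it uses two streams on one temporary file, one fully buffered (as stdout usually is when redirected to a file or pipe) and one unbuffered (as stderr usually is). On an interactive terminal stdout is normally line-buffered, so there the two lines tend to appear in source order.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Writes one line through a fully buffered stream and one through an
 * unbuffered stream, both backed by the same temporary file, then copies
 * the file contents into out. Returns the number of bytes, or -1 on error.
 */
long demo_buffering_order(char *out, size_t outlen)
{
    static char bigbuf[BUFSIZ];
    assert(out != NULL && outlen > 0);

    FILE *buffered = tmpfile();
    if (buffered == NULL)
        return -1;

    /* dup() shares the file offset, so both streams write to one file. */
    int fd = dup(fileno(buffered));
    if (fd < 0) {
        fclose(buffered);
        return -1;
    }
    FILE *unbuffered = fdopen(fd, "w");
    if (unbuffered == NULL) {
        close(fd);
        fclose(buffered);
        return -1;
    }

    setvbuf(buffered, bigbuf, _IOFBF, sizeof bigbuf); /* like redirected stdout */
    setvbuf(unbuffered, NULL, _IONBF, 0);             /* like stderr */

    fprintf(buffered, "Hello, STDOUT\n");   /* stays in the user-space buffer */
    fprintf(unbuffered, "Hello, STDERR\n"); /* reaches the file immediately */
    fflush(buffered);                       /* only now is the first line written */

    rewind(buffered);
    size_t n = fread(out, 1, outlen - 1, buffered);
    out[n] = '\0';

    fclose(unbuffered);
    fclose(buffered);
    return (long)n;
}
```

Calling demo_buffering_order with a buffer of 128 bytes should leave the STDERR line before the STDOUT line in the buffer, even though the STDOUT line was written first in the source.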

So, if we want to know the details of the "printf" function family, there are usually two ways. One is to read the source code of the C library; we are lucky today, because the GNU project makes this source freely available. The other, which works even without source code, is to debug at the assembly-language level, where we can step into any function we are interested in. Please remember that the tools on the two popular platforms, Windows and Linux, conventionally use different assembly syntaxes for the same x86 instructions: Windows tools use the Intel syntax, while the AT&T syntax is the norm on Linux. Another important point: as we step deeper and deeper into a C library function such as "printf", we eventually reach a point where the user-level debugger can no longer follow the code, because execution has entered the kernel. This is very common, and at this point we need a kernel debugger, such as SoftICE on Windows or KDB on Linux. Kernel debugging is an advanced topic that we will not discuss in this section, and I am also a stranger in this field.

Even if you are just an application programmer, debugging at the source-code level is sometimes not enough. For example, the order in which arguments are passed differs between the C and Pascal calling conventions. We can also see that many functions in the Linux kernel carry the prefix "fastcall" (Microsoft compilers offer a similar "__fastcall" keyword). The prefix means that the first few arguments of the function are passed in registers rather than on the stack. As we all know, register access is faster than memory access, so the prefix is worthy of its name; it is in fact a directive to the compiler. However, if we want to see its effect while debugging, we must read the assembly code very carefully; there is no other way.

II. Available debugging tricks

Anyone who has been programming even for a short time has probably mastered a few tricks. The most popular ones, I think, are: (1) inserting printf statements, (2) breakpoints, (3) watching memory and registers, (4) inspecting the call stack, and (5) execution control, such as single-stepping.

Whether you know it or not, there are other, more advanced debugging tricks. If you have experience with GDB, the standard user-level debugger on Linux, you have probably benefited from its "attach" command: we first start the suspicious program, then start GDB, and then enter the command "attach" followed by the process ID to attach to the running program. Another very useful source, which is easily neglected, is the information in log files. It is not always available, but it is often usable when we are writing kernel code. The best-known log messages in Linux, I think, are oops messages, which are often caused by an invalid or null pointer (the message in /var/log/messages may say "unable to handle kernel paging request at virtual address 0xXXXXXXX"; in essence, this is often the result of an invalid pointer that points to a low address).

The tricks mentioned above are only guides that help us locate the exact instruction that triggered the problem; the information we gather from the debugger output or the log files does not lead directly back to the origin of the bug. So the essence of debugging is this: how do we locate the origin of the bug exactly? We regret to say that we cannot answer this question. In that sense debugging is beyond the scope of technology; we therefore consider it a kind of art, something that lives in the unconscious mind and that we are not able to put into words.

III. Hardware-level support

In the Intel CPU family, starting with the i386 processor, IA-32 (Intel Architecture) has provided eight debug registers, Dr0 to Dr7. Dr4 and Dr5 are reserved; the others are: (1) four 32-bit address registers, Dr0 to Dr3; (2) one 32-bit control register, Dr7; (3) one 32-bit status register, Dr6, of which only 16 bits are used. Readers who want the details of the debug registers should consult the Intel manuals; we will not spend time on them here.

The most important thing to keep in mind is that the debug registers let us define HARDWARE breakpoints. What do we gain from this? The usual trick when debugging is to insert breakpoints wherever we are suspicious. These breakpoints, which we call SOFTWARE breakpoints here, tell the debugger (not the compiler) to replace the original instruction at that address with the instruction "INT 3" and to save the original code in memory that the debugger has set aside. We then start our program under the supervision of a debugger process (that is, with a TRACE-like flag set). When "INT 3" is executed, our program is interrupted and an exception handler is dispatched to deal with the situation; the handler's most important job is to notify the debugger process that something unusual has happened.
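The save-and-patch step just described can be sketched as pure functions on one machine word, the unit in which a debugger typically reads and writes the target's code. This is a minimal sketch, not any particular debugger's implementation: the function names are hypothetical, and it assumes a 32-bit little-endian x86 target, where the byte at the breakpoint address is the least significant byte of the word.

```c
#include <assert.h>
#include <stdint.h>

#define INT3_OPCODE 0xCCu   /* the one-byte x86 breakpoint instruction */

/* Returns the word with its first (lowest-addressed) byte replaced by INT 3. */
uint32_t insert_breakpoint(uint32_t original_word)
{
    return (original_word & ~0xFFu) | INT3_OPCODE;
}

/* Returns 1 if the word currently starts with an INT 3 byte, 0 otherwise. */
int has_breakpoint(uint32_t word)
{
    return (word & 0xFFu) == INT3_OPCODE;
}

/* Puts the saved original byte back; the other three bytes are left alone. */
uint32_t remove_breakpoint(uint32_t patched_word, uint32_t saved_word)
{
    assert(has_breakpoint(patched_word));
    return (patched_word & ~0xFFu) | (saved_word & 0xFFu);
}
```

Only one byte is replaced because INT 3 is a one-byte instruction; overwriting the whole word would also destroy the instructions that follow the breakpoint address.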

By contrast, to use a HARDWARE breakpoint we only need to store the address of a variable or an instruction in one of the address registers Dr0 to Dr3 and set the corresponding enable, type, and length bits in Dr7; the hardware does the rest and reports in Dr6 which breakpoint fired. The debugger does not need to replace any instruction with "INT 3", and no original instruction needs to be saved. One facility that HARDWARE breakpoints provide and SOFTWARE breakpoints do not is the Data Access Breakpoint: when our program accesses a variable whose address is held in one of the debug address registers, the program is interrupted. This is very helpful for monitoring a shared variable in a multi-process or multi-threaded program. HARDWARE breakpoints also have disadvantages; for example, there are only four debug address registers, so this resource is very limited.

Now let us look at the Data Access Breakpoint in practice. Microsoft VC++ supports it as follows: (1) click the "EDIT" menu; (2) select the "Breakpoint" option, which opens a dialog; (3) select the "DATA" tab of the dialog and enter an expression such as "*((int *)0x004257c0)" in the condition text box. After that we start debugging; if any instruction changes the value stored at address 0x004257c0, the program is interrupted. To use this facility we must first know the runtime address of the variable.

The facility above is an automated tool built into VC++; we can also manipulate the debug registers by hand. The following code demonstrates this:

    #define _WIN32_WINNT 0x1000
    #include <windows.h>
    #include <stdio.h>

    int main(int argc, char * argv[])
    {
        CONTEXT cxt;
        HANDLE hThread = GetCurrentThread();
        DWORD dwTestVar = 0;

        if(!IsDebuggerPresent())
        {
            printf("The sample can only run within a debugger!\n");
            return E_FAIL;
        }

        /* Ask for the debug registers as well as the normal context. */
        cxt.ContextFlags = CONTEXT_DEBUG_REGISTERS|CONTEXT_FULL;
        if(!GetThreadContext(hThread,&cxt))
        {
            printf("Failed to get thread context!\n");
            return E_FAIL;
        }

        /* Dr0 holds the address to watch (a 32-bit build is assumed). */
        cxt.Dr0 = (DWORD)&dwTestVar;
        /* Enable breakpoint 0 for read/write access to 4 bytes. */
        cxt.Dr7 = 0xF0001;
        if(!SetThreadContext(hThread,&cxt))
        {
            printf("Failed to set Thread context!\n");
            return E_FAIL;
        }

        /* This write triggers the data access breakpoint. */
        dwTestVar = 1;

        GetThreadContext(hThread,&cxt);
        printf("Break into Debugger with DR6 = %x!\n",cxt.Dr6);
        return S_OK;
    }

Note: the original version of the above program was written by Zhang Yinkui (张银奎), who works for Intel in Shanghai; the code you see here is slightly modified. This program runs on the Windows platform.
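The value 0xF0001 written to Dr7 in the program above is easier to read once its bit fields are spelled out. The following sketch builds such a value from its parts; the enum and function names are hypothetical, and the layout assumed is the documented IA-32 one: local enable bit Ln at bit 2n, the access-type field for breakpoint n at bits 16+4n and 17+4n, and the length field at bits 18+4n and 19+4n.

```c
#include <assert.h>
#include <stdint.h>

/* Access type that triggers the breakpoint (R/W field of Dr7). */
enum dr7_rw  { DR7_RW_EXECUTE = 0, DR7_RW_WRITE = 1, DR7_RW_READWRITE = 3 };

/* Size of the watched location (LEN field of Dr7). */
enum dr7_len { DR7_LEN_1 = 0, DR7_LEN_2 = 1, DR7_LEN_4 = 3 };

/* Builds the Dr7 bits that locally enable breakpoint `slot` (0..3). */
uint32_t dr7_local_breakpoint(unsigned slot, enum dr7_rw rw, enum dr7_len len)
{
    assert(slot < 4);
    uint32_t enable = 1u << (slot * 2);                 /* L0, L1, L2 or L3 */
    uint32_t type   = (uint32_t)rw  << (16 + slot * 4); /* R/Wn field */
    uint32_t length = (uint32_t)len << (18 + slot * 4); /* LENn field */
    return enable | type | length;
}
```

With these definitions, dr7_local_breakpoint(0, DR7_RW_READWRITE, DR7_LEN_4) yields 0xF0001: breakpoint 0 is enabled and fires on any read or write of the four bytes whose address is in Dr0.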

IV. Kernel-level support

The previous section described the SOFTWARE breakpoint. We now know that when we insert a SOFTWARE breakpoint in our program, the original instruction is replaced with "INT 3", although this substitution is not visible to us. But when our program is interrupted, how do the program and the debugger process interact with each other?

First, let us look at the data structure task_struct; each instance represents one process in a Linux system. You can find its full definition in the file include/linux/sched.h of the Linux source tree, so we do not reproduce it here. We concentrate on a single field, ptrace, whose type is unsigned long.

Recall that hitting a SOFTWARE breakpoint simply means that the CPU executes an "INT 3" instruction; as soon as it does, the CPU traps into the corresponding exception handler. In the Linux kernel, the function that handles "INT 3" is "do_int3", which calls "do_trap", and "do_trap" in turn calls "force_sig_info" or "force_sig" after preparing some data structures. Finally, "force_sig_info" or "force_sig" sends the signal SIGTRAP to the interrupted process; what it actually does is set a bit in a field of type sigset_t in the task_struct. Later, for example when execution returns from the kernel to user space, the kernel checks whether the process has pending signals; if so, it calls "do_signal" to handle them all, SIGTRAP included. Now let us trace "do_signal" to see what happens; it is enough to concentrate on the part of the source code that concerns software debugging.

    if ((current->ptrace & PT_PTRACED) && signr != SIGKILL) {
        /* Let the debugger run.  */
        current->exit_code = signr;
        current->state = TASK_STOPPED;
        notify_parent(current, SIGCHLD);
        schedule();

        /* We're back.  Did the debugger cancel the sig?  */
        if (!(signr = current->exit_code))
            continue;
        current->exit_code = 0;

        /* The debugger continued.  Ignore SIGSTOP.  */
        if (signr == SIGSTOP)
            continue;

        /* Update the siginfo structure.  Is this good?  */
        if (signr != info.si_signo) {
            info.si_signo = signr;
            info.si_errno = 0;
            info.si_code = SI_USER;
            info.si_pid = current->p_pptr->pid;
            info.si_uid = current->p_pptr->uid;
        }

        /* If the (new) signal is now blocked, requeue it.  */
        if (sigismember(&current->blocked, signr)) {
            send_sig_info(signr, &info, current);
            continue;
        }
    }

We can see that the traced process notifies its parent process with the signal SIGCHLD. The function "notify_parent" calls "do_notify_parent", which does most of the work. Let us look at the last two lines of "do_notify_parent":

    send_sig_info(sig, &info, tsk->p_pptr);
    wake_up_parent(tsk->p_pptr);

The parameter tsk->p_pptr points to the parent of the process "tsk". The function "wake_up_parent" wakes the parent process, which is typically sleeping in a wait call, so that the scheduler can run it. The parent, which is the debugger process, can then obtain detailed information about the traced process, such as its current process context. Inside the kernel this is easy, because the process descriptors ("task_struct", as we know) of parent and child are linked by pointers, and everything we want to know about the child can be found in its descriptor. Furthermore, the debugger process can choose to send a signal back to the traced process; as a result, the traced process terminates or continues according to the signal it receives.

Many programmers misunderstand how debugging works. I used to believe that when a debugger process debugs another process, all the information displayed in front of me, such as the call stack and the register values, is forcibly extracted by the debugger from the address space of the traced process. Now we know that in the Linux kernel the debugger process does not learn of a debugging event until the traced process notifies it with the signal SIGCHLD. So, in the kernel, the debugging facility relies on a complex interaction between the debugger process and the traced process; it is not simply the debugger forcing the traced process to do something.
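Seen from user space, this notification is what a debugger observes when it waits on its child. The sketch below assumes Linux and the glibc ptrace wrapper; the function name observe_trap_stop is hypothetical, and raise(SIGTRAP) stands in for executing a real "INT 3". It is meant only to show the parent learning of the stop through waitpid, not to be a complete debugger.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Forks a child that asks to be traced and then raises SIGTRAP.
 * Returns the signal that stopped the child, as reported to the parent
 * by waitpid, or -1 if something went wrong.
 */
int observe_trap_stop(void)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* Child: become a traced process, then hit a "breakpoint". */
        if (ptrace(PTRACE_TRACEME, 0, NULL, NULL) < 0)
            _exit(1);
        raise(SIGTRAP);
        _exit(0);
    }

    /* Parent: sleeps here until the kernel reports the child's stop. */
    int status = 0;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    int sig = WIFSTOPPED(status) ? WSTOPSIG(status) : -1;

    /* Resume the child without delivering the signal, then reap it. */
    ptrace(PTRACE_CONT, pid, NULL, NULL);
    waitpid(pid, &status, 0);
    return sig;
}
```

If tracing is permitted in the environment, the function should return SIGTRAP: the parent did not poll the child, it was woken when the traced child stopped.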

We do not explain the Linux kernel source in more detail here, because our subject is software debugging, not the execution flow of any particular operating system kernel. Linux is only the platform we happen to work on; we could do the same work on Windows. However, the Linux source code is available to anyone with an Internet connection, and can be downloaded from the official website for free.

Do not forget the ptrace field of "task_struct" mentioned earlier. Its type is unsigned long. In our discussion this field only distinguishes two states of a process, TRACED and NON-TRACED, for which a single bit would suffice; the kernel, however, stores several flag bits in it (PT_PTRACED is one of them), which is why a whole word is used rather than a one-bit field.

To support debugging, the Linux kernel must handle a good deal of extra logic, which directly adds code to the kernel, and the process descriptor carries the ptrace field to indicate whether the process is being traced or debugged. You can see that this design in Linux is ingenious. Do you appreciate it?

V. Two advanced topics: Parallel-program debug and Kernel-image debug

If you have written parallel programs, you may have found them very difficult to debug. Why? Because there is considerable potential for introducing subtle timing faults, and once we introduce one, it takes a long time to locate. For example, look at the following code (assume that the C file is named threadExample.c):

     1 #include <stdio.h>
     2 #include <stdlib.h>
     3 #include <pthread.h>
     4 #include <unistd.h>
     5 #define NR_THREAD 5
     6 void * start_routine(void *);
     7 int main(int argc, char ** argv)
     8 {
     9     pthread_t tid[NR_THREAD];
    10     int i = 0, res = 0;
    11     for(i = 0; i < NR_THREAD; i++)
    12     {
    13         res = pthread_create(&tid[i], NULL, start_routine, (void *)&i);
    14         //res = pthread_create(&tid[i], NULL, start_routine, (void *)i);
    15         if(res != 0)
    16         {
    17             perror("pthread_create error");
    18             exit(-1);
    19         }
    20     }
    21     for(i = 0; i < NR_THREAD; i++)
    22     {
    23         res = pthread_join(tid[i], NULL);
    24         if(res != 0)
    25         {
    26             perror("pthread_join error");
    27             exit(-1);
    28         }
    29     }
    30     return 0;
    31 }
    32 void * start_routine(void * args)
    33 {
    34     int id = *((int *)args);
    35     //int id = (int)args;
    36     printf("Thread id is %d\n", id);
    37     return NULL;
    38 }

Now type "./threadExample" on the command line. What happens? I think you expect the threads created by the main thread to print their "id" values in ascending order. However, the results usually differ from what you expect. Why? We seem to have done every step properly. Where is the offending line? You may think of GDB at this point, but GDB is of little help here, since running under a debugger changes the timing; you must examine the source code again and again to spot the offending line. What a painful journey!

At last, if you are lucky, you locate the bug on line 13. Every thread is started with the same argument: a pointer to the single variable "i" on the main thread's stack. Why does this introduce a subtle bug? If the main thread runs quickly, or is scheduled ahead of some of the sub-threads, it may change "i" before a sub-thread has read it, and the sub-thread then sees the modified value. To make the program behave as expected, replace lines 13 and 34 with the commented-out lines 14 and 35. Then the value of "i" itself is passed, by value, in the pointer argument, so each sub-thread has its own copy and one thread's argument cannot interfere with another's. (The order in which the threads print is still up to the scheduler.)
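Casting an integer to a pointer and back works on common platforms but is not portable. A common alternative, sketched below, gives every thread a pointer to its own array slot, which the main thread never modifies after creating the thread. The names ids, seen, and run_threads are hypothetical and not part of the original program.

```c
#include <assert.h>
#include <pthread.h>
#include <stdio.h>
#include <string.h>

#define NR_THREAD 5

static int seen[NR_THREAD];   /* seen[k] is set by the thread that got id k */

static void *start_routine(void *arg)
{
    int id = *(int *)arg;     /* private slot: main never changes it */
    assert(id >= 0 && id < NR_THREAD);
    printf("Thread id is %d\n", id);
    seen[id] = 1;
    return NULL;
}

/* Returns 0 if every thread saw its own distinct id, -1 otherwise. */
int run_threads(void)
{
    pthread_t tid[NR_THREAD];
    int ids[NR_THREAD];       /* one argument slot per thread */
    int created = 0, ok = 1;

    memset(seen, 0, sizeof seen);
    for (int i = 0; i < NR_THREAD; i++) {
        ids[i] = i;
        if (pthread_create(&tid[i], NULL, start_routine, &ids[i]) != 0) {
            ok = 0;
            break;
        }
        created++;
    }
    /* Join before ids[] goes out of scope, even after a failure. */
    for (int i = 0; i < created; i++)
        if (pthread_join(tid[i], NULL) != 0)
            ok = 0;
    for (int i = 0; i < NR_THREAD; i++)
        if (!seen[i])
            ok = 0;
    return ok ? 0 : -1;
}
```

Each id is now printed exactly once, although the order of the lines still depends on how the threads are scheduled.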

Besides timing faults, there is another issue we often neglect when writing parallel programs: the memory model of modern processors. We do not discuss the memory model in detail; it is enough to keep in mind that memory in a modern machine is organized as a hierarchy, and the closer a level is to the processor, the faster and the more expensive it is, and vice versa.

If you write your program without careful attention to the memory model, you will run into something as subtle as a timing fault. To understand this, look at the following code:

    #include <stdio.h>
    #include <stdlib.h>
    #include <pthread.h>
    #include <unistd.h>

    #define NR_THREAD 1

    void * start_routine(void *);

    int globalvar = 0;

    int main(int argc, char ** argv)
    {
        pthread_t tid[NR_THREAD];
        int i = 0, res = 0;

        for(i = 0; i < NR_THREAD; i++)
        {
            res = pthread_create(&tid[i], NULL, start_routine, NULL);
            if(res != 0)
            {
                perror("pthread_create error");
                exit(-1);
            }
        }

        printf("Hello GlobalVar!\n");
        /* busy-wait until the sub-thread sets globalvar */
        while(globalvar == 0)
            continue;
        printf("Goodbye GlobalVar!\n");

        for(i = 0; i < NR_THREAD; i++)
        {
            res = pthread_join(tid[i], NULL);
            if(res != 0)
            {
                perror("pthread_join error");
                exit(-1);
            }
        }
        return 0;
    }

    void * start_routine(void * args)
    {
        sleep(3);
        globalvar = 1;
        return NULL;
    }

    /*compiling command: gcc -g -O2 threadExample.c -o threadExample*/

You expect the main thread to exit as usual, but in fact the program runs forever unless it receives a signal such as SIGKILL or SIGTERM. To find out what is really going on, we must look at its assembly code:

    0x08048584:    mov    0x8049834,%eax
    0x08048589:    lea    0x0(%esi),%esi
    0x08048590:    test   %eax,%eax
    0x08048592:    je     0x8048590

I show only the relevant lines here; 0x8049834 is the address of the global variable "globalvar". We can see that "globalvar" is loaded into the eax register once, and on every iteration the main thread tests the cached value in eax, not the value in memory; the sub-thread's modification of "globalvar" in memory is therefore never seen by the main thread. So what is the problem? Why does the main thread test the cached value instead of the value in memory? Notice that we compiled with "gcc -g -O2 threadExample.c -o threadExample", which directs GCC to optimize the program. GCC therefore keeps frequently used variables in registers, and since nothing in the loop tells it that "globalvar" may be changed by another thread, it does not reload the variable, and the main thread can never leave the loop. How can we tell GCC that a variable must be reloaded from memory every time it is accessed? Simply add the keyword "volatile" in front of the variable's definition.

Is the game over? No. There is another curious thing that will attract the attention of every inquisitive programmer. If we compile with "gcc -g threadExample.c -o threadExample", the main thread exits normally even though we have not forced GCC to reload "globalvar" on every access. Again, let us look at the assembly code:

    0x08048584:    mov    0x8049824,%eax
    0x08048589:    test   %eax,%eax
    0x0804858b:    je     0x8048584

Here "globalvar" is also loaded into eax, but the load is inside the loop: on every iteration the main thread rereads the value from memory, so it exits normally. Even so, if a variable is accessed by two or more threads, it is a good convention to define it with the keyword "volatile". Note that "volatile" only prevents this kind of register caching; it does not make accesses atomic or ordered, so locks or atomic operations are still needed for real synchronization.
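With a C11 compiler there is an alternative to "volatile" for a flag shared between threads: the atomic types in <stdatomic.h>. The sketch below swaps in that technique, so it is not the method used in the article; the names ready_flag, setter, and wait_for_flag are hypothetical. It assumes the compiler and C library provide <stdatomic.h> and POSIX threads.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <time.h>

static atomic_int ready_flag;   /* plays the role of globalvar */

static void *setter(void *arg)
{
    (void)arg;
    struct timespec delay = { 0, 50 * 1000 * 1000 };  /* 50 ms */
    nanosleep(&delay, NULL);
    atomic_store(&ready_flag, 1);   /* becomes visible to the spinning thread */
    return NULL;
}

/* Spins until the other thread sets the flag; returns the value seen, or -1. */
int wait_for_flag(void)
{
    pthread_t tid;

    atomic_store(&ready_flag, 0);
    if (pthread_create(&tid, NULL, setter, NULL) != 0)
        return -1;

    /* Each iteration performs a real load, at any optimization level. */
    while (atomic_load(&ready_flag) == 0)
        continue;

    if (pthread_join(tid, NULL) != 0)
        return -1;
    return atomic_load(&ready_flag);
}
```

Unlike a plain int, an atomic_int may not be cached in a register across iterations, so the loop terminates with or without -O2; unlike "volatile", the atomic operations also define how the accesses are ordered between the two threads.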

We now move on to kernel-image debugging. What is "kernel-image debugging"? Whenever we boot an operating system from a disk, the loader eventually copies all the code and data needed to run the system from the disk into memory, so the running system in memory, including the kernel, is just an image of code and data stored on the disk. "Kernel-image debugging" means debugging the running kernel in memory. There are tools to help us do this, such as KDB on Linux. When we do this kind of work, we must be very careful to keep everything the running kernel relies on consistent; otherwise the whole system will soon crash. I am sorry to say that I am not yet familiar with KDB, so we will come back to this topic after I have mastered the tool.

VI. A sample debugger

On Linux, our own programs can use an interface function named "ptrace", which is declared in /usr/include/sys/ptrace.h. From the manual pages we learn that whenever we want our program to trace another program (more precisely, another process), we simply issue a request through the ptrace function and then check whether the request succeeded. Of course, we need some other system calls to make our program efficient and robust. Now let us present a sample debugger implemented with the Linux ptrace interface; you can find the source code in the Appendix.

From this code you can see that it is not difficult to design and implement a useful debugger. Here I must digress for a moment. On my Linux platform (SUSE 10.1), there is a strange inconsistency between the manual pages and the glibc header files regarding the ptrace interface. Some requests explicitly documented in the manual pages, such as PTRACE_SETOPTIONS and PTRACE_GETEVENTMSG, cannot be found in /usr/include/sys/ptrace.h, the header file that defines the ptrace interface, which means we cannot send these requests through the ptrace function. However, if we dig into the Linux kernel, we find these requests defined in /usr/src/linux-x.x.x/include/linux/ptrace.h, together with a function prototype like this:

    extern int ptrace_request(struct task_struct *child, long request, long addr, long data);

It seems that in kernel development we can call this function in place of ptrace, which is the interface exported to user space. So why do the manual pages differ from the glibc header files? I do not know. If some reader knows, please send me an email. Thank you!

VII. Conclusion

Nowadays we can get all kinds of development tools from the Internet, including debugging tools. However, even the most convenient tool is only an aid that helps us deal with problems more efficiently. In essence, the answers to the questions of how to track down the cause of a problem, and how to master any tool available to us, are still left for us human beings to seek. In the unending pursuit of those answers we improve ourselves again and again, and this is what makes us human beings rather than animals or anything else.

VIII. Appendix

    /**
     * file: trace.c
     * author: XXX
     * date: 2/3/08
     * note: Although we define a subfunction named "getMainEntryPoint", the address
     * it gets is actually not the main() function entry point, but some virtual
     * address prior to the main() function. Furthermore, there are still some bugs
     * to be fixed. However, my free time is so limited that I really hope some
     * reader will fix the bugs for me. You are qualified to modify any line of
     * code below. If you do, please send email to notify me.
     * My email address is zhucj041070075@gmail.com, I'm looking forward to your letters.
     */

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <errno.h>
    #include <signal.h>
    #include <unistd.h>
    #include <elf.h>
    #include <sys/types.h>
    #include <sys/wait.h>
    #include <sys/ptrace.h>
    #include <sys/user.h>

    #define INPUTLINE 64

    void usage();
    void command();
    void ptraceErrCheck(int res, enum __ptrace_request req);
    int getUserRegs(pid_t pid, struct user_regs_struct * regs, int verbose);
    void setUserRegs(pid_t pid, struct user_regs_struct * regs);
    char ** CreateExecArgv(int argc, char ** argv);
    void FreeExecArgv(int argc, char ** argv);
    void calAddress(char * comm, unsigned long * bpAddr);
    int getMainEntryPoint(FILE * Elf_fp, unsigned long * bpAddr);

    int main(int argc, char ** argv)
    {
        if(argc < 2) {
            usage();
            return 0;
        }

        FILE * fp;
        int stat_loc; long res; char comm[INPUTLINE];
        pid_t pid;
        long oldInstruct = 0, newInstruct;
        int IsInterrupted = 0; unsigned long bpAddr;//main entrypoint
        struct user_regs_struct regs;
        memset(&regs, 0, sizeof(struct user_regs_struct));

        if((fp = fopen(argv[1], "r")) == NULL) {//fopen signals failure with NULL
            fprintf(stderr, "File not exist!\n");
            return -1;
        }
        else {
            int ret = getMainEntryPoint(fp, &bpAddr);//get Main EntryPoint
            switch(ret) {
            case 0:
                fclose(fp);
                break;
            case -1://file read error
                fprintf(stdout, "file read error!\n");
                fclose(fp);
                return -1;
            case -2://data format error
                fprintf(stdout, "data format error!\n");
                fclose(fp);
                return -1;
            }
        }

        char ** ExecArgv = CreateExecArgv(argc, argv);
        if(!ExecArgv) {
            fprintf(stderr, "Failed to allocate memory!\n");
            return -1;
        }

    TRACEHERE:
        pid = fork();
        if(pid < 0) {
            fprintf(stderr, "Fork error!\n");
            return -1;
        }
        else if( pid == 0) { //child
            int ret = ptrace(PTRACE_TRACEME, 0, NULL, NULL);
            ptraceErrCheck(ret, PTRACE_TRACEME);
            execvp(ExecArgv[0] , ExecArgv);
            perror("execvp");//only reached if exec fails
            _exit(-1);
        }
        else { //parent
            //the child stops with SIGTRAP as soon as exec succeeds
            res = waitpid(pid, &stat_loc, 0);
            fprintf(stdout, "Begin to trace program %s!\n", ExecArgv[0]);
            while(1) {
                fprintf(stdout, "Command: ");
                if(!fgets(comm, INPUTLINE, stdin))
                    strcpy(comm, "q");//treat end of input as quit
                if(comm[0] == 'h') {//help
                    command();
                }
                else if(comm[0] == 'b') {//break
                    IsInterrupted = 1;
                    calAddress(comm, &bpAddr);
                    errno = 0;//PEEKTEXT returns data, errors show only in errno
                    oldInstruct = ptrace(PTRACE_PEEKTEXT, pid, bpAddr, NULL);
                    ptraceErrCheck(oldInstruct, PTRACE_PEEKTEXT);
                    /**
                     *Replace only the first byte with INT 3 (0xcc); overwriting
                     *the whole word would corrupt the instructions that follow.
                     */
                    newInstruct = (oldInstruct & ~0xffL) | 0xcc;
                    res = ptrace(PTRACE_POKETEXT, pid, bpAddr, newInstruct);
                    ptraceErrCheck(res, PTRACE_POKETEXT);
                    res = ptrace(PTRACE_CONT, pid, NULL, NULL);
                    ptraceErrCheck(res, PTRACE_CONT);
                    waitpid(pid, &stat_loc, 0);
                    if(WIFSTOPPED(stat_loc)) {
                        int sig = WSTOPSIG(stat_loc);
                        if(sig == SIGTRAP)
                            fprintf(stdout, "breakpoint at 0x%lx \n", bpAddr);
                        else
                            fprintf(stdout, "Program %s interrupted by signal %d\n",
                                    ExecArgv[0], sig);
                        getUserRegs(pid, &regs, 1);
                    }
                }
                else if(comm[0] == 'r') {//run
                    res = ptrace(PTRACE_CONT, pid, NULL, NULL);
                    ptraceErrCheck(res, PTRACE_CONT);
                    res = waitpid(pid, &stat_loc, 0);
                    fprintf(stdout, "Program %s exit with code %d\n",
                            ExecArgv[0], WEXITSTATUS(stat_loc));
                    goto TRACEHERE;
                }
                else if(comm[0] == 'c') {//continue
                    if(!IsInterrupted) {
                        fprintf(stdout, "program %s is not stopped at a breakpoint!\n", ExecArgv[0]);
                        continue;
                    }
                    IsInterrupted = 0;
                    res = getUserRegs(pid, &regs, 0);//Get the context of being ptraced process
                    /**
                     *x86 instruction CC (INT 3) causes a TRAP, then eip is increased to
                     *point to the next instruction. So, here we must decrease eip to
                     *ensure it points to the trap-caused instruction.
                     */
                    if(res) {
                        continue;
                    }
                    regs.eip--;
                    setUserRegs(pid, &regs);//set back context to the being ptraced process
                    res = ptrace(PTRACE_POKETEXT, pid, bpAddr, oldInstruct);
                    ptraceErrCheck(res, PTRACE_POKETEXT);
                    res = ptrace(PTRACE_CONT, pid, NULL, NULL);
                    ptraceErrCheck(res, PTRACE_CONT);
                    waitpid(pid, &stat_loc, 0);
                    if(WIFEXITED(stat_loc)) {
                        fprintf(stdout, "Program %s exit with code %d\n",
                                ExecArgv[0], WEXITSTATUS(stat_loc));
                    }
                    else if(WIFSTOPPED(stat_loc)) {
                        fprintf(stdout, "Program %s interrupted by signal %d\n",
                                ExecArgv[0], WSTOPSIG(stat_loc));
                    }
                    goto TRACEHERE;
                }
                else if(comm[0] == 'k') {//kill
                    res = ptrace(PTRACE_KILL, pid, NULL, NULL);
                    ptraceErrCheck(res, PTRACE_KILL);
                    fprintf(stdout, "program %s terminated!\n", ExecArgv[0]);
                    goto TRACEHERE;
                }
                else if(comm[0] == 'q') {//quit
                    res = ptrace(PTRACE_KILL, pid, NULL, NULL);
                    ptraceErrCheck(res, PTRACE_KILL);
                    fprintf(stdout, "Tracer quit!\n");
                    break;
                }
                else {
                    fprintf(stderr, "Unknown Command!\n");
                }
            }
        }

        FreeExecArgv(argc, ExecArgv);
        return 0;
    }

    void usage()
    {
        fprintf(stdout, " usage: trace [filename] [parameters]\n");
    }

    void command()
    {
        fprintf(stdout, " command usage: b(break) *addr\n"
                        " : c(continue) \n"
                        " : r(run) \n"
                        " : h(help) \n"
                        " : k(kill) \n"
                        " : q(quit) \n");
    }

    void ptraceErrCheck(int res, enum __ptrace_request req)
    {
        if(res < 0 && errno != 0) {
            perror("PTRACE error:");
            switch(req) {
            case PTRACE_KILL://a failed kill is not fatal
                return;
            default:
                exit(-1);
            }
        }
    }

    int getUserRegs(pid_t pid, struct user_regs_struct * regs, int verbose)
    {
        long res = ptrace(PTRACE_GETREGS, pid, NULL, (void *)regs);
        if(res < 0 && errno != 0) {
            perror("PTRACE error:");
            return -1;
        }
        if(verbose) {
            //the register fields are long on 32-bit x86, hence %lx
            fprintf(stdout, "registers information:\n");
            fprintf(stdout, " eax: 0x%lx\n", regs->eax);
            fprintf(stdout, " ecx: 0x%lx\n", regs->ecx);
            fprintf(stdout, " edx: 0x%lx\n", regs->edx);
            fprintf(stdout, " ebx: 0x%lx\n", regs->ebx);
            fprintf(stdout, " esp: 0x%lx\n", regs->esp);
            fprintf(stdout, " ebp: 0x%lx\n", regs->ebp);
            fprintf(stdout, " esi: 0x%lx\n", regs->esi);
            fprintf(stdout, " edi: 0x%lx\n", regs->edi);
            fprintf(stdout, " eip: 0x%lx"
                            "<--Here eip points to the next instruction\n", regs->eip);
            fprintf(stdout, " eflags: 0x%lx\n", regs->eflags);
            fprintf(stdout, " cs: 0x%lx\n", regs->cs);
            fprintf(stdout, " ss: 0x%lx\n", regs->ss);
            fprintf(stdout, " ds: 0x%lx\n", regs->ds);
            fprintf(stdout, " es: 0x%lx\n", regs->es);
            fprintf(stdout, " fs: 0x%lx\n", regs->fs);
            fprintf(stdout, " gs: 0x%lx\n", regs->gs);
        }
        return 0;
    }

    void setUserRegs(pid_t pid, struct user_regs_struct * regs)
    {
        long res = ptrace(PTRACE_SETREGS, pid, NULL, (void *)regs);
        if(res < 0 && errno != 0) {
            perror("PTRACE error:");
            exit(-1);
        }
    }

    void calAddress(char * comm, unsigned long * bpAddr)
    {
        //example:comm = b 0xffffffff
        //we must strip all the prefix
        //here we do not check the validity of Address
        char * index = comm;
        while(*index != '\0') {
            if(!strncmp(index, "0x", 2)){
                //strtoul, so that addresses above 0x7fffffff are not clamped
                *bpAddr = strtoul(index + 2, NULL, 16);
                printf("%lx\n", *bpAddr);
                return;
            }
            index++;
        }
    }

    int getMainEntryPoint(FILE * Elf_fp, unsigned long * bpAddr)
    {
        Elf32_Ehdr elf_header;
        if(fread(&elf_header, sizeof(Elf32_Ehdr), 1, Elf_fp) != 1)
            return -1;//file read error

        //assemble e_entry byte by byte according to the file's byte order
        unsigned char * field = (unsigned char *)&elf_header.e_entry;
        switch(elf_header.e_ident[EI_DATA]) {
        case ELFDATA2LSB:
            * bpAddr = ((unsigned long)(field[0]))
                     | (((unsigned long)(field[1])) << 8)
                     | (((unsigned long)(field[2])) << 16)
                     | (((unsigned long)(field[3])) << 24);
            return 0;//success
        case ELFDATA2MSB:
            * bpAddr = ((unsigned long)(field[3]))
                     | (((unsigned long)(field[2])) << 8)
                     | (((unsigned long)(field[1])) << 16)
                     | (((unsigned long)(field[0])) << 24);
            return 0;//success
        default:
            fprintf(stderr, "Unknown data format!\n");
            return -2;//data format error
        }
    }

    char ** CreateExecArgv(int argc, char ** argv)
    {
        char ** execArgv = (char **)malloc(sizeof(char *) * argc);
        if(!execArgv) {
            return NULL;
        }
        int i, size;char * index;
        for(i = 1; i < argc; ++i) {
            //strlen, not sizeof: sizeof(argv[i]) is only the size of a pointer
            size = strlen(argv[i]) + 1;
            if(i == 1)
                size += 2;//room for a "./" prefix
            execArgv[i - 1] = (char *)malloc(sizeof(char) * size);
            if(!execArgv[i - 1]) {
                return NULL;
            }
            index = execArgv[i - 1];
            if(i == 1) {
                if(argv[i][0] != '/' && argv[i][0] != '.') {
                    strcpy(execArgv[i - 1], "./");
                    index = execArgv[i - 1] + 2;
                }
            }
            strcpy(index, argv[i]);
        }
        execArgv[argc - 1] = (char *)0;
        return execArgv;
    }

    void FreeExecArgv(int argc, char ** argv)
    {
        int i;
        for(i = 0; i < argc; ++i) {
            if(argv[i])//free only the entries that were allocated
                free(argv[i]);
        }
        free(argv);
    }
