Using gprof to Find Performance Bottlenecks in a Program - gavinx - ChinaUnix Blog


Category: LINUX

2010-12-28 12:15:08

Using gprof to Find Performance Bottlenecks in a Program

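gprof can display a program's "flat profile", which lists how many times each function was called and how much processor time each function consumed. It can also display the "call graph", which shows the caller/callee relationships between functions and how much time each call path took. In addition, it can report how many times each line of the program was executed.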

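To use gprof, add the -pg option when compiling. The compiler then automatically inserts profiling code into the object code. While the program runs, this code collects and records which functions call which and how often, as well as the time spent in each function itself and in the functions it calls. When the program finishes, a file named gmon.out is written to the directory the program is in when it exits (its current working directory). This file holds the recorded profiling data.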

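The following example illustrates this:
#include <stdio.h>
#include <stdlib.h>
#include <signal.h>
#include <unistd.h>   /* assumed; the original header name was lost */

void ouch(int sig)
{
    printf("OUCH: - I got signal %d\n", sig);
    exit(0);    /* exit() lets the profiling code write gmon.out */
}

void a(){
    printf("\t\t+---call a() function\n");
}

void c(){
    printf("\t\t+---call c() function\n");
}

int b(){
    printf("\t+--- call b() function\n");
    a();
    c();
    while (1)   /* infinite loop: the program never exits on its own */
    {
    }
    return 0;
}

int main(){
    struct sigaction act;
    act.sa_handler = ouch;      /* call ouch() when the signal arrives */
    sigemptyset(&act.sa_mask);
    act.sa_flags = 0;
    sigaction(SIGBUS, &act, 0);

    printf(" main() function()\n");
    b();
}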

Compile the program above:
gcc -pg -g -o perf perf.c

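When the program runs to completion and exits normally, gmon.out is generated. In some cases, however, the program cannot exit normally, and gmon.out is not produced. We then need to add a signal handler that calls exit when a particular signal is received, so that gmon.out is still written.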

In this example the program contains an infinite loop and cannot exit normally, which is why the signal handler was added.

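Once perf is running, stop it with kill -7 <pid>, where <pid> is the process ID of perf (signal 7 is SIGBUS on x86 Linux, the signal the handler is registered for). Then run gprof.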

->qdbuild2:./perf
main() function()
        +--- call b() function
                +---call a() function
                +---call c() function

->qdbuild2:ps -ef | grep perf
jx         655 29541 99 12:09 pts/215 00:00:13 ./perf
->qdbuild2:kill -7 655

->qdbuild2:gprof perf gmon.out -l

Flat profile:

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total
 time   seconds   seconds    calls  Ts/call  Ts/call  name
 50.52      8.26     8.26                             b (perf.c:23 @ 4007ab)
 50.52     16.53     8.26                             b (perf.c:26 @ 4007b5)
  0.00     16.53     0.00        1     0.00     0.00  a (perf.c:12 @ 400764)
  0.00     16.53     0.00        1     0.00     0.00  b (perf.c:20 @ 40078e)
  0.00     16.53     0.00        1     0.00     0.00  c (perf.c:16 @ 400779)

    Call graph (explanation follows)

granularity: each sample hit covers 2 byte(s) for 0.06% of 16.53 seconds

index % time    self children    called     name
                0.00    0.00       1/1           b (perf.c:21 @ 400797) [11]
[3]      0.0    0.00    0.00       1         a (perf.c:12 @ 400764) [3]
-----------------------------------------------
                0.00    0.00       1/1           main (perf.c:38 @ 400809) [24]
[4]      0.0    0.00    0.00       1         b (perf.c:20 @ 40078e) [4]
-----------------------------------------------
                0.00    0.00       1/1           b (perf.c:23 @ 4007ab) [1]
[5]      0.0    0.00    0.00       1         c (perf.c:16 @ 400779) [5]
-----------------------------------------------

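From the gprof output we can see that lines 23 to 26 of perf.c, the empty while loop in b(), account for most of the time. With that information we can go on to optimize the program.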
