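3. Cachegrind
A cache profiler. It simulates the CPU's level-1 caches (I1 and D1) and the level-2 cache, and can pinpoint exactly where a program's cache misses and hits occur. If needed, it can also report the number of cache misses, the number of memory references, and the number of instructions executed by each line of code, each function, each module, and the program as a whole. This is a great help when optimizing a program.
A bit of advertising: Valgrind itself used this tool to improve its own performance by 25%-30% over the past few months. According to earlier reports, the KDE development team has also thanked Valgrind for its help in improving KDE's performance.
It is invoked the same way as the other tools: valgrind --tool=cachegrind program-name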
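To make the cache-miss reporting concrete, here is a minimal sketch of the kind of code Cachegrind is useful for. Both functions compute the same sum over a matrix stored in row-major order, but one walks memory sequentially and the other jumps by a full row on every step. The function names are made up for this illustration; on a large matrix, one would expect Cachegrind to attribute noticeably more D1 misses to the column-wise loop.

```c
#include <assert.h>
#include <stddef.h>

/* Sum a rows x cols matrix stored in row-major order, walking each row
 * from left to right. Consecutive accesses touch adjacent addresses. */
long sum_row_major(const int *m, size_t rows, size_t cols)
{
    assert(m != NULL);
    long total = 0;
    for (size_t r = 0; r < rows; r++)
        for (size_t c = 0; c < cols; c++)
            total += m[r * cols + c];
    return total;
}

/* Same sum, but walking each column from top to bottom. Consecutive
 * accesses are cols * sizeof(int) bytes apart, which is much less
 * cache-friendly once the matrix no longer fits in the cache. */
long sum_col_major(const int *m, size_t rows, size_t cols)
{
    assert(m != NULL);
    long total = 0;
    for (size_t c = 0; c < cols; c++)
        for (size_t r = 0; r < rows; r++)
            total += m[r * cols + c];
    return total;
}
```

Call both from a main of your own on a large matrix, build with -g, run the program under valgrind --tool=cachegrind, and then inspect the resulting cachegrind.out.<pid> file with cg_annotate to compare the two functions.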
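4. Helgrind
This tool is mainly used to check for race conditions in multithreaded programs. Helgrind looks for memory locations that are accessed by more than one thread but are not consistently protected by a lock. Such locations are often where threads lose synchronization, and they lead to bugs that are hard to track down. Helgrind implements the race-detection algorithm known as "Eraser", with further improvements that reduce the number of false reports. However, Helgrind is still at an experimental stage.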
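The "not consistently locked" idea can be sketched in a few lines. The code below is a simplified illustration of the lockset refinement at the heart of Eraser, not Helgrind's actual implementation: every name in it is hypothetical, locks are modeled as bits in a mask, and the thread-ownership states that the real algorithm uses to avoid warning about single-threaded initialization are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Each bit stands for one lock; bit i set means "lock i is in the set". */
typedef uint32_t lockset_t;

/* Per-location bookkeeping (hypothetical name). */
typedef struct {
    lockset_t candidates; /* locks held during every access seen so far */
    bool accessed;        /* false until the first access is recorded */
} shadow_word_t;

void shadow_init(shadow_word_t *sw)
{
    assert(sw != NULL);
    sw->candidates = 0;
    sw->accessed = false;
}

/* Record one access made while the locks in `held` were held.
 * Returns true when no single lock has protected all accesses so far,
 * i.e. when a lockset-based checker would warn. */
bool shadow_access(shadow_word_t *sw, lockset_t held)
{
    assert(sw != NULL);
    if (!sw->accessed) {
        sw->candidates = held;  /* first access: every held lock is a candidate */
        sw->accessed = true;
    } else {
        sw->candidates &= held; /* keep only locks held on every access */
    }
    return sw->candidates == 0;
}
```

For example, three accesses made while holding {A, B}, then {A}, then {B} leave an empty candidate set after the third access, so the third call returns true.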
Here is an example of a race condition:
#include <pthread.h>
#include <stdio.h>

#define NLOOP 50

int counter = 0; /* incremented by threads */

void *threadfn(void *);

int main(int argc, char **argv)
{
    pthread_t tid1, tid2, tid3;

    pthread_create(&tid1, NULL, &threadfn, NULL);
    pthread_create(&tid2, NULL, &threadfn, NULL);
    pthread_create(&tid3, NULL, &threadfn, NULL);

    /* wait for all three threads to terminate */
    pthread_join(tid1, NULL);
    pthread_join(tid2, NULL);
    pthread_join(tid3, NULL);
    return 0;
}

void *threadfn(void *vptr)
{
    int i, val;

    /* Unsynchronized read-modify-write of counter: this is the race. */
    for (i = 0; i < NLOOP; i++) {
        val = counter;
        printf("%x: %d \n", (unsigned int)pthread_self(), val+1);
        counter = val+1;
    }
    return NULL;
}
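Running the program under Helgrind (for example, valgrind --tool=helgrind ./a.out) prints the program's own output interleaved with Helgrind's reports:
49c0b70: 1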
49c0b70: 2
==4666== Thread #3 was created
==4666== at 0x412E9D8: clone (clone.S:111)
==4666== by 0x40494B5: pthread_create@@GLIBC_2.1 (createthread.c:256)
==4666== by 0x4026E2D: pthread_create_WRK (hg_intercepts.c:257)
==4666== by 0x4026F8B: pthread_create@* (hg_intercepts.c:288)
==4666== by 0x8048524: main (in /home/yanghao/Desktop/testC/testmem/a.out)
==4666==
==4666== Thread #2 was created
==4666== at 0x412E9D8: clone (clone.S:111)
==4666== by 0x40494B5: pthread_create@@GLIBC_2.1 (createthread.c:256)
==4666== by 0x4026E2D: pthread_create_WRK (hg_intercepts.c:257)
==4666== by 0x4026F8B: pthread_create@* (hg_intercepts.c:288)
==4666== by 0x8048500: main (in /home/yanghao/Desktop/testC/testmem/a.out)
==4666==
==4666== Possible data race during read of size 4 at 0x804a028 by thread #3
==4666== at 0x804859C: threadfn (in /home/yanghao/Desktop/testC/testmem/a.out)
==4666== by 0x4026F60: mythread_wrapper (hg_intercepts.c:221)
==4666== by 0x4048E98: start_thread (pthread_create.c:304)
==4666== by 0x412E9ED: clone (clone.S:130)
==4666== This conflicts with a previous write of size 4 by thread #2
==4666== at 0x80485CA: threadfn (in /home/yanghao/Desktop/testC/testmem/a.out)
==4666== by 0x4026F60: mythread_wrapper (hg_intercepts.c:221)
==4666== by 0x4048E98: start_thread (pthread_create.c:304)
==4666== by 0x412E9ED: clone (clone.S:130)
==4666==
==4666== Possible data race during write of size 4 at 0x804a028 by thread #2
==4666== at 0x80485CA: threadfn (in /home/yanghao/Desktop/testC/testmem/a.out)
==4666== by 0x4026F60: mythread_wrapper (hg_intercepts.c:221)
==4666== by 0x4048E98: start_thread (pthread_create.c:304)
==4666== by 0x412E9ED: clone (clone.S:130)
==4666== This conflicts with a previous read of size 4 by thread #3
==4666== at 0x804859C: threadfn (in /home/yanghao/Desktop/testC/testmem/a.out)
==4666== by 0x4026F60: mythread_wrapper (hg_intercepts.c:221)
==4666== by 0x4048E98: start_thread (pthread_create.c:304)
==4666== by 0x412E9ED: clone (clone.S:130)
==4666==
49c0b70: 3
......
55c1b70: 51
==4666==
==4666== For counts of detected and suppressed errors, rerun with: -v
==4666== Use --history-level=approx or =none to gain increased speed, at
==4666== the cost of reduced accuracy of conflicting-access information
==4666== ERROR SUMMARY: 8 errors from 2 contexts (suppressed: 99 from 31)
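One common way to remove the race reported above is to protect every access to the shared counter with a mutex. The following is a minimal sketch under that assumption; the names locked_counter, locked_threadfn, run_locked_counter and MAX_THREADS are introduced here for illustration and are not part of the original program.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define MAX_THREADS 16

static int locked_counter = 0;
static pthread_mutex_t counter_lock = PTHREAD_MUTEX_INITIALIZER;

/* Each thread increments the counter nloop times, holding the mutex
 * around the whole read-modify-write sequence. */
static void *locked_threadfn(void *arg)
{
    int nloop = *(const int *)arg;
    for (int i = 0; i < nloop; i++) {
        pthread_mutex_lock(&counter_lock);
        locked_counter = locked_counter + 1;
        pthread_mutex_unlock(&counter_lock);
    }
    return NULL;
}

/* Start nthreads threads, wait for all of them, and return the final
 * counter value, or -1 if a thread could not be created. */
int run_locked_counter(int nthreads, int nloop)
{
    assert(nthreads > 0 && nthreads <= MAX_THREADS);
    pthread_t tids[MAX_THREADS];
    int started = 0;

    locked_counter = 0;
    for (; started < nthreads; started++) {
        if (pthread_create(&tids[started], NULL, locked_threadfn, &nloop) != 0)
            break;
    }
    for (int i = 0; i < started; i++)
        pthread_join(tids[i], NULL);

    return started == nthreads ? locked_counter : -1;
}
```

With three threads and 50 iterations each, the final value is always 150, and since every access to the counter is made while holding the same lock, Helgrind should no longer flag it as a possible data race.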
5. Massif
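A heap profiler. It measures how much heap memory a program uses and reports the sizes of heap blocks, heap administration blocks, and the stack. Massif can help us reduce memory usage; on modern systems with virtual memory, this can also speed the program up and make it less likely to end up in swap space.
Massif profiles memory allocation and deallocation. With it, developers can gain a detailed understanding of a program's memory usage behavior and optimize it accordingly. This is especially useful for C++, which performs many hidden allocations and deallocations.
In addition, Valgrind ships with Lackey and Nulgrind. Lackey is a small tool that is rarely used; Nulgrind only shows developers how to create a tool. We will not cover them here.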
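As an illustration of the kind of allocation behavior Massif makes visible, here is a small sketch of a buffer that grows by doubling. The function name build_squares and the starting capacity of 4 are choices made for this example only. In a Massif profile, each realloc step would be expected to show up as a jump in heap usage.

```c
#include <assert.h>
#include <stdlib.h>

/* Fill a heap buffer with the squares 0*0, 1*1, ..., (n-1)*(n-1),
 * doubling its capacity whenever it runs out of room. The final
 * capacity (in elements) is stored in *capacity_out. Returns NULL on
 * allocation failure; otherwise the caller must free() the result. */
int *build_squares(size_t n, size_t *capacity_out)
{
    assert(capacity_out != NULL);
    size_t cap = 4;
    int *buf = malloc(cap * sizeof *buf);
    if (buf == NULL)
        return NULL;

    for (size_t i = 0; i < n; i++) {
        if (i == cap) {
            size_t new_cap = cap * 2;
            int *grown = realloc(buf, new_cap * sizeof *grown);
            if (grown == NULL) {
                free(buf);
                return NULL;
            }
            buf = grown;
            cap = new_cap;
        }
        buf[i] = (int)(i * i);
    }
    *capacity_out = cap;
    return buf;
}
```

Call it from a main of your own, run the program with valgrind --tool=massif, and view the resulting massif.out.<pid> file with ms_print. Note that Massif profiles only the heap by default; stack profiling has to be switched on with --stacks=yes.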
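III. Using Valgrind
Valgrind is very simple to use; you do not even need to recompile your program. Of course, for the best results and the most accurate information, you should still recompile as recommended. For example, when using memcheck it is best to turn off compiler optimization.
The format of the valgrind command is: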
valgrind [valgrind-options] your-prog [your-prog options]
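As a small, hedged example of something to try this command on, the sketch below contains a deliberate memory leak for memcheck to find. The function names copy_string and leaky_length are made up for this illustration.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Return a heap-allocated copy of s, or NULL if allocation fails.
 * The caller owns the result and is expected to free() it. */
char *copy_string(const char *s)
{
    assert(s != NULL);
    size_t len = strlen(s);
    char *copy = malloc(len + 1);
    if (copy != NULL)
        memcpy(copy, s, len + 1); /* include the terminating '\0' */
    return copy;
}

/* Deliberately buggy: the copy is never freed, so every call leaks
 * strlen(s) + 1 bytes. */
size_t leaky_length(const char *s)
{
    char *copy = copy_string(s);
    if (copy == NULL)
        return 0;
    return strlen(copy); /* missing free(copy) */
}
```

Call leaky_length from a main of your own, compile with debugging information and without optimization (for example, gcc -g -O0), and run the program with valgrind --tool=memcheck --leak-check=full. Memcheck should then report the leaked block together with the call stack of the malloc in copy_string.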
Some commonly used options:
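Option | Description |
-h --help | Show help information. |
--version | Show the version of the Valgrind core; each tool has its own version. |
-q --quiet | Run quietly, printing only error messages. |
-v --verbose | Print more detailed information. |
--tool=<toolname> [default: memcheck] | The most commonly used option. Runs the Valgrind tool named toolname. If the tool name is omitted, memcheck runs by default. |
--db-attach= | Attach to a debugger to make errors easier to debug. |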