2012-09-10 20:27:28

Using GNU's GDB Debugger

Debugging With Your Brain

By Peter Jay Salzman


Read This Before Continuing

As of SDL 1.2.11, it appears that SDL_SetVideoMode() no longer generates SIGFPE when passed SDL_OPENGL. This means you can use GDB to debug spinning_cube. However, this is still an excellent example of:

1.   How to debug with your brain.

2.   Why knowing theory, like the memory layout of a program, can be helpful when debugging.

Debugging With Your Brain

In the last section we looked at how a program is laid out in memory. Knowing this is useful not only for debugging with GDB, but also for debugging without it. In this interlude, guest written by my close friend, Mark Kim, we'll see how.
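As a quick refresher on that layout, here is a minimal sketch in C that prints one sample address from each region. The function name show_regions() and the two globals are made up for this illustration and are not part of spinning_cube. Where each object actually lands is decided by the compiler, linker, and operating system, so the comments say "usually" rather than "always".

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

int initialized_global = 42;  /* usually placed in the data segment */
int zeroed_global;            /* usually placed in BSS */

/* Print one sample address per region.
 * Returns the number of addresses printed, or -1 if malloc() fails. */
int show_regions(void)
{
    int on_stack = 0;                        /* automatic storage */
    int *on_heap = malloc(sizeof *on_heap);  /* dynamic storage */
    if (on_heap == NULL)
        return -1;

    const void *addr[] = { &initialized_global, &zeroed_global,
                           on_heap, &on_stack };
    const char *name[] = { "data", "bss", "heap", "stack" };
    int count = (int)(sizeof addr / sizeof addr[0]);

    for (int i = 0; i < count; i++) {
        printf("%-5s %p\n", name[i], (void *)addr[i]);
        /* Distinct live objects never share an address. */
        for (int j = 0; j < i; j++)
            assert(addr[i] != addr[j]);
    }

    free(on_heap);
    return count;
}
```

Calling show_regions() from main() on a typical Linux system tends to show the two globals close together and the heap and stack addresses far away from them, but the exact values change from run to run and from platform to platform.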

Unpack spinning_cube.tar.bz2, then compile and run the program. It displays a spinning cube textured with images of Geordi (white) and Juliette (calico), me on a New York City subway, and the place where I work.

However, when you press a key, some of the cube's textures mysteriously vanish. My first instinct was to use GDB to find the problem, but I discovered that SDL programs that use OpenGL can't be debugged via GDB. Upon investigation, I found that when you pass the flag SDL_OPENGL to the function SDL_SetVideoMode(), a SIGFPE is generated which terminates the program. If you try to handle the SIGFPE, you'll find that SDL_SetVideoMode() never returns, so GDB is left in a hung state.

I had spent more than 40 hours programming over the previous 3 days and was getting punch-drunk. Not having GDB available pushed me over the edge, and I sent an exasperated email to Mark asking for help. I got a reply within 10 minutes.

Before continuing, you'll want to:

1.   Run the program to see the bug in action. You need OpenGL and SDL to compile the program.

2.   Look at HandleKeyPress() in input.c, which handles keystrokes.

3.   Look at Debug() in yerror.h, which is called from HandleKeyPress().
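If you don't have the tarball at hand, the following is a rough sketch of what a macro like Debug() could look like. It is not the code from yerror.h. The names die_filename, die_function, die_line, debug, and debug_for_reals() are taken from Mark's email later in this section, but the macro body, the helper's signature, and the message format are assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Globals named in the email; the definitions here are stand-ins. */
const char *die_filename;
const char *die_function;
int         die_line;
bool        debug = true;

/* Hypothetical helper: report a message with the recorded location. */
void debug_for_reals(const char *msg)
{
    assert(msg != NULL);
    if (debug)
        fprintf(stderr, "%s:%d (%s): %s\n",
                die_filename, die_line, die_function, msg);
}

/* Assumed shape of the macro: record where we are, then report. */
#define Debug(msg)                 \
    do {                           \
        die_filename = __FILE__;   \
        die_function = __func__;   \
        die_line     = __LINE__;   \
        debug_for_reals(msg);      \
    } while (0)
```

The point to notice is that every call writes to three global variables before anything is printed. A line such as Debug("key pressed"); inside a keystroke handler therefore modifies global memory on every key press, even when it appears to do nothing but log.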

Spend 10 minutes trying to fix the bug. This will make Mark's email all the more impressive. As you read it, pay particular attention to steps 6, 7B, and 7C, which are examples of sheer debugging brilliance!

Hey Peter,

The problem was an overlapping memory area between the debugging
variables and the texture variables.  In video.[hc], the "texture[2]" array
should have been declared "texture[NUM_TEXURES]" instead.  Attached is a
patch file.

The debugging process went like this:

   1. Try Debug() -- indeed it makes some textures disappear.

   2. Try turning debug_for_reals() into an empty function -- the same
      thing happens, so that's not the problem.

   3. Try removing each line of the Debug() macro.  This revealed that
      writing values into the "die_*" variables causes the textures
      to disappear.

   4. So instead of calling Debug(), try writing some values into
      the "die_*" variables -- the textures disappear again.

   5. Check whether any other code is using those variables by changing
      the variable names and looking out for compilation errors --
      nothing significant showed up.

   6. Perhaps someone is unintentionally using the same memory space as
      the "die_*" variables.  I tried shifting the memory locations of
      the "die_*" variables down by putting an array in front of them,
      like this:

       yerror.c:
         ...
         #include "yerror.h"

       + char buffer[1024];

         // Global Debugging/Dying Variables
         const char *die_filename;
         const char *die_function;
         int        die_line;
         bool       debug = true;

      which fixed the problem.  So now it's a matter of finding the
      overlapping memory.

   7. Tracking down the problem needs some narrowing down of the
      possibilities, so I made the following assumptions:

      A. I know a problem like this occurs most often when an array
         size is declared too short at another place, so there's probably
         an array out there that's declared too short, and the "die_*"
         variables, placed in memory right after that array, are probably
         getting overwritten by some code expecting the array to be
         longer.

         It could also be a pointer combined with malloc() but at this
         point I'm just thinking about one problem at a time.

      B. The problem must be with either a global or static variable
         since it's overlapping with another global variable
         in the heap space.  So I'm looking for an array declared in
         global or static scope.  That narrows down my search quite a bit.

         [Translator's query: in the heap?]

         BTW, the fact that I'm looking for a variable that overlaps with
         a global variable probably discounts malloc() from our potential
         list of problems since malloc(), if the way I view the memory is
         correct, should allocate memory only *after* all global
         variables, and it's unlikely code accidentally writes to
         a memory location before a pointer rather than after it
         (though it's certainly possible to write to memory before
         a pointer.)  But again, this is all an afterthought... I'm just
         thinking about another global array at this point.

      C. I know the global array I'm looking for must be somehow linked
         to a texture operation since that's what's being interfered with
         by writing to the "die_*" variables.  So I'm looking for a global
         array that does something with textures, probably one that stores
         textures or pointers to textures or indexes to textures or
         something like that.

   8. And that's what I looked for.  texture[2] looked a little suspicious
      so I tried expanding its size and that fixed the problem.  Just to
      make sure, I looked for the code that writes to texture with an index
      greater than 1 and found init.c:127 and several places in render.c.

Hope that helps!

-Mark
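To make Mark's reasoning concrete, here is a small sketch that mimics the situation without invoking undefined behavior. It rests on several assumptions: the two structs stand in for the program's globals so that the member order is fixed (a real linker is free to place separate globals differently), the element type unsigned int is a guess, and the value 6 for NUM_TEXURES (spelled as in the email) is chosen only because a cube has six faces. Instead of writing past the array, slots_past() only computes where such writes would land.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_TEXURES 6  /* spelled as in the email; the value 6 is an assumption */

/* Hypothetical layout of the buggy program: array declared too short. */
struct short_layout {
    unsigned int texture[2];
    const char  *die_filename;
    const char  *die_function;
    int          die_line;
};

/* Hypothetical layout after the fix. */
struct fixed_layout {
    unsigned int texture[NUM_TEXURES];
    const char  *die_filename;
    const char  *die_function;
    int          die_line;
};

/* Count how many of the NUM_TEXURES slots start at or beyond the offset
 * of the member that follows the array. The array is the first member,
 * so slot i starts at byte offset i * sizeof(unsigned int). */
size_t slots_past(size_t next_member_offset)
{
    size_t n = 0;
    for (size_t i = 0; i < NUM_TEXURES; i++)
        if (i * sizeof(unsigned int) >= next_member_offset)
            n++;
    return n;
}

/* Self-check: only the short layout lets slots spill into die_*. */
void check_layouts(void)
{
    assert(slots_past(offsetof(struct short_layout, die_filename)) > 0);
    assert(slots_past(offsetof(struct fixed_layout, die_filename)) == 0);
}
```

With the short array, code that loops over all NUM_TEXURES slots touches memory that belongs to the die_* variables, so writing to those variables in turn clobbers the extra texture entries. This also fits the observation in step 6: putting char buffer[1024] in front of the die_* variables moved them out of the overrun's path, which hid the symptom without fixing the array size.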

 

 

 
