Inside ELF Symbol Tables - bigluo - ChinaUnix Blog


.symtab and .dynsym

Sharable objects and dynamic executables usually have two distinct symbol tables, one named ".symtab", and the other ".dynsym". (To make this easier to read, I am going to refer to these without the quotes or leading dot from here on.)

The dynsym is a smaller version of the symtab that only contains global symbols. The information found in the dynsym is therefore also found in the symtab, while the reverse is not necessarily true. You are almost certainly wondering why we complicate the world with two symbol tables. Wouldn't one table do? Yes, it would, but at the cost of using more memory than necessary in the running process.

To understand how this works, we need to understand the difference between allocable and non-allocable ELF sections. ELF files contain some sections (e.g. code and data) that are needed at runtime by the process that uses them. These sections are marked as allocable. There are many other sections that are needed by linkers, debuggers, and other such tools, but not by the running program. These are said to be non-allocable. When a linker builds an ELF file, it gathers all of the allocable sections together in one part of the file and places all of the non-allocable sections elsewhere. When the operating system loads the resulting file, only the allocable part is mapped into memory. The non-allocable part remains in the file, but is not visible in memory. The strip utility can be used to remove certain non-allocable sections from a file. This reduces file size by throwing away information. The program is still runnable, but debuggers may be hampered in their ability to tell you what the program is doing.
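To make the allocable/non-allocable split concrete, here is a minimal C sketch. It assumes the `<elf.h>` header shipped with glibc on Linux (Solaris provides one as well) and 64-bit objects (`Elf64_Shdr`). The helper names `section_is_allocable` and `tally_sections` are hypothetical, and reading the section header table out of a real file is left out.

```c
#include <elf.h>
#include <stdbool.h>
#include <stddef.h>

/* A section is allocable when SHF_ALLOC is set in sh_flags; only such
   sections occupy memory in the running process. */
static bool section_is_allocable(const Elf64_Shdr *shdr)
{
    return (shdr->sh_flags & SHF_ALLOC) != 0;
}

/* Add up sh_size for the sections that get mapped into memory and for
   those that stay only in the file. */
static void tally_sections(const Elf64_Shdr *shdrs, size_t count,
                           Elf64_Xword *alloc_bytes,
                           Elf64_Xword *nonalloc_bytes)
{
    *alloc_bytes = 0;
    *nonalloc_bytes = 0;
    for (size_t i = 0; i < count; i++) {
        if (section_is_allocable(&shdrs[i]))
            *alloc_bytes += shdrs[i].sh_size;
        else
            *nonalloc_bytes += shdrs[i].sh_size;
    }
}
```

Run over an unstripped program, the non-allocable total would include the symtab and any debugging sections, which is exactly the data that stripping throws away.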

The full symbol table contains a large amount of data needed to link or debug our files, but not needed at runtime. In fact, in the days before sharable libraries and dynamic linking, none of it was needed at runtime. There was a single, non-allocable symbol table (reasonably named "symtab"). When dynamic linking was added to the system, the original designers faced a choice: Make the symtab allocable, or provide a second smaller allocable copy. The symbols needed at runtime are a small subset of the total, so a second symbol table saves virtual memory in the running process. This is an important consideration. Hence, a second symbol table was invented for dynamic linking, and consequently named "dynsym".

And so, we have two symbol tables. The symtab contains everything, but it is non-allocable, can be stripped, and has no runtime cost. The dynsym is allocable, and contains the symbols needed to support runtime operation. This division has served us well over the years.
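The two tables can be told apart by section type: the symtab has type `SHT_SYMTAB` and the dynsym has type `SHT_DYNSYM`. The sketch below, again assuming `<elf.h>` and 64-bit objects, locates both and reports whether each is allocable. The struct `symtab_info` and the function `find_symbol_tables` are hypothetical names; the entry count relies on `sh_entsize` holding the size of one symbol entry.

```c
#include <elf.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical summary of one symbol table section. */
struct symtab_info {
    bool present;       /* the section exists in the file */
    bool allocable;     /* SHF_ALLOC set: mapped into the process */
    size_t index;       /* section header index */
    Elf64_Xword nsyms;  /* number of Elf64_Sym entries */
};

/* Scan the section headers for the SHT_SYMTAB and SHT_DYNSYM sections
   (an object has at most one of each). */
static void find_symbol_tables(const Elf64_Shdr *shdrs, size_t count,
                               struct symtab_info *symtab,
                               struct symtab_info *dynsym)
{
    const struct symtab_info none = { false, false, 0, 0 };
    *symtab = none;
    *dynsym = none;
    for (size_t i = 0; i < count; i++) {
        struct symtab_info *out;
        if (shdrs[i].sh_type == SHT_SYMTAB)
            out = symtab;
        else if (shdrs[i].sh_type == SHT_DYNSYM)
            out = dynsym;
        else
            continue;
        out->present = true;
        out->allocable = (shdrs[i].sh_flags & SHF_ALLOC) != 0;
        out->index = i;
        /* sh_entsize is the size of one entry, i.e. sizeof(Elf64_Sym). */
        out->nsyms = shdrs[i].sh_entsize
                         ? shdrs[i].sh_size / shdrs[i].sh_entsize
                         : 0;
    }
}
```

For a stripped file you would expect `symtab->present` to be false while the dynsym is still there, since the running program needs it.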

Types Of Symbols

Given how long symbols have been around, there are surprisingly few types:

STT_NOTYPE
Used when we don't know what a symbol is, or to indicate the absence of a symbol.

STT_OBJECT / STT_COMMON
These are both used to represent data. (The word OBJECT in this context should not be interpreted as having anything to do with object orientation. STT_DATA might have been a better name.)
STT_OBJECT is used for normal variable definitions, while STT_COMMON is used for tentative definitions. See my earlier blog entry about tentative symbols for more information on the differences between them.

STT_FUNC
A function, or other executable code.

STT_SECTION
When I first started learning about ELF, and someone would say something about "section symbols", I thought they meant a symbol from some given section. That's not it though: A section symbol is a symbol that is used to refer to the section itself. They are used mainly when performing relocations, which are often specified in the form of "modify the value at offset XXX relative to the start of section YYY".

STT_FILE
The name of a file, either of an input file used to construct the ELF file, or of the ELF file itself.

STT_TLS
A third type of data symbol, used for thread local data. A thread local variable is a variable that is unique to each thread. For instance, if I declare the variable "foo" to be thread local, then every thread has a separate foo variable of its own, and no thread sees or shares the values held by the other threads. Thread local variables are created for each thread when the thread is created. As such, their number (one per thread) and addresses (which depend on when the thread is created, and how many threads there are) are unknown until runtime. An ELF file cannot contain an address for them. Instead, an STT_TLS symbol is used. The value of an STT_TLS symbol is an offset, which is used to calculate a TLS offset relative to the thread pointer. You can read more about TLS in the Solaris Linker and Libraries Guide.

STT_REGISTER
The SPARC architecture has a concept known as a "register symbol". These symbols are used to validate symbol/register usage, and can also be used to initialize global registers. Other architectures don't use these.
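The type is stored in the low four bits of a symbol's `st_info` byte and extracted with the `ELF64_ST_TYPE` macro. The sketch below maps it to the names used above. It assumes `<elf.h>`, where the register type is spelled `STT_SPARC_REGISTER`; the fallback `#define` only covers headers that lack it, and the function name `symbol_type_name` is hypothetical.

```c
#include <elf.h>

/* Some headers may not define the SPARC-specific register type. */
#ifndef STT_SPARC_REGISTER
#define STT_SPARC_REGISTER 13
#endif

/* Return the name of the symbol type held in the low 4 bits of st_info.
   STT_SPARC_REGISTER lies in the processor-specific range, so that label
   is only meaningful for SPARC objects. */
static const char *symbol_type_name(unsigned char st_info)
{
    switch (ELF64_ST_TYPE(st_info)) {
    case STT_NOTYPE:         return "STT_NOTYPE";
    case STT_OBJECT:         return "STT_OBJECT";
    case STT_FUNC:           return "STT_FUNC";
    case STT_SECTION:        return "STT_SECTION";
    case STT_FILE:           return "STT_FILE";
    case STT_COMMON:         return "STT_COMMON";
    case STT_TLS:            return "STT_TLS";
    case STT_SPARC_REGISTER: return "STT_REGISTER (SPARC)";
    default:                 return "unknown";
    }
}
```

Types outside this list, such as OS-specific extensions, fall through to "unknown" in this sketch.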

In addition to its type, each symbol has other attributes:

Name (Optional: Not all symbols need a name, though most do)
Value
Size
Binding and Visibility
ELF Section it references

The exact meaning for some of these attributes depends on the type of symbol involved. For more details, consult the Solaris Linker and Libraries Guide, which is available in PDF form online.
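Each of these attributes lives in a field of the `Elf64_Sym` entry, with type and binding packed into `st_info` and visibility into `st_other`. The following sketch, assuming `<elf.h>` and a hypothetical `sym_attrs` struct, unpacks one entry. Note that the name is stored as an offset into the string table associated with the symbol table section, not as a string.

```c
#include <elf.h>

/* Hypothetical decoded view of one Elf64_Sym entry. */
struct sym_attrs {
    Elf64_Word name_offset;   /* st_name: string table offset, 0 = no name */
    Elf64_Addr value;         /* st_value */
    Elf64_Xword size;         /* st_size */
    unsigned char type;       /* STT_* from st_info */
    unsigned char binding;    /* STB_* from st_info */
    unsigned char visibility; /* STV_* from st_other */
    Elf64_Half section;       /* st_shndx: section the symbol references */
};

/* Unpack the packed fields of a symbol into separate attributes. */
static struct sym_attrs decode_symbol(const Elf64_Sym *sym)
{
    struct sym_attrs a;
    a.name_offset = sym->st_name;
    a.value = sym->st_value;
    a.size = sym->st_size;
    a.type = ELF64_ST_TYPE(sym->st_info);
    a.binding = ELF64_ST_BIND(sym->st_info);
    a.visibility = ELF64_ST_VISIBILITY(sym->st_other);
    a.section = sym->st_shndx;
    return a;
}
```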

Symbol Table Layout And Conventions

The symbols in a symbol table are written in the following order:

Index 0 in any symbol table is used to represent undefined symbols. As such, the first entry in a symbol table (index 0) is always completely zeroed (type STT_NOTYPE), and is not used.
If the file contains any local symbols, the second entry (index 1) in the symbol table will be an STT_FILE symbol giving the name of the file.
Section symbols.
Register symbols.
Global symbols that have been reduced to local scope via a mapfile.
For each input file that supplies local symbols, a STT_FILE symbol giving the name of the input file is put in the symbol table, followed by the symbols in question.
The global symbols immediately follow the local symbols in the symbol table. Local and global symbols are always kept separate in this manner, and cannot be mixed together.
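The parts of this layout that hold for every ELF symbol table can be checked mechanically. For `SHT_SYMTAB` and `SHT_DYNSYM` sections, the section header's `sh_info` field is one greater than the index of the last local symbol, so it marks where the globals begin. The sketch below, assuming `<elf.h>` and a hypothetical function name `symtab_layout_ok`, verifies the zeroed entry at index 0 and the local/global split; it does not check the ordering of file, section, and register symbols among the locals.

```c
#include <elf.h>
#include <stdbool.h>
#include <stddef.h>

/* syms/n come from a SHT_SYMTAB or SHT_DYNSYM section whose header
   carries sh_info: the index of the first non-local symbol. */
static bool symtab_layout_ok(const Elf64_Sym *syms, size_t n,
                             Elf64_Word sh_info)
{
    if (n == 0 || sh_info == 0 || sh_info > n)
        return false;

    /* Index 0 is reserved and must be entirely zero. */
    const Elf64_Sym *z = &syms[0];
    if (z->st_name || z->st_value || z->st_size || z->st_info ||
        z->st_other || z->st_shndx)
        return false;

    /* Locals occupy [0, sh_info); everything from sh_info on is global
       (or weak). The two groups must not be mixed. */
    for (size_t i = 1; i < n; i++) {
        bool local = ELF64_ST_BIND(syms[i].st_info) == STB_LOCAL;
        if (local != (i < sh_info))
            return false;
    }
    return true;
}
```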

What would happen if we ignored these rules and reordered things in some other way (e.g. sorted by address)? There is no way to answer this question with 100% certainty. It would probably confuse existing tools that manipulate ELF files. In particular, it seems clear that the local and global symbols must remain separate. For years and years, arbitrary software has been free to assume the above layout. We can't possibly know how much software has been written, or how dependent on layout it is. The only safe move is to maintain the well known layout described above.
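As an illustration of how software can come to depend on this layout, here is a hypothetical tool fragment that counts defined global symbols by starting at `sh_info` and never looking at the locals. It is a sketch assuming `<elf.h>`; the name `count_defined_globals` is made up for this example.

```c
#include <elf.h>
#include <stddef.h>

/* Trusts the layout rules: globals start at index sh_info and run to the
   end of the table, so the locals are skipped without being examined. */
static size_t count_defined_globals(const Elf64_Sym *syms, size_t n,
                                    Elf64_Word sh_info)
{
    size_t count = 0;
    for (size_t i = sh_info; i < n; i++)
        if (syms[i].st_shndx != SHN_UNDEF)  /* skip undefined references */
            count++;
    return count;
}
```

If a producer sorted the table by address instead, locals and globals would interleave and this count would silently be wrong, with nothing in the file to warn the tool.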

Next Time: Augmenting The Dynsym

One of the big advantages of Solaris relative to other operating systems is the extensive support for observability: The ability to easily look inside a running program and see what it is doing, in detail. To do that well requires symbols. The symbols in the dynsym may not be enough to do a really good job. For example, to produce a stack trace, we need to take each function address and match it up to its name. If we are looking at a stripped file, or looking up symbols from within the process that is using the file, we won't have any way to find names for the non-global functions, and will have to resort to displaying hex addresses. This is better than nothing, but not by much. The standard files in a Solaris distribution are not stripped for exactly this reason. However, many files found in production are stripped, and in-process inspection is still limited to the dynsym.
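The address-to-name matching described here can be sketched as a search for the `STT_FUNC` symbol whose range `[st_value, st_value + st_size)` contains the address. This assumes `<elf.h>`, a symbol table from an executable or shared object (where `st_value` is a virtual address; for a shared object the load address would have to be subtracted first), and a hypothetical function name. A linear scan keeps it short; a real tool would likely sort or index the symbols.

```c
#include <elf.h>
#include <stddef.h>

/* Return the index of the defined STT_FUNC symbol covering addr, or 0
   (the reserved null entry) if no function does, in which case a stack
   trace can only show the raw hex address. */
static size_t find_function_for_address(const Elf64_Sym *syms, size_t n,
                                        Elf64_Addr addr)
{
    for (size_t i = 1; i < n; i++) {
        const Elf64_Sym *s = &syms[i];
        if (ELF64_ST_TYPE(s->st_info) != STT_FUNC ||
            s->st_shndx == SHN_UNDEF)
            continue;
        /* Written as a subtraction to avoid overflow near the top of
           the address space. */
        if (addr >= s->st_value && addr - s->st_value < s->st_size)
            return i;
    }
    return 0;
}
```

Given only a dynsym, an address inside a local helper function finds no match, while the same lookup against the full symtab names it; that gap is the observability problem described above.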

Machines are much larger than they used to be. The memory saved by the symtab/dynsym division is still a good thing, but there are times when we wish that the dynsym contained a bit more data. This is harder than it sounds. The layout of the dynsym interacts with the rest of an ELF file in ways that are set in stone by years of existing practice. Backward compatibility is a critical feature of Solaris. We try extremely hard to keep those old programs running. And yet, the needs of observability, spearheaded by important new observability tools, put pressure on us in the other direction.

This discussion is a prelude to work I recently did to augment the dynsym to contain local symbols, while preserving full backward compatibility with older versions of Solaris. I plan to cover that in a future blog entry. ELF is old, and much of how it works cannot be changed. Its original designers (our "Founding Fathers", as Rod calls them) anticipated that this would be the case, based no doubt on hard experience with earlier systems. The ELF design is therefore uniquely flexible, which explains why it has survived as long as it has. There is always a way to add something new. Sometimes, it takes several tries to find the best way.
