Memory management is one of the most fundamental areas of computer programming. In many scripting languages, you don't have to worry about how memory is managed, but that doesn't make memory management any less important. Knowing the abilities and limitations of your memory manager is critical for effective programming. In most systems languages like C and C++, you have to do memory management. This article covers the basics of manual, semi-automatic, and automatic memory management practices.
Back in the days of assembly language programming on the Apple II, memory management was not a huge concern. You basically had run of the whole system. Whatever memory the system had, so did you. You didn't even have to worry about figuring out how much memory it had, since every computer had the same amount. So, if your memory requirements were pretty static, you just chose a memory range to use and used it.
However, even in such a simple computer you still had issues, especially if you didn't know how much memory each part of your program was going to need. If you have limited space and varying memory needs, then you need some way to meet these requirements:

- Determine whether you have enough memory for a given task
- Get a section of memory from the available memory pool
- Return a section of memory to the pool so that it can be reused by other parts of the program
The libraries that implement these requirements are called allocators, because they are responsible for allocating and deallocating memory. The more dynamic a program is, the more memory management becomes an issue, and the more important your choice of memory allocator becomes. Let's look at the different methods available to manage memory, their benefits and drawbacks, and the situations in which they work best.
The C programming language provides two functions to fulfill our three requirements:

- malloc: This allocates a portion of memory and returns a pointer to its first byte.
- free: This takes a pointer given out by a previous malloc, and returns it for later use by the program or the operating system (actually, some malloc implementations can only return memory back to the program, not to the operating system).

To understand how memory gets allocated within your program, you first need to understand how memory gets allocated to your program from the operating system. Each process on your computer thinks that it has access to all of your physical memory. Obviously, since you are running multiple programs at the same time, each process can't own all of the memory. What happens is that your processes are using virtual memory.
Just for an example, let's say that your program is accessing memory address 629. The virtual memory system, however, doesn't necessarily have it stored in RAM location 629. In fact, it may not even be in RAM -- it could even have been moved to disk if your physical RAM was full! Because the addresses don't necessarily reflect the physical location where the memory is located, this is called virtual memory. The operating system maintains a table of virtual address-to-physical address translations so that the computer hardware can respond properly to address requests. And, if the address is on disk instead of in RAM, the operating system will temporarily halt your process, unload other memory to disk, load in the requested memory from disk, and restart your process. This way, each process gets its own address space to play in and can access more memory than you have physically installed.
On 32-bit x86 systems, each process can access 4 GB of memory. Now, most people don't have 4 GB of memory on their systems, even if you include swap, much less 4 GB per process. Therefore, when a process loads, it gets an initial allocation of memory up to a certain address, called the system break. Past that is unmapped memory -- memory for which no corresponding physical location has been assigned either in RAM or on disk. Therefore, if a process runs out of memory from its initial allocation, it has to request that the operating system "map in" more memory. (Mapping is a mathematical term for one-to-one correspondence -- memory is "mapped" when its virtual address has a corresponding physical location to store it in.)
UNIX-based systems have two basic system calls that map in additional memory:

- brk(): brk() is a very simple system call. Remember the system break, the location that is the edge of mapped memory for the process? brk() simply moves that location forward or backward, to add or remove memory to or from the process.
- mmap(): mmap(), or "memory map," is like brk() but is much more flexible. First, it can map memory anywhere, not just at the end of the process. Second, not only can it map virtual addresses to physical RAM or swap, it can map them to files and file locations so that reading and writing memory addresses will read and write data to and from files. Here, however, we are only concerned with mmap's ability to add mapped RAM to our process. munmap() does the reverse of mmap().

As you can see, either brk() or mmap() can be used to add additional virtual memory to our process. We will use brk() in our examples, because it is simpler and more common.
If you've done much C programming, you have probably used malloc()
and free()
quite a bit. However, you may not have taken
the time to think about how they might be implemented in your operating
system. This section will show you code for a simplistic implementation
of malloc
and free
to help demonstrate what
is involved with managing memory.
To try out these examples, copy this code listing and paste it into a file called malloc.c. I'll explain the listing a section at a time, below.
Memory allocation on most operating systems is handled by two simple functions:
- void *malloc(long numbytes): This allocates numbytes of memory and returns a pointer to the first byte.
- void free(void *firstbyte): Given a pointer that has been returned by a previous malloc, this gives the space that was allocated back to the process's "free space."

malloc_init is going to be our function to initialize our memory allocator. It does three things: marks our allocator as being initialized, finds the last valid memory address on the system, and sets up the pointer to the beginning of our managed memory. These three variables are global variables:
int has_initialized = 0;
void *managed_memory_start;
void *last_valid_address;
As mentioned above, the edge of mapped memory -- last valid address --
is often known as the system break or the current break. On many
UNIX® systems, to find the current system break, you use the function sbrk(0).
sbrk
moves the current system break by the number of bytes
in its argument, and then returns the new system break. Calling it with
an argument of 0
simply returns the current break. Here is
our malloc
initialization code, which finds the current
break and initializes our variables:
/* Include the sbrk function */
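The listing was flattened in this copy of the article. Here is a reconstruction of what the initialization code looks like, pieced together from the description above:

```c
/* Include the sbrk function */
#include <unistd.h>

int has_initialized = 0;
void *managed_memory_start;
void *last_valid_address;

void malloc_init()
{
    /* grab the last valid address from the OS */
    last_valid_address = sbrk(0);

    /* we don't have any memory to manage yet, so
       just set the beginning to be last_valid_address */
    managed_memory_start = last_valid_address;

    /* okay, we're initialized and ready to go */
    has_initialized = 1;
}
```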
Now, in order to properly manage memory, we need to be able to track
what we are allocating and deallocating. We need to do things like mark
blocks as unused after free
has been called on them, and
be able to locate unused blocks when malloc
is called.
Therefore, the start of every piece of memory returned by malloc
will have this structure at the beginning:
struct mem_control_block {
    int is_available;
    int size;
};
Now, you might think that this would cause problems for programs calling
malloc
-- how do they know about this struct? The answer
is that they don't have to know about it; we will hide it by moving the
pointer past this struct before we return it. This will make the
returned pointer point to memory that is not used for any other purpose.
That way, from the calling programs' perspective, all they get is
free, open memory. Then, when they pass the pointer back via free()
,
we simply back up a few memory bytes to find this structure again.
We're going to talk about freeing before we talk about allocating
memory, because it's simpler. The only thing we have to do to free
memory is to take the pointer we're given, back up sizeof(struct
mem_control_block)
bytes, and mark it as available. Here is the
code for that:
void free(void *firstbyte) {
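The body of that listing was lost here; a reconstruction, assuming the mem_control_block structure described above (repeated so the sketch stands alone):

```c
/* header hidden in front of every block we hand out */
struct mem_control_block {
    int is_available;
    int size;
};

void free(void *firstbyte)
{
    struct mem_control_block *mcb;

    /* back up past the hidden header to find the control block */
    mcb = (struct mem_control_block *)
          ((char *)firstbyte - sizeof(struct mem_control_block));

    /* mark the block as available again -- that's all free has to do */
    mcb->is_available = 1;
}
```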
As you can see, in this allocator, freeing memory is done in constant time, using a very simple mechanism. Allocating memory is slightly harder. Here is the outline of the algorithm:
1. If our allocator has not been initialized, initialize it.
2. Add sizeof(struct mem_control_block) to the size requested, since we need room for the hidden header.
3. Start at managed_memory_start and walk through the blocks of memory.
4. If the current block is marked available and is big enough, mark it unavailable and return a pointer just past its mem_control_block.
5. If we reach last_valid_address without finding a usable block, ask the operating system for more memory and return that.
We're basically walking through memory using linked pointers looking for open chunks. Here is the code:
void *malloc(long numbytes) {
And that is our memory manager. Now, we just have to build it and get it to run with our programs.
To build your malloc
-compatible allocator (actually, we're
missing some functions like realloc()
, but malloc()
and free()
are the main ones), run the following command:
gcc -shared -fpic malloc.c -o malloc.so
This will produce a file named malloc.so, which is a shared library containing our code.
On UNIX systems, you can now use your allocator in place of your system malloc()
by doing this:
LD_PRELOAD=/path/to/malloc.so
The LD_PRELOAD
environment variable causes the dynamic
linker to load the symbols of the given shared library before any
executable it loads. It also gives precedence to the symbols in the
library specified. Therefore, any application we start from now on in
this session will be using our malloc()
and not the system
one. A few applications don't use malloc()
, but they are
the exception. Others, which use other memory-management functions such as realloc() or which make poor assumptions about the internal behavior of malloc(), will likely crash. The ash shell appears to work just fine using our new malloc().
If you want to be sure that your malloc()
is being used,
you should test it by adding calls to write()
at the entry
points of your functions.
Our memory manager leaves a lot to be desired, but it is good for showing what a memory manager needs to do. Some of its drawbacks include the following:

- Because it moves the system break, it cannot coexist with other allocators or with mmap.
- It does not handle out-of-memory conditions gracefully (malloc simply assumes success).
- It does not implement realloc().
- Since sbrk() may give back more memory than we ask for, we leak some memory at the end of the heap.
- The is_available flag uses a full 4-byte word, even though it only contains 1 bit of information.
There are many implementations of malloc(), each with its own strengths and weaknesses. There are a number of tradeoff decisions to make when you design an allocator, including:

- Allocation speed
- Deallocation speed
- Cache locality
- Ease of use
- Generality
- Usability in real-time settings
- Behavior with SMP and multiple threads

Each implementation has its own set of benefits and drawbacks. Our simple allocator, for example, was very slow in allocation but very, very fast in deallocation. Also, because of its poor behavior with virtual memory systems, it works best with large objects.
There are many other allocators available. Some of them include:

- Doug Lea Malloc: a family of allocators that includes Doug Lea's original allocator, the GNU libc allocator, and ptmalloc. Doug Lea's allocator has a basic structure much like our version, but it incorporates indexes to make searching faster and has the ability to combine multiple unused chunks into one large chunk. It also enables caching to make reuse of recently freed memory faster. ptmalloc is a version of Doug Lea Malloc that was extended to support multiple threads. A paper describing Doug Lea's Malloc implementation is available in the Resources section later in this article.

These are the best known of the many allocators available. If your program has specific allocation needs, you may prefer to write a custom allocator that matches the way your program allocates memory. However, if you aren't familiar with allocator design, custom allocators can often create more problems than they solve. For a good introduction to the subject, see Donald Knuth's The Art of Computer Programming Volume 1: Fundamental Algorithms in section 2.5, "Dynamic Storage Allocation" (see Resources for a link). It is a bit dated, because it doesn't take into account virtual memory environments, but most algorithms are based on the ones presented there.
In C++, you can implement your own allocator on a per-class or per-template basis by overloading operator new(). Andrei Alexandrescu's Modern C++ Design describes a small object allocator in Chapter 4, "Small Object Allocation" (see Resources for a link).
Not only does our memory manager have shortcomings, there are many
shortcomings of malloc()
-based memory management that
remain no matter which allocator you use. Managing memory with malloc()
can be pretty daunting for programs that have long-running storage they
need to keep around. If you have lots of references to memory floating
around, it is often difficult to know when it should be released.
Memory whose lifetime is limited to the current function is fairly easy
to manage, but for memory that lives beyond that, it becomes much more
difficult. Also, many APIs are unclear as to whether the responsibility
for memory management lies with the calling program or the called
function.
Because of the problems managing memory, many programs are oriented around their memory-management rules. C++'s exception handling makes this task even more problematic. Sometimes it seems that more code is dedicated to managing memory allocation and cleanup than to actually accomplishing computational tasks! Therefore, we will examine some alternatives to malloc()-style memory management.
Reference counting is a semi-automated memory-management technique, meaning that it requires some programmer support, but it does not require you to know for sure when an object is no longer in use. The reference counting mechanism does that for you.
In reference counting, all shared data structures have a field that contains the number of "references" currently active to that structure. When a procedure is passed a pointer to a data structure, it adds to the reference count. Basically, you are telling the data structure how many locations it is being stored in. Then, when your procedure is finished using it, it decreases the reference count. When this happens, it also checks to see if the count has dropped to zero. If so, it frees the memory.
The advantage to this is that you don't have to follow every path in your program that a given data structure may follow. Each localized reference to it simply increases or decreases the count as appropriate. This prevents it from being freed while it is still in use. However, you must remember to run the reference counting functions whenever you are using a reference-counted data structure. Also, built-in functions and third-party libraries will not know about or be able to use your reference-counting mechanism. Reference counting also has difficulties with structures having circular references.
To implement reference counting, you simply need two functions -- one to increase the reference count, and one to decrease the reference count and free the memory when the count drops to zero.
An example reference counting function set might look like this:
/* Structure Definitions */
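The listing was lost in this copy; here is a minimal sketch of such a function set. The names refcountedstruct, REF, and UNREF come from the text; the rest is an assumption:

```c
#include <stdlib.h>

/* Structure definitions: every refcounted structure
   begins with this header */
struct refcountedstruct {
    int refcount;
};

/* Increase the reference count */
void REF(void *data)
{
    struct refcountedstruct *rstruct = data;
    rstruct->refcount++;
}

/* Decrease the reference count, freeing the memory at zero */
void UNREF(void *data)
{
    struct refcountedstruct *rstruct = data;
    rstruct->refcount--;
    /* a count of zero means no one is using this structure anymore */
    if (rstruct->refcount == 0)
        free(rstruct);
}
```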
REF
and UNREF
might be more complicated, depending on what you wanted to do. For
example, you may want to add locking for a multithreaded program, and
you may want to extend refcountedstruct
so that it also
includes a pointer to a function to call before freeing the memory (like
a destructor in object-oriented languages -- this is required if
your structures contain pointers).
When using REF and UNREF, you need to obey these rules for pointer assignments:

- Before the assignment, UNREF the value that the left-hand-side pointer is pointing to.
- After the assignment, REF the value that the left-hand-side pointer is pointing to.

In functions that are passed refcounted structures, the functions need to follow these rules:

- REF each structure when it is received at the beginning of the function.
- UNREF it before the function returns.
Here is a quick example of code using reference counting:
/* EXAMPLES OF USAGE */
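That listing was also lost. Here is a hypothetical usage sketch that obeys the assignment rules above (mydata, store, and stored are invented for illustration; the REF/UNREF definitions are repeated so the block stands alone):

```c
#include <stdlib.h>

struct refcountedstruct { int refcount; };

void REF(void *data) { ((struct refcountedstruct *)data)->refcount++; }

void UNREF(void *data)
{
    struct refcountedstruct *rstruct = data;
    if (--rstruct->refcount == 0)
        free(rstruct);
}

/* A hypothetical refcounted type: the refcount header comes first */
struct mydata {
    struct refcountedstruct rc;
    int value;
};

struct mydata *stored = 0;   /* a long-lived reference */

/* Obeys the assignment rules: UNREF what the left-hand side pointed
   to before the assignment, REF what it points to afterward */
void store(struct mydata *d)
{
    if (stored)
        UNREF(stored);
    stored = d;
    REF(stored);
}
```

A caller that creates an object with refcount 1, passes it to store(), and then UNREFs its own reference leaves the object alive with exactly one reference -- the stored one.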
Since reference counting is so simple, most programmers implement it
themselves rather than using libraries. They do, however, depend on
low-level allocators like malloc
and free
to
actually allocate and release their memory.
Reference counting is used quite a bit in high-level languages like Perl to do memory management. In those languages, the reference counting is handled automatically by the language, so that you don't have to worry about it at all except for writing extension modules. This takes away some speed as everything must be reference counted, but adds quite a bit of safety and ease of programming. Here are the benefits:
However, it also has its drawbacks:

- You must remember to run the reference-counting functions on every assignment; a single missed REF or UNREF leads to leaks or to memory being freed while still in use.
- Structures with circular references are never freed, because their counts never drop to zero.
- It is difficult to use nonlocal exits (such as try or setjmp()/longjmp()) while using reference counted objects.

C++ can mitigate some of the programmer error by using smart pointers, which can handle pointer-handling details such as reference counting for you. However, if you have to use any legacy code that can't handle your smart pointers (such as linkage to a C library), it usually degenerates into a mess that is actually more difficult and twisted than if you didn't use them. Therefore, smart pointers are usually only useful for C++-only projects. If you want to use them, you really need to read the "Smart Pointers" chapter from Alexandrescu's Modern C++ Design book.
Memory pools are another method of semi-automating memory management. Memory pools help automate memory management for programs that go through specific stages, each of which has memory that is needed only during that stage of processing. For example, many network server processes have lots of per-connection memory allocated -- memory whose maximum lifespan is the life of the current connection. Apache, which uses pooled memory, breaks its connections down into stages, each of which has its own memory pool. At the end of the stage, the entire memory pool is freed at once.
In pooled memory management, each allocation specifies a pool of memory from which it should be allocated. Each pool has a different lifespan. In Apache, there is a pool that lasts the lifetime of the server, one that lasts the lifetime of the connection, one that lasts the lifetime of the request, and others as well. Therefore, if I have a series of functions that will not generate any data that lasts longer than the connection, I can just allocate it all from the connection pool, knowing that at the end of the connection, it will be freed automatically. Additionally, some implementations allow the registration of cleanup functions, which get called right before the memory pool is cleared, to do any final tasks that need to be performed before the memory goes away (similar to destructors, for you object-oriented folks).
To use pools in your own programs, you can either use GNU libc's obstack implementation or Apache's Apache Portable Runtime. GNU obstacks are nice, because they are included by default in GNU-based Linux distributions. The Apache Portable Runtime is nice because it has a lot of other utilities to handle all aspects of writing multiplatform server software. To learn more about GNU obstacks and Apache's pooled memory implementation, see the links to their documentation in the Resources section.
The following hypothetical code listing shows how obstacks are used:
#include <obstack.h>
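The listing was truncated in this copy. A hypothetical per-stage sketch along the same lines looks like this (obstacks require the program to supply raw chunk allocation routines through the two macros shown):

```c
#include <obstack.h>
#include <stdlib.h>
#include <string.h>

/* obstack gets its raw memory through these two macros */
#define obstack_chunk_alloc malloc
#define obstack_chunk_free free

/* Run one "stage": allocate from the stage's obstack, do the work,
   then free the whole pool at once.  Returns the work's result. */
size_t run_stage(void)
{
    struct obstack stage_pool;

    /* set up the pool for this stage of processing */
    obstack_init(&stage_pool);

    /* all per-stage allocations come from the pool */
    char *buffer = obstack_alloc(&stage_pool, 32);
    strcpy(buffer, "per-stage data");
    size_t result = strlen(buffer);

    /* end of stage: NULL means "free everything in this obstack" */
    obstack_free(&stage_pool, NULL);
    return result;
}
```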
Basically, after each major stage of operation, the obstack for that
stage is freed. Note, however, that if a procedure needs to allocate
memory that will last longer than the current stage, it can use a
longer-term obstack as well, such as the connection or the global one.
The NULL
that is passed to obstack_free()
indicates that it should free the entire contents of the obstack. Other
values are available, but they usually are not as useful.
Benefits of using pooled memory allocation include the following:

- Memory management is simple for the application: allocate from the right pool, and never worry about freeing individual objects.
- Freeing memory is very fast, since an entire pool is released at once.
- Allocations exhibit good cache locality, since a pool's memory is allocated together.
The drawbacks for pooled memory are:

- Pools are only useful for programs whose processing falls into distinct stages.
- Memory allocated from a pool cannot be reclaimed early; it is held until the entire pool is freed.
- You must keep track of which pool each piece of data should be allocated from.
Garbage collection is the fully automatic detection and removal of data objects that are no longer in use. Garbage collectors are usually run when the available memory drops below a specific threshold. Generally, they start off with a "base" set of data that is known to be available to the program -- stack data, global variables, and registers. They then try to trace through every piece of data linked through those. Everything the collector finds is good data; everything that it doesn't find is garbage and can be destroyed and reused. To manage memory effectively, many types of garbage collectors require knowledge of the layout of pointers within data structures, and therefore have to be a part of the language itself to function properly.
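To make the tracing idea concrete, here is a toy sketch (not from the article) of the "mark" half of a mark-and-sweep collector: start from the base set and flag everything reachable; whatever is left unflagged is garbage.

```c
#include <stddef.h>

/* A toy heap object with a mark bit and up to two outgoing pointers */
struct object {
    int marked;
    struct object *refs[2];
};

/* Mark phase: recursively flag everything reachable from obj */
void mark(struct object *obj)
{
    if (obj == NULL || obj->marked)
        return;              /* no object, or already visited */
    obj->marked = 1;
    mark(obj->refs[0]);
    mark(obj->refs[1]);
}

/* The "base" set would be stack slots, globals, and registers;
   here we just mark from an explicit list of roots. */
void mark_roots(struct object **roots, int nroots)
{
    for (int i = 0; i < nroots; i++)
        mark(roots[i]);
}
```

A sweep phase would then walk the heap and reclaim every object whose mark bit is still clear.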
Hans Boehm's conservative garbage collector is one of the most popular
garbage collectors available, because it's free and it's both
conservative and incremental. You can use it as a drop-in replacement
for your system allocator (using malloc
/free
instead of its own API) by building it with --enable-redirect-malloc
.
In fact, if you do this, you can use the same LD_PRELOAD
trick that we used for our simple allocator to enable garbage collection
in almost any program on your system. If you suspect a program is
leaking memory, you can use this garbage collector to keep the process
size down. Many people used this technique in the early days of Mozilla
when it leaked memory heavily. This garbage collector runs under both
Windows® and UNIX.
Some advantages of garbage collection:

- You never have to remember to free memory, which eliminates a whole class of bugs and bookkeeping code.
- Memory that a program loses track of is eventually reclaimed, so leaks in poorly behaved programs are contained.
The drawbacks include:

- Collection can pause the program at unpredictable times, which makes most collectors unsuitable for real-time programs.
- Tracing through all of memory is slow and has poor cache behavior, so programs tend to run slower when collection occurs.
It's a world of tradeoffs: performance, ease-of-use, ease-of-implementation, and threading capability, just to name a few. There are numerous patterns of memory management at your disposal to match your project requirements. Each pattern has a wide range of implementations, each of which has its benefits and drawbacks. Using the default techniques for your programming environment is fine for many projects, but knowing the available options will help you when your project has special needs. This table compares the memory management strategies covered in this article.
Strategy | Allocation speed | Deallocation speed | Cache locality | Ease of use | Generality | Usable in real time | SMP and thread-friendly |
Custom allocator | Depends on implementation | Depends on implementation | Depends on implementation | Very difficult | None | Depends on implementation | Depends on implementation |
Simple allocator | Fast for small memory usage | Very fast | Poor | Easy | Very | No | No |
GNU malloc | Moderate | Fast | Moderate | Easy | Very | No | Moderate |
Hoard | Moderate | Moderate | Moderate | Easy | Very | No | Yes |
Reference counting | N/A | N/A | Excellent | Moderate | Moderate | Yes (depends on malloc implementation) | Depends on implementation |
Pooling | Moderate | Very fast | Excellent | Moderate | Moderate | Yes (depends on malloc implementation) | Depends on implementation |
Garbage collection | Moderate (slow when collection occurs) | Moderate | Poor | Moderate | Moderate | No | Rarely |
Incremental garbage collection | Moderate | Moderate | Moderate | Moderate | Moderate | No | Rarely |
Incremental conservative garbage collection | Moderate | Moderate | Moderate | Easy | Very | No | Rarely |
Resources

Learn
- A malloc implementation that is based on mmap().
- A discussion of malloc and how it interacts with BSD virtual memory.

Discuss
- Get involved in the developerWorks community through blogs, forums, podcasts, and spaces.