Category: LINUX

2006-09-20 14:48:37


Linux Device Drivers, 2nd Edition


2nd Edition June 2001
0-59600-008-1, Order Number: 0081
586 pages, $39.95

Chapter 7
Getting Hold of Memory


Thus far, we have used kmalloc and kfree for the allocation and freeing of memory. The Linux kernel offers a richer set of memory allocation primitives, however. In this chapter we look at other ways of making use of memory in device drivers and at how to make the best use of your system's memory resources. We will not get into how the different architectures actually administer memory. Modules are not involved in issues of segmentation, paging, and so on, since the kernel offers a unified memory management interface to the drivers. In addition, we won't describe the internal details of memory management in this chapter, but will defer it to "Memory Management in Linux" in Chapter 13, "mmap and DMA".

The kmalloc allocation engine is a powerful tool, and easily learned because of its similarity to malloc. The function is fast -- unless it blocks -- and it doesn't clear the memory it obtains; the allocated region still holds its previous content. The allocated region is also contiguous in physical memory. In the next few sections, we talk in detail about kmalloc, so you can compare it with the memory allocation techniques that we discuss later.

The most-used flag, GFP_KERNEL, means that the allocation (which is eventually performed by calling get_free_pages, the source of the GFP_ prefix) is performed on behalf of a process running in kernel space. In other words, the calling function is executing a system call on behalf of a process. Using GFP_KERNEL means that kmalloc can put the current process to sleep waiting for a page when called in low-memory situations. A function that allocates memory using GFP_KERNEL must therefore be reentrant. While the current process sleeps, the kernel takes proper action to retrieve a memory page, either by flushing buffers to disk or by swapping out memory from a user process.

GFP_KERNEL isn't always the right allocation flag to use; sometimes kmalloc is called from outside a process's context. This type of call can happen, for instance, in interrupt handlers, task queues, and kernel timers. In this case, the current process should not be put to sleep, and the driver should use a flag of GFP_ATOMIC instead. The kernel normally tries to keep some free pages around in order to fulfill atomic allocation. When GFP_ATOMIC is used, kmalloc can use even the last free page. If that last page does not exist, however, the allocation will fail.
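The difference between the two flags can be pictured with a small user-space model. This is not kernel code: toy_pool, toy_alloc_page, TOY_KERNEL, and TOY_ATOMIC are names invented for illustration, and the size of the reserve is an arbitrary assumption. The sketch captures only the point made above: an ordinary request has to wait once the free pages drop to the reserve, while an atomic request may eat into the reserve down to the last page and fails when nothing is left.

```c
/* Hypothetical stand-ins for the two allocation priorities. */
enum toy_prio { TOY_KERNEL, TOY_ATOMIC };

/* Possible outcomes of a request in this model. */
enum toy_result { TOY_OK, TOY_WOULD_SLEEP, TOY_FAILED };

struct toy_pool {
    unsigned free_pages;   /* pages currently free */
    unsigned reserve;      /* pages kept aside for atomic requests */
};

/*
 * Try to take one page from the pool.
 * - TOY_KERNEL may not touch the reserve; in the real kernel the caller
 *   would be put to sleep until memory is reclaimed.
 * - TOY_ATOMIC never sleeps; it may use the reserve and fails only
 *   when no page is left at all.
 */
static enum toy_result toy_alloc_page(struct toy_pool *pool,
                                      enum toy_prio prio)
{
    if (prio == TOY_KERNEL) {
        if (pool->free_pages <= pool->reserve)
            return TOY_WOULD_SLEEP;
    } else {
        if (pool->free_pages == 0)
            return TOY_FAILED;
    }
    pool->free_pages--;
    return TOY_OK;
}
```

In a real driver the TOY_FAILED case corresponds to kmalloc returning NULL, which the caller must check for.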

Other flags can be used in place of or in addition to GFP_KERNEL and GFP_ATOMIC, although those two cover most of the needs of device drivers. All the flags are defined in : individual flags are prefixed with a double underscore, like __GFP_DMA; collections of flags lack the prefix and are sometimes called allocation priorities.

The __GFP_DMA flag requests memory usable in DMA data transfers to and from devices. Its exact meaning is platform dependent, and the flag can be ORed with either GFP_KERNEL or GFP_ATOMIC.

DMA-capable memory is the only memory that can be involved in DMA data transfers with peripheral devices. This restriction arises when the address bus used to connect peripheral devices to the processor is limited with respect to the address bus used to access RAM. For example, on the x86, devices that plug into the ISA bus can only address memory from 0 to 16 MB. Other platforms have similar needs, although usually less stringent than the ISA one.[29]

[29] It's interesting to note that the limit is only in force for the ISA bus; an x86 device that plugs into the PCI bus can perform DMA with all normal memory.
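To make the ISA restriction concrete, here is a minimal user-space sketch of the address check it implies. The helper name isa_dma_reachable and the ISA_DMA_LIMIT macro are hypothetical, and only the 16 MB limit described above is modeled; this is not how the kernel itself selects DMA-capable memory.

```c
#include <stdbool.h>
#include <stdint.h>

/* The ISA bus can address only the first 16 MB of physical memory. */
#define ISA_DMA_LIMIT (16UL * 1024 * 1024)

/*
 * Return true if the physical range [phys, phys + len) lies entirely
 * below the 16 MB ISA limit. A zero-length range is rejected.
 */
static bool isa_dma_reachable(uint64_t phys, uint64_t len)
{
    if (len == 0 || phys >= ISA_DMA_LIMIT)
        return false;
    /* Written as a subtraction so that phys + len cannot overflow. */
    return len <= ISA_DMA_LIMIT - phys;
}
```

A buffer that ends exactly at the 16 MB boundary passes the check; one that extends a single byte past it does not.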

High memory is memory that requires special handling to be accessed. It made its appearance in kernel memory management when support for the Pentium II Virtual Memory Extension was implemented during 2.3 development to access up to 64 GB of physical memory. High memory is a concept that only applies to the x86 and SPARC platforms, and the two implementations are different.

Linux handles memory allocation by creating a set of pools of memory objects of fixed sizes. Allocation requests are handled by going to a pool that holds sufficiently large objects, and handing an entire memory chunk back to the requester. The memory management scheme is quite complex, and the details of it are not normally all that interesting to device driver writers. After all, the implementation can change -- as it did in the 2.1.38 kernel -- without affecting the interface seen by the rest of the kernel.

The one thing driver developers should keep in mind, though, is that the kernel can allocate only certain predefined fixed-size byte arrays. If you ask for an arbitrary amount of memory, you're likely to get slightly more than you asked for, up to twice as much. Also, programmers should remember that the minimum memory that kmalloc handles is as big as 32 or 64 bytes, depending on the page size used by the current architecture.

The data sizes available are generally powers of two. In the 2.0 kernel, the available sizes were actually slightly less than a power of two, due to control flags added by the management system. If you keep this fact in mind, you'll use memory more efficiently. For example, if you need a buffer of about 2000 bytes and run Linux 2.0, you're better off asking for 2000 bytes, rather than 2048. Requesting exactly a power of two is the worst possible case with any kernel older than 2.1.38 -- the kernel will allocate twice as much as you requested. This is why scull used 4000 bytes per quantum instead of 4096.
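The rounding described above can be worked through with a short user-space sketch. The function toy_bucket_size is hypothetical, and the 16-byte per-block overhead used in the usage note is an assumed figure chosen only to show the effect; the real values live in the kernel sources named in the next paragraph.

```c
#include <stddef.h>

/*
 * Smallest power-of-two bucket, starting at min_bucket, whose usable
 * space (bucket minus per-block overhead) can hold `size` bytes.
 * `overhead` models control data stored inside each block, as in 2.0;
 * pass 0 to model buckets that are exact powers of two.
 * Assumes overhead < min_bucket and sizes far from SIZE_MAX.
 */
static size_t toy_bucket_size(size_t size, size_t overhead,
                              size_t min_bucket)
{
    size_t bucket = min_bucket;

    while (bucket - overhead < size)
        bucket *= 2;
    return bucket;
}
```

With an assumed overhead of 16 bytes and a 32-byte minimum, a request for 2000 bytes lands in the 2048-byte bucket, while a request for 2048 bytes spills into the 4096-byte bucket. With no overhead, 2048 bytes fit the 2048-byte bucket exactly.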

You can find the exact values used for the allocation blocks in mm/kmalloc.c (with the 2.0 kernel) or mm/slab.c (in current kernels), but remember that they can change again without notice. The trick of allocating less than 4 KB works well for scull with all 2.x kernels, but it's not guaranteed to be optimal in the future.

A device driver often ends up allocating many objects of the same size, over and over. Given that the kernel already maintains a set of memory pools of objects that are all the same size, why not add some special pools for these high-volume objects? In fact, the kernel does implement this sort of lookaside cache. Device drivers normally do not exhibit the sort of memory behavior that justifies using a lookaside cache, but there can be exceptions; the USB and ISDN drivers in Linux 2.4 use caches.
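The idea behind a lookaside cache can be sketched in user space as a free list of same-sized objects. Everything here (toy_cache and its four functions) is invented for illustration and built on malloc; the kernel's slab allocator is far more elaborate, so treat this as a picture of the principle rather than of the implementation.

```c
#include <stddef.h>
#include <stdlib.h>

/* A freed object is reused to store the link to the next free object. */
struct toy_free_obj {
    struct toy_free_obj *next;
};

struct toy_cache {
    size_t obj_size;                /* size of every object in the cache */
    struct toy_free_obj *free_list; /* released objects awaiting reuse */
};

static void toy_cache_init(struct toy_cache *c, size_t obj_size)
{
    /* Each object must be able to hold the free-list link. */
    if (obj_size < sizeof(struct toy_free_obj))
        obj_size = sizeof(struct toy_free_obj);
    c->obj_size = obj_size;
    c->free_list = NULL;
}

/* Hand out a recycled object if one exists, else get fresh memory. */
static void *toy_cache_alloc(struct toy_cache *c)
{
    struct toy_free_obj *obj = c->free_list;

    if (obj) {
        c->free_list = obj->next;
        return obj;
    }
    return malloc(c->obj_size);
}

/* Park the object on the free list instead of returning it to the system. */
static void toy_cache_free(struct toy_cache *c, void *ptr)
{
    struct toy_free_obj *obj = ptr;

    obj->next = c->free_list;
    c->free_list = obj;
}

/* Release every object still parked on the free list. */
static void toy_cache_destroy(struct toy_cache *c)
{
    while (c->free_list) {
        struct toy_free_obj *obj = c->free_list;

        c->free_list = obj->next;
        free(obj);
    }
}
```

An object that is freed and then requested again comes straight back from the free list, which is where the speed advantage for high-volume, same-sized allocations comes from.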

The offset is the offset of the first object in the page; it can be used to ensure a particular alignment for the allocated objects, but you most likely will use 0 to request the default value. flags controls how allocation is done, and is a bit mask of the following flags:

A scull Based on the Slab Caches: scullc

Time for an example. scullc is a cut-down version of the scull module that implements only the bare device -- the persistent memory region. Unlike scull, which uses kmalloc, scullc uses memory caches. The size of the quantum can be modified at compile time and at load time, but not at runtime -- that would require creating a new memory cache, and we didn't want to deal with these unneeded details. The sample module refuses to compile with version 2.0 of the kernel because memory caches were not there, as explained in "Backward Compatibility" later in the chapter.

scullc is a complete example that can be used to run tests. It differs from scull only in a few lines of code. This is how it allocates memory quanta:

 
/* Allocate a quantum using the memory cache */
if (!dptr->data[s_pos]) {
    dptr->data[s_pos] =
        kmem_cache_alloc(scullc_cache, GFP_KERNEL);
    if (!dptr->data[s_pos])
        goto nomem;
    memset(dptr->data[s_pos], 0, scullc_quantum);
}

 
And these lines release memory:

for (i = 0; i < qset; i++)
    if (dptr->data[i])
        kmem_cache_free(scullc_cache, dptr->data[i]);
kfree(dptr->data);

To support use of scullc_cache, these few lines are included in the file at proper places:

 
/* declare one cache pointer: use it for all devices */
kmem_cache_t *scullc_cache;

/* init_module: create a cache for our quanta */
scullc_cache =
    kmem_cache_create("scullc", scullc_quantum,
                      0, SLAB_HWCACHE_ALIGN,
                      NULL, NULL); /* no ctor/dtor */
if (!scullc_cache) {
    result = -ENOMEM;
    goto fail_malloc2;
}

/* cleanup_module: release the cache of our quanta */
kmem_cache_destroy(scullc_cache);

The main differences in passing from scull to scullc are a slight speed improvement and better memory use. Since quanta are allocated from a pool of memory fragments of exactly the right size, their placement in memory is as dense as possible, as opposed to scull quanta, which bring in an unpredictable memory fragmentation.

It's worth stressing that get_free_pages and the other functions can be called at any time, subject to the same rules we saw for kmalloc. The functions can fail to allocate memory in certain circumstances, particularly when GFP_ATOMIC is used. Therefore, the program calling these allocation functions must be prepared to handle an allocation failure.
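Being prepared for failure usually means undoing whatever was obtained before the failing call. The following user-space sketch shows that pattern under stated assumptions: toy_dev, toy_dev_setup, and the pluggable toy_alloc_fn are hypothetical, and malloc and free stand in for the kernel allocation functions so that the failure path can be exercised outside the kernel.

```c
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical device state holding two buffers. */
struct toy_dev {
    void *ring;
    void *scratch;
};

/* Allocator signature, so that a failing allocator can be plugged in. */
typedef void *(*toy_alloc_fn)(size_t size);

/* Demonstration allocator: the first call succeeds, later calls fail. */
static int toy_calls;
static void *toy_alloc_once(size_t size)
{
    return toy_calls++ == 0 ? malloc(size) : NULL;
}

/*
 * Allocate both buffers or none. On failure, everything obtained so far
 * is released and -ENOMEM is returned.
 */
static int toy_dev_setup(struct toy_dev *dev, toy_alloc_fn alloc,
                         size_t size)
{
    dev->ring = alloc(size);
    if (!dev->ring)
        goto fail_ring;

    dev->scratch = alloc(size);
    if (!dev->scratch)
        goto fail_scratch;

    return 0;

fail_scratch:
    free(dev->ring);
    dev->ring = NULL;
fail_ring:
    return -ENOMEM;
}
```

Calling toy_dev_setup with toy_alloc_once makes the second allocation fail, so the first buffer is released before the error is reported and nothing leaks.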

Although kmalloc(GFP_KERNEL) sometimes fails when there is no available memory, the kernel does its best to fulfill allocation requests. Therefore, it's easy to degrade system responsiveness by allocating too much memory. For example, you can bring the computer down by pushing too much data into a scull device; the system will start crawling while it tries to swap out as much as possible in order to fulfill the kmalloc request. Since every resource is being sucked up by the growing device, the computer is soon rendered unusable; at that point you can no longer even start a new process to try to deal with the problem. We don't address this issue in scull, since it is just a sample module and not a real tool to put into a multiuser system. As a programmer, you must nonetheless be careful, because a module is privileged code and can open new security holes in the system (the most likely is a denial-of-service hole like the one just outlined).

A scull Using Whole Pages: scullp

In order to test page allocation for real, the scullp module is released together with other sample code. It is a reduced scull, just like scullc introduced earlier.

Memory quanta allocated by scullp are whole pages or page sets: the scullp_order variable defaults to 0 and can be specified at either compile time or load time.
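The relationship between the order and the size of a quantum is simple arithmetic: an allocation of order n spans 2^n pages. The helpers below are a hypothetical user-space sketch of that calculation; the page size is passed in as a parameter because it depends on the platform, and 4096 in the usage note is only an assumed value.

```c
#include <stddef.h>

/* Number of pages in an allocation of the given order. */
static unsigned long toy_pages_in_order(unsigned order)
{
    return 1UL << order;
}

/* Bytes in one quantum made of 2^order pages of page_size bytes each. */
static size_t toy_quantum_bytes(unsigned order, size_t page_size)
{
    return page_size << order;
}
```

With an assumed 4096-byte page, the default order of 0 gives a 4096-byte quantum, and an order of 4 gives 16 pages, or 65536 bytes.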

The code to deallocate memory in scullp, instead, looks like this:

At the user level, the perceived difference is primarily a speed improvement and better memory use because there is no internal fragmentation of memory. We ran some tests copying four megabytes from scull0 to scull1 and then from scullp0 to scullp1; the results showed a slight improvement in kernel-space processor usage.

But the biggest advantage of __get_free_page is that the page is completely yours, and you could, in theory, assemble the pages into a linear area by appropriate tweaking of the page tables. For example, you can allow a user process to mmap memory areas obtained as single unrelated pages. We'll discuss this kind of operation in "The mmap Device Operation" in Chapter 13, "mmap and DMA", where we show how scullp offers memory mapping, something that scull cannot offer.

The next memory allocation function that we'll show you is vmalloc, which allocates a contiguous memory region in the virtual address space. Although the pages are not necessarily consecutive in physical memory (each page is retrieved with a separate call to __get_free_page), the kernel sees them as a contiguous range of addresses. vmalloc returns 0 (the NULL address) if an error occurs; otherwise, it returns a pointer to a linear memory area of size at least size.

The prototypes of the function and its relatives (ioremap, which is not strictly an allocation function, will be discussed shortly) are as follows:

[30] Actually, some architectures define ranges of "virtual" addresses as reserved to address physical memory. When this happens, the Linux kernel takes advantage of the feature, and both the kernel and get_free_pages addresses lie in one of those memory ranges. The difference is transparent to device drivers and other code that is not directly involved with the memory-management kernel subsystem.

ioremap is most useful for mapping the (physical) address of a PCI buffer to (virtual) kernel space. For example, it can be used to access the frame buffer of a PCI video device; such buffers are usually mapped at high physical addresses, outside of the address range for which the kernel builds page tables at boot time. PCI issues are explained in more detail in "The PCI Interface" in Chapter 15, "Overview of Peripheral Buses".

A scull Using Virtual Addresses: scullv

Sample code using vmalloc is provided in the scullv module. Like scullp, this module is a stripped-down version of scull that uses a different allocation function to obtain space for the device to store data.

The module allocates memory 16 pages at a time. The allocation is done in large chunks to achieve better performance than scullp and to show something that takes too long with other allocation techniques to be feasible. Allocating more than one page with __get_free_pages is failure prone, and even when it succeeds, it can be slow. As we saw earlier, vmalloc is faster than other functions in allocating several pages, but somewhat slower when retrieving a single page, because of the overhead of page-table building. scullv is designed like scullp. order specifies the "order" of each allocation and defaults to 4. The only difference between scullv and scullp is in allocation management. These lines use vmalloc to obtain new memory:

salma% cat /tmp/bigfile > /dev/scullp0; head -5 /proc/scullpmem

Device 0: qset 500, order 0, sz 1048576
item at e00000003e641b40, qset at e000000025c60000
0:e00000003007c000
1:e000000024778000
salma% cat /tmp/bigfile > /dev/scullv0; head -5 /proc/scullvmem

Device 0: qset 500, order 4, sz 1048576
item at e0000000303699c0, qset at e000000025c87000
0:a000000000034000
1:a000000000078000
salma% uname -m
ia64

rudo% cat /tmp/bigfile > /dev/scullp0; head -5 /proc/scullpmem

Device 0: qset 500, order 0, sz 1048576
item at c4184780, qset at c71c4800
0:c262b000
1:c2193000
rudo% cat /tmp/bigfile > /dev/scullv0; head -5 /proc/scullvmem

Device 0: qset 500, order 4, sz 1048576
item at c4184b80, qset at c71c4000
0:c881a000
1:c882b000
rudo% uname -m
i686

Allocation at boot time is the only way to retrieve consecutive memory pages while bypassing the limits imposed by get_free_pages on the buffer size, both in terms of maximum allowed size and limited choice of sizes. Allocating memory at boot time is a "dirty" technique, because it bypasses all memory management policies by reserving a private memory pool.

One noticeable problem with boot-time allocation is that it is not a feasible option for the average user: because it is available only to code linked into the kernel image, a device driver using this kind of allocation can only be installed or replaced by rebuilding the kernel and rebooting the computer. Fortunately, there is a pair of workarounds to this problem, which we introduce soon.

The functions allocate either whole pages (if they end with _pages) or non-page-aligned memory areas. They allocate either low or normal memory (see the discussion of memory zones earlier in this chapter). Normal allocation returns memory addresses that are above MAX_DMA_ADDRESS; low memory is at addresses lower than that value.

This interface was introduced in version 2.3.23 of the kernel. Earlier versions used a less refined interface, similar to the one described in Unix books. Basically, the initialization functions of several kernel subsystems received two unsigned long arguments, which represented the current bounds of the free memory area. Each such function could steal part of this area, returning the new lower bound. A driver allocating memory at boot time, therefore, was able to steal consecutive memory from the linear array of available RAM.
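The older convention can be pictured with a user-space model of one such initialization function. The names toy_driver_init, toy_buf_start, and TOY_BUF_SIZE are hypothetical, and the addresses are plain numbers rather than real memory; the sketch shows only the calling pattern described above, in which the function receives the bounds of the free area and returns the new lower bound.

```c
/* Size of the buffer this hypothetical driver wants to keep forever. */
#define TOY_BUF_SIZE (64UL * 1024)

/* Start of the stolen area; 0 means nothing was reserved. */
static unsigned long toy_buf_start;

/*
 * Receive the bounds of the free area, optionally take memory from its
 * bottom, and return the new lower bound for the next subsystem.
 * Assumes mem_end >= mem_start.
 */
static unsigned long toy_driver_init(unsigned long mem_start,
                                     unsigned long mem_end)
{
    if (mem_end - mem_start < TOY_BUF_SIZE)
        return mem_start;          /* not enough room: take nothing */

    toy_buf_start = mem_start;
    return mem_start + TOY_BUF_SIZE;
}
```

Each subsystem called in turn would be handed the value returned by the previous one, so the stolen areas end up stacked one after another at the bottom of free memory.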

This way of allocating memory has several disadvantages, not the least being the inability to ever free the buffer. After a driver has taken some memory, it has no way of returning it to the pool of free pages; the pool is created after all the physical allocation has taken place, and we don't recommend hacking the data structures internal to memory management. On the other hand, the advantage of this technique is that it makes available an area of consecutive physical memory that is suitable for DMA. This is currently the only safe way in the standard kernel to allocate a buffer of more than 32 consecutive pages, because the maximum value of order that is accepted by get_free_pages is 5. If, however, you need many pages and they don't have to be physically contiguous, vmalloc is by far the best function to use.
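The 32-page ceiling follows directly from the maximum order of 5, since 2^5 = 32. A small user-space sketch of the calculation, with the hypothetical names toy_order_for_pages and TOY_MAX_ORDER, makes the limit easy to check for a given buffer:

```c
/* Largest order accepted by get_free_pages, as stated above. */
#define TOY_MAX_ORDER 5

/*
 * Smallest order whose 2^order pages cover `pages` pages, or -1 when
 * the request is empty or cannot be met within TOY_MAX_ORDER.
 */
static int toy_order_for_pages(unsigned long pages)
{
    int order;

    if (pages == 0)
        return -1;
    for (order = 0; order <= TOY_MAX_ORDER; order++)
        if ((1UL << order) >= pages)
            return order;
    return -1;
}
```

A request for 3 pages needs order 2 and therefore really gets 4 pages; 32 pages fit in order 5, and 33 pages cannot be satisfied at all in this model.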

Another approach that can be used to make large, contiguous memory regions available to drivers is to apply the bigphysarea patch. This unofficial patch has been floating around the Net for years; it is so renowned and useful that some distributions apply it to the kernel images they install by default. The patch basically allocates memory at boot time and makes it available to device drivers at runtime. You'll need to pass a command-line option to the kernel to specify the amount of memory that must be reserved at boot time.

The patch is currently maintained at ~middelink/En/hob-v4l.html. It includes its own documentation that describes the allocation interface it offers to device drivers. The Zoran 36120 frame grabber driver, part of the 2.4 kernel (in drivers/char/zr36120.c) uses the bigphysarea extension if it is available, and is thus a good example of how the interface is used.

The last option for allocating contiguous memory areas, and possibly the easiest, is reserving a memory area at the end of physical memory (whereas bigphysarea reserves it at the beginning of physical memory). To this aim, you need to pass a command-line option to the kernel to limit the amount of memory being managed. For example, one of your authors uses mem=126M to reserve 2 megabytes in a system that actually has 128 megabytes of RAM. Later, at runtime, this memory can be allocated and used by device drivers.
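The location of the reserved area follows from the two numbers involved. The sketch below is a user-space calculation only, with hypothetical names (toy_region, toy_reserved_region), and it assumes that RAM is one contiguous block starting at physical address 0, which is not true of every platform.

```c
#include <stdint.h>

#define TOY_MB (1024ULL * 1024ULL)

struct toy_region {
    uint64_t base;   /* first physical address the kernel does not manage */
    uint64_t len;    /* bytes left over for the driver */
};

/*
 * Given the RAM actually installed and the value passed as mem=, both
 * in megabytes, describe the region at the end of physical memory that
 * the kernel leaves alone. len is 0 when nothing is left over.
 */
static struct toy_region toy_reserved_region(uint64_t installed_mb,
                                             uint64_t mem_limit_mb)
{
    struct toy_region r = { 0, 0 };

    if (mem_limit_mb < installed_mb) {
        r.base = mem_limit_mb * TOY_MB;
        r.len = (installed_mb - mem_limit_mb) * TOY_MB;
    }
    return r;
}
```

For the mem=126M example on a 128 MB machine, the region starts at physical address 126 MB (0x7E00000) and is 2 MB long.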

The advantage of allocator over the bigphysarea patch is that there's no need to modify official kernel sources. The disadvantage is that you must change the command-line option to the kernel whenever you change the amount of RAM in the system. Another disadvantage, which makes allocator unsuitable in some situations, is that high memory cannot be used for some tasks, such as DMA buffers for ISA devices.

The lookaside cache functions were introduced in Linux 2.1.23, and were simply not available in the 2.0 kernel. Code that must be portable back to Linux 2.0 should stick with kmalloc and kfree. Moreover, kmem_cache_destroy was introduced during 2.3 development and has only been backported to 2.2 as of 2.2.18. For this reason scullc refuses to compile with a 2.2 kernel older than that.

The functions and symbols related to memory allocation follow.

#include
void *kmalloc(size_t size, int flags);
void kfree(void *obj);

Only with version 2.4 of the kernel can memory be allocated at boot time using these functions. The facility can be used only by drivers linked directly into the kernel image.




© 2001, O'Reilly & Associates, Inc.
