Chinaunix首页 | 论坛 | 博客
  • 博客访问: 89455
  • 博文数量: 14
  • 博客积分: 1708
  • 博客等级: 上尉
  • 技术积分: 260
  • 用 户 组: 普通用户
  • 注册时间: 2008-01-27 09:29
文章分类

全部博文(14)

文章存档

2011年(1)

2010年(13)

分类:

2010-07-19 09:54:50

When sparse_irq is used (CONFIG_SPARSE_IRQ), use kzalloc_node to get irq_desc, irq_cfg
use desc->chip_data for x86 to store irq_cfg also prepare
need to add struct (irq_desc **descp) to ack_edge/level to make sure desc get updated
try to pass desc cfg as more as possible to avoid list looking up.

This enables support for sparse irqs. This is useful for distro kernels that want to define a high CONFIG_NR_CPUS value but still want to have low kernel memory footprint on smaller machines.

( Sparse IRQs can also be beneficial on NUMA boxes, as they spread out the irq_desc[] array in a more NUMA-friendly way. )

Impact: new CONFIG_SPARSE_IRQ feature, which makes irq_desc[] a sparse array

To support kernels with very large NR_CPUS and NR_IRQS settings,
we need to reduce the size of irq_desc[]. On x86, when NR_CPUS is
set to 4096, the irq_desc[] array will waste megabytes of RAM,
which is not acceptable overhead to generic distro kernels.

In v2.6.28 we already introduced a generic API to make access to
the irq_desc[] array more abstract - and to allow a different
data structure to underly it. This patch finishes that process.

Core kernel changes:

- fix missing sparseirq API changes in various bits of core kernel code
(missing for_irq_desc primitives, missing checks for !desc, etc.)

- introduce a new data type in the IRQ code: irq_desc_ptrs[] and its
handling in the core IRQ code

- detach the IRQ statistics counters from kernel_stat and
attach it to irq_desc->kstat_irqs[] dynamically allocated
array of pointers. (this can use percpu_alloc() in the
future, once percpu_alloc() becomes generic enough)

- detach the NR_IRQS array in random.c.

- interrupt remapping: when moving an IRQ on NUMA, reallocate the irq
descriptor so that we get proper NUMA-local memory for the descriptor,
for the irq_cfg entry and for the kstat_irqs array.

Architectures can enable this by setting the CONFIG_SPARSE_IRQ
config switch. The x86 architecture is extended/fixed to deal
with such an irq_desc[] model:

- io_apic irq_cfg[NR_IRQS] array is re-attached to desc->irq_chip

- MSI virtual IRQ numbering is sanitized to go from the max upper
end of the physical IRQ range up towards NR_IRQS - instead of
coming down from the end of NR_IRQS.

- re-tunes our max NR_IRQS calculations

Architectures that do not specify CONFIG_SPARSE_IRQ, do not need
to change anything - this is a transparent feature that is not
supposed to break any existing code.


阅读(1586) | 评论(0) | 转发(0) |
给主人留下些什么吧!~~