Category: LINUX
2014-06-24 07:47:40
Original article: Intel IOMMU Implementation Architecture on Linux (Intel IOMMU在Linux上的实现架构), author: galaren77
./drivers/pci/dmar.c:
int __init early_dmar_detect(void)
{
    acpi_status status = AE_OK;

    /* if we could find DMAR table, then there are DMAR devices */
    status = acpi_get_table(ACPI_SIG_DMAR, 0,
                            (struct acpi_table_header **)&dmar_tbl);
    if (ACPI_SUCCESS(status) && !dmar_tbl) {
        printk (KERN_WARNING PREFIX "Unable to map DMAR\n");
        status = AE_NOT_FOUND;
    }
    return (ACPI_SUCCESS(status) ? 1 : 0);
}
This function is called during memory initialization:
./arch/x86_64/mm/init.c:528: pci_iommu_alloc();
It reads the DMA Remapping table to determine whether DMAR devices are present.
./include/acpi/actbl1.h:64:#define ACPI_SIG_DMAR "DMAR" /* DMA Remapping table */
/*******************************************************************************
*
* FUNCTION: acpi_get_table
*
* PARAMETERS: table_type - one of the defined table types
* Instance - the non zero instance of the table, allows
* support for multiple tables of the same type
* see acpi_gbl_acpi_table_flag
* ret_buffer - pointer to a structure containing a buffer to
* receive the table
*
* RETURN: Status
*
* DESCRIPTION: This function is called to get an ACPI table. The caller
* supplies an out_buffer large enough to contain the entire ACPI
* table. The caller should call the acpi_get_table_header function
* first to determine the buffer size needed. Upon completion
* the out_buffer->Length field will indicate the number of bytes
* copied into the out_buffer->buf_ptr buffer. This table will be
* a complete table including the header.
*
********************************************************************************/
./drivers/pci/intel-iommu.c:
int __init intel_iommu_init(void)
{
    int ret = 0;

    if (no_iommu || swiotlb || dmar_disabled)
        return -ENODEV;
    if (dmar_table_init())
        return -ENODEV;
    iommu_init_mempool();
    dmar_init_reserved_ranges();
    init_no_remapping_devices();
    ret = init_dmars();
    if (ret) {
        printk(KERN_ERR "IOMMU: dmar init failed\n");
        put_iova_domain(&reserved_iova_list);
        iommu_exit_mempool();
        return ret;
    }
    printk(KERN_INFO
           "PCI-DMA: Intel(R) Virtualization Technology for Directed I/O\n");
    force_iommu = 1;
    dma_ops = &intel_dma_ops;
    return 0;
}
This function is called from pci_iommu_init() in arch/x86_64/kernel/pci-dma.c:
static int __init pci_iommu_init(void)
{
#ifdef CONFIG_CALGARY_IOMMU
    calgary_iommu_init();
#endif
    intel_iommu_init();
#ifdef CONFIG_IOMMU
    gart_iommu_init();
#endif
    no_iommu_init();
    return 0;
}
pci_iommu_init() is itself registered as an initialization function in the same file:
/* Must execute after PCI subsystem */
fs_initcall(pci_iommu_init);
dmar_table_init() parses the DMAR table. Each DMAR entry is printed in turn by:
dmar_table_print_dmar_entry(entry_header);
Messages like the following then appear in dmesg:
ACPI DMAR:Host address width 36
ACPI DMAR:DRHD (flags: 0x00000000)base: 0x00000000fed90000
ACPI DMAR:DRHD (flags: 0x00000000)base: 0x00000000fed91000
ACPI DMAR:DRHD (flags: 0x00000001)base: 0x00000000fed93000
ACPI DMAR:RMRR base: 0x00000000000ed000 end: 0x00000000000effff
ACPI DMAR:RMRR base: 0x000000007f600000 end: 0x000000007fffffff
switch (entry_header->type) {
case ACPI_DMAR_TYPE_HARDWARE_UNIT:
    ret = dmar_parse_one_drhd(entry_header);
    break;
case ACPI_DMAR_TYPE_RESERVED_MEMORY:
    ret = dmar_parse_one_rmrr(entry_header);
    break;
default:
    printk(KERN_WARNING PREFIX
           "Unknown DMAR structure type\n");
    ret = 0; /* for forward compatibility */
    break;
}
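The following two entry types are parsed:
DRHD - DMA Engine Reporting Structure
RMRR - Reserved memory Region Reporting Structure
For each DRHD entry, a register function adds the physical DMA remapping unit it describes to a list. Each RMRR entry is likewise added to a global list.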
iommu_init_mempool() creates slab caches for several frequently used structures:
struct iova
struct iommu_domain
struct device_domain_info
dmar_init_reserved_ranges() initializes the reserved ranges. The following two kinds of ranges must be reserved:
1. IOAPIC ranges shouldn't be accessed by DMA
2. Reserve all PCI MMIO to avoid peer-to-peer access
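The effect of reserving these ranges can be illustrated with a small overlap check of the kind an allocator might run before handing out an I/O virtual address range. This is a minimal user-space sketch, not the kernel's iova allocator; `struct reserved_range` and `iova_is_reserved` are hypothetical names introduced only for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical descriptor for one reserved I/O virtual address range. */
struct reserved_range {
    uint64_t start; /* first reserved address */
    uint64_t end;   /* last reserved address (inclusive) */
};

/* Return 1 if [start, end] overlaps any reserved range, else 0. */
static int iova_is_reserved(const struct reserved_range *r, size_t n,
                            uint64_t start, uint64_t end)
{
    size_t i;

    assert(start <= end);
    for (i = 0; i < n; i++) {
        /* Two inclusive ranges overlap when each starts before the other ends. */
        if (start <= r[i].end && end >= r[i].start)
            return 1;
    }
    return 0;
}
```

A caller would fill the array with the IOAPIC window and every PCI MMIO window found on the platform, then reject any candidate DMA address range for which `iova_is_reserved()` returns 1.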
Graphics driver workarounds to provide unity map
Most GFX drivers don't call the standard PCI DMA APIs to allocate DMA buffers.
Such drivers will be broken with the IOMMU enabled. To work around this issue,
we added two options.
Once graphics devices are converted over to use the DMA APIs, this entire
patch can be removed.
a. intel_iommu=igfx_off. With this option, a DMAR unit that has only gfx devices
under it will be ignored. This mostly affects integrated gfx devices.
If the DMAR unit is ignored, the gfx devices under it will get physical addresses
for DMA.
b. intel_iommu=gfx_workaround. With this option, we set up a 1:1 mapping
of the whole of memory for gfx devices, that is, the physical address equals the
virtual address. In this way, gfx devices use physical addresses for DMA. This
is primarily for add-in card GFX devices.
2.5 init_dmars
Initialize the DMAR data structures.
TBD: diagram of the relationships between the data structures
dma_ops = &intel_dma_ops;
static struct dma_mapping_ops intel_dma_ops = {
    .alloc_coherent = intel_alloc_coherent,
    .free_coherent = intel_free_coherent,
    .map_single = intel_map_single,
    .unmap_single = intel_unmap_single,
    .map_sg = intel_map_sg,
    .unmap_sg = intel_unmap_sg,
};
The system BIOS is responsible for detecting the remapping hardware functions in the platform and for locating the memory-mapped remapping hardware registers in the host system address space. The BIOS reports the remapping hardware units in a platform to system software through the DMA Remapping Reporting (DMAR) ACPI table described below.
Field | Byte Length | Byte Offset | Description
Signature | 4 | 0 | “DMAR”. Signature for the DMA Remapping Description table.
Length | 4 | 4 | Length, in bytes, of the description table, including the length of the associated DMA remapping structures.
Revision | 1 | 8 | 1
Checksum | 1 | 9 | Entire table must sum to zero.
OEMID | 6 | 10 | OEM ID
OEM Table ID | 8 | 16 | For the DMAR description table, the Table ID is the manufacturer model ID.
OEM Revision | 4 | 24 | OEM Revision of DMAR Table for OEM Table ID.
Creator ID | 4 | 28 | Vendor ID of utility that created the table.
Creator Revision | 4 | 32 | Revision of utility that created the table.
Host Address Width | 1 | 36 | This field indicates the maximum DMA physical addressability supported by this platform. The system address map reported by the BIOS indicates what portions of these addresses are populated. The Host Address Width (HAW) of the platform is computed as (N+1), where N is the value reported in this field. For example, for a platform supporting 40 bits of physical addressability, the value of 100111b is reported in this field.
Flags | 1 | 37 | Bit 0: INTR_REMAP - If Clear, the platform does not support interrupt remapping. If Set, the platform supports interrupt remapping. Bits 1-7: Reserved.
Reserved | 10 | 38 | Reserved (0).
Remapping Structures[] | - | 48 | A list of structures. The list will contain one or more DMA Remapping Hardware Unit Definition (DRHD) structures, and zero or more Reserved Memory Region Reporting (RMRR) and Root Port ATS Capability Reporting (ATSR) structures. These structures are described below.
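The header layout above can be checked with a few lines of C. The following is a minimal user-space sketch that works on a raw copy of the table (for instance one dumped from firmware); it is not the kernel's ACPI code, and `dmar_host_address_width`, `rd32` and `DMAR_HEADER_LEN` are hypothetical names chosen for this example. It relies only on the offsets listed in the table: signature at 0, length at 4, Host Address Width at 36, and the rule that all bytes sum to zero.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DMAR_HEADER_LEN 48 /* offset of the first remapping structure */

/* Read a little-endian 32-bit value from a byte buffer. */
static uint32_t rd32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/*
 * Validate a raw DMAR table image. Returns the host address width in bits
 * (field value N plus one), or -1 if the signature, length or checksum is bad.
 */
static int dmar_host_address_width(const uint8_t *tbl, size_t size)
{
    uint32_t len, i;
    uint8_t sum = 0;

    if (size < DMAR_HEADER_LEN || memcmp(tbl, "DMAR", 4) != 0)
        return -1;
    len = rd32(tbl + 4);
    if (len < DMAR_HEADER_LEN || len > size)
        return -1;
    for (i = 0; i < len; i++)
        sum = (uint8_t)(sum + tbl[i]); /* checksum is modulo 256 */
    if (sum != 0)
        return -1;
    return tbl[36] + 1;
}
```

With the example from the table, a field value of 100111b (39) yields a host address width of 40 bits.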
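Each Remapping Structure begins with two fields, type and length. The type field identifies the kind of DMA-remapping structure, and the length field gives the length of that structure. The following table defines the possible values of type: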
Value | Description
0 | DMA Remapping Hardware Unit Definition (DRHD) Structure
1 | Reserved Memory Region Reporting (RMRR) Structure
2 | Root Port ATS Capability Reporting (ATSR) Structure
>2 | Reserved for future use. For forward compatibility, software skips structures it does not comprehend by skipping the appropriate number of bytes indicated by the Length field.
Note: BIOS implementations must report these remapping structure types in numerical order, i.e., all remapping structures of type 0 (DRHD) are enumerated before remapping structures of type 1 (RMRR), and so forth.
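The type/length framing lends itself to a simple walk over the structures that follow the 48-byte header. The sketch below is a user-space illustration under stated assumptions, not the kernel's parser: `dmar_walk` and `struct dmar_counts` are hypothetical names, the input is the byte range holding only the remapping structures, and the ordering rule is treated as a hard error here even though a real parser may choose to be more lenient.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { DMAR_TYPE_DRHD = 0, DMAR_TYPE_RMRR = 1, DMAR_TYPE_ATSR = 2 };

/* Hypothetical tally of what the walk found. */
struct dmar_counts {
    unsigned drhd, rmrr, atsr, unknown;
};

/* Read a little-endian 16-bit value from a byte buffer. */
static uint16_t rd16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

/*
 * Walk the remapping structures in buf[0..len). Returns 0 on success,
 * -1 on a malformed length or if types are not in ascending order.
 */
static int dmar_walk(const uint8_t *buf, size_t len, struct dmar_counts *c)
{
    size_t off = 0;
    uint16_t prev_type = 0;

    c->drhd = c->rmrr = c->atsr = c->unknown = 0;
    while (off < len) {
        uint16_t type, slen;

        if (len - off < 4)
            return -1; /* not enough room for type + length */
        type = rd16(buf + off);
        slen = rd16(buf + off + 2);
        if (slen < 4 || slen > len - off)
            return -1;
        if (type < prev_type)
            return -1; /* types must be reported in numerical order */
        prev_type = type;

        switch (type) {
        case DMAR_TYPE_DRHD: c->drhd++; break;
        case DMAR_TYPE_RMRR: c->rmrr++; break;
        case DMAR_TYPE_ATSR: c->atsr++; break;
        default: c->unknown++; break; /* skip what we do not understand */
        }
        off += slen; /* Length covers the whole structure */
    }
    return 0;
}
```

Unknown types are counted and skipped by their Length field, which is the forward-compatibility behaviour the table describes.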
A DMA-remapping hardware unit definition (DRHD) structure uniquely represents a remapping hardware unit present in the platform. There must be at least one instance of this structure for each PCI segment in the platform.
Field | Byte Length | Byte Offset | Description
Type | 2 | 0 | 0 - DMA Remapping Hardware Unit Definition (DRHD) structure
Length | 2 | 2 | Varies (16 + size of Device Scope Structure)
Flags | 1 | 4 | Bit 0: INCLUDE_PCI_ALL. If Set, this remapping hardware unit has under its scope all PCI compatible devices in the specified Segment, except devices reported under the scope of other remapping hardware units for the same Segment. If a DRHD structure with INCLUDE_PCI_ALL flag Set is reported for a Segment, it must be enumerated by BIOS after all other DRHD structures for the same Segment. A DRHD structure with INCLUDE_PCI_ALL flag Set may use the ‘Device Scope’ field to enumerate I/OxAPIC and HPET devices under its scope. If Clear, this remapping hardware unit has under its scope only devices in the specified Segment that are explicitly identified through the ‘Device Scope’ field. Bits 1-7: Reserved.
Reserved | 1 | 5 | Reserved (0).
Segment Number | 2 | 6 | The PCI Segment associated with this unit.
Register Base Address | 8 | 8 | Base address of remapping hardware register-set for this unit.
Device Scope [] | - | 16 | The Device Scope structure contains one or more Device Scope Entries that identify devices in the specified segment and under the scope of this remapping hardware unit.
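A DRHD structure can be decoded directly from the offsets in the table above. This is a minimal user-space sketch operating on raw bytes, not the kernel's dmar_parse_one_drhd(); `struct drhd_info`, `drhd_parse`, `rd16` and `rd64` are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical decoded view of one DRHD structure. */
struct drhd_info {
    int include_pci_all;   /* Flags bit 0 */
    uint16_t segment;      /* PCI segment number */
    uint64_t reg_base;     /* base of the remapping register set */
    const uint8_t *scope;  /* start of the Device Scope entries */
    size_t scope_len;      /* bytes of Device Scope entries */
};

/* Read a little-endian 16-bit value from a byte buffer. */
static uint16_t rd16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* Read a little-endian 64-bit value from a byte buffer. */
static uint64_t rd64(const uint8_t *p)
{
    uint64_t v = 0;
    int i;

    for (i = 7; i >= 0; i--)
        v = (v << 8) | p[i];
    return v;
}

/* Decode the DRHD at p; returns 0 on success, -1 if it is malformed. */
static int drhd_parse(const uint8_t *p, size_t avail, struct drhd_info *out)
{
    uint16_t len;

    if (avail < 16 || rd16(p) != 0) /* type 0 = DRHD */
        return -1;
    len = rd16(p + 2);
    if (len < 16 || len > avail)
        return -1;
    out->include_pci_all = p[4] & 1;
    out->segment = rd16(p + 6);
    out->reg_base = rd64(p + 8);
    out->scope = p + 16;
    out->scope_len = (size_t)len - 16;
    return 0;
}
```

For the dmesg line `DRHD (flags: 0x00000001)base: 0x00000000fed93000` shown earlier, such a decoder would report `include_pci_all` as 1 and `reg_base` as 0xfed93000.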
The Device Scope Structure is made up of one or more Device Scope Entries. Each Device Scope Entry may be used to indicate a PCI endpoint device, a PCI sub-hierarchy, or devices such as I/OxAPICs or HPET (High Precision Event Timer). In this section, the generic term ‘PCI’ is used to describe conventional PCI, PCI-X, and PCI-Express devices. Similarly, the term ‘PCI-PCI bridge’ is used to refer to conventional PCI bridges, PCI-X bridges, PCI Express root ports, or downstream ports of a PCI Express switch. A PCI sub-hierarchy is defined as the collection of PCI controllers that are downstream to a specific PCI-PCI bridge. To identify a PCI sub-hierarchy, the Device Scope Entry needs to identify only the parent PCI-PCI bridge of the sub-hierarchy.
Field | Byte Length | Byte Offset | Description
Type | 1 | 0 | The following values are defined for this field. 0x01: PCI Endpoint Device - The device identified by the ‘Path’ field is a PCI endpoint device. This type must not be used in Device Scope of DRHD structures with INCLUDE_PCI_ALL flag Set. 0x02: PCI Sub-hierarchy - The device identified by the ‘Path’ field is a PCI-PCI bridge. In this case, the specified bridge device and all its downstream devices are included in the scope. This type must not be used in Device Scope of DRHD structures with INCLUDE_PCI_ALL flag Set. 0x03: IOAPIC - The device identified by the ‘Path’ field is an I/O APIC (or I/O SAPIC) device, enumerated through the ACPI MADT I/O APIC (or I/O SAPIC) structure. 0x04: MSI_CAPABLE_HPET - The device identified by the ‘Path’ field is an HPET device capable of generating MSI (Message Signaled Interrupts). HPET hardware is reported through the ACPI HPET structure. Other values for this field are reserved for future use.
Length | 1 | 1 | Length of this Entry in Bytes. (6 + X), where X is the size in bytes of the “Path” field.
Reserved | 2 | 2 | Reserved (0).
Enumeration ID | 1 | 4 | When the ‘Type’ field indicates ‘IOAPIC’, this field provides the I/O APICID as provided in the I/O APIC (or I/O SAPIC) structure in the ACPI MADT (Multiple APIC Descriptor Table). This field is treated as reserved (0) for all other ‘Type’ fields.
Start Bus Number | 1 | 5 | This field describes the bus number (bus number of the first PCI Bus produced by the PCI Host Bridge) under which the device identified by this Device Scope resides.
Path | 2 * N | 6 | Describes the hierarchical path from the Host Bridge to the device specified by the Device Scope Entry. For example, a device in an N-deep hierarchy is identified by N {PCI Device Number, PCI Function Number} pairs, where N is a positive integer. Even offsets contain the Device numbers, and odd offsets contain the Function numbers. The first {Device, Function} pair resides on the bus identified by the ‘Start Bus Number’ field. Each subsequent pair resides on the bus directly behind the bus of the device identified by the previous pair. The identity (Bus, Device, Function) of the target device is obtained by recursively walking down these N {Device, Function} pairs. If the ‘Path’ field length is 2 bytes (N=1), the Device Scope Entry identifies a ‘Root-Complex Integrated Device’. The requester-ids of ‘Root-Complex Integrated Devices’ are static and not impacted by system software bus rebalancing actions. If the ‘Path’ field length is more than 2 bytes (N > 1), the Device Scope Entry identifies a device behind one or more system software visible PCI-PCI bridges. Bus rebalancing actions by system software that modify bus assignments of the device’s parent bridge impact the bus number portion of the device’s requester-id.
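The path walk described in the ‘Path’ field can be sketched as follows. In a real system the bus behind each bridge would be read from the bridge's PCI configuration space; to keep the example self-contained, this sketch assumes a hypothetical lookup table (`struct bridge`) that maps a bridge's (bus, device, function) to its secondary bus number. `scope_resolve`, `bridge_secondary` and `struct pci_bdf` are likewise names invented for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record: the bridge at (bus, dev, fn) leads to bus `secondary`. */
struct bridge {
    uint8_t bus, dev, fn, secondary;
};

struct pci_bdf {
    uint8_t bus, dev, fn;
};

/* Look up the secondary bus of a bridge; returns -1 if it is not known. */
static int bridge_secondary(const struct bridge *b, size_t nb,
                            uint8_t bus, uint8_t dev, uint8_t fn)
{
    size_t i;

    for (i = 0; i < nb; i++)
        if (b[i].bus == bus && b[i].dev == dev && b[i].fn == fn)
            return b[i].secondary;
    return -1;
}

/*
 * Resolve one Device Scope Entry to the (bus, dev, fn) of its target.
 * Returns 0 on success, -1 if the entry is malformed or a bridge is unknown.
 */
static int scope_resolve(const uint8_t *entry, size_t avail,
                         const struct bridge *b, size_t nb,
                         struct pci_bdf *out)
{
    uint8_t len, bus;
    size_t n, i;

    if (avail < 8)
        return -1;
    len = entry[1];
    if (len < 8 || len > avail || (len - 6) % 2 != 0)
        return -1;
    n = (size_t)(len - 6) / 2; /* number of {Device, Function} pairs */
    bus = entry[5];            /* Start Bus Number */
    for (i = 0; i < n; i++) {
        uint8_t dev = entry[6 + 2 * i];
        uint8_t fn = entry[7 + 2 * i];

        if (i + 1 == n) {
            out->bus = bus;
            out->dev = dev;
            out->fn = fn;
        } else {
            /* Intermediate pair names a bridge; step onto the bus behind it. */
            int next = bridge_secondary(b, nb, bus, dev, fn);

            if (next < 0)
                return -1;
            bus = (uint8_t)next;
        }
    }
    return 0;
}
```

An entry with a single pair (N=1) resolves without consulting the bridge table at all, which matches the statement that Root-Complex Integrated Devices are unaffected by bus renumbering.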
BIOS may report each such reserved memory region through the RMRR structures, along with the devices that require access to the specified reserved memory region. Reserved memory ranges that are either not DMA targets, or memory ranges that may be the target of BIOS-initiated DMA only during the pre-boot phase (such as from a boot disk drive), must not be included in the reserved memory region reporting. The base address of each RMRR region must be 4KB aligned and the size must be an integer multiple of 4KB. BIOS must report the RMRR reported memory addresses as reserved in the system memory map returned through methods such as INT15, EFI GetMemoryMap, etc. The reserved memory region reporting structures are optional. If there are no RMRR structures, the system software concludes that the platform does not have any reserved memory ranges that are DMA targets.
The RMRR regions are expected to be used only for USB and UMA Graphics legacy usages for reserved memory. Platform designers must avoid or limit reserved memory regions since these require system software to create holes in the DMA virtual address range available to system software and its drivers.
Field | Byte Length | Byte Offset | Description
Type | 2 | 0 | 1 - Reserved Memory Region Reporting Structure
Length | 2 | 2 | Varies (24 + size of Device Scope structure)
Reserved | 2 | 4 | Reserved.
Segment Number | 2 | 6 | PCI Segment Number associated with devices identified through the Device Scope field.
Reserved Memory Region Base Address | 8 | 8 | Base address of 4KB-aligned reserved memory region.
Reserved Memory Region Limit Address | 8 | 16 | Last address of the reserved memory region. The reserved memory region size (Limit - Base + 1) must be an integer multiple of 4KB.
Device Scope[] | - | 24 | The Device Scope structure contains one or more Device Scope entries that identify devices requiring access to the specified reserved memory region. The devices identified in this structure must be devices under the scope of one of the remapping hardware units reported in DRHD.
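The alignment and size rules for an RMRR can be verified while decoding it. The following is a minimal sketch on raw bytes, not the kernel's dmar_parse_one_rmrr(); `struct rmrr_info`, `rmrr_parse`, `rd16` and `rd64` are hypothetical names introduced for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical decoded view of one RMRR structure. */
struct rmrr_info {
    uint16_t segment; /* PCI segment number */
    uint64_t base;    /* first byte of the reserved region */
    uint64_t limit;   /* last byte of the reserved region */
};

/* Read a little-endian 16-bit value from a byte buffer. */
static uint16_t rd16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* Read a little-endian 64-bit value from a byte buffer. */
static uint64_t rd64(const uint8_t *p)
{
    uint64_t v = 0;
    int i;

    for (i = 7; i >= 0; i--)
        v = (v << 8) | p[i];
    return v;
}

/* Decode and validate the RMRR at p; returns 0 on success, -1 otherwise. */
static int rmrr_parse(const uint8_t *p, size_t avail, struct rmrr_info *out)
{
    uint16_t len;
    uint64_t base, limit;

    if (avail < 24 || rd16(p) != 1) /* type 1 = RMRR */
        return -1;
    len = rd16(p + 2);
    if (len < 24 || len > avail)
        return -1;
    base = rd64(p + 8);
    limit = rd64(p + 16);
    if (limit < base)
        return -1;
    /* Base must be 4KB aligned; (Limit - Base + 1) must be a 4KB multiple. */
    if ((base & 0xfff) != 0 || ((limit - base + 1) & 0xfff) != 0)
        return -1;
    out->segment = rd16(p + 6);
    out->base = base;
    out->limit = limit;
    return 0;
}
```

Both RMRR lines in the dmesg output shown earlier (0xed000 to 0xeffff, and 0x7f600000 to 0x7fffffff) satisfy these checks.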
This structure is applicable only for platforms supporting Device-IOTLBs as reported through the Extended Capability register. For each PCI Segment in the platform that supports Device-IOTLBs, BIOS provides an ATSR structure. The ATSR structures identify PCI Express Root Ports supporting Address Translation Services (ATS) transactions. Software must enable ATS on endpoint devices behind a Root Port only if the Root Port is reported as supporting ATS transactions.
Field | Byte Length | Byte Offset | Description
Type | 2 | 0 | 2 - Root Port ATS Capability Reporting Structure
Length | 2 | 2 | Varies (8 + size of Device Scope Structure)
Flags | 1 | 4 | Bit 0: ALL_PORTS - If Set, indicates all PCI Express Root Ports in the specified PCI Segment support ATS transactions. If Clear, indicates ATS transactions are supported only on Root Ports identified through the Device Scope field. Bits 1-7: Reserved.
Reserved | 1 | 5 | Reserved (0).
Segment Number | 2 | 6 | The PCI Segment associated with this ATSR structure.
Device Scope [] | - | 8 | If the ALL_PORTS flag is Set, the Device Scope structure is omitted. If the ALL_PORTS flag is Clear, the Device Scope structure contains Device Scope Entries that identify Root Ports supporting ATS transactions. All Device Scope Entries in this structure must have a Device Scope Entry Type of 02h.
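The ATSR lookup can be sketched as a check of the ALL_PORTS flag followed by a scan of the Device Scope entries. This is a simplified user-space illustration, not kernel code: `atsr_port_supports_ats` and `rd16` are hypothetical names, and the sketch assumes each Root Port is described by a single {Device, Function} pair on its start bus (an 8-byte entry), so deeper paths are skipped rather than resolved.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read a little-endian 16-bit value from a byte buffer. */
static uint16_t rd16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

/*
 * Does the ATSR at p report ATS support for the Root Port (bus, dev, fn)?
 * Returns 1 if yes, 0 if no, -1 if the structure is malformed.
 */
static int atsr_port_supports_ats(const uint8_t *p, size_t avail,
                                  uint8_t bus, uint8_t dev, uint8_t fn)
{
    uint16_t len;
    size_t off;

    if (avail < 8 || rd16(p) != 2) /* type 2 = ATSR */
        return -1;
    len = rd16(p + 2);
    if (len < 8 || len > avail)
        return -1;
    if (p[4] & 1)
        return 1; /* ALL_PORTS: every Root Port in the segment */

    off = 8;
    while (off < len) {
        uint8_t elen;

        if (len - off < 8)
            return -1;
        elen = p[off + 1];
        if (elen < 8 || elen > len - off)
            return -1;
        /* Entry type 02h with a single-pair path on the given bus. */
        if (p[off] == 0x02 && elen == 8 && p[off + 5] == bus &&
            p[off + 6] == dev && p[off + 7] == fn)
            return 1;
        off += elen;
    }
    return 0;
}
```

A caller would only enable ATS on an endpoint when this check succeeds for the Root Port above it, in line with the requirement stated before the table.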