Getting Physical With Memory - garyybl - ChinaUnix Blog


Getting Physical With Memory

Category: LINUX

2010-04-26 21:02:31

When trying to understand complex systems, you can often learn a lot by stripping away abstractions and looking at their lowest levels. In that spirit we take a look at memory and I/O ports in their simplest and most fundamental level: the interface between the processor and bus. These details underlie higher level topics like thread synchronization and the need for the Core i7. Also, since I’m a programmer I ignore things EE people care about. Here’s our friend the Core 2 again:

[Figure: Physical Memory Access]

A Core 2 processor has 775 pins, about half of which only provide power and carry no data. Once you group the pins by functionality, the physical interface to the processor is surprisingly simple. The diagram shows the key pins involved in a memory or I/O port operation: address lines, data pins, and request pins. These operations take place in the context of a transaction on the front side bus (FSB). FSB transactions go through 5 phases: arbitration, request, snoop, response, and data. The components on the FSB are called agents, and they play different roles throughout these phases. Normally the agents are all the processors plus the northbridge.

We only look at the request phase in this post, in which 2 packets are output by the request agent, which is usually a processor. Here are the juiciest bits of the first packet, output by the address and request pins:

[Figure: FSB Request Phase, Packet A]

The address lines output the starting physical memory address for the transaction. We have 33 bits, but they are interpreted as bits 35-3 of an address in which bits 2-0 are zero. Hence we have a 36-bit address, aligned to 8 bytes, for a total of 64GB (2^36 bytes) of addressable physical memory. This has been the case since the Pentium Pro. The request pins specify what type of transaction is being initiated; in I/O requests the address pins specify an I/O port rather than a memory address. After the first packet is output, the same pins transmit a second packet in the subsequent bus clock cycle:

[Figure: FSB Request Phase, Packet B]
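The address arithmetic from packet A is easy to check by hand. Below is a minimal sketch that models the 33 address lines as a plain integer carrying bits 35-3, with bits 2-0 implied to be zero. The helper names (`address_from_lines`, `lines_from_address`) are hypothetical and only illustrate the bit shifting; real hardware signals are not modeled.

```python
ADDRESS_LINES = 33     # lines carrying address bits 35..3, per the text
IMPLIED_LOW_BITS = 3   # bits 2..0 are always zero (8-byte alignment)
ADDRESS_BITS = ADDRESS_LINES + IMPLIED_LOW_BITS  # 36


def address_from_lines(line_value: int) -> int:
    """Byte address for the 33-bit value seen on the address lines."""
    if not 0 <= line_value < (1 << ADDRESS_LINES):
        raise ValueError("value does not fit on 33 address lines")
    return line_value << IMPLIED_LOW_BITS


def lines_from_address(addr: int) -> int:
    """Value driven on the address lines for an aligned physical address."""
    if not 0 <= addr < (1 << ADDRESS_BITS):
        raise ValueError("address is outside the 36-bit physical space")
    if addr % 8:
        raise ValueError("bus addresses are 8-byte aligned")
    return addr >> IMPLIED_LOW_BITS


addressable = 1 << ADDRESS_BITS
print(ADDRESS_BITS, addressable // 2**30, "GiB")  # prints: 36 64 GiB
```

The highest value the lines can carry, 2^33 - 1, maps to byte address 2^36 - 8, the last 8-byte-aligned address below the 64GB limit.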

The attribute signals are interesting: they reflect the 5 types of memory caching behavior available in Intel processors. By putting this information on the FSB, the request agent lets other processors know how this transaction affects their caches, and how the memory controller (northbridge) should behave. The processor determines the type of a given memory region mainly by looking at page tables, which are maintained by the kernel.
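For reference, the 5 memory types are uncacheable (UC), write combining (WC), write-through (WT), write-protected (WP), and write-back (WB). The sketch below summarizes how each one treats reads and writes, following the definitions in Intel's Software Developer's Manual. It is only a lookup table: the `MemType` class is hypothetical, and it does not reproduce how the types are encoded on the FSB attribute signals.

```python
from enum import Enum


class MemType(Enum):
    # value: (do reads fill the caches?, what happens to a write)
    UC = (False, "goes straight to the bus in program order")
    WC = (False, "is buffered and combined into larger bus writes")
    WT = (True, "updates the cache and is written through to memory")
    WP = (True, "goes to the bus; matching cache lines are invalidated")
    WB = (True, "stays in the cache until the line is written back")

    @property
    def reads_cached(self) -> bool:
        return self.value[0]

    @property
    def write_behavior(self) -> str:
        return self.value[1]


for t in MemType:
    print(f"{t.name}: reads cached={t.reads_cached}; a write {t.write_behavior}")
```

Only WB lets a write sit in the cache, which is why it gives the best performance for ordinary RAM.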

Typically kernels treat all RAM as write-back, which yields the best performance. In write-back mode the unit of memory access is the cache line, 64 bytes in the Core 2. If a program reads a single byte in memory, the processor loads the whole cache line that contains that byte into the L2 and L1 caches. When a program writes to memory, the processor modifies only the line in the cache and does not update main memory. Later, when it becomes necessary to post the modified line to the bus, the whole cache line is written at once. So most requests have 11 in their length field, for 64 bytes. Here’s a read example in which the data is not in the caches:

[Figure: Memory Read Sequence Diagram]
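To make the write-back behavior concrete, here is a small sketch of the cache line arithmetic, assuming the 64-byte Core 2 line size. The function names are hypothetical. It computes which line holds a given byte and how many bytes cross the bus when a read misses every cache; it deliberately ignores hardware prefetching, which can pull in additional lines.

```python
LINE_SIZE = 64  # Core 2 cache line size, per the text


def line_base(addr: int) -> int:
    """Physical address of the first byte of the line that holds addr."""
    return addr & ~(LINE_SIZE - 1)


def bus_bytes_for_read(addr: int, size: int) -> int:
    """Bytes fetched over the bus when a size-byte read of write-back
    memory misses every cache: each touched line is loaded whole.
    Ignores hardware prefetching, which may fetch extra lines."""
    if size < 1:
        raise ValueError("size must be positive")
    first = line_base(addr)
    last = line_base(addr + size - 1)
    lines = (last - first) // LINE_SIZE + 1
    return lines * LINE_SIZE


print(hex(line_base(0x1234)), bus_bytes_for_read(0x1234, 1))  # 0x1200 64
```

Reading a single byte still moves 64 bytes, and an 8-byte read at 0x123C straddles two lines and moves 128.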

Some of the physical address range in an Intel computer is mapped to devices like hard drives and network cards instead of actual RAM. This allows drivers to communicate with their devices by writing to and reading from memory. The kernel marks these memory regions as uncacheable in the page tables. Accesses to uncacheable memory regions are reproduced on the bus exactly as requested by a program or driver. Hence it’s possible to read or write single bytes, words, and so on. This is done via the byte enable mask in packet B above.
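Since the address lines only carry 8-byte-aligned addresses, the byte enable mask is what selects the bytes actually transferred. The sketch below shows one way to derive it, under two stated assumptions: bit i of the mask stands for byte i of the 8-byte chunk (a logical, active-high view; the real pins may be active-low), and an access that spills past the chunk is rejected here rather than split into two transactions. `byte_enable_mask` is a hypothetical helper.

```python
def byte_enable_mask(addr: int, size: int) -> tuple[int, int]:
    """Return (8-byte-aligned bus address, 8-bit byte enable mask) for an
    uncacheable access of `size` bytes at `addr`.
    Assumption: bit i set means byte i of the 8-byte chunk is transferred."""
    offset = addr & 7
    if size < 1 or offset + size > 8:
        raise ValueError("access must fit inside one 8-byte chunk")
    return addr & ~7, ((1 << size) - 1) << offset


chunk, mask = byte_enable_mask(0x1003, 1)
print(hex(chunk), format(mask, "08b"))  # 0x1000 00001000
```

A driver writing a 4-byte device register at 0x1004 would produce address 0x1000 with mask 0xF0, so only the upper four bytes of the chunk are touched.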

The primitives discussed here have many implications. For example:

Performance-sensitive applications should try to pack data that is accessed together into the same cache line. Once the cache line is loaded, further reads are much faster and extra RAM accesses are avoided.
A single read or write of up to 64 bits that falls within one cache line is atomic (assuming write-back memory). Such an access is serviced by the processor’s L1 cache and the data is read or written all at once; other processors or threads cannot see it half-done. In particular, 32-bit and 64-bit operations that don’t cross cache line boundaries are atomic.
The front side bus is shared by all agents, which must arbitrate for bus ownership before they can start a transaction. Moreover, all agents must listen to all transactions in order to maintain cache coherence. Thus bus contention becomes a severe problem as more cores and processors are added to Intel computers. The Core i7 solves this by having processors attached directly to memory and communicating in a point-to-point rather than broadcast fashion.
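The first two points both come down to whether data stays inside one 64-byte line. The sketch below checks that for a single access (`crosses_line`) and for a whole record (`lines_spanned`), using `ctypes.Structure` only to get C-like field offsets. The `Hot` record and the base addresses are hypothetical placements chosen for illustration; Python does not let you pick the real address of an object, so in C you would use something like `_Alignas(64)` to pin the layout.

```python
import ctypes

LINE_SIZE = 64  # Core 2 cache line size


def crosses_line(addr: int, size: int) -> bool:
    """True if a size-byte access starting at addr spans two cache lines."""
    return addr // LINE_SIZE != (addr + size - 1) // LINE_SIZE


class Hot(ctypes.Structure):
    # Hypothetical record whose fields are read together.
    _fields_ = [("head", ctypes.c_uint64),
                ("tail", ctypes.c_uint64),
                ("count", ctypes.c_uint32)]


def lines_spanned(struct_type, base_addr: int) -> set[int]:
    """Indices of the cache lines the struct's fields occupy at base_addr."""
    lines = set()
    for name, _ in struct_type._fields_:
        field = getattr(struct_type, name)  # ctypes field descriptor
        start = base_addr + field.offset
        lines.add(start // LINE_SIZE)
        lines.add((start + field.size - 1) // LINE_SIZE)
    return lines


print(sorted(lines_spanned(Hot, 0)), sorted(lines_spanned(Hot, 56)))  # [0] [0, 1]
print(crosses_line(56, 8), crosses_line(60, 8))  # False True
```

Placed on a line boundary, the record costs one line fill and each 64-bit field is a single-line access; shifted to offset 56, it needs two line fills and an 8-byte access at 60 would no longer be guaranteed atomic.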

These are the highlights of physical memory requests; the bus will surface again later in connection with locking, multi-threading, and cache coherence. The first time I saw FSB packet descriptions I had a huge “ahhh!” moment so I hope someone out there gets the same benefit. In the next post we’ll go back up the abstraction ladder to take a thorough look at virtual memory.
