Chinaunix首页 | 论坛 | 博客
  • 博客访问: 239939
  • 博文数量: 33
  • 博客积分: 2511
  • 博客等级: 少校
  • 技术积分: 391
  • 用 户 组: 普通用户
  • 注册时间: 2008-06-06 09:24
文章分类
文章存档

2011年(3)

2010年(9)

2009年(3)

2008年(18)

我的朋友

分类: LINUX

2010-11-02 12:26:58

In the , I explained how to modify SWP code to make use of compiler intrinsics. Using intrinsics hides the underlying detail needed to use the load and store exclusive instructions and the use of memory barriers. In this article I look at implementing atomic memory accesses in assembler.

In order to describe memory barriers, what they are and how they should be used, I need to describe two types of memory model, strongly ordered and weakly ordered. The strongly-ordered model is very natural for programmers. In this model, the order that a program writes data to memory is the order in which the data is observed being written into memory, that is, other programs sharing the data will "see" the same ordering regardless of the CPU that they are executing on.

For example, if a CPU writes a new X then writes a new Y, all other CPUs that subsequently read Y then a read X, will access either the new Y and new X, the old Y and the new X, or old Y and the old X. However, because the order of the write is strongly ordered as write X first then write Y, n o CPU will access the new Y and the old X .

Modern CPUs, such as ARM, optimize memory accesses via hardware mechanisms avoiding memory accesses when it can, for example by using a cache. When it has to access memory, it makes those memory accesses as efficient as possible. An example of efficient memory accesses is the speculative reading of memory into caches before the CPU asks for it. This could happen when a program is accessing an array in memory. Whilst the program is working on element N, the memory subsystem could be reading element N+1 (and above) into cache ready for use. These mechanisms can cause the memory accesses to be reordered .

Systems that are allowed to reorder memory accesses in this way are called weakly ordered memory systems. The upshot of this is that, whilst a (program running on a) CPU has a coherent view of its own memory accesses, those memory accesses will be observed differently by (programs running on) other CPUs. To take my example above, another CPU might easily access the new Y and the old X.

Unfortunately, being able to observe the same order of memory accesses is key when accessing memory atomically. If, in the above example, X is a spinlock and Y is the data structure being modified within the critical region protected by X, it is important that the accesses to X are observed by all other CPUs in the system as being before accesses to Y. What you need is some mechanism to make sure that all CPUs observe accesses to X before the accesses to Y and this mechanism is a memory barrier.

Taking load and store exclusives and memory barriers into account, an assembly version of acquiring a spinlock looks like this:
CODE
  1. ENTRY (_spin_lock)
  2. mov r1,#1
  3. 1: ldrex r2,[r0]
  4. cmp r2,#0
  5. strexeq r2, r1, [r0]
  6. cmpeq r2, #0
  7. bne 1b
  8. dmb
  9. mov r0,#0
  10. END (_spin_lock)

As I described in the previous article, a pair of load and store exclusives are used to acquire the lock. The ‘cmpeq r2, #0’ instruction checks that the exclusive store to the lock location failed, which it will do if another CPU has accessed the lock between the LDREX and STREX operations. A failure will cause the code to retry the load exclusive.

You will note the Data Memory Barrier (DMB) instruction that is issued once the lock has been acquired. The DMB guarantees that all memory accesses before the memory barrier will be observed by all of the other CPUs in the system before all memory accesses made after the memory barrier. This makes more sense if you consider that once a lock has been acquired, a program will then access the data structure(s) locked by the lock. The DMB in the lock function above ensures that accesses to the locked data structure are observed after accesses to the lock.

The unlock code looks like:
CODE
  1. ENTRY (_spin_unlock)
  2. mov r1, #0
  3. dmb
  4. str r1, [r0]
  5. END (_spin_unlock)

The write to unlock the spinlock uses a simple STR instruction (not a store exclusive). This is fine if there is only one owner of the lock, the most common case. If a critical region allows more than one CPU to access it, load and store exclusives would need to be used.

The DMB instruction ensures that accesses to the locked data structure(s) are observed before the lock is unlocked .
阅读(1643) | 评论(0) | 转发(0) |
0

上一篇:Linux 内核时钟

下一篇:TDD 的基本流程

给主人留下些什么吧!~~