Deterministic Replay - wuweidan - ChinaUnix Blog

Deterministic Replay


2009-12-03 17:33:41

Deterministic replay[1,2,3] behaves like a recorder: it records an execution accurately enough that, when an error is encountered, we can replay the execution to see what happened. Note that only the global (shared) memory accesses are of concern here, as they are the source of non-deterministic errors. If we fix their order, we fix the order of the whole execution, in the sense that we can get the same result later.
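To make the idea concrete, here is a minimal sketch in Python. It is a toy model, not any of the cited systems: threads are simulated as generators, every step is exactly one global access, and the names `make_thread` and `run` are made up for this illustration. The first run picks an interleaving at random and logs it; the second run is driven by the log and reaches the same result.

```python
import random

def make_thread(tid, n):
    """Hypothetical thread body: n racy read-modify-write updates of shared 'x'."""
    def body(mem):
        for _ in range(n):
            v = mem["x"]          # global read
            yield                 # another thread may run here
            mem["x"] = v + tid    # global write based on a possibly stale value
            yield
    return body

def run(bodies, schedule=None, seed=None):
    """Run all threads. With schedule=None, choose the interleaving at random
    and record it; otherwise replay the given schedule step by step."""
    mem = {"x": 0}
    gens = {tid: body(mem) for tid, body in bodies.items()}
    rng = random.Random(seed)
    replay = iter(schedule) if schedule is not None else None
    log = []
    while gens:
        tid = next(replay) if replay is not None else rng.choice(sorted(gens))
        log.append(tid)           # the log is just the order of global accesses
        try:
            next(gens[tid])       # let that thread perform one access
        except StopIteration:
            del gens[tid]         # thread finished
    return mem["x"], log
```

For example, `result, log = run({1: make_thread(1, 3), 2: make_thread(2, 3)}, seed=7)` records one interleaving, and `run({1: make_thread(1, 3), 2: make_thread(2, 3)}, schedule=log)` returns the same `result`, whatever the race outcome was.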

To record the order, we need the initial input and the architectural state, which includes all the registers and all the load values, because they are the inputs of the running program. Recording is essentially recording the memory races; replaying the races is enough to replay the whole execution.
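As a rough illustration of what "recording the memory races" means, the sketch below takes an observed trace of shared accesses and extracts the cross-thread dependence edges (read-after-write, write-after-write, write-after-read). The trace format and the function name `race_edges` are assumptions made for this example, not something defined by the cited papers.

```python
def race_edges(trace):
    """trace: list of (thread, op, addr) in observed global order, op is 'R' or 'W'.
    Returns edges (src, dst), where each end is (thread, per-thread access index)
    and src must happen before dst during replay."""
    last_write = {}   # addr -> event of the most recent write
    readers = {}      # addr -> reads seen since that write
    counts = {}       # thread -> number of accesses issued so far
    edges = []
    for thread, op, addr in trace:
        idx = counts.get(thread, 0)
        counts[thread] = idx + 1
        me = (thread, idx)
        w = last_write.get(addr)
        if w is not None and w[0] != thread:
            edges.append((w, me))              # RAW (for a read) or WAW (for a write)
        if op == "R":
            readers.setdefault(addr, []).append(me)
        else:
            for r in readers.get(addr, []):
                if r[0] != thread:
                    edges.append((r, me))      # WAR
            readers[addr] = []
            last_write[addr] = me
    return edges
```

Accesses by the same thread need no edge, since program order already fixes them; only the cross-thread edges have to go into the log.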

Related work

——software

Instant Replay[5]: software only; not full-system

Netzer[6]: transitive reduction to reduce log size

DejaVu[8]: assumes a single processor
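The transitive reduction mentioned for Netzer[6] can be sketched as follows. This is a simple greedy version written for illustration, not the algorithm from the paper, and it is not guaranteed to be optimal: an edge is dropped when its destination is already reachable from its source through program order plus the edges kept so far. Events are assumed to be (thread, per-thread index) pairs, and `implied` and `reduce_edges` are hypothetical names.

```python
def implied(src, dst, kept):
    """True if dst already happens after src via program order and kept edges."""
    stack, seen = [src], set()
    while stack:
        thread, idx = stack.pop()
        if thread == dst[0] and idx <= dst[1]:
            return True                      # reached dst's thread at or before dst
        if (thread, idx) in seen:
            continue
        seen.add((thread, idx))
        for (s_thread, s_idx), target in kept:
            # Program order: anything at or after idx in this thread is reachable.
            if s_thread == thread and s_idx >= idx:
                stack.append(target)
    return False

def reduce_edges(edges):
    """edges: list of (src, dst) in the order they were observed.
    Returns the subset that still needs to be logged."""
    kept = []
    for src, dst in edges:
        if not implied(src, dst, kept):
            kept.append((src, dst))
    return kept
```

For instance, once A's access 0 is ordered before B's access 0, an edge from A's access 0 to B's access 1 adds nothing, because B's access 1 follows B's access 0 in program order anyway.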

——hardware

Bacon&Goldstein[7]:

large log size because of the snoop coherence messages;

FDR[1]:

full-system recorder; piggybacks on directory cache coherence messages; uses transitive reduction to shrink the log;

RTR[2]:

enhances FDR; further reduces what must be logged by introducing false race edges

Strata[3]:

uses strata to cut the program into segments; significantly reduces the log size compared with FDR and RTR; easier to understand (for me :-)); writes global information about each hardware context;

Rerun[4], DeLorean[8]:

Rerun: records how long a thread runs without conflicting with other threads (an episode); replays sequentially (a scalar clock for every episode); tracks each episode's read/write sets;

DeLorean: executes in bulk (a bulk is similar to an episode, but a conflict does not end the bulk); the underlying hardware guarantees that the bulks execute one by one (for those with shared accesses), so the recorder only needs to record the bulk execution order;

Rerun has higher performance and larger log size, while DeLorean has smaller log size and lower performance (branch)
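The episode idea can be sketched in software as below. This is a loose model written for illustration, not the hardware design from [4]: the trace format, the per-address `mem_ts` table and all function names are assumptions. Each thread keeps read/write sets for its current episode; a conflicting access from another thread ends that episode, which is logged as (timestamp, thread, length). Replay then simply runs whole episodes in timestamp order.

```python
def record_episodes(trace):
    """trace: (thread, op, addr) tuples in observed global order, op in {'R', 'W'}.
    Returns a sorted log of (timestamp, thread, length), one entry per episode."""
    state = {}    # thread -> current episode
    mem_ts = {}   # addr -> highest timestamp of a finished episode that touched it
    log = []

    def new_episode(ts):
        return {"reads": set(), "writes": set(), "length": 0, "ts": ts}

    def end_episode(thread):
        ep = state[thread]
        if ep["length"]:
            log.append((ep["ts"], thread, ep["length"]))
            for addr in ep["reads"] | ep["writes"]:
                mem_ts[addr] = max(mem_ts.get(addr, -1), ep["ts"])
        state[thread] = new_episode(ep["ts"] + 1)
        return ep["ts"]

    for thread, op, addr in trace:
        me = state.setdefault(thread, new_episode(0))
        for other in list(state):
            ep = state[other]
            if other != thread and (addr in ep["writes"]
                                    or (op == "W" and addr in ep["reads"])):
                # Conflict: the other episode ends, and we are ordered after it.
                me["ts"] = max(me["ts"], end_episode(other) + 1)
        # Also order after finished episodes that touched this address
        # (conservative: read-after-read is ordered too).
        me["ts"] = max(me["ts"], mem_ts.get(addr, -1) + 1)
        (me["writes"] if op == "W" else me["reads"]).add(addr)
        me["length"] += 1
    for thread in list(state):
        end_episode(thread)
    return sorted(log)

def replay_order(trace, log):
    """Rebuild a sequential access order by running episodes in timestamp order."""
    per_thread, pos, order = {}, {}, []
    for access in trace:
        per_thread.setdefault(access[0], []).append(access)
    for _, thread, length in log:
        start = pos.get(thread, 0)
        order.extend(per_thread[thread][start:start + length])
        pos[thread] = start + length
    return order

def reads_seen(order):
    """Map each read event to the write event that supplied its value (or None)."""
    last_writer, counts, seen = {}, {}, {}
    for thread, op, addr in order:
        idx = counts.get(thread, 0)
        counts[thread] = idx + 1
        if op == "W":
            last_writer[addr] = (thread, idx)
        else:
            seen[(thread, idx)] = last_writer.get(addr)
    return seen
```

A quick sanity check is that `reads_seen(replay_order(trace, record_episodes(trace)))` equals `reads_seen(trace)`: every read gets its value from the same write as in the original run, even though the replayed order is coarser.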

[1] Min Xu, Rastislav Bodik, Mark D. Hill. "A 'Flight Data Recorder' for Enabling Full-system Multiprocessor Deterministic Replay". ISCA'03

[2] Min Xu, Rastislav Bodik, Mark D. Hill. "A Regulated Transitive Reduction (RTR) for Longer Memory Race Recording". ASPLOS XII, 2006

[3] Satish Narayanasamy, Cristiano Pereira, Brad Calder. "Recording Shared Memory Dependencies Using Strata". ASPLOS'06

[4] Derek R. Hower, Mark D. Hill. "Rerun: Exploiting Episodes for Lightweight Memory Race Recording". ISCA'08

[5] T. J. Leblanc and J. M. Mellor-Crummey. "Debugging Parallel Programs with Instant Replay". IEEE Transactions on Computers, 1987

[6] R. H. B. Netzer. "Optimal Tracing and Replay for Debugging Shared-Memory Parallel Programs". PADD'93

[7] D. F. Bacon and S. C. Goldstein. "Hardware-Assisted Replay of Multiprocessor Programs". 1991

[8] P. Montesinos, L. Ceze, and J. Torrellas. "DeLorean: Recording and Deterministically Replaying Shared-Memory Multiprocessor Execution Efficiently". ISCA'08
