Deterministic replay[1,2,3] behaves like a recorder: it accurately records the whole execution, so when we encounter an error we can replay the execution and see what happened. Note that only the global (shared) memory accesses matter here, because they are the source of nondeterministic errors. If we fix their order, we fix the order of the whole execution, in the sense that we get the same result on every later run.
To record the order, we need the initial input and the initial architectural state, which contains all the registers, plus all the load values, because they are the inputs of the running program. Recording essentially means recording the memory races. Replaying the races is enough to replay the whole execution.
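The idea that fixing the order of shared accesses fixes the result can be sketched in a few lines of Python. This is only a toy model, not any of the recorders listed below: the two "threads" are generators, the random scheduler stands in for hardware timing, and the names `thread`, `run`, and `schedule` are made up for illustration. The log here is just the order in which threads performed their shared accesses.

```python
import random

def thread(tid, mem, steps):
    """Toy thread: an unsynchronized read-modify-write on shared mem["x"]."""
    for _ in range(steps):
        v = mem["x"]             # load
        yield                    # the scheduler may switch threads here
        mem["x"] = v * 2 + tid   # store; the result depends on the interleaving
        yield

def run(schedule=None, seed=None, steps=3):
    """Record mode (schedule is None): pick threads at random and log the order.
    Replay mode: follow the logged order exactly."""
    mem = {"x": 1}                                   # initial state
    threads = {t: thread(t, mem, steps) for t in (0, 1)}
    rng = random.Random(seed)
    replay = iter(schedule) if schedule is not None else None
    log = []
    while threads:
        tid = next(replay) if replay is not None else rng.choice(sorted(threads))
        log.append(tid)
        try:
            next(threads[tid])                       # run up to the next access
        except StopIteration:
            del threads[tid]                         # thread finished
    return mem["x"], log

recorded_result, order = run(seed=7)        # record
replayed_result, _ = run(schedule=order)    # replay from the log
assert replayed_result == recorded_result
```

Starting from the same initial state and forcing the same access order is enough to reproduce the final value, even though the program itself is racy.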
Related work
——software
InstantReplay[5]: software only; not a full-system recorder
Netzer[6]: uses transitive reduction to reduce the log size
DejaVu[8]: assumes a single processor
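Transitive reduction is easier to see with a small example. The sketch below is a simplified version of the idea, not Netzer's exact algorithm: it walks an observed trace of shared accesses, keeps a vector clock per thread, and logs a race edge only when it is not already implied by program order plus the edges logged so far. The function name `reduce_log` and the trace format `(tid, addr, is_write)` are assumptions made for this illustration.

```python
def reduce_log(trace, n_threads):
    """trace: list of (tid, addr, is_write) in the observed global order.
    Returns the logged edges as ((src_tid, src_count), (dst_tid, dst_count))."""
    count = [0] * n_threads                            # accesses done per thread
    vc = [[0] * n_threads for _ in range(n_threads)]   # vc[t][u]: latest access of u that t is ordered after
    last_write = {}                                    # addr -> (tid, count, vc snapshot)
    last_reads = {}                                    # addr -> {tid: (count, vc snapshot)}
    log = []
    for tid, addr, is_write in trace:
        count[tid] += 1
        vc[tid][tid] = count[tid]
        # Conflicting predecessors: the last write, plus (for a write) the reads since it.
        preds = []
        if addr in last_write:
            preds.append(last_write[addr])
        if is_write:
            preds.extend((t, c, s) for t, (c, s) in last_reads.get(addr, {}).items())
        for src_tid, src_count, src_vc in preds:
            if src_tid == tid:
                continue                               # covered by program order
            if vc[tid][src_tid] >= src_count:
                continue                               # implied transitively: do not log
            log.append(((src_tid, src_count), (tid, count[tid])))
            vc[tid] = [max(a, b) for a, b in zip(vc[tid], src_vc)]
        snapshot = list(vc[tid])
        if is_write:
            last_write[addr] = (tid, count[tid], snapshot)
            last_reads[addr] = {}
        else:
            last_reads.setdefault(addr, {})[tid] = (count[tid], snapshot)
    return log

# Thread 0 writes A then B; thread 1 reads B then A.
trace = [(0, "A", True), (0, "B", True), (1, "B", False), (1, "A", False)]
edges = reduce_log(trace, 2)   # only the B edge is logged; the A edge is implied
```

In this trace there are two races, but the edge on A follows from the edge on B and program order, so one log entry is enough.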
——hardware
Bacon & Goldstein[7]:
the log is large because it records the snooping coherence messages;
FDR[1]:
full-system recorder; piggybacks on directory cache-coherence messages; uses transitive reduction to reduce the log size;
RTR[2]:
enhances FDR; further reduces the log by introducing false race edges;
Strata[3]:
uses strata to cut the program into segments; significantly reduces the log size compared with FDR and RTR; easier to understand (for me :-)); writes global information about each hardware context;
Rerun[4], DeLorean[8]:
Rerun: records how long a thread runs without conflicting with other threads (an episode); replays the episodes sequentially (a scalar clock for every episode); tracks each episode's read/write sets to detect conflicts;
DeLorean: executes instructions in bulk, as chunks (similar to episodes, but a conflict does not end a chunk); the underlying hardware guarantees that chunks with shared accesses execute one by one, so the recorder only needs to record the chunk execution order;
Rerun has higher performance and larger log size, while DeLorean has smaller log size and lower performance (branch)
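The episode idea can be sketched as follows. This is a rough software model based only on the description above, not Rerun's hardware design: it uses exact Python sets where real hardware would need some compact structure, and the names `Core`, `access`, `record`, and `replay_order` are invented for the example. Each core counts the memory references of its current episode; when another core's access conflicts with the episode's read/write sets, the episode ends and is logged as (timestamp, length), and the scalar clocks are updated Lamport-style.

```python
class Core:
    def __init__(self):
        self.ts = 0           # scalar (Lamport) timestamp of the current episode
        self.refs = 0         # memory references in the current episode
        self.reads = set()    # read set of the current episode
        self.writes = set()   # write set of the current episode
        self.log = []         # finished episodes as (timestamp, refs)

    def end_episode(self):
        if self.refs:
            self.log.append((self.ts, self.refs))
        self.ts += 1
        self.refs = 0
        self.reads.clear()
        self.writes.clear()

def access(cores, cid, addr, is_write):
    me = cores[cid]
    for other in cores:
        if other is me:
            continue
        # Conflict: read-after-write, write-after-write, or write-after-read.
        if addr in other.writes or (is_write and addr in other.reads):
            ended_ts = other.ts
            other.end_episode()
            me.ts = max(me.ts, ended_ts + 1)   # order my episode after the ended one
    me.refs += 1
    (me.writes if is_write else me.reads).add(addr)

def record(trace, n_cores):
    """trace: list of (core id, addr, is_write). Returns one log per core."""
    cores = [Core() for _ in range(n_cores)]
    for cid, addr, is_write in trace:
        access(cores, cid, addr, is_write)
    for core in cores:
        core.end_episode()                     # flush the last episodes
    return [core.log for core in cores]

def replay_order(logs):
    """Episodes sorted by timestamp: the sequential order used for replay."""
    return sorted((ts, cid, refs) for cid, log in enumerate(logs) for ts, refs in log)

trace = [(0, "A", True), (0, "B", True), (1, "A", False), (1, "C", True), (0, "C", False)]
logs = record(trace, 2)
order = replay_order(logs)   # list of (timestamp, core id, episode length)
```

Five accesses collapse into three log entries, and replaying the episodes one after another in timestamp order respects every conflict in the trace.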
[1] Min Xu, Rastislav Bodik, Mark D. Hill. "A 'Flight Data Recorder' for Enabling Full-system Multiprocessor Deterministic Replay". ISCA'03
[2] Min Xu, Rastislav Bodik, Mark D. Hill. "A Regulated Transitive Reduction (RTR) for Longer Memory Race Recording". ASPLOS XII, 2006
[3] Satish Narayanasamy, Cristiano Pereira, Brad Calder. "Recording Shared Memory Dependencies Using Strata". ASPLOS'06
[4] Derek R. Hower, Mark D. Hill. "Rerun: Exploiting Episodes for Lightweight Memory Race Recording". ISCA'08
[5] T. J. LeBlanc, J. M. Mellor-Crummey. "Debugging Parallel Programs with Instant Replay". IEEE Transactions on Computers, 1987
[6] R. H. B. Netzer. "Optimal Tracing and Replay for Debugging Shared-Memory Parallel Programs". PADD'93
[7] D. F. Bacon, S. C. Goldstein. "Hardware-Assisted Replay of Multiprocessor Programs". 1991
[8] P. Montesinos, L. Ceze, J. Torrellas. "DeLorean: Recording and Deterministically Replaying Shared-Memory Multiprocessor Execution Efficiently". ISCA'08