Datarace detection will continue to be critical in the future. Past reserch[1,2] in this field generally divided into 3 groups: static, dynamic and postmotern.
Static analysis has higher coverage, it does not requires code execution (idle for OS); most of them based on type system, thus they are computational expensive and thorough test also incur scalability issue. They are also too conservative and would generate too many false alarm.
Dynamic and postmotern method can only detect race on the running path, but they have many runtime information which allow them to make more aggressive decisions. Deterministic Replay [3-9] is one type of postmotern method and can be used to detect data race. This type of method requires a whole tool chain from recorder to replayer and certain support to record and replay. It is not easy to make it right. FDR[4] model an one way issue 1G processor using simulator, with log size in dozens of Mb for 1s (34mb/1s on average, 6.25% for 4MB cache). Strata[6] record the global state, futher reduce the log size; it is attractive if it is scalable. RDR[5] reduce the log size of FDR to 4% by regulated the race edges and set/LRU timestamp. Rerun[7] and DeLorean[8] explore the fact that threads do not communicate all the time. Rerun has similar log size of RDR but with much lower hardware cost (0.2kb vs. 24kb). DeLorean has impressive low log size (2.1bits per kilo instructions vs. 3bytes of Rerun), but DeLorean runs on Bulk processor, a specially design platform; its performance is not as good as Rerun.
Dynamic test or on the fly test is the most worthy way. The advantage is that they detect datarace on the fly, thus can use most intimate run time information. Generally there are two types of algorithms in this field: happen before and lockset.
But there are several drawbacks in this type.
First, it require access history to performance the check. Although postmotern methods also require that, but their log could be saved in other places and only use when replay; in dynamic detection, the access history is needed immediately. So although theoratically the method can detect any races, but in real application they are limited by the access history. [18] increases the granularity of datarace detection in order to lower the cost. [19] proposed "weaker-than" relation to reduce the amount of history; it also incoporates static analysis to reduce detection candidate.[3] proposed a method to reduce the history in run time. [10] theoratically proved that the history can be limited to the last write and all the concurrent read after that write, on a new write the history could be discarded. [11] observes that 4 recent conflicting accesses are enough to exposed a datarace. This problem may not be so complicated as it sounds.
It relies on instrumentation, which makes it not practical in OS detection. [12] use profile technique to reduce the overhead of datarace detection. It is based on the notion that frequently tested code are less likely to have datarace. This rule applies to OS which is a thoroughly tested bulk of code. Furthermore, My current work using only instruction count to decide execution order and detect races; it can cheat OS and user space the same, so it may reduce the severness of this problem.
On the fly detection can only detect races on current execution path. Yes, it is right! But computers are invented for repeated work, so run the test as many times as possible can solve this problem for most cases. [13] use datarace detection to help compiler detect cycles in program execution; according to the author (UIUC), their method is to run the program repeatedly until the error report become stable.
In recent years, there are novel techniques proposed in this field. These techniques either do not rely on tranditional methods or incoporate new conception to significantly lower the overhead of them. See recent datarace detection.
In my view, as the hardware resource become abundant,
more and more software functionality can be offloaded to the hardware,
bringing higher speed. But the hardware space can not be as unlimited
as software, so careful tuning is still necessary. Hardware
implementation data race detection requires careful study of the
generated instruction information characteristic and design the
hardware accordingly. For example, the access history can be
dynamically discarded during run time, so we can allocate necessary
storing space for them, but the space does not need to be too large. On
the other hand, to select dynamical discarded information we need more
computation which will be speedup by hardware.
The other result of abundant hardware resource is that CPU core
number is increasing continually, thus brings scalability issue for
many aspects of system design. In my view, hardware data race detection
also encounters such a problem. A global checking module is not
practical if the core number is large. This is a general issue not only applicable for data race. Within future architecture, interconnection network would play an important role in scalable inter CPU core communication; its central idea is to decentralize the communication, thus allowing substantial overlap and distribution. Software reliability issues like data race detection can shift to this new platform without influencing the critical path operations. That is my current work on implementing data race on interconnection network.
There are even more aggressive techniques beyond datarace detection or deterministic replay which are surviving technique; there are avoiding technique.
[1] Aoun Raza. "A Review of Race Detection Mechanisms". 2006
[2] Nels E. Beckman. "A Survey of Methods for Preventing Race Conditions".
[3] M. Ronse and K. De Bosschere. RecPlay : A Fully Integrated Practical Record/Replay System. In ACM Transactions on Computer Systems,1999
[4] Min Xu, Rastislav Bodik, Mark D. Hill1. "A “Flight Data Recorder” for Enabling Full-system Multiprocessor Deterministic Replay". ISCA'03
[5] Min Xu,Rastislav Bodík,Mark D. Hill. "A Regulated Transitive Reduction (RTR) for Longer Memory Race Recording". ASPLOS XII,2006
[6] Satish Narayanasamyy, Cristiano Pereiray, Brad Calder "Recording Shared Memory Dependencies Using Strata" ASPLOS'06
[7] Derek R. Hower, Mark D. Hill. "Rerun: Exploiting Episodes for Lightweight Memory Race Recording" ISCA'08
[8] P. Montesinos, L. Ceze, and J.Torrellas.
"DeLorean: Recording and Deterministically Replaying Shared-Memory Multiprocessor Execution Efficiently", ISCA'08
[9] R. H. B. Netzer. Optimal Tracing and Replay for Debugging Shared-Memory Parallel Programs. In Proceedings of the Workshop on Parallel and Distributed Debugging (PADD),1993.
[10] Utpal Banerjee, Brian Bliss, Zhiqiang Ma, Paul Petersen. "A Theory of Data Race Detection". PADTADIV,July 17, 2006
[11] Shan Lu, Soyeon Park, Eunsoo Seo and Yuanyuan Zhou. "Learning from Mistakes— A Comprehensive Study on Real World Concurrency Bug Characteristics". ASPLOS'08
[12]
[13] Yuelu Duan, Xiaobing Feng, Lei Wang, Chao Zhang, Pen-Chung Yew. "Detecting and Eliminating Potential Violation of Sequential Consistency for Concurrent C/C++ Programs".
[14] Adrian Nistor,Darko Marinov,Josep Torrellas.
"Light64: Lightweight Hardware Support for Data RaceDetection during Systematic Testing of Parallel Programs"
[15] Tayfun Elmas, Shaz Qadeer, and Serdar Tasiran. "Goldilocks: Eciently Computing the Happens-Before Relation Using Locksets". PLDI'06
[17] Shantanu Gupta, Florin Sultan, Srihari Cadambi, Franjo Ivancic, Martin Rotteler, "Using hardware transactional memory for data race detection," ISPDP'09
[18] Christoph von Praun and Thomas R. Gross. "Object Race Detection". OOPSLA'01
[19] Jong-Deok Choi, Keunwoo Lee, Alexey Loginov, Robert O’Callahan, Vivek Sarkar, Manu Sridharan. "Efficient and Precise Datarace Detection for Multithreaded Object-Oriented Programs". PLDI’02.
阅读(1212) | 评论(0) | 转发(0) |