Please refer to
Thank you !
A data race[17] occurs when two threads concurrently access a shared memory location without knowing each, thus introduce inocherence result. Data race detection is essencially to detect whether a data race would not occurs, that is if two conflicing accesses are doom to happen in a fixed order, then the execution result must be coherent.
When we talk about datarace, we refer to apparent race[17] that can be detected by consiering the relation of data access and synchronization. All the proposed algorithm is based on detecting synchronization point and then deduce execution order to check potential races. The other kind, feasible race[17], is based on the possible behavior of the program; it may contains atomicity violoation[22] and other order errors[22]. This two types of races are included in data race, which consider the failure in critical section execution; while general race consier the failure of program that intend to be deterministic, such failure may result from an I/O or interrupt.
Data race detection algorithms discussed here deal with apparent data race. Feasible race exceed the ability of basic data race detection and requires the knowledge of program expection. There are technique[23,26] to handle it. These technique all based on execution invariant. For general race, the key is to make it deterministic. There are already some effort[24,25] on this issue. These work try to limit or even eliminate the possibility of non deterministic execution. They are not in the scope of this article.
Dynamic Data Race Detection
Traditionally, there are two types of detection algorithm: happen before[1,3,4,5,6] and lockset[7,8,9].
Happen before algorithm
Genreally speaking, happen before algorithm records the execution time of every thread, and threads associate their current time to each operations. So to detect whether two operations from two threads are concurrent or not, we just need to compare their time to decide the order. If there are global time, this work would be simple. But data race comes from distribute system where global time is not available. So what we have is a only execution time of each thread[14]. Thus in happen before algorithm, we record each thread's time and tranfer it on synchronizations. Every thread maintains other thread's time too, thus in implementation, every thread maintain a vector clock[14,21], each entry corresponds to the time of one thread. Comparison is as followed.
The idea behind this is that if a thread konws
the lates execution time of another thread, then
they are synchronized. If not, then they have no
interactions, so they may be concurrent.
Happen before is general as it considers all types of synchronizations; happen-before relation is a present of time. It needs to distinguish syncronization and do the transporation on them; thus the maintanance and update of vector clock introduce substantial work. Moreover, it suffers from the complexity in implementation and sentivity of thread interleaving, as shown in the next figure.
the unlock and lock sequense would be
detected by happen before algorithm, so
race on x can not always be detected; if
thread1 executes first, as the figure shows,
then the race is hidden; if thread2 executes
first, then race on x is exposed
Lockset algorithm
Lockset[8] is proposed to complement this shortcome. In the above example, there is no lock protecting x, so no matter in what sequence the two thread execute, lockset algorithm would report a race.
The algorithm works as followed: Every thread maintains a lock set that it currently holds, and every variable maintains a lockset that current protects it. When detecting, the two lockset intersept. An empty result indicates that the thread does not hold a lock that can protect the variable, thus it may conflicts with other thread without proper synchronizations.
Lockset is low overhead in implementation, as it only requires maintaining lockset and does intersecption. The lockset can be implemented as bloomfilter[15] and attached global variable. In software implementations such as [16], the framwork maintains a shallow variable for each variable and records its lock set in it, each thread maintains its own lock set; in hardware implementation[10], the cache line is expanded to hold the lock set for each cache line, CPU core has a lock set register for the lock set of currently running thread.
This algorithm is general in another aspect: it can detect races that do not occur in current execution; while happen before can only detect races happen now. But lockset algorithm assume that lock is the only type of synchronizations, so it does not work in other environment.
Hybrid algorithm
It is obvious that both algorithms have their advantage and disadvantage. So a combined one[18,19,20] is better. [18] bases on the observation that happen-before algorithm can prune the false alarm generated by lockset; so they start from a lockset algorithm and incoporate limited happen-before check. [19] can detect races on the granularity of object and variable, it combined jilt and lockset; [20]
[1] D. Perkovic and P. J. Keleher. "Online data-race detection via coherency guarantees" OSDI'06
[2] S. Lu, P. Zhou, W. Liu, Y. Zhou, and J. Torrellas. "PathExpander: Architectural Support for Increasing the Path Coverage of Dynamic Bug Detection" MICRO'06
[3] J. Mellor-Crummey. "On-the-fly detection of data races for programs with nested fork-join parallelism". In Supercomputing'91
[4] A. Dinning and E. Schonberg. "An empirical comparison of monitoring algorithms for access anomaly detection". InPPOPP '90:
[5] M. Prvulovic. "CORD: Cost-effective (and nearly overheadfree) Order-Recording and Data race detection". In HPCA'06
[6] M. Prvulovic and J. Torrellas. "ReEnact: Using thread-level speculation mechanisms to debug data races in multithreaded codes". In ISCA'03.
[7] J.-D. Choi, K. Lee, A. Loginov, R. O'Callahan, V. Sarkar,and M. Sridharan. "Efficient and precise datarace detection for multithreaded object-oriented programs". In Proceeding of the ACM SIGPLAN 2002
[8] S. Savage, M. Burrows, G. Nelson, P. Sobalvarro, and T. Anderson. "Eraser: A dynamic data race detector for multithreaded programs". ACM Transactions on Computer Systems,1997
[9] C. von Praun and T. R. Gross. "Object race detection". In OOPSLA'01
[10] Pin Zhouxy, Radu Teodorescuy and Yuanyuan Zhouy "HARD: Hardware Assisted Locksetbased Race Detection". HPCA'07
[11] S. L. Min and J.-D. Choi. An effcient cache-based access anomaly detection scheme. In ASPLOS-IV
[12] M. Prvulovic and J. Torrelas. Reenact: Using thread-level speculation mechanisms to debug data races in multithreaded codes. ISCA'03
[13] F. Qin, J. Tucek, J. Sundaresan, and Y. Zhou. Rx: treating bugs as allergies—a safe method to survive software failures. In SOSP ’05:
[14] L. Lamport. Time, clocks, and the ordering of events in a distributed system. Commun. ACM, 21(7):558–565, 1978
[15] B. H. Bloom. Space/time trade-offs in hash coding with allowable errors. Communications of the ACM, 13(7):422~426, 1970.
[16] Nicholas Nethercote,Julian Seward. How to Shadow Every Byte of Memory Used by a Program. VEE 2007.
[17] Robert H. B. Netzer, Barton P. Miller. "What are Race Conditions? Some Issues and Formalizations". 1992
[18] Robert O'Callahan,Jong-Deok Choi. "Hybrid Dynamic Data Race Detection". PPoPP’03.
[19] E. Pozniansky and A. Schuster. Effcient on-the-fiy data race detection in multithreaded c++ programs. In PPoPP '03
[20] Y. Yu, T. Rodeheffer, and W. Chen. RaceTrack: Effcient Detection of Data Race Conditions via Adaptive Tracking. In SOSP '05:
[21] C. J. Fidge. Timestamp in message passing systems that preserves partial ordering.
[22] Shan Lu, Soyeon Park, Eunsoo Seo and Yuanyuan Zhou. "Learning from Mistakes— A Comprehensive Study on Real World Concurrency Bug Characteristics". ASPLOS'08
[23] Shan Lu, Joseph Tucek, Feng Qin and Yuanyuan Zhou. AVIO: Detecting Atomicity Violations via Access Interleaving Invariants. ASPLOS'06
[24] Jie Yu, Satish Narayanasamy. "A Case for an Interleaving Constrained Shared-Memory Multi-Processor" ISCA09
[25] Joseph Devietti Brandon Lucia Luis Ceze Mark Oskin. "DMP: Deterministic Shared Memory Multiprocessing" ASPLOS’09
[26] P. Zhou, W. Liu, F. Long, S. Lu, F. Qin, Y. Zhou, S. Midkiff, and J. Torrellas. AccMon: Automatically Detecting Memory-Related Bugs via Program Counter-based Invariants