Please refer to
Thank you!
In recent publication of datarace detection, reseracher proposed non traditional methods which incoporate many new conception or new enhancement.
Light64[1] obersers that if there is a race, then revert the order of conflicting accesses would result in different execution signiture[4]; this work is based on the bulk processor architecture[6], but I believe it works even there is no bulk support, requiring special signature generation during runtime.
[5] use profile technique to aid datarace detection, based on the fact that frequently tested code are less likely to have unexposed races, so they focus on profiling the infrequently tested code;
Gidilock [2] deduces happen before relationship using the conception of lock, it cheats other synchronization the same as lock and enhance the lockset with thread id to incoporate other synchronization primitive in the lockset framework, the paper also provide detailed provement of this conception.
Transactional memory seems to be a powerful method to eliminate datarace. However, as this framework divede program into TM and not TM (Transactional Memory), those code that require protection should be placed into TM. So using TM is similiar to using lock; inproper TM usage can lead to datarace as inproper lock usage. [3] proposed RaceTM to eliminate datarace in non TM section. It is not clear that how TM can be applied to lock based code[7].
Researcher also think about asymmethric datarace[8] where at least one thread access the shared variable safely while at least one access it unsafely. Their method is to maintain a local copy for safe access thus guarantee a correct result and at the same time detect datarace based on lockset algorithm. The performance is lowered by a factor of 2 !!!
My work
My work explores the transitive reduction of data accesses and synchronizations. According to the characterization of [9], they are apparent races. I try to develop a very low cost detection mechanism. This low cost requires a very simple conception that does not rely on special hardware or software support, or complex mechanism in implementation.
Traditional happen before algorithm use partial time to decide execution order; lockset use lock acquisition status to decide safty; Light64 and AVIO[10] detect execution deviations; Gdilock expand lock set to incoporate other synchronizations; [11] expand Helgrind and introduce detailed state machine to cover many corner cases or hybrid detection algorithm; while my work directly explore how data accesses and synchronization interact and remove timestamp as a medium between them.
Timestamp is the source of complexity, removing this complexity eases development and reduce performance degrade. My current algorithm only relies on instruction information (more accurately instruction count) to decide execution order. Every thread maintains its own instruction count, similart to the lockset cost as HARD[12]. This low requirement of execution information make it very feasible in all architectures. My algorithm needs to records access history, but limit it in a reasonable amount, about 3K for current 4core configuration. This low amount results from the observation that most access history can be dynamically dropped; I need to maintain only the last write and concurrent reads after that write, when the next write arrives, all the information can be discarded. There is another approximation here. The number of global variable may be large, but most of them are synchronized by the same synchronizations. This is very common in SPLASH2 where there is a lot of global arrary. So detecting one of them is enough to assure the safty of others. Thus there is no need to keep large amount of variabes, current access history only keep 16 variables. A NP hard problem can be practical only when we only want to deal with a part in time or in space. That is the reason while data race can be delt with without efficient theory.
FDR use transitive reduction in replay, but it does not deduce transitive reduction across multiple threads and sometims uses false race edges to reduce log size. In data race detection, no one has ever used this notion directly before. The problem needed to be handled is how to deduce transitive reduction across multiple threads. In my current work, I use a combined breath and depth first search to find transitive path one by one. This is an O(n2) algorithm, but within a small amount of history, this search is acceptable for hardware. This search process is discussed in Hardware datarace detection.
[1] Adrian Nistor,Darko Marinov,Josep Torrellas.
"Light64: Lightweight Hardware Support for Data RaceDetection during Systematic Testing of Parallel Programs". MICRO’09.
[2] Tayfun Elmas, Shaz Qadeer, and Serdar Tasiran. "Goldilocks: Eciently Computing the Happens-Before Relation Using Locksets". PLDI'06
[3] Shantanu Gupta, Florin Sultan, Srihari Cadambi, Franjo Ivancic, Martin Rotteler, "Using hardware transactional memory for data race detection," ISPDP'09
[4] Abdullah Muzahid,Darío Suárez,Shanxiang Qi,Josep Torrellas. "SigRace: Signature-Based Data Race Detection". ISCA’09
[5] Ali Jannesari and Walter F. Tichy. "LiteRace: Effective Sampling for Lightweight Data-Race Detection". PADTAD’08
[6] Josep Torrellas, Luis Ceze, James Tuck, Calin Cascaval†, Pablo Montesinos, Wonsun Ahn, and Milos Prvulovic. "The Bulk Multicore Architecture for Improved Programmability".
[7] C. Blundell, C. Lewis and M. Martin, Deconstructing Transactional
Semantics: The Subtleties of Atomicity, Fourth Annual Workshop on Duplicating, Deconstructing, and Debunking,Madison, Wisconsin, 2005
[8] Paruj Ratanaworabhan, Martin Burtscher, Darko Kirovski, Benjamin Zorn, Rahul Nagpal, Karthik Pattabiraman. "Detecting and Tolerating Asymmetric Races" PPoPP’09
[9] Robert H. B. Netzer, Barton P. Miller. "What are Race Conditions? Some Issues and Formalizations". 1992
[10] Shan Lu, Joseph Tucek, Feng Qin and Yuanyuan Zhou. AVIO: Detecting Atomicity Violations via Access Interleaving Invariants. ASPLOS'06
[11] Ali Jannesari and Walter F. Tichy. "On-the-fly Race Detection in Multi-Threaded Programs". PADTAD’08
[12] Pin Zhouxy, Radu Teodorescuy and Yuanyuan Zhouy "HARD: Hardware Assisted Locksetbased Race Detection". HPCA'07
阅读(426) | 评论(0) | 转发(0) |