Please refer to
thank you!
When I start the datarace detection project, I consider two major aspects.
The first one is the detection method. Traditional methods of using vector clock to decide happen before relationship is already too mature. Lock-set based algorithm has little to improve within the algorithm itself. It overcomes the limitation of happen before algorithm and serves as a good complementation of happen before, although it suffers from the insufficient coverage of synchronization types. There are already hybrid methods to combine these two methods together and report bettern performance and accuracy. Some optimizations are proposed, all dedicate to lower the complexity and improve detection performance. [6] use weaer-than analysis to reduce log size and combine with static analysis before dynamic check to prune the testing space; [7] raise the detection granularity from variable to object based on the observation that the variables are always belong to a larger structure and equivilant from the perspective of datarace detection. Godilock[8] starts from lock set and incorporate other synchronization using additional information of thread ID, thus overcome the drawback of lockset and retains its advantage over happen before algorithm.
Recent years, researcher proposed novel datarace detection mechanism. [9] in PLDI09 incoporate profile concept to aid to prune the detection space of datarace; it argus that those cold areas are more likely to have datarace than hot areas which have been thoroughly tested. [10] in PLDI09 porpose a technique to dynamically adjust the happen before presentation to lower the overhead of this algorithm; it is based on the observation that not all the variables would have datarace. [11] in ISCA09 study datarace detection on bulk processor, which leverage the fact that alter the executino of an execution with datarace would generate different signature; the amount of signature is very small. So there is a trend in recent year that using novel methods to prune the detection space or detect datarace on special platform; they do not work on traditional methods or optimzation on traditional methods any more, as they are already mature.
My method is a novel one too; currently, I focus on lower the complexity of traditional happen before implementation, the source of complexity is vector clock. I try to use only instruction count to decide execution order, thus there is no need to maintain and update vector clock among threads, eliminating the need of inter processor communication. There is similar notion in CORD, which use scalar clock to replace the vector clock. But CORD still relies on the notion of transportation of clock value among threads, so it still have inter processor communication, instrumentation and maintainance work on synchornization, and there are many subtle consideration in order to make the scalar clock work. My notion using transitive reduction and cheating the synchronization the same as common data access significantly lower the complexity of design.
The second consideration is the detection level, that is on which level of the system should I detect the data race. Actually, there are multiple levels to performance this functionality, most of which are in the software level. [1-3] detect datarace in the jave machine level. They instrucment the JVM to aid datarace detection for jave programs. Helgrind[4] is a Dynamic binary analysis (DBA) tools which assigns shadow memory for every variable under detection, it is a runtime system which instruments the original binary program and executes it on a virtual machine simultaneouosly with the tool. Most of the past work reside on the application level. For example, [5] detects datarace using hybrid lockset and limited happen before algorithm; it instruments the Java program and run the race detector on the same Java virtual mathine with the program and performs the check on it.
Detection in the application level is the most nature and most portable way, but it incoporate new process in the system, which may pollute the original address space. Detection from the system level seems more attractive. For example, by expanding the thread structure we can incoporate the datarace detection in OS; we can even bring this functionality down to the CPU level. HARD bu Lu. in 2007 proposes a hardware implementation of Lockset algorithm which expand the CPU core and level2 cache tag to detect lock violation. As hardware resource become abundant, more and more software algorithm should be offloeded into hardware to get higher speed. My work is a total hardware implementation; by modifying the Godson3 microarchitecture and introduce new module, I detect datarace during CPU execution.
阅读(815) | 评论(0) | 转发(0) |