PARALLEL ALGORITHMS OVER RED-BLACK TREES-zhoujianpingboy-ChinaUnix博客

健平的ChinaUnix博客

首页　| 　博文目录　| 　关于我

zhoujianpingboy

博客访问： 233948
博文数量： 48
博客积分： 0
博客等级：民兵
技术积分： 412
用户组：普通用户
注册时间： 2013-04-24 10:27

个人简介

Continuous pursuit technical details

文章分类

全部博文（48）

数据结构和算法（2）
STL（1）
KERNEL（6）
PAM（3）
C++（2）
C++ Example（18）
TOOL（2）
LINUX（2）
NETWORKS（12）
未分配的博文（0）

文章存档

2014年（1）

2013年（47）

我的朋友

相关博文

PARALLEL ALGORITHMS OVER RED-BLACK TREES

分类：高性能计算

2014-04-12 21:31:03

PARALLEL ALGORITHMS OVER RED-BLACK TREES

Until now, in many practical problems, the binary trees was not used , by the difficult for to distribute the elements stored between an arbitrary number of cores, and the difficult for to develop parallel algorithms. This limitation force to use only 1 thread with these data structures. And with big problems, usually it is not a good solution.

Recently I presented a new library named COUNTERTREE, in Boost () , for the final approval. This library provide a concurrent and thread safe implementation of the tree based data structures (set, multiset, map and multimap, plus a unordered tree named vector_tree) with random access iterators and access by position like a vector.

This permit to distribute the elements stored in the trees, between an arbitrary number of threads, identically as do with the vectors. Over it you can use many parallel algorithms designed for vectors ( as quicksort over an unsorted tree) and making easy the design of parallel algorithms applied to the tree based data structures.

These data structures are based on a kind of augmented red-black trees, where each node have a counter with with the number of nodes under which permit the access by position, like in a vector, to the elements stored in the tree, and random access iterators. I named this tree of counters CounterTree

Countertree diagram

The parallel algorithms over these trees, open a wide range of areas where they was not used until now, and make to be interested reevaluate the algorithms used in many applications.

These algorithms, many times present problems of lock contentions when the number of threads grows. The design of algorithms without this problem, and the strategies to adopt when you can't avoid, is commented, in the point . You have a brief resume

If you are interested, you have a brief guide , where provide a global vision of the project and their parts. I think is a good starting point in order to examine the library.

You can find the code and the documentation in my web pages in dropbox

or if you prefer in a git format

Please, if you have any idea , any comment or see any fail, any lack, please say me in order to improve the library, and provide between all ,a fast, robust and reliable tool to the C++ programmers.

If you want more information, need something, please, say me.

Thanks by your interest

Francisco Jose Tapia
e-mail :

*The project is pending of the final approval since more than a year ago due to the absence of reviewers
*All the links of this message are to my github, and dropbox pages, and the boost library

MANYCORE STRATEGIES

Francisco Jose Tapia (fjtapia@gmail.com)

PROBLEM DESCRIPTION

Until now, the main problem in the design of parallel algorithms over tree data structures had been the difficult to assign the elements stored in the tree between an arbitrary number of threads. With this library, this problem is removed. Now, it's time to face the next problems in the design of parallel algorithms.

The main problem is the lock contention when the number of threads grows, in the operations which need an exclusive lock over the data structure ( insertion, deletion, modification ...). The lock contention appear when the number of threads doing exclusively operations with exclusive lock is between 4 to 8, depending of the HW used.

In order to talk about parallel algorithms we have two kinds :

a) Algorithms with lock contention , when the number of threads grows.
b) Algorithms without lock contention when the number of threads grows.

All the algorithms mentioned here, or are implemented or are designed on paper, for to be implemented in next versions. Some of them can be implemented immediately, but others need to modify the internal algorithm of the tree, in order to implement operations of split and join over trees.

These new algorithms affect to the semantic of the functions, by example how to pass to the tree destructor the number of threads to use? The solution to these problems need a study in deep for to evaluate the options and select the best solution. In the documentation this is described in the point

SOLUTIONS

This document expose my viewpoint about the problems and the possible solutions. It's only an opinion, open to new ideas

With the trees we can find the next operations :

1.- CONSULT

Any number of threads can access simultaneously to the tree without lock contention problems.

2.- ACCESS TO THE ELEMENTS.

Any number of threads can access simultaneously to the tree without lock contention problems, but without the access by position it is very difficult to distribute between an arbitrary number of threads, imagine a tree with 200000000 nodes and a Xeon Phis with 244 threads.)

3.- INSERTION AND DELETION.

These operations present problems of lock contentions when the number of threads grow and intensive use. But there is a strategy in order to minimize the problem.

If the number of operations is small, the problem is small too. But if you have a great number of operations, you have an algorithm for to resolve it. This algorithm need support from the tree algorithms, operations of split the tree ( we have sub trees ordered) and the join operations of ordered sub trees. (These operations don't be in the actual version of the algorithm, and I expect to include in the next versions, as comment in the point

The idea is split the main tree in several sub trees and each thread manages a sub tree. When all the threads finish the operations, we join all the trees obtaining a new main tree.

You need to sort all the operations to do, and assign to each thread a range of operations. After this, split the tree between the number of threads and do all the operation over each sub tree. When finish, join the sub trees and it's done.

In the picture you can see, we have 8 operations to insert and we have 4 threads. The vector are the elements to insert and we have 4 threads.

Each sub tree have the same number of operations to do, and when finish, join the sub trees obtaining a new main tree.

This algorithm can be implemented with any number of threads without lock contentions. The operations of split and join the sub trees are similar in cost to the insertion or the deletion of a node in the tree.

4.- COPY AND DELETE THE TREE.

Actually the compilers are using an algorithm O(N) which clone the tree. Even with this algorithm, these operations spend a great time if the tree is big.

I have designed on paper for to be implemented with the actual version of the counter trees.

? Parallel algorithm for to copy and delete tree structures. These algorithms work with any number of threads
without lock contentions problems

? Parallel algorithms of copy the tree from a vector, and copy a vector from the tree. These algorithms work with any
number of threads without lock contentions problems

5.- UNION, INTERSECTION AND NON INTERSECTION OPERATIONS

With the access by position it's easy design parallel algorithms for to do these operations , but have lock contention in the insertion of the result in the final data structure. With the operations of split and join the tree we can design thesealgorithms without lock contentions.

The lock mechanism of the library, in the operations with exclusive lock ( Insert, delete , modification), permit in safe mode the concurrency with operations without exclusive lock ( read) in a part of the operation, trying to minimize the time of lock (you have a description in the point . This don't resolve the problem but try to minimize the problems of lock contentions.

I think the access by position is not a magic solution to the problems of the trees, but I think provide a new perspective, and perhaps it would be interested to reevaluate the optimal algorithm ( performance, space used, flexibility ...), in many real situations, according to this new perspective.

Thanks by your attention. If you have any question, comment, please, say me and I will try to respond you

Francisco Tapia
fjtapia@gmail.com

You can find the code and the documentation in my web pages in dropbox

or if you prefer in a git format

阅读(3359) | 评论(0) | 转发(0) |

上一篇：shm_client.c

下一篇：没有了

给主人留下些什么吧！~~

感谢所有关心和支持过ChinaUnix的朋友们

16024965号-6