ZooKeeper Study Notes 1 - 齐立 - ChinaUnix Blog

ZooKeeper Study Notes 1

Category: Java

2015-05-25 00:32:59

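Conclusions first; I will verify them later through programming and further study:

ZooKeeper implements distributed transaction control by providing a distributed file system whose data is kept consistent. The official documentation has this to say about data access: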

The data stored at each znode in a namespace is read and written atomically. Reads get all the data bytes associated with a znode and a write replaces all the data. Each node has an Access Control List (ACL) that restricts who can do what.

ZooKeeper was not designed to be a general database or large object store. Instead, it manages coordination data. This data can come in the form of configuration, status information, rendezvous, etc. A common property of the various forms of coordination data is that they are relatively small: measured in kilobytes. The ZooKeeper client and the server implementations have sanity checks to ensure that znodes have less than 1M of data, but the data should be much less than that on average. Operating on relatively large data sizes will cause some operations to take much more time than others and will affect the latencies of some operations because of the extra time needed to move more data over the network and onto storage media. If large data storage is needed, the usually pattern of dealing with such data is to store it on a bulk storage system, such as NFS or HDFS, and store pointers to the storage locations in ZooKeeper.

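This passage introduces two concepts: znode and ACL. A znode is comparable to a directory in a file system (one that can also hold data), and an ACL is comparable to access permissions, but in ZooKeeper reads and writes on them are atomic. Implementing this must be quite complex; the official documentation describes it as follows: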

The replicated database is an in-memory database containing the entire data tree. Updates are logged to disk for recoverability, and writes are serialized to disk before they are applied to the in-memory database.

Every ZooKeeper server services clients. Clients connect to exactly one server to submit requests. Read requests are serviced from the local replica of each server database. Requests that change the state of the service, write requests, are processed by an agreement protocol.

As part of the agreement protocol all write requests from clients are forwarded to a single server, called the leader. The rest of the ZooKeeper servers, called followers, receive message proposals from the leader and agree upon message delivery. The messaging layer takes care of replacing leaders on failures and syncing followers with leaders.

ZooKeeper uses a custom atomic messaging protocol. Since the messaging layer is atomic, ZooKeeper can guarantee that the local replicas never diverge. When the leader receives a write request, it calculates what the state of the system is when the write is to be applied and transforms this into a transaction that captures this new state.

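From this part of the documentation we can roughly guess how the atomic reads and writes are implemented: the cluster elects a leader, and data consistency is then achieved through atomic messaging between the other servers and the leader (this description is rather odd; I will look into how it is actually implemented later). With ZooKeeper's overall architecture clear, some of its design ideas can be appreciated further through the examples given in the official documentation. So how do we use ZooKeeper to write our own distributed applications?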
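Before moving on, the leader-based design guessed at above can be made concrete with a toy model. This is only a sketch under loud assumptions: `ToyEnsemble`, `Replica`, and `Txn` are hypothetical names, everything runs in one JVM, and the "broadcast" is a plain loop with no quorum, no failures, and no leader election. It is not ZooKeeper's actual protocol; it only shows why replicas that apply the same leader-ordered transactions cannot diverge.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Toy model only: one leader orders writes, every replica applies them in that order. */
class ToyEnsemble {
    /** A write turned into a transaction: a leader-assigned id plus the resulting state. */
    static final class Txn {
        final long id;
        final String path;
        final String newData;

        Txn(long id, String path, String newData) {
            this.id = id;
            this.path = path;
            this.newData = newData;
        }
    }

    /** One server's in-memory copy of the data tree (path -> data). */
    static final class Replica {
        final Map<String, String> tree = new TreeMap<>();
        long lastApplied = 0;

        void apply(Txn txn) {
            // Transactions are applied strictly in id order, so all replicas see the same history.
            if (txn.id != lastApplied + 1) {
                throw new IllegalStateException("out-of-order transaction " + txn.id);
            }
            tree.put(txn.path, txn.newData);
            lastApplied = txn.id;
        }
    }

    private final List<Replica> replicas = new ArrayList<>();
    private long nextId = 1;

    ToyEnsemble(int size) {
        for (int i = 0; i < size; i++) {
            replicas.add(new Replica());
        }
    }

    /** Plays the leader's role: assigns the next id and delivers the transaction to every replica. */
    synchronized void write(String path, String data) {
        Txn txn = new Txn(nextId++, path, data);
        for (Replica replica : replicas) {
            replica.apply(txn);
        }
    }

    /** Reads are answered from a single server's local replica. */
    synchronized String read(int server, String path) {
        return replicas.get(server).tree.get(path);
    }
}
```

After `new ToyEnsemble(3)` and two writes to the same path, a read against any of the three servers returns the second value, because each replica applied the same two transactions in the same order.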

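The official documentation includes a diagram of this.

[Figure from the official documentation; image not available]

As the diagram shows, ZooKeeper encapsulates the data-consistency operations inside its internal service. There are already many analyses of this online, covering leader election, the Fast Paxos algorithm, and so on (ZooKeeper's own atomic broadcast protocol is called Zab). The underlying principles can be studied later; let us first look at its application scenarios.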

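ZooKeeper wraps up a set of methods for keeping data synchronized across multiple servers and multiple JVMs. In other words, you can write data to one server through a client, and the change to that data, i.e. the change to the znode, can be observed on the other servers. Now comes the time to get creative. You might wonder what this has to do with writing distributed services. Quite a lot, in fact; let us look at the application scenarios.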

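1) Distributed Barrier

A barrier is a mechanism for controlling and coordinating the order in which multiple tasks are triggered. Put simply, a gate holds back the tasks that want to run and opens only once all of them are ready to execute. The mechanism is illustrated in the figure below:

[Figure: barrier mechanism; image not available]

On a single machine, the JDK provides the CyclicBarrier class to implement this mechanism, but in a distributed environment the JDK cannot help. Implementing a barrier in a distributed setting requires strong consistency, which is where ZooKeeper comes in. The approach is to use a node as the barrier itself: each task that must wait at the barrier calls exists() to check whether the node exists, and when the barrier is to be opened, the node is deleted and ZooKeeper's watch mechanism notifies every task that it can start executing.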
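For comparison, here is a minimal sketch of the single-JVM case using the JDK's `CyclicBarrier`. The class `LocalBarrierDemo` and the method `runOnce` are hypothetical names introduced for illustration. Note one difference from the ZooKeeper scheme described above: `CyclicBarrier` opens automatically when the last party arrives, whereas the ZooKeeper barrier opens when someone deletes the barrier node.

```java
import java.util.concurrent.CyclicBarrier;
import java.util.concurrent.atomic.AtomicInteger;

class LocalBarrierDemo {
    /** Starts `parties` threads and returns how many had arrived when the barrier opened. */
    static int runOnce(int parties) {
        AtomicInteger arrived = new AtomicInteger();
        AtomicInteger seenAtOpen = new AtomicInteger(-1);
        // The barrier action runs once, when the last party arrives, before anyone is released.
        CyclicBarrier barrier = new CyclicBarrier(parties, () -> seenAtOpen.set(arrived.get()));

        Thread[] workers = new Thread[parties];
        for (int i = 0; i < parties; i++) {
            workers[i] = new Thread(() -> {
                arrived.incrementAndGet();      // this task is now ready
                try {
                    barrier.await();            // held at the gate until every task is ready
                } catch (Exception e) {
                    throw new IllegalStateException(e);
                }
                // Work placed here runs only after all parties have arrived.
            });
            workers[i].start();
        }
        try {
            for (Thread worker : workers) {
                worker.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return seenAtOpen.get();
    }
}
```

Calling `LocalBarrierDemo.runOnce(4)` returns 4, since every thread registers its arrival before waiting and the gate opens only when the fourth one reaches it.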

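2) Distributed Queue

As with the barrier, implementing a queue in a distributed environment requires strong consistency, and ZooKeeper offers a simple way to do it. A node represents the queue itself and its children hold the queue's contents. ZooKeeper's create method supports a sequential mode that automatically appends an increasing number to the name when a new element is inserted, so the children can be used to build a queue data structure: offer calls create, and take deletes the first child in sequence order. Because ZooKeeper guarantees that the data on every server is consistent, the result is a distributed queue. Sample code for take and offer is shown below: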


/**
 * Removes the head of the queue and returns it, blocks until it succeeds.
 * @return The former head of the queue
 * @throws KeeperException
 * @throws InterruptedException
 */
public byte[] take() throws KeeperException, InterruptedException {
    TreeMap<Long, String> orderedChildren;
    // Same as for element. Should refactor this.
    while (true) {
        LatchChildWatcher childWatcher = new LatchChildWatcher();
        try {
            // List the queue's children and register the watcher on them.
            orderedChildren = orderedChildren(childWatcher);
        } catch (KeeperException.NoNodeException e) {
            // The queue node does not exist yet: create it and retry.
            zookeeper.create(dir, new byte[0], acl, CreateMode.PERSISTENT);
            continue;
        }
        if (orderedChildren.size() == 0) {
            // The queue is empty: block until the children change.
            childWatcher.await();
            continue;
        }
        // Try the children in sequence order, lowest number first.
        for (String headNode : orderedChildren.values()) {
            String path = dir + "/" + headNode;
            try {
                byte[] data = zookeeper.getData(path, false, null);
                zookeeper.delete(path, -1); // -1 matches any version
                return data;
            } catch (KeeperException.NoNodeException e) {
                // Another client deleted the node first.
            }
        }
    }
}
/**
 * Inserts data into queue.
 * @param data
 * @return true if data was successfully added
 */
public boolean offer(byte[] data) throws KeeperException, InterruptedException {
    for (;;) {
        try {
            zookeeper.create(dir + "/" + prefix, data, acl, CreateMode.PERSISTENT_SEQUENTIAL);
            return true;
        } catch (KeeperException.NoNodeException e) {
            zookeeper.create(dir, new byte[0], acl, CreateMode.PERSISTENT);
        }
    }
}
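The take() method above relies on an orderedChildren helper that the listing does not show. A minimal sketch of its ordering logic follows, assuming the child names have already been fetched, for example with getChildren(). The class name `QueueOrdering`, the two-argument signature, and the "qn-" prefix in the usage note are assumptions made for illustration; the real helper takes a watcher and talks to ZooKeeper. What the sketch relies on is that sequential znodes get a zero-padded 10-digit counter appended to their name.

```java
import java.util.List;
import java.util.TreeMap;

class QueueOrdering {
    /**
     * Maps each child's sequence number to its name, sorted ascending,
     * so the first entry is the head of the queue.
     */
    static TreeMap<Long, String> orderedChildren(List<String> childNames, String prefix) {
        TreeMap<Long, String> ordered = new TreeMap<>();
        for (String name : childNames) {
            if (!name.startsWith(prefix)) {
                continue; // not a queue element
            }
            try {
                long sequence = Long.parseLong(name.substring(prefix.length()));
                ordered.put(sequence, name);
            } catch (NumberFormatException e) {
                // The suffix is not a sequence number: ignore this child.
            }
        }
        return ordered;
    }
}
```

Given the children "qn-0000000010" and "qn-0000000002", the first entry of the returned map is "qn-0000000002", which is the node take() would try to read and delete first.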

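3) Distributed lock

To implement a distributed lock with ZooKeeper, a node represents the lock. When a client tries to acquire it, the client creates a child with an auto-incremented sequence number under that node, then calls getChildren() to check whether its child is the first in line. If so, it holds the lock; otherwise it calls exists() on the child immediately ahead of its own and sets a watch on it. When the lock holder finishes, it releases the lock simply by deleting the child it created. The watch mechanism then notifies the waiting client, which competes for the lock again according to the acquisition rule just described.

As these scenarios show, ZooKeeper only provides very basic building blocks; how to apply them is up to our own imagination. The next notes will focus on analyzing the ZooKeeper API and then work through concrete business scenarios.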
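The acquisition rule can be sketched as a pure function over the list of children, leaving out the ZooKeeper calls themselves. The names `LockOrdering` and `nodeToWatch` are hypothetical, and the sketch assumes every child shares the same name prefix followed by the zero-padded 10-digit sequence number, so that plain string sorting matches sequence order. It follows the common variant in which each waiter watches only the child directly ahead of its own.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class LockOrdering {
    /**
     * Returns null if `self` is first in line and therefore holds the lock;
     * otherwise returns the child directly ahead of it, which is the one to watch.
     */
    static String nodeToWatch(List<String> children, String self) {
        List<String> sorted = new ArrayList<>(children);
        // Same prefix plus zero-padded suffix: lexicographic order equals sequence order.
        Collections.sort(sorted);
        int index = sorted.indexOf(self);
        if (index < 0) {
            throw new IllegalArgumentException(self + " is not among the children");
        }
        return index == 0 ? null : sorted.get(index - 1);
    }
}
```

A client would call this after getChildren(): a null result means it owns the lock, and any other result is the name to pass to exists() with a watch. When that watch fires, the client fetches the children again and repeats the check.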
