Category: Servers & Storage
2010-07-13 14:31:30
Benefits of Blocks
Having a block abstraction for a distributed filesystem brings several benefits.
The first benefit is the most obvious: a file can be larger than any single disk in the network.
Second, making the unit of abstraction a block rather than a file simplifies the storage subsystem.
Furthermore, blocks fit well with replication for providing fault tolerance and availability.
Benefits of splitting files into blocks:
(1) A file's size is no longer limited by the capacity of any single disk in the cluster.
(2) It simplifies the design of the storage subsystem: storage management is streamlined, and metadata complexity is reduced.
(3) Combined with the replication mechanism, it gives the system high availability and reliability.
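The first benefit is easy to make concrete with a little arithmetic: a file is split into fixed-size blocks that can land on different disks across the cluster. A minimal sketch, assuming the 128 MB block size that later HDFS releases use by default (the file sizes below are invented for illustration):

```java
// Sketch: how a file larger than any single disk is split into blocks.
// 128 MB is the modern HDFS default block size; file sizes here are made up.
public class BlockSplit {
    static final long BLOCK_SIZE = 128L * 1024 * 1024; // 128 MB

    // Number of blocks needed to hold a file of the given length.
    static long blockCount(long fileLength) {
        return (fileLength + BLOCK_SIZE - 1) / BLOCK_SIZE; // ceiling division
    }

    public static void main(String[] args) {
        long oneTerabyte = 1024L * 1024 * 1024 * 1024;
        // A 1 TB file becomes 8192 independent 128 MB blocks;
        // no single disk ever has to hold the whole file.
        System.out.println(blockCount(oneTerabyte)); // 8192
        // A file that is not a multiple of the block size: the last block is short.
        System.out.println(blockCount(300L * 1024 * 1024)); // 3
    }
}
```

Note that a file smaller than a block does not occupy a full block's worth of disk: the last (or only) block takes only as much space as it needs.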
Namenode and Datanode
The namenode manages the filesystem namespace. It maintains the filesystem tree and the metadata for all the files and directories in the tree. This information is stored persistently on the local disk in the form of two files: the namespace image and the edit log. The namenode also knows the datanodes on which all the blocks for a given file are located; however, it does not store block locations persistently, since this information is reconstructed from the datanodes when the system starts.
Datanodes are the workhorses of the filesystem. They store and retrieve blocks when they are told to (by clients or the namenode), and they report back to the namenode periodically with lists of the blocks that they are storing.
Datanodes, in short: (1) store and retrieve blocks on request; (2) periodically report the list of blocks they store to the namenode.
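The point that block locations are never persisted, but rebuilt from datanode block reports, can be sketched with plain Java collections (the datanode names and block IDs below are invented for illustration):

```java
import java.util.*;

// Sketch: the namenode's in-memory block -> locations map, rebuilt entirely
// from periodic datanode block reports. Nothing in this map is persisted.
public class BlockMap {
    // block ID -> set of datanodes currently reporting that block
    private final Map<Long, Set<String>> locations = new HashMap<>();

    // Apply one datanode's block report (the list of blocks it stores).
    void applyReport(String datanode, List<Long> blocks) {
        // Drop stale entries for this datanode, then re-add what it now reports.
        locations.values().forEach(s -> s.remove(datanode));
        for (long b : blocks) {
            locations.computeIfAbsent(b, k -> new HashSet<>()).add(datanode);
        }
    }

    // Which datanodes hold this block, according to the latest reports?
    Set<String> locate(long block) {
        return locations.getOrDefault(block, Set.of());
    }

    public static void main(String[] args) {
        BlockMap nn = new BlockMap();
        nn.applyReport("dn1", List.of(1L, 2L));
        nn.applyReport("dn2", List.of(2L, 3L));
        System.out.println(nn.locate(2L)); // dn1 and dn2, in some order
    }
}
```

After a restart, replaying the incoming reports repopulates the map, which is exactly why persisting it would be redundant.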
HDFS Architecture
HDFS – Interfaces (excluding Java)
HDFS - Anatomy of a File Read
(1) The client opens the file it wishes to read by calling open() on the FileSystem object, which for HDFS is an instance of DistributedFileSystem.
(2) DistributedFileSystem calls the namenode, using RPC, to determine the locations of the first few blocks in the file. The DistributedFileSystem returns an FSDataInputStream (an input stream that supports file seeks) to the client for it to read data from. FSDataInputStream in turn wraps a DFSInputStream, which manages the datanode and namenode I/O.
(3) The client then calls read() on the stream. DFSInputStream, which has stored the datanode addresses for the first few blocks in the file, connects to the first (closest) datanode for the first block in the file.
(4) Data is streamed from the datanode back to the client, which calls read() repeatedly on the stream.
(5) When the end of the block is reached, DFSInputStream will close the connection to the datanode, then find the best datanode for the next block. This happens transparently to the client.
(6) When the client has finished reading, it calls close() on the FSDataInputStream.
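The steps above can be sketched as a loop over blocks in which the stream switches datanodes transparently between blocks. A minimal simulation, assuming the namenode returns each block's replica locations sorted with the closest datanode first (all names and contents below are invented):

```java
import java.util.*;

// Sketch of the DFSInputStream read loop: for each block, connect to the
// closest datanode holding a replica, stream the block, then move on to the
// next block's best datanode -- all invisible to the client.
public class ReadAnatomy {
    // One block: its data plus the datanodes holding replicas, closest first
    // (the order the namenode would return, sorted by proximity to the client).
    record Block(String data, List<String> datanodes) {}

    static String readFile(List<Block> blocks) {
        StringBuilder out = new StringBuilder();
        for (Block b : blocks) {
            String chosen = b.datanodes().get(0); // best (closest) replica
            System.out.println("streaming block from " + chosen);
            out.append(b.data());
            // End of block: the connection to 'chosen' is closed and the loop
            // finds the best datanode for the next block, transparently.
        }
        return out.toString();
    }

    public static void main(String[] args) {
        List<Block> blocks = List.of(
            new Block("hello ", List.of("dn3", "dn1")),
            new Block("world", List.of("dn2", "dn3")));
        System.out.println(readFile(blocks)); // hello world
    }
}
```

One consequence of this design is that data traffic is spread across all the datanodes holding blocks of the file, while the namenode only ever serves the small block-location requests.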
HDFS - Anatomy of a File Write
(1) The client creates the file by calling create() on DistributedFileSystem.
(2) DistributedFileSystem makes an RPC call to the namenode to create a new file in the filesystem’s namespace, with no blocks associated with it. The namenode performs various checks (for example, that the file doesn’t already exist and that the client has permission to create it). If all checks pass, DistributedFileSystem returns an FSDataOutputStream for the client to start writing data to. FSDataOutputStream wraps a DFSOutputStream, which handles communication with the datanodes and namenode.
(3) The client writes data. DFSOutputStream splits the data into packets, which it writes to an internal queue, called the data queue. The data queue is consumed by the DataStreamer.
(4) The DataStreamer streams the packets to the first datanode in the pipeline, which stores the packet and forwards it to the second datanode in the pipeline. Similarly, the second datanode stores the packet and forwards it to the third (and last) datanode in the pipeline (here we assume a replication level of 3).
(5) DFSOutputStream also maintains an internal queue of packets that are waiting to be acknowledged by datanodes, called the ack queue. A packet is removed from the ack queue only when it has been acknowledged by all the datanodes in the pipeline.
(6) When the client has finished writing data it calls close() on the stream. This action flushes all the remaining packets to the datanode pipeline and waits for acknowledgments.
(7) The client contacts the namenode to signal that the file is complete. The namenode waits for blocks to be minimally replicated before returning successfully.
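Steps (3)–(5) above can be sketched with two queues: a packet sits in the data queue until the DataStreamer sends it, then in the ack queue until every datanode in the pipeline has acknowledged it. A minimal simulation, assuming a three-datanode pipeline with invented names:

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

// Sketch of the DFSOutputStream queues: packets move from the data queue,
// through a pipeline of three datanodes, and sit in the ack queue until every
// datanode has acknowledged them. Real packets also carry checksums and
// sequence numbers, omitted here.
public class WriteAnatomy {
    static final List<String> PIPELINE = List.of("dn1", "dn2", "dn3");

    // Drain the data queue through the pipeline; return the packets that were
    // fully acknowledged, in order.
    static List<String> flush(Queue<String> dataQueue) {
        Queue<String> ackQueue = new ArrayDeque<>();
        List<String> acked = new ArrayList<>();
        while (!dataQueue.isEmpty()) {
            String pkt = dataQueue.poll();   // DataStreamer takes the next packet
            ackQueue.add(pkt);               // ...and waits for pipeline acks
            for (String dn : PIPELINE) {     // each datanode stores and forwards
                System.out.println(dn + " stored and forwarded " + pkt);
            }
            ackQueue.remove(pkt);            // acked by all three: leave ack queue
            acked.add(pkt);
        }
        return acked;
    }

    public static void main(String[] args) {
        Queue<String> data = new ArrayDeque<>(List.of("pkt-1", "pkt-2"));
        System.out.println(flush(data)); // [pkt-1, pkt-2]
    }
}
```

Keeping unacknowledged packets in a separate ack queue is what lets the real client resend them down a rebuilt pipeline if a datanode fails mid-write.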