2011-10-25 09:00:59

A reduce job gets stuck partway through the reduce phase
Go to the node where the stuck reduce task is running, look under logs/userlogs/ for the directory of the corresponding job and task, and read the syslog there.
You will see something like:
2011-10-21 14:52:07,483 WARN org.apache.hadoop.mapred.ReduceTask: attempt_201110210917_0234_r_000007_1 copy failed: attempt_201110210917_0234_m_000011_0 from sjz134.uniqlcik.com
2011-10-21 14:52:07,484 WARN org.apache.hadoop.mapred.ReduceTask: java.net.ConnectException: Connection timed out
        at java.net.PlainSocketImpl.socketConnect(Native Method)
        at java.net.PlainSocketImpl.doConnect(PlainSocketImpl.java:333)
        at java.net.PlainSocketImpl.connectToAddress(PlainSocketImpl.java:195)
        at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:182)
        at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:366)
        at java.net.Socket.connect(Socket.java:529)
        at sun.net.NetworkClient.doConnect(NetworkClient.java:158)
        at sun.net.www.http.HttpClient.openServer(HttpClient.java:394)
        at sun.net.www.http.HttpClient.openServer(HttpClient.java:529)
        at sun.net.www.http.HttpClient.<init>(HttpClient.java:233)
        at sun.net.www.http.HttpClient.New(HttpClient.java:306)
        at sun.net.www.http.HttpClient.New(HttpClient.java:323)
        at sun.net.www.protocol.http.HttpURLConnection.getNewHttpClient(HttpURLConnection.java:975)
        at sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:916)
        at sun.net.www.protocol.http.HttpURLConnection.connect(HttpURLConnection.java:841)
        at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.getInputStream(ReduceTask.java:1618)
        at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.setupSecureConnection(ReduceTask.java:1575)
Cause:
The TaskTracker running the reduce cannot copy the map output from the other TaskTrackers.
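To confirm this from the stuck reducer's node, you can attempt the same fetch by hand. The sketch below is an illustrative probe, not Hadoop code: it assumes the default TaskTracker HTTP port 50060 and MR1's mapOutput servlet URL shape, with the job and attempt IDs copied from the syslog above.

// Minimal connectivity probe, run from the node hosting the stuck reducer.
// Assumptions: default TaskTracker HTTP port 50060 and MR1's mapOutput
// servlet parameters; the IDs are taken from the log excerpt above.
import java.net.HttpURLConnection;
import java.net.URL;

public class ShuffleFetchProbe {
    public static void main(String[] args) throws Exception {
        URL url = new URL("http://sjz134.uniqlcik.com:50060/mapOutput"
                + "?job=job_201110210917_0234"
                + "&map=attempt_201110210917_0234_m_000011_0"
                + "&reduce=7");
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setConnectTimeout(5000);    // fail fast instead of hanging
        conn.setReadTimeout(10000);
        conn.connect();                  // throws ConnectException, as in the
                                         // trace, if the host is unreachable
        System.out.println("HTTP " + conn.getResponseCode());
        conn.disconnect();
    }
}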


The reduce-side shuffle fetches and copies the mappers' output records over the network via HTTP.
Each reduce task runs a background thread, GetMapCompletionEvents, which takes the list of completed map tasks delivered in heartbeats from the JobTracker and saves the locations of the map outputs belonging to this reduce task into mapLocations. After filtering and de-duplication (the same location may, for various reasons, be reported more than once), the entries are moved into the scheduledCopies collection, from which several copier threads (5 by default, mapred.reduce.parallel.copies) fetch the data over HTTP in parallel.
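For orientation, here is a minimal sketch of that producer/consumer flow. It is not the real org.apache.hadoop.mapred.ReduceTask source; the class and method names are made up.

// Illustrative sketch of the copy-scheduling flow described above.
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedList;
import java.util.List;
import java.util.Set;

class CopyScheduler {
    // Locations reported via JobTracker heartbeats; may contain duplicates.
    private final List<String> mapLocations = new ArrayList<String>();
    // Filtered, de-duplicated queue that the copier threads consume.
    private final LinkedList<String> scheduledCopies = new LinkedList<String>();
    private final Set<String> seen = new HashSet<String>();

    // Called by the GetMapCompletionEvents thread for each completed map.
    synchronized void reportLocation(String mapOutputUrl) {
        mapLocations.add(mapOutputUrl);
        if (seen.add(mapOutputUrl)) {      // drop duplicate reports
            scheduledCopies.addLast(mapOutputUrl);
            notifyAll();                   // wake a waiting copier thread
        }
    }

    // Each of the (default 5) copier threads loops on this, then fetches
    // the returned URL over HTTP.
    synchronized String takeNext() throws InterruptedException {
        while (scheduledCopies.isEmpty()) {
            wait();
        }
        return scheduledCopies.removeFirst();
    }
}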

Fetching over HTTP requires a hostname. A note on hostnames:
By default, the names written in the slaves file are used only when the master (NameNode) SSHes to the slaves to start their daemons; they are not used anywhere else. The heartbeats between TaskTracker and JobTracker, and between DataNode and NameNode, carry each node's own hostname rather than the names in the slaves file (a quick way to check the reported name is sketched after the quote below):
TaskTracker (DataNode) will send to the JobTracker (NameNode) status messages regularly, which contain its hostname. Consequently, when a Map or Reduce task obtains the addresses of the TaskTrackers (DataNodes) from the JobTracker (NameNode), e.g., for copying the Map output or reading an HDFS block, it will get the hostnames specified in the status messages and talk to the TaskTrackers (DataNodes) using those hostnames.
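To see which name a node will report, you can call the same helper the daemons use when slave.host.name is unset. A minimal check, assuming the Hadoop jars are on the classpath (org.apache.hadoop.net.DNS is the class used in the TaskTracker source shown further down):

// Prints the hostname this node would report in heartbeats when
// slave.host.name is not set. Requires the Hadoop jars on the classpath.
import org.apache.hadoop.net.DNS;

public class ReportedHostname {
    public static void main(String[] args) throws Exception {
        // "default" mirrors the mapred.tasktracker.dns.interface /
        // mapred.tasktracker.dns.nameserver defaults used by TaskTracker.
        System.out.println(DNS.getDefaultHost("default", "default"));
    }
}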

How do we make TaskTracker/JobTracker and DataNode/NameNode communication use the names written in the slaves file?

The answer is visible in the source, /usr/local/hadoop/src/mapred/org/apache/hadoop/mapred/TaskTracker.java:
localFs = FileSystem.getLocal(fConf);
// If slave.host.name is set in the configuration, use it verbatim...
if (fConf.get("slave.host.name") != null) {
  this.localHostname = fConf.get("slave.host.name");
}
// ...otherwise fall back to DNS resolution on the configured interface.
if (localHostname == null) {
  this.localHostname =
    DNS.getDefaultHost(
        fConf.get("mapred.tasktracker.dns.interface", "default"),
        fConf.get("mapred.tasktracker.dns.nameserver", "default"));
}
The Hadoop wiki FAQ explains the same lookup order:
When writing a new InputFormat, what is the format for the array of strings returned by InputSplit#getLocations()?
It appears that DatanodeID.getHost() is the standard place to retrieve this name, and the machineName variable, populated in DataNode.java#startDataNode, is where the name is first set. The first method attempted is to get "slave.host.name" from the configuration; if that is not available, DNS.getDefaultHost is used instead.
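In other words, a custom InputFormat should return those DataNode hostnames as split locations. A hypothetical illustration (the path and host below are placeholders):

// Hypothetical: the hosts array passed to a split holds DataNode hostnames
// (what DatanodeID.getHost() returns), not IP addresses.
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.FileSplit;

public class SplitExample {
    public static FileSplit makeSplit() {
        return new FileSplit(
                new Path("/data/input/part-00000"),   // placeholder path
                0L,                                   // start offset
                64L * 1024 * 1024,                    // length: one 64 MB block
                new String[] { "hdp-datanode145" });  // DataNode hostnames
    }
}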

So you can specify the hostname each node should use in that node's mapred-site.xml and hdfs-site.xml.
You can bypass all of Hadoop's efforts to automatically figure out the slave's host name by specifying the slave.host.name parameter in the configuration files. If that is set, Hadoop will just take your word for it and use the name you provide. 
<property>
  <name>slave.host.name</name>
  <value>hdp-datanode145</value>
</property>
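Note that this is a per-node setting, each node with its own name, and as the TaskTracker source above shows it is read at startup, so restart the TaskTracker/DataNode on that node before expecting heartbeats to carry the configured name.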

Also note that every server must still be able to resolve its own hostname: the daemons look it up at startup, as the logged startup message shows:
STARTUP_MSG:   host = xxx.xxx.com/172.18.6.143
If the name cannot be resolved, you get an error instead:
STARTUP_MSG:   host = java.net.UnknownHostException: 
You can test the lookup with Groovy:
groovy -e "print InetAddress.getLocalHost().getHostName()"

References:
http://blog.sina.com.cn/s/blog_61ef49250100uul8.html
http://blog.csdn.net/baggioss/article/details/5462593
http://western-skies.blogspot.com/2010/11/fix-for-exceeded-maxfaileduniquefetches.html (blocked in mainland China; requires a proxy)

