
Category: Big Data

2017-11-16 18:38:32

The machines that AWS EMR launches are all expensive, so I wanted to build a Hadoop cluster on three t2.micro instances myself. Since a t2.micro's 1 GB of memory is not enough to run Spark, I switched to t2.small instances with 2 GB.

0. Download the JDK without Oracle authentication (the cookie header accepts the license agreement):
wget --no-check-certificate -c --header "Cookie: oraclelicense=accept-securebackup-cookie" http://download.oracle.com/otn-pub/java/jdk/8u151-b12/e758a0de34e24606bca991d704f6dcbf/jdk-8u151-linux-x64.tar.gz

1. Choose an AMI:
bitnami-hadoop-2.8.2-0-linux-debian-8-x86_64-hvm-ebs - ami-00654a65

2. Configure security group ports (https://docs.bitnami.com/aws/apps/hadoop/)
Open at least: 22 (SSH), 80 (HTTP), 443 (HTTPS)

Each daemon in Hadoop listens on a different port. The most relevant ones are:

  • ResourceManager:
    • Service: 8032
    • Web UI: 8088
  • NameNode:
    • Metadata: 9000
    • Web UI: 50070
  • Secondary NameNode:
    • Metadata: 50090
  • DataNode:
    • Data transfer: 50010
    • Metadata: 50020
    • Web UI: 50075
  • Timeline Server:
    • Service: 10200
    • Web UI: 8188
  • Hive:
    • Hiveserver2 binary: 10000
    • Hiveserver2 HTTP: 10001
    • Metastore: 9083
    • WebHCat: 50111
    • Derby DB: 1527


3. Hadoop configuration (SSH into the instance first):
ssh -i jameson-keypair.pem bitnami@13.59.230.131
/opt/bitnami/hadoop/etc/hadoop/core-site.xml — fs.defaultFS: hdfs://localhost:9000 (on a multi-node cluster, point this at the master's hostname instead of localhost)
/opt/bitnami/hadoop/etc/hadoop/hdfs-site.xml — dfs.replication: 1
/opt/bitnami/hadoop/etc/hadoop/yarn-site.xml — yarn.resourcemanager.hostname NEEDS TO BE SET (to the master node's hostname)
/opt/bitnami/hadoop/etc/hadoop/mapred-site.xml
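As a sketch, the property entries for the files above might look like the following on a single-node setup (the `MASTER_HOSTNAME` placeholder is an assumption — substitute your ResourceManager host, and verify against the Bitnami defaults):

```xml
<!-- core-site.xml -->
<property>
  <name>fs.defaultFS</name>
  <!-- use the master's hostname instead of localhost on a multi-node cluster -->
  <value>hdfs://localhost:9000</value>
</property>

<!-- hdfs-site.xml -->
<property>
  <name>dfs.replication</name>
  <value>1</value>
</property>

<!-- yarn-site.xml -->
<property>
  <name>yarn.resourcemanager.hostname</name>
  <!-- placeholder: must be set to the master node's hostname -->
  <value>MASTER_HOSTNAME</value>
</property>
```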




Notes on Spark partitioning:

1. Typically you want 2-4 partitions for each CPU in your cluster. Normally, Spark tries to set the number of partitions automatically based on your cluster, but you can also set it manually by passing it as a second parameter to parallelize (e.g. sc.parallelize(data, 10)).
2. The textFile method also takes an optional second argument for controlling the number of partitions of the file. By default, Spark creates one partition for each block of the file (blocks being 128 MB by default in HDFS), but you can ask for a higher number of partitions by passing a larger value.
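Both points above can be sketched in plain Python without a Spark cluster. `slice_partitions` is a hypothetical helper that splits a collection into roughly equal slices, the way `sc.parallelize(data, 10)` would; the second snippet computes the default partition count `textFile` would use for a file, given the 128 MB HDFS block size:

```python
import math

def slice_partitions(data, num_slices):
    """Split data into num_slices roughly equal partitions
    (a sketch of how parallelize slices a collection)."""
    n = len(data)
    return [data[i * n // num_slices:(i + 1) * n // num_slices]
            for i in range(num_slices)]

# e.g. sc.parallelize(range(10), 4) yields 4 partitions:
parts = slice_partitions(list(range(10)), 4)
print(parts)  # [[0, 1], [2, 3, 4], [5, 6], [7, 8, 9]]

# Default partition count for textFile: one partition per HDFS block.
block_size = 128 * 1024 * 1024           # HDFS default block size (128 MB)
file_size = 1 * 1024 * 1024 * 1024       # e.g. a 1 GB file
default_partitions = math.ceil(file_size / block_size)
print(default_partitions)  # 8
```

Passing a larger second argument to textFile than this default asks Spark for more, smaller partitions; asking for fewer than the block count has no effect.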
