Category: Big Data
2017-11-16 18:38:32
Each daemon in Hadoop listens on a different port. The most relevant Hadoop 2.x defaults are:
NameNode RPC: 8020 (often configured as 9000 via fs.defaultFS)
NameNode web UI: 50070
DataNode web UI: 50075
ResourceManager web UI: 8088
NodeManager web UI: 8042
JobHistoryServer web UI: 19888
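To double-check which daemon is actually bound to which port on the instance, one option (assuming a Linux box with net-tools installed) is:

sudo netstat -tlnp | grep java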
3. Hadoop configuration:
Log in to the instance over SSH, then edit the configuration files under /opt/bitnami/hadoop/etc/hadoop:
ssh -i jameson-keypair.pem bitnami@13.59.230.131
/opt/bitnami/hadoop/etc/hadoop/core-site.xml
/opt/bitnami/hadoop/etc/hadoop/hdfs-site.xml
/opt/bitnami/hadoop/etc/hadoop/yarn-site.xml
(yarn.resourcemanager.hostname must be set here; see the example after this list)
/opt/bitnami/hadoop/etc/hadoop/mapred-site.xml
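For reference, a minimal sketch of that property in yarn-site.xml; the value below simply reuses the instance address from the ssh command above and is only a placeholder for your actual ResourceManager host:

<property>
  <!-- placeholder value: point this at your ResourceManager host -->
  <name>yarn.resourcemanager.hostname</name>
  <value>13.59.230.131</value>
</property>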
1. Typically you want 2-4 partitions for each CPU in your cluster. Normally, Spark tries to set the number of partitions automatically based on your cluster, but you can also set it manually by passing it as a second parameter to parallelize (e.g. sc.parallelize(data, 10)); see the sketch after this list.
2. The textFile method also takes an optional second argument for controlling the number of partitions of the file. By default, Spark creates one partition for each block of the file (blocks being 128MB by default in HDFS), but you can also ask for a higher number of partitions by passing a larger value. Note that you cannot have fewer partitions than blocks.
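A minimal PySpark sketch of both options; the local master, the toy data, and the HDFS path are assumptions for illustration, not taken from the post:

from pyspark import SparkContext

# Assumed local setup for illustration; on the cluster you would run against YARN instead.
sc = SparkContext("local[4]", "partition-demo")

# 1. Explicit partition count for an in-memory collection (rule of thumb: 2-4 per CPU).
rdd = sc.parallelize(range(100), 10)
print(rdd.getNumPartitions())    # 10

# 2. Minimum partition count when reading a file; the path is hypothetical.
lines = sc.textFile("hdfs:///user/bitnami/input.txt", 8)
print(lines.getNumPartitions())  # at least 8, one per HDFS block at minimum

sc.stop()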