Category: HADOOP

2014-12-03 17:07:44

Installing Spark on Linux

Environment:

OS: Red Hat Linux AS 5

spark-1.0.2

scala-2.10.2

Installing Spark requires a Scala environment, so we install Scala first.


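1. Install Scala

1.4 Download the installation package

Download from: http://www.scala-lang.org/download/all.html

Choose the version that fits your setup; I downloaded scala-2.10.2.tgz.

1.5 Extract and install

Log in as the hadoop1 user.

Copy the installation file to the /usr1 directory

[hadoop1@node1 sacala]$ cp scala-2.10.2.tgz /usr1/

Extract

[hadoop1@node1 usr1]$ tar -zxvf scala-2.10.2.tgz

Rename the directory

[hadoop1@node1 usr1]$ mv scala-2.10.2 scala

Give the hadoop1 user ownership of the scala directory

[root@node1 usr1]# chown -R hadoop1:hadoop1 ./scala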
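The extract and rename steps above follow the same pattern that the Spark install uses later, so they can be wrapped in one helper. The following is a minimal sketch, not part of the original post: the function name install_tarball is hypothetical, and it assumes the archive unpacks into a single top-level directory, as scala-2.10.2.tgz does here.

```shell
#!/bin/sh
# install_tarball ARCHIVE PARENT_DIR NEW_NAME  (hypothetical helper)
# Extracts ARCHIVE into PARENT_DIR and renames its single top-level
# directory to NEW_NAME, mirroring the manual tar + mv steps.
install_tarball() {
    archive=$1
    parent=$2
    name=$3

    [ -f "$archive" ] || { echo "archive not found: $archive" >&2; return 1; }
    [ -d "$parent" ] || { echo "target directory not found: $parent" >&2; return 1; }
    [ ! -e "$parent/$name" ] || { echo "already exists: $parent/$name" >&2; return 1; }

    # List the archive and reduce every entry to its top-level component.
    top=$(tar -tzf "$archive" | sed -e 's|^\./||' -e 's|/.*$||' -e '/^$/d' | sort -u)

    # Refuse archives that do not unpack into exactly one directory.
    count=$(printf '%s\n' "$top" | grep -c .)
    [ "$count" -eq 1 ] || {
        echo "expected one top-level directory in $archive" >&2
        return 1
    }

    tar -zxf "$archive" -C "$parent" || return 1
    if [ "$top" != "$name" ]; then
        mv "$parent/$top" "$parent/$name" || return 1
    fi
}
```

Under these assumptions, install_tarball /usr1/scala-2.10.2.tgz /usr1 scala would correspond to the tar and mv commands above. The chown step still has to be run as root.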

 

 

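1.6 Add environment variables

export SCALA_HOME=/usr1/scala

Add the line above to .bash_profile and append $SCALA_HOME/bin to PATH. The file after the change is shown below (the changes were highlighted in red in the original post).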

[hadoop1@node1 ~]$ vi .bash_profile

 

# .bash_profile

 

# Get the aliases and functions

if [ -f ~/.bashrc ]; then

        . ~/.bashrc

fi

 

# User specific environment and startup programs

 

export JAVA_HOME=/usr/java/jdk1.8.0_05

export JRE_HOME=/usr/java/jdk1.8.0_05/jre

export HADOOP_HOME=/usr1/hadoop

HIVE_HOME=/usr1/hive

ZOOKEEPER_HOME=/usr1/zookeeper

export SCALA_HOME=/usr1/scala

export SQOOP_HOME=/usr1/sqoop

export HBASE_HOME=/usr1/hbase

export CLASSPATH=.:$CLASSPATH:$JAVA_HOME/lib:$JRE_HOME/lib:$HADOOP_HOME/lib:$HBASE_HOME/lib

export PATH=$HADOOP_HOME/bin:$HIVE_HOME/bin:$ZOOKEEPER_HOME/bin:$HBASE_HOME/bin:$JAVA_HOME/bin:$JRE_HOME/bin:$SQOOP_HOME/bin:$SCALA_HOME/bin:$PATH

 

 

PATH=$PATH:$HOME/bin

 

export PATH
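Editing .bash_profile by hand works, but repeating the install can leave duplicate lines behind. As a hedged alternative, the sketch below appends an export line only when the variable is not exported yet. The helper name add_export is hypothetical and not from the original post. Note that a shell assignment must not have spaces around the = sign.

```shell
#!/bin/sh
# add_export FILE NAME VALUE  (hypothetical helper)
# Appends "export NAME=VALUE" to FILE unless FILE already exports NAME.
add_export() {
    file=$1
    name=$2
    value=$3

    # -s: stay quiet if FILE does not exist yet; the append below creates it.
    if grep -qs "^export ${name}=" "$file"; then
        return 0
    fi
    printf 'export %s=%s\n' "$name" "$value" >> "$file"
}
```

For example, add_export ~/.bash_profile SCALA_HOME /usr1/scala adds the Scala line once. The PATH entry still has to be edited by hand, and the profile must be reloaded with . ~/.bash_profile (or a new login) before the change takes effect.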


1.7 Verify

[hadoop1@node1 ~]$ scala -version

Scala code runner version 2.10.2 -- Copyright 2002-2013, LAMP/EPFL
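If the shell reports that scala cannot be found instead, the usual causes are a profile that has not been reloaded or a PATH that lacks $SCALA_HOME/bin. The small sketch below (the helper name require_cmd is an assumption, not part of the original post) shows where each command resolves on PATH.

```shell
#!/bin/sh
# require_cmd NAME...  (hypothetical helper)
# Prints where each command resolves on PATH and returns non-zero
# if any of them is missing.
require_cmd() {
    status=0
    for cmd in "$@"; do
        if path=$(command -v "$cmd"); then
            echo "$cmd -> $path"
        else
            echo "$cmd not found on PATH" >&2
            status=1
        fi
    done
    return $status
}
```

Running require_cmd java scala on this setup should list a path under /usr/java/jdk1.8.0_05/bin and one under /usr1/scala/bin if the profile above is in effect.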

 

2. Install Spark

2.1 Download the installation package

Download from: http://archive.apache.org/dist/spark/

Choose the version that fits your setup; I downloaded spark-1.0.2-bin-hadoop1.tgz.

2.2 Extract and install

Log in as the hadoop1 user.

Copy the installation file to the /usr1 directory

[hadoop1@node1 spark]$ cp spark-1.0.2-bin-hadoop1.tgz /usr1/

Extract

[hadoop1@node1 usr1]$ tar -zxvf spark-1.0.2-bin-hadoop1.tgz

Rename the directory

[hadoop1@node1 usr1]$ mv spark-1.0.2-bin-hadoop1 spark

Give the hadoop1 user ownership of the spark directory

[root@node1 usr1]# chown -R hadoop1:hadoop1 ./spark

 

2.3 Add environment variables

Add SPARK_HOME to .bash_profile and append $SPARK_HOME/bin to PATH:

[hadoop1@node1 ~]$ vi .bash_profile

 

# .bash_profile

 

# Get the aliases and functions

if [ -f ~/.bashrc ]; then

        . ~/.bashrc

fi

 

# User specific environment and startup programs

 

export JAVA_HOME=/usr/java/jdk1.8.0_05

export JRE_HOME=/usr/java/jdk1.8.0_05/jre

export HADOOP_HOME=/usr1/hadoop

HIVE_HOME=/usr1/hive

ZOOKEEPER_HOME=/usr1/zookeeper

export SPARK_HOME=/usr1/spark

export SCALA_HOME=/usr1/scala

export SQOOP_HOME=/usr1/sqoop

export HBASE_HOME=/usr1/hbase

export CLASSPATH=.:$CLASSPATH:$JAVA_HOME/lib:$JRE_HOME/lib:$HADOOP_HOME/lib:$HBASE_HOME/lib

export PATH=$HADOOP_HOME/bin:$HIVE_HOME/bin:$ZOOKEEPER_HOME/bin:$HBASE_HOME/bin:$JAVA_HOME/bin:$JRE_HOME/bin:$SQOOP_HOME/bin:$SCALA_HOME/bin:$SPARK_HOME/bin:$PATH

 

 

PATH=$PATH:$HOME/bin

 

export PATH

 

2.4 Edit the slaves file

Go to the conf directory

cd $SPARK_HOME/conf

vi slaves

Add the data nodes below, one per line

192.168.56.102

192.168.56.103

192.168.56.104

 
List the IP address of each data node here; Spark starts a Worker on each of them.
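A typo in the slaves file only shows up when the cluster is started, so it can help to generate the file from a checked list. This is a minimal sketch under stated assumptions: the helper name write_slaves is hypothetical, and it only accepts dotted IPv4 addresses like the ones used in this post (host names are not handled).

```shell
#!/bin/sh
# write_slaves FILE IP...  (hypothetical helper)
# Writes one worker address per line to FILE, rejecting anything that
# does not look like a dotted-quad IPv4 address.
write_slaves() {
    file=$1
    shift
    [ "$#" -gt 0 ] || { echo "no worker addresses given" >&2; return 1; }

    # Validate every address before touching FILE.
    for ip in "$@"; do
        if ! printf '%s\n' "$ip" | grep -Eq '^([0-9]{1,3}\.){3}[0-9]{1,3}$'; then
            echo "not an IPv4 address: $ip" >&2
            return 1
        fi
    done

    printf '%s\n' "$@" > "$file"
}
```

For this cluster the call would be write_slaves $SPARK_HOME/conf/slaves 192.168.56.102 192.168.56.103 192.168.56.104. The pattern checks only the shape of the address, not that each number is below 256.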

2.5 Configure spark-env.sh

Go to the conf directory

cd $SPARK_HOME/conf

Create the file from the template

[hadoop1@node1 conf]$ cp spark-env.sh.template spark-env.sh

Edit spark-env.sh

Add the following lines

export JAVA_HOME=/usr/java/jdk1.8.0_05

export HADOOP_HOME=/usr1/hadoop

export SCALA_HOME=/usr1/scala

export SPARK_MASTER_IP=192.168.56.101

 

192.168.56.101 is the IP address of the name node, which also runs the Spark Master.
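Before copying the installation to the other machines, it is worth confirming that spark-env.sh really sets all four variables. The sketch below sources the file in a subshell and reports anything left empty. The helper name check_spark_env is hypothetical, and the check only covers the four variables added in this step.

```shell
#!/bin/sh
# check_spark_env FILE  (hypothetical helper)
# Sources FILE in a subshell and reports any of the four variables that
# ends up unset or empty. The calling shell's environment is untouched.
check_spark_env() {
    (
        unset JAVA_HOME HADOOP_HOME SCALA_HOME SPARK_MASTER_IP
        . "$1" || exit 1
        missing=0
        for var in JAVA_HOME HADOOP_HOME SCALA_HOME SPARK_MASTER_IP; do
            # Indirect lookup of the variable named by $var.
            eval "value=\${$var:-}"
            if [ -z "$value" ]; then
                echo "$var is not set in $1" >&2
                missing=1
            fi
        done
        exit $missing
    )
}
```

A call such as check_spark_env $SPARK_HOME/conf/spark-env.sh returns 0 when everything is set. Pass a path that contains a slash, because the . builtin searches PATH for bare file names.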

 

2.6 Package and copy to the other machines

[hadoop1@node1 usr1]$ tar -cvf spark.tar ./spark

 

Copy the archive to the other machines

scp spark.tar hadoop1@192.168.56.102:/home/hadoop1

scp spark.tar hadoop1@192.168.56.103:/home/hadoop1

scp spark.tar hadoop1@192.168.56.104:/home/hadoop1

 

 

On each data node, extract the archive into /usr1 and change the owner of the directory

[root@node2 usr1]# tar -xvf /home/hadoop1/spark.tar

[root@node2 usr1]# chown -R hadoop1:hadoop1 ./spark
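With more than a few workers, the copy step is easier to drive from the slaves file than to type once per machine. The following is a sketch only: push_to_workers, DRY_RUN, and REMOTE_USER are hypothetical names introduced here, and the extraction and chown on each node still have to be done as root as shown above.

```shell
#!/bin/sh
# push_to_workers SLAVES_FILE ARCHIVE REMOTE_DIR  (hypothetical helper)
# Copies ARCHIVE to REMOTE_DIR on every host listed in SLAVES_FILE.
# With DRY_RUN=1 the scp commands are printed instead of executed.
push_to_workers() {
    slaves=$1
    archive=$2
    remote_dir=$3
    user=${REMOTE_USER:-hadoop1}

    # Skip blank lines and comment lines in the slaves file.
    grep -v -e '^[[:space:]]*$' -e '^[[:space:]]*#' "$slaves" |
    while read -r host; do
        if [ "${DRY_RUN:-0}" = "1" ]; then
            echo "scp $archive $user@$host:$remote_dir"
        else
            # </dev/null keeps scp from consuming the host list on stdin.
            scp "$archive" "$user@$host:$remote_dir" </dev/null ||
                echo "copy to $host failed" >&2
        fi
    done
}
```

Setting DRY_RUN=1 and calling push_to_workers $SPARK_HOME/conf/slaves spark.tar /home/hadoop1 prints one scp command per worker, which makes it easy to review the target list before copying anything.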

 

 

2.7 Start Spark

Run the following on the master node

[hadoop1@node1 usr1]$ cd $SPARK_HOME/sbin

[hadoop1@node1 sbin]$ ./start-all.sh

 

2.8 Verify


2.8.1     Check the processes

[hadoop1@node1 sbin]$ jps

15026 Master

9668 JobTracker

9433 NameNode

9595 SecondaryNameNode

15135 Jps

A Master process now appears on the name node.

[hadoop1@node2 ~]$ jps
5152 DataNode
5236 TaskTracker
24184 Jps
24125 Worker

A Worker process now appears on the data node, which shows that Spark started successfully.
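Reading jps output by eye is fine for four machines, but the same check can be scripted. This is a minimal sketch with a hypothetical helper name, has_jvm_process. It reads jps-style output from standard input and matches the main class name exactly, so NameNode does not match SecondaryNameNode.

```shell
#!/bin/sh
# has_jvm_process NAME  (hypothetical helper)
# Reads jps-style output ("<pid> <main class>") on stdin and succeeds
# only if some line names exactly NAME as the main class.
has_jvm_process() {
    awk -v name="$1" '$2 == name { found = 1 } END { exit !found }'
}
```

On the master this could be used as jps | has_jvm_process Master && echo "Master is running", and on a data node with Worker in place of Master.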

2.8.2     Run the example program

 

cd $SPARK_HOME/bin

 

[hadoop1@node1 bin]$ ./run-example SparkPi

 

[earlier log output omitted] ...amp 1417597140066

14/12/03 16:59:00 INFO util.Utils: Fetching http://192.168.56.101:13361/jars/spark-examples-1.0.2-hadoop1.0.4.jar to /tmp/fetchFileTemp7599337252435878324.tmp

14/12/03 16:59:01 INFO executor.Executor: Adding file:/tmp/spark-5ce4253d-148a-48fb-a3f4-741778cc4a0b/spark-examples-1.0.2-hadoop1.0.4.jar to class loader

14/12/03 16:59:01 INFO executor.Executor: Serialized size of result for 0 is 675

14/12/03 16:59:01 INFO executor.Executor: Sending result for 0 directly to driver

14/12/03 16:59:01 INFO scheduler.TaskSetManager: Starting task 0.0:1 as TID 1 on executor localhost: localhost (PROCESS_LOCAL)

14/12/03 16:59:01 INFO scheduler.TaskSetManager: Serialized task 0.0:1 as 1411 bytes in 1 ms

14/12/03 16:59:01 INFO scheduler.DAGScheduler: Completed ResultTask(0, 0)

14/12/03 16:59:01 INFO scheduler.TaskSetManager: Finished TID 0 in 701 ms on localhost (progress: 1/2)

14/12/03 16:59:01 INFO executor.Executor: Running task ID 1

14/12/03 16:59:01 INFO executor.Executor: Serialized size of result for 1 is 675

14/12/03 16:59:01 INFO executor.Executor: Sending result for 1 directly to driver

14/12/03 16:59:01 INFO executor.Executor: Finished task ID 1

14/12/03 16:59:01 INFO scheduler.DAGScheduler: Completed ResultTask(0, 1)

14/12/03 16:59:01 INFO scheduler.DAGScheduler: Stage 0 (reduce at SparkPi.scala:35) finished in 0.772 s

14/12/03 16:59:01 INFO scheduler.TaskSetManager: Finished TID 1 in 63 ms on localhost (progress: 2/2)

14/12/03 16:59:01 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool

14/12/03 16:59:01 INFO executor.Executor: Finished task ID 0

14/12/03 16:59:01 INFO spark.SparkContext: Job finished: reduce at SparkPi.scala:35, took 0.99398 s

Pi is roughly 3.14364
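The SparkPi example estimates pi by Monte Carlo sampling, which is why the printed value is only approximate and differs slightly between runs. The sketch below shows the same idea on a single machine with awk. It illustrates the technique and is not Spark's code; the helper name estimate_pi and the default seed are assumptions.

```shell
#!/bin/sh
# estimate_pi SAMPLES [SEED]  (hypothetical helper)
# Draws SAMPLES random points in the square [-1, 1] x [-1, 1] and prints
# 4 * (fraction that falls inside the unit circle), an estimate of pi.
estimate_pi() {
    awk -v n="$1" -v seed="${2:-42}" 'BEGIN {
        if (n < 1) exit 1
        srand(seed)
        inside = 0
        for (i = 0; i < n; i++) {
            x = 2 * rand() - 1
            y = 2 * rand() - 1
            if (x * x + y * y < 1) inside++
        }
        printf "%.5f\n", 4 * inside / n
    }'
}
```

Calling estimate_pi 200000 prints a value close to 3.14. More samples narrow the error, which is the work Spark splits into tasks in the log above.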

 

2.8.3   Count the characters in a file

[hadoop1@node1 bin]$ cd $SPARK_HOME/bin

[hadoop1@node1 bin]$ ./spark-shell

scala> val distFile = sc.textFile("hdfs://192.168.56.101:9000/user/hadoop1/input/file1.txt")

scala> distFile.map(_.size).reduce(_+_)
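The expression maps each line to its length and then sums the lengths, so newline characters are not counted. To cross-check the number outside Spark, the same sum can be computed with awk. This is a sketch with a hypothetical helper name, count_chars. For non-ASCII text the two counts may differ, because awk and the JVM do not necessarily measure string length in the same units.

```shell
#!/bin/sh
# count_chars [FILE...]  (hypothetical helper)
# Sums the length of every input line, not counting the newlines.
# Reads standard input when no FILE is given.
count_chars() {
    awk '{ total += length($0) } END { print total + 0 }' "$@"
}
```

Assuming the file is readable through the hadoop client, hadoop fs -cat /user/hadoop1/input/file1.txt | count_chars should print the same total as the spark-shell session for plain ASCII input.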


2.8.4   Open the web UI

Enter this address in the browser: http://192.168.56.101:8080/

 

 
-- The End --

 
