Monday, May 15, 2017

[Spark] Installing a Spark environment on top of Hadoop

This post records how to install a Spark environment on top of Hadoop, following on from the previous post. To run Spark on an Ubuntu machine, Java must be installed first. The following commands install Java and then locate the javac binary, which shows where the JDK lives.

$ sudo apt-get install openjdk-7-jre openjdk-7-jdk
$ dpkg -L openjdk-7-jdk | grep '/bin/javac'
/usr/lib/jvm/java-7-openjdk-amd64/bin/javac

So, we can set up the JAVA_HOME environment variable as follows (editing /etc/profile requires root):
$ sudo vim /etc/profile
  append this ==> export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64
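
The same two steps can also be scripted. The following is only a minimal sketch and not part of the original setup: the helper names java_home_from_javac and append_once are hypothetical, and the demonstration writes to a scratch file rather than /etc/profile so that it can be tried safely.

```shell
# Hypothetical helper: derive JAVA_HOME by stripping the trailing /bin/javac
# from the javac path reported by dpkg.
java_home_from_javac() {
  printf '%s\n' "${1%/bin/javac}"
}

# Hypothetical helper: append a line to a file only if that exact line is
# not already present, so repeated runs do not duplicate the export.
append_once() {
  grep -qxF -e "$2" "$1" 2>/dev/null || printf '%s\n' "$2" >> "$1"
}

# Demonstration on a scratch file (assumption: for real use the target
# would be /etc/profile, written as root).
profile_file=$(mktemp)
java_home=$(java_home_from_javac /usr/lib/jvm/java-7-openjdk-amd64/bin/javac)
append_once "$profile_file" "export JAVA_HOME=$java_home"
append_once "$profile_file" "export JAVA_HOME=$java_home"   # second call adds nothing
cat "$profile_file"
```

After the real /etc/profile has been changed, the new value only takes effect in a new login shell or after running "source /etc/profile".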

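Next, extract the Spark package (the build without bundled Hadoop) into /usr/local, rename it, and give the hadoop user ownership:
$ sudo tar -zxf ~/Downloads/spark-1.6.0-bin-without-hadoop.tgz -C /usr/local/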
$ cd /usr/local
$ sudo mv ./spark-1.6.0-bin-without-hadoop/ ./spark
$ sudo chown -R hadoop:hadoop ./spark

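Scala can be installed from the Ubuntu repositories. A prebuilt Spark package can also be downloaded, extracted, and started with spark-shell:
$ sudo apt-get update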
$ sudo apt-get install scala
$ wget
$ tar xvf spark-1.6.0-bin-hadoop2.6.tgz
$ cd spark-1.6.0-bin-hadoop2.6/bin
$ ./spark-shell

Finally, tell the Hadoop-free Spark build where to find the Hadoop jars through its configuration:
$ cd /usr/local/spark
$ cp ./conf/ ./conf/
$ vim ./conf/
  append this ==> export SPARK_DIST_CLASSPATH=$(/usr/local/hadoop/bin/hadoop classpath)
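
Because this export runs the hadoop launcher every time it is evaluated, it can help to check that the launcher exists first. The following is a minimal sketch under that assumption: the function name set_spark_dist_classpath is hypothetical, and the default path is the /usr/local/hadoop location used above.

```shell
# Hypothetical helper: export SPARK_DIST_CLASSPATH from `hadoop classpath`,
# but only if the hadoop launcher exists and is executable.
set_spark_dist_classpath() {
  hadoop_bin="${1:-/usr/local/hadoop/bin/hadoop}"
  if [ -x "$hadoop_bin" ]; then
    SPARK_DIST_CLASSPATH=$("$hadoop_bin" classpath)
    export SPARK_DIST_CLASSPATH
  else
    echo "hadoop launcher not found at $hadoop_bin" >&2
    return 1
  fi
}

# Try the default location and report if nothing was set.
set_spark_dist_classpath || echo "SPARK_DIST_CLASSPATH was not set"
```

If Hadoop is installed somewhere else, the launcher path can be passed as the first argument.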
