[Spark] To install Spark environment based on Hadoop

This document is to record how to install Spark environment based on Hadoop as the previous one. For running Spark in Ubuntu machine, it should install Java first. Using the following command is easily to install Java in Ubuntu machine.

$ sudo apt-get install openjdk-7-jre openjdk-7-jdk
$ dpkg -L openjdk-7-jdk | grep '/bin/javac'
$ /usr/lib/jvm/java-7-openjdk-amd64/bin/javac

So, we can setup the JAVA_HOME environment variable as follows:
$ vim /etc/profile
  append this ==> export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64

$ sudo tar -zxf ~/Downloads/spark-1.6.0-bin-without-hadoop.tgz -C /usr/local/
$ cd /usr/local
$ sudo mv ./spark-1.6.0-bin-without-hadoop/ ./spark
$ sudo chown -R hadoop:hadoop ./spark

$ sudo apt-get update
$ sudo apt-get install scala
$ wget http://apache.stu.edu.tw/spark/spark-1.6.0/spark-1.6.0-bin-hadoop2.6.tgz
$ tar xvf spark-1.6.0-bin-hadoop2.6.tgz
$ cd /spark-1.6.0-bin-hadoop2.6/bin
$ ./spark-shell

$ cd /usr/local/spark
$ cp ./conf/spark-env.sh.template ./conf/spark-env.sh
$ vim ./conf/spark-env.sh
  append this ==> export SPARK_DIST_CLASSPATH=$(/usr/local/hadoop/bin/hadoop classpath)


Popular posts from this blog

[Open vSwitch] How to get port statistics from interface in OVS

[Quagga] How to compile and install Quagga on Ubuntu 12.04

[JSON] How to use jansson lib to generate JSON data in C