
Monday, May 15, 2017

[Hadoop] To build a Hadoop environment (a single node cluster)

To study Hadoop, I need to build a test environment. I found the following resource links, which are good enough for building a single-node Hadoop MapReduce cluster, and I add some comments below about additional changes in my environment, for my own reference.

http://www.thebigdata.cn/Hadoop/15184.html
http://www.powerxing.com/install-hadoop/

Log in as the user "hadoop"

$ sudo su - hadoop

Go to the Hadoop installation directory

$ cd /usr/local/hadoop

Add the variables in ~/.bashrc

export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64
export HADOOP_HOME=/usr/local/hadoop
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export HADOOP_INSTALL=/usr/local/hadoop
export PATH=$PATH:$HADOOP_INSTALL/bin
export PATH=$PATH:$HADOOP_INSTALL/sbin
export HADOOP_MAPRED_HOME=$HADOOP_INSTALL
export HADOOP_COMMON_HOME=$HADOOP_INSTALL
export HADOOP_HDFS_HOME=$HADOOP_INSTALL
export YARN_HOME=$HADOOP_INSTALL
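
After editing ~/.bashrc, reload it so the new variables take effect in the current shell; a quick sanity check of one of them:

$ source ~/.bashrc
$ echo $HADOOP_HOME
/usr/local/hadoop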

Modify $JAVA_HOME in etc/hadoop/hadoop-env.sh

export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64
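
The linked guides also walk through the pseudo-distributed configuration, which must be in place before dfs can start. As a reminder, a minimal etc/hadoop/core-site.xml and etc/hadoop/hdfs-site.xml look roughly like this (port 9000 and the tmp directory follow the guides and may differ on your machine), and HDFS has to be formatted once before the first start:

core-site.xml:
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>file:/usr/local/hadoop/tmp</value>
  </property>
</configuration>

hdfs-site.xml:
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>

$ bin/hdfs namenode -format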

Start dfs and yarn

$ sbin/start-dfs.sh
$ sbin/start-yarn.sh
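
If the daemons came up correctly, jps should list them all; on a single-node setup the output typically contains these entries (each prefixed by its process ID):

$ jps
NameNode
DataNode
SecondaryNameNode
ResourceManager
NodeManager
Jps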

Finally, we can try the Hadoop MapReduce example as follows:
$ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.0.jar grep input output 'dfs[a-z.]+'
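
Note that the grep job reads "input" from HDFS, so that directory has to exist there first, and "output" must not exist yet. Following the linked guide, a typical way to prepare the input and read the result is:

$ bin/hdfs dfs -mkdir -p /user/hadoop/input
$ bin/hdfs dfs -put etc/hadoop/*.xml input
$ bin/hdfs dfs -cat output/*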

[Spark] To install Spark environment based on Hadoop

This post records how to install a Spark environment on top of the Hadoop setup from the previous post. To run Spark on an Ubuntu machine, Java has to be installed first, which is easy to do with the following commands. The dpkg query below prints the location of javac, which tells us where the JDK is installed.

$ sudo apt-get install openjdk-7-jre openjdk-7-jdk
$ dpkg -L openjdk-7-jdk | grep '/bin/javac'
/usr/lib/jvm/java-7-openjdk-amd64/bin/javac    (output, not a command: the JDK lives under /usr/lib/jvm/java-7-openjdk-amd64)

So, we can set up the JAVA_HOME environment variable as follows:
$ vim /etc/profile
  append this ==> export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64
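
Reload the profile and confirm that Java is visible before moving on:

$ source /etc/profile
$ echo $JAVA_HOME
$ java -version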

$ sudo tar -zxf ~/Downloads/spark-1.6.0-bin-without-hadoop.tgz -C /usr/local/
$ cd /usr/local
$ sudo mv ./spark-1.6.0-bin-without-hadoop/ ./spark
$ sudo chown -R hadoop:hadoop ./spark

$ sudo apt-get update
$ sudo apt-get install scala
$ wget http://apache.stu.edu.tw/spark/spark-1.6.0/spark-1.6.0-bin-hadoop2.6.tgz
$ tar xvf spark-1.6.0-bin-hadoop2.6.tgz
$ cd spark-1.6.0-bin-hadoop2.6/bin
$ ./spark-shell

$ cd /usr/local/spark
$ cp ./conf/spark-env.sh.template ./conf/spark-env.sh
$ vim ./conf/spark-env.sh
  append this ==> export SPARK_DIST_CLASSPATH=$(/usr/local/hadoop/bin/hadoop classpath)
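
With SPARK_DIST_CLASSPATH pointing at the local Hadoop installation, the "without-hadoop" build can find the Hadoop jars. A quick sanity check is to run the bundled SparkPi example (the expected final log line is shown as a rough guide only):

$ cd /usr/local/spark
$ ./bin/run-example SparkPi 10
...
Pi is roughly 3.14...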

Thursday, August 11, 2016

[Hadoop] Setting up a Single Node Cluster

Basically, these resource links are good enough for setting up a single-node Hadoop MapReduce cluster, but I still want to add some comments for my own reference.
http://www.thebigdata.cn/Hadoop/15184.html
http://www.powerxing.com/install-hadoop/

Log in as the user "hadoop"
# sudo su - hadoop

Go to the Hadoop installation directory
# cd /usr/local/hadoop

Add the variables in ~/.bashrc
export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64
export HADOOP_HOME=/usr/local/hadoop
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export HADOOP_INSTALL=/usr/local/hadoop
export PATH=$PATH:$HADOOP_INSTALL/bin
export PATH=$PATH:$HADOOP_INSTALL/sbin
export HADOOP_MAPRED_HOME=$HADOOP_INSTALL
export HADOOP_COMMON_HOME=$HADOOP_INSTALL
export HADOOP_HDFS_HOME=$HADOOP_INSTALL
export YARN_HOME=$HADOOP_INSTALL

Modify $JAVA_HOME in etc/hadoop/hadoop-env.sh
export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64

Start dfs and yarn
# sbin/start-dfs.sh
# sbin/start-yarn.sh
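
Besides checking the daemons with jps, the web UIs are a handy way to confirm everything is up (default ports for Hadoop 2.x; adjust if yours differ):

# curl -s http://localhost:50070    (NameNode web UI)
# curl -s http://localhost:8088     (ResourceManager web UI)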

Finally, we can try the Hadoop MapReduce example as follows:
# bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.0.jar grep input output 'dfs[a-z.]+'

P.S.:
To force the NameNode to leave safe mode, execute the following command:
# hdfs dfsadmin -safemode leave
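
To only check the current safe mode state instead of forcing it off, dfsadmin also accepts "get":

# hdfs dfsadmin -safemode get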