Apache Hadoop Installation (Single Node)

If you are new to Hadoop and curious about how to install it from scratch, this tutorial walks you through the setup process. By following these steps you can set up and run your own single-node Hadoop cluster.

Note:

Every line starting with $ is a Linux command
Every line starting with # is a comment or remark
Keep your internet connection open during installation

1. Updating Linux distribution and getting Java

$ sudo apt-get update
$ sudo apt-get install default-jdk
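
Step 5 below needs the name of the installed Java folder. As a minimal sketch, that folder can be derived from the resolved path of the java binary. This assumes the usual Debian/Ubuntu layout, where java resolves to <JAVA_HOME>/bin/java or <JAVA_HOME>/jre/bin/java; the helper name java_home_from is hypothetical.

```shell
# Derive a JAVA_HOME candidate from the resolved path of a java binary.
# java_home_from is a hypothetical helper name, not part of any JDK tooling.
java_home_from() {
    # Strip the trailing /bin/java, then an optional /jre (Java 8 layout).
    home=${1%/bin/java}
    home=${home%/jre}
    printf '%s\n' "$home"
}

# Usage:
#   java -version
#   java_home_from "$(readlink -f "$(command -v java)")"
```

If the printed path differs from /usr/lib/jvm/java-8-openjdk-amd64, use the printed path wherever this tutorial sets JAVA_HOME.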

2. Creating a new user and usergroup (best practice)

You can create the user and group with any names that suit you. Here, we have created a user named hduser and a group named hadoop.

$ sudo addgroup hadoop
$ sudo adduser --ingroup hadoop hduser
$ sudo adduser hduser sudo

3. Getting ssh server and public key

# install the OpenSSH server
$ sudo apt-get install openssh-server

$ su hduser
$ ssh-keygen -t rsa -P ""
$ cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys

# testing ssh installation
$ ssh localhost
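
If ssh localhost still asks for a password, one common cause is that sshd ignores keys when the .ssh directory or the authorized_keys file is writable by other users. The following is a small sketch that tightens those permissions; the helper name secure_ssh_dir is hypothetical.

```shell
# Tighten permissions on an SSH directory so that sshd accepts the key.
# secure_ssh_dir is a hypothetical helper name.
secure_ssh_dir() {
    dir=$1
    # Only the owner may read, write, or enter the directory.
    chmod 700 "$dir" || return 1
    # Only the owner may read or write the list of authorized keys.
    if [ -f "$dir/authorized_keys" ]; then
        chmod 600 "$dir/authorized_keys" || return 1
    fi
}

# Usage (run as hduser): secure_ssh_dir "$HOME/.ssh"
```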

4. Getting and installing Hadoop

$ wget http://www-eu.apache.org/dist/hadoop/common/stable/hadoop-2.7.3.tar.gz
$ tar xvzf hadoop-2.7.3.tar.gz
$ sudo mv hadoop-2.7.3 /usr/local/hadoop

# give hduser ownership of the Hadoop directory only, not all of /usr/local
$ sudo chown -R hduser:hadoop /usr/local/hadoop
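
It can be worth checking the downloaded archive before relying on it. The sketch below compares a file's SHA-256 digest with an expected value. It assumes that you have copied the expected digest by hand from the Apache download page and that sha256sum is installed; the helper name verify_sha256 is hypothetical.

```shell
# Compare a file's SHA-256 digest with an expected value.
# verify_sha256 is a hypothetical helper; the expected digest is something
# you copy by hand from the Apache download page (an assumption here).
verify_sha256() {
    file=$1
    # Normalize the expected digest: drop spaces and newlines, lowercase hex.
    expected=$(printf '%s' "$2" | tr -d ' \n' | tr 'A-F' 'a-f')
    actual=$(sha256sum "$file" | cut -d ' ' -f 1) || return 1
    [ "$actual" = "$expected" ]
}

# Usage: verify_sha256 hadoop-2.7.3.tar.gz "<digest from apache.org>" && echo OK
```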

5. Setting environment

Open the .bashrc file in gedit to add content. The file belongs to hduser, so sudo is not needed.

$ gedit ~/.bashrc

Now add the following content and save the file. If you are not using Java 8, replace java-8-openjdk-amd64 with the name of the Java folder that you have installed.

# Java Configuration
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64

# Hadoop Configurations

export HADOOP_HOME=/usr/local/hadoop
export PATH=$PATH:$HADOOP_HOME/bin
export PATH=$PATH:$HADOOP_HOME/sbin
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export HADOOP_YARN_HOME=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib/native"
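
The new variables only take effect in a new terminal or after reloading .bashrc. As a rough sketch, the check below reports whether JAVA_HOME and HADOOP_HOME are set and point at existing directories; the helper name check_hadoop_env is hypothetical.

```shell
# Report whether the variables from .bashrc point at real directories.
# check_hadoop_env is a hypothetical helper name.
check_hadoop_env() {
    status=0
    for var in JAVA_HOME HADOOP_HOME; do
        # Read the variable whose name is stored in $var.
        eval "value=\${$var:-}"
        if [ -z "$value" ]; then
            echo "$var is not set"
            status=1
        elif [ ! -d "$value" ]; then
            echo "$var points to a missing directory: $value"
            status=1
        else
            echo "$var=$value"
        fi
    done
    return $status
}

# Usage, after reloading the file with `. ~/.bashrc`:
#   check_hadoop_env && hadoop version
```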

6. Hadoop configuration

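(i) - Edit the file core-site.xml. The Hadoop configuration files are in /usr/local/hadoop/etc/hadoop, so change to that directory first and then open the file

$ cd /usr/local/hadoop/etc/hadoop
$ sudo gedit core-site.xml

and add the following content inside the <configuration> … </configuration> tags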

<property>
<name>fs.default.name</name>
<value>hdfs://localhost:9000</value>
</property>
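
For reference, this is how the complete file fits together. The sketch writes a whole core-site.xml with a heredoc instead of editing it by hand. The helper name write_core_site is hypothetical, and it overwrites any existing core-site.xml in the target directory, so treat it as an illustration rather than a required step.

```shell
# Write a complete core-site.xml into the given configuration directory.
# write_core_site is a hypothetical helper; it overwrites any existing file.
write_core_site() {
    conf_dir=$1
    cat > "$conf_dir/core-site.xml" <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
EOF
}

# Usage: write_core_site /usr/local/hadoop/etc/hadoop
```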

(ii) - Edit the file hdfs-site.xml

The hdfs-site.xml file specifies the namenode and datanode directories. Before modifying this file, create those directories. You can give the folder any name; in our case we named it hadoop_store.

$ sudo mkdir -p /usr/local/hadoop_store/hdfs/namenode
$ sudo mkdir -p /usr/local/hadoop_store/hdfs/datanode

# grant hduser access to the folder
$ sudo chown -R hduser /usr/local/hadoop_store

Then open the file by using the following command

$ sudo gedit hdfs-site.xml

and add the following content inside the <configuration> … </configuration> tags.

<!-- a single node has only one datanode, so keep one replica per block -->
<property>
    <name>dfs.replication</name>
    <value>1</value>
</property>
<property>
    <name>dfs.namenode.name.dir</name>
    <value>file:/usr/local/hadoop_store/hdfs/namenode</value>
</property>
<property>
    <name>dfs.datanode.data.dir</name>
    <value>file:/usr/local/hadoop_store/hdfs/datanode</value>
</property>

(iii) - Edit the file yarn-site.xml by using the following command

$ sudo gedit yarn-site.xml

and add the following content inside the <configuration> … </configuration> tags.

<property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
</property>
<property>
    <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>

(iv) - Edit the file mapred-site.xml. Hadoop ships only a template for this file, so copy the template first and then open the copy

$ cp mapred-site.xml.template mapred-site.xml
$ sudo gedit mapred-site.xml

and add the following content inside the <configuration> … </configuration> tags.

<property>
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>
</property>
<property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
</property>

(v) - Edit the file hadoop-env.sh (in /usr/local/hadoop/etc/hadoop), locate the line export JAVA_HOME=${JAVA_HOME}, and replace it with the correct Java home path. In our case it will be export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
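
The same change can be made without opening an editor. The sketch below rewrites the export JAVA_HOME line with sed. The helper name set_java_home is hypothetical, and the sketch assumes the Java path contains no | or & characters, since those have special meaning in the sed expression.

```shell
# Replace the JAVA_HOME export line in a hadoop-env.sh style file.
# set_java_home is a hypothetical helper name.
set_java_home() {
    file=$1
    java_home=$2
    tmp_file=$(mktemp) || return 1
    # Rewrite any line that starts with "export JAVA_HOME=".
    sed "s|^export JAVA_HOME=.*|export JAVA_HOME=$java_home|" "$file" > "$tmp_file" || return 1
    # Copy the result back in place so the file keeps its owner and mode.
    cat "$tmp_file" > "$file" && rm -f "$tmp_file"
}

# Usage:
#   set_java_home /usr/local/hadoop/etc/hadoop/hadoop-env.sh /usr/lib/jvm/java-8-openjdk-amd64
```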

7. Format the Hadoop file system

To format the file system, run the following command. This initializes the Hadoop file system.

$ hdfs namenode -format
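
Formatting is meant to be done once; formatting again discards the existing file system metadata. As a cautious sketch, the guard below skips the format when the namenode directory already contains current/VERSION, the file that a formatted namenode directory holds. The helper names namenode_is_formatted and format_once are hypothetical.

```shell
# Return success if the namenode directory already holds a formatted file system.
# A formatted namenode directory contains current/VERSION.
namenode_is_formatted() {
    [ -f "$1/current/VERSION" ]
}

# Format only when nothing is there yet.
format_once() {
    if namenode_is_formatted "$1"; then
        echo "already formatted, skipping"
    else
        hdfs namenode -format
    fi
}

# Usage: format_once /usr/local/hadoop_store/hdfs/namenode
```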

8. Running ‘Single Node’ cluster

$ start-dfs.sh
$ start-yarn.sh
$ jps
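
When everything started correctly, jps lists five Hadoop daemons: NameNode, DataNode, SecondaryNameNode, ResourceManager and NodeManager. The sketch below reads jps output and prints the name of any daemon that is missing; the helper name missing_daemons is hypothetical.

```shell
# Check that jps-style output lists all five daemons of a single-node cluster.
# missing_daemons is a hypothetical helper; it reads jps output on stdin and
# prints the name of each daemon that is not listed.
missing_daemons() {
    output=$(cat)
    for daemon in NameNode DataNode SecondaryNameNode ResourceManager NodeManager; do
        # jps prints "<pid> <name>"; compare the name with the whole second field
        # so that SecondaryNameNode is not mistaken for NameNode.
        if ! printf '%s\n' "$output" | awk -v d="$daemon" '$2 == d { found = 1 } END { exit !found }'; then
            echo "$daemon"
        fi
    done
}

# Usage: jps | missing_daemons
```

No output means all five daemons are running. If a name is printed, the log files under /usr/local/hadoop/logs are the first place to look.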

9. See the status on Web Interface

Go to http://localhost:8088 to see the YARN ResourceManager page for the cluster as a whole
Go to http://localhost:50070 to see the detailed HDFS status on the NameNode page
