Hadoop Installation
Install Java: Hadoop requires Java to run. Install OpenJDK 8 with:
sudo apt update
sudo apt install openjdk-8-jdk -y
Verify the installation:
java -version
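The version check can also be scripted. The following is a minimal sketch, assuming bash; the function names java_major_version and installed_java_version are hypothetical helpers, not part of Java or Hadoop. It reduces a version string such as 1.8.0_392 to its major number so a setup script can confirm that Java 8 is the active version.

```shell
# Hypothetical helper: reduce a Java version string to its major number.
# "1.8.0_392" uses the legacy 1.x scheme and means Java 8; "11.0.21" means Java 11.
java_major_version() {
  local v="$1"
  case "$v" in
    1.*) v="${v#1.}" ;;    # legacy scheme: drop the leading "1."
  esac
  echo "${v%%[._-]*}"       # keep everything before the first '.', '_' or '-'
}

# Hypothetical helper: pull the quoted version out of `java -version`,
# which prints to stderr rather than stdout.
installed_java_version() {
  java -version 2>&1 | sed -n 's/.*version "\([^"]*\)".*/\1/p' | head -n 1
}
```

For example, java_major_version "$(installed_java_version)" should print 8 on the setup described here.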
Configure SSH: Hadoop uses SSH for communication between nodes. Set up passwordless SSH:
sudo apt install openssh-server -y
ssh-keygen -t rsa -P ""
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 640 ~/.ssh/authorized_keys
ssh localhost
On the first connection, type yes when prompted to confirm the host key.
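To confirm that the login really is passwordless, a non-interactive check can help. This is a minimal sketch, assuming bash and an OpenSSH client; check_passwordless_ssh is a hypothetical helper name. BatchMode=yes tells ssh to fail instead of prompting, so a zero exit status means key-based login worked.

```shell
# Hypothetical helper: succeed only if key-based login works without any prompt.
check_passwordless_ssh() {
  local host="${1:-localhost}"
  # BatchMode=yes disables password and passphrase prompts;
  # ConnectTimeout keeps the check from hanging if sshd is not running.
  ssh -o BatchMode=yes -o ConnectTimeout=5 "$host" true
}
```

Run it as check_passwordless_ssh localhost && echo "passwordless SSH OK" after the host key has been accepted once, since an unknown host key also makes a batch-mode connection fail.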
Download and Extract Hadoop: Download a stable Hadoop release (3.4.0 in this guide), extract it, and move it to /usr/local/hadoop. In the chown command, replace user:user with your own username and group:
wget https://dlcdn.apache.org/hadoop/common/hadoop-3.4.0/hadoop-3.4.0.tar.gz
tar -xvzf hadoop-3.4.0.tar.gz
sudo mv hadoop-3.4.0 /usr/local/hadoop
sudo chown -R user:user /usr/local/hadoop
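Before extracting a downloaded archive it is worth checking its integrity. The sketch below assumes bash 4 or later and GNU sha512sum; verify_sha512 is a hypothetical helper, and the expected digest is something you copy yourself from the checksum file published alongside the release.

```shell
# Hypothetical helper: compare a file's SHA-512 digest with an expected hex string.
verify_sha512() {
  local file="$1" expected="$2" actual
  actual="$(sha512sum "$file" | awk '{print $1}')" || return 1
  # Compare case-insensitively, since published digests may be upper-case.
  [ "${actual,,}" = "${expected,,}" ]
}
```

A call such as verify_sha512 hadoop-3.4.0.tar.gz "<digest from the release page>" returns 0 only when the two digests match.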
Set Environment Variables: Edit the .bashrc file to include the Hadoop environment variables:
nano ~/.bashrc
Add the following lines:
export HADOOP_HOME=/usr/local/hadoop
export HADOOP_INSTALL=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
export PATH=$JAVA_HOME/bin:$PATH
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
Apply the changes:
source ~/.bashrc
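After sourcing .bashrc, a quick check can confirm that the variables are visible in the current shell. This is a minimal sketch, assuming bash (it relies on indirect expansion); require_vars is a hypothetical helper name.

```shell
# Hypothetical helper: report every named variable that is unset or empty.
require_vars() {
  local name missing=0
  for name in "$@"; do
    # ${!name} is bash indirect expansion: the value of the variable called $name.
    if [ -z "${!name:-}" ]; then
      echo "missing: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

For the variables above, require_vars HADOOP_HOME HADOOP_CONF_DIR JAVA_HOME should print nothing and return 0; running command -v hadoop afterwards shows whether the PATH entry took effect.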
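Set JAVA_HOME for Hadoop: Hadoop reads JAVA_HOME from hadoop-env.sh. First find the Java installation path:
$ readlink -f /usr/bin/java | sed "s:bin/java::"
Output
/usr/lib/jvm/java-8-openjdk-amd64/jre/
Then open hadoop-env.sh:
$ sudo nano /usr/local/hadoop/etc/hadoop/hadoop-env.sh
Modify the file by choosing one of the following options: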
Option 1: Set a Static Value
#export JAVA_HOME=
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64/jre/
If you have trouble finding these lines, use CTRL+W to quickly search through the text. Once you are done, exit with CTRL+X and save your file.
Option 2: Use Readlink to Set the Value Dynamically
#export JAVA_HOME=
export JAVA_HOME=$(readlink -f /usr/bin/java | sed "s:bin/java::")
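To see what the sed expression in Option 2 does, it can be applied to a fixed path. The sketch below assumes bash; java_home_from_binary is a hypothetical helper that takes the resolved path of the java binary and strips the trailing bin/java, using : as the sed delimiter so the slashes in the path need no escaping.

```shell
# Hypothetical helper: derive a JAVA_HOME value from the resolved java binary path.
java_home_from_binary() {
  # s:bin/java:: deletes the first "bin/java" in the path.
  printf '%s\n' "$1" | sed "s:bin/java::"
}

java_home_from_binary /usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java
# prints /usr/lib/jvm/java-8-openjdk-amd64/jre/
```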
Configure Hadoop: Edit the configuration files located in $HADOOP_HOME/etc/hadoop/:
cd /usr/local/hadoop/etc/hadoop
- core-site.xml
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
- hdfs-site.xml
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:/usr/local/hadoop/data/namenode</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:/usr/local/hadoop/data/datanode</value>
  </property>
  <property>
    <name>dfs.permissions.enabled</name>
    <value>false</value>
  </property>
</configuration>
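Create the NameNode and DataNode directories named in hdfs-site.xml and give your user ownership of them (replace user:user with your own username and group):
sudo mkdir -p /usr/local/hadoop/data/namenode
sudo mkdir -p /usr/local/hadoop/data/datanode
sudo chown -R user:user /usr/local/hadoop/data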
- mapred-site.xml
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <property>
    <name>yarn.app.mapreduce.am.env</name>
    <value>HADOOP_MAPRED_HOME=$HADOOP_HOME</value>
  </property>
  <property>
    <name>mapreduce.map.env</name>
    <value>HADOOP_MAPRED_HOME=$HADOOP_HOME</value>
  </property>
  <property>
    <name>mapreduce.reduce.env</name>
    <value>HADOOP_MAPRED_HOME=$HADOOP_HOME</value>
  </property>
</configuration>
- yarn-site.xml
<configuration>
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>0.0.0.0</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>
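Once the files are saved, it can help to confirm that Hadoop actually picks up the values. The sketch below assumes bash and that the hdfs command is on the PATH; print_effective_conf is a hypothetical helper that wraps hdfs getconf -confKey, which prints the effective value of a configuration key.

```shell
# Hypothetical helper: print the values Hadoop resolves for two keys set above.
print_effective_conf() {
  local key
  for key in fs.defaultFS dfs.replication; do
    printf '%s = %s\n' "$key" "$(hdfs getconf -confKey "$key")"
  done
}
```

With the configuration shown here, calling print_effective_conf should report hdfs://localhost:9000 for fs.defaultFS and 1 for dfs.replication; a different value usually points to a typo in the XML or to a different HADOOP_CONF_DIR being in use.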
Format the Hadoop File System: Run this once, before the first start. Reformatting an existing NameNode erases the HDFS metadata:
hdfs namenode -format
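Because the format step should run only once, a setup script can guard it. This is a minimal sketch, assuming bash and relying on the fact that a formatted NameNode directory contains a current/VERSION file; namenode_is_formatted and format_once are hypothetical helper names, and the directory is the dfs.namenode.name.dir value from hdfs-site.xml.

```shell
# Hypothetical helper: a formatted NameNode directory holds current/VERSION.
namenode_is_formatted() {
  [ -f "$1/current/VERSION" ]
}

# Hypothetical helper: format only when the directory has not been formatted yet.
format_once() {
  local dir="$1"
  if namenode_is_formatted "$dir"; then
    echo "already formatted: $dir"
  else
    hdfs namenode -format
  fi
}
```

In this guide the call would be format_once /usr/local/hadoop/data/namenode.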
Start Hadoop:
start-dfs.sh
start-yarn.sh
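Verify Installation: Check the running Hadoop processes with jps. A working single-node setup should list NameNode, DataNode, SecondaryNameNode, ResourceManager, and NodeManager. The sample output below lacks NameNode, which usually means the NameNode failed to start; in that case check its log under $HADOOP_HOME/logs: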
hadoop@gyancs:~$ jps
3376 NodeManager
3030 SecondaryNameNode
3239 ResourceManager
2841 DataNode
5275 Jps
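Checking the jps listing by eye is easy to get wrong, since SecondaryNameNode contains the text NameNode. The sketch below assumes bash and awk; missing_daemons is a hypothetical helper that reads jps output on standard input and prints each expected daemon that is absent, matching the process-name column exactly.

```shell
# Hypothetical helper: print every expected Hadoop daemon missing from jps output.
missing_daemons() {
  local listing name
  listing="$(cat)"    # read the whole jps listing from stdin
  for name in NameNode DataNode SecondaryNameNode ResourceManager NodeManager; do
    # Compare the second column exactly, so SecondaryNameNode does not count as NameNode.
    if ! printf '%s\n' "$listing" |
        awk -v n="$name" '$2 == n { found = 1 } END { exit !found }'; then
      echo "$name"
    fi
  done
}
```

Use it as jps | missing_daemons. Fed the sample listing above, it prints NameNode; an empty result means all five daemons are running.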
Access the Hadoop Web Interfaces:
- NameNode Web UI: Open your browser and go to
  - http://localhost:9870
  - http://192.168.88.128:9870/ (replace with your computer's IP address)
- Resource Manager Web UI: Open your browser and go to
  - http://localhost:8088
  - http://192.168.88.128:8088/cluster (replace with your computer's IP address)
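On a server without a browser, the same check can be made from the terminal. This is a minimal sketch, assuming bash and curl are available; ui_http_status is a hypothetical helper that prints only the HTTP status code returned by a URL.

```shell
# Hypothetical helper: print the HTTP status code a web UI answers with.
ui_http_status() {
  # -s silences progress output, -o /dev/null discards the page body,
  # -w '%{http_code}' prints just the numeric status.
  curl -s -o /dev/null -w '%{http_code}' "$1"
}
```

For example, ui_http_status http://localhost:9870 and ui_http_status http://localhost:8088 should each print a success or redirect code when the daemons are up; curl reports 000 when nothing is listening on the port.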
Stop Hadoop:
stop-dfs.sh
stop-yarn.sh
Hadoop Commands
$ hadoop fs -ls /
$ hadoop fs -mkdir /GYANCS
$ hadoop fs -mkdir -p /CCAI/DSAI
$ hadoop fs -touch /GYANCS/file1
$ hadoop fs -ls /GYANCS/
$ hdfs dfs -ls /GYANCS                    # hdfs dfs is equivalent to hadoop fs for HDFS paths
$ hadoop fs -copyToLocal /GYANCS/file1    # copy from HDFS to the local directory
$ hadoop fs -get /GYANCS/file3            # similar to -copyToLocal
$ hadoop fs -put file1 /GYANCS/file3      # copy from local to HDFS
$ hadoop fs -cat /GYANCS/file1            # display file content
$ hadoop fs -rm -r /GYANCS3               # remove a directory recursively
$ hdfs dfs -rm -f /GYANCS/file1.txt       # delete a file; -f suppresses the error if it does not exist
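The commands above can be combined into a small smoke test that writes a file to HDFS and reads it back. This is a minimal sketch, assuming bash and a running cluster; hdfs_roundtrip and the file name roundtrip.txt are hypothetical, while -mkdir -p, -put -f, and -cat are the standard hadoop fs options.

```shell
# Hypothetical helper: upload a small file to HDFS and verify it reads back unchanged.
hdfs_roundtrip() {
  local dir="$1" tmp
  tmp="$(mktemp)" || return 1
  echo "hello hdfs" > "$tmp"
  hadoop fs -mkdir -p "$dir" &&
    hadoop fs -put -f "$tmp" "$dir/roundtrip.txt" &&   # -f overwrites an existing file
    [ "$(hadoop fs -cat "$dir/roundtrip.txt")" = "hello hdfs" ]
  local rc=$?          # status of the mkdir/put/cat chain
  rm -f "$tmp"
  return "$rc"
}
```

Running hdfs_roundtrip /GYANCS && echo "HDFS read/write OK" gives a quick pass or fail signal after the cluster has started.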
MapReduce Program in Java on Hadoop
