Install Apache Pig on Hadoop: Complete Setup

Install Pig
Prerequisites: Pig requires Java and Hadoop, so make sure both are installed before you continue. If Java is missing, install it with:
1. sudo apt update
2. sudo apt install openjdk-8-jdk -y (skip this step if Java is already installed)
To install Hadoop, follow the steps from my previous message.
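Before downloading Pig, it can help to confirm that both prerequisites are actually on the PATH. The following is a minimal sketch, not part of Pig or Hadoop: check_prereqs is a hypothetical helper name, and it assumes the Java and Hadoop launchers are called java and hadoop.

```shell
# check_prereqs CMD...: report whether each command is available on PATH.
# Hypothetical helper for illustration; not shipped with Pig or Hadoop.
check_prereqs() {
    missing=0
    for cmd in "$@"; do
        if command -v "$cmd" >/dev/null 2>&1; then
            echo "found: $cmd"
        else
            echo "missing: $cmd"
            missing=1
        fi
    done
    # Exit status is 0 only when every command was found.
    return "$missing"
}

# Example call (assumes the launchers are named java and hadoop):
#   check_prereqs java hadoop
```

If the function reports a missing command, install that tool first and rerun the check.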
Download Apache Pig: Download Apache Pig (version 0.18.0 in this guide) from the official website, extract the archive, and move it to /usr/local/pig:
3. wget https://downloads.apache.org/pig/pig-0.18.0/pig-0.18.0.tar.gz
4. tar -xvzf pig-0.18.0.tar.gz
5. sudo mv pig-0.18.0 /usr/local/pig
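Set Environment Variables: Edit the .bashrc file to add the Pig environment variables:
6. nano ~/.bashrc
Add the following lines:
export PIG_HOME=/usr/local/pig
export PATH=$PATH:$PIG_HOME/bin
export PIG_CLASSPATH=$HADOOP_HOME/etc/hadoop
PIG_CLASSPATH must point to the Hadoop configuration directory. On Hadoop 2.x and later this is $HADOOP_HOME/etc/hadoop; only Hadoop 1.x used $HADOOP_HOME/conf.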
Apply the changes:
source ~/.bashrc
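If you repeat this setup, pasting the export lines again leaves duplicates in .bashrc. As a hedged sketch of one way to avoid that, the hypothetical helper below (add_line_once is an illustrative name, not a standard command) appends a line only when the exact same line is not already in the file.

```shell
# add_line_once FILE LINE: append LINE to FILE unless an identical line exists.
# Hypothetical helper for illustration only.
add_line_once() {
    file=$1
    line=$2
    touch "$file"
    # -x matches whole lines, -F treats LINE as a fixed string, -q stays silent.
    grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# Example calls (single quotes keep $PATH and $PIG_HOME unexpanded in the file):
#   add_line_once ~/.bashrc 'export PIG_HOME=/usr/local/pig'
#   add_line_once ~/.bashrc 'export PATH=$PATH:$PIG_HOME/bin'
```

Running the same call twice leaves a single copy of the line, so the setup can be repeated safely.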
Verify Installation: Check that Pig is installed correctly by running:
7. pig -version
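If the pig command is not found, the usual cause is a PIG_HOME that does not point at the extracted folder. The sketch below is one possible sanity check: pig_home_ok is a hypothetical helper name, and the only fact it relies on is that the Pig distribution ships its launcher script as bin/pig.

```shell
# pig_home_ok DIR: succeed if DIR looks like a Pig installation,
# meaning it contains an executable bin/pig launcher.
# Hypothetical helper for illustration only.
pig_home_ok() {
    [ -n "$1" ] && [ -x "$1/bin/pig" ]
}

# Example call:
#   pig_home_ok "$PIG_HOME" && echo "PIG_HOME looks fine" || echo "check PIG_HOME"
```

An empty or wrong PIG_HOME makes the check fail, which points to the .bashrc lines rather than to the download.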
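Start Pig: You can start the Pig shell (Grunt) in local mode or in MapReduce mode:
Local Mode (runs in a single JVM against the local file system):
pig -x local
MapReduce Mode (the default; runs on the Hadoop cluster and reads from HDFS):
pig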
1: Create a Text File (for the Word Count Program)
$ echo -e "hello\nworld\nhello\npig\npig\nloves\ndata" > data.txt
2: Write the Pig Script (Local Mode)
$ nano wordcount.pig
-- Load the text file (one word per line)
lines = LOAD 'data.txt' USING PigStorage(' ') AS (word:chararray);
-- Group by word
grouped = GROUP lines BY word;
-- Count occurrences of each word
counted = FOREACH grouped GENERATE group, COUNT(lines);
-- Store the result locally
STORE counted INTO 'output_wordcount' USING PigStorage(',');
3: Run the Pig Script (Local Mode)
$ pig -x local wordcount.pig
4: View the Output (the order of the rows may vary)
$ ls output_wordcount
$ cat output_wordcount/part-r-00000
pig,2
data,1
hello,2
loves,1
world,1
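To cross-check the Pig result without Pig, the same counts can be produced with standard shell tools. This is only a sketch for comparison, not part of the Pig workflow: count_words is a hypothetical helper name, and it assumes one word per line, as in data.txt.

```shell
# count_words FILE: print "word,count" for each distinct line of FILE,
# sorted alphabetically by word. Assumes one word per line.
# Hypothetical helper for illustration only.
count_words() {
    # sort groups identical words, uniq -c counts each group,
    # awk rewrites "count word" as "word,count".
    sort "$1" | uniq -c | awk '{ print $2 "," $1 }'
}

# Example call:
#   count_words data.txt
```

For the data.txt created above, this prints the same five word,count pairs as the Pig job, in alphabetical order, so the two outputs can be compared after sorting the Pig result.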

Short Questions
1. What prerequisite software must be installed before setting up Apache Pig?
2. Which command is used to install Java on Ubuntu before installing Pig?
3. How do you download Apache Pig version 0.18.0 from the official website?
4. Which command extracts the Pig tar file after downloading?
5. Where should the Pig folder be moved after extraction?
6. What environment variable should be set to define the Pig home directory?
7. How do you apply the changes made to the .bashrc file?
8. Which command verifies whether Apache Pig is successfully installed?
9. How do you start Pig in local mode?
10. What is the purpose of the wordcount.pig script?

MCQs with Answers
1. Which language does Apache Pig use for data analysis?
a) Java
b) Pig Latin
c) Python
d) SQL
✅ Answer: b) Pig Latin
2. Which command installs OpenJDK 8 on Ubuntu?
a) sudo apt-get openjdk install 8
b) sudo install java8
c) sudo apt install openjdk-8-jdk -y
d) install openjdk8
✅ Answer: c) sudo apt install openjdk-8-jdk -y
3. What is the correct command to extract the Pig tar file?
a) unzip pig-0.18.0.tar.gz
b) tar -xvzf pig-0.18.0.tar.gz
c) extract pig-0.18.0.tar.gz
d) tar -zvxf pig-0.18.0.tar
✅ Answer: b) tar -xvzf pig-0.18.0.tar.gz
4. Which environment variable defines Pig’s installation directory?
a) $JAVA_HOME
b) $HADOOP_HOME
c) $PIG_HOME
d) $PIG_PATH
✅ Answer: c) $PIG_HOME
5. Which file do you edit to set Pig’s environment variables?
a) .bash_profile
b) .bashrc
c) config.xml
d) environment.txt
✅ Answer: b) .bashrc
6. Which command applies the newly set environment variables immediately?
a) reload ~/.bashrc
b) exec ~/.bashrc
c) source ~/.bashrc
d) refresh ~/.bashrc
✅ Answer: c) source ~/.bashrc
7. Which command verifies the Pig installation?
a) pig -help
b) pig -v
c) pig -version
d) check pig
✅ Answer: c) pig -version
8. Which command starts Pig in local mode?
a) pig -local
b) pig -x local
c) pig local
d) pig start local
✅ Answer: b) pig -x local
9. Which Pig command loads data from a text file?
a) LOAD
b) IMPORT
c) INPUT
d) FETCH
✅ Answer: a) LOAD
10. What does COUNT(lines) do in the Pig script?
a) Counts the number of words
b) Counts the number of files
c) Counts the occurrences of each word
d) Counts the number of lines only
✅ Answer: c) Counts the occurrences of each word

