[Revisit] Install Hadoop and Hive on Ubuntu as a single-node Hadoop cluster

I installed the old 0.17 version at least a year ago. Things change fast, though: the new 0.20 version introduces a new API (I like the new one; it is very clean and easy to understand), and Ubuntu has reached its 10.04 LTS release, the Lucid Lynx. I have been using the Cloudera VM for years. It saves a lot of time if you want a good first look before jumping in, or for demonstrations, because it handles all the “dirty work” such as installation and configuration for us. This time, though, I will build everything from scratch on my new SSD drive.

I do not want to simply copy the steps from other blogs here, so please refer to the first link for detailed instructions (it is really great!). Below I add some tips and troubleshooting notes:


fxu@fxu-t60:~$ sudo update-java-alternatives -s java-6-sun
update-alternatives: error: no alternatives for mozilla-javaplugin.so.
update-alternatives: error: no alternatives for xulrunner-1.9-javaplugin.so.
update-alternatives: error: no alternatives for mozilla-javaplugin.so.
update-alternatives: error: no alternatives for xulrunner-1.9-javaplugin.so.

You can fix this by installing the new Firefox plug-in:

sudo apt-get install sun-java6-plugin

To solve another error, please refer to link 3.

You also need to set up the JAVA_HOME and PATH variables, either in $HOME/.bash_profile or, system-wide, in /etc/profile. Open your .bash_profile file:

$ vi $HOME/.bash_profile

Append the following lines:

export JAVA_HOME=/usr/lib/jvm/java-6-sun
export PATH=$PATH:$JAVA_HOME/bin
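
To confirm that the exported path really points at a JDK before moving on, a small check along these lines may help. This is only a minimal sketch: the function name check_java_home is hypothetical, and it tests nothing more than that bin/java exists and is executable under the given directory.

```shell
# check_java_home is a hypothetical helper, not part of any Ubuntu or Java package.
# It succeeds only if <dir>/bin/java exists and is executable.
check_java_home() {
    dir="$1"
    if [ -n "$dir" ] && [ -x "$dir/bin/java" ]; then
        echo "OK: $dir/bin/java found"
        return 0
    fi
    echo "Missing: $dir/bin/java" >&2
    return 1
}
```

After reloading the profile (for example with `. $HOME/.bash_profile`), running `check_java_home "$JAVA_HOME"` should print the OK line if the export matches your installation.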

My Ubuntu did not have the SSH server installed yet, so type the following command:

sudo apt-get install openssh-server

Type “exit” to quit the ssh login session.

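You cannot run the hadoop command directly from a terminal unless you give the full path, /usr/local/hadoop/bin/hadoop. To solve this, add the Hadoop bin directory to PATH; the adduser line also puts the hadoop user in the admin group so that it can use sudo: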

sudo adduser hadoop admin

export PATH=$PATH:/usr/local/hadoop/bin
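
An export typed in a terminal lasts only for that session, so the line usually ends up in a profile file, where it gets appended again every time the file is sourced. A guarded version avoids duplicate entries. This is a sketch: path_append is a hypothetical helper name, and /usr/local/hadoop/bin is the install location assumed in this post.

```shell
# path_append is a hypothetical helper: add a directory to PATH only once.
path_append() {
    case ":$PATH:" in
        *":$1:"*) ;;                 # already present, nothing to do
        *) PATH="$PATH:$1" ;;
    esac
    export PATH
}

path_append /usr/local/hadoop/bin
path_append /usr/local/hadoop/bin    # second call leaves PATH unchanged
```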

Installing Hadoop is straightforward:

hadoop@fxu-t60:/usr/local$ sudo tar xzf /home/hadoop/Desktop/Software/hadoop-0.18.3.tar.gz
sudo chown -R hadoop:hadoop hadoop-0.18.3/
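
The later steps assume Hadoop lives at /usr/local/hadoop, while the tarball unpacks into a versioned directory. One possible way to reconcile the two is a symbolic link; renaming the directory would work as well. The sketch below is written under that assumption, and the helper name link_hadoop and its arguments are hypothetical.

```shell
# link_hadoop PREFIX VERSIONED_DIR
# Hypothetical helper: point PREFIX/hadoop at the unpacked, versioned directory.
link_hadoop() {
    prefix="$1"
    versioned="$2"
    if [ ! -d "$prefix/$versioned" ]; then
        echo "No such directory: $prefix/$versioned" >&2
        return 1
    fi
    # -s symbolic, -f replace an existing link, -n do not follow an existing link
    ln -sfn "$versioned" "$prefix/hadoop"
}
```

For the layout used here the call would be `link_hadoop /usr/local hadoop-0.18.3`, run from a shell that is allowed to write in /usr/local.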

How do you start Hadoop after rebooting your system?
Assume Hadoop is installed at /usr/local/hadoop.
Log in to the system as user hadoop.
Type:

cd /usr/local/hadoop/bin/
./start-all.sh

./hadoop fs -ls
Found 2 items
drwxr-xr-x   - hadoop supergroup          0 2011-02-17 12:08 /user/hadoop/wordcount-output
-rw-r--r--   1 hadoop supergroup         44 2011-02-17 12:07 /user/hadoop/wordcountTest.txt
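
After start-all.sh returns, it is worth checking that all five daemons of a single-node setup actually came up. The JDK's jps tool lists running Java processes, and the sketch below filters its output. The helper name missing_daemons is hypothetical, and the daemon list assumes the classic (pre-YARN) layout used by the Hadoop 0.18 to 0.20 releases.

```shell
# missing_daemons is a hypothetical helper: it reads `jps` output on stdin
# and prints the name of every expected Hadoop daemon that is not listed.
missing_daemons() {
    listing=$(cat)
    for daemon in NameNode DataNode SecondaryNameNode JobTracker TaskTracker; do
        # -w matches whole words, so "NameNode" does not match "SecondaryNameNode"
        printf '%s\n' "$listing" | grep -w "$daemon" >/dev/null || echo "$daemon"
    done
}

# Typical use on a machine with the JDK installed:
#   jps | missing_daemons
```

Empty output means every expected daemon was found; any name printed is a daemon whose log under the Hadoop logs directory deserves a look.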

I installed Hive at “/usr/local/hive” from a stable release, not from source, and I run it as user “hadoop”, not “root”. By default Hive uses a directory called “/user/hive/warehouse”. You can change it by editing “/usr/local/hive/conf/hive-default.xml”; I keep the default. Add the Hadoop path before you run it:

export PATH=$PATH:/usr/local/hadoop/bin

When I tried to run Hive, I got the following error:

hadoop@fxu-t60:/usr/local/hive$ bin/hive
Invalid maximum heap size: -Xmx4096m
The specified size exceeds the maximum representable size.
Could not create the Java virtual machine.
hadoop@fxu-t60:/usr/local/hive$ bin/hive --service hiveserver
Invalid maximum heap size: -Xmx4096m
The specified size exceeds the maximum representable size.
Could not create the Java virtual machine.

To solve this, change HADOOP_HEAPSIZE=4096 in “hive/bin/ext/util/execHiveCmd.sh” to a size that suits your machine. Also, with Hive 0.6 there is no longer any need to create the related directory in HDFS before a table can be created.

hadoop@fxu-t60:/usr/local/hive$ vi bin/ext/util/execHiveCmd.sh
hadoop@fxu-t60:/usr/local/hive$ bin/hive
Hive history file=/tmp/hadoop/hive_job_log_hadoop_201102270944_934721366.txt
hive> create table pokes (foo INT, bar STRING);
OK
Time taken: 7.305 seconds
hive>
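
The heap-size edit can also be done without opening an editor. The error message itself typically points at a 32-bit JVM, which cannot address a 4096 MB heap. The following is a minimal sketch: set_hive_heapsize is a hypothetical helper, and it assumes the script contains an assignment of the form HADOOP_HEAPSIZE=<number>.

```shell
# set_hive_heapsize FILE SIZE_MB
# Hypothetical helper: rewrite the HADOOP_HEAPSIZE=<number> assignment in FILE.
# A backup of the original is kept as FILE.bak.
set_hive_heapsize() {
    file="$1"
    size_mb="$2"
    case "$size_mb" in
        ''|*[!0-9]*) echo "Size must be a whole number of MB" >&2; return 1 ;;
    esac
    sed -i.bak "s/HADOOP_HEAPSIZE=[0-9][0-9]*/HADOOP_HEAPSIZE=$size_mb/" "$file"
}
```

For example, `set_hive_heapsize /usr/local/hive/bin/ext/util/execHiveCmd.sh 1024`; the value 1024 is only an illustration, so pick one that fits your machine's memory.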

References:

  1. http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-single-node-cluster/
  2. http://www.hackido.com/2010/05/install-hadoop-and-hive-on-ubuntu-lucid.html
  3. http://ubuntuforums.org/showthread.php?t=831235
  4. http://wiki.apache.org/hadoop/Hive/GettingStarted