Monday, May 16, 2011

Building a Hadoop cluster

I recently had to build a Hadoop cluster for my final project in an information retrieval class.

Here are some of my notes on configuring the nodes in the cluster.

These links on configuring a single-node cluster and a multi-node cluster were the most helpful.

I downloaded the latest Hadoop distribution and moved it into /hadoop. I had problems with the latest release (0.21), so I used 0.20 instead.

Here are the configuration files I changed:

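core-site.xml:
  <property>
    <name>fs.default.name</name>
    <value>hdfs://master:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/hadoop/tmp</value>
    <description>A base for other temporary directories.</description>
  </property>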

hadoop-env.sh:
# Variables required by Mahout
export HADOOP_HOME=/hadoop
export HADOOP_CONF_DIR=/hadoop/conf
export MAHOUT_HOME=/Users/rpark/mahout
PATH=/hadoop/bin:/Users/rpark/mahout/bin:$PATH

# The java implementation to use.  Required.
export JAVA_HOME=/System/Library/Frameworks/JavaVM.framework/Versions/CurrentJDK/Home

hdfs-site.xml:
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>

mapred-site.xml:
  <property>
    <name>mapred.job.tracker</name>
    <value>master:9001</value>
  </property>

masters:
master

slaves:
master
slave1
slave2
slave3
slave4

Be sure to enable password-less ssh between master and slaves. Use this command to create an SSH key with an empty password:
ssh-keygen -t rsa -P ""

Enable password-less ssh login for the master to itself:
cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys

Then copy id_rsa.pub to each slave and append it to that slave's authorized_keys file in the same way.
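
The copy step can be scripted. Here is a minimal sketch, assuming the slave hostnames from the slaves file above and the same user account on every node. The append_key_cmd helper is a hypothetical name of mine, and the script only prints the commands so they can be checked before anything runs.

```shell
#!/bin/sh
# Sketch: append the master's public key to each slave's authorized_keys.
# Assumptions: hostnames match the slaves file; same username on every node.

PUBKEY="${PUBKEY:-$HOME/.ssh/id_rsa.pub}"

# Hypothetical helper: print the command that appends the key on one host.
append_key_cmd() {
    printf "cat %s | ssh %s 'mkdir -p ~/.ssh && cat >> ~/.ssh/authorized_keys'\n" "$PUBKEY" "$1"
}

# Print one command per slave; nothing is executed here.
for host in slave1 slave2 slave3 slave4; do
    append_key_cmd "$host"
done
```

Once the printed commands look right, pipe the script's output into sh. Each ssh call should ask for the password one last time.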

I ran into a few errors along the way. Here is an error that gave me a lot of trouble in the datanode logs:
2011-05-08 01:04:30,032 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: writeBlock blk_1804860059826635300_1001 received exception org.apache.hadoop.hdfs.server.datanode.BlockAlreadyExistsException: Block blk_1804860059826635300_1001 is valid, and cannot be written to.

The solution was to use hostnames every time I referenced a host, whether the host itself or a remote one. I set each host's own name in /etc/hostname and the names of the others in /etc/hosts. I used these hostnames in /hadoop/conf/masters, /hadoop/conf/slaves, and the various conf files.

Every so often I ran into this error in the datanode logs:
... ERROR org.apache.hadoop.dfs.DataNode: java.io.IOException: Incompatible namespaceIDs in /app/hadoop/tmp/dfs/data: namenode namespaceID = 308967713; datanode namespaceID = 113030094
        at org.apache.hadoop.dfs.DataStorage.doTransition(DataStorage.java:281)
        at org.apache.hadoop.dfs.DataStorage.recoverTransitionRead(DataStorage.java:121)
        at org.apache.hadoop.dfs.DataNode.startDataNode(DataNode.java:230)
        at org.apache.hadoop.dfs.DataNode.<init>(DataNode.java:199)
        at org.apache.hadoop.dfs.DataNode.makeInstance(DataNode.java:1202)
        at org.apache.hadoop.dfs.DataNode.run(DataNode.java:1146)
        at org.apache.hadoop.dfs.DataNode.createDataNode(DataNode.java:1167)
        at org.apache.hadoop.dfs.DataNode.main(DataNode.java:1326)

I fixed this by deleting tmp/dfs/data on the datanodes where I saw the error. Unfortunately, I had to reformat the HDFS volume after I did this.
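
Before deleting anything, it can help to confirm the mismatch by reading the namespaceID from each storage directory's current/VERSION file. Here is a minimal sketch, assuming hadoop.tmp.dir is /hadoop/tmp as configured above, with namenode metadata under dfs/name and datanode blocks under dfs/data. The namespace_id helper is a hypothetical name of mine.

```shell
#!/bin/sh
# Sketch: read the namespaceID from a Hadoop storage directory's VERSION file.
# Assumption: storage lives under /hadoop/tmp (hadoop.tmp.dir in core-site.xml).

# Hypothetical helper: print the namespaceID stored in <dir>/current/VERSION.
namespace_id() {
    sed -n 's/^namespaceID=//p' "$1/current/VERSION"
}

TMP_DIR="${TMP_DIR:-/hadoop/tmp}"

# On a node that runs both daemons (the master here), compare the two IDs.
if [ -f "$TMP_DIR/dfs/name/current/VERSION" ] && [ -f "$TMP_DIR/dfs/data/current/VERSION" ]; then
    nn=$(namespace_id "$TMP_DIR/dfs/name")
    dn=$(namespace_id "$TMP_DIR/dfs/data")
    if [ "$nn" = "$dn" ]; then
        echo "namespaceIDs match: $nn"
    else
        echo "namespaceID mismatch: namenode=$nn datanode=$dn"
    fi
fi
```

On a slave, which has only the data directory, run namespace_id on dfs/data and compare the value by hand with the one from the namenode.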

I had to raise the ulimit for open files. On Ubuntu nodes I edited /etc/security/limits.conf:
rpark  soft nofile  8192
rpark  hard nofile  8192

For OS X nodes I just edited ~/.profile:
ulimit -n 8192

I ran into this error when copying data into HDFS:
could only be replicated to 0 nodes, instead of 1

The solution was simply to wait for the datanode to start up. I usually saw the error when I immediately copied data into HDFS after starting the cluster.
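
The waiting can be automated with a small retry loop. Here is a minimal sketch: wait_until and datanodes_up are hypothetical helper names of mine, and the check assumes the output of hadoop dfsadmin -report contains a line of the form "Datanodes available: N ...".

```shell
#!/bin/sh
# Sketch: retry a command until it succeeds or the attempts run out.

# Hypothetical helper: wait_until <attempts> <delay_seconds> <command...>
wait_until() {
    attempts=$1
    delay=$2
    shift 2
    i=1
    while [ "$i" -le "$attempts" ]; do
        if "$@"; then
            return 0
        fi
        i=$((i + 1))
        sleep "$delay"
    done
    return 1
}

# Hypothetical check: succeeds once the report lists a non-zero datanode count.
# Assumption: the report contains a line like "Datanodes available: N ...".
datanodes_up() {
    hadoop dfsadmin -report 2>/dev/null | grep -q 'Datanodes available: [1-9]'
}
```

For example, wait_until 30 5 datanodes_up polls every 5 seconds for up to 30 attempts and returns success as soon as a datanode has registered, so the copy into HDFS can be chained after it with &&.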

The namenode's web UI on port 50070 showed how many nodes were in the cluster, which was very useful.

