Using Chef to build out a Hadoop cluster
After not posting for a while, I have about 3-4 posts that I'd like to get out there.
The first is about using Chef to build a Hadoop cluster.
Chef is a configuration management tool that allows one to automate the process of provisioning servers. I had to create a Hadoop cluster of 4-5 servers, and I wanted to use this opportunity to automate the process with Chef.
I had to perform a series of the same steps on these Linux nodes:
- Install Ruby and Chef
- Install Sun Java
- Install VMware Tools
- Install NTP
- Add its hostname to a shared /etc/hosts file
- Configure passwordless ssh login
Installing Chef and Ruby
I followed the steps in this link. The first step is to sign up for a Hosted Chef account on the Opscode site. An account is free for 5 nodes or fewer. Perform the following steps:
- Create a new organization
- Select "Generate knife config" to download knife.rb
- Select "Regenerate validation key" to download (validator).pem
Then install Ruby and Chef on your first host. Once you do the first host, you can quickly bootstrap the others.
Install Ruby:
sudo apt-get update
sudo apt-get install ruby ruby-dev libopenssl-ruby rdoc ri irb build-essential wget ssl-cert git-core
Install rubygems:
cd /tmp
wget http://production.cf.rubygems.org/rubygems/rubygems-1.8.10.tgz
tar zxf rubygems-1.8.10.tgz
cd rubygems-1.8.10
sudo ruby setup.rb --no-format-executable
Install Chef:
sudo gem install chef
cd ~
git clone https://github.com/opscode/chef-repo.git
mkdir -p ~/chef-repo/.chef
cp (private key).pem ~/chef-repo/.chef
cp (validator).pem ~/chef-repo/.chef
cp knife.rb ~/chef-repo/.chef
Connect to Hosted Chef and configure workstation as a client:
cd ~/chef-repo
knife configure client ./client-config
sudo mkdir /etc/chef
sudo cp -r ~/chef-repo/client-config/* /etc/chef
sudo chef-client
Once the client is installed on the first host, you can bootstrap the other hosts with the command below, as described here. This assumes you have created a user called hadoop who is the main Hadoop user.
knife bootstrap (node IP) -x hadoop -P (password) --sudo
Repeat this for all of your other Chef nodes.
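For reference, the knife.rb you copied into ~/chef-repo/.chef looks roughly like the sketch below. The user name, organization name, and key file names are placeholders, so substitute the values from your own Hosted Chef download:
# knife.rb -- workstation configuration for Hosted Chef (names below are placeholders)
current_dir = File.dirname(__FILE__)
log_level                :info
log_location             STDOUT
node_name                "your-username"
client_key               "#{current_dir}/your-username.pem"
validation_client_name   "your-org-validator"
validation_key           "#{current_dir}/your-org-validator.pem"
chef_server_url          "https://api.opscode.com/organizations/your-org"
cookbook_path            ["#{current_dir}/../cookbooks"]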
Installing some Chef recipes
Now that Chef is installed on all the nodes, it's time to run some Chef recipes. A recipe is a set of configuration instructions; in my case, I want to install some packages. I started with VMware Tools, Sun Java, and NTP.
Start by creating a new cookbook:
knife cookbook create MYCOOKBOOK
Then download some existing cookbooks from the Chef Repository.
knife cookbook site install vmtools
knife cookbook site install java
knife cookbook site install ntp
Add these recipes to each node's run list:
knife node run_list add NODE_NAME "recipe[java::sun]"
knife node run_list add NODE_NAME "recipe[vmtools]"
knife node run_list add NODE_NAME "recipe[ntp]"
You'll then need to run "sudo chef-client" on each node to execute the run list and install these packages.
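If you'd rather not add the same recipes node by node, a Chef role can bundle them. Below is a minimal sketch of a role file, assuming you save it as roles/hadoop-base.rb; the role name and description are my own placeholders:
# roles/hadoop-base.rb -- baseline run list shared by every node in the cluster
name "hadoop-base"
description "Baseline packages for every Hadoop cluster node"
run_list "recipe[java::sun]", "recipe[vmtools]", "recipe[ntp]"
You would then upload it with "knife role from file roles/hadoop-base.rb" and add "role[hadoop-base]" to each node's run list in place of the three individual recipes.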
Populate /etc/hosts
The next step is to create a recipe that will populate the /etc/hosts file from the Chef repository. One of Hadoop's requirements is to store the name-IP mapping for every node in the cluster in /etc/hosts. The easiest way to do this is to populate /etc/hosts from the list of hosts that Chef knows about. So start by creating your new recipe in your cookbook. I call it "hosts":
knife cookbook create hosts
Your cookbook will now have a subdirectory called hosts with some skeleton files already created. Create your default Ruby script in hosts/recipes/default.rb:
# Gets list of names from all nodes in repository and rewrites /etc/hosts
hosts = {}
localhost = nil
search(:node, "name:*", %w(ipaddress fqdn)) do |n|
  hosts[n["ipaddress"]] = n
end

template "/etc/hosts" do
  source "hosts.erb"
  mode 0644
  variables(:hosts => hosts)
end
Now edit the hosts.erb file, stored in hosts/templates/default/hosts.erb:
127.0.0.1 localhost
<% @hosts.keys.sort.each do |ip| %>
<%= ip %> <%= @hosts[ip]["fqdn"] %>
<% end %>
Now deploy it to all your hosts:
knife node run_list add NODE_NAME "recipe[hosts]"
Don't forget to run "sudo chef-client" on each node.
You should also upload this cookbook to the Chef server:
knife cookbook upload hosts
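Once the recipe has run everywhere, the rendered /etc/hosts on each node should look roughly like the example below; the hostnames and IP addresses are made-up placeholders for a four-node cluster:
127.0.0.1 localhost
192.168.1.101 hadoop-master.example.com
192.168.1.102 hadoop-slave1.example.com
192.168.1.103 hadoop-slave2.example.com
192.168.1.104 hadoop-slave3.example.com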
Installing passwordless ssh login
A Hadoop cluster requires passwordless ssh login between the master and its slave nodes. The easiest way to do this is to have each node create its own SSH key with an empty password, and then copy the public keys for all nodes to the master node. So create a recipe to create the SSH key with an empty password. I call it "sshlogin":
knife cookbook create sshlogin
Create your default Ruby script in sshlogin/recipes/default.rb:
# Create an RSA key with an empty passphrase for the hadoop user
execute "ssh-keygen" do
  command "sudo -u hadoop ssh-keygen -q -t rsa -N '' -f /home/hadoop/.ssh/id_rsa"
  creates "/home/hadoop/.ssh/id_rsa"
  action :run
end

# Copy the public key to node1; if the key isn't already in authorized_keys, append it
execute <<EOF
cat /home/hadoop/.ssh/id_rsa.pub | sudo -u hadoop ssh hadoop@node1 "(cat > /tmp/tmp.pubkey; mkdir -p .ssh; touch .ssh/authorized_keys; grep #{node[:fqdn]} .ssh/authorized_keys > /dev/null || cat /tmp/tmp.pubkey >> .ssh/authorized_keys; rm /tmp/tmp.pubkey)"
EOF
Note that when you run this recipe on each host, it will prompt you for the password of the hadoop user on node1 each time, because you are essentially copying the key over ssh to this master node.
Now you can deploy this recipe:
knife cookbook upload sshlogin
knife node run_list add node2 "recipe[sshlogin]"
Type this command to run the recipes on each host:
sudo chef-client