Friday, 22 February 2013

Cloudera setup on a Multi-Node Cluster

A few days ago I installed and configured Cloudera Manager and related services on two CentOS nodes. I ran into some issues during installation, so I thought I would share what I learned.

My Environment Setup:

CentOS release 6.2 (Final)
Cloudera Manager 3.7
Cloudera 4.1.3 (#456)

Prerequisites for Cloudera Manager:

  • Cloudera Manager is supported only on 64-bit Linux operating systems. Check the Cloudera site for more details on supported OS versions.
  • For Hadoop to work properly, the same version of the same operating system must be installed on all cluster hosts.
  • Cluster hosts must have forward and reverse DNS properly configured, i.e. the /etc/hosts file on every node must resolve both the IP address and the hostname of each node. Add the IP address and FQDN of every node to the hosts file. My hosts file is pasted below for reference:
127.0.0.1               localhost.localdomain localhost
::1             localhost6.localdomain6 localhost6
10.16.1.125     bigdata1.localdomain bigdata1
10.16.1.126     bigdata2.localdomain bigdata2
10.16.1.127     bigdata3.localdomain bigdata3
10.16.1.128     bigdata4.localdomain bigdata4
          You should copy the same hosts file to all nodes.
  • The Cloudera Manager Server must have SSH access to the cluster hosts when you run the installation or upgrade wizard.
  • Disable SELinux (set SELINUX=disabled in /etc/selinux/config; the change takes effect after a reboot).
  • Make sure port 7180 is open. I generally disable iptables during installation and configure the ports after installation is complete. See Configuring Ports for Cloudera Manager Free Edition for more information on required ports.
  • It is advisable to set up a local Cloudera repository to save network bandwidth and time; see Creating a Local Yum Repository.
          The Cloudera repo files can be downloaded directly to CentOS with the commands below:
wget http://archive.cloudera.com/cdh4/redhat/6/x86_64/cdh/cloudera-cdh4.repo
wget http://archive.cloudera.com/cloudera-manager/redhat/6/x86_64/cloudera-manager/cloudera-manager.repo
wget http://beta.cloudera.com/impala/redhat/6/x86_64/impala/cloudera-impala.repo
Note: You do not need to install the JDK manually; Cloudera Manager (CM) will download it from its own repository.
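
The hosts-file and SELinux prerequisites above can be checked before starting the installer. The following is a minimal sketch, not part of Cloudera's tooling: the function names check_hosts_file and selinux_mode are hypothetical, and it assumes a POSIX shell with awk and sed available.

```shell
#!/bin/sh
# Hypothetical pre-flight helpers; the function names are illustrative only.

# check_hosts_file FILE
# Prints every entry that has no fully qualified name (a name containing a dot)
# and returns non-zero if any such entry exists.
check_hosts_file() {
    awk '
        NF == 0 || $1 ~ /^#/ { next }      # skip blank lines and comments
        {
            fqdn = 0
            for (i = 2; i <= NF; i++) {
                if ($i ~ /^#/) break       # ignore a trailing comment
                if ($i ~ /\./) fqdn = 1
            }
            if (!fqdn) { print "no FQDN for " $1; bad = 1 }
        }
        END { exit bad + 0 }
    ' "$1"
}

# selinux_mode [FILE]
# Prints the value of the SELINUX= line (enforcing, permissive or disabled).
# FILE defaults to /etc/selinux/config.
selinux_mode() {
    sed -n 's/^SELINUX=\([a-z]*\).*/\1/p' "${1:-/etc/selinux/config}"
}
```

Running check_hosts_file /etc/hosts on each node should print nothing, and selinux_mode should print disabled once the config file has been edited.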

Install Cloudera Manager:

To install Cloudera Manager Free Edition, you will:
  • Install the Cloudera Manager Server on one cluster host machine. 
  • Install CDH and the Cloudera Manager Agents on the other cluster host machines. 
Download and Install Cloudera Manager:
  • Download cloudera-manager-installer.bin from the Cloudera Downloads page to a single node.
  • Give executable permission to cloudera-manager-installer.bin:
chmod u+x cloudera-manager-installer.bin
  • Run cloudera-manager-installer.bin.
  • Note: To use local repositories, run cloudera-manager-installer.bin with the --skip_repo_package=1 option:
sudo ./cloudera-manager-installer.bin --skip_repo_package=1
  • Read and accept the licenses.
  • Note the complete URL provided for the Cloudera Manager Admin Console, including the port
    number, which is 7180 by default.
Start the Cloudera Manager Admin Console
  • In a web browser, enter the URL, including the port, for the Cloudera Manager Server.
  • Log into Cloudera Manager. The default credentials are:
Username: admin
Password: admin
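
If the browser cannot reach the console right after installation, the server may still be starting or port 7180 may be blocked. The sketch below polls the console URL with curl until it answers. The function names cm_url and wait_for_cm are hypothetical, and the script assumes the console is served over plain HTTP on the default port.

```shell
#!/bin/sh
# Hypothetical helpers; assumes plain HTTP and the default port 7180.

# cm_url HOST [PORT]
# Prints the Admin Console URL for HOST.
cm_url() {
    printf 'http://%s:%s/\n' "$1" "${2:-7180}"
}

# wait_for_cm HOST [PORT] [TRIES]
# Polls the console every 10 seconds until it answers or TRIES runs out.
wait_for_cm() {
    url=$(cm_url "$1" "${2:-7180}")
    tries=${3:-30}
    while [ "$tries" -gt 0 ]; do
        # Any HTTP response counts as "up"; curl fails only on connection errors.
        if curl --silent --output /dev/null --max-time 5 "$url"; then
            echo "Cloudera Manager is answering at $url"
            return 0
        fi
        tries=$((tries - 1))
        sleep 10
    done
    echo "No answer from $url" >&2
    return 1
}
```

For example, wait_for_cm bigdata1.localdomain would wait up to about five minutes for the server on the first node from my hosts file.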

Use CM for Automated CDH Installation and Configuration:

Step1: Enter the IP addresses of the hosts and click Search.

Step2: The next screen shows the matched hosts and indicates whether CDH is already running on them. Click Install CDH on Selected Hosts.
Step3: On the next screen, choose the versions of CDH, Impala and CM that you want to install on your hosts. Remember to use the local repository created earlier.

Step4: On the next screen, enter the SSH login credentials (I used the root user for installation) and click Start Installation.
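
Since the wizard needs SSH access to every host, it can save time to confirm the logins work before clicking Start Installation. The sketch below reads the cluster hostnames from a hosts-format file and tries a non-interactive login on each. The function names cluster_hosts and check_ssh are hypothetical, and the check assumes key-based login is set up; with password login, BatchMode makes ssh fail instead of prompting.

```shell
#!/bin/sh
# Hypothetical helpers; assumes key-based SSH login for the given user.

# cluster_hosts FILE
# Prints the first name of every non-loopback entry in a hosts-format file.
cluster_hosts() {
    awk 'NF >= 2 && $1 !~ /^#/ && $1 !~ /^127\./ && $1 != "::1" { print $2 }' "$1"
}

# check_ssh USER FILE
# Tries a non-interactive login as USER on each host listed in FILE.
check_ssh() {
    status=0
    for host in $(cluster_hosts "$2"); do
        if ssh -n -o BatchMode=yes -o ConnectTimeout=5 "$1@$host" true; then
            echo "ok      $host"
        else
            echo "FAILED  $host"
            status=1
        fi
    done
    return "$status"
}
```

For example, check_ssh root /etc/hosts run on the Cloudera Manager Server node reports each host as ok or FAILED.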

Step5: Installation on all the nodes takes some time. A confirmation screen appears when it completes successfully.

Step6: On the next screen, choose the services that you want to start on your cluster.
 

Step7: On the screens that follow, you can change the default configuration of the services. Leave the defaults unchanged unless you are sure of what you are doing. I suggest taking a note of all these values.

Step8: After all the configuration is done, CM starts the services on the selected hosts.

Step9: Click Continue.
 
And everything is done :)

I hope this visual guide helps some readers set up a Cloudera cluster more easily and efficiently.
