TechMediaToday

Big Data cluster monitoring with CDH – how to enable HA for Cloudera Manager in production deployments

1) Introduction

The term big data is buzzing across all industries, and processing huge volumes of data to extract trends and other meaningful information is a big deal. The volume of data generated over the past few years has been enormous. Hadoop plays a major role in processing such data on commodity hardware.

Hadoop is a distributed data processing system, so processing data from gigabytes to petabytes requires many independent machines. Installing and managing such a distributed application by hand takes a number of automation scripts and considerable resources.

Cloudera Manager simplifies the management of distributed, parallel-processing Hadoop services as a cluster. Let us see how Cloudera Manager manages the Hadoop stack with ease, and why it is important to make Cloudera Manager (CM) highly available.

2) About Cloudera Manager and its services

Several distributions are available for managing the Hadoop stack, but Cloudera was the first to release a commercial Hadoop distribution, and it has been widely used. Cloudera has two major components that handle the installation, configuration, monitoring and management of the whole Hadoop stack.

A. Cloudera Manager (Cloudera Manager Server)

B. Cloudera Management Service

A. Cloudera Manager

Cloudera Manager is an agent-based application that controls the whole Hadoop cluster end to end. An agent runs on each host in the cluster and is responsible for starting and stopping processes and for unpacking and applying configurations on that host, while administrators drive the cluster through a web-based Admin Console.

Cloudera Manager provides the following management functions:

  • State Management
  • Configuration Management
  • Process Management
  • Software Distribution Management
  • Host Management
  • Resource Management
  • User Management
  • Security Management

B. Cloudera Management Service

The Cloudera Management Service collects information from the agents installed on the hosts of the Hadoop cluster. The agents, in turn, collect host and service state information.

The Cloudera Management Service consists of the following roles:

  • Activity Monitor – collects information about activities run by the MapReduce service.
  • Host Monitor – collects health and metric information about hosts.
  • Service Monitor – collects health and metric information about services, and activity information from the YARN and Impala services.
  • Event Server – aggregates relevant Hadoop events and makes them available for alerting and searching.
  • Alert Publisher – generates and delivers alerts for certain types of events.

Together, these roles build the health and status view of the individual services running in the cluster.

3) Role of CM and why HA is required

Organizations manage Hadoop clusters with hundreds of nodes and scale them both horizontally and vertically based on the data growth rate. Without Cloudera Manager and its services, scaling and monitoring become tedious and consume more staff time, much of it spent digging through log files.

Cloudera Manager relies on a relational database (RDBMS) to store the cluster-related metadata it needs to manage the Hadoop services. It is therefore important to protect the Cloudera Manager database to ensure uninterrupted monitoring of the Hadoop cluster.

As noted earlier, Cloudera Manager controls the cluster end to end, so we must ensure that it is highly available and take extra care to preserve its databases. Let us see how to enable HA for both Cloudera Manager and its services.

4) Enabling HA for Cloudera Manager Server

The Cloudera Manager database can be externalized, and there are several options for making the databases highly available. One such method is to externalize and replicate the databases. The steps below cover enabling HA for the Cloudera Manager Server itself.

We implemented a slightly different approach, which is also described on Cloudera’s official website. It is an active-passive setup: we did not use an external load balancer, so failover is manual rather than automatic.

System Requirements:

  1. Any one of the supported RDBMSs, such as MySQL, PostgreSQL or Oracle, installed outside the Hadoop nodes.
  2. Five servers running Ubuntu 14.04 LTS.
  3. 32 GB RAM for master nodes and 16 GB RAM for slave machines.
  4. The Cloudera Manager installer binary.
  5. One of the above machines acting as an NFS server, from which the service directories are mounted.

Steps to configure Cloudera Manager HA.

a) Setting up NFS Mounts for Cloudera Manager Server

 1. Create the export directory on the NFS server:

       $ mkdir -p /media/cloudera-scm-server

 2. Mark these mounts by adding these lines to the /etc/exports file on the NFS server:

       /media/cloudera-scm-server host1(rw,sync,no_root_squash,no_subtree_check)

       /media/cloudera-scm-server host2(rw,sync,no_root_squash,no_subtree_check)

 3. Export the mounts by running the following command on the NFS server:

       $ exportfs -a

 4. Set up the filesystem mounts on host1 and host2:

  1. Stop the Cloudera Manager Server if it is running on either host1 or host2:
    • $ service cloudera-scm-server stop
  2. Make sure that the NFS mount helper is installed:
    • $ apt-get install nfs-common
  3. Make sure that rpcbind is running and has been restarted:
    • $ service rpcbind restart

 5. Create the mount points on both host1 and host2:

  1. Set up the /var/lib/cloudera-scm-server directory on host1 and host2:
    • $ rm -rf /var/lib/cloudera-scm-server
    • $ mkdir -p /var/lib/cloudera-scm-server
  2. Mount the directory on the NFS export, on both host1 and host2 (NFS is the hostname or IP address of the NFS server):
    • $ mount -t nfs NFS:/media/cloudera-scm-server /var/lib/cloudera-scm-server
  3. To persist the mount across restarts, edit the /etc/fstab file on host1 and host2 and add the following line:
    • NFS:/media/cloudera-scm-server /var/lib/cloudera-scm-server nfs auto,noatime,nolock,intr,tcp,actimeo=1800 0 0
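
Re-running the setup should not leave duplicate entries in /etc/fstab. One way to guard against that is to add the line only when an identical line is not already present. The helper below is a minimal sketch; the name append_once is hypothetical and not part of any Cloudera tooling.

```shell
#!/bin/sh
# Append LINE to FILE unless an identical line already exists.
# Usage: append_once FILE LINE
append_once() {
    file=$1
    line=$2
    # -x: match the whole line, -F: treat LINE as a fixed string, -q: quiet
    if ! grep -qxF -- "$line" "$file" 2>/dev/null; then
        printf '%s\n' "$line" >> "$file"
    fi
}

# Example (run as root on host1 and host2; NFS is the NFS server's name):
#   append_once /etc/fstab \
#     'NFS:/media/cloudera-scm-server /var/lib/cloudera-scm-server nfs auto,noatime,nolock,intr,tcp,actimeo=1800 0 0'
```

Calling it twice with the same line leaves a single entry, so it is safe to keep in a provisioning script.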

b) Installing Cloudera Manager Server on the primary host

$ sudo apt-get install cloudera-manager-server

You can now start the freshly installed Cloudera Manager Server on host1:

$ service cloudera-scm-server start

(Before proceeding, verify that you can access the Cloudera Manager Admin Console at http://host1:7180)
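
The server can take a while to come up after the service starts, so a first check of the console may fail. A small polling loop avoids guessing. This is a sketch that assumes curl is installed; the helper name wait_until, the 30 attempts and the 10-second delay are arbitrary choices, not values from Cloudera.

```shell
#!/bin/sh
# Run a command repeatedly until it succeeds or the attempts run out.
# Usage: wait_until ATTEMPTS DELAY_SECONDS COMMAND [ARG...]
wait_until() {
    attempts=$1
    delay=$2
    shift 2
    i=1
    while [ "$i" -le "$attempts" ]; do
        if "$@"; then
            return 0
        fi
        # Do not sleep after the final failed attempt.
        if [ "$i" -lt "$attempts" ]; then
            sleep "$delay"
        fi
        i=$((i + 1))
    done
    return 1
}

# Example: poll the Admin Console on host1 for up to about five minutes.
#   wait_until 30 10 curl -s -o /dev/null http://host1:7180/ && echo "console is up"
```

Here curl succeeds as soon as the server returns any HTTP response, which is enough to tell that the console is listening.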

c) Installing Cloudera Manager Server on the secondary host

Setting up the Cloudera Manager Server secondary requires copying certain files from the primary to ensure that they are consistently initialized.

  1. On host2, install the cloudera-manager-server package:
    • $ sudo apt-get install cloudera-manager-server
  2. When setting up the database on the secondary, copy the /etc/cloudera-scm-server/db.properties file from host1 to the same path on host2.

For example:

  • $ mkdir -p /etc/cloudera-scm-server
  • $ scp host1:/etc/cloudera-scm-server/db.properties /etc/cloudera-scm-server/db.properties
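
Copying db.properties only helps if the database it names is reachable from host2. A quick sanity check, sketched below, reads the database host from the file and rejects a local address. The key name com.cloudera.cmf.db.host is an assumption about how this file is laid out, so confirm it against your own db.properties; the function name check_db_host is hypothetical.

```shell
#!/bin/sh
# Print the database host from a db.properties file and fail if it is local.
# Usage: check_db_host FILE
check_db_host() {
    # Take the value of the last com.cloudera.cmf.db.host=... line (assumed key name).
    db_host=$(sed -n 's/^com\.cloudera\.cmf\.db\.host=//p' "$1" | tail -n 1)
    case $db_host in
        '')
            echo "no database host found in $1" >&2
            return 1 ;;
        localhost*|127.*)
            echo "database host '$db_host' is local; the other host would not find the database there" >&2
            return 1 ;;
    esac
    printf '%s\n' "$db_host"
}

# Example: check_db_host /etc/cloudera-scm-server/db.properties
```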

d) Testing Failover

Test failover manually by using the following steps:

  1. Stop cloudera-scm-server on your primary host (host1): $ service cloudera-scm-server stop
  2. Start cloudera-scm-server on your secondary host (host2): $ service cloudera-scm-server start
  3. Wait a few minutes for the service to load, and then access the Cloudera Manager Admin Console through a web browser (for example: http://host2:7180). Then fail back to the primary before configuring the Cloudera Management Service on your installation:
  4. Stop cloudera-scm-server on your secondary machine (host2): $ service cloudera-scm-server stop
  5. Start cloudera-scm-server on your primary machine (host1): $ service cloudera-scm-server start
  6. Wait a few minutes for the service to load, and then access the Cloudera Manager Admin Console through a web browser (for example: http://host1:7180).
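
In an active-passive pair, only one host should be serving the Admin Console at any time. The sketch below lists which hosts answer on port 7180, so a failover test can be confirmed from the command line. It assumes curl is available; the function names and the 5-second timeout are illustrative choices.

```shell
#!/bin/sh
# Probe one host: succeeds if something answers HTTP on the Admin Console port.
console_up() {
    curl -s -o /dev/null --max-time 5 "http://$1:7180/"
}

# Print each host for which PROBE succeeds, one per line.
# Usage: responding_hosts PROBE HOST [HOST...]
responding_hosts() {
    probe=$1
    shift
    for host in "$@"; do
        if "$probe" "$host"; then
            printf '%s\n' "$host"
        fi
    done
}

# Example: after a failover test, expect exactly one line of output.
#   responding_hosts console_up host1 host2
```

Passing the probe as an argument keeps the loop independent of curl, so a different check (for example, one that logs in) can be swapped in later.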

5) Enabling HA for Cloudera Management Service

NFS Mounts for Cloudera Management Service

1) Create directories on the NFS server:

    $ mkdir -p /cmservicedir/cloudera-host-monitor

    $ mkdir -p /cmservicedir/cloudera-scm-agent

    $ mkdir -p /cmservicedir/cloudera-scm-eventserver

    $ mkdir -p /cmservicedir/cloudera-scm-headlamp

    $ mkdir -p /cmservicedir/cloudera-service-monitor

    $ mkdir -p /cmservicedir/etc-cloudera-scm-agent

2) Mark these mounts by adding the following lines to the /etc/exports file on the NFS server:

    /cmservicedir/cloudera-host-monitor host1(rw,sync,no_root_squash,no_subtree_check)

    /cmservicedir/cloudera-scm-agent host1(rw,sync,no_root_squash,no_subtree_check)

    /cmservicedir/cloudera-scm-eventserver host1(rw,sync,no_root_squash,no_subtree_check)

    /cmservicedir/cloudera-scm-headlamp host1(rw,sync,no_root_squash,no_subtree_check)

    /cmservicedir/cloudera-service-monitor host1(rw,sync,no_root_squash,no_subtree_check)

    /cmservicedir/etc-cloudera-scm-agent host1(rw,sync,no_root_squash,no_subtree_check)

    /cmservicedir/cloudera-host-monitor host2(rw,sync,no_root_squash,no_subtree_check)

    /cmservicedir/cloudera-scm-agent host2(rw,sync,no_root_squash,no_subtree_check)

    /cmservicedir/cloudera-scm-eventserver host2(rw,sync,no_root_squash,no_subtree_check)

    /cmservicedir/cloudera-scm-headlamp host2(rw,sync,no_root_squash,no_subtree_check)

    /cmservicedir/cloudera-service-monitor host2(rw,sync,no_root_squash,no_subtree_check) 

    /cmservicedir/etc-cloudera-scm-agent host2(rw,sync,no_root_squash,no_subtree_check)

3) Export the mounts by running the following command on the NFS server:

   $ exportfs -a

4) Set up the filesystem mounts on host1 and host2:

   a. Make sure that the NFS mount helper is installed:

    $ apt-get install nfs-common

   b. Create the mount points on both host1 and host2:

    $ mkdir -p /var/lib/cloudera-host-monitor

    $ mkdir -p /var/lib/cloudera-scm-agent

    $ mkdir -p /var/lib/cloudera-scm-eventserver

    $ mkdir -p /var/lib/cloudera-scm-headlamp

    $ mkdir -p /var/lib/cloudera-service-monitor

    $ mkdir -p /etc/cloudera-scm-agent

   c. Mount the following directories to the NFS mounts, on both host1 and host2 (NFS refers to the NFS server’s hostname or IP address):

    $ mount -t nfs NFS:/cmservicedir/cloudera-host-monitor /var/lib/cloudera-host-monitor

    $ mount -t nfs NFS:/cmservicedir/cloudera-scm-agent /var/lib/cloudera-scm-agent

    $ mount -t nfs NFS:/cmservicedir/cloudera-scm-eventserver /var/lib/cloudera-scm-eventserver

    $ mount -t nfs NFS:/cmservicedir/cloudera-scm-headlamp /var/lib/cloudera-scm-headlamp

    $ mount -t nfs NFS:/cmservicedir/cloudera-service-monitor /var/lib/cloudera-service-monitor

    $ mount -t nfs NFS:/cmservicedir/etc-cloudera-scm-agent /etc/cloudera-scm-agent

5) Set up fstab to persist the mounts across restarts. Edit the /etc/fstab file and add these lines:

NFS:/cmservicedir/cloudera-host-monitor /var/lib/cloudera-host-monitor nfs auto,noatime,nolock,intr,tcp,actimeo=1800 0 0

NFS:/cmservicedir/cloudera-scm-agent /var/lib/cloudera-scm-agent nfs auto,noatime,nolock,intr,tcp,actimeo=1800 0 0

NFS:/cmservicedir/cloudera-scm-eventserver /var/lib/cloudera-scm-eventserver nfs auto,noatime,nolock,intr,tcp,actimeo=1800 0 0

NFS:/cmservicedir/cloudera-scm-headlamp /var/lib/cloudera-scm-headlamp nfs auto,noatime,nolock,intr,tcp,actimeo=1800 0 0

NFS:/cmservicedir/cloudera-service-monitor /var/lib/cloudera-service-monitor nfs auto,noatime,nolock,intr,tcp,actimeo=1800 0 0

NFS:/cmservicedir/etc-cloudera-scm-agent /etc/cloudera-scm-agent nfs auto,noatime,nolock,intr,tcp,actimeo=1800 0 0
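
The /etc/exports and /etc/fstab entries in this section all derive from the same six directory pairs, so they can be generated from one table instead of being typed by hand. The sketch below only prints the lines for review; the function names are hypothetical, and the paths and options are copied from the steps above.

```shell
#!/bin/sh
# Each line pairs an export name under /cmservicedir with its client mount point.
mgmt_dirs() {
    cat <<'EOF'
cloudera-host-monitor /var/lib/cloudera-host-monitor
cloudera-scm-agent /var/lib/cloudera-scm-agent
cloudera-scm-eventserver /var/lib/cloudera-scm-eventserver
cloudera-scm-headlamp /var/lib/cloudera-scm-headlamp
cloudera-service-monitor /var/lib/cloudera-service-monitor
etc-cloudera-scm-agent /etc/cloudera-scm-agent
EOF
}

# Print the /etc/exports entries for the given client hosts.
# Usage: mgmt_exports HOST [HOST...]
mgmt_exports() {
    for host in "$@"; do
        # The mount point (second column) is not needed for exports.
        mgmt_dirs | while read -r name target; do
            printf '/cmservicedir/%s %s(rw,sync,no_root_squash,no_subtree_check)\n' "$name" "$host"
        done
    done
}

# Print the /etc/fstab entries for the given NFS server name.
# Usage: mgmt_fstab NFS_SERVER
mgmt_fstab() {
    server=$1
    mgmt_dirs | while read -r name target; do
        printf '%s:/cmservicedir/%s %s nfs auto,noatime,nolock,intr,tcp,actimeo=1800 0 0\n' "$server" "$name" "$target"
    done
}

mgmt_exports host1 host2
mgmt_fstab NFS
```

Keeping the table in one place makes the exception visible: etc-cloudera-scm-agent is the only export that mounts under /etc rather than /var/lib.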

Installing the Primary

1) On host1, install the cloudera-manager-daemons and cloudera-manager-agent packages:

  • $ sudo apt-get install cloudera-manager-daemons
  • $ sudo apt-get install cloudera-manager-agent
  • Install Oracle JDK 8 if it is not already installed.

2) Update the Cloudera Manager Agent configuration by editing /etc/cloudera-scm-agent/config.ini:

server_host=host1

listening_hostname=host1

3) Make sure that the cloudera-scm user and the cloudera-scm group have access to the mounted directories under /var/lib:

  • $ chown -R cloudera-scm:cloudera-scm /var/lib/cloudera-scm-eventserver
  • $ chown -R cloudera-scm:cloudera-scm /var/lib/cloudera-service-monitor
  • $ chown -R cloudera-scm:cloudera-scm /var/lib/cloudera-host-monitor
  • $ chown -R cloudera-scm:cloudera-scm /var/lib/cloudera-scm-agent
  • $ chown -R cloudera-scm:cloudera-scm /var/lib/cloudera-scm-headlamp

4) Restart the agent on host1:

          $ service cloudera-scm-agent restart

5) Go to the Cloudera Manager Admin Console running on host1 (for example, http://host1:7180):

  1. Go to the Hosts tab and make sure that a host with name <host1> is reported.
  2. Click Add Cloudera Management Service and add the roles (Activity Monitor, Host Monitor, Service Monitor, Event Server, Alert Publisher) on host1.
    1. Use the defaults for the Host Monitor and Service Monitor storage directories.

6) Check whether all the services are added without any errors.
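
The config.ini change in step 2 can also be scripted so that it is applied the same way every time. The helper below is a sketch and not part of Cloudera Manager: it rewrites an existing key=value line, whether active or commented out, and refuses to act if the key is missing, because appending a key at the end of the file could place it in the wrong INI section. It assumes the value contains no "|" or "&" characters, which holds for ordinary hostnames.

```shell
#!/bin/sh
# Set KEY to VALUE in an INI-style file, replacing the existing line in place.
# Usage: set_ini_key FILE KEY VALUE
set_ini_key() {
    ini_file=$1
    ini_key=$2
    ini_value=$3
    # Accept an active "key=..." line or a commented "# key=..." line.
    if ! grep -q "^[#[:space:]]*${ini_key}[[:space:]]*=" "$ini_file"; then
        echo "$ini_key not found in $ini_file" >&2
        return 1
    fi
    ini_tmp=$(mktemp) || return 1
    # Write through a temporary copy, then overwrite the original's contents
    # so that its owner and permissions are preserved.
    sed "s|^[#[:space:]]*${ini_key}[[:space:]]*=.*|${ini_key}=${ini_value}|" "$ini_file" > "$ini_tmp" &&
        cat "$ini_tmp" > "$ini_file"
    ini_status=$?
    rm -f "$ini_tmp"
    return "$ini_status"
}

# Example (run as root on host1):
#   set_ini_key /etc/cloudera-scm-agent/config.ini server_host host1
#   set_ini_key /etc/cloudera-scm-agent/config.ini listening_hostname host1
```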

Installing the Secondary

1) Stop all Cloudera Management Service roles using the Cloudera Manager Admin Console:

  • On the Home page, click the right side of Cloudera Management Service and select Stop.
  • Click Stop to confirm. The Command Details window shows the progress of stopping the roles.
  • When Command completed with n/n successful subcommands appears, the task is complete. Click Close.

2) Stop the cloudera-scm-agent service on host1:

  • $ service cloudera-scm-agent stop

3) On host2, install the cloudera-manager-daemons and cloudera-manager-agent packages:

  • $ sudo apt-get install cloudera-manager-daemons
  • $ sudo apt-get install cloudera-manager-agent
  • Install Oracle JDK 8 if it is not already installed.

4) Update the Cloudera Manager Agent configuration by editing /etc/cloudera-scm-agent/config.ini:

server_host=host2

listening_hostname=host2

5) Go to the Cloudera Manager Admin Console running on host1 (for example, http://host1:7180):

  1. Go to the Hosts tab and make sure that a host with name <host1> is reported.
  2. Click Add Cloudera Management Service and add the roles (Activity Monitor, Host Monitor, Service Monitor, Event Server, Alert Publisher) on host1.
    1. Use the defaults for the Host Monitor and Service Monitor storage directories.

6) Check whether all the services are added without any errors.

7) Restart the agent on host2:

  • $ service cloudera-scm-agent restart

Testing failover

Before testing failover, make sure that the Cloudera Manager Server, the Cloudera Manager Agent and the Cloudera Management Service are all running on one of the two hosts (host1 or host2).

  1. If host1 is the primary, on host1:
    1. Stop the Cloudera Management Service roles through the Cloudera Manager Admin Console on host1.
    2. Stop the agent: $ sudo service cloudera-scm-agent hard_stop_confirmed
    3. Stop the server: $ sudo service cloudera-scm-server stop
  2. On host2:
    1. Start the server: $ sudo service cloudera-scm-server start
    2. Start the agent: $ sudo service cloudera-scm-agent start
    3. Start the Cloudera Management Service roles through the Cloudera Manager Admin Console on host2.

Now check that all the Cloudera Management Service roles are running and that the console shows the other Hadoop services up and running.
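
Because the order of operations matters during a manual failover, it can help to keep the sequence in a small script. The sketch below only prints the steps for a given pair of hosts rather than executing them, since the Cloudera Management Service roles are stopped and started from the Admin Console. The function name failover_plan is hypothetical; the commands are the ones listed above.

```shell
#!/bin/sh
# Print the ordered manual failover steps from one host to the other.
# Usage: failover_plan FROM_HOST TO_HOST
failover_plan() {
    from=$1
    to=$2
    cat <<EOF
1. [$from console] stop the Cloudera Management Service roles
2. [$from] sudo service cloudera-scm-agent hard_stop_confirmed
3. [$from] sudo service cloudera-scm-server stop
4. [$to] sudo service cloudera-scm-server start
5. [$to] sudo service cloudera-scm-agent start
6. [$to console] start the Cloudera Management Service roles
EOF
}

failover_plan host1 host2
```

Swapping the arguments (failover_plan host2 host1) prints the corresponding failback sequence.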

6) Conclusion

The availability of Cloudera Manager and its management service helps ensure uninterrupted processing of big data in the Hadoop environment. It also helps with scaling and monitoring the Hadoop architecture as data volumes keep growing.

Configuring high availability for the Cloudera Manager Server and its management service helps keep the Hadoop services up and running continuously and gives us a single point of monitoring regardless of cluster size.
