

Hadoop Installation on Monsoon


December 5, 2017

Mike DeHart

SAP Vora
cluster | hadoop | hdp | hortonworks | monsoon


Description:

This tutorial is offered as a quick-and-dirty guide to installing a 3-node Hadoop cluster on SAP Monsoon. Most configurations are kept close to default, and as such this guide is ideal for development and testing environments.

Due to some proxy constraints on Monsoon, I will note workarounds or additional steps related to Monsoon with the label Monsoon only.

Environment:

Nodes: 4 CPU / 16 GB RAM x 3

OS: SUSE 12 SP01

HortonWorks Ambari 2.6.0

Contents:

1. Setting up the cluster
2. Prerequisites
3. Ambari Installation
4. HDP Installation

1. Setting up the cluster

I will be using SAP's Monsoon to provision the servers where Hadoop will be installed. If you do not have access to Monsoon, clusters can be provisioned on another cloud platform (AWS, Azure, etc.), but that is not covered in this tutorial.

Ambari will run on nodes with as little as 8 GB RAM, but more is recommended, especially if the cluster will be used for testing.

Generally, three medium (4 CPU / 16 GB) nodes are recommended. Type: medium_4_16

In this instance I have one master node of type HANA: hana_4_32 (4 CPU / 32 GB) and two medium nodes for my workers.

All nodes are running SUSE 12 SP01. HDP 2.6.2 supports SUSE (64-bit) 11.3, 11.4, 12.1, and 12.2. Make sure all nodes are running the same operating system and patch level.

Make sure your SSH key is provisioned for these nodes. Under your user profile, on the Authentication tab, create an SSH key if one is not already created and click Provision. If you are creating a new key, be sure to save your private key to a local file as there is no way to get it back!

Save your private key to your local machine. We'll be using PuTTY to connect to our nodes. You will need only the putty.exe client and puttygen.exe in order to convert the private key to a readable format.

From your local user folder, create a folder named .ssh. On Windows this will likely have to be done through the command line, as Explorer doesn't like creating folders that begin with a period.

mkdir "C:\Users\<username>\.ssh"

The new .ssh folder is where we'll store our private key. Once the key is saved there, launch puttygen.exe.

Click on the Load button and select your saved private key (if your key isn't listed, make sure All Files is selected in the file dialog).

Once loaded, select Save Private Key, select Yes when prompted to save the key without a passphrase, and save it to a file in your .ssh directory.

This new file is saved as a .ppk file and is what we'll use with PuTTY to connect to our servers.

Once saved, launch putty.exe. On the main page, specify the hostname for one of the nodes.

Under Connection > Data, specify your username under Auto-login username.

Under Connection > SSH > Auth make sure:

Allow agent forwarding is checked

Under Private Key for Authentication, browse to your puttygen-created private key file

Add any other customization you want (appearance, selection behavior, etc.), then navigate back to the Session page, give your profile a name, and click Save.

Once the profile is saved, click Open to connect to the node.

Once the connection is successful, repeat the above process for the other two nodes.

2. Prerequisites

Now that we can connect via SSH to all three nodes, we will
do a quick update and create our administration user.

Monsoon instance users are pre-configured with passwordless sudo access, as long as your user is part of the sysadmin group on the server.

This means issuing a sudo su command will allow you to run as root.
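To confirm this before proceeding, you can check your group membership and test passwordless sudo (standard Linux commands; sysadmin is Monsoon's group name):

> groups
> sudo -n true && echo "passwordless sudo works"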

On all three nodes we'll first do an update to make sure we're running the latest version:

> sudo su
# zypper update -t patch

Since we already have a sysadmin group with passwordless sudo access, we only need to create a new user and make sure it is added to the sysadmin group (as well as any other groups you may need). I'm naming my user cadmin (cluster admin):

# /usr/sbin/useradd -m -g users -G sysadmin,monsoon cadmin

Create this user on all three nodes in the cluster.
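A quick sanity check with the standard id command on each node confirms the user exists with the expected groups:

> id cadmin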

We'll use this user to connect to our servers when installing the Ambari cluster. As such, this user needs passwordless connectivity to all nodes as well as sudo access.

From your primary node, we'll create an RSA key for the new cadmin user to allow key-based SSH authentication.

Log in as the cadmin user and run ssh-keygen to create the RSA key:

> sudo su cadmin
> ssh-keygen -t rsa
[ENTER]
[ENTER]

Two files are created in the user's .ssh folder (/home/cadmin/.ssh/):

id_rsa: the private key. We'll need this during Ambari installation, so save it to Notepad so we can access it quickly later.

id_rsa.pub: the public key. We'll need to add this to an authorized_keys file on all nodes in order for cadmin to connect using the private key.

The authorized_keys file is located in the user's .ssh folder and is read whenever that user tries to connect to the server. As such, we'll first copy the public key into this file on our main node:

> cat ~/.ssh/id_rsa.pub > ~/.ssh/authorized_keys
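SSH is strict about permissions on these files. If key authentication fails later, it is worth confirming on every node that the .ssh directory and authorized_keys file are readable only by their owner:

> chmod 700 ~/.ssh
> chmod 600 ~/.ssh/authorized_keys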

Save this public key to Notepad as well, since we will add it to the other two nodes under the cadmin home directory.

Connect to your other nodes via PuTTY and run:

> sudo su cadmin
> mkdir -p ~/.ssh
> echo "XXXXXX" > ~/.ssh/authorized_keys

Where XXXXXX is the cadmin public key saved from the step above.

You can run cat on the file to make sure it was written
correctly:

> cat ~/.ssh/authorized_keys

Now we can test to make sure cadmin is able to connect.

From your primary node (where you first generated the keypair) run:

> ssh cadmin@<NODE-2>

Where NODE-2 is the hostname of one of your worker nodes. You may get a prompt regarding the authenticity of the host; answer yes and you should be connected.

If the key was not accepted or you are prompted for a password, double-check that the public key is listed in the authorized_keys file and troubleshoot key-based SSH authentication from there.
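Running the SSH client in verbose mode is also a quick way to see exactly why a key is being rejected:

> ssh -v cadmin@<NODE-2>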

3. Ambari Installation

Assuming we now have a working cadmin user, in this section we'll add the Ambari repository and install the cluster manager.

The Ambari manager will only be installed on our primary (master) node, so the steps below only need to be applied once.

First, see the HortonWorks Ambari Repositories page and copy the Repo File link for your flavor of OS. In my case, for SLES 12.1, the link is:

http://public-repo-1.hortonworks.com/ambari/sles12/2.x/updates/2.6.0.0/ambari.repo

Connect via SSH to your primary node, if you aren't already, and issue the following:

> sudo su
# cd /etc/zypp/repos.d
# wget http://public-repo-1.hortonworks.com/ambari/sles12/2.x/updates/2.6.0.0/ambari.repo
# zypper ref

This will add the Ambari repository to the zypper package manager and refresh the repository list. After the refresh, you should see a line pulling packages from the ambari Version ambari-2.6.0.0 repository.
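You can also confirm the repository was registered by listing zypper's repositories:

# zypper repos | grep -i ambari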

Now we'll install the Ambari server:

# zypper install ambari-server

Once installed, run the below to set up Ambari (as root):

# ambari-server setup

Accept the defaults for all prompts.
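If you know you will accept the defaults, ambari-server setup also supports a silent mode that skips the prompts (assuming the defaults suit your environment):

# ambari-server setup -s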

****Monsoon Only****:

Due to the built-in proxy, in some Monsoon instances ambari-server setup will be unable to get JDK 1.8 or the JCE policy files from the public internet. The easiest workaround for this is to kill the setup process (Ctrl-C) and manually use curl or wget to download and save the files to their respective directories.

The setup output will hang after a prompt similar to:

Downloading JDK from http://public-repo-1.hortonworks.com/ARTIFACTS/jdk-8u112-linux-x64.tar.gz to /var/lib/ambari-server/resources/jdk-8u112-linux-x64.tar.gz

In this case, after killing the process, a simple wget command will use the correct OS proxy to obtain the file:

# wget http://public-repo-1.hortonworks.com/ARTIFACTS/jdk-8u112-linux-x64.tar.gz -O /var/lib/ambari-server/resources/jdk-8u112-linux-x64.tar.gz

And again for the JCE Policy file:

Downloading JCE Policy archive from http://public-repo-1.hortonworks.com/ARTIFACTS/jce_policy-8.zip to /var/lib/ambari-server/resources/jce_policy-8.zip

# wget http://public-repo-1.hortonworks.com/ARTIFACTS/jce_policy-8.zip -O /var/lib/ambari-server/resources/jce_policy-8.zip
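Before re-running setup, it is worth confirming that both archives landed in the resources directory:

# ls -lh /var/lib/ambari-server/resources/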

Finally, re-run the setup command and both files should be picked up.

********

Once setup completes, restart the Ambari server; in the next section we will install the Hadoop services:

# /usr/sbin/ambari-server restart
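You can confirm the server came back up with:

# ambari-server status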

4. HDP Installation

Now that we have the Ambari manager running, we can access the UI from the web via port 8080:

http://<node1>.mo.sap.corp:8080

Once the page loads, you can log in with the default
credentials:

Username: admin

Password: admin

Once logged in, you can access the Users link on the left to
change the admin password if desired.

Otherwise, click on Launch Install Wizard to begin creating your Hadoop cluster.

Enter a name for your cluster and click Next.

Make sure Use Public Repository is selected. If it is not, this may be due to a proxy issue (especially on Monsoon); see below.

****Monsoon Only****:

By default, Ambari won't be able to read the public repository until we update the proxy.

Close the UI and stop the Ambari server:

> sudo ambari-server stop

We must add the proxy to /var/lib/ambari-server/ambari-env.sh.

Open the file and, under AMBARI_JVM_ARGS, add the following:

-Dhttp.proxyHost=<yourProxyHost> -Dhttp.proxyPort=<yourProxyPort>

To confirm your OS-level proxy you can issue:

> echo $http_proxy

Which should provide the host and port to enter under AMBARI_JVM_ARGS.
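For illustration, with a hypothetical proxy of proxy.example.corp on port 8080, the line in ambari-env.sh would have the two options appended to whatever is already set, along the lines of:

export AMBARI_JVM_ARGS="$AMBARI_JVM_ARGS -Dhttp.proxyHost=proxy.example.corp -Dhttp.proxyPort=8080"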

For more advanced proxy configurations or proxies that require authentication, see the HortonWorks documentation.

Once added, save the file and restart the Ambari server:

> sudo ambari-server start

********

Under Select Version, select your HDP version (in my case, HDP 2.6) and click Next.

Under Install Options we need to enter the domains of all three of our hosts, as well as connectivity information (remember that cadmin private key I told you to save?).

Add all three fully-qualified Monsoon domains to the Target Hosts text box and copy/paste the cadmin private key under Host Registration Information. Make sure to update the user from root to cadmin as well.

Then click Register and Confirm to continue.

At this point, Ambari will connect to and provision the hosts in the cluster. If any errors occur, click the Failed status to view the install log and troubleshoot further via the web.

In my case, registration failed with the error: <host> failed due to EOF occurred in violation of protocol (_ssl.c:661)

From a web search I was able to fix the issue by adding:

force_https_protocol=PROTOCOL_TLSv1_2

under the [security] section of /etc/ambari-agent/conf/ambari-agent.ini on all nodes.
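The relevant section of ambari-agent.ini ends up looking like this:

[security]
force_https_protocol=PROTOCOL_TLSv1_2

Then restart the agent on each node so the change is picked up:

# ambari-agent restart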

Once all nodes succeed, you can see the results of all health
checks and address any other warnings that may have been
raised.

When finished, click Next.

Here is where we choose the services for your Hadoop installation. The services chosen will differ depending on your needs.

HDFS and YARN are required for the majority of Hadoop installations. In my case I am using this cluster for Vora and Spark testing, so I've selected: HDFS, YARN + MR2, Tez, Hive, ZooKeeper, Pig, Spark, and Ambari Metrics.

Any prerequisites that are needed for selected services will automatically be added.

Next, we can assign services to their respective nodes. In most situations these can be left at the defaults.

On the next page we can assign slaves and clients to our nodes. Generally, it is a good idea to assign more rather than fewer. I assign Clients, Spark servers, NodeManagers, and DataNodes to all nodes.

Next we have to configure all the services. There will be errors that need addressing, indicated by red circles.

Most of these are easily fixed by removing the directories starting with /home, with the exception of Hive.

Hive requires that we set up a database. For this we'll use PostgreSQL.

SSH to your master node and log in as root:

> sudo su
# zypper install postgresql-jdbc
# ls /usr/share/java/postgresql-jdbc.jar
# chmod 644 /usr/share/java/postgresql-jdbc.jar
# ambari-server setup --jdbc-db=postgres --jdbc-driver=/usr/share/java/postgresql-jdbc.jar

Now we need to log in to postgres and create our database, user, and password. In this case we're using hive for all three:

# sudo su postgres
> psql
postgres=# create database hive;
postgres=# create user hive with password 'hive';
postgres=# grant all privileges on database hive to hive;
postgres=# \q
> exit

Now we just need to back up and update the pg_hba configuration file.

As root:

# cp /var/lib/pgsql/data/pg_hba.conf /var/lib/pgsql/data/pg_hba.conf.bak
# vi /var/lib/pgsql/data/pg_hba.conf

Add hive to the list of users at the bottom of the file (so it
reads hive,ambari,mapred)

Save and exit with :wq

Then restart postgres:

> sudo service postgresql restart
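Before returning to the wizard, you can verify the hive login works (assuming host connections are permitted in pg_hba.conf):

> psql -U hive -d hive -h localhost -c '\conninfo'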

Now, back in the cluster setup, select Existing PostgreSQL Database and make sure hive is set for the DB name, username, and password.

Make sure the Database URL also correctly reflects the node where we installed and configured PostgreSQL, and test the database connection.
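For reference, the Database URL field uses the standard PostgreSQL JDBC format, so with the master node from this example it would look something like:

jdbc:postgresql://<node1>.mo.sap.corp:5432/hive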

Once successful, click Next and the deployment should begin.

Similar to when we registered the hosts, the logs for any failures can be viewed by clicking on the respective Failed status.

Possible errors are too vast to cover here, but web searches or searches of the HortonWorks forums will most likely provide answers.

Once all deployments are successful, click Next to access the Ambari dashboard and view your services. Any alerts can also be addressed, and services can be further customized.

Congratulations! You now have a deployed Hadoop cluster!
