Sep 19 2017

Practical Tips for SPARC: Using Hadoop and Spark on SPARC Servers/Solaris Platform – Configuring Hadoop Cluster Environments (Part 1)

[Image: Fujitsu M10-1 front view]

In our previous blogs, we explained how to configure a Hadoop single-node environment. In today's edition, we explain the introductory steps necessary for configuring a Hadoop cluster environment. This is Part 1 of a three-part series.

About Hadoop Cluster
In a Hadoop cluster, Zookeeper (a separate download) provides the distributed coordination needed to fail over between the active and standby NameNode and ResourceManager, while the JournalNodes keep the actual NameNode metadata (the edit log) synchronized between the active and standby NameNodes.
Both Zookeeper and the JournalNodes require a minimum of three nodes within a single cluster, so please keep this in mind when preparing the environment.

For more details about Zookeeper, please refer to:
https://zookeeper.apache.org/
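As an illustration of the three-node quorum, a minimal "zoo.cfg" for the ensemble used in this series could look like the sketch below. The hostnames match the guest domains introduced later; the timing values and ports (2181, 2888, 3888) are common Zookeeper defaults rather than settings prescribed by this series, and "dataDir" matches the "ZOO_DATADIR" value set later in this post.

```
# Example zoo.cfg for a three-server ensemble (quorum = 2 of 3)
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/hdpool/data/zookeeper
clientPort=2181
server.1=ctrl-node1:2888:3888
server.2=ctrl-node2:2888:3888
server.3=jrnl-node:2888:3888
```

Each server additionally needs a "myid" file in "dataDir" containing its own number (1, 2 or 3) so that it can find itself in the server list.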

System Configuration

[Image: system configuration diagram]

In order to satisfy the redundancy requirements of the Hadoop cluster, three physical nodes are used. Each Hadoop environment is configured as a guest domain using "Oracle VM for SPARC".
The guest domains are as follows.

Physical node 1
  • ctrl-node1 (for NameNode, ResourceManager, JobHistoryServer, Zookeeper, JournalNode)
  • data-node1 (for DataNode, NodeManager)
Physical node 2
  • ctrl-node2 (for NameNode, ResourceManager, Zookeeper, JournalNode)
  • data-node2 (for DataNode, NodeManager)
Physical node 3
  • jrnl-node (for Zookeeper, JournalNode)

The test configuration is the same as in the previous blogs, with Zookeeper added for the cluster environment.

  • Server: Fujitsu M10-1
  • OS: Oracle Solaris 11.3 SRU 10.7
  • Java: JDK 1.7.0_111
  • Hadoop: 2.7.3
  • Zookeeper: 3.4.9

Configuring Virtual Machines: Primary Domain Installation
The OS is installed in the primary domain of each Fujitsu M10 server. Please refer to the Solaris manual for OS installation and initial settings.
http://docs.oracle.com/cd/E53394_01/

The hostname of each primary domain is arbitrary.

Guest Domain Installation
The five guest domains mentioned above should be created. The hostname of each domain should be the same as the domain name.

Please refer to the Solaris manual for Guest Domain installation.
http://docs.oracle.com/cd/E69554_01/html/E69557/index.html

Devices for Hadoop data should be added to each guest domain as virtual disks, separate from the system volume. Please run the following commands as the "root" user. Unless otherwise noted, the following procedures are performed in the same way on all guest domains.

Configuring ZFS Storage Pool
Configure a ZFS storage pool to store the Hadoop data. The pool name is "hdpool".
# zpool create hdpool <devices for hadoop data>

Required Packages Installation
As indicated in the previous blog, Java 7 should be installed.
http://blog.global.fujitsu.com/index.php/1-using-hadoop-and-spark-on-sparc-servers-solaris-platform-configuring-hadoop-single-node-environment-part-1/

Editing /etc/hosts
Edit the "/etc/hosts" file to set the IP addresses of all nodes. The following example is for ctrl-node1.

::1 localhost
127.0.0.1 localhost loghost
xxx.xxx.xxx.xxx ctrl-node1 ctrl-node1.local
xxx.xxx.xxx.xxx ctrl-node2
xxx.xxx.xxx.xxx jrnl-node
xxx.xxx.xxx.xxx data-node1
xxx.xxx.xxx.xxx data-node2

Adding Hadoop Users and Groups
Add the Hadoop group as "hadoop" (GID 200). The group of all Hadoop users added below should be "hadoop".
# groupadd -g 200 hadoop

Add the user ID for running "NameNode" and "DataNode" as "hdfs", and set the password.
# useradd -u 200 -m -g hadoop hdfs
# passwd hdfs

Add the user ID for running "ResourceManager" and "NodeManager" as "yarn", and set the password.
# useradd -u 201 -m -g hadoop yarn
# passwd yarn

Add the user ID for running "History Server" as "mapred", and set the password.
# useradd -u 202 -m -g hadoop mapred
# passwd mapred

Add the user ID for running user programs as "spark", and set the password.
# useradd -u 101 -m -g hadoop spark
# passwd spark
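Since the same numeric user and group IDs are deliberately fixed on every node, it is worth verifying them after creating the accounts; mismatched IDs between nodes can cause confusing file-ownership problems on the data disks. Below is a minimal sketch; the "check_ids" helper is our own illustration, not part of Hadoop or Solaris.

```shell
# check_ids: confirm that an account's UID and GID match the values
# chosen in this series. Helper name and output format are ours.
check_ids() {
  user=$1; want_uid=$2; want_gid=$3
  uid=$(id -u "$user" 2>/dev/null) || { echo "$user: no such user"; return 1; }
  gid=$(id -g "$user")
  if [ "$uid" = "$want_uid" ] && [ "$gid" = "$want_gid" ]; then
    echo "$user: OK (uid=$uid gid=$gid)"
  else
    echo "$user: MISMATCH (uid=$uid gid=$gid, expected $want_uid/$want_gid)"
    return 1
  fi
}

# Expected IDs on every node; "status" records any mismatch without
# aborting the remaining checks.
status=0
check_ids hdfs   200 200 || status=1
check_ids yarn   201 200 || status=1
check_ids mapred 202 200 || status=1
check_ids spark  101 200 || status=1
```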

Hadoop Installation
Download Hadoop and Zookeeper, and transfer the archives to each node.
http://www.apache.org/dyn/closer.cgi/hadoop/common/hadoop-2.7.3/hadoop-2.7.3.tar.gz
https://zookeeper.apache.org/releases.html#download

Install Hadoop.
# cd /opt
# <Extracting Hadoop and Zookeeper archives>
# ln -s hadoop-2.7.3 hadoop
# ln -s zookeeper-3.4.9 zookeeper

Change the owner of the Hadoop and Zookeeper files to root, and the group to hadoop. Set the permissions of all files to "755".
# chown -R root:hadoop /opt/hadoop-2.7.3
# chmod -R 755 /opt/hadoop-2.7.3
# chown -R root:hadoop /opt/zookeeper-3.4.9
# chmod -R 755 /opt/zookeeper-3.4.9

Setting Up SSH
Hadoop uses SSH to connect to each Hadoop process, so all users should set up a public/private key pair for passphrase-less SSH authentication.

# su - hdfs
# ssh-keygen -t dsa -P "" -f ~/.ssh/id_dsa
# cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
# logout
# su - yarn
# ssh-keygen -t dsa -P "" -f ~/.ssh/id_dsa
# cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
# logout
# su - mapred
# ssh-keygen -t dsa -P "" -f ~/.ssh/id_dsa
# cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
# logout

Copy the public keys of the other nodes to the local node, and then append them to the authorized_keys file.
# su - hdfs
# cd .ssh
# scp <other node>:/export/home/<user name>/.ssh/id_dsa.pub <other node>_id_dsa.pub
# cat <other node>_id_dsa.pub >> authorized_keys
# logout
# su - yarn
# cd .ssh
# scp <other node>:/export/home/<user name>/.ssh/id_dsa.pub <other node>_id_dsa.pub
# cat <other node>_id_dsa.pub >> authorized_keys
# logout
# su - mapred
# cd .ssh
# scp <other node>:/export/home/<user name>/.ssh/id_dsa.pub <other node>_id_dsa.pub
# cat <other node>_id_dsa.pub >> authorized_keys
# logout

When the following message is displayed during the procedure, enter "yes".

The authenticity of host 'XXXXXXXX (xx.xx.xx.xxx)' can't be established.
RSA key fingerprint is xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx.
Are you sure you want to continue connecting (yes/no)?

If you are prompted for a password, enter the password.
Confirm that each user can authenticate without being prompted for a password.
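If password prompts persist even though the keys are in place, the usual cause (with sshd's default "StrictModes" setting) is that the ".ssh" directory or key files are writable by group or others. A minimal, safely re-runnable fix, assuming the default home directory layout:

```shell
# sshd silently ignores keys when ~/.ssh or authorized_keys is too open.
# mkdir -p and touch are no-ops when the files already exist.
mkdir -p ~/.ssh
chmod 700 ~/.ssh
touch ~/.ssh/authorized_keys
chmod 600 ~/.ssh/authorized_keys
```

Repeat as each of the "hdfs", "yarn" and "mapred" users.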

Setting Up Environment Variables
Environment variables for running Hadoop should be set for each user.
Set the following environment variables in "$HOME/.profile" for the "hdfs" user.

export JAVA_HOME=/usr/java
export PATH=$PATH:/opt/hadoop/bin:/opt/hadoop/sbin
export HADOOP_HOME=/opt/hadoop
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export HADOOP_PID_DIR=/hdpool/run/hdfs
export ZOOKEEPER_USER=hdfs
export ZOO_LOG_DIR=/hdpool/log/zookeeper
export ZOO_PID_DIR=/hdpool/run/zookeeper
export ZOOPIDFILE=$ZOO_PID_DIR/zookeeper_server.pid
export ZOO_DATADIR=/hdpool/data/zookeeper
export ZOOCFG=/opt/zookeeper/conf
export HADOOP_GROUP=hadoop

Set the following environment variables in "$HOME/.profile" for the "yarn" user.

export JAVA_HOME=/usr/java
export PATH=$PATH:/opt/hadoop/bin:/opt/hadoop/sbin
export HADOOP_HOME=/opt/hadoop
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export YARN_PID_DIR=/hdpool/run/yarn

Set the following environment variables in "$HOME/.profile" for the "mapred" user.
export JAVA_HOME=/usr/java
export PATH=$PATH:/opt/hadoop/bin:/opt/hadoop/sbin
export HADOOP_HOME=/opt/hadoop
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export MAPRED_PID_DIR=/hdpool/run/mapred

Set the following environment variables in "$HOME/.profile" for the "spark" user.

export JAVA_HOME=/usr/java
export PATH=$PATH:/opt/hadoop/bin:/opt/hadoop/sbin
export HADOOP_HOME=/opt/hadoop
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
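The PID, log and data directories referenced by these variables live on the "hdpool" created earlier, but nothing creates them automatically. The sketch below prepares them; the helper name "prepare_hdpool_dirs" and its optional base argument are our own illustration, and the ownership choices are an assumption based on which user writes into each directory. Run it as root on every guest domain.

```shell
# prepare_hdpool_dirs: create the PID/log/data directories that the
# environment variables above refer to. The base argument exists only
# for illustration; on the real nodes it is simply /hdpool.
prepare_hdpool_dirs() {
  base=${1:-/hdpool}
  mkdir -p "$base/run/hdfs" "$base/run/yarn" "$base/run/mapred" \
           "$base/run/zookeeper" "$base/log/zookeeper" "$base/data/zookeeper"
  # Hand each directory to the user that writes into it; skip silently
  # where an account is absent.
  for u in hdfs yarn mapred; do
    id "$u" >/dev/null 2>&1 && chown "$u:hadoop" "$base/run/$u" || true
  done
  # Zookeeper runs as the hdfs user in this series (ZOOKEEPER_USER=hdfs).
  if id hdfs >/dev/null 2>&1; then
    chown -R hdfs:hadoop "$base/run/zookeeper" "$base/log/zookeeper" \
                         "$base/data/zookeeper"
  fi
}

# prepare_hdpool_dirs        # run as root on every guest domain
```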

In our next post, we will cover the second part of configuring a Hadoop cluster environment.

Disclaimer
The information contained in this blog is for general information purposes only. While we endeavour to keep the information up-to-date and correct through testing on a practical system, we make no warranties of any kind about the completeness, accuracy, reliability, suitability or availability of this information. Any reliance you place on such information is strictly at your own risk.

The information in this blog is subject to change without notice.


About the Author:

Shinichiro Asai

Technical Sales Support, Platform Solution Unit, FUJITSU Japan
