Sep 22 2017

Practical Tips for SPARC: Using Hadoop and Spark on SPARC Servers/Solaris Platform – Configuring Hadoop Cluster Environments (Part 2)

In our previous blogs, we explained how to configure a Hadoop Single Node environment. In today's edition, we explain the introductory steps necessary for configuring a Hadoop Cluster environment on Oracle VM Server for SPARC. This is the second installment in a three-part series.

Creating Directories for Hadoop
Create directories for storing Hadoop data. Each directory must be created as a ZFS file system for production operation.
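The steps below assume that the ZFS pool "hdpool" already exists, as set up for the single node environment. If it does not yet exist in your environment, a minimal sketch for creating it on a single spare disk (the disk name "c0t1d0" is only a placeholder for your own device) is:
# zpool create hdpool c0t1d0
# zpool status hdpool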

Create a directory for log files for the "hdfs" user.
# zfs create -p hdpool/log/hdfs
# chown hdfs:hadoop /hdpool/log/hdfs

Create a directory for log files for the "yarn" user.
# zfs create -p hdpool/log/yarn
# chown yarn:hadoop /hdpool/log/yarn

Create a directory for log files for the "mapred" user.
# zfs create -p hdpool/log/mapred
# chown mapred:hadoop /hdpool/log/mapred

Create a directory for the HDFS metadata.
# zfs create -p hdpool/data/1/dfs/nn
# chmod 700 /hdpool/data/1/dfs/nn
# chown -R hdfs:hadoop /hdpool/data/1/dfs/nn

Create a directory for the HDFS data blocks.
# zfs create -p hdpool/data/1/dfs/dn
# chown -R hdfs:hadoop /hdpool/data/1/dfs/dn

Create a directory for the JournalNode.
# zfs create -p hdpool/data/1/dfs/jn
# chown -R hdfs:hadoop /hdpool/data/1/dfs/jn

Create directories for "yarn" users.
# zfs create -p hdpool/data/1/yarn/local
# zfs create -p hdpool/data/1/yarn/logs
# chown -R yarn:hadoop /hdpool/data/1/yarn/local
# chown -R yarn:hadoop /hdpool/data/1/yarn/logs

Create runtime directories for the "yarn", "hdfs" and "mapred" users.
# zfs create -p hdpool/run/yarn
# chown yarn:hadoop /hdpool/run/yarn
# zfs create -p hdpool/run/hdfs
# chown hdfs:hadoop /hdpool/run/hdfs
# zfs create -p hdpool/run/mapred
# chown mapred:hadoop /hdpool/run/mapred

Create a directory for temporary data.
# zfs create -p hdpool/tmp

Create directories for Zookeeper.
# zfs create -p hdpool/run/zookeeper
# chown -R hdfs:hadoop /hdpool/run/zookeeper
# zfs create -p hdpool/data/zookeeper
# chown -R hdfs:hadoop /hdpool/data/zookeeper
# zfs create -p hdpool/log/zookeeper
# chown -R hdfs:hadoop /hdpool/log/zookeeper

Confirm that all the directories have been created, as follows:
# zfs list -r hdpool

When the result is as shown below, they have been created successfully.
NAME USED AVAIL REFER MOUNTPOINT
hdpool 9.57M 1.94G 352K /hdpool
hdpool/data 2.96M 1.94G 320K /hdpool/data
hdpool/data/1 2.36M 1.94G 320K /hdpool/data/1
hdpool/data/1/dfs 1.17M 1.94G 336K /hdpool/data/1/dfs
hdpool/data/1/dfs/dn 288K 1.94G 288K /hdpool/data/1/dfs/dn
hdpool/data/1/dfs/jn 288K 1.94G 288K /hdpool/data/1/dfs/jn
hdpool/data/1/dfs/nn 288K 1.94G 288K /hdpool/data/1/dfs/nn
hdpool/data/1/yarn 896K 1.94G 320K /hdpool/data/1/yarn
hdpool/data/1/yarn/local 288K 1.94G 288K /hdpool/data/1/yarn/local
hdpool/data/1/yarn/logs 288K 1.94G 288K /hdpool/data/1/yarn/logs
hdpool/data/zookeeper 296K 1.94G 296K /hdpool/data/zookeeper
hdpool/log 1.47M 1.94G 352K /hdpool/log
hdpool/log/hdfs 288K 1.94G 288K /hdpool/log/hdfs
hdpool/log/mapred 288K 1.94G 288K /hdpool/log/mapred
hdpool/log/yarn 288K 1.94G 288K /hdpool/log/yarn
hdpool/log/zookeeper 288K 1.94G 288K /hdpool/log/zookeeper
hdpool/run 1.47M 1.94G 352K /hdpool/run
hdpool/run/hdfs 288K 1.94G 288K /hdpool/run/hdfs
hdpool/run/mapred 288K 1.94G 288K /hdpool/run/mapred
hdpool/run/yarn 288K 1.94G 288K /hdpool/run/yarn
hdpool/run/zookeeper 288K 1.94G 288K /hdpool/run/zookeeper
hdpool/tmp 288K 1.94G 288K /hdpool/tmp

Log files should be compressed using the ZFS compression feature, as in the single node case.
# zfs set compression=lz4 hdpool/log
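
Compression set on "hdpool/log" is inherited by the file systems below it and applies to newly written data. You can confirm the setting as follows:
# zfs get -r compression hdpool/log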

Run the following command on "ctrl-node1".
# su - hdfs -c 'echo "1" > /hdpool/data/zookeeper/myid'

Run the following command on "ctrl-node2".
# su - hdfs -c 'echo "2" > /hdpool/data/zookeeper/myid'

Run the following command on "jrnl-node".
# su - hdfs -c 'echo "3" > /hdpool/data/zookeeper/myid'

Setting Hadoop Configuration Files
Change the current directory to the directory of Hadoop configuration files (/opt/hadoop/etc/hadoop).

Add the following to "hadoop-env.sh":
export HADOOP_CONF_DIR=/opt/hadoop/etc/hadoop
export JAVA_HOME=/usr/java
export HADOOP_LOG_DIR=/hdpool/log/hdfs
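
On Solaris 11, /usr/java is normally a link to the installed JDK. If you are not sure it points to a suitable Java runtime, you can check before proceeding:
# ls -l /usr/java
# /usr/java/bin/java -version
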
Add the following to "yarn-env.sh".
export JAVA_HOME=/usr/java
export YARN_LOG_DIR=/hdpool/log/yarn
export HADOOP_HOME=/opt/hadoop
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export YARN_CONF_DIR=$HADOOP_HOME/etc/hadoop

Add the following to "mapred-env.sh":
export JAVA_HOME=/usr/java
export HADOOP_MAPRED_LOG_DIR=/hdpool/log/mapred
export HADOOP_MAPRED_IDENT_STRING=mapred

Edit "slaves" and add the write hostnames managed by "DataNode". In this configuration, the hostnames should be "data-node1" and "data-node2".
data-node1
data-node2
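
All hostnames used in this configuration ("ctrl-node1", "ctrl-node2", "jrnl-node", "data-node1" and "data-node2") must be resolvable from every node, either via DNS or /etc/hosts. As a sketch, an /etc/hosts fragment might look like the following; the IP addresses are placeholders for your own environment:
192.168.10.11 ctrl-node1
192.168.10.12 ctrl-node2
192.168.10.13 jrnl-node
192.168.10.21 data-node1
192.168.10.22 data-node2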

Edit "core-site.xml" as follows:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://mycluster</value>
</property>
<property>
<name>dfs.journalnode.edits.dir</name>
<value>/hdpool/data/1/dfs/jn</value>
</property>
<property>
<name>ha.zookeeper.quorum</name>
<value>ctrl-node1:2181,ctrl-node2:2181,jrnl-node:2181</value>
</property>
</configuration>
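
After editing each of the XML files in this section, it can be useful to check that the file is still well-formed. If the libxml2 tools are available on your system, a quick check is:
# xmllint --noout /opt/hadoop/etc/hadoop/core-site.xml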

Edit "mapred-site.xml" as follows:
<?xml version="1.0"?>
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>mapreduce.jobhistory.address</name>
<value>ctrl-node1:10020</value>
</property>
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>ctrl-node1:19888</value>
</property>
<property>
<name>yarn.app.mapreduce.am.staging-dir</name>
<value>/user</value>
</property>
</configuration>
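
The staging directory "/user" configured above must exist in HDFS before MapReduce jobs are submitted, and users need to be able to create their own staging subdirectories under it. Once HDFS is up and running (covered in Part 3), one simple, permissive way to prepare it is:
# su - hdfs -c 'hdfs dfs -mkdir -p /user'
# su - hdfs -c 'hdfs dfs -chmod 1777 /user'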

Edit "yarn-site.xml" as follows:
<?xml version="1.0"?>
<configuration>
<!-- Site specific YARN configuration properties -->
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
<name>yarn.resourcemanager.hostname</name>
<value>ctrl-node1</value>
</property>
<property>
<name>yarn.nodemanager.local-dirs</name>
<value>file:///hdpool/data/1/yarn/local</value>
</property>
<property>
<name>yarn.nodemanager.log-dirs</name>
<value>file:///hdpool/data/1/yarn/logs</value>
</property>
<property>
<name>yarn.log-aggregation-enable</name>
<value>true</value>
</property>
<property>
<description>Where to aggregate logs</description>
<name>yarn.nodemanager.remote-app-log-dir</name>
<value>hdfs://mycluster/var/log/hadoop-yarn/apps</value>
</property>
<property>
<name>yarn.resourcemanager.ha.enabled</name>
<value>true</value>
</property>
<property>
<name>yarn.resourcemanager.cluster-id</name>
<value>cluster1</value>
</property>
<property>
<name>yarn.resourcemanager.ha.rm-ids</name>
<value>rm1,rm2</value>
</property>
<property>
<name>yarn.resourcemanager.hostname.rm1</name>
<value>ctrl-node1</value>
</property>
<property>
<name>yarn.resourcemanager.hostname.rm2</name>
<value>ctrl-node2</value>
</property>
<property>
<name>yarn.resourcemanager.zk-address</name>
<value>ctrl-node1:2181,ctrl-node2:2181,jrnl-node:2181</value>
</property>
</configuration>
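
Note that the remote application log directory configured above must exist inside HDFS, not on the local file system. Once HDFS is up and running (covered in Part 3), it can be created along the following lines:
# su - hdfs -c 'hdfs dfs -mkdir -p /var/log/hadoop-yarn/apps'
# su - hdfs -c 'hdfs dfs -chown yarn:hadoop /var/log/hadoop-yarn/apps'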

Edit "hdfs-site.xml" as follows. In this configuration, the parameter "dfs.replication" should be "2", since the number of "DataNode" is 2.
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>dfs.datanode.data.dir</name>
<value>/hdpool/data/1/dfs/dn</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>/hdpool/data/1/dfs/nn</value>
</property>
<property>
<name>dfs.replication</name>
<value>2</value>
</property>
<property>
<name>dfs.permissions.supergroup</name>
<value>hadoop</value>
</property>
<property>
<name>dfs.nameservices</name>
<value>mycluster</value>
</property>
<property>
<name>dfs.ha.namenodes.mycluster</name>
<value>nn1,nn2</value>
</property>
<property>
<name>dfs.namenode.rpc-address.mycluster.nn1</name>
<value>ctrl-node1:8020</value>
</property>
<property>
<name>dfs.namenode.rpc-address.mycluster.nn2</name>
<value>ctrl-node2:8020</value>
</property>
<property>
<name>dfs.namenode.http-address.mycluster.nn1</name>
<value>ctrl-node1:50070</value>
</property>
<property>
<name>dfs.namenode.http-address.mycluster.nn2</name>
<value>ctrl-node2:50070</value>
</property>
<property>
<name>dfs.namenode.shared.edits.dir</name>
<value>qjournal://ctrl-node1:8485;ctrl-node2:8485;jrnl-node:8485/mycluster</value>
</property>
<property>
<name>dfs.client.failover.proxy.provider.mycluster</name>
<value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
<property>
<name>dfs.ha.fencing.methods</name>
<value>sshfence</value>
</property>
<property>
<name>dfs.ha.fencing.ssh.private-key-files</name>
<value>/export/home/hdfs/.ssh/id_dsa</value>
</property>
<property>
<name>dfs.ha.automatic-failover.enabled</name>
<value>true</value>
</property>
</configuration>
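
The "sshfence" method above assumes that the "hdfs" user can log in from one NameNode host to the other without a password, using the private key /export/home/hdfs/.ssh/id_dsa. If that key pair has not been prepared yet, a minimal sketch (run on both "ctrl-node1" and "ctrl-node2") is:
# su - hdfs -c 'mkdir -p /export/home/hdfs/.ssh; chmod 700 /export/home/hdfs/.ssh'
# su - hdfs -c 'ssh-keygen -t dsa -P "" -f /export/home/hdfs/.ssh/id_dsa'
Then append each node's id_dsa.pub to /export/home/hdfs/.ssh/authorized_keys on the other NameNode host so that fencing can log in without a password.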

Setting Zookeeper Configuration Files
Change the current directory to the directory of Zookeeper configuration files (/opt/zookeeper/conf) and edit "zoo.cfg" as follows:

tickTime=2000
initLimit=10
syncLimit=5
dataDir=/hdpool/data/zookeeper
clientPort=2181

server.1=ctrl-node1:2888:3888
server.2=ctrl-node2:2888:3888
server.3=jrnl-node:2888:3888
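
The "server.N" entries must match the IDs written to the "myid" files earlier: server.1 corresponds to "ctrl-node1", server.2 to "ctrl-node2" and server.3 to "jrnl-node". The same zoo.cfg is used on all three nodes, so after editing it on one node it can simply be copied to the others, for example:
# scp /opt/zookeeper/conf/zoo.cfg ctrl-node2:/opt/zookeeper/conf/
# scp /opt/zookeeper/conf/zoo.cfg jrnl-node:/opt/zookeeper/conf/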

At this point, we have completed the Hadoop and Zookeeper configurations. In Part 3 of this blog series, we will confirm that the system runs correctly.

Shinichiro Asai

 

About the Author:

Shinichiro Asai

Technical Sales Support, Platform Solution Unit, FUJITSU Japan
