Sep 01 2017

Practical Tips for SPARC: Using Hadoop and Spark on SPARC Servers/Solaris Platform – Configuring Hadoop Single Node Environments (Part 1)

[Image: Fujitsu M10-1 front view]

When I started my career in IT, SPARC/Solaris was the platform of choice for system development, from small-scale systems to mission-critical systems. Most open source software (OSS) was also developed first on Solaris and then ported to other platforms. In recent years, the performance and reliability of the Intel Architecture (IA) platform have grown and most OSS is now developed on IA/Linux, but SPARC/Solaris continues to evolve. The integrated development of OS and hardware creates strong advantages for the platform. For example, as CPU functions such as SWoC (Software on Chip) are enhanced in Fujitsu M10/Fujitsu SPARC M12, extensions that automatically use the SWoC functions are implemented in Solaris. Together with the high reliability of the Fujitsu M10/Fujitsu SPARC M12 hardware, this ensures stable operation of the IT system. In addition, much OSS has been optimized for the latest version of Solaris, which makes stable OSS operation possible. I often receive inquiries about combining SPARC/Solaris with OSS from customers who understand these advantages and continue to use the platform. In this blog, I explore the use of SPARC/Solaris in a wider range of fields.

Big Data Processing & Solaris
Open source software such as Hadoop and Spark is often used for big data processing, but many people may not know that Hadoop and Spark run not only on various Linux distributions but also on Solaris.
In this blog, I will explain how to configure a Spark cluster environment using Hadoop YARN, focusing on the advantages of Solaris. This blog series consists of the following sections.
1. Configure a Hadoop single node environment
2. Configure a Hadoop cluster environment on Oracle VM for SPARC
3. Configure a Spark cluster environment using the Hadoop cluster
I won't go into a detailed explanation of Hadoop and Spark here. If you need more information about their basics, please refer to the official Hadoop and Spark documentation as appropriate.

Advantages of Running Hadoop on SPARC/Solaris: Configuring Hadoop Single Node Environment
Introductory material on Hadoop often states that "reliability of individual nodes is unnecessary since the data is distributed and stored". While it is correct that the data is replicated and stored on multiple nodes, the NameNode, which keeps the directory tree of all files in the file system, is limited to at most two server nodes. Thus the NameNode becomes the single point of failure (SPOF) of Hadoop. In a development environment, operations may not be affected if developers have to rebuild the NameNode, should it fail. But in a business environment, any downtime, even a short interruption, could have a negative impact on the business. One solution to this problem is to use highly reliable hardware like Fujitsu M10 (pictured above)/Fujitsu SPARC M12 servers for the nodes that run the NameNode and ResourceManager.

System Configuration

[Image: System configuration diagram]

The data volume for Hadoop should be configured as a dedicated ZFS storage pool, separate from the normal system volume (rpool). This makes it possible to perform operations such as adding disks independently of the system volume.

You can create high-speed, high-capacity storage in Fujitsu M10/Fujitsu SPARC M12 by using flashcards instead of internal disks.
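As a sketch, a dedicated pool could be created as a mirror on two devices and later grown with an additional mirror pair, without touching rpool. The device names below are placeholders; the actual device names on your system can be listed with the "format" command.
# zpool create hdpool mirror c2t0d0 c2t1d0
# zpool add hdpool mirror c2t2d0 c2t3d0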

The test configuration is as follows:

  • Server: Fujitsu M10-1
  • OS: Solaris 11.3 SRU 10.7
  • Java: JDK 1.7.0_111
  • Hadoop: 2.7.3


Preparing Hadoop Installation
OS Installation
The OS is installed in the primary domain of a Fujitsu M10 server. Please refer to the Solaris manuals for OS installation and initial settings.
http://docs.oracle.com/cd/E53394_01/
The hostname of this system is set to "m10spark".
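As a minimal sketch, one way to set the node name persistently on Solaris 11 is through the system/identity:node SMF service (refer to the manual above for the full procedure):
# svccfg -s system/identity:node setprop config/nodename = astring: m10spark
# svcadm refresh system/identity:node
# svcadm restart system/identity:node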

Next, run the following commands as the "root" user.

Configuring ZFS Storage Pool
Configure a ZFS storage pool to store the Hadoop data. The pool name is "hdpool".
# zpool create hdpool <devices for hadoop data>
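To confirm that the pool has been created and is healthy, the standard ZFS commands can be used, for example:
# zpool list hdpool
# zpool status hdpool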

Installing Required Packages
By default, Java 8 is installed on Solaris 11, but since the latest version of Java on which Hadoop was tested is Java 7, the JDK 7 package should be installed. Associated packages will also be installed. Please note that the JDK (not just the JRE) is needed in order to use the "jps" command to check the status of the Hadoop processes. Run "pkg install" with the "--accept" option to accept the license agreement.

# pkg install --accept developer/java/jdk-7

In general, just after installing JDK 7, the default Java version remains Java 8. Switch Java versions with the following procedure.
First, check the current version of Java.

# java -version
java version "1.8.0_102"
Java(TM) SE Runtime Environment (build 1.8.0_102-b14)
Java HotSpot(TM) 64-Bit Server VM (build 25.102-b14, mixed mode)
Confirm that both versions of Java are installed.
# pkg mediator -a java
MEDIATOR      VER. SRC. VERSION IMPL. SRC. IMPLEMENTATION
java          system    1.8     system
java          system    1.7     system
Switch the Java version to Java 7.
# pkg set-mediator -V 1.7 java

Confirm that the Java version is now Java 7.
# java -version
java version "1.7.0_111"
Java(TM) SE Runtime Environment (build 1.7.0_111-b13)
Java HotSpot(TM) Server VM (build 24.111-b13, mixed mode)

Editing /etc/hosts
Edit "/etc/hosts" file to set IP address of local host.
::1 localhost
127.0.0.1 localhost loghost
xxx.xxx.xxx.xxx m10spark m10spark.local
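A quick way to check that the name resolves as intended is to query the hosts database, for example:
# getent hosts m10spark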

Adding Hadoop Users and Group
Add the group "hadoop" for Hadoop. All of the Hadoop users added below should belong to this group.
# groupadd -g 200 hadoop
Add the user ID for running "NameNode" and "DataNode" as "hdfs", and set the password.
# useradd -u 200 -m -g hadoop hdfs
# passwd hdfs
Add the user ID for running "ResourceManager" and "NodeManager" as "yarn", and set the password.
# useradd -u 201 -m -g hadoop yarn
# passwd yarn
Add the user ID for running "History Server" as "mapred", and set the password.
# useradd -u 202 -m -g hadoop mapred
# passwd mapred
Add the user ID for running user programs as "spark", and set the password.
# useradd -u 101 -m -g hadoop spark
# passwd spark
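If desired, the new accounts can be checked with the "id" command; each user should report gid=200(hadoop), for example:
# id hdfs
uid=200(hdfs) gid=200(hadoop)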

Hadoop Installation
Download Hadoop and transfer the archive to the node where it will be installed.
http://www.apache.org/dyn/closer.cgi/hadoop/common/hadoop-2.7.3/hadoop-2.7.3.tar.gz
Install Hadoop.
# cd /opt
# <Extracting Hadoop archive>
# ln -s hadoop-2.7.3 hadoop
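Here, <Extracting Hadoop archive> stands for unpacking the downloaded tarball into /opt. For example, assuming the archive was transferred to /var/tmp (an assumption; adjust the path to wherever you copied it), the following works on Solaris:
# gzip -dc /var/tmp/hadoop-2.7.3.tar.gz | tar xf -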
Change the owner of the Hadoop files to "root" and the group to "hadoop", and set the permissions of all files to "755".
# chown -R root:hadoop /opt/hadoop-2.7.3
# chmod -R 755 /opt/hadoop-2.7.3

Setting Up SSH
Hadoop uses SSH to connect to each Hadoop process, even on a single node. So each user should set up a public/private key pair for passphrase-less SSH authentication.
# su - hdfs
# ssh-keygen -t dsa -P "" -f ~/.ssh/id_dsa
# cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
# logout
# su - yarn
# ssh-keygen -t dsa -P "" -f ~/.ssh/id_dsa
# cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
# logout
# su - mapred
# ssh-keygen -t dsa -P "" -f ~/.ssh/id_dsa
# cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
# logout
Confirm that each user can establish an SSH connection to localhost, for example as shown below.
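A simple check (the first connection will ask you to accept the host key; after that, no password or passphrase prompt should appear) might look like this:
# su - hdfs
# ssh localhost date
# logout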

Setting Up Environment Variables
Environment variables for running Hadoop should be set for each user.

Set the following environment variables in "$HOME/.profile" of "hdfs" user.
export JAVA_HOME=/usr/java
export PATH=$PATH:/opt/hadoop/bin:/opt/hadoop/sbin
export HADOOP_HOME=/opt/hadoop
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export HADOOP_PID_DIR=/hdpool/run/hdfs
export HADOOP_GROUP=hadoop


Set the following environment variables in "$HOME/.profile" of "yarn" user.
export JAVA_HOME=/usr/java
export PATH=$PATH:/opt/hadoop/bin:/opt/hadoop/sbin
export HADOOP_HOME=/opt/hadoop
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export YARN_PID_DIR=/hdpool/run/yarn


Set the following environment variables in "$HOME/.profile" of "mapred" user.
export JAVA_HOME=/usr/java
export PATH=$PATH:/opt/hadoop/bin:/opt/hadoop/sbin
export HADOOP_HOME=/opt/hadoop
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export MAPRED_PID_DIR=/hdpool/run/mapred


Set the following environment variables in "$HOME/.profile" of "spark" user.
export JAVA_HOME=/usr/java
export PATH=$PATH:/opt/hadoop/bin:/opt/hadoop/sbin
export HADOOP_HOME=/opt/hadoop
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop

At this point, we have completed the initial configuration. In the second part of this blog, we will configure the Hadoop single node environment.

Shinichiro Asai

 

About the Author:

Shinichiro Asai

Technical Sales Support, Platform Solution Unit, FUJITSU Japan
