Jan 09 2018

Practical Tips for SPARC: Using Hadoop and Spark on SPARC Servers/Solaris Platform – Configuring Apache Spark™ (Part 3)


In our previous 3-part blog, we explained how to configure a Hadoop cluster node environment. The following entry illustrates how to properly set up Apache Spark™ as part of the intended environment. Unfortunately, presenting our case in a single piece would likely provoke many TL;DR reactions, so we opted for another mini-series. Below is part 3, which describes the required software installation and configuration; part 1 is available here, and part 2 resides here.

Configuring Spark: Spark Installation
Download Spark and transfer the archive to the node where it will be installed. Select "Pre-built for Hadoop 2.7 and later" under "Choose a package type:" on the download page.

http://spark.apache.org/downloads.html
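
If the node has direct internet access, the package can also be fetched from the Apache archive site; the exact URL below is an assumption based on the Spark 2.1.0 package used in this post.

$ wget https://archive.apache.org/dist/spark/spark-2.1.0/spark-2.1.0-bin-hadoop2.7.tgz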

Install Spark.

# cd /opt
# <extract archive of Spark>
# ln -s spark-2.1.0-bin-hadoop2.7 spark
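
For reference, the "<extract archive of Spark>" step could look like the following, assuming the downloaded gzipped tar archive has already been copied to "/opt".

# gzip -dc spark-2.1.0-bin-hadoop2.7.tgz | tar xf -
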
Change the owner of the Spark files to root and the group to hadoop, and set the permissions of all files to "755".
# chown -R root:hadoop /opt/spark-2.1.0-bin-hadoop2.7
# chmod -R 755 /opt/spark-2.1.0-bin-hadoop2.7

Configuring Spark: Setting Up Environment Variables
Set the following environment variables in "$HOME/.profile" for the "spark" user.

export LD_LIBRARY_PATH_64=/usr/local/lib/sparcv9:/opt/R/lib/R/lib
export JAVA_HOME=/usr/java
export PATH=$PATH:/opt/hadoop/bin:/opt/hadoop/sbin:/opt/spark/bin:/opt/R/bin
export HADOOP_HOME=/opt/hadoop
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
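
A quick way to confirm the settings take effect is to log in as the "spark" user and check a couple of the variables; the output shown below is what is expected with the paths used in this post.

# su - spark
$ echo $HADOOP_CONF_DIR
/opt/hadoop/etc/hadoop
$ which spark-submit
/opt/spark/bin/spark-submit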

Configuring Spark: Configuring Hadoop Environment
Use the previous Hadoop configuration.

https://techcommunity.ts.fujitsu.com/en/servers/d/uid-04e115c9-e4df-da95-223c-ebe2e494e59e.html

Before configuring Spark, confirm that Hadoop works correctly by referring to the previous blog entry.

https://techcommunity.ts.fujitsu.com/en/servers/d/uid-95f1befd-b09f-bd86-c9dd-11caab95b19c.html

Configuring Spark: Configuring Spark Cluster
In cluster mode, Spark temporarily copies the necessary jar files to the Hadoop file system so that they can be accessed from the distributed machines. This procedure is executed every time Spark runs in cluster mode, and it takes time to complete. To reduce the processing time, copy the necessary jar files to the Hadoop file system in advance.

In this example, the jar files are copied to the directory "/apl/spark-2.1.0/jars".

# su - spark
$ cd /opt/spark/jars
$ hadoop fs -put *.jar /apl/spark-2.1.0/jars
$ exit
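
If the put command fails because the target directory does not yet exist in HDFS, create it first; depending on the permissions in your environment, this may need to be done by the HDFS superuser.

$ hadoop fs -mkdir -p /apl/spark-2.1.0/jars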

Move to the Spark configuration file directory "/opt/spark/conf".
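
The Spark distribution ships only a template for the defaults file, so if "spark-defaults.conf" does not exist there yet, create it from the template first.

# cp spark-defaults.conf.template spark-defaults.conf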

Edit "spark-defaults.conf" as below.

spark.yarn.jars hdfs://mycluster/apl/spark-2.1.0/jars/*.jar

Testing Spark
Run the sample application included in the Spark archive. Below is an example of running the SparkPi sample application in cluster mode.

When it ends successfully, all the procedures are completed.

spark@apl-node:~$ cd /opt/spark
spark@apl-node:~$ run-example --master yarn --deploy-mode cluster SparkPi
17/02/27 15:58:28 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
17/02/27 15:58:30 INFO yarn.Client: Requesting a new application from cluster with 2 NodeManagers
17/02/27 15:58:30 INFO yarn.Client: Verifying our application has not requested more than the maximum memory capability of the cluster (8192 MB per container)
17/02/27 15:58:30 INFO yarn.Client: Will allocate AM container, with 1408 MB memory including 384 MB overhead
17/02/27 15:58:30 INFO yarn.Client: Setting up container launch context for our AM
17/02/27 15:58:30 INFO yarn.Client: Setting up the launch environment for our AM container
17/02/27 15:58:30 INFO yarn.Client: Preparing resources for our AM container
17/02/27 15:58:33 INFO yarn.Client: Source and destination file systems are the same. Not copying hdfs://mycluster/apl/spark-2.1.0/jars/JavaEWAH-0.3.2.jar

(output omitted)

17/02/27 15:58:58 INFO yarn.Client:
client token: N/A
diagnostics: N/A
ApplicationMaster host: 10.20.98.138
ApplicationMaster RPC port: 0
queue: default
start time: 1488178734503
final status: SUCCEEDED
tracking URL: http://ctrl-node1:8088/proxy/application_1484800892214_0016/
user: spark
17/02/27 15:58:58 INFO util.ShutdownHookManager: Shutdown hook called
17/02/27 15:58:58 INFO util.ShutdownHookManager: Deleting directory /var/tmp/spark-3b401dfa-a2c9-4c1c-86f0-dafb5e2404b5

Congratulations, the Spark cluster configuration is complete.
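
Note that in cluster mode the "Pi is roughly ..." result is written to the driver's container log on YARN rather than to the console. If you want to see it, the log can be retrieved with the yarn logs command and the application ID shown in the tracking URL, for example:

$ yarn logs -applicationId application_1484800892214_0016 | grep "Pi is roughly"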

Appendix 1: Manual Patch for the Python Interface
In this version of Spark, an error occurs when a Python program is started with the spark-submit command.

Below is an example of the Python error.

$ spark-submit /opt/spark/examples/src/main/python/pi.py
Traceback (most recent call last):
File "/opt/spark/examples/src/main/python/pi.py", line 32, in <module>
.appName("PythonPi")\
File "/opt/spark/python/lib/pyspark.zip/pyspark/sql/session.py", line 169, in getOrCreate
File "/opt/spark/python/lib/pyspark.zip/pyspark/context.py", line 307, in getOrCreate
File "/opt/spark/python/lib/pyspark.zip/pyspark/context.py", line 115, in __init__
File "/opt/spark/python/lib/pyspark.zip/pyspark/context.py", line 256, in _ensure_initialized
File "/opt/spark/python/lib/pyspark.zip/pyspark/java_gateway.py", line 117, in launch_gateway
File "/opt/spark/python/lib/py4j-0.10.4-src.zip/py4j/java_gateway.py", line 174, in java_import
File "/opt/spark/python/lib/py4j-0.10.4-src.zip/py4j/java_gateway.py", line 881, in send_command
File "/opt/spark/python/lib/py4j-0.10.4-src.zip/py4j/java_gateway.py", line 829, in _get_connection
File "/opt/spark/python/lib/py4j-0.10.4-src.zip/py4j/java_gateway.py", line 834, in _create_connection
File "/opt/spark/python/lib/py4j-0.10.4-src.zip/py4j/java_gateway.py", line 943, in __init__
socket.gaierror: [Errno 9] service name not available for the specified socket type

The error appears to occur because py4j passes the port number to socket.getaddrinfo() as a service name, which Solaris does not resolve. Below is an interim manual patch for the error.

Move to the directory "/opt/spark/python/lib", and back up the file "py4j-0.10.4-src.zip".

# cd /opt/spark/python/lib
# cp py4j-0.10.4-src.zip old_py4j-0.10.4-src.zip

Extract "py4j-0.10.4-src.zip" in the directory "/tmp".

# cd /tmp
# unzip /opt/spark/python/lib/py4j-0.10.4-src.zip

Modify lines 943 and 2004 of "py4j/java_gateway.py", changing

af_type = socket.getaddrinfo(self.address, self.port)[0][0]

to

af_type = socket.getaddrinfo(self.address, None)[0][0]
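
Alternatively, since the same expression appears on both lines, the change can be scripted instead of edited by hand; below is a sketch, run from "/tmp" and assuming perl is available on the system.

# perl -pi -e 's/getaddrinfo\(self\.address, self\.port\)/getaddrinfo(self.address, None)/' py4j/java_gateway.py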

Apply the modified source to the system by updating the original zip file as follows.

# zip -u -r /opt/spark/python/lib/py4j-0.10.4-src.zip py4j

Run the same program as before. When it ends successfully, the procedure is completed.

$ spark-submit /opt/spark/examples/src/main/python/pi.py
17/03/15 11:00:39 INFO spark.SparkContext: Running Spark version 2.1.0
17/03/15 11:00:40 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
17/03/15 11:00:40 INFO spark.SecurityManager: Changing view acls to: spark
17/03/15 11:00:40 INFO spark.SecurityManager: Changing modify acls to: spark
17/03/15 11:00:40 INFO spark.SecurityManager: Changing view acls groups to:
17/03/15 11:00:40 INFO spark.SecurityManager: Changing modify acls groups to:

(output omitted)

17/03/15 11:00:46 INFO scheduler.DAGScheduler: Job 0 finished: reduce at /opt/spark/examples/src/main/python/pi.py:43, took 2.932051 s
Pi is roughly 3.143480
17/03/15 11:00:46 INFO server.ServerConnector: Stopped ServerConnector@61399f47{HTTP/1.1}{0.0.0.0:4040}
17/03/15 11:00:46 INFO handler.ContextHandler: Stopped

(output omitted)

Appendix 2: Compiling R to Optimize for Fujitsu SPARC M12
This is a tip for compiling R optimized for the Fujitsu SPARC M12.

The required software is as follows:

  • OS: Solaris 11.3 SRU 20 or later
  • Oracle Developer Studio 12.6

Set the environment variables as follows, specifying "sparc64xii" for "-xtarget". For more details about the compiler options, refer to the Oracle Developer Studio manual.

export CFLAGS="-m64 -xO5 -fma=fused -xlibmieee -xlibmil -xmemalign=8s -xtarget=sparc64xii -xvector=simd,lib -xpagesize=4M"
export CC="cc -std=c99"
export FFLAGS="-m64 -xO5 -fma=fused -xlibmil -dalign -xtarget=sparc64xii -xvector=simd,lib -xpagesize=4M"
export F77="f95"
export CXXFLAGS="-m64 -xO5 -fma=fused -xlibmil -xmemalign=8s -xtarget=sparc64xii -xvector=simd,lib -xpagesize=4M"
export CXX="CC -library=stlport4"
export FC="f95"
export FCFLAGS=$FFLAGS
export FCLIBS=""
export LDFLAGS="-m64 -xO5 -xmemalign=8s -xvector=simd,lib -xpagesize=4M -L/usr/local/lib"

The compile procedure is the same as for the Fujitsu M10.
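
As a rough sketch of the build itself, and assuming the install prefix "/opt/R" implied by the PATH and LD_LIBRARY_PATH_64 settings earlier in this post, the usual R source build steps apply after exporting the variables above:

# cd <R source directory>
# ./configure --prefix=/opt/R
# make
# make install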

In this blog series, I have explained how to configure Hadoop and Spark on SPARC Servers/Solaris. I welcome your feedback, tips and comments, as I would like to continue delivering various application solutions on Solaris in the future.

Disclaimer
The information contained in this blog is for general information purposes only. While we endeavor to keep the information up-to-date and correct through testing on a practical system, we make no warranties of any kind about its completeness, accuracy, reliability, suitability or availability. Any reliance you place on such information is strictly at your own risk. The information in this blog is subject to change without notice.

Shinichiro Asai

 

About the Author:

Shinichiro Asai

Technical Sales Support, Platform Solution Unit, FUJITSU Japan
