Reproducible Environment for Scientific Applications (Lab session) Tak-Lon (Stephen) Wu.

1 Reproducible Environment for Scientific Applications (Lab session) Tak-Lon (Stephen) Wu

2 Overview
Introduction
VirtualBox Prepackaged Image
Example 1: Sandbox Hadoop WordCount
Example 2: Cloud Twister WordCount
Exercises
– Sandbox Hadoop/Twister Kmeans
– Cloud Hadoop/Twister Kmeans

3 Motivations
Background knowledge
– Environment setting
– Different cloud infrastructure tools
– Software dependencies
– Long learning path
Automate these complicated steps?
Solution: Salsa Dynamic Provisioning Infrastructure (SalsaDPI)
– batch-like program

4 What is SalsaDPI? (Sandbox)
[Diagram components: User Configuration, SalsaDPI Jar, chef-solo, OS, S/W, Applications]
1. Read a Conf. file and execute software run-list
2. Install software
3. Run apps

5 What is SalsaDPI? (Cloud)
[Diagram components: User Conf.; an OS running the Chef Client and the SalsaDPI Jar; Chef Server; three VMs, each with OS, Chef, S/W, and Apps]
1. Bootstrap VMs with a conf. file
2. Retrieve conf. info. and request Authentication and Authorization
3. Authenticated and Authorized to execute software run-list
4. VM(s) Information
5. Submit application commands
6. Obtain Result
* Chef architecture: http://wiki.opscode.com/display/chef/Architecture+Introduction

6 What is SalsaDPI? (Cont.)
Chef features
– Install software on demand when starting VMs
– Monitor software installation progress
– Easy to use
SalsaDPI features
– Provide a configurable interface
– Automate Hadoop/Twister/other binary execution
* Chef official website: http://www.opscode.com/chef/

7 Hands-on Session

8 Online Tutorial page
http://salsahpc.indiana.edu/ScienceCloud/reproduce-intro.html

9 Prerequisites
– Install VirtualBox on your laptop, then download and import a prepackaged image
– Set up the FutureGrid Eucalyptus environment
– Make sure the shared folder between the host and guest machines is set up correctly
# log in to FutureGrid India head node i136
$ ssh -i ~/fg_private_key.pem johnny@india.futuregrid.org
Replace ~/fg_private_key.pem with the name of your own private key file.

10 About Pre-packaged Image
It has the following software installed and configured under /root/software/:
– Java JDK
– Chef
– Hadoop
– Twister and ActiveMQ
– HBase
– Pig
– salsaDPI (/root/salsaDPI/)

11 Important Notes
If activemq.log and kahadb exist in the directory /root/software/apache-activemq-5.4.2/, please remove them. Otherwise, they will cause errors when running sandbox Twister applications.
$ cd /root/software/apache-activemq-5.4.2/
$ ls activemq.log kahadb
$ rm -rf activemq.log kahadb

12 Examples
Example 1: Sandbox Hadoop WordCount
Example 2: Cloud Twister WordCount
Goals
– Learn and modify the SalsaDPI JSON configuration file
– Execute the SalsaDPI Java executable, passing it the configuration file
– http://salsahpc.indiana.edu/ScienceCloud/handson1_chef_sandbox.html
* JSON metadata format example: http://json.org/example.html

13 Step 1. Open the Conf. File
Locate and open the configuration file.
– /root/salsaDPI/sandbox/templates/sandbox_hadoopTemplate.json
– /root/salsaDPI/sandbox/templates/sandbox_twisterTemplate.json

14 Step 2. Modify Conf. File
'applicationParameters': {
  'applicationType':'Hadoop',
  'localPathOfProgramBinary':'/root/salsaDPI/apps/hadoopWordCount.jar',
  'localPathOfProgramInput':'/root/salsaDPI/input/hadoopWordCountInput.txt',
  'localPathOfBinaryDependency':'',
  'programExecuteLocation':'',
  'programArgs':'bin/hadoop jar #_JAR_# #_HDFS_INPUTDIR_# #_HDFS_OUTPUTDIR_#'
}
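The #_NAME_# tokens in programArgs appear to be variables that are filled in before the command runs. The Python sketch below illustrates that kind of substitution only. The function name expand_program_args and the substituted values are hypothetical; the slides do not say which values SalsaDPI substitutes or how it does so.

```python
import re

# Matches tokens such as #_JAR_# or #_HDFS_INPUTDIR_# and captures the name.
VARIABLE = re.compile(r"#_([A-Z_]+?)_#")


def expand_program_args(template, values):
    """Replace every #_NAME_# token in template with values[NAME].

    Hypothetical helper for illustration; not part of SalsaDPI.
    """
    def lookup(match):
        name = match.group(1)
        if name not in values:
            raise KeyError("no value for variable " + name)
        return values[name]

    return VARIABLE.sub(lookup, template)


template = "bin/hadoop jar #_JAR_# #_HDFS_INPUTDIR_# #_HDFS_OUTPUTDIR_#"
# Assumed example values; the real ones are chosen by SalsaDPI at run time.
values = {
    "JAR": "hadoopWordCount.jar",
    "HDFS_INPUTDIR": "input",
    "HDFS_OUTPUTDIR": "output",
}
command = expand_program_args(template, values)
print(command)  # bin/hadoop jar hadoopWordCount.jar input output
```

Raising an error for an unknown variable, rather than leaving the token in place, makes a mistyped variable name visible before anything is executed.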

15 A detailed description can be found here:
– http://salsahpc.indiana.edu/ScienceCloud/handson1_chef_sandbox.html
applicationParameters: A JSON object that contains the user-defined application's information
applicationType: Type of user-defined application; options: Hadoop or Twister
localPathOfProgramBinary: Full path of the user-defined Hadoop or Twister compiled jar executable on the working machine
localPathOfProgramInput: Full path of the user-defined input file on the working machine, normally a plain-text or *.tar.gz file
localPathOfBinaryDependency: Full path of the user-defined program dependency file on the working machine, such as the Twister Kmeans initial cluster file
programExecuteLocation: Path to the Twister program execution script (refer to the Twister package), such as samples/wordcount/bin or samples/kmeans/bin
twisterInputFilesPreFix: Prefix of the Twister input files. Refer to the provided package: for Twister WordCount the prefix is wc_data, for Twister Kmeans it is km_data.
programArgs: User-defined program execution command

16 Sandbox Hadoop WordCount
{
  // Useful general variables of programArgs for applicationParameters object
  // #_JAR_#, #_JOB_ID_#,
  // #_HDFS_INPUTDIR_#, #_HDFS_OUTPUTDIR_#,
  // #_TWISTER_INPUTDIR_#, #_TWISTER_OUTPUTDIR_#, #_TWISTER_PARTITION_FILE_#, #_BINARY_DEPENDENCY_#
  // 'mode':'sandbox', | 'mode':'cloud',
  'mode':'sandbox',
  // chef-solo related parameters
  'chef':{'chefSoloRecipeUrls':'http://129.79.49.248/chef-solo.tar.gz',
          'chefSoloConfFilePath':'/root/salsaDPI/solo.rb'},
  // ssh passwordless related parameters
  'ssh':{'SSHLoginUsername':'root',
         'SSHPrivateKeyPath':'/root/.ssh/id_rsa' },
  // runtime softwares such as recipe[hadoopSandbox] or recipe[twisterSandbox]
  'softwareRecipes':['recipe[hadoopSandbox]'], // please don't change this line
  // user-defined application parameters
  'applicationParameters':{'applicationType':'Hadoop',
    'localPathOfProgramBinary':'/root/salsaDPI/apps/hadoopWordCount.jar',
    'localPathOfProgramInput':'/root/salsaDPI/input/hadoopWordCountInput.txt',
    'localPathOfBinaryDependency':'',
    'programExecuteLocation':'',
    'programArgs':'bin/hadoop jar #_JAR_# #_HDFS_INPUTDIR_# #_HDFS_OUTPUTDIR_#'}
}
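The configuration on this slide uses single-quoted strings and // comments, which a strict JSON parser rejects. For readers who want to inspect such a file from a script, here is a minimal Python sketch, assuming the file contains only strings, numbers, lists, and objects as on the slides. The helper names strip_line_comments and load_conf are hypothetical, and this is not how SalsaDPI itself reads the file (the slides do not describe its parser).

```python
import ast


def strip_line_comments(text):
    """Remove // comments that occur outside quoted strings."""
    out = []
    quote = None  # the quote character we are inside, or None
    i = 0
    while i < len(text):
        ch = text[i]
        if quote:
            out.append(ch)
            if ch == "\\" and i + 1 < len(text):
                # keep an escaped character verbatim
                out.append(text[i + 1])
                i += 1
            elif ch == quote:
                quote = None
        elif ch in "'\"":
            quote = ch
            out.append(ch)
        elif text.startswith("//", i):
            # skip the comment but keep the newline that ends it
            while i < len(text) and text[i] != "\n":
                i += 1
            continue
        else:
            out.append(ch)
        i += 1
    return "".join(out)


def load_conf(text):
    """Parse the comment-stripped text as a Python literal (dict)."""
    return ast.literal_eval(strip_line_comments(text))


SAMPLE = """{
  // 'mode':'sandbox', | 'mode':'cloud',
  'mode':'sandbox',
  // chef-solo related parameters
  'chef':{'chefSoloRecipeUrls':'http://129.79.49.248/chef-solo.tar.gz',
          'chefSoloConfFilePath':'/root/salsaDPI/solo.rb'},
  'softwareRecipes':['recipe[hadoopSandbox]'] // please don't change this line
}"""

conf = load_conf(SAMPLE)
print(conf["mode"])  # sandbox
```

Tracking whether the scanner is inside a string matters here because the recipe URL itself contains //. Note that ast.literal_eval does not accept the JSON keywords true, false, or null, so this approach only suits files shaped like the templates shown.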

17 Step 3. Execute SalsaDPI with Conf.
Execute SalsaDPI with command:
$ cd ~/salsaDPI
$ java -cp salsaDPI.jar cgl.salsa.salsadpi.Driver
The output will be stored at /salsaDPI_output/ /output/*.

18 Demo
Demo video
– Video hands-on 1 Sandbox Hadoop WordCount
– YouTube link (1080p)

19 Examples
Example 1: Sandbox Hadoop WordCount
Example 2: Cloud Twister WordCount
Goals
– Make sure FutureGrid Eucalyptus is set up and the required files are downloaded correctly
– Learn and modify the SalsaDPI JSON configuration file
– Execute the SalsaDPI Java executable, passing it the configuration file
– http://salsahpc.indiana.edu/ScienceCloud/handson2_chef_cloud.html
– For live testing, please make sure your name is here
* JSON metadata format example: http://json.org/example.html

20 Step 1. Open the Conf. File
Locate and open the configuration file.
– /root/salsaDPI/cloud/templates/cloud_hadoopTemplate.json
– /root/salsaDPI/cloud/templates/cloud_twisterTemplate.json

21 Step 2. Modify Conf. File
'eucaInfo':{
  'eucarcFilePath':'#_FullPath_to_eucarc_File_#',
  'eucaImageEmi':'emi-A8F63C29',
  'eucaSSHPublicKey':'#_Euca_Keypair_PublicKeyName_#',
  'eucaVmType':'m1.small',
  'amountOfInstances':2
},

22 Step 2. Modify Conf. File (Cont.)
'ssh': {
  'SSHLoginUsername':'root',
  'SSHPrivateKeyPath':'/root/#_yourPrivatekey_FileName_#'
},

23 Step 2. Modify Conf. File (Cont.)
'applicationParameters': {
  'applicationType':'Twister',
  'localPathOfProgramBinary':'/root/salsaDPI/apps/Twister-WordCount-0.9.jar',
  'localPathOfProgramInput':'/root/salsaDPI/input/twisterWordCountInput.tar.gz',
  'localPathOfBinaryDependency':'',
  'programExecuteLocation':'samples/wordcount/bin',
  'twisterInputFilesPreFix':'wc_data',
  'programArgs':'./run_wc.sh #_TWISTER_PARTITION_FILE_# #_TWISTER_OUTPUTDIR_#/wc.out 4 1'
}

24 A detailed description can be found here:
– http://salsahpc.indiana.edu/ScienceCloud/handson2_chef_cloud.html
eucaInfo: A JSON object that contains the Eucalyptus-related information for cloud mode: 'eucarcFilePath', 'eucaImageEmi', 'eucaSSHPublicKey', 'eucaVmType', and 'amountOfInstances'
eucarcFilePath: Full path to the downloaded eucarc file
eucaImageEmi: Eucalyptus VM image registered on FutureGrid, e.g. emi-52C93AC2
eucaSSHPublicKey: Eucalyptus public key name (which you set up during the FutureGrid Eucalyptus setup)
eucaVmType: Eucalyptus VM type, e.g. c1.medium
amountOfInstances: Number of instances for this job, e.g. 2
ssh: A JSON object that contains the ssh information: SSHLoginUsername and SSHPrivateKeyPath
SSHLoginUsername: ssh login username; for cloud mode, it must be root.
SSHPrivateKeyPath: Full path to the ssh private key used to log in to the VM.
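The field descriptions on this slide can be turned into a quick check to run over a configuration before submitting it. The Python sketch below is one way to do that, assuming the configuration has already been loaded into a dict. The function name check_cloud_conf is hypothetical, and treating every listed key as required is an assumption; only the rule that SSHLoginUsername must be root in cloud mode is stated on the slide.

```python
# Keys named on the slide; treating all of them as required is an assumption.
REQUIRED_EUCA_KEYS = (
    "eucarcFilePath",
    "eucaImageEmi",
    "eucaSSHPublicKey",
    "eucaVmType",
    "amountOfInstances",
)
REQUIRED_SSH_KEYS = ("SSHLoginUsername", "SSHPrivateKeyPath")


def check_cloud_conf(conf):
    """Return a list of problems found; an empty list means none were found."""
    problems = []
    euca = conf.get("eucaInfo", {})
    ssh = conf.get("ssh", {})
    for key in REQUIRED_EUCA_KEYS:
        if key not in euca:
            problems.append("eucaInfo is missing " + key)
    for key in REQUIRED_SSH_KEYS:
        if key not in ssh:
            problems.append("ssh is missing " + key)
    if "amountOfInstances" in euca:
        count = euca["amountOfInstances"]
        is_int = isinstance(count, int) and not isinstance(count, bool)
        if not is_int or count < 1:
            problems.append("amountOfInstances should be a positive integer")
    # Stated on the slide: in cloud mode the login user must be root.
    if conf.get("mode") == "cloud" and ssh.get("SSHLoginUsername") not in (None, "root"):
        problems.append("SSHLoginUsername must be root in cloud mode")
    return problems


# Values taken from the Cloud Twister WordCount example in this deck.
example = {
    "mode": "cloud",
    "eucaInfo": {
        "eucarcFilePath": "/root/eucarc",
        "eucaImageEmi": "emi-A8F63C29",
        "eucaSSHPublicKey": "stephen",
        "eucaVmType": "m1.small",
        "amountOfInstances": 2,
    },
    "ssh": {"SSHLoginUsername": "root", "SSHPrivateKeyPath": "/root/stephen.pem"},
}
print(check_cloud_conf(example))  # []
```

Returning a list of messages instead of stopping at the first problem lets a user fix several mistakes in one pass.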

25 Step 3. Execute SalsaDPI with Conf.
Execute SalsaDPI with command:
$ cd ~/salsaDPI
$ java -cp salsaDPI.jar cgl.salsa.salsadpi.Driver
The output will be stored at /salsaDPI_output/ /output/*.

26 Cloud Twister WordCount
{
  // Useful general variables of programArgs for applicationParameters object
  // #_JAR_#, #_JOB_ID_#,
  // #_HDFS_INPUTDIR_#, #_HDFS_OUTPUTDIR_#,
  // #_TWISTER_INPUTDIR_#, #_TWISTER_OUTPUTDIR_#, #_TWISTER_PARTITION_FILE_#, #_BINARY_DEPENDENCY_#
  // 'mode':'sandbox', | 'mode':'cloud',
  'mode':'cloud',
  // euca cloud parameters
  'eucaInfo':{'eucarcFilePath':'/root/eucarc',
    'eucaImageEmi':'emi-A8F63C29',
    'eucaSSHPublicKey':'stephen',
    'eucaVmType':'m1.small',
    'amountOfInstances':2},
  // ssh passwordless related parameters
  'ssh':{'SSHLoginUsername':'root',
         'SSHPrivateKeyPath':'/root/stephen.pem' },
  // runtime softwares such as recipe[hadoopSandbox], recipe[twisterSandbox],
  // recipe[hadoopCloud], and recipe[twisterCloud]
  'softwareRecipes':['recipe[twisterCloud]'],
  // user-defined application parameters
  'applicationParameters':{
    'applicationType':'Twister',
    'localPathOfProgramBinary':'/root/salsaDPI/apps/Twister-WordCount-0.9.jar',
    'localPathOfProgramInput':'/root/salsaDPI/input/twisterWordCountInput.tar.gz',
    'localPathOfBinaryDependency':'',
    'programExecuteLocation':'samples/wordcount/bin',
    'twisterInputFilesPreFix':'wc_data',
    'programArgs':'./run_wc.sh #_TWISTER_PARTITION_FILE_# #_TWISTER_OUTPUTDIR_#/wc.out 4 1'}
}

27 Demo
Demo video
– Video hands-on 2 Cloud Twister WordCount
– YouTube link (1080p)

28 Twister Kmeans
Modify a Sandbox/Cloud conf. file for Twister Kmeans. Below are hints for the Twister Kmeans conf. file.
'applicationParameters':{
  'applicationType':'Twister',
  'localPathOfProgramBinary':'#_FullPath_To_TwisterKmeans_JAR_#',
  'localPathOfProgramInput':'#_FullPath_To_TwisterKmeans_Inputs_GZ_File_#',
  'localPathOfBinaryDependency':'#_FullPath_To_TwisterKmeans_InitClusterFile_#',
  'programExecuteLocation':'samples/kmeans/bin',
  'twisterInputFilesPreFix':'km_data',
  'programArgs':'./run_kmeans.sh #_BINARY_DEPENDENCY_# 80 #_TWISTER_PARTITION_FILE_# > #_TWISTER_OUTPUTDIR_#/#_JOB_ID_#.txt'
}

29 Hadoop Kmeans
Modify a Sandbox/Cloud conf. file for Hadoop Kmeans. The snippet below provides hints for the Kmeans programArgs.
'applicationParameters': {
  'applicationType':'Hadoop',
  'localPathOfProgramBinary':'#_Path_HadoopKmeans_Jar_#',
  'localPathOfProgramInput':'',
  'localPathOfProgramDB':'',
  'localPathOfBinaryDependency':'',
  'programExecuteLocation':'',
  'programArgs':'bin/hadoop jar #_JAR_# 500 10 8 3 #_JOB_ID_# > ~/#_JOB_ID_#/#_JOB_ID_#.txt'
}
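The exercise hints mix two kinds of #_..._# tokens: the general programArgs variables listed on the Sandbox Hadoop WordCount slide, and template placeholders such as #_FullPath_To_TwisterKmeans_JAR_# that the user is expected to replace by hand. The Python sketch below lists any placeholders still left in a loaded configuration. The function name unfilled_placeholders is hypothetical, and the assumption that every token outside the listed variable set is a user placeholder is an inference from the slides, not something they state.

```python
import re

# General programArgs variables listed in the deck; these stay in the file.
RUNTIME_VARIABLES = {
    "JAR",
    "JOB_ID",
    "HDFS_INPUTDIR",
    "HDFS_OUTPUTDIR",
    "TWISTER_INPUTDIR",
    "TWISTER_OUTPUTDIR",
    "TWISTER_PARTITION_FILE",
    "BINARY_DEPENDENCY",
}
TOKEN = re.compile(r"#_(\w+?)_#")


def unfilled_placeholders(value, path=""):
    """Return (path, name) pairs for tokens that are not runtime variables."""
    found = []
    if isinstance(value, dict):
        for key, item in value.items():
            found.extend(unfilled_placeholders(item, path + "/" + key))
    elif isinstance(value, list):
        for index, item in enumerate(value):
            found.extend(unfilled_placeholders(item, path + "/" + str(index)))
    elif isinstance(value, str):
        for name in TOKEN.findall(value):
            if name not in RUNTIME_VARIABLES:
                found.append((path, name))
    return found


# The Twister Kmeans hints, loaded as a dict, with placeholders untouched.
kmeans_conf = {
    "applicationParameters": {
        "applicationType": "Twister",
        "localPathOfProgramBinary": "#_FullPath_To_TwisterKmeans_JAR_#",
        "localPathOfProgramInput": "#_FullPath_To_TwisterKmeans_Inputs_GZ_File_#",
        "localPathOfBinaryDependency": "#_FullPath_To_TwisterKmeans_InitClusterFile_#",
        "programExecuteLocation": "samples/kmeans/bin",
        "twisterInputFilesPreFix": "km_data",
        "programArgs": "./run_kmeans.sh #_BINARY_DEPENDENCY_# 80 "
                       "#_TWISTER_PARTITION_FILE_# > #_TWISTER_OUTPUTDIR_#/#_JOB_ID_#.txt",
    }
}
for where, name in unfilled_placeholders(kmeans_conf):
    print(where, name)
```

Run against the untouched hints, this reports the three path placeholders and ignores the variables inside programArgs.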

30 Thank you

31 Cloud Hadoop WordCount
{
  // mode = 'cloud'
  'mode':'cloud',
  // euca cloud parameters
  'eucaInfo':{'eucarcFilePath':'/root/eucarc',
    'eucaImageEmi':'emi-A8F63C29',
    'eucaSSHPublicKey':'stephen', // replace stephen with your public key name
    'eucaVmType':'m1.small',
    'amountOfInstances':2},
  'ssh':{'SSHLoginUsername':'root',
         'SSHPrivateKeyPath':'/root/stephen.pem'}, // replace stephen.pem with your private key file
  'softwareRecipes':['recipe[hadoopCloud]'],
  'applicationParameters':{
    'applicationType':'Hadoop',
    'localPathOfProgramBinary':'/root/salsaDPI/apps/hadoopWordCount.jar',
    'localPathOfProgramInput':'/root/salsaDPI/input/hadoopWordCountInput.txt',
    'localPathOfProgramDB':'',
    'programExecuteLocation':'',
    'programArgs':'bin/hadoop jar #_JAR_# #_HDFS_INPUTDIR_# #_HDFS_OUTPUTDIR_#'}
}

32 Demo
Cloud Hadoop WordCount
http://salsahpc.indiana.edu/ScienceCloud/video/salsaDPI/cloudHadoopWordCount.wmv

33 Sandbox Twister WordCount
{
  // mode = 'sandbox'
  'mode':'sandbox',
  // chef solo parameters
  'chef':{'chefSoloRecipeUrls':'http://129.79.49.248/chef-solo.tar.gz',
          'chefSoloConfFilePath':'/root/solo.rb'},
  'ssh':{'SSHLoginUsername':'root',
         'SSHPrivateKeyPath':'/root/.ssh/id_rsa'},
  'softwareRecipes':['recipe[twisterSandbox]'],
  'applicationParameters':{
    'applicationType':'Twister',
    'localPathOfProgramBinary':'/root/salsaDPI/apps/Twister-WordCount-0.9.jar',
    'localPathOfProgramInput':'/root/salsaDPI/input/twisterWordCountInput.tar.gz',
    'localPathOfBinaryDependency':'',
    'localPathOfProgramDB':'',
    'programExecuteLocation':'samples/wordcount/bin',
    'twisterInputFilesPreFix':'wc_data',
    'programArgs':'./run_wc.sh #_TWISTER_PARTITION_FILE_# #_TWISTER_OUTPUTDIR_#/wc.out 4 1'}
}

34 Demo
Sandbox Twister WordCount
http://salsahpc.indiana.edu/ScienceCloud/video/salsaDPI/sandBoxTwisterWordCount.wmv

