Software Installation Deck: Big Data Workshop, Saturday, March 10th, 2012
Outline
Local Installation
– Python
– Word Count Code and Files
– R and R-Studio
– Hadoop Local Installation
Cloud Access
– Amazon Web Services Account
– Cloud-Based Software Demos
– R and R-Studio in the Cloud
– Cloudera Virtual Manager
– Virtualization Software
– R and Hadoop: rmr
Python Installation
Mac/Linux: Python comes preinstalled (you should be able to run it).
Windows: use the following website to download and install: –
Python Wikipedia Word Count Files
What | URL
Python Word Count Script | https://s3.amazonaws.com/com.hadoopinboston.scripts/seq.py
Very Small File: 10 lines, 251 words | https://s3.amazonaws.com/com.hadoopinboston.inputdata/input-lines
Small: lines, 1.65M words (10 MB) | https://s3.amazonaws.com/com.hadoopinboston.inputdata/input.txt
Large: lines, 12M words (76 MB) | https://s3.amazonaws.com/com.hadoopinboston.inputdata/input2.txt
Very Large: 85 million lines (8 GB) | https://s3.amazonaws.com/com.hadoopinboston.inputdata/all.txt
Mapper.py – mapper in Python | https://s3.amazonaws.com/com.hadoopinboston.scripts/mapper.py
Reducer.py – reducer in Python | https://s3.amazonaws.com/com.hadoopinboston.scripts/reducer-all.py
Mapper in R | https://s3.amazonaws.com/com.hadoopinboston.scripts/mapper.R
Reducer in R | https://s3.amazonaws.com/com.hadoopinboston.scripts/reducer.R
The four files of different sizes were created by Vipin to test out the time to run each one locally.
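For orientation, here is a minimal sketch of what a sequential word count and a mapper/reducer pair might look like in Python 3. It is not the content of seq.py, mapper.py, or reducer-all.py; the function names (count_words, map_words, reduce_counts) and the tab-separated record format are assumptions made for illustration.

```python
from collections import Counter
from itertools import groupby


def count_words(lines):
    """Sequential word count: split each line on whitespace and tally."""
    counts = Counter()
    for line in lines:
        counts.update(line.split())
    return counts


def map_words(lines):
    """Mapper step: emit one "word<TAB>1" record per word (assumed format)."""
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"


def reduce_counts(sorted_records):
    """Reducer step: sum the counts of adjacent records that share a word.

    Assumes the records arrive sorted by word, which is what a sort
    between the map and reduce steps provides.
    """
    pairs = (record.rstrip("\n").split("\t", 1) for record in sorted_records)
    for word, group in groupby(pairs, key=lambda pair: pair[0]):
        yield word, sum(int(count) for _, count in group)
```

Running map_words over the input lines, sorting the records, and feeding them to reduce_counts should give the same totals as count_words, which makes the small input file a convenient check before trying the larger ones.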
R and R-Studio Local Installation
LOCAL INSTALLATION:
R: http://lib.stat.cmu.edu/R/CRAN/
R-Studio: http://rstudio.org/
Hadoop Installation Mac/Linux
Macbook
– Install the MacPorts package manager to get Hadoop (www.macports.org).
– sudo port install hadoop (DONE!)
Linux
– Use the yum/apt-get package manager to get Hadoop.
– sudo yum install hadoop (your mirror should have Hadoop binaries)
Please note that the local installation is for test and debug, and that production jobs will be run on the cloud.
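After a local install, a quick way to confirm that the hadoop launcher is reachable is to look for it on the PATH and ask it for its version. The sketch below assumes Python 3.7 or later and that the package installed a launcher named hadoop; the helper names find_hadoop and hadoop_version are hypothetical.

```python
import shutil
import subprocess


def find_hadoop(executable="hadoop"):
    """Return the full path of the hadoop launcher, or None if not on PATH."""
    return shutil.which(executable)


def hadoop_version(executable="hadoop"):
    """Return the first line printed by `hadoop version`, or None if unavailable."""
    path = find_hadoop(executable)
    if path is None:
        return None
    # `hadoop version` prints the release details to standard output.
    result = subprocess.run([path, "version"], capture_output=True, text=True)
    if result.returncode != 0 or not result.stdout:
        return None
    return result.stdout.splitlines()[0]
```

If hadoop_version() returns None, the launcher is either not installed or not on the PATH of the shell that started Python.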
Hadoop Installation Windows
Microsoft is working with Hortonworks on contributing to the Apache Hadoop project for Windows. Microsoft is working on a Community Technology Preview for Hadoop on Windows Azure (http://hadooponazure.com), and the release for on-premises installation is forthcoming. Those interested in running Hadoop on their own Windows hardware can follow technologies/business-intelligence/big-data-solution.aspx to sign up for the preview when it's available.
TODAY, it is possible to install Hadoop on Windows, but those distributions require Cygwin, whereas the upcoming release will not. There are some instructions for Windows (see for instance apache.html) that people can try.
Please note that the local installation is for test and debug, and that production jobs will be run on the cloud.
Cloud Account
The first example will be through Amazon's Elastic Map/Reduce. Similar in nature to:
R and R-Studio Cloud Access (No VM)
R-Studio in the Cloud:
R or R-Studio in the Cloud:
Virtual Manager with Hadoop
Cloudera Hadoop Package: https://ccp.cloudera.com/display/SUPPORT/Cloudera's+Hadoop+Demo+VM
There are 3 options that relate to different Virtualization Software, one of which also needs to be installed (next slide).
SSH Software (Windows): ad.html
Please note that these are 64-bit versions, and that the Virtualization Software will require a laptop that supports virtualization. If you are unsure, one way to check is to look at your BIOS and see whether Virtualization is Enabled. Most chips support virtualization; however, a handful of manufacturer-installed BIOSes do not enable virtualization.
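On a Linux laptop, the CPU side of the virtualization question can also be checked without rebooting into the BIOS, because /proc/cpuinfo lists the "vmx" flag for Intel VT-x and the "svm" flag for AMD-V. The sketch below is Linux-only, and the helper names are hypothetical. It reports CPU capability only, so the BIOS setting may still need to be switched on.

```python
def cpu_virtualization_flags(cpuinfo_text):
    """Return the hardware virtualization flags found in /proc/cpuinfo text.

    "vmx" marks Intel VT-x and "svm" marks AMD-V.
    """
    found = set()
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            found.update(flag for flag in value.split() if flag in ("vmx", "svm"))
    return found


def read_cpuinfo(path="/proc/cpuinfo"):
    """Read the Linux CPU description file; return "" where it does not exist."""
    try:
        with open(path) as handle:
            return handle.read()
    except OSError:
        return ""
```

Calling cpu_virtualization_flags(read_cpuinfo()) returns an empty set when neither flag is present, or when the file is missing, as on Mac and Windows.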
Virtual Manager with Hadoop
VMware Player: Jeffrey uses this one in his session.
KVM:
VirtualBox: Jim uses this one.
– https://www.virtualbox.org/
Jeffrey will be walking through this process.
Session 6: R and Hadoop: rmr
https://github.com/RevolutionAnalytics/RHadoop/wiki/rmr
Jeffrey will be walking through this process.
We realize the VM and R and Hadoop parts are very detailed, and that there may be questions on other workshop parts. Following the last session we will try to have a post-workshop help session.