Data Science Background and Course Software setup Week 1.


Index
Installation process
Lecture 1: Introduction to big data and data science
Lecture 2: Performing data science and preparing data

Installation process (I)
The same development environment, built from two free software packages: VirtualBox and Vagrant
Virtual Machine Hardware and Software Prerequisites
Minimum Hardware Requirements
Free disk space: 3.5 GB
RAM: 2.5 GB (4+ GB preferred)
Processor: any recent Intel or AMD multicore processor should be sufficient
Supported Operating Systems: Windows, Linux, Mac OS X
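A quick way to check a machine against these minimums is sketched below. It is only an illustration: the function name and the reading of "multicore" as at least two cores are assumptions, and RAM is left out because the Python standard library has no portable call for it.

```python
import os
import shutil

# Threshold taken from the slide above.
MIN_FREE_DISK_GB = 3.5
# Assumption: "multicore" is read here as at least two cores.
MIN_CPU_CORES = 2


def check_prerequisites(path="."):
    """Report whether the disk and CPU minimums are met for the given path."""
    free_gb = shutil.disk_usage(path).free / 1024**3
    cores = os.cpu_count() or 1  # cpu_count() may return None
    return {
        "free_disk_gb": round(free_gb, 1),
        "disk_ok": free_gb >= MIN_FREE_DISK_GB,
        "cpu_cores": cores,
        "cpu_ok": cores >= MIN_CPU_CORES,
    }


for key, value in check_prerequisites().items():
    print(f"{key}: {value}")
```

Pass the directory where the VM will live as `path`, since free space is measured on the drive that holds it.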

Installation process (II)
Installation of VirtualBox: virtualbox.org -> Downloads -> Choose the appropriate version of VirtualBox for your OS
Installation of Vagrant: -> Downloads -> Choose the appropriate version of Vagrant for your OS
Installation of the Virtual Machine:
1. Create a custom directory (e.g., /home/marrval/myvagrant).
2. Download the file setup/archive/master.zip to the custom directory and unzip it.
3. Copy Vagrantfile to the custom directory you created in step 1.
4. Open a DOS prompt (Windows) or Terminal (Mac/Linux), change to the custom directory, and issue the command vagrant up (VirtualBox opens in the background).
Sparkvm is running!
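The unzip-and-copy part of the virtual machine installation can also be scripted. The sketch below assumes the archive has already been downloaded to a local path; the function name is hypothetical, and the search for the Vagrantfile is an assumption made because the archive may unpack into a subfolder.

```python
import shutil
import zipfile
from pathlib import Path


def prepare_vm_directory(zip_path, custom_dir):
    """Unzip the archive into custom_dir and copy its Vagrantfile to the top level."""
    custom_dir = Path(custom_dir)
    custom_dir.mkdir(parents=True, exist_ok=True)

    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(custom_dir)

    # The archive may unpack into a subfolder, so search for the Vagrantfile.
    matches = sorted(custom_dir.rglob("Vagrantfile"))
    if not matches:
        raise FileNotFoundError("No Vagrantfile found in the archive")

    target = custom_dir / "Vagrantfile"
    if matches[0] != target:
        shutil.copy(matches[0], target)
    return target
```

After this runs, the remaining step is still manual: change to the custom directory and issue vagrant up.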

Installation process (III)
Basic Instructions for Using the Virtual Machine
To start the VM, from a DOS prompt (Windows) or Terminal (Mac/Linux), issue the command vagrant up.
To stop the VM, use the command vagrant halt. You should always stop the VM before you log off, turn off, or reboot your computer.
To erase or delete the VM, use the command vagrant destroy.
Once the VM is running, access the notebook from a web browser: the IPython notebook is started on port 8001, which gives us an IPython notebook with Spark.
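For anyone who prefers to drive the VM from a script, the three commands above can be wrapped as follows. This is a minimal sketch: the action names "start", "stop", and "delete" and both function names are made up for illustration, and only the vagrant subcommands themselves come from the slide.

```python
import shutil
import subprocess

# Lifecycle actions (hypothetical names) mapped to the Vagrant subcommands above.
VAGRANT_ACTIONS = {"start": "up", "stop": "halt", "delete": "destroy"}


def vagrant_command(action):
    """Build the argument list for one lifecycle action."""
    if action not in VAGRANT_ACTIONS:
        raise ValueError(f"unknown action: {action!r}")
    return ["vagrant", VAGRANT_ACTIONS[action]]


def run_vagrant(action, vm_dir):
    """Run the command inside vm_dir, the directory that holds the Vagrantfile."""
    if shutil.which("vagrant") is None:
        raise RuntimeError("vagrant is not on PATH; install it first")
    return subprocess.run(vagrant_command(action), cwd=vm_dir, check=True)
```

A call such as `run_vagrant("stop", "/home/marrval/myvagrant")` would halt the VM before the computer is shut down.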

Installation process (IV)
Running Your First Notebook
1. Start the VM.
2. Open a web browser to the notebook.
3. Upload the file "lab0_student.ipynb", which is contained in the .zip.
4. Run the cells and verify that you do not encounter any errors.
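Checking for errors can also be done on a saved copy of the notebook after its cells have been run. The sketch below assumes the file uses the nbformat 4 JSON layout (a top-level "cells" list, with failed cells carrying an output of type "error"); older notebook formats are laid out differently and would need adjusting. The function name is hypothetical.

```python
import json


def find_cell_errors(notebook_path):
    """Return (cell_index, error_name, error_message) for every failed code cell."""
    with open(notebook_path, encoding="utf-8") as handle:
        notebook = json.load(handle)

    errors = []
    # Assumption: nbformat 4 layout, with a top-level "cells" list.
    for index, cell in enumerate(notebook.get("cells", [])):
        if cell.get("cell_type") != "code":
            continue
        for output in cell.get("outputs", []):
            if output.get("output_type") == "error":
                errors.append((index, output.get("ename"), output.get("evalue")))
    return errors
```

An empty list means no executed cell recorded an error; it does not prove that every cell was actually run.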

Introduction to big data and data science (I)
Correlation doesn’t imply causation
Use more data
Explore more types of data/factors
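A small synthetic example can make the first and third points concrete. Everything in the sketch below is invented for illustration: the ice cream and sunburn scenario, the coefficients, and the noise levels. Two quantities that both depend on a hidden factor (temperature) end up strongly correlated even though neither causes the other, and the correlation largely disappears once that factor is taken into account.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


rng = random.Random(0)  # fixed seed so the run is repeatable

# Hidden factor: daily temperature (synthetic).
temperature = [rng.uniform(10, 35) for _ in range(1000)]
# Both quantities depend on temperature, not on each other.
ice_cream_sales = [2.0 * t + rng.gauss(0, 5) for t in temperature]
sunburn_cases = [0.5 * t + rng.gauss(0, 2) for t in temperature]

r = pearson(ice_cream_sales, sunburn_cases)
print(f"correlation between sales and sunburn: {r:.2f}")

# "Explore more factors": remove the temperature effect and correlate what is left.
sales_resid = [s - 2.0 * t for s, t in zip(ice_cream_sales, temperature)]
sunburn_resid = [c - 0.5 * t for c, t in zip(sunburn_cases, temperature)]
r_resid = pearson(sales_resid, sunburn_resid)
print(f"correlation after removing temperature: {r_resid:.2f}")
```

In real data the hidden factor's effect is not known in advance, so it would have to be estimated rather than subtracted with known coefficients as done here.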

Introduction to big data and data science (II)
Big Data: Why all this excitement?
From 2003 to 2008, Google looked at weekly search queries -> identified 45 terms relevant to people searching about flu -> built a model
Google rolled out flu stories in Google News during this period; people reading those stories skewed the results

Introduction to big data and data science (III)
Big Data: Why all this excitement?
Bloggers used data science to analyze the elections
The campaigns were using data science (a database that modeled the behavior of the electorate)
Pollsters try to predict the outcome by polling people -> the polls have biases (+ errors) -> incorrect results
Challenge: remove biases + errors

Introduction to big data and data science (IV)
Cautionary tale
How did they come to this conclusion? Look at Google Trends searches for MySpace and apply the same model to Facebook
Correlation doesn’t imply causation
Identify important factors

Introduction to big data and data science (V)
Where Does Big Data Come From?
Online activity (which can be recorded): a lot of data is collected and little of it is analyzed
Users (user-generated content): each individual contribution is not very large

Introduction to big data and data science (VI)
Where Does Big Data Come From?
Health and scientific computing
Graphs
Log files (generated by servers around the Internet)
The Internet of Things (e.g., sensors in a forest, toll-collection transponders used for traffic reporting)

Performing Data Science and preparing Data (I)
What is Data Science?
“Data Science aims to derive knowledge from big data, efficiently and intelligently”
Data Science encompasses the set of activities, tools, and methods that enable data-driven activities in science, business, medicine, and government
Apply algorithms at scale to large amounts of data, and understand both the algorithms and the results
Collect data, analyze it, and understand the analytical process and results
Collect knowledge, apply algorithms, but do not understand

Performing Data Science and preparing Data (I)
What is Data Science?
“Data Science aims to derive knowledge from big data, efficiently and intelligently”
Data Science encompasses the set of activities, tools, and methods that enable data-driven activities in science, business, medicine, and government
Apply domain-specific knowledge at very large scale, and understand both the algorithms and the results

Performing Data Science and preparing Data (II) Contrasting Data Science: Database

Performing Data Science and preparing Data (III) Contrasting Data Science: Database Contrasting Data Science: Scientific computing

Performing Data Science and preparing Data (IV) Contrasting Data Science: Traditional Machine Learning

Performing Data Science and preparing Data (V)
Doing data science
Problem -> collect data -> clean the data -> build a model -> communicate the results
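The stages after the problem statement can be sketched as one small function each. This is a toy illustration, not a prescribed method: the study-hours data is made up, and ordinary least squares stands in for whatever model a real problem would call for.

```python
def collect():
    """Collect: return raw (hours_studied, exam_score) records; values are made up."""
    return [(1, 52), (2, 55), (None, 60), (4, 66), (5, None), (6, 74)]


def clean(records):
    """Clean: drop records with a missing field."""
    return [(x, y) for x, y in records if x is not None and y is not None]


def build_model(records):
    """Model: fit y = slope * x + intercept by ordinary least squares."""
    n = len(records)
    mean_x = sum(x for x, _ in records) / n
    mean_y = sum(y for _, y in records) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in records)
             / sum((x - mean_x) ** 2 for x, _ in records))
    return slope, mean_y - slope * mean_x


def communicate(model):
    """Communicate: turn the fitted parameters into a one-line summary."""
    slope, intercept = model
    return f"Each extra hour adds about {slope:.1f} points (baseline {intercept:.1f})."


print(communicate(build_model(clean(collect()))))
```

Keeping each stage as its own function makes it easy to swap one out, for example replacing `clean` with a version that fills in missing values instead of dropping them.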

Performing Data Science and preparing Data (V)
Cloud computing: key enabler of data science
Allows data science on a massive scale
Data science practice

Performing Data Science and preparing Data (VI) What is hard about Data Science?

Performing Data Science and preparing Data (VII)
Data acquisition and Preparation
1. Extract data from sources
2. Load data into the sink
3. Transform data (source, sink, staging area)
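The three steps can be sketched with the Python standard library, following the order given above and performing the transformation inside the sink. The CSV content, table names, and the Fahrenheit-to-Celsius conversion are all invented for illustration; an in-memory SQLite database plays the role of the sink.

```python
import csv
import io
import sqlite3

# Hypothetical source: a small CSV export with inconsistent city names.
SOURCE_CSV = """city,temp_f
 Boston ,50
boston,59
Austin,86
"""


def extract(text):
    """Extract: read rows from the source as dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def load(rows, conn):
    """Load: copy the raw rows into a staging table in the sink."""
    conn.execute("CREATE TABLE staging (city TEXT, temp_f REAL)")
    conn.executemany(
        "INSERT INTO staging VALUES (?, ?)",
        [(row["city"], float(row["temp_f"])) for row in rows],
    )


def transform(conn):
    """Transform: normalise names and convert units inside the sink."""
    conn.execute(
        """
        CREATE TABLE readings AS
        SELECT lower(trim(city)) AS city,
               round((temp_f - 32) * 5.0 / 9.0, 1) AS temp_c
        FROM staging
        """
    )


conn = sqlite3.connect(":memory:")
load(extract(SOURCE_CSV), conn)
transform(conn)
for row in conn.execute("SELECT city, temp_c FROM readings ORDER BY temp_c"):
    print(row)
```

The same transformation could instead run at the source or in a separate staging area before loading; where it happens is a design choice that depends on where the compute and the raw data live.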

Performing Data Science and preparing Data (VIII)
Data acquisition and Preparation
We create pipelines or workflows, which can be scheduled
Recording the execution of a workflow is known as capturing lineage or provenance (Spark does it automatically)
Impediments to collaboration: diversity of tools/programming languages, finding a script is hard, most analysis work is ‘thrown away’
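To show what capturing lineage means in practice, the sketch below runs a list of steps and records, for each one, its name plus a fingerprint of its input and output. It is a hand-rolled illustration in plain Python, not how Spark tracks lineage internally; the function names and the sample steps are hypothetical.

```python
import hashlib
import json
from datetime import datetime, timezone


def fingerprint(value):
    """Short, stable hash of a JSON-serialisable value."""
    encoded = json.dumps(value, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()[:12]


def run_workflow(data, steps):
    """Run the steps in order and return (result, lineage log)."""
    lineage = []
    for step in steps:
        result = step(data)
        lineage.append({
            "step": step.__name__,
            "input": fingerprint(data),
            "output": fingerprint(result),
            "finished_at": datetime.now(timezone.utc).isoformat(),
        })
        data = result
    return data, lineage


def drop_missing(values):
    """Example step: remove missing readings."""
    return [v for v in values if v is not None]


def to_celsius(values):
    """Example step: convert Fahrenheit readings to Celsius."""
    return [round((v - 32) * 5 / 9, 1) for v in values]


result, log = run_workflow([50, None, 86], [drop_missing, to_celsius])
for entry in log:
    print(entry["step"], entry["input"], "->", entry["output"])
```

Because each step's output fingerprint matches the next step's input fingerprint, the log can be read as a chain that shows how the final result was derived.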

Performing Data Science and preparing Data (VIII)
Data Science roles
Individual
Organizational