Kevin Bacon. The Question You Are Going to Answer (Again): Which pair of actors/actresses has acted together the most times?

1. Download the project

2. Install Highlighter for Pig
– Put org.apache.pig.contrib.eclipse_1.0.0.jar into the plugins/ folder of Eclipse
– Start Eclipse and import the project (you can omit org.apache.pig.contrib.eclipse_1.0.0.jar)

3. Reference Material
Things to help you!
– 07-Pig pptx (7th lecture)
– Official Pig documentation

4. Get Started
– In Eclipse, open actor-count.pig and change the username in the output path:
    STORE ordered_actor_pair_count INTO '/uhadoop/[username]/pig-debug/';
– Open WinSCP and copy actor-count.pig to /data/2014/uhadoop/[username]/
– Open PuTTY, navigate to /data/2014/uhadoop/[username]/ and run:
    pig actor-count.pig
– In PuTTY, look at the output:
    hadoop fs -cat /uhadoop/[username]/pig-debug/part-m | more

5. Implement the Script
The process is the same as before:
– keep only the rows whose type is “THEATRICAL MOVIE” (filter out everything else)
– build a unique movie name: title + "##" + year + "##" + num
– map from the raw data to pairs of actors starring in the same movie
– count the pairs and sort them
Use the reference material.

Methodology:
1. Add one script line at a time and STORE the new relation
2. Copy the new script to the server
3. Delete the old output (careful!): hadoop fs -rmr /uhadoop/[username]/pig-debug/
4. Run the new script and check the output
5. If it looks okay, GOTO 1
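The four processing steps in step 5 (filter, build the movie key, pair up actors, count and sort) can be tried out locally before writing the Pig script. The following is a minimal Python sketch of that logic, not the Pig solution itself. The column order (actor, title, year, num, type), the exact spelling of the type value, the function name count_actor_pairs, and the sample rows are all assumptions made up for illustration; check them against the real file.

```python
from collections import Counter
from itertools import combinations

# Assumed spelling of the type value; the real data may differ.
MOVIE_TYPE = "THEATRICAL_MOVIE"


def count_actor_pairs(rows):
    """Count how often each pair of actors appears in the same movie.

    Each row is assumed to be (actor, title, year, num, type).
    Returns (pair, count) tuples sorted by descending count.
    """
    cast = {}
    for actor, title, year, num, kind in rows:
        if kind != MOVIE_TYPE:
            continue  # keep theatrical movies only
        movie = f"{title}##{year}##{num}"  # unique movie name
        cast.setdefault(movie, set()).add(actor)

    counts = Counter()
    for actors in cast.values():
        # sorted() gives each pair a fixed order, so (A, B) and (B, A)
        # are counted as the same pair
        for first, second in combinations(sorted(actors), 2):
            counts[f"{first}##{second}"] += 1

    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


# Made-up sample rows, not taken from the IMDb data.
rows = [
    ("A", "Film 1", "1990", "I", MOVIE_TYPE),
    ("B", "Film 1", "1990", "I", MOVIE_TYPE),
    ("A", "Film 2", "1992", "I", MOVIE_TYPE),
    ("B", "Film 2", "1992", "I", MOVIE_TYPE),
    ("C", "Film 2", "1992", "I", MOVIE_TYPE),
    ("A", "Show 3", "1995", "I", "TV_SERIES"),
    ("C", "Show 3", "1995", "I", "TV_SERIES"),
]

result = count_actor_pairs(rows)
for pair, n in result:
    print(n, pair)
```

In Pig, the same idea is typically expressed with a FILTER, a FOREACH that builds the movie key, a self-join on that key, and a GROUP followed by COUNT and ORDER; the reference material covers each of these operators.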

Output for Small File
20 Győrffy, György (I)##György, László (I)
15 Guerrero, Eddie (I)##Gutiérrez, Oscar (III)
13 Guillén Cuervo, Fernando##Guillén, Fernando
13 Gregurevic, Ivo##Grgic, Goran
12 Gyenge, Árpád##György, László (I)
11 Guevara, Luis (I)##Gutiérrez, Alfredo (I)
10 Gross, Walter (I)##Großkurth, Kurt
9 Gurza, Humberto##Gurza, Miguel
9 Guerrero, Eddie (I)##Guerrero, Sal
9 Grönberg, Åke##Gustafson, Eric (I)

6. Run for all data
raw = LOAD 'hdfs://cm:9000/uhadoop/imdb/full/actpersons-to-movies.tsv' …
STORE ordered_actor_pair_count INTO '/uhadoop/[username]/imdbfull/';