
1 Pig Latin
CS 6800
Utah State University

2 Writing MapReduce Jobs
Higher-order functions
Map applies a function to a list
  Example list [1, 2, 3, 4]
  Want to square each number in the list
  Write function f(x) = x*x
  Compute [f(1), f(2), f(3), f(4)] = [1, 4, 9, 16]
map function signature: (a -> b) -> [a] -> [b]
Haskell specification
  map f [] = []
  map f (x:xs) = (f x) : (map f xs)
Call the function
  map (\x -> x * x) [1, 2, 3, 4]
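The map specification on this slide can be tried out directly. The following is a minimal sketch, assuming the function is renamed myMap (a name chosen here only to avoid clashing with the Prelude's built-in map) and that the list holds Int values. Note that Haskell's list constructor is a single colon; a double colon introduces a type annotation.

```haskell
-- myMap is a hypothetical name for the slide's map; the Prelude already
-- exports a function called map with the same behavior.
myMap :: (a -> b) -> [a] -> [b]
myMap _ []     = []
myMap f (x:xs) = f x : myMap f xs

-- Square each number in the example list from the slide.
squares :: [Int]
squares = myMap (\x -> x * x) [1, 2, 3, 4]
```

Loading this in GHCi and evaluating squares should give [1,4,9,16], matching the slide.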

3 Reduce
Reduce converts a list into a scalar
  Example list [1, 2, 3, 4]
  Want to sum the numbers in the list
  Write function g(x,y) = x+y
  Compute g(1,g(2,g(3,g(4,0)))) = 10
reduce signature: (a -> b -> b) -> b -> [a] -> b
Haskell specification
  reduce g c [] = c
  reduce g c (x:xs) = g x (reduce g c xs)
Call the function
  reduce (\x y -> x + y) 0 [1, 2, 3, 4]
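A runnable sketch of the reduce specification, assuming Int elements. The combining function takes two arguments (the current element and the result for the rest of the list), so the accumulator type and the result type are the same. This definition behaves like the Prelude's foldr.

```haskell
-- reduce folds a list from the right, starting from the initial value c.
reduce :: (a -> b -> b) -> b -> [a] -> b
reduce _ c []     = c
reduce g c (x:xs) = g x (reduce g c xs)

-- Sum the example list: g 1 (g 2 (g 3 (g 4 0))).
total :: Int
total = reduce (\x y -> x + y) 0 [1, 2, 3, 4]
```

Evaluating total in GHCi should give 10, the value computed by hand on the slide.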

4 Use in Cloud Computing
Map can be used to clean data and "group" it
Suppose a list of words
  words = [Bat Volcano bat vulcano]
Map to lower case
  lcase = map lowercase words
Map to correct spelling
  s = map spellFix lcase
Count each word
  groups = map (\x -> (x, 1)) s
  groups is [(bat, 1), (volcano, 1), (bat, 1) …

5 Use in Cloud Computing (continued)
Shuffle collects tuples with the same "group" value
Reduce combines the counts within each group
  result = reduce (+) 0 groups
Problem: MapReduce jobs are written in a programming language (PL), e.g., Java
  Complicated
  Not reusable
  Database-like operations are common
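The map, shuffle, and reduce steps of the word count can be put together in one small program. This is only a sketch of the idea on a single machine, not how Hadoop implements it. The spellFix function here is a hypothetical stand-in that knows just the one misspelling from the slide, and mapPhase, shuffle, and reducePhase are names chosen for illustration.

```haskell
import Data.Char (toLower)
import Data.Function (on)
import Data.List (groupBy, sortOn)

-- Hypothetical spelling corrector: fixes only the typo used on the slide.
spellFix :: String -> String
spellFix "vulcano" = "volcano"
spellFix w         = w

-- Map phase: lower-case each word, fix its spelling, pair it with a count of 1.
mapPhase :: [String] -> [(String, Int)]
mapPhase ws = map (\w -> (w, 1)) (map spellFix (map (map toLower) ws))

-- Shuffle phase: sort by key so tuples with the same word end up adjacent,
-- then collect each run of equal keys into one group.
shuffle :: [(String, Int)] -> [[(String, Int)]]
shuffle = groupBy ((==) `on` fst) . sortOn fst

-- Reduce phase: sum the counts inside each group.
reducePhase :: [[(String, Int)]] -> [(String, Int)]
reducePhase groups =
  [ (k, foldr (+) 0 (map snd grp)) | grp@((k, _) : _) <- groups ]

wordCounts :: [(String, Int)]
wordCounts = reducePhase (shuffle (mapPhase ["Bat", "Volcano", "bat", "vulcano"]))
```

For the four example words, wordCounts should evaluate to [("bat",2),("volcano",2)]. The reduce step runs once per group, which is why the shuffle has to bring equal keys together first.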

6 CouchDB - Count People per Gender

7 Pig Latin
Yahoo: 40% of Hadoop jobs run using Pig
Platform for analyzing massive data sets
Runs on Hadoop (Map/Reduce)
Version 0.12

8 What is Pig Latin?
Dataflow language
Non-1NF data model
  Tuples
  Sets
  Bags
Use relational algebra-like operations to manipulate data
  Joins
  Filter (selection)
  Generate (projection)
Compiles to MapReduce jobs on a Hadoop cluster

9 Pig Latin Features
A dataflow (NoSQL) language
  SQL is declarative, most PLs are not
  SQL is poor at expressing workflow
Non-1NF data model
  Bags, sets, tuples, maps
Data resides in read-only files
Schema-less

10 Example
Count subscribers in each city
  A = LOAD 'subscribers.txt' AS (name: chararray, city: chararray, amount: int);
  B = GROUP A BY city;
  C = FOREACH B GENERATE group, COUNT(A);
  DUMP C;
Dataflow: LOAD … -> A -> GROUP A … -> B -> FOREACH B … -> C
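To see what the GROUP and FOREACH steps compute, the same dataflow can be modeled on in-memory lists. In Pig, GROUP yields one tuple per distinct key, holding the key and a bag of the matching input tuples; counting each bag then gives the subscribers per city. The sketch below assumes three sample rows borrowed from the Magazine Subscriber Data slide, and the names Subscriber, groupByCity, and countPerCity are illustrative only.

```haskell
import Data.Function (on)
import Data.List (groupBy, sortOn)

-- (name, city, amount), mirroring the schema in the LOAD statement.
type Subscriber = (String, String, Int)

-- Sample rows (an assumption; the real input is subscribers.txt).
subscribers :: [Subscriber]
subscribers =
  [ ("Maya", "Logan", 20)
  , ("Jose", "Logan", 15)
  , ("Knut", "Ogden", 20)
  ]

city :: Subscriber -> String
city (_, c, _) = c

-- Models GROUP ... BY city: one (key, bag) pair per distinct city.
groupByCity :: [Subscriber] -> [(String, [Subscriber])]
groupByCity rows =
  [ (city r, grp) | grp@(r : _) <- groupBy ((==) `on` city) (sortOn city rows) ]

-- Models FOREACH ... GENERATE key, COUNT(bag): count the tuples in each bag.
countPerCity :: [Subscriber] -> [(String, Int)]
countPerCity rows = [ (k, length bag) | (k, bag) <- groupByCity rows ]
```

With these sample rows, countPerCity subscribers should evaluate to [("Logan",2),("Ogden",1)].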

11 Compilation
[Diagram: Pig Latin Program -> Pig Latin Compiler -> Hadoop Map Reduce Job (a series of Map, Reduce, HDFS stages) -> Result]

12 Data Transformations
Relational algebra-like
  JOIN (inner and outer joins)
  FILTER (selection)
  FOREACH (projection)
  CROSS (product)
  UNION
SQL-like
  DISTINCT
  LIMIT
  ORDER BY
  GROUP
Non-traditional
  COGROUP
  MAPREDUCE
  FLATTEN
  RANK
  STREAM
  SAMPLE
  SPLIT

13 Magazine Subscriber Data
Subscribers (Name, City, Amt, Id)
  (Maya, Logan, $20, 1)
  (Jose, Logan, $15, 2)
  (Knut, Ogden, $20, 3)
  ...
Personal Information (Name, Email, Id)
  (Maya, maya@gmail.com, 5)
  (Jose, jose@gmail.com, 6)
  (Knut, knut@hotmail.com, 7)
  ...

14 FILTER
A filter restricts the result
  /* Restrict to Logan subscribers */
  X = FILTER Subscribers BY city == 'Logan';
FILTER example
Subscribers (Name, City, Amt, Id)
  (Maya, Logan, $20, 1)
  (Jose, Logan, $15, 2)
  (Knut, Ogden, $20, 3)
  ...
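A filter is relational selection: it keeps the tuples that satisfy a predicate and drops the rest. The sketch below models that on the three subscriber rows listed on the slide (the rows elided by "..." are not included), with amounts stored as plain integers. The names Subscriber and loganSubscribers are illustrative.

```haskell
-- (name, city, amount, id), following the Subscribers schema on the slide.
type Subscriber = (String, String, Int, Int)

subscribers :: [Subscriber]
subscribers =
  [ ("Maya", "Logan", 20, 1)
  , ("Jose", "Logan", 15, 2)
  , ("Knut", "Ogden", 20, 3)
  ]

-- Keep only the tuples whose city field equals "Logan".
loganSubscribers :: [Subscriber]
loganSubscribers = filter (\(_, c, _, _) -> c == "Logan") subscribers
```

Here loganSubscribers should contain the Maya and Jose tuples; the Knut tuple is dropped because its city is Ogden.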

15 Magazine Subscriber Data
Subscribers (Name, City, Amt, Id)
  (Maya, Logan, $20, 1)
  (Jose, Logan, $15, 2)
  (Knut, Ogden, $20, 3)
  ...
Personal Information (Name, Email, Id)
  (Maya, maya@gmail.com, 5)
  (Jose, jose@gmail.com, 6)
  (Knut, knut@hotmail.com, 7)
  ...
B = JOIN Subscribers BY name, PerInfo BY name;

16 Magazine Subscriber Data
B (Name, City, Amt, Id, Name, Email, Id)
  (Maya, Logan, $20, 1, Maya, maya@gmail.com, 5)
  (Jose, Logan, $15, 2, Jose, jose@gmail.com, 6)
  (Knut, Ogden, $20, 3, Knut, knut@hotmail.com, 7)
  ...
B = JOIN Subscribers BY name, PerInfo BY name;
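The join result above concatenates each subscriber tuple with the personal-information tuple that has the same name. A minimal sketch of an inner equi-join, written as a nested loop over the rows shown on the slides (elided rows omitted, amounts as plain integers), could look as follows. The nested-loop strategy and the name joinByName are chosen here for clarity; this is not a description of how Pig executes joins.

```haskell
type Subscriber = (String, String, Int, Int)   -- (name, city, amount, id)
type PerInfo    = (String, String, Int)        -- (name, email, id)

subscribers :: [Subscriber]
subscribers =
  [ ("Maya", "Logan", 20, 1)
  , ("Jose", "Logan", 15, 2)
  , ("Knut", "Ogden", 20, 3)
  ]

perInfo :: [PerInfo]
perInfo =
  [ ("Maya", "maya@gmail.com", 5)
  , ("Jose", "jose@gmail.com", 6)
  , ("Knut", "knut@hotmail.com", 7)
  ]

-- Inner join on name: pair every subscriber with every personal-information
-- tuple, and keep the pairs whose names match. Both name fields are kept.
joinByName :: [Subscriber] -> [PerInfo]
           -> [(String, String, Int, Int, String, String, Int)]
joinByName subs infos =
  [ (n, c, a, i, n', e, i')
  | (n, c, a, i) <- subs
  , (n', e, i')  <- infos
  , n == n'
  ]

joined :: [(String, String, Int, Int, String, String, Int)]
joined = joinByName subscribers perInfo
```

Each of the three subscribers matches exactly one personal-information tuple, so joined has three rows with the same seven fields as relation B.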

17 Optimization
[Diagram: a dataflow over relations A, B, C, D, E built from FILTER …, JOIN …, FILTER …, and CROSS … steps, compiled to Map/Reduce]

