Watching Pigs Fly with the Netflix Hadoop Toolkit Hadoop Summit 2013 San Jose, CA.

1 Watching Pigs Fly with the Netflix Hadoop Toolkit. Hadoop Summit 2013, San Jose, CA

2 Our Motivation: Data should be accessible, easy to discover, and easy to process for everyone.

3 Our Users: Analysts, Engineers

4 Hadoop Platform as a Service

5 S3

6 Hadoop Platform as a Service Data Platform

7 Data Platform as a Service: Franklin (Metadata API), Sting (Adhoc Visualization), Forklift (Data Movement), Looper (Backloading), Ignite (A/B Test Analytics), Spock (Data Auditing), Genie (Hadoop PaaS), Lipstick (Pig Workflow Visualization), Event Service (Orchestration); Hadoop, S3, Other Processing

8 Let’s solve a problem using the data!

9 Build a recommender.

10 But what makes good recommendations? Similarity, Personalization

11 COLORS!

12 Box art is colorful…

13 We’re Sorry COLORS! Box art is colorful…

14 Where can I find the data?

15 Hadoop Platform as a Service: S3

16 Hadoop Platform as a Service: S3, Cassandra, Teradata, Redshift, RDS

17 Data Platform as a Service: Franklin (Metadata API); S3, Cassandra, Teradata, Redshift, RDS

18 Data Platform as a Service: Franklin (Metadata API)

19 Create a dataset for box art and color.
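
The talk does not show how the box art and color dataset was built. A minimal sketch, assuming each image has already been decoded into a list of (r, g, b) pixels: quantize every pixel into a coarse color bucket and store the fraction of pixels per bucket as one row per title. The bucket count and the function names (`color_bucket`, `color_histogram`, `build_dataset`) are hypothetical.

```python
from collections import Counter

# Assumption: images are already decoded into lists of (r, g, b) tuples, 0-255.
BINS_PER_CHANNEL = 4  # 4 x 4 x 4 = 64 coarse color buckets (an arbitrary choice)


def color_bucket(pixel, bins=BINS_PER_CHANNEL):
    """Map an (r, g, b) pixel to a single coarse bucket index."""
    r, g, b = pixel
    step = 256 // bins
    return (r // step) * bins * bins + (g // step) * bins + (b // step)


def color_histogram(pixels, bins=BINS_PER_CHANNEL):
    """Return a normalized histogram: the fraction of pixels in each bucket."""
    if not pixels:
        raise ValueError("image has no pixels")
    counts = Counter(color_bucket(p, bins) for p in pixels)
    total = len(pixels)
    return [counts.get(i, 0) / total for i in range(bins ** 3)]


def build_dataset(box_art):
    """box_art: dict of title -> pixel list. Returns dict of title -> histogram row."""
    return {title: color_histogram(pixels) for title, pixels in box_art.items()}
```

Coarse buckets keep each row short (64 numbers here), so the dataset stays small even across a whole catalog.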

20 Whether your dataset is large or small, being able to visualize it makes it easier to explain.

21 Data Platform as a Service: Franklin (Metadata API), Sting (Adhoc Visualization)

22 Sting: Allows users to cache the results of a Genie job in memory. Sub-second response to OLAP-style operations (slicing, dicing, aggregations). Ad hoc or recurring schedules. Easy to use!
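
Sting's internals are not described in the talk. As a rough illustration of why caching results in memory makes OLAP-style operations fast, the sketch below keeps hypothetical query results as a list of rows and answers slice/dice and aggregation requests directly from memory. The column names, rows, and helpers (`dice`, `aggregate`) are invented for the example.

```python
from collections import defaultdict

# Hypothetical cached result rows from a finished job; values are illustrative only.
ROWS = [
    {"title": "Title A", "country": "US", "hour": 20, "hours_viewed": 120.0},
    {"title": "Title A", "country": "CA", "hour": 21, "hours_viewed": 30.0},
    {"title": "Title B", "country": "US", "hour": 22, "hours_viewed": 45.0},
    {"title": "Title C", "country": "US", "hour": 20, "hours_viewed": 80.0},
]


def dice(rows, **criteria):
    """Keep rows whose dimension values match every criterion.
    One criterion is a 'slice'; several at once is a 'dice'."""
    return [r for r in rows if all(r[k] == v for k, v in criteria.items())]


def aggregate(rows, group_by, measure):
    """Sum `measure` for each distinct combination of the `group_by` dimensions."""
    totals = defaultdict(float)
    for r in rows:
        key = tuple(r[d] for d in group_by)
        totals[key] += r[measure]
    return dict(totals)
```

For example, `aggregate(dice(ROWS, country="US"), ["title"], "hours_viewed")` totals US viewing per title. A production engine would add indexes or columnar storage; the point here is only that no query goes back to the cluster.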

23 Hive Query Schema

24 % Content Consumed / Hour

25 Hemlock Grove, House of Cards, Arrested Development

26 Similarity

27

28

29 House of Cards, Macbeth

30

31

32 Toddlers & Tiaras, Star Trek: Voyager
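
The similarity slides show title pairs (House of Cards and Macbeth; Toddlers & Tiaras and Star Trek: Voyager) but do not state how similarity was measured. One common choice, assumed here rather than taken from the talk, is cosine similarity between normalized color histograms. The helper names are hypothetical and the histograms in the test are toy values.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(title, histograms):
    """Return the other title whose color histogram is closest by cosine similarity."""
    target = histograms[title]
    others = (t for t in histograms if t != title)
    return max(others, key=lambda t: cosine_similarity(target, histograms[t]))
```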

33 Personalization

34 Big Data: # of subscribers × # of titles = ???,000,…,000
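
Personalization needs a score for every subscriber and title pair, which is where the multiplication on the slide comes from. The sketch below assumes, hypothetically, that each subscriber has a color-preference vector and scores a title as the dot product of that vector with the title's color histogram. `personalized_scores`, `pair_count`, and the counts in the usage note are placeholders, not Netflix figures.

```python
def personalized_scores(user_prefs, title_hists):
    """Score every (user, title) pair as the dot product of the user's
    color-preference vector and the title's color histogram."""
    scores = {}
    for user, pref in user_prefs.items():
        for title, hist in title_hists.items():
            scores[(user, title)] = sum(p * h for p, h in zip(pref, hist))
    return scores


def pair_count(num_subscribers, num_titles):
    """Number of scores the nested loop above would have to produce."""
    return num_subscribers * num_titles
```

With a hypothetical 10 million subscribers and 10 thousand titles, `pair_count(10_000_000, 10_000)` is 10^11 scores, far too many for a single-machine loop, which is why a workload like this is written as a distributed job instead.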

35 Netflix Apache Pig

36

37 Data Platform as a Service: Franklin (Metadata API), Sting (Adhoc Visualization)

38 Lipstick: Allows users to visualize their data flow. Allows users to see common errors. Allows users to easily monitor their jobs. Empowers users to support themselves. Facilitates communication between the infrastructure team and users.

39 Lipstick

40 Overall Job Progress

41 Logical Plan Overall Job Progress

42 Logical Operator (reduce side), Logical Operator (map side), Map/Reduce Job, Intermediate Row Count, Records Loaded
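
Lipstick's own rendering code is not shown in the talk. The sketch below is an assumption about how a view with these elements could be modeled: logical operators tagged with their Map/Reduce job and their map or reduce side, edges labeled with intermediate row counts, all emitted as Graphviz DOT with one cluster per job. The `Operator` class, the box/ellipse shape convention, and `to_dot` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Operator:
    op_id: str
    label: str   # e.g. "LOAD", "FILTER", "GROUP"
    job_id: str  # the Map/Reduce job this operator runs in
    side: str    # "map" or "reduce"


def to_dot(operators, edges):
    """Render operators grouped into one DOT cluster per Map/Reduce job.
    edges: list of (src_id, dst_id, row_count); row_count becomes the edge label."""
    lines = ["digraph plan {"]
    jobs = {}
    for op in operators:
        jobs.setdefault(op.job_id, []).append(op)
    for job_id, ops in jobs.items():
        lines.append(f'  subgraph "cluster_{job_id}" {{')
        lines.append(f'    label="{job_id}";')
        for op in ops:
            # Arbitrary convention: boxes for map-side, ellipses for reduce-side operators.
            shape = "box" if op.side == "map" else "ellipse"
            lines.append(f'    "{op.op_id}" [label="{op.label}", shape={shape}];')
        lines.append("  }")
    for src, dst, rows in edges:
        lines.append(f'  "{src}" -> "{dst}" [label="{rows:,} rows"];')
    lines.append("}")
    return "\n".join(lines)
```

Piping the returned text to `dot -Tsvg` renders the graph.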

43 Hadoop Counters

44 Common Problem #1: My job has stalled.

45

46 Unoptimized/Optimized Logical Plan Toggle, Dangling Operator

47 Common Problem #2: I didn’t get the data I was expecting.
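
Per-operator row counts help localize this problem: the interesting operator is usually the first one whose output is empty even though its inputs were not (for example a FILTER or JOIN whose condition never matches). A minimal sketch of that check, assuming the counts have already been collected into a dictionary; the function and operator names are hypothetical.

```python
def first_empty_operator(order, inputs, row_counts):
    """Return the first operator, in data-flow order, that produced no rows even
    though every one of its inputs did. A source with no inputs that loaded no
    records also qualifies. Returns None if no such operator exists.

    order:      operator ids in topological (data-flow) order
    inputs:     dict op_id -> list of upstream op_ids
    row_counts: dict op_id -> number of records the operator produced
    """
    for op in order:
        if row_counts.get(op, 0) != 0:
            continue
        upstream = inputs.get(op, [])
        # all() over an empty list is True, so empty sources are reported too.
        if all(row_counts.get(u, 0) > 0 for u in upstream):
            return op
    return None
```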

48

49

50 Common Problem #3: I don’t understand why my job failed.

51 Failed Job (light red background), Successful Job (light blue background)

52

53 Wrapping up: Demos at the Netflix booth in the exhibit hall (see more of Lipstick, Sting, and Genie). Lipstick is part of Netflix OSS. Clone it on GitHub at … We welcome feedback and contributions!

54 Charles Smith: Jeff Magnusson: Thank you! Jobs: Netflix OSS: Tech Blog:

