Published by Kent Bowell. Modified over 9 years ago.
1
Watching Pigs Fly with the Netflix Hadoop Toolkit. Hadoop Summit 2013, San Jose, CA
2
Our Motivation: Data should be accessible, easy to discover, and easy to process for everyone.
3
Our Users: Analysts, Engineers
4
Hadoop Platform as a Service
5
S3
6
Hadoop Platform as a Service Data Platform
7
Data Platform as a Service: Franklin (Metadata API), Sting (Adhoc Visualization), Forklift (Data Movement), Looper (Backloading), Ignite (A/B Test Analytics), Spock (Data Auditing), Genie (Hadoop PaaS), Lipstick (Pig Workflow Visualization), Event Service (Orchestration). Hadoop, S3, Other Processing
8
Let’s solve a problem using the data!
9
Build a recommender.
10
But what makes good recommendations? Similarity. Personalization.
11
COLORS!
12
Box art is colorful…
13
We’re Sorry COLORS! Box art is colorful…
14
Where can I find the data?
15
Hadoop Platform as a Service: S3
16
Hadoop Platform as a Service: S3, Cassandra, Teradata, Redshift, RDS
17
Data Platform as a Service: Franklin (Metadata API). S3, Cassandra, Teradata, Redshift, RDS
18
Data Platform as a Service: Franklin (Metadata API)
19
Create a dataset for box art and color.
20
Whether your dataset is large or small, being able to visualize it makes it easier to explain.
21
Data Platform as a Service: Franklin (Metadata API), Sting (Adhoc Visualization)
22
Sting: Allows users to cache the results of a Genie job in memory. Sub-second response to OLAP-style operations (slicing, dicing, aggregations). Ad hoc or recurring schedule. Easy to use!
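The slides do not describe Sting's internals. As a rough illustration of what slicing, dicing, and aggregating a cached result means, here is a minimal Python sketch that keeps hypothetical job output rows in memory and answers filter-then-group-by queries. The `rows` data, column names, and the `slice_and_aggregate` helper are all hypothetical.

```python
from collections import defaultdict

# Hypothetical cached result of a job: one dict per row, held in memory.
rows = [
    {"title": "House of Cards", "country": "US", "hour": 20, "views": 120},
    {"title": "House of Cards", "country": "CA", "hour": 21, "views": 40},
    {"title": "Hemlock Grove", "country": "US", "hour": 20, "views": 70},
    {"title": "Arrested Development", "country": "US", "hour": 22, "views": 90},
]

def slice_and_aggregate(rows, filters, group_by, measure):
    """Slice (keep rows matching fixed dimension values), then dice and
    aggregate (sum `measure` grouped by the `group_by` dimensions)."""
    totals = defaultdict(int)
    for row in rows:
        # Slice: keep only rows whose dimensions match every filter.
        if all(row[dim] == value for dim, value in filters.items()):
            key = tuple(row[dim] for dim in group_by)
            totals[key] += row[measure]
    return dict(totals)

# Total views per title, restricted to US viewers.
print(slice_and_aggregate(rows, {"country": "US"}, ["title"], "views"))
```

The linear scan only shows the semantics; getting sub-second answers on large results would typically require indexing or pre-aggregating the cached data.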
23
Hive Query, Schema
24
% Content Consumed / Hour
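The slides do not define "% Content Consumed / Hour". Under one plausible reading (an assumption), it is the share of a title's total viewing time that falls in each hour of the day. A minimal Python sketch of that computation, with hypothetical viewing events and a hypothetical `percent_consumed_by_hour` helper:

```python
from collections import Counter

def percent_consumed_by_hour(events):
    """events: iterable of (hour_of_day, minutes_viewed) for one title.
    Returns {hour: percentage of the title's total minutes viewed in that hour}."""
    minutes = Counter()
    for hour, mins in events:
        minutes[hour] += mins
    total = sum(minutes.values())
    if total == 0:
        return {}
    return {hour: 100.0 * m / total for hour, m in sorted(minutes.items())}

# Hypothetical viewing events: (hour of day, minutes viewed).
events = [(20, 30), (21, 45), (20, 15), (23, 10)]
print(percent_consumed_by_hour(events))  # {20: 45.0, 21: 45.0, 23: 10.0}
```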
25
Hemlock Grove, House of Cards, Arrested Development
26
Similarity
29
House of Cards, Macbeth
32
Toddlers & Tiaras, Star Trek: Voyager
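The slides pair titles by box-art color but do not say how similarity was measured. A minimal sketch, assuming each box art has been reduced to a color histogram over a few bins, compares titles with cosine similarity. The titles, bins, and histogram values are hypothetical, not the talk's data.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length color histograms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical histograms over four color bins: (red, green, blue, black).
box_art = {
    "Title A": [0.6, 0.1, 0.1, 0.2],
    "Title B": [0.5, 0.1, 0.1, 0.3],
    "Title C": [0.1, 0.2, 0.6, 0.1],
}

def most_similar(title, box_art):
    """Return the other title whose histogram is closest to `title`'s."""
    others = (t for t in box_art if t != title)
    return max(others, key=lambda t: cosine_similarity(box_art[title], box_art[t]))

print(most_similar("Title A", box_art))  # Title B
```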
33
Personalization
34
Big Data: # of subscribers × # of titles = ???,000,…,000
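The point of this slide is that scoring every subscriber against every title grows as the product of the two counts. A minimal sketch makes the cross product explicit, assuming (hypothetically) that each subscriber has a color-preference vector and that a title's score is the dot product of that vector with the title's color histogram. All names and values are illustrative.

```python
from itertools import product

# Hypothetical color preferences, using the same four bins as the box-art histograms.
subscribers = {
    "user1": [0.7, 0.1, 0.1, 0.1],
    "user2": [0.1, 0.1, 0.7, 0.1],
}
titles = {
    "Title A": [0.6, 0.1, 0.1, 0.2],
    "Title C": [0.1, 0.2, 0.6, 0.1],
}

def score_all(subscribers, titles):
    """Score every (subscriber, title) pair, producing
    len(subscribers) * len(titles) scores."""
    return {
        (user, title): sum(p * c for p, c in zip(pref, hist))
        for (user, pref), (title, hist) in product(subscribers.items(), titles.items())
    }

scores = score_all(subscribers, titles)
print(len(scores))  # 4 = 2 subscribers x 2 titles
```

With millions of subscribers and thousands of titles, the same loop yields billions of pairs. That scale is what motivates running this step on Hadoop rather than on a single machine.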
35
Netflix Apache Pig
37
Data Platform as a Service: Franklin (Metadata API), Sting (Adhoc Visualization)
38
Lipstick: Allows users to visualize their data flow. Allows users to see common errors. Allows users to easily monitor their jobs. Empowers users to support themselves. Facilitates communication between the infrastructure team and users.
39
Lipstick
40
Overall Job Progress
41
Logical Plan; Overall Job Progress
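The slides do not say how Lipstick computes overall job progress. A minimal sketch under a simple assumption: each MapReduce job's progress is the mean of its map and reduce progress, and the script's overall progress is the mean across its jobs. The helper names and job data are hypothetical.

```python
def job_progress(map_progress, reduce_progress):
    """Progress of one MapReduce job as the mean of its map and
    reduce phases, each given as a fraction from 0.0 to 1.0."""
    return (map_progress + reduce_progress) / 2

def overall_progress(jobs):
    """Overall script progress as the mean progress of its MapReduce jobs.
    jobs: list of (map_progress, reduce_progress) tuples."""
    if not jobs:
        return 0.0
    return sum(job_progress(m, r) for m, r in jobs) / len(jobs)

# Hypothetical script compiled into three MapReduce jobs.
jobs = [(1.0, 1.0), (1.0, 0.5), (0.0, 0.0)]
print(f"{overall_progress(jobs):.0%}")  # 58%
```

Equal weighting is the simplest choice. Map-only jobs, or jobs of very different sizes, would call for a different weighting.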
42
Logical Operator (reduce side), Logical Operator (map side), Map/Reduce Job, Intermediate Row Count, Records Loaded
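These slide elements suggest a graph in which logical operators are grouped by MapReduce job, drawn differently for map and reduce sides, and connected by edges annotated with row counts. A minimal sketch renders such a plan as Graphviz DOT text. The plan structure, operator names, row counts, and shape choices are hypothetical, not Lipstick's actual data model.

```python
# Hypothetical logical plan: operator -> (MapReduce job id, side).
operators = {
    "load_views": ("job1", "map"),
    "filter_us": ("job1", "map"),
    "group_title": ("job1", "reduce"),
    "store_counts": ("job1", "reduce"),
}
# Edges between operators, labeled with hypothetical intermediate row counts.
edges = [
    ("load_views", "filter_us", 1000),
    ("filter_us", "group_title", 600),
    ("group_title", "store_counts", 25),
]

def to_dot(operators, edges):
    """Render the plan as Graphviz DOT: one cluster per MapReduce job,
    boxes for map-side operators, ellipses for reduce-side ones."""
    lines = ["digraph plan {"]
    jobs = sorted({job for job, _ in operators.values()})
    for job in jobs:
        lines.append(f'  subgraph "cluster_{job}" {{')
        lines.append(f'    label="{job}";')
        for op, (op_job, side) in operators.items():
            if op_job == job:
                shape = "box" if side == "map" else "ellipse"
                lines.append(f'    "{op}" [shape={shape}];')
        lines.append("  }")
    # Edge labels carry the intermediate row counts.
    for src, dst, rows in edges:
        lines.append(f'  "{src}" -> "{dst}" [label="{rows} rows"];')
    lines.append("}")
    return "\n".join(lines)

print(to_dot(operators, edges))
```

The output can be piped to the Graphviz `dot` command (for example `dot -Tsvg`) to produce an image.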
43
Hadoop Counters
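A Pig script usually compiles into several MapReduce jobs, and each job reports its own Hadoop counters. A minimal sketch rolls per-job counters up to script-level totals, assuming the counters are already available as plain `(group, name) -> value` dicts. The job ids, simplified group names, and values are hypothetical.

```python
from collections import Counter

def total_counters(per_job):
    """Sum each (group, name) counter across all MapReduce jobs of a script.
    per_job: dict of job_id -> {(group, name): value}."""
    totals = Counter()
    for counters in per_job.values():
        totals.update(counters)  # Counter.update adds values for matching keys
    return dict(totals)

# Hypothetical counters for a two-job script (group names simplified).
per_job = {
    "job1": {("FileSystem", "HDFS_BYTES_READ"): 2048, ("Task", "MAP_INPUT_RECORDS"): 1000},
    "job2": {("Task", "MAP_INPUT_RECORDS"): 600},
}
print(total_counters(per_job))
```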
44
Common Problem #1: My job has stalled.
46
Unoptimized/Optimized Logical Plan Toggle; Dangling Operator
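The slide highlights a dangling operator in the logical plan. Assuming "dangling" means an operator whose output never reaches a STORE (an interpretation, not stated in the slides), such operators can be found by walking the plan backwards from every sink and reporting whatever is never reached. The plan and the `dangling_operators` helper are hypothetical.

```python
from collections import defaultdict

def dangling_operators(edges, sinks):
    """Return operators that have no path to any sink (e.g. a STORE),
    found by walking the plan backwards from each sink."""
    parents = defaultdict(list)
    nodes = set(sinks)
    for src, dst in edges:
        parents[dst].append(src)
        nodes.update((src, dst))
    reachable = set()
    stack = list(sinks)
    while stack:
        node = stack.pop()
        if node in reachable:
            continue
        reachable.add(node)
        stack.extend(parents[node])
    return sorted(nodes - reachable)

# Hypothetical unoptimized plan: 'join_ratings' feeds nothing that is stored.
edges = [
    ("load_views", "filter_us"),
    ("filter_us", "group_title"),
    ("group_title", "store_counts"),
    ("load_views", "join_ratings"),
]
print(dangling_operators(edges, sinks=["store_counts"]))  # ['join_ratings']
```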
47
Common Problem #2: I didn’t get the data I was expecting.
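Intermediate row counts help locate where expected data disappeared. A minimal sketch scans a linear pipeline of hypothetical `(operator, output_row_count)` pairs. It reports the first operator that emitted nothing and the operator that discarded the largest share of its input. The helper names and counts are hypothetical.

```python
def first_empty_operator(pipeline):
    """pipeline: ordered list of (operator, output_row_count).
    Return the first operator that emitted zero rows, or None."""
    for operator, rows in pipeline:
        if rows == 0:
            return operator
    return None

def biggest_drop(pipeline):
    """Return (operator, fraction_of_input_kept) for the operator that
    discarded the largest share of its input rows."""
    worst = None
    for (_, rows_in), (operator, rows_out) in zip(pipeline, pipeline[1:]):
        # An operator with empty input cannot lose rows, so treat it as keeping all.
        kept = rows_out / rows_in if rows_in else 1.0
        if worst is None or kept < worst[1]:
            worst = (operator, kept)
    return worst

# Hypothetical intermediate row counts.
pipeline = [("load_views", 1000), ("filter_us", 600), ("join_titles", 0), ("store", 0)]
print(first_empty_operator(pipeline))  # join_titles
print(biggest_drop(pipeline))          # ('join_titles', 0.0)
```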
50
Common Problem #3: I don’t understand why my job failed.
51
Failed Job (light red background); Successful Job (light blue background)
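The color coding on this slide can be sketched by giving each MapReduce job's cluster a background color keyed on its status. The status strings follow Hadoop's job states (`SUCCEEDED`, `FAILED`). The hex colors, helper names, and job data are assumptions chosen to match "light red" and "light blue".

```python
# Fill colors for the slide's legend; the exact hex values are assumptions.
STATUS_COLORS = {
    "FAILED": "#f4cccc",     # light red
    "SUCCEEDED": "#cfe2f3",  # light blue
}

def job_cluster(job_id, status, operators):
    """Render one MapReduce job as a Graphviz cluster whose background
    color reflects its status; other statuses (e.g. running) stay white."""
    color = STATUS_COLORS.get(status, "white")
    body = " ".join(f'"{op}";' for op in operators)
    return (f'subgraph "cluster_{job_id}" {{ style=filled; '
            f'fillcolor="{color}"; label="{job_id} ({status})"; {body} }}')

def failed_jobs(statuses):
    """Return the ids of jobs whose status is FAILED, in sorted order."""
    return sorted(job for job, status in statuses.items() if status == "FAILED")

statuses = {"job1": "SUCCEEDED", "job2": "FAILED"}  # hypothetical
print(job_cluster("job2", statuses["job2"], ["group_title", "store_counts"]))
print(failed_jobs(statuses))  # ['job2']
```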
53
Wrapping up: Demos at the Netflix booth in the exhibit hall (see more of Lipstick, Sting, and Genie). Lipstick is part of Netflix OSS. Clone it on GitHub at http://github.com/Netflix/Lipstick. We welcome feedback and contributions!
54
Charles Smith: charsmith@netflix.com. Jeff Magnusson: jmagnusson@netflix.com. Thank you! Jobs: http://jobs.netflix.com. Netflix OSS: http://netflix.github.io. Tech Blog: http://techblog.netflix.com/