2013 Building and Improving Products with Hadoop Matthew Rathbone.

1 2013 Building and Improving Products with Hadoop Matthew Rathbone

2 What is Foursquare
Foursquare helps you explore the world around you. Meet up with friends, discover new places, and save money using your phone.
- 4 billion check-ins
- 35 million users
- 50 million POI (points of interest)
- 150 employees
- 1 TB+ of data a day

3 FIRST, A STORY

4 The Right Tool for the Job
- Nginx: serving static files
- Perl: regular expressions
- XML: frustrating people
- Hadoop (MapReduce): counting

5 COUNTING: WHAT IS IT GOOD FOR?

11 Statistically Improbable Phrases

12 SIPS use cases
- menu extraction
- sentiment analysis
- venue ratings
- specific recommendations
- search indexing
- pricing data
- facility information

13 How is SIPS built?
Basically lots of counting.

14 SIPS
- Tokenize data (into N-grams) with a language model built using tips, shouts, menu items, likes, etc.
- Apply a TF-IDF algorithm (term frequency, inverse document frequency)
  - Global phrase count
  - Local phrase count (in a venue)
- Some filtering and ranking
- Re-compute & deploy nightly
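
The counting behind slide 14 can be sketched in a few lines of plain Python. This is a minimal illustration and not Foursquare's pipeline: it assumes whitespace tokenization instead of a real language model, treats each venue as one "document" for the inverse-document-frequency term, and uses made-up venue names and tips.

```python
import math
from collections import Counter

def ngrams(text, max_n=2):
    """Split text on whitespace and return all 1..max_n word n-grams."""
    words = text.lower().split()
    return [
        " ".join(words[i:i + n])
        for n in range(1, max_n + 1)
        for i in range(len(words) - n + 1)
    ]

def score_phrases(tips_by_venue, max_n=2):
    """Score each phrase per venue: local count weighted by global rarity."""
    # Local phrase count: how often each phrase occurs in one venue's tips.
    local = {
        venue: Counter(p for tip in tips for p in ngrams(tip, max_n))
        for venue, tips in tips_by_venue.items()
    }
    # Global count (assumption): number of venues whose tips contain the phrase.
    venue_freq = Counter(p for counts in local.values() for p in counts)
    n_venues = len(tips_by_venue)
    return {
        venue: {p: c * math.log(n_venues / venue_freq[p]) for p, c in counts.items()}
        for venue, counts in local.items()
    }

# Invented example data, for illustration only.
tips_by_venue = {
    "pizzeria": ["great deep dish pizza", "deep dish pizza is great"],
    "cafe": ["great coffee", "great espresso and coffee"],
}
scores = score_phrases(tips_by_venue)
```

With this toy data, "great" appears at both venues and scores zero everywhere, while "deep dish" only appears at the pizzeria and scores above zero there. A filtering and ranking step would then keep the top-scoring phrases per venue.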

15 WHY USE HADOOP?

16 SIPS without Hadoop: potential problems
- Database query throttling
- Venues are out of sync
- Altering the algorithm could take forever to populate for all venues
- Where would you store the results?
- What about debug data?
- Does it scale to 10x, 100x?
- What about other, similar workflows?

17 SIPS: Hadoop benefits
- Quick deployment
- Modular & reusable
- Arbitrarily complex combination of many datasets
- Every step of the workflow creates value

18 Apple Store - Downtown San Francisco
- 1 tip mentions "haircuts"
- Search for "haircuts" in "san francisco" -> Apple store???
- Fixed by looking at % of tips and overall frequency
“Hey Apple, how bout less shiny pizzazz and fancy haircuts and more fix-
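
One way to read the fix on slide 18 is as a filter on how many of a venue's tips mention a phrase, both as a raw count and as a share of all tips. The sketch below is an assumption about what such a filter could look like; the function name, the thresholds, and the sample tips are all hypothetical.

```python
def phrase_is_relevant(phrase, tips, min_share=0.05, min_mentions=2):
    """Keep a phrase only if enough of the venue's tips mention it.

    min_share and min_mentions are hypothetical thresholds for illustration.
    """
    mentions = sum(1 for tip in tips if phrase in tip.lower())
    share = mentions / len(tips) if tips else 0.0
    return mentions >= min_mentions and share >= min_share

# Invented tips: one stray mention among 100 vs. a venue where it is common.
electronics_store_tips = ["fancy haircuts on the staff"] + ["the repair desk was helpful"] * 99
barber_tips = ["best haircuts in town", "cheap haircuts", "friendly staff"]

store_match = phrase_is_relevant("haircuts", electronics_store_tips)
barber_match = phrase_is_relevant("haircuts", barber_tips)
```

Here the single stray mention is rejected for the store, while the barber shop, where two of three tips mention the phrase, still matches.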

19 Data & Modularity

23 ACTUALLY, IT’S A BIT MORE COMPLICATED

24 These benefits require infrastructure

25 Dependency Management
Many options:
- Oozie (Apache)
- Azkaban (LinkedIn)
- Luigi (Spotify, we <3 this)
- Hamake (Codeminders)
- Chronos (Airbnb)
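
What all of these tools share is the idea of running jobs in dependency order. The sketch below shows only that core idea using Python's standard-library `graphlib` (Python 3.9+); it does not use the API of Oozie, Azkaban, Luigi, Hamake, or Chronos, and the job names are hypothetical.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical job names; each job maps to the set of jobs it depends on.
jobs = {
    "ingest_tips": set(),
    "build_language_model": {"ingest_tips"},
    "count_phrases": {"ingest_tips", "build_language_model"},
    "rank_sips": {"count_phrases"},
    "deploy_to_kv_store": {"rank_sips"},
}

# static_order() yields every job after all of its dependencies.
run_order = list(TopologicalSorter(jobs).static_order())
for job in run_order:
    print("running", job)
```

A real scheduler adds what this sketch leaves out: retries, scheduling, and skipping jobs whose output already exists.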

27 Database / Log Ingestion
- Sqoop
- Mongo-Hadoop
- Kafka
- Flume
- Scribe
- etc.

29 MapReduce Friendly Datastore
A few obvious ones:
- HBase
- Cassandra
- Voldemort
We built our own; it’s very similar to Voldemort and uses the HFile API.

31 Getting started without all that stuff

32 Components you likely don’t have

33 The best way to start
Don’t use Hadoop.*
*but pretend you do
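
One way to "pretend" is to write a first version as separate map and reduce functions and run them in memory, so the logic could later be moved to a real MapReduce job with little change. The sketch below is an assumption about how that could look; the function names and sample records are hypothetical, and nothing here uses Hadoop itself.

```python
from collections import defaultdict

def map_tip(venue_id, tip):
    """Mapper: emit ((venue, word), 1) for every word in a tip."""
    for word in tip.lower().split():
        yield (venue_id, word), 1

def reduce_counts(key, values):
    """Reducer: sum the counts emitted for one key."""
    return key, sum(values)

def run_local(records, mapper, reducer):
    """Run map, shuffle (group by key), and reduce entirely in memory."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(*record):
            groups[key].append(value)
    return dict(reducer(key, values) for key, values in groups.items())

# Invented sample records: (venue id, tip text).
records = [("v1", "great pizza"), ("v1", "pizza pizza"), ("v2", "great coffee")]
counts = run_local(records, map_tip, reduce_counts)
```

Because the mapper and reducer never touch shared state, the same two functions can be tested on a laptop against a subset of the data before any cluster is involved.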

34 Other reasons to not use Hadoop
- Your idea might not be very good
- Hadoop will slow you down to start with
- You don’t have enough infrastructure yet
  - build it when you need it
- V1 might not be that complex
- V1 could be a spreadsheet

37 SIPS Version 1
- Off-the-shelf language model
- A subset of venues & tips
- Did not use MapReduce
- Did not push to production at all

38 SIPS Version 2
- Started building our own language model
- Rewritten as a MapReduce job
- Manually loaded data to production
- Filters for English data only
- Tweak, improve, etc.

39 SIPS Version 3
- Incorporated more data sources into our language model
- Deployment to KV store (auto)
- Incorporated lots of debug output
- Language pipeline also feeds sentiment analysis
- Now we’re in the perfect place to iterate & improve

40 …to explore data

41 In Summary
- Hadoop is good for counting, so use it for counting
- Move quickly whenever possible and don’t worry about automation
- Bring in new production services as you need them
- Freedom!

42 Bonus: from my colleague, Joe Crobak (presenting later!)

