Analytics from 330 million smartphones Sean Byrnes CTO & Co-founder
Flurry Overview. 60, ,000 App Developers; Live Applications. Flurry Analytics: Better apps on iOS, Android, BB, WP, HTML5. 480M Devices per month; 33B Sessions per month. AppCircle Network: Acquisition & Monetization: iOS, Android. 6,200 App Developers; 200M Devices per month; 300B Events per month; 3M Daily Completed Views.
How Flurry Works
Flurry’s Scale: 1.2 Billion Sessions / Day, 900 Servers, 1.56 PB
Topics: 1. Big Data Collection (HDFS) 2. Big Data Processing (Hadoop) 3. Data Mining at Scale (HBase)
BIG DATA COLLECTION
Incoming Data. Peak Connections per Second: 25,000. Data per day: 1.5 TB.
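As a rough sanity check on the daily volume, it can be converted into an average ingest rate. This is a back-of-the-envelope sketch only: it assumes decimal units (1 TB = 10^12 bytes) and a perfectly even load over the day, which the separate peak figure suggests is not the case. The function name is illustrative and not from the talk.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def average_bytes_per_second(bytes_per_day: float) -> float:
    """Average ingest rate if the daily volume arrived evenly over 24 hours."""
    return bytes_per_day / SECONDS_PER_DAY


# 1.5 TB/day from the slide, assuming 1 TB = 10**12 bytes.
data_per_day = 1.5e12
avg_rate = average_bytes_per_second(data_per_day)

print(f"average ingest: {avg_rate / 1e6:.1f} MB/s")
```

Under these assumptions the average works out to roughly 17.4 MB/s. The collection tier would presumably be sized for the peak rather than for this average.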
Data Collection (diagram): Reports -> Load Balancer -> Data Collector -> File -> HDFS, with three Load Balancer / Data Collector pairs shown
Data Collection (diagram): Reports, HDFS, Location A, Location B
BIG DATA PROCESSING
Analytics Processing: Agent Report; Normalization (Identify Device, Country, Carrier, etc.); Data Correction (Bad Phone Clocks, Partial Session Reports); De-duplication (Handle duplicate reports); Metrics Computation (Flexible calculation, Configurable Dimensions); Portfolio Analysis; Benchmarking; Clustering; Data mining and analysis; Audience Segmentation; Industry Trends; Application Analytics; Merchandising Analytics
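The slide names the clean-up steps but not how they are done. The sketch below shows one plausible shape for two of them, de-duplication and correcting bad phone clocks. Everything in it is an assumption for illustration: the `AgentReport` fields, the idea that each upload carries a unique `report_id`, and the 300-second tolerance are hypothetical, not Flurry's actual schema or rules.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class AgentReport:
    report_id: str      # hypothetical unique id for one upload
    session_start: int  # device clock, seconds since epoch
    sent_at: int        # device clock when the report was sent
    received_at: int    # server clock when the report arrived


def correct_clock(report: AgentReport, tolerance: int = 300) -> AgentReport:
    """Shift device timestamps by the device/server offset if it exceeds tolerance."""
    offset = report.received_at - report.sent_at
    if abs(offset) <= tolerance:
        return report
    return replace(
        report,
        session_start=report.session_start + offset,
        sent_at=report.sent_at + offset,
    )


def deduplicate(reports):
    """Keep the first report seen for each report_id, preserving arrival order."""
    seen = set()
    unique = []
    for report in reports:
        if report.report_id not in seen:
            seen.add(report.report_id)
            unique.append(report)
    return unique


reports = [
    AgentReport("a1", session_start=1_000, sent_at=1_060, received_at=1_061),
    AgentReport("a1", session_start=1_000, sent_at=1_060, received_at=1_090),  # retry of a1
    AgentReport("b2", session_start=50, sent_at=110, received_at=1_000_110),   # device clock far behind
]
cleaned = [correct_clock(r) for r in deduplicate(reports)]
```

Here the retried upload is dropped, and the report from the device with the skewed clock has its session start moved onto the server's timeline while keeping the session's offset from the send time.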
Large-scale Data Processing (diagram): Input Data; NoSQL DataStore; Real-Time; Batch; Collectors; Consumer/Producer Systems; MapReduce (jobs); External Action
Map/Reduce Management. Challenge: Task Starvation. Challenge: Task Roadblocking. Challenge: Network Connection Waiting.
Network Topology: Chained (diagram): Rack 1, Rack 2, Switch 1, Switch 2, Rack 3, Switch 3
Network Topology: Star (diagram): Rack 3, Rack 2, Switch 3, Switch 4, Switch 1, Switch 2, Trunk, Rack 1, Rack 2
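One way to compare the two topologies is to count the links a packet crosses between the two most distant racks. The sketch below does this with a breadth-first search over small model networks. The wiring is an assumption: the diagrams did not survive extraction, so the chained model puts each rack on its own switch with switches linked in a line, and the star model links every rack switch to a single central node named `core` here (a stand-in for whatever the slide's "Trunk" represents).

```python
from collections import deque


def chained(n_racks):
    """Each rack hangs off its own switch; switches are linked in a line."""
    links = [(f"rack{i}", f"switch{i}") for i in range(1, n_racks + 1)]
    links += [(f"switch{i}", f"switch{i + 1}") for i in range(1, n_racks)]
    return links


def star(n_racks):
    """Each rack switch links to one central node (hypothetical 'core')."""
    links = [(f"rack{i}", f"switch{i}") for i in range(1, n_racks + 1)]
    links += [(f"switch{i}", "core") for i in range(1, n_racks + 1)]
    return links


def link_count(links, src, dst):
    """Breadth-first search: number of links on the shortest path from src to dst."""
    neighbours = {}
    for a, b in links:
        neighbours.setdefault(a, []).append(b)
        neighbours.setdefault(b, []).append(a)
    dist = {src: 0}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            return dist[node]
        for nxt in neighbours.get(node, []):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    raise ValueError(f"{dst} is not reachable from {src}")


for n in (3, 4, 8):
    first, last = "rack1", f"rack{n}"
    print(n, link_count(chained(n), first, last), link_count(star(n), first, last))
```

In this model the chained path grows with the number of racks (n + 1 links) while the star path stays at 4 links. The chained layout also forces traffic between outer racks through the switches in the middle, which may relate to the network waiting challenge listed earlier, though the talk's text does not say so explicitly.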
DATA MINING AT SCALE
Stages of Data: Normalized; OLAP Cube; Raw Data. 80 Billion Rows; 160 Billion Rows; 500 Billion Records.
NoSQL Tables (diagram): Data; Index; Column Family A; Column Family B; Data; Data
NoSQL OLAP. Index: metric.dimension. Column Family A: #. Example index keys: metric.dimensionA, metric.dimensionB, metric.dimensionC, metric.dimensionA.dimensionB.dimensionC, metric.dimensionA.dimensionB, metric.dimensionA.dimensionC...
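The key pattern on this slide can be illustrated by generating one index key per combination of dimensions. This is a sketch under assumptions: the slide trails off with "...", so whether every combination is actually stored is not stated, and the helper name `cube_keys` is made up for this example.

```python
from itertools import combinations


def cube_keys(metric, dimensions):
    """One dotted key per non-empty subset of dimensions, kept in their given order."""
    keys = []
    for size in range(1, len(dimensions) + 1):
        for subset in combinations(dimensions, size):
            keys.append(".".join((metric, *subset)))
    return keys


keys = cube_keys("metric", ["dimensionA", "dimensionB", "dimensionC"])
for key in keys:
    print(key)
```

With three dimensions this yields 7 keys. In general n dimensions give 2^n - 1 combinations, so materializing all of them becomes expensive quickly as dimensions are added.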
Lexicographical Ordering. Index key parts: metric, dimensionA, dimensionB. Index: metric.dimensionA.dimensionB
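Why the ordering matters can be shown with a sorted list standing in for an ordered key store: keys that share a leading prefix sit next to each other, so they can be read as one contiguous slice. This is a plain-Python illustration, not HBase client code, and the metric and dimension values (`sessions`, `US`, `iPhone`, and so on) are invented for the example.

```python
from bisect import bisect_left


def prefix_scan(sorted_keys, prefix):
    """Return all keys starting with prefix, using binary search on a sorted list."""
    start = bisect_left(sorted_keys, prefix)
    end = start
    while end < len(sorted_keys) and sorted_keys[end].startswith(prefix):
        end += 1
    return sorted_keys[start:end]


# Keys follow the metric.dimensionA.dimensionB pattern from the slide.
keys = sorted([
    "sessions.US.iPhone",
    "sessions.US.Android",
    "sessions.CA.iPhone",
    "users.US.iPhone",
])

us_sessions = prefix_scan(keys, "sessions.US.")
print(us_sessions)
```

Fixing the metric and dimensionA gives a single contiguous range. A query that fixes only dimensionB (every `iPhone` row, say) is scattered across the key space in this layout, which is one reason several key orderings may be stored.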
NoSQL OLAP. Index: metric.dimension.date, e.g. metric.dimension.1_1_12, metric.dimension.3_1_12. Row Scan: metric, 1/1/12 to 3/1/12.
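A date range query over keys of the form metric.dimension.date can be sketched as a range scan over sorted keys. Several things here are assumptions: the sorted list again stands in for the real store, the sample values are invented, 1/1/12 and 3/1/12 are read as month/day/year, and the date is encoded as zero-padded YYYYMMDD so that string order matches chronological order. The slide writes dates as 1_1_12, which may just be shorthand; taken literally, that form would sort "10_1_12" before "3_1_12".

```python
from bisect import bisect_left, bisect_right
from datetime import date


def row_key(metric, dimension, day):
    """Build a key whose string order matches date order (assumed encoding)."""
    return f"{metric}.{dimension}.{day:%Y%m%d}"


def range_scan(sorted_rows, start_key, stop_key):
    """Rows with start_key <= key <= stop_key; sorted_rows is (key, value) pairs sorted by key."""
    keys = [key for key, _ in sorted_rows]
    lo = bisect_left(keys, start_key)
    hi = bisect_right(keys, stop_key)
    return sorted_rows[lo:hi]


rows = sorted([
    (row_key("sessions", "US", date(2011, 12, 31)), 10),
    (row_key("sessions", "US", date(2012, 1, 1)), 12),
    (row_key("sessions", "US", date(2012, 2, 15)), 9),
    (row_key("sessions", "US", date(2012, 3, 1)), 11),
    (row_key("sessions", "US", date(2012, 3, 2)), 8),
    (row_key("sessions", "CA", date(2012, 1, 15)), 3),
])

hits = range_scan(
    rows,
    row_key("sessions", "US", date(2012, 1, 1)),
    row_key("sessions", "US", date(2012, 3, 1)),
)
total = sum(value for _, value in hits)
```

Because the date is the last key component, all rows for one metric and dimension over a date range form one contiguous slice, so the query touches only the rows it needs.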
blog.flurry.com
Sean Byrnes Flurry, Inc nd St. Suite 202 San Francisco, CA