Analytics from 330 million smartphones Sean Byrnes CTO & Co-founder.

Presentation transcript:

Analytics from 330 million smartphones Sean Byrnes CTO & Co-founder

Flurry Overview 60, ,000 App Developers: Live Applications: Flurry Analytics Better apps on iOS, Android, BB, WP, HTML5 480M Devices per month: 33B Sessions per month: AppCircle Network Acquisition & Monetization: iOS, Android 6,200 App Developers: 200M Devices per month: 300B Events per month: 3M Daily Completed Views

How Flurry Works

Flurry’s Scale: 1.2 Billion Sessions / Day, 900 Servers, 1.56 PB

Topics: 1. Big Data Collection (HDFS) 2. Big Data Processing (Hadoop) 3. Data Mining at Scale (HBase)

BIG DATA COLLECTION

Incoming Data: Peak connections per second: 25,000; Data per day: 1.5 TB
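To put the daily figures next to the per-second peak, the averages can be worked out directly. The following is a minimal sketch, not something from the slides: it takes the 1.2 billion sessions per day quoted on the Flurry's Scale slide and the 1.5 TB per day above, and it assumes a decimal terabyte (10^12 bytes). The function and variable names are hypothetical.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def average_rate(total_per_day: float) -> float:
    """Convert a per-day total into an average per-second rate."""
    return total_per_day / SECONDS_PER_DAY


sessions_per_day = 1.2e9  # from the "Flurry's Scale" slide
bytes_per_day = 1.5e12    # 1.5 TB, assuming 1 TB = 10**12 bytes

avg_sessions_per_sec = average_rate(sessions_per_day)
avg_mb_per_sec = average_rate(bytes_per_day) / 1e6

print(f"average sessions per second: {avg_sessions_per_sec:,.0f}")
print(f"average ingest rate: {avg_mb_per_sec:.2f} MB/s")
```

The slides do not say how sessions map to connections, so the average session rate is only a rough point of comparison for the 25,000 peak connections per second.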

Data Collection (diagram labels): Reports; three Load Balancer / Data Collector pairs; File; HDFS

Data Collection Reports HDFS Location A Location B

BIG DATA PROCESSING

Normalization Data Correction Metrics Computation Agent Report De-duplication Portfolio Analysis Benchmarking Clustering Identify Device, Country, Carrier, etc. Bad Phone Clocks Partial Session Reports Handle duplicate reports Flexible calculation Configurable Dimensions Data mining and analysis Audience Segmentation Industry Trends Application Analytics Merchandising Analytics Analytics Processing

Large-scale Data Processing Input Data NoSQL DataStore Real-Time Batch Collectors Consumer/ Producer Systems MapReduce (jobs) External Action

Map/Reduce Management Challenges: Task Starvation; Task Roadblocking; Network Connection Waiting

Network Topology: Chained Rack 1 Rack 2 Switch 1 Switch 2 Rack 3 Switch 3

Network Topology: Star Rack 3 Rack 2 Switch 3 Switch 4 Switch 1 Switch 2 Trunk Rack 1 Rack 2

DATA MINING AT SCALE

Stages of Data Normalized OLAP Cube Raw Data 80 Billion Rows 160 Billion Rows 500 Billion Records

NoSQL Tables Data Index Column Family A Column Family B Data Data

NoSQL OLAP metric.dimension Index Column Family A # metric.dimensionA metric.dimensionB metric.dimensionC metric.dimensionA.dimensionB.dimensionC metric.dimensionA.dimensionB metric.dimensionA.dimensionC...
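The key layout on this slide can be sketched as one row key per combination of dimensions, each prefixed by the metric. This is an illustrative sketch only: the helper name `rollup_keys` is hypothetical, and the slide does not say whether the key segments are dimension names or dimension values, so the placeholder strings from the slide are used as-is.

```python
from itertools import combinations


def rollup_keys(metric: str, dimensions: list[str]) -> list[str]:
    """Build one dotted row key per non-empty combination of dimensions.

    Dimension order is preserved inside each key, so "A.B" is produced
    but "B.A" is not.
    """
    keys = []
    for size in range(1, len(dimensions) + 1):
        for combo in combinations(dimensions, size):
            keys.append(".".join((metric, *combo)))
    return keys


keys = rollup_keys("metric", ["dimensionA", "dimensionB", "dimensionC"])

# A store that orders rows by key keeps related rollups next to each other.
sorted_keys = sorted(keys)
for key in sorted_keys:
    print(key)
```

Three dimensions give seven keys. Once sorted, every key that starts with metric.dimensionA sits in one contiguous block, which is what makes prefix lookups cheap in a store that orders rows by key.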

Lexicographical Ordering metric dimensionA dimensionB index metric.dimensionA.dimensionB

NoSQL OLAP metric.dimension.date metric.dimension.1_1_12 metric.dimension.3_1_12 Index Row Scan metric 1/1/12 3/1/12
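A date-suffixed key turns a time-range query into a single scan over adjacent rows. The sketch below emulates that with a sorted Python list and the standard `bisect` module. It is not HBase client code. Two things are assumptions rather than content from the slide: the date is encoded as zero-padded YYYYMMDD (instead of the slide's 1_1_12 style) so that string order matches chronological order, and the stop key is treated as inclusive. The helper names `date_key` and `row_scan` are hypothetical.

```python
from bisect import bisect_left, bisect_right


def date_key(metric: str, dimension: str, year: int, month: int, day: int) -> str:
    """Row key with a zero-padded date suffix (assumed encoding)."""
    return f"{metric}.{dimension}.{year:04d}{month:02d}{day:02d}"


def row_scan(sorted_keys: list[str], start_key: str, stop_key: str) -> list[str]:
    """Return every key in [start_key, stop_key] from a sorted key list."""
    lo = bisect_left(sorted_keys, start_key)
    hi = bisect_right(sorted_keys, stop_key)
    return sorted_keys[lo:hi]


# Four monthly rows for one metric, plus a row for an unrelated metric.
table = sorted(
    [date_key("metric", "dimension", 2012, month, 1) for month in (1, 2, 3, 4)]
    + [date_key("other", "dimension", 2012, 2, 1)]
)

result = row_scan(
    table,
    date_key("metric", "dimension", 2012, 1, 1),
    date_key("metric", "dimension", 2012, 3, 1),
)
for key in result:
    print(key)
```

The scan returns the January, February, and March rows for the metric and skips both the April row and the unrelated metric, without touching any key outside the range.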

blog.flurry.com

Sean Byrnes Flurry, Inc nd St. Suite 202 San Francisco, CA