Building BI App on Cloud Rohit Chatter Sr.

Presentation on theme: "Building BI App on Cloud Rohit Chatter Sr."— Presentation transcript:

1 Building BI App on Cloud. Rohit Chatter, Sr. Architect@Yahoo!, rohitc@yahoo-inc.com

2 Yahoo! BigData Scale. Yahoo is the most visited site on the Internet: 600M+ unique visitors per month; billions of page views per day; billions of searches per month; billions of emails per month; terabytes of data per day! And we crawl the Web: 100+ billion pages; 5+ trillion links; petabytes of data. Reading 100 terabytes could be overwhelming.

3 Yahoo! Search Scale. Types in a search query on Yahoo or affiliate site (aka the Publisher); passes search query to the ad platform for servable ad listings; manages campaigns, creates ad listings, bids for keywords; ad serving returns relevant & available ads matching the search query; clicks on ad; shows ads returned by ad serving.

4 Business Model. Daily, Weekly, Monthly & Yearly; Daily, Hourly, Weekly, Monthly & Yearly; Daily, Weekly, Monthly & Yearly; Daily, Hourly, Weekly, Monthly & Yearly. Performance, Credit Summary; Performance, Budget Headroom, AM performance, competitive analysis; Performance, Feature Adoption; Competitive analysis, cross sell, upsell, performance.

5 Hour Glass Model – A Perspective. Business Performance monitoring; RDBMS Facts; Home Grown App; Level 1 & 2 analysis; Granular aggregates; Home Grown App; What-if analysis and deep-dive data analysis; Most granular data – event-level model; Tactical & Operational reporting; Improvement & Alignment; Excellence & Strategic.

6 BI on Cloud [1000ft view]. Functional View; Data – 100+ Gigabytes/Day; Hadoop Grid + PIG Cloud; Aggregates & Metadata layer; App Server – BI layer; Data Source; Dimension & Fact; Utility Computing; Build Aggregates; Oracle RDBMS; BI Aggregates (H,D,W,M); BI Tool/Home Grown; What is computed where; Metrics – Impressions, Revenue, Clicks, Conversions, Quality Score, Top keywords; Rollups, Type 2 Dimension, Alerts & Messaging; Load balanced web; Apache Web Server; Derived Metrics – CTR, Depth, RPM, Coverage.
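
Slide 6 lists CTR and RPM among the derived metrics computed on top of the base aggregates (impressions, clicks, revenue). The slides do not define them, so the following is only a minimal Python sketch assuming the conventional definitions: CTR as clicks per impression and RPM as revenue per thousand impressions. Depth and Coverage are left out because their definitions are not given. The `Aggregate` class and all function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Aggregate:
    """One pre-aggregated row (hypothetical shape, e.g. one advertiser-hour)."""
    impressions: int
    clicks: int
    revenue: float


def ctr(agg: Aggregate) -> float:
    """Click-through rate: clicks per impression (0.0 if no impressions)."""
    return agg.clicks / agg.impressions if agg.impressions else 0.0


def rpm(agg: Aggregate) -> float:
    """Revenue per thousand impressions (assumed definition)."""
    return 1000.0 * agg.revenue / agg.impressions if agg.impressions else 0.0


def rollup(rows):
    """Sum the base metrics so ratios are derived from totals, not averaged."""
    return Aggregate(
        impressions=sum(r.impressions for r in rows),
        clicks=sum(r.clicks for r in rows),
        revenue=sum(r.revenue for r in rows),
    )
```

Deriving the ratios after the rollup is one reason to store only additive base metrics in the H/D/W/M aggregates: an hourly CTR cannot simply be averaged into a daily one.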

7 BI on Cloud – Screen Shots

8 CUBE on Hadoop?

9 Oracle ETL/Aggregation; I-CUBE; HADOOP; MicroStrategy; Home Grown Tools; ART; Tradition; APOLLO FEEDS

10 Game Changer – HBase & Schema. I-CUBE; HADOOP; BI Tool; Home Grown Tools; ART; HBase; Aggregation in Hive; HiveServer JDBC/ODBC

11 How do we do it? HTable layout: RowKey = OrderId-MMYY; columns: Day Metrics (D1, D2, … Dn), Week Metrics (Wx, Wx+1, … Wy), MTD Metrics, SCD Info, Offer Stats; sample fields: Imp, Clicks, Name, Email, … HTable – schema-less; use HBase Incrementor (incrementColumnValue) for Weekly & MTD; Hive windowing UDF to generate flattened daily row; carefully choose RowKey; SCD – comes free; Performance – physical file (HFile) by table & column family. Number Game: Size – 360GB; Format – RCFile; Rows – 14.7 Billion; Mappers – 562; Reducers – 436; Elapsed Time <= 30 mins
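
Slide 11 describes keeping daily, weekly and month-to-date counters side by side in one HBase row keyed by OrderId-MMYY, with the weekly and MTD values maintained through incrementColumnValue. The sketch below mimics that idea in plain Python with an in-memory stand-in, not a real HBase client. `FakeTable`, the family names (`day`, `week`, `mtd`) and the qualifier format are assumptions made for illustration; only the row-key shape and the add-and-return increment behavior come from the slide.

```python
from collections import defaultdict
from datetime import date


def row_key(order_id: str, day: date) -> str:
    """Row key in the slide's OrderId-MMYY shape: one row per order per month."""
    return f"{order_id}-{day:%m%y}"


class FakeTable:
    """In-memory stand-in for an HBase table: row -> 'family:qualifier' -> counter."""

    def __init__(self):
        self.rows = defaultdict(lambda: defaultdict(int))

    def increment_column_value(self, row, family, qualifier, amount):
        """Add `amount` to a counter cell and return the new value."""
        cell = f"{family}:{qualifier}"
        self.rows[row][cell] += amount
        return self.rows[row][cell]


def record_daily_clicks(table, order_id, day, clicks):
    """Write the daily value and bump the weekly and MTD counters in the same row."""
    key = row_key(order_id, day)
    week = day.isocalendar()[1]  # ISO week number (assumed week convention)
    table.increment_column_value(key, "day", f"D{day.day}_clicks", clicks)
    table.increment_column_value(key, "week", f"W{week}_clicks", clicks)
    return table.increment_column_value(key, "mtd", "clicks", clicks)
```

Because every period lives in the same row, a report for one order and month is a single row read instead of a join across daily, weekly and monthly tables.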

12 Challenge@Hand: Hadoop/RDBMS; BIG DATA; SLA

13 What users love? – Excel & Pivot

14 What if I need to pivot a few million records? Or maybe a billion records? But “hang” on a minute – BIG DATA?

15 Our Answer – Hadoop Pivot. Number Game: Size – 360GB; Format – RCFile; Rows – 14.7 Billion; Mappers – 670; Reducers – 30; Elapsed Time – 251 secs [< 5 mins]. Voila – Back to Excel
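
The slides do not show how the Hadoop pivot on slide 15 is implemented. As a rough illustration only, one common way to express a pivot in map/reduce terms is to emit the pivot row as the key and a (column, value) pair as the value, then sum per column in the reducer. The Python sketch below simulates those phases in memory; every function and field name in it is hypothetical.

```python
from collections import defaultdict


def map_phase(records, row_field, col_field, value_field):
    """Mapper: emit (pivot row key, (pivot column, value)) for each record."""
    for rec in records:
        yield rec[row_field], (rec[col_field], rec[value_field])


def shuffle(pairs):
    """Group mapper output by key, as the framework's shuffle step would."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reducer: sum values per pivot column, producing one pivoted row."""
    totals = defaultdict(int)
    for col, val in values:
        totals[col] += val
    return key, dict(totals)


def pivot(records, row_field, col_field, value_field):
    """Run map, shuffle and reduce in sequence and return {row: {column: total}}."""
    grouped = shuffle(map_phase(records, row_field, col_field, value_field))
    return dict(reduce_phase(k, v) for k, v in grouped.items())
```

The pivoted output has one row per key, so it is small enough to hand back to a spreadsheet even when the input has billions of rows.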

16 Questions?

17 Hadoop HDFS – Hourly Feeds; Hadoop HDFS Grid – Daily Feeds & Aggregates; Oracle RAC 8 Node 60TB; Oracle ETL Server; BI App Server; BI Web Server; App Server, Grid Launcher Box; GRID Based Report Web Server; Metadata; Unified Web BI Portal; Web Services; Data Access Layer [ODBC/PL/SQL API]; Dimensions HBase; Facts on HDFS [RCFile]; Other Tools; TRADITIONAL; GRID; Hive + PIG – Query Engine; Scheduler

