Emerging Trends and Technologies in BI
Sunil K Singh, GlobalLogic, January 2011
© Copyright GlobalLogic 2011. Connect. Collaborate. Innovate.

Presentation transcript:

1 Emerging Trends and Technologies in BI

2 Company Overview
– US $170M in revenue, 40%+ CAGR
– 175+ client partnerships under active management
– 5,500+ employees
– Headquartered in the US, with business offices in the UK, Germany, Israel and India
– Global R&D centers and innovation labs in the US, Ukraine, India, China and Argentina
– 10 years of leadership in global software R&D services
– Provides full-lifecycle product engineering and advisory services for ISVs and software-enabled businesses
– Privately held and backed by Sequoia Capital, NEA, Draper Atlantic / NAV and Goldman Sachs
“A product development company like GlobalLogic is doing more than just providing offshore developers — it is seeking to collaborate with clients at a strategic level and provide executives with on-demand access to global innovation networks.” (Forrester Research, “Being Innovative Means Moving Beyond the Hype”)

3 GlobalLogic: A Software R&D Services Company
GlobalLogic has created a network of global innovation hubs made up of some of the brightest and most innovative software minds, connected by a platform that supports agile collaboration. Together, they accelerate breakthrough products to market.

4 Industry Focus
Digital Media, Retail, Finance, Infrastructure, Healthcare, Mobile, Telecom, Electronics

5 The BI (R)Evolution!

6 First Came the Relational Database

7 Typical Retail Operational Database
    -- Normalized operational schema: each entity lives in its own table, linked by foreign keys
    create table product_categories (
      product_category_id   integer primary key,
      product_category_name varchar(100) not null
    );
    create table manufacturers (
      manufacturer_id   integer primary key,
      manufacturer_name varchar(100) not null
    );
    create table products (
      product_id          integer primary key,
      product_name        varchar(100) not null,
      product_category_id integer references product_categories,
      manufacturer_id     integer references manufacturers
    );
    create table cities (
      city_id    integer primary key,
      city_name  varchar(100) not null,
      state      varchar(100) not null,
      population integer not null
    );
    create table stores (
      store_id       integer primary key,
      city_id        integer references cities,
      store_location varchar(200) not null,
      phone_number   varchar(20)
    );
    create table sales (
      product_id        integer not null references products,
      store_id          integer not null references stores,
      quantity_sold     integer not null,
      date_time_of_sale date not null
    );

8 Marketing Trying to Do Some Sales Analysis
How many Oreo cookies were sold yesterday in cities with a population of less than fifty thousand people?
    select sum(sales.quantity_sold)
    from sales, products, product_categories, manufacturers, stores, cities
    where manufacturer_name = 'Oreo'
      and product_category_name = 'cookie'
      and cities.population < 50000
      and trunc(sales.date_time_of_sale) = trunc(sysdate - 1) -- restrict to yesterday (Oracle syntax)
      -- five join conditions tie the six tables together
      and sales.product_id = products.product_id
      and sales.store_id = stores.store_id
      and products.product_category_id = product_categories.product_category_id
      and products.manufacturer_id = manufacturers.manufacturer_id
      and stores.city_id = cities.city_id;
This query joins six tables through five join conditions, which makes it very expensive to run against the operational database. Let's copy the data to another database for the marketing people.
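To make the cost of the normalized design concrete, here is a minimal sketch that loads the operational schema into an in-memory SQLite database and runs the Oreo query. Assumptions: SQLite stands in for the original (Oracle-flavoured) database, so `trunc(sysdate - 1)` is replaced by a `:day` parameter, and all sample rows are hypothetical, invented only to exercise each filter.

```python
import sqlite3

# Operational schema from the slide, with SQLite standing in for the original database.
SCHEMA = """
create table product_categories (product_category_id integer primary key,
                                 product_category_name varchar(100) not null);
create table manufacturers (manufacturer_id integer primary key,
                            manufacturer_name varchar(100) not null);
create table products (product_id integer primary key,
                       product_name varchar(100) not null,
                       product_category_id integer references product_categories,
                       manufacturer_id integer references manufacturers);
create table cities (city_id integer primary key, city_name varchar(100) not null,
                     state varchar(100) not null, population integer not null);
create table stores (store_id integer primary key, city_id integer references cities,
                     store_location varchar(200) not null, phone_number varchar(20));
create table sales (product_id integer not null references products,
                    store_id integer not null references stores,
                    quantity_sold integer not null,
                    date_time_of_sale text not null);
"""

# The marketing query; the target day is a parameter instead of trunc(sysdate - 1).
QUERY = """
select coalesce(sum(sales.quantity_sold), 0)
from sales, products, product_categories, manufacturers, stores, cities
where manufacturer_name = 'Oreo'
  and product_category_name = 'cookie'
  and cities.population < 50000
  and date(sales.date_time_of_sale) = :day
  and sales.product_id = products.product_id
  and sales.store_id = stores.store_id
  and products.product_category_id = product_categories.product_category_id
  and products.manufacturer_id = manufacturers.manufacturer_id
  and stores.city_id = cities.city_id
"""

def build_demo_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    # Hypothetical sample rows, invented for illustration only.
    conn.executemany("insert into product_categories values (?, ?)",
                     [(1, "cookie"), (2, "cracker")])
    conn.executemany("insert into manufacturers values (?, ?)",
                     [(1, "Oreo"), (2, "Other Brand")])
    conn.executemany("insert into products values (?, ?, ?, ?)",
                     [(1, "Oreo Original", 1, 1), (2, "Saltines", 2, 2)])
    conn.executemany("insert into cities values (?, ?, ?, ?)",
                     [(1, "Smallville", "KS", 20000), (2, "Metropolis", "NY", 900000)])
    conn.executemany("insert into stores values (?, ?, ?, ?)",
                     [(1, 1, "Main St", None), (2, 2, "5th Ave", None)])
    conn.executemany("insert into sales values (?, ?, ?, ?)", [
        (1, 1, 5, "2011-01-10 09:30:00"),  # counted
        (1, 1, 3, "2011-01-10 17:45:00"),  # counted
        (1, 2, 7, "2011-01-10 12:00:00"),  # large city: excluded
        (2, 1, 4, "2011-01-10 12:00:00"),  # not an Oreo cookie: excluded
        (1, 1, 2, "2011-01-09 12:00:00"),  # different day: excluded
    ])
    return conn

def oreo_cookies_sold(conn, day):
    """Run the six-table query for one calendar day ('YYYY-MM-DD')."""
    return conn.execute(QUERY, {"day": day}).fetchone()[0]

conn = build_demo_db()
print(oreo_cookies_sold(conn, "2011-01-10"))  # 8
```

Every answer of this kind walks all five join conditions, which is the pressure that leads to a separate analytical copy of the data.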

9 Then Came the Data Warehouse

10 Pick a FACT as the Center of the Data Warehouse
Marketing cares most about sales, so let us create a fact table on sales:
    create table sales_fact (
      sales_date   date not null,
      product_id   integer,
      store_id     integer,
      unit_sales   integer,  -- measure: units sold
      dollar_sales number    -- measure: revenue
    );
You can fill this table at a scheduled time from the operational database. This is your ETL process.
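The scheduled fill described above can be sketched as a single daily ETL run: extract one day of raw `sales` rows, aggregate them to one row per (date, product, store), and load the result into `sales_fact`. This is a minimal sketch under stated assumptions: SQLite connections stand in for the operational database and the warehouse, and because the operational schema has no price column, the `UNIT_PRICE` table used to compute `dollar_sales` is hypothetical.

```python
import sqlite3
from collections import defaultdict

# Hypothetical unit prices: the operational schema has no price column,
# so dollar_sales cannot be derived from it without some extra source like this.
UNIT_PRICE = {1: 2.50, 2: 1.25}

def make_demo_connections():
    """Create a hypothetical operational source and an empty warehouse target."""
    source = sqlite3.connect(":memory:")
    source.execute("create table sales (product_id integer not null, store_id integer not null,"
                   " quantity_sold integer not null, date_time_of_sale text not null)")
    source.executemany("insert into sales values (?, ?, ?, ?)", [
        (1, 1, 5, "2011-01-10 09:30:00"),
        (1, 1, 3, "2011-01-10 17:45:00"),
        (2, 1, 4, "2011-01-10 12:00:00"),
        (1, 2, 7, "2011-01-11 08:00:00"),
    ])
    target = sqlite3.connect(":memory:")
    target.execute("create table sales_fact (sales_date date not null, product_id integer,"
                   " store_id integer, unit_sales integer, dollar_sales number)")
    return source, target

def run_daily_etl(source, target, day):
    """Extract one day of sales, aggregate per product and store, load into sales_fact."""
    # Extract: pull the raw transactions for the given day.
    rows = source.execute(
        "select product_id, store_id, quantity_sold from sales"
        " where date(date_time_of_sale) = ?", (day,)).fetchall()
    # Transform: aggregate to one row per (date, product, store).
    units = defaultdict(int)
    for product_id, store_id, qty in rows:
        units[(product_id, store_id)] += qty
    facts = [(day, p, s, n, round(n * UNIT_PRICE[p], 2))
             for (p, s), n in sorted(units.items())]
    # Load: delete any earlier load of the same day first, so re-runs are idempotent.
    target.execute("delete from sales_fact where sales_date = ?", (day,))
    target.executemany("insert into sales_fact values (?, ?, ?, ?, ?)", facts)
    target.commit()
    return len(facts)
```

Calling `run_daily_etl(source, target, "2011-01-10")` on the demo connections loads two fact rows. Scheduling (cron, an ETL tool) is left out; making each run idempotent is one common design choice so that a failed night can simply be re-run.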

11 Different DIMENSIONS Can Be Created Around the FACT
For example, we are interested in sales by store, so let us create a DIMENSION table:
    create table stores_dimension (
      stores_key     integer primary key,
      name           varchar(100),
      city           varchar(100),
      county         varchar(100),
      state          varchar(100),
      zip_code       varchar(100),
      date_opened    date,
      date_remodeled date,
      store_size     varchar(100)
      -- ...
    );
Now a query on sales by city takes only one join between two tables:
    select sd.city, sum(f.dollar_sales)
    from sales_fact f, stores_dimension sd
    where f.store_id = sd.stores_key  -- sales_fact.store_id holds the store's dimension key
    group by sd.city;
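The single-join star query can be checked the same way. A minimal sketch, assuming the fact table's `store_id` column holds the dimension's `stores_key` value; the dimension is trimmed to a few columns, and SQLite plus the sample rows are stand-ins chosen for illustration.

```python
import sqlite3

def build_star_demo():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        create table sales_fact (sales_date date not null, product_id integer,
                                 store_id integer, unit_sales integer, dollar_sales number);
        create table stores_dimension (stores_key integer primary key, name varchar(100),
                                       city varchar(100), state varchar(100));
    """)
    # Hypothetical rows; sales_fact.store_id carries the dimension's surrogate key.
    conn.executemany("insert into stores_dimension values (?, ?, ?, ?)", [
        (10, "Downtown", "Austin", "TX"),
        (11, "Airport", "Austin", "TX"),
        (12, "Harbor", "Boston", "MA"),
    ])
    conn.executemany("insert into sales_fact values (?, ?, ?, ?, ?)", [
        ("2011-01-10", 1, 10, 8, 20.0),
        ("2011-01-10", 2, 11, 4, 5.0),
        ("2011-01-10", 1, 12, 2, 5.0),
    ])
    return conn

def sales_by_city(conn):
    # One join between the fact table and a single dimension, then group by.
    return conn.execute("""
        select sd.city, sum(f.dollar_sales)
        from sales_fact f
        join stores_dimension sd on f.store_id = sd.stores_key
        group by sd.city
        order by sd.city
    """).fetchall()
```

`sales_by_city(build_star_demo())` returns one (city, total) row per city; the two Austin stores collapse into a single total without touching products, categories or manufacturers.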

12 Traditional Approach to BI
[Diagram: Enterprise systems (sales, financial and core production systems, other systems and flat files) and external data are extracted into a staging area, transformed (data cleanup, lookup, validation, mapping, value sort, join, aggregation, etc.) and loaded into the data warehouse and datamarts. An OLAP layer (drilldown, slice, dice, rollup, pivot) serves end-user tools: enterprise reporting (Crystal, BIRT…), analytic applications (SAS, SPSS…), machine learning and decision modeling, with a feedback loop.]

13 Data Warehouse
A data warehouse is a collection of a large amount of data that is cleaned, transformed and cataloged, and made available for data mining, online analytical processing, market research and decision support.
Method of storage: normalized vs. dimensional
– Normalized: follows the same rules as database normalization; tables are grouped by subject area
– Dimensional: transactions are split into “facts” and “dimensions”. Facts are the numeric measurements, whereas dimensions are the reference information that gives the facts context

14 Data Warehouse (Cont.)
– Schema design: snowflake or star schema
– Read-only access
– The term OLAP (OnLine Analytical Processing) was coined as a slight modification of the traditional database term OLTP (OnLine Transaction Processing)
  – MOLAP: Multi-dimensional OLAP, which stores the data in a multi-dimensional cube
  – ROLAP: Relational OLAP, with an RDBMS as the underlying storage technology
  – HOLAP: Hybrid OLAP, which uses a mix of relational and multi-dimensional technology
– ETL stands for Extract, Transform, Load
  – Some shops use home-grown ETL written in shell script, Perl, Python, Ruby or Java
  – Others use ETL tools: Informatica, SAP and MS SSIS (commercial); Talend and Pentaho Kettle (open source)
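The MOLAP idea of a multi-dimensional cube, and two of the OLAP operations it supports (slice and roll-up), can be sketched on a toy in-memory cube. This is an illustration only: the dimensions (state, city, product) and the measure values are hypothetical, and a real MOLAP engine precomputes and indexes aggregates rather than scanning a dictionary.

```python
from collections import defaultdict

DIMS = ("state", "city", "product")

# A toy cube: each cell is addressed by one value per dimension and holds a
# measure (dollar sales). The numbers are hypothetical.
CUBE = {
    ("TX", "Austin", "cookies"): 20.0,
    ("TX", "Austin", "crackers"): 5.0,
    ("TX", "Dallas", "cookies"): 12.0,
    ("MA", "Boston", "cookies"): 5.0,
}

def slice_cube(cube, dim, value):
    """Slice: fix one dimension to a single value and drop it from the cell keys."""
    i = DIMS.index(dim)
    return {key[:i] + key[i + 1:]: measure
            for key, measure in cube.items() if key[i] == value}

def roll_up(cube, keep):
    """Roll-up: sum the measure over every dimension not listed in `keep`."""
    positions = [DIMS.index(d) for d in keep]
    totals = defaultdict(float)
    for key, measure in cube.items():
        totals[tuple(key[p] for p in positions)] += measure
    return dict(totals)

# Roll up from city to state; keeping both levels is the drill-down view.
by_state = roll_up(CUBE, ["state"])
by_city = roll_up(CUBE, ["state", "city"])
cookies = slice_cube(CUBE, "product", "cookies")
```

Both functions here operate on the full three-dimensional cube; drill-down is simply the inverse of roll-up, keeping a finer level of a dimension.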

15 Then Came the Internet and the Explosion of Data on the Web

16 Web 2.0 BI Approach
[Diagram: Requests pass through a load balancer and DMZ to a request dispatcher; the website's request logger, service processor and service responder handle each request and write log entries, using operational data and rules. A web crawler and third-party suppliers (e.g. DoubleClick) bring in further internet data. In the corporate data center, Map/Reduce tasks over transaction-related info, customer behavior and statistics produce trend analysis, user behavior analysis and decision-support results, which feed new rules back to the web application.]

17 And Suddenly Data Mining Is the New BI!

18 Data Mining: A Process View
Many definitions:
– Non-trivial extraction of implicit, previously unknown and potentially useful information from data
– Exploration and analysis, by automatic or semi-automatic means, of large quantities of data in order to discover meaningful patterns

19 Why Mine Data? Commercial Viewpoint
– Lots of data is being collected and warehoused:
  – Web data: Yahoo! collects ~10GB/hour
  – Purchases at department/grocery stores: Walmart records ~20 million transactions per day
  – Bank/credit card transactions
– Computers have become cheaper and more powerful
– Competitive pressure is strong: provide better, customized services for an edge (e.g. in Customer Relationship Management)

20 Why Mine Data? Scientific Viewpoint
– Data is collected and stored at enormous speeds (GB/hour):
  – Remote sensors on a satellite: NASA EOSDIS archives over 1 petabyte of Earth Science data per year
  – Telescopes scanning the skies: sky survey data
  – Gene expression data
  – Scientific simulations: terabytes of data generated in a few hours
– Traditional techniques are infeasible for raw data
– Data mining may help scientists in the automated analysis of massive data sets and in hypothesis formation

21 Common Data Mining Techniques
– Predictive Modeling
– Clustering
– Association Rules
– Anomaly Detection
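Association rules are the easiest of these techniques to show in a few lines. A minimal sketch that computes support and confidence for one-item rules X → Y over hypothetical shopping baskets; it enumerates every item pair by brute force, whereas a production miner (Apriori, FP-growth) prunes candidate itemsets and handles larger rules.

```python
from itertools import combinations

def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def rules(transactions, min_support, min_confidence):
    """Enumerate one-item rules X -> Y that meet both thresholds."""
    items = sorted(set().union(*transactions))
    found = []
    for a, b in combinations(items, 2):
        pair_support = support(transactions, {a, b})
        if pair_support < min_support:
            continue
        for x, y in ((a, b), (b, a)):
            # confidence(X -> Y) = support(X and Y) / support(X)
            confidence = pair_support / support(transactions, {x})
            if confidence >= min_confidence:
                found.append((x, y, pair_support, confidence))
    return found

# Hypothetical market baskets.
BASKETS = [
    {"milk", "bread", "diapers"},
    {"milk", "bread"},
    {"milk", "diapers", "beer"},
    {"bread", "butter"},
]
```

For example, `rules(BASKETS, 0.5, 0.8)` keeps only diapers → milk: every basket with diapers also has milk, while only two of the three milk baskets contain diapers, so the reverse rule falls below the threshold.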

22 Amazon.com Case Study: Personalized Customer Relationship Management

23 Amazon.com 5-Step Loyalty Model
Step | Amazon's action
Need creation | anticipate / stimulate
Information search | provide / assist
Evaluate alternatives | assist / negate
Purchase transaction | optimise / reward
Post-purchase experience | add value

24 Step 1: Need Creation (Amazon's action: anticipate/stimulate)

25 Step 2: Information Search (Amazon's action: provide/assist)

26 Step 3: Evaluation of Alternatives (Amazon's action: assist/negate)

27 Step 4: Purchase Optimisation/Reward (purchase transaction; Amazon's action: optimise/reward)
– 1-click purchase
– ‘Slippery check-out counter’ vs. ‘sticky aisles’

28 Step 5: Post-Purchase Experience (Amazon's action: add value)

29 Internet Marketing Insight: Jeff Bezos
– Role of advertisement: get the customer to the store
– Customer experience: get the customer to buy
– Brick-and-mortar stores:
  – Getting the customer to the store is the hard part
  – Shopping cart abandonment is not common, since the overhead of going to another store is very high, especially in Minnesota winters!
  – Marketing expenses: 80% for advertisement, 20% for customer experience
– The rule should be reversed for online stores

30 Difference in Two BI Approaches
Aspect | Traditional (enterprise approach) | Modern (Web 2.0 company approach)
Main use | Executive reports consumed by humans | Data mining, and an automatic feedback loop for adaptation
Data volume | Medium size, at enterprise scale rather than web scale | Gigantic, at web scale, from many different sources
Latency | Very batch-oriented; weekly or monthly is the norm | Tight feedback loop; latency within seconds or minutes
ETL | Informatica | More tolerant of unclean data, but must process it at high speed
Data warehouse | RDBMS, fact/dimension tables, star/snowflake schema | Distributed file systems, NoSQL
Processing | Multi-dimensional (ROLAP, MOLAP; slice/dice/rollup/drilldown) | Map/Reduce parallel processing (Hadoop)
Analytic tool | Business Objects | Hive / R

31 BI with Unstructured Data: Hadoop + Vertica

32 Big Data Comes in Three Forms
– Unstructured: images, sound, video
– Semi-structured: logs, data feeds, event streams
– Fully structured: relational tables

33 MOM / CEP / HOP: Near-Time BI Reporting on Continuous Data
[Diagram: An expected high-volume incoming data stream from the operational system enters a stream-processing system, where MapReduce jobs (map and reduce tasks) store their results in HDFS; an aggregator, a lookup DB and a BI adaptor expose the data to the BI reporting system for data queries, real-time dashboards and near-time reporting. The data volume determines the underlying technology framework (MOM, CEP or HOP); the system runs on commodity hardware and works with any BI reporting tool.]

34 Near Real-Time BI Reporting
– Raw incoming data gets processed in real time
– Depending on the velocity of the incoming data stream, different technologies will be used to pre-process the data:
  – MOM (Message Oriented Middleware)
  – CEP (Complex Event Processing)
  – HOP (Hadoop Online Prototype)
– Incoming data is divided into smaller batches and forwarded to a MapReduce processor
– Processed data is typically stored in a distributed file system such as HDFS
– Processed data is pushed or pulled to the target BI reporting application or tool
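The micro-batch flow in the list above can be sketched in a single process: split the stream into small batches, run a map and a reduce phase over each batch, and fold the result into a running aggregate that a dashboard could poll or be pushed. Assumptions: plain Python stands in for MOM/CEP/HOP and for a Hadoop cluster, the event format (`{"type": ...}`) is hypothetical, and writing results to HDFS is omitted.

```python
from collections import Counter
from itertools import islice

def micro_batches(stream, batch_size):
    """Divide an (unbounded) incoming stream into small fixed-size batches."""
    it = iter(stream)
    while batch := list(islice(it, batch_size)):
        yield batch

def map_phase(batch):
    # Emit (key, 1) for each event; here the key is the event type.
    for event in batch:
        yield event["type"], 1

def reduce_phase(pairs):
    # Sum the values emitted for each key.
    totals = Counter()
    for key, value in pairs:
        totals[key] += value
    return totals

def near_real_time_counts(stream, batch_size):
    """Process each micro-batch as it arrives and yield the running aggregate."""
    running = Counter()
    for batch in micro_batches(stream, batch_size):
        running.update(reduce_phase(map_phase(batch)))
        yield dict(running)
```

Smaller batches lower the reporting latency at the cost of more per-batch overhead; that trade-off is what the choice between MOM, CEP and HOP is about.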

35 What Do People Do with Hadoop?
> Transform data
> Archive data
> Look for patterns
> Parse logs
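Log parsing is the classic first Hadoop job. A minimal single-process sketch of the map, shuffle and reduce steps, assuming web-server logs in the Apache common log format; it mimics the shape of a MapReduce job but does not use Hadoop's API. Malformed lines are skipped rather than failing the job, in line with the "more tolerance on unclean data" point made for the modern approach.

```python
import re
from collections import defaultdict

# Apache common log format: host ident user [time] "request" status bytes
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) (?P<bytes>\d+|-)'
)

def mapper(line):
    """Parse one log line and emit (path, bytes_sent); skip malformed lines."""
    m = LOG_PATTERN.match(line)
    if m is None:
        return  # tolerate unclean data instead of failing the whole job
    size = 0 if m["bytes"] == "-" else int(m["bytes"])
    yield m["path"], size

def reducer(key, values):
    """Combine all values for one key into (hits, total_bytes)."""
    values = list(values)
    return key, (len(values), sum(values))

def run_job(lines):
    # Shuffle: group mapper output by key, as the framework does between phases.
    groups = defaultdict(list)
    for line in lines:
        for key, value in mapper(line):
            groups[key].append(value)
    return dict(reducer(k, v) for k, v in sorted(groups.items()))
```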

36 Vertica® Analytic Database
– MPP columnar architecture
– Second to sub-second queries
– 300GB/node load times
– Scales to hundreds of TBs
– Standard ETL & reporting tools
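Part of the speed of a columnar architecture comes from storing each column separately: an aggregate over one column reads only that column, and sorted, repetitive columns compress well. A toy sketch of the layout idea under stated assumptions. It is not Vertica's storage engine, the rows are hypothetical, and run-length encoding is shown only as one common columnar compression scheme.

```python
# Row-oriented layout: each record keeps all of its fields together (hypothetical rows).
COLUMNS = ("sales_date", "product_id", "store_id", "unit_sales", "dollar_sales")
ROWS = [
    ("2011-01-10", 1, 10, 8, 20.0),
    ("2011-01-10", 2, 11, 4, 5.0),
    ("2011-01-11", 1, 12, 2, 5.0),
]

def to_columnar(rows, columns):
    """Pivot row tuples into one list per column, as a column store keeps them."""
    return {name: [row[i] for row in rows] for i, name in enumerate(columns)}

def run_length_encode(values):
    """Compress runs of repeated adjacent values into [value, count] pairs;
    this pays off on sorted, low-cardinality columns such as dates."""
    runs = []
    for v in values:
        if runs and runs[-1][0] == v:
            runs[-1][1] += 1
        else:
            runs.append([v, 1])
    return runs

store = to_columnar(ROWS, COLUMNS)
# An aggregate over one column scans only that column's values,
# not every field of every row.
total_dollars = sum(store["dollar_sales"])
encoded_dates = run_length_encode(store["sales_date"])
```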

37 Availability, Scalability and Efficiency
…how fast can you go from data to answers?
– Unstructured data needs to be analyzed before it makes sense
– Semi-structured data is parsed based on a spec (or by brute force)
– Structured data can be optimized for ad hoc analysis

38 Hadoop / Vertica
Hadoop provides a distributed processing framework (MapReduce) and a distributed storage layer (HDFS).
> Vertica can be used as a data source and target for MapReduce
> Data can also be moved between Vertica and HDFS (Sqoop)
> Hadoop talks to Vertica via custom input and output formatters

39 Hadoop / Vertica
Vertica serves as a structured data repository for Hadoop.
[Diagram: Hadoop compute clusters running MapReduce, connected to Vertica]

40 Hadoop / Vertica
– Vertica's input formatter takes a parameterized query
  – Relational map operations can be pushed down to the database
– Vertica's output formatter takes an existing table name or a description
  – Vertica output tables can be optimized directly from Hadoop
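What "pushing relational map operations down to the database" buys can be sketched without the real connector. In this sketch SQLite stands in for Vertica, `input_records` is a hypothetical stand-in for a database-backed input formatter (it is not Vertica's or Hadoop's API), and the data is invented. Filtering inside `map()` and pushing the same filter into the parameterized query give the same answer, but the pushed-down version ships only the matching rows to the mappers.

```python
import sqlite3
from collections import defaultdict

def make_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("create table sales_fact (sales_date text, store_id integer, dollar_sales real)")
    conn.executemany("insert into sales_fact values (?, ?, ?)", [
        ("2011-01-10", 10, 20.0), ("2011-01-10", 11, 5.0),
        ("2011-01-11", 10, 7.5), ("2011-01-11", 12, 5.0),
    ])
    return conn

def input_records(conn, query, params):
    """Hypothetical stand-in for a database-backed input formatter:
    each record handed to the mapper is one row of a parameterized query."""
    return conn.execute(query, params).fetchall()

def map_reduce(records, mapper):
    # Reduce step: sum the mapper's values per key.
    totals = defaultdict(float)
    for record in records:
        for key, value in mapper(record):
            totals[key] += value
    return dict(totals)

def filtering_mapper(day):
    """Without pushdown: every row reaches map(), which filters it."""
    def mapper(row):
        sales_date, store_id, dollars = row
        if sales_date == day:
            yield store_id, dollars
    return mapper

def plain_mapper(row):
    """With pushdown: the database already applied the filter and projection."""
    store_id, dollars = row
    yield store_id, dollars
```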

41 Hadoop / Vertica
Federate multiple Vertica database clusters with Hadoop.
[Diagram: several Hadoop compute clusters, each running MapReduce, federated with multiple Vertica clusters]

42 Data Mining for Computational Social Sciences: A Case Study from Virtual Worlds

43 Online Games
– Massively Multiplayer Online Role-Playing Games (MMORPGs) are computer games that allow hundreds to thousands of players to interact and play together in a persistent online world
– Popular MMO games: EverQuest 2, World of Warcraft and Second Life

44 MMORPG: EverQuest 2
– MMORPGs (MMO role-playing games) are the most popular MMO games
– Examples: World of Warcraft by Blizzard and EverQuest 2 by Sony Online Entertainment
– Various logs of players' behavior are maintained
– Player activity in the environment, as well as his/her chat, is recorded at regular time instances; each such record carries a time stamp and a location ID
– Some of the logs capture different aspects of player behavior:
  – Guild membership history (member of, kicked out of, joined, left)
  – Achievements (quests completed, experience gained)
  – Items exchanged and sold/bought between players
  – Economy (items/properties possessed/sold/bought, banking activity, looting, items found/crafted)
  – Faction membership (faction affiliation, record of actions affecting faction affiliation)

45 Social Science Data Mining with EverQuest 2
– EverQuest 2 data can improve our understanding of the dynamics of group behavior
– MMORPG data enables us to look at the dynamics of groups in a new way:
  – Multiple groups are part of a large social network
  – Individuals from the social network can join or leave groups
  – Groups are not isolated, and some of them can be related, i.e. they may be geared towards specific objectives, each of which works towards a larger goal (e.g. different teams working towards disaster recovery)
  – The emergence and destruction of groups, as well as their dynamic memberships, depend on the underlying social network as well as the environment
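Group dynamics of this kind can be reconstructed directly from a guild-membership history log by replaying its events. A minimal sketch, assuming a hypothetical log format of (timestamp, player, guild, action) records that uses the action names listed for EverQuest 2 (joined, left, kicked out); the actual EverQuest 2 log schema is not given here.

```python
from collections import defaultdict

# Hypothetical guild-membership log: (timestamp, player, guild, action).
LOG = [
    (1, "ann", "Wolves", "joined"),
    (2, "bob", "Wolves", "joined"),
    (3, "cat", "Ravens", "joined"),
    (5, "bob", "Wolves", "left"),
    (6, "bob", "Ravens", "joined"),
    (8, "ann", "Wolves", "kicked out"),
]

def membership_at(log, t):
    """Replay the log up to time t and return each non-empty guild's member set."""
    members = defaultdict(set)
    for ts, player, guild, action in sorted(log):
        if ts > t:
            break
        if action == "joined":
            members[guild].add(player)
        elif action in ("left", "kicked out"):
            members[guild].discard(player)
    return {g: m for g, m in members.items() if m}

def churn(log, start, end):
    """Count joins and departures (left or kicked out) per guild within [start, end]."""
    counts = defaultdict(lambda: {"joins": 0, "departures": 0})
    for ts, _player, guild, action in log:
        if not start <= ts <= end:
            continue
        if action == "joined":
            counts[guild]["joins"] += 1
        elif action in ("left", "kicked out"):
            counts[guild]["departures"] += 1
    return dict(counts)
```

Snapshots from `membership_at` at successive times show groups emerging and dissolving (here, the Wolves guild is empty by t = 10), and `churn` gives a simple per-guild measure of how dynamic each membership is.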

46 Thank You!
We are always looking for good engineers who are passionate about technology. For more information, please

