3 The Human Migration Path
Historical human migration patterns, mapped by analyzing DNA samples from hundreds of thousands of people around the world
4 Advanced Analytics: Some Pointers
- Focused on finding patterns and relationships in data, and using them to predict future behaviour
- Typical questions: “What will happen?”, “Why is this happening?”, “What can happen?”
- Goals: discovery and actionable insight
- Extremely complex (often SQL-driven) queries, plus statistical and predictive models and techniques
- Usually involves processing large volumes of data, quite often specially extracted or prepared data
- Usually demands high levels of expertise from users to define the models involved and to interpret the output
- Mostly expensive!
5 Advanced Analytics: Some Typical Business Applications
- Churn / Loyalty / Retention
- Cross-Sell / Upsell
- Loss Prevention
- Anti-Fraud
- Segmentation
- Market Basket Analysis
- Survival Analysis
- Drug Discovery
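Market Basket Analysis, one of the applications listed above, usually comes down to three ratios computed over shopping baskets: support, confidence and lift. The sketch below is a minimal illustration only. The transactions are made up for the example, and the function names are hypothetical rather than taken from any vendor tool.

```python
# Hypothetical toy baskets; not data from the presentation.
TRANSACTIONS = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"butter", "milk"},
    {"bread", "butter"},
    {"bread", "butter", "milk"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    hits = sum(1 for basket in transactions if itemset <= basket)
    return hits / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Share of baskets containing the antecedent that also contain the consequent."""
    base = support(antecedent, transactions)
    if base == 0:
        return 0.0
    both = support(set(antecedent) | set(consequent), transactions)
    return both / base


def lift(antecedent, consequent, transactions):
    """Confidence relative to how often the consequent appears anyway.

    Values above 1 suggest the items are bought together more often
    than chance alone would explain; values below 1 suggest the opposite.
    """
    baseline = support(consequent, transactions)
    if baseline == 0:
        return 0.0
    return confidence(antecedent, consequent, transactions) / baseline


if __name__ == "__main__":
    rule = ({"bread"}, {"milk"})
    print("support   :", support(rule[0] | rule[1], TRANSACTIONS))
    print("confidence:", confidence(*rule, TRANSACTIONS))
    print("lift      :", lift(*rule, TRANSACTIONS))
```

Production tools search for frequent itemsets across millions of baskets rather than scoring one hand-picked rule, but the ratios they report are the same.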
6 What is at stake…
- 6x to 7x: the number of times more expensive it is to gain a customer than to retain one
- 50%: customers lost by US companies every 5 years
- 55: negative pieces of advertising from one disgruntled customer
- 96%: customers who don't complain when they have a problem, but don't come back
- 50%: customers who tell the business they are "fairly satisfied" but won't be repeat buyers
- 25 to 95%: increase in profits from a 5% increase in customer retention
- 83%: customers who will remain loyal after a complaint is resolved
- 2x: growth of businesses which have a reputation for excellent customer relations
Source: Bain & Co in HBS; Entrepreneur Business Centre's Information Resource Centre
7 The CRISP-DM Process
- Problem definition
- Initial data gathering; first understanding of the data
- Preparing data for the modeling tool; cleansing and transformation
- Modeling technique selection; re-evaluate data needs if required
- Model evaluation against business needs
- Deployment of the model; gain insight
Modeling techniques: Clustering, Association, Regression, Classification, Neural Networks, Decision Trees, Machine Learning, Sequencing
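The middle steps of the process above (data preparation, modeling, evaluation against a business need) can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not the CRISP-DM reference implementation: the records, the choice of a least-squares regression as the modeling technique, and the `max_error` acceptance threshold are all hypothetical.

```python
from statistics import mean

# Hypothetical raw records: (monthly_usage_hours, monthly_spend).
# None marks a missing value that data preparation has to deal with.
RAW_RECORDS = [
    (10, 25.0), (20, 45.0), (None, 30.0), (30, 65.0),
    (40, 85.0), (50, 105.0), (60, 125.0),
]


def prepare(records):
    """Data preparation: drop records with missing fields."""
    return [(x, y) for x, y in records if x is not None and y is not None]


def fit_line(points):
    """Modeling: ordinary least squares fit of y = slope * x + intercept."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in points)
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def mean_abs_error(model, points):
    """Evaluation metric: average absolute prediction error."""
    slope, intercept = model
    return mean(abs(slope * x + intercept - y) for x, y in points)


def run_cycle(records, max_error):
    """Prepare, model, then evaluate on the last two records held out."""
    clean = prepare(records)
    train, holdout = clean[:-2], clean[-2:]
    model = fit_line(train)
    error = mean_abs_error(model, holdout)
    # Evaluation against the business need: accept only if the error is tolerable.
    return model, error, error <= max_error


if __name__ == "__main__":
    model, error, accepted = run_cycle(RAW_RECORDS, max_error=5.0)
    print("model:", model, "holdout error:", error, "accepted:", accepted)
```

If the model fails the acceptance check, the process loops back to data preparation or technique selection instead of moving on to deployment.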
8 Major Technology Players
- SAS leads the pack: highest market share, best spread of solutions
- IBM is integrating SPSS with the Cognos suite
- Oracle leverages Oracle Data Mining, tightly integrated with the database
- KXEN offers a wide range of solutions
- TIBCO with Spotfire 3.1
Source: Forrester Wave: Predictive Analytics and Data Mining Solutions, 2010
9 Advanced Analytics - Trends
- Increased attention and focus: Advanced Analytics is a hot priority item for the next 2-3 years
- Increased pervasiveness: moving on from the domain of PhDs and statisticians to regular information workers; new vendors offering lower-cost solutions will add to this
- Text Analytics will become a mainstream technology, initially overlapping with social media but extending to other domains as well
- Social Media Analytics is still evolving, with a lot of players in the space right now
- Technology vendors are scrambling to incorporate Advanced Analytics capabilities into their main solution stack
- Big Data Analytics focus: moving away from the constraint of DW-driven predictive analytics
- Analytics in the cloud: increased acceptance, mostly in SMBs
- R language: increased acceptance, leading to lower-cost solutions
- In-Memory Analytics is gaining momentum
Source: various analysts and industry observers
Predictive Analytics is the next big battleground in the BI market!
10 Moving from Experts to Information Workers?
- Info workers want smarter, more predictive apps
- Packages that can be used by everyone
- Complexity hidden inside the tool
- Higher levels of usability
- Include visualization and embedded predictive models with apps
Info workers don't want to know they have analytics; they just want to have the right answers!
11 R: Game Changer?
- Programming language for statistical computing and analysis; open source
- Offers a fascinating low-cost option compared to industry leaders
- Still evolving, in a continuous-improvement mode
- In-memory features are a big advantage
- Big bets are being placed on R by many vendors: SAS, Information Builders, Netezza and Jaspersoft are joining the R bandwagon
- Expected to be picked up and integrated by most predictive analytics vendors to enhance capabilities
- The next 2-3 years will see R evolving and being accepted in the mainstream, once rough edges are polished
- Developed in 1993
- Highly extensible, with additional packages being built continuously
- Uses a command-line interface; several GUIs are available too
- Variety of statistical and graphics techniques
- Multiple versions/modes available
12 The Challenge of Unstructured Data
[Diagram labels: Survey feedback, Sales info, Customer feedback, Service info; Analytical Process; Decisions??; Blog entries, Online reviews; ?]
- 92% of consumers search for information online
- 46% of them are influenced to purchase
- 43% are deterred from purchasing
Source: ChannelAdvisor, Consumer Shopping Habits Survey 2010
13 Text Analytics/Text Mining - Increasing Relevance and Adoption
Linguistic, statistical and machine learning techniques to structure and model information content from textual sources:
- Information Retrieval
- Pattern Recognition
- Entity Recognition
- Co-references
- Sentiment Analysis
Picture courtesy: IBM
Major vendors (IBM, SAS) offer focused Text Analytics solutions
Listening Post services for Sentiment Analysis
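To make Sentiment Analysis, the last technique listed above, concrete, here is a minimal lexicon-based scorer. It is a deliberately simple sketch: the word lists are hypothetical, negation only flips the word immediately after it, and commercial text analytics products rely on far richer linguistic and statistical models.

```python
import re

# Hypothetical, tiny sentiment lexicon for illustration only.
POSITIVE = {"good", "great", "love", "excellent", "happy"}
NEGATIVE = {"bad", "poor", "hate", "terrible", "slow"}
NEGATORS = {"not", "never", "no"}


def tokenize(text):
    """Lower-case the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def sentiment_score(text):
    """Sum +1 per positive word and -1 per negative word.

    A negator flips the polarity of the single token that follows it.
    """
    score = 0
    negate = False
    for token in tokenize(text):
        if token in NEGATORS:
            negate = True
            continue
        polarity = (token in POSITIVE) - (token in NEGATIVE)
        score += -polarity if negate else polarity
        negate = False  # negation reaches only the next token
    return score


def classify(text):
    """Map the numeric score to a coarse label."""
    score = sentiment_score(text)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    for review in (
        "Great product, love it",
        "The service was not good and delivery was slow",
        "It arrived on Tuesday",
    ):
        print(classify(review), "<-", review)
```

Even this toy version shows why unstructured text is hard: sarcasm, misspellings and domain-specific vocabulary all slip past a fixed word list.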
15 Social Media Analytics: An Evolving Discipline
- A number of players in the market
- Typically covers common social media content such as blogs, social networking sites and discussion forums
- Primary objective: gain insight into products and brands, and understand user sentiment, behaviour and perception
- Clarabridge, Radian6, ScoutLabs, Alterian and Attentio are some popular tools
- Advanced, predictive capabilities are being enhanced
17 Big Data Analytics
- Analytics involving possibly petabytes of data
- Takes pressure off traditional data warehouses and similar data sources for analytics
- Separate analytics database focusing on massive query performance
- Removes the performance and scalability limitations of the existing data warehouse design
- Columnar vs row-based? Two schools of thought
- MPP capabilities are leveraged to the hilt
- Leverages frameworks such as MapReduce and Hadoop
- Aster Data, ParAccel, Teradata and others are focused in this area
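The MapReduce model mentioned above splits a job into a map step that emits key/value pairs, a shuffle that groups values by key, and a reduce step that aggregates each group. The sketch below runs all three phases in a single process purely to show the shape of the computation. The function names are hypothetical and this is not the Hadoop API; a real cluster distributes the map and reduce work across many nodes.

```python
from collections import defaultdict


def map_phase(record):
    """Map: emit one (word, 1) pair for every word in a text record."""
    for word in record.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: collapse the values for one key into a single count."""
    return key, sum(values)


def map_reduce(records):
    """Run map, shuffle and reduce in sequence over an iterable of records."""
    pairs = (pair for record in records for pair in map_phase(record))
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    # Hypothetical input records standing in for a much larger data set.
    logs = ["big data analytics", "big data at scale", "analytics in memory"]
    print(map_reduce(logs))
```

Because each map call sees only one record and each reduce call sees only one key, both phases can run in parallel, which is what lets the approach scale out on MPP-style infrastructure.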