Ch 5. The Evolution of Analytic Processes


Ch 5. The Evolution of Analytic Processes
Taming The Big Data Tidal Wave
24 May 2012
SNU IDB Lab.
Hyewon Kim

Outline
- Introduction
- The Analytic Sandbox
- Analytic Data Set (ADS)
- Enterprise Analytic Data Set (EADS)
- Scoring Routines

Introduction
- Upgrading technologies won't provide much value if the same old analytic processes remain in place.
- Change the process of configuring and maintaining workspaces.
- Consistently leverage a database platform through a sandbox.
- It may be necessary to keep scores up to date on a daily basis.
(Diagram labels: The Analytic Sandbox; Enterprise Analytic Data Set (EADS); Embedded Scoring)

The Analytical Sandbox (1/5): Definition
A set of resources that enables analytic professionals to experiment with and reshape data in whatever fashion they need.
Typical uses:
- Data exploration
- Development of analytical processes
- Proofs of concept
- Prototyping

The Analytical Sandbox (2/5): An Internal Sandbox
A portion of an enterprise data warehouse or data mart is carved out to serve as the analytic sandbox.
Strengths:
- Leverages existing hardware resources and infrastructure already in place
- Ability to directly join production data with sandbox data
- Cost-effective, since no new hardware is needed
Weaknesses:
- An additional load on the existing enterprise data warehouse or data mart
- Can be constrained by production policies and procedures
(Figure labels: Enterprise Data Warehouse or Data Mart; Core Database Tables; Analytic Views & Enterprise Analytic Data Sets; Sandbox; Additional Data)
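
The "directly join production data with sandbox data" strength can be illustrated with a minimal sketch. It assumes an in-memory SQLite database standing in for the warehouse; the table names `customer_sales` and `sbx_test_segment` and their contents are hypothetical, not taken from the slides.

```python
import sqlite3

# One database plays the role of the warehouse that also hosts the sandbox.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Production table (hypothetical), owned by the warehouse team.
    CREATE TABLE customer_sales (customer_id INTEGER PRIMARY KEY, total_sales REAL);
    INSERT INTO customer_sales VALUES (1, 120.0), (2, 75.5), (3, 310.0);

    -- Sandbox table (hypothetical), created ad hoc by an analyst.
    CREATE TABLE sbx_test_segment (customer_id INTEGER, segment TEXT);
    INSERT INTO sbx_test_segment VALUES (1, 'A'), (3, 'B');
""")

# Because both tables live in the same system, no extract is needed to join them.
rows = conn.execute("""
    SELECT s.customer_id, s.segment, p.total_sales
    FROM sbx_test_segment AS s
    JOIN customer_sales AS p ON p.customer_id = s.customer_id
    ORDER BY s.customer_id
""").fetchall()
```

The same query also shows the weakness listed above: it runs on the production system and therefore adds to its load.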

The Analytical Sandbox (3/5): An External Sandbox
A physically separate analytic sandbox is created for testing and development of analytic processes.
Strengths:
- A stand-alone environment, with no impact on other processes
- Reduces the need for workload management
Weaknesses:
- The additional cost of the stand-alone system
- Some data movement is required
(Figure labels: Enterprise Data Warehouse or Data Mart; Extract; Sandbox)
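
As a rough sketch of the external approach, the example below copies an extract from one database into a physically separate one. Two independent in-memory SQLite connections stand in for the warehouse and the stand-alone sandbox; the table names and the filter threshold are assumptions made for illustration.

```python
import sqlite3

# Stand-in for the enterprise data warehouse (hypothetical contents).
warehouse = sqlite3.connect(":memory:")
warehouse.executescript("""
    CREATE TABLE customer_sales (customer_id INTEGER PRIMARY KEY, total_sales REAL);
    INSERT INTO customer_sales VALUES (1, 120.0), (2, 75.5), (3, 310.0);
""")

# Stand-in for the separate sandbox system.
sandbox = sqlite3.connect(":memory:")
sandbox.execute(
    "CREATE TABLE customer_sales_extract (customer_id INTEGER, total_sales REAL)"
)

# The extract step: this is the data movement listed as a weakness.
extract = warehouse.execute(
    "SELECT customer_id, total_sales FROM customer_sales WHERE total_sales >= ?",
    (100.0,),
).fetchall()
sandbox.executemany("INSERT INTO customer_sales_extract VALUES (?, ?)", extract)

# Experiments in the sandbox cannot touch the warehouse copy.
sandbox.execute("UPDATE customer_sales_extract SET total_sales = total_sales * 2")

sandbox_rows = sandbox.execute(
    "SELECT customer_id, total_sales FROM customer_sales_extract ORDER BY customer_id"
).fetchall()
warehouse_rows = warehouse.execute(
    "SELECT customer_id, total_sales FROM customer_sales ORDER BY customer_id"
).fetchall()
```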

The Analytical Sandbox (4/5): A Hybrid Sandbox
The combination of an internal sandbox and an external sandbox.
Strengths:
- Flexibility in the approach taken for an analysis
- Can be run in a 'pseudo-production' mode temporarily
Weaknesses:
- Both an internal and an external sandbox environment must be maintained
- Two-way data feeds may be required, which adds complexity
(Figure labels: Enterprise Data Warehouse or Data Mart; Internal Sandbox; Extract; External Sandbox)

The Analytical Sandbox (5/5): Benefits
From the view of an analytic professional:
- Independence
- Flexibility
- Efficiency
- Freedom
- Speed
From the view of IT:
- Centralization
- Streamlining
- Simplicity
- Control
- Costs

Analytic Data Set (1/2): Definition
The data that is pulled together in order to create an analysis or model.
- In the format required for the specific analysis at hand
- Generated by transforming, aggregating, and combining data
- Helps to bridge the gap between efficient storage and ease of use

Analytic Data Set (2/2): Two Primary Kinds of Analytic Data Sets
A development ADS:
- Used to build an analytic process
- Has many variables or metrics within it
- Very wide but not very deep
A production ADS:
- Needed for scoring and deployment
- Contains only the specific metrics that were actually in the final solution
- Not very wide but very deep
(Figure: base tables Table1 through Table6 are derived, aggregated, combined, and transformed into a development ADS, which is wide and shallow, and a production ADS, which is narrow and deep.)
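
The wide-and-shallow versus narrow-and-deep contrast can be sketched in a few lines. The base records, the metric names, and the `build_ads` helper below are all hypothetical; the point is only that the development ADS computes many candidate metrics for a sample, while the production ADS computes just the final metrics for every customer.

```python
# Hypothetical base data: one record per customer.
BASE_ROWS = [
    {"customer_id": 1, "sales": [20.0, 35.0, 10.0], "visits": 7, "returns": 1},
    {"customer_id": 2, "sales": [5.0], "visits": 2, "returns": 0},
    {"customer_id": 3, "sales": [80.0, 40.0], "visits": 4, "returns": 1},
    {"customer_id": 4, "sales": [15.0, 15.0, 15.0, 15.0], "visits": 9, "returns": 0},
]

# Candidate metrics an analyst might derive while building a model.
METRICS = {
    "total_sales": lambda r: sum(r["sales"]),
    "n_purchases": lambda r: len(r["sales"]),
    "avg_sale": lambda r: sum(r["sales"]) / len(r["sales"]),
    "max_sale": lambda r: max(r["sales"]),
    "visits": lambda r: r["visits"],
    "return_rate": lambda r: r["returns"] / len(r["sales"]),
}


def build_ads(rows, metric_names, customer_ids=None):
    """Derive the named metrics for each row, optionally for a subset of customers."""
    ads = []
    for row in rows:
        if customer_ids is not None and row["customer_id"] not in customer_ids:
            continue
        record = {"customer_id": row["customer_id"]}
        for name in metric_names:
            record[name] = METRICS[name](row)
        ads.append(record)
    return ads


# Development ADS: every candidate metric, but only a sample of customers.
development_ads = build_ads(BASE_ROWS, list(METRICS), customer_ids={1, 3})

# Production ADS: only the metrics kept in the final solution, for all customers.
production_ads = build_ads(BASE_ROWS, ["total_sales", "n_purchases"])
```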

Enterprise Analytic Data Set (1/5): Traditional Analytic Data Sets
- All analytic data sets are created outside of the database
- Each analytic professional creates their own data sets independently
- The risk of inconsistencies
- The repetitious work
(Figure caption: A dedicated ADS is generated outside the database for every project)

Enterprise Analytic Data Set (2/5): Enterprise Analytic Data Set
A shared and reusable set of centralized, standardized analytic data sets for use in analytics.
- A standardized view of data to support multiple analysis efforts
- Streamlines the data preparation process
- Provides greater consistency, accuracy, and visibility to analytic processes
- Build once, use many
(Figure caption: Centralized ADS tables and views are utilized across many projects)

Enterprise Analytic Data Set (3/5): Structure
EADS logical view: a single Customer ADS table with the columns Customer, Total Sales, Total Purchases, Homeowner, Gender, Mail Responder, E-mail Opt-in.
EADS potential physical view (it could very well be stored differently!):
- Customer Sales: Customer, Total Sales, Total Purchases
- Customer Demographics: Customer, Homeowner, Gender
- A third table: Customer, Mail Responder, E-mail Opt-in
For updating an EADS
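
One way to picture the logical versus physical distinction is a database view that presents several physical tables as a single customer-level data set. The sketch below uses SQLite; the table name `customer_contact`, the view name `customer_ads`, and the sample rows are assumptions for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Physical storage: three separate tables (names are hypothetical).
    CREATE TABLE customer_sales (
        customer_id INTEGER PRIMARY KEY, total_sales REAL, total_purchases INTEGER);
    CREATE TABLE customer_demographics (
        customer_id INTEGER PRIMARY KEY, homeowner INTEGER, gender TEXT);
    CREATE TABLE customer_contact (
        customer_id INTEGER PRIMARY KEY, mail_responder INTEGER, email_opt_in INTEGER);

    INSERT INTO customer_sales VALUES (1, 250.0, 4), (2, 90.0, 1);
    INSERT INTO customer_demographics VALUES (1, 1, 'F'), (2, 0, 'M');
    INSERT INTO customer_contact VALUES (1, 1, 0), (2, 0, 1);

    -- Logical view: analysts see one wide customer table.
    CREATE VIEW customer_ads AS
    SELECT s.customer_id     AS customer_id,
           s.total_sales     AS total_sales,
           s.total_purchases AS total_purchases,
           d.homeowner       AS homeowner,
           d.gender          AS gender,
           c.mail_responder  AS mail_responder,
           c.email_opt_in    AS email_opt_in
    FROM customer_sales AS s
    JOIN customer_demographics AS d ON d.customer_id = s.customer_id
    JOIN customer_contact AS c ON c.customer_id = s.customer_id;
""")

cursor = conn.execute("SELECT * FROM customer_ads ORDER BY customer_id")
columns = [description[0] for description in cursor.description]
rows = cursor.fetchall()
```

Analysts query `customer_ads` without needing to know how the underlying tables are split, so the physical layout can change without breaking their code.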

Enterprise Analytic Data Set (4/5): Summary Table or View?
Option 1: summary tables that are updated via a scheduled process.
Benefits:
- Compute once, use many
- Most advanced analytics efforts involve a heavy use of historical data
- Very low latency in getting data
Downsides:
- Will not be fully up to date with the latest data
- Uses disk space on the system, potentially a whole lot of it

Enterprise Analytic Data Set (5/5): Summary Table or View?
Option 2: a series of views that are run on demand.
Benefits:
- Results are completely fresh and up to date
- Good performance in real-time analysis
- Changes are immediately available
- Consistency and transparency of the computations
Downsides:
- The system load won't necessarily be reduced that much
- Users have to wait longer to get their data back
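
The freshness trade-off between the two options can be demonstrated with a small SQLite sketch. The `transactions` table, the `sales_summary` table, the `sales_view` view, and the `refresh_summary` helper are hypothetical names; a real scheduled refresh would be driven by the platform's own job scheduler.

```python
import sqlite3

AGGREGATE_SQL = (
    "SELECT customer_id, SUM(amount) AS total_sales "
    "FROM transactions GROUP BY customer_id"
)

conn = sqlite3.connect(":memory:")
conn.executescript(f"""
    CREATE TABLE transactions (customer_id INTEGER, amount REAL);
    INSERT INTO transactions VALUES (1, 50.0), (1, 25.0), (2, 40.0);

    -- Summary table: computed once and stored on disk.
    CREATE TABLE sales_summary AS {AGGREGATE_SQL};

    -- View: recomputed every time it is queried.
    CREATE VIEW sales_view AS {AGGREGATE_SQL};
""")


def total_for(source, customer_id):
    """Read one customer's total; `source` must be a trusted, fixed name."""
    row = conn.execute(
        f"SELECT total_sales FROM {source} WHERE customer_id = ?", (customer_id,)
    ).fetchone()
    return row[0]


def refresh_summary():
    """Stand-in for the scheduled process that rebuilds the summary table."""
    conn.executescript(
        f"DELETE FROM sales_summary; INSERT INTO sales_summary {AGGREGATE_SQL};"
    )


# A new transaction arrives after the summary was built.
conn.execute("INSERT INTO transactions VALUES (1, 100.0)")

stale_total = total_for("sales_summary", 1)  # still reflects the old data
fresh_total = total_for("sales_view", 1)     # includes the new transaction

refresh_summary()
refreshed_total = total_for("sales_summary", 1)
```

The summary table answers without re-aggregating but lags until the next refresh; the view is always current but repeats the aggregation on every query.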

Scoring Routines (1/2): Embedded Scoring
Score: something generated from a predictive model, or any other type of output from an analytic process.
Embedded scoring:
- Deploying each individual scoring routine
- A process to manage and track the various scoring routines
Benefits:
- Scores run in batches will be available on demand
- Real-time scoring
- Abstracts complexity from users
- All models are contained in a centralized repository, so they are all in one place

Scoring Routines (2/2): Model and Score Management
Model and score management procedures will need to be in place to scale the use of models by an organization.
Components:
- Analytic Data Set Inputs
- Model Definitions
- Model Validation & Reporting
- Model Scoring Outputs
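
A centralized repository of scoring routines might look something like the sketch below. The `ScoringRoutine` and `ModelRegistry` classes, the `churn_risk` rule, and the input names are all hypothetical; the sketch only illustrates keeping model definitions, their required ADS inputs, a basic validation step, and scoring outputs in one place.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class ScoringRoutine:
    """A model definition: its name, version, required inputs, and scoring logic."""
    name: str
    version: int
    inputs: List[str]
    score: Callable[[dict], float]


class ModelRegistry:
    """Central place to register, track, and run scoring routines."""

    def __init__(self) -> None:
        self._routines: Dict[str, ScoringRoutine] = {}

    def register(self, routine: ScoringRoutine) -> None:
        current = self._routines.get(routine.name)
        if current is not None and routine.version <= current.version:
            raise ValueError(f"{routine.name}: version must increase")
        self._routines[routine.name] = routine

    def score(self, name: str, record: dict) -> float:
        routine = self._routines[name]
        # Validation: every ADS input the model needs must be present.
        missing = [column for column in routine.inputs if column not in record]
        if missing:
            raise ValueError(f"{name}: missing inputs {missing}")
        return routine.score(record)

    def score_batch(self, name: str, records: List[dict]) -> List[float]:
        return [self.score(name, record) for record in records]


registry = ModelRegistry()
registry.register(
    ScoringRoutine(
        name="churn_risk",
        version=1,
        inputs=["total_purchases"],
        # Hypothetical rule standing in for a real predictive model.
        score=lambda record: 0.8 if record["total_purchases"] < 2 else 0.2,
    )
)

batch_scores = registry.score_batch(
    "churn_risk", [{"total_purchases": 1}, {"total_purchases": 5}]
)
```

Callers only need the routine's name and a record, which is one way to abstract model complexity away from users while supporting both batch and single-record scoring.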