An Introduction to Data Warehousing Concept and Technology Mort Anvari.


1 An Introduction to Data Warehousing Concept and Technology Mort Anvari

2 Topics: Data Warehousing Concept; Data Access Technology; Enterprise Real-Time Knowledge Architecture for Data Warehousing; Data Collection and Delivery

3 M. Anvari Page 3 Benson & Parker’s “Square Wheel”: Business Environment; Technology Environment; Business Planning; Business Operations

4 M. Anvari Page 4 Benson & Parker’s “Square Wheel”: Business Environment; Technology Environment; Business Planning; Business Operations; Technology Planning; Technology Operations

5 M. Anvari Page 5 Benson & Parker’s “Square Wheel”: Business Environment; Technology Environment; Business Planning; Business Operations; Technology Planning; Technology Operations; Alignment; Impact; Organization; Opportunity

6 M. Anvari Page 6 Benson & Parker’s “Square Wheel”: Business Environment; Technology Environment; Business Planning; Business Operations; Technology Planning; Technology Operations; Alignment; Impact; Organization; Opportunity. Information Technology has to do more than just align itself with the business; it has to help the business have the maximum impact in the marketplace.

7 Data Access Data Access and Delivery System

8 M. Anvari Page 8 Technology Evolution: New classes of computers; New classes of communications; New classes of technology (image, sound, video, multimedia); New classes of software; Much more complex technical environment; Cooperative Processing/Client-Server; Distributed Data Bases; LANs, WANs, etc.; Obsolescence Problem; Multiple Legacy Systems

9 M. Anvari Page 9 IT Impact on Business: Enterprise Network Computing and Client/Server Technology are changing the way organizations look at all of their information systems. Diagram labels: HP, IBM, DEC, Compaq; Data Jail; Obsolescence; IT Wastes

10 M. Anvari Page 10 The Existing Enterprise: Support Existing Products; Support Existing Customers; Support Existing Organization; Support Existing Workforce; Support Existing Technology

11 M. Anvari Page 11 Controlling the (Global) Real-time Organization RTO = 24 x 7 x E (Where E means every major market)

12 M. Anvari Page 12 Information and the Enterprise: Organizational needs for data; Organizational needs for information; Organizational needs for knowledge

13 M. Anvari Page 13 Information and the Enterprise: An Insurance IS Executive estimated that his organization could only access something like 1% of all the data on their data base. A Bell Labs report has indicated that the amount of data doubles every 5 years, and they can only use about 5% of it! Data Warehousing is a Data Delivery System.

14 M. Anvari Page 14 Needs for Data: Data = Values (Measurements); Data to operate; Data to control; Data to plan

15 M. Anvari Page 15 Needs for Information: Information = Content + Structure (Relationships); Structure of the Real-world; Relating data to the business; Cross-functional processes; Relating data to the real world; External DB; External Data Feeds (D&B, Reuters, etc.); Text, Image, Voice, Video, etc.; Statistical Studies

16 M. Anvari Page 16 Needs for Knowledge: Knowledge = Goals + Actions + Learning; Learning more about our business; Learning more about our market; Learning more about the business environment. Knowledge is the area in which Data Warehousing and Data Mining are potentially critical technologies.

17 M. Anvari Page 17 Data, Information and Knowledge: Data Centers; Information Centers; Knowledge Centers; Data Bases; Information Bases; Knowledge Bases

18 M. Anvari Page 18 Old Data Never Dies: Note that none of the early computing styles have ever gone away!!! Batch; On-line; Minis; PCs; Networking; Enterprise Computing (Peer to Peer, Network to Network). Timeline: 60s, 70s, 80s, 90s

19 M. Anvari Page 19 Operational vs. Informational Systems Information Access Today

20 M. Anvari Page 20 Operational vs. Informational Systems: Information Access Today; Operational Systems (Mfg., Ord. Entry)

21 M. Anvari Page 21 Operational vs. Informational Systems: Information Access Today; Operational Systems; Informational Systems

22 M. Anvari Page 22 Operational vs. Informational Systems: Information Access Today; Operational Systems; Informational Systems (Estimating & Analysis, Marketing Systems, Product Planning)

23 M. Anvari Page 23 Operational vs. Informational Systems: Information Access Today; Operational Systems; Informational Systems; Information Delivery System

24 M. Anvari Page 24 Operational vs. Informational Systems: Information Access Today; Operational Systems; Informational Systems; Information Delivery System. Data Warehousing is fundamentally an issue of Enterprise Data Architecture.

25 M. Anvari Page 25 Operational vs. Informational Systems: Operational Systems; Informational Systems; Information Delivery System

26 M. Anvari Page 26 Operational vs. Informational Systems: Operational Systems; Informational Systems; Information Delivery System; Data Warehouse

27 M. Anvari Page 27 Operational vs. Informational Systems: Operational Systems; Information Delivery System; Data Warehouse; Informational Systems; Data Marts

28 M. Anvari Page 28 Operational vs. Informational Systems: Operational Systems; Information Delivery System; Informational Systems; Data Warehouse; External Data; Data Garages

29 M. Anvari Page 29 Operational vs. Informational Systems: Operational Systems; Information Delivery System; Informational Systems; Data Warehouse; External Data; External Users

30 M. Anvari Page 30 End User Evolution: Data Base Management Systems users; Ad Hoc Reports users; Today’s Customer Demands Automated Real-Time Response; End User Systems; Decision Support Systems; Executive Information Systems; Information Centers

31 M. Anvari Page 31 Ways to Organize Data: Tables (Flexible, Simple); Hierarchies (Speed, Natural Reporting); Networks (Multiple Directions, Complex Structure); Lists (Updating Complex Structure); Matrices/Arrays (Manipulate Multiple Dimensions); Inverted Files (Unplanned queries, text retrieval); Objects (Complex structures, hide structure); Multidimensional Data Bases (Data Warehousing)
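
Three of the organizations on this slide can be compared on the same facts. The following is only a minimal sketch: the sales rows, the dimension names, and every variable name are hypothetical and do not come from the presentation. It holds the data as a flat table, as a multidimensional array keyed by coordinates, and as an inverted file that answers an unplanned query.

```python
from collections import defaultdict

# Hypothetical facts stored as a flat table: (region, product, quarter, amount).
rows = [
    ("East", "Widget", "Q1", 100),
    ("East", "Gadget", "Q1", 150),
    ("West", "Widget", "Q1", 80),
    ("West", "Widget", "Q2", 120),
]

# Multidimensional view: one cell per (region, product, quarter) coordinate.
cube = defaultdict(int)
for region, product, quarter, amount in rows:
    cube[(region, product, quarter)] += amount

# Inverted file: for each attribute value, the positions of the rows that contain it.
inverted = defaultdict(set)
for position, row in enumerate(rows):
    for value in row[:3]:
        inverted[value].add(position)

# Unplanned query answered by intersecting two inverted lists: West sales of Widgets.
matches = sorted(inverted["West"] & inverted["Widget"])
west_widget_total = sum(rows[i][3] for i in matches)
```

The table is the simplest to load and change, the cube gives direct access by dimension, and the inverted file serves queries nobody planned for, which matches the trade-offs the slide lists.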

32 M. Anvari Page 32 End User Computing Evolution

33 M. Anvari Page 33 Data Warehousing: A Data Warehouse can be thought of as an automated version of the Information Center that was widely popular in the mid-1980s, or even ultimately as the automation of Information Resource Management. While technologies such as client-server have begun to put enormous computing and graphics power in the hands of individuals, they have not, in general, provided the link to the operational data that end users need to make critical business decisions.

34 M. Anvari Page 34 Data Warehouse Requirements: Support for Universal Access to Multi-platform Data Bases; Support for Multiple User Types; Separation of Operational and Informational Concerns; Support for Networked Data; Support for Directories, Repositories and Information Models; Support for Advanced End User Interfaces

35 M. Anvari Page 35 Access to Heterogeneous Data HP IBM DEC Compaq

36 M. Anvari Page 36 Multiple User Types (Knowledge workers): Top Executives; Managers; Analysts; Planners; Product Developers; Consultants; Lawyers; etc.

37 M. Anvari Page 37 Separation of Operational and Informational Concerns. Operational Systems: Response Time; Reliability; Security; Recoverability. Informational Systems: Flexibility, Performance, Ease of Navigation; Large numbers of different views; Manage Huge Amounts of Data (VLDBs); Need to drill down/drill thru into data; Need to draw on data from many sources

38 M. Anvari Page 38 Support for Networked Data: All the data that is required to support informational needs is often not on the same operational data base. The data needed for Labor Negotiations, for example, may come from a variety of operational data bases, such as Manufacturing, Personnel, and Accounting. Distributed Systems
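
The Labor Negotiations example can be sketched with Python's standard `sqlite3` module, under loud assumptions: three attached in-memory databases stand in for the separate Manufacturing, Personnel, and Accounting systems, and every table, column, and figure below is invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Each attached in-memory database stands in for a separate operational system.
conn.execute("ATTACH DATABASE ':memory:' AS manufacturing")
conn.execute("ATTACH DATABASE ':memory:' AS personnel")
conn.execute("ATTACH DATABASE ':memory:' AS accounting")

# Hypothetical tables; a real operational schema would be far larger.
conn.execute("CREATE TABLE manufacturing.production (dept TEXT, units INTEGER)")
conn.execute("CREATE TABLE personnel.headcount (dept TEXT, employees INTEGER)")
conn.execute("CREATE TABLE accounting.payroll (dept TEXT, cost REAL)")
conn.executemany("INSERT INTO manufacturing.production VALUES (?, ?)",
                 [("Assembly", 1000), ("Paint", 400)])
conn.executemany("INSERT INTO personnel.headcount VALUES (?, ?)",
                 [("Assembly", 20), ("Paint", 10)])
conn.executemany("INSERT INTO accounting.payroll VALUES (?, ?)",
                 [("Assembly", 50000.0), ("Paint", 24000.0)])

# One informational query that draws on all three sources at once.
labor_view = conn.execute("""
    SELECT m.dept, m.units, p.employees, a.cost
    FROM manufacturing.production AS m
    JOIN personnel.headcount AS p ON p.dept = m.dept
    JOIN accounting.payroll AS a ON a.dept = m.dept
    ORDER BY m.dept
""").fetchall()
```

In practice the three systems would sit on different platforms and networks, which is why the slide treats networked access as a requirement rather than a convenience.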

39 M. Anvari Page 39 Support for Advanced End User Interfaces

40 M. Anvari Page 40 Dimensions of Data Warehousing Performance Flexibility Scalability Ease of Use Quality Connection to the Operational Data Distributed Data Security

41 M. Anvari Page 41 Enterprise Knowledge Architecture for Data Warehousing

42 M. Anvari Page 42 Operational vs. Informational Systems: Operational Systems; Informational Systems; Information Delivery System

43 M. Anvari Page 43 Operational vs. Informational Systems

44 M. Anvari Page 44 Enterprise Network Computer Architecture: Data Mart

45 M. Anvari Page 45 Freeing the “Data in Jail”

46 M. Anvari Page 46 The Information Access Layer

47 M. Anvari Page 47 The Legacy Data Layer

48 M. Anvari Page 48 The External Data Layer

49 M. Anvari Page 49 The Data Access Layer

50 M. Anvari Page 50 The Data Access Layer Data Access Filter

51 M. Anvari Page 51 The Data Access Layer SQL Queries

52 M. Anvari Page 52 The Data Access Layer SQL Queries SQL Answers

53 M. Anvari Page 53 Application Messaging

54 M. Anvari Page 54 The Meta-Data Repository Layer

55 M. Anvari Page 55 The Process Management Layer

56 M. Anvari Page 56 The Core Data Warehouse

57 M. Anvari Page 57 Data Staging and Quality

58 M. Anvari Page 58 Data Mart (Post-processing and Indexing)

59 M. Anvari Page 59 Goals of Warehouse: 1. Performance (Canned queries, MD Analysis, Ad hoc, Impact on Operational System) 2. Flexibility (MD Flex, Ad hoc, Change data structure) 3. Scalability (No. of Users, Volume of Data) 4. Ease of Use (Location, Formulation, Navigation, Manipulation) 5. Data Quality (Consistent, Correct, Timely, Integrated) 6. Connection to the Detail Business Transactions

60 M. Anvari Page 60 Virtual Warehouse

61 M. Anvari Page 61 Virtual Warehouse

62 M. Anvari Page 62 Virtual Warehouse A Virtual Data Warehouse approach is often chosen when there are infrequent demands for data and management wants to determine if/how users will use operational data.

63 M. Anvari Page 63 Virtual Warehouse One of the weaknesses of a Virtual Data Warehouse approach is that user queries are made against operational DBs. One way to minimize this problem is to build a “Query Monitor” to check the performance characteristics of a query before executing it.
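
A “Query Monitor” of this kind might look like the following minimal sketch. It rests on several assumptions: SQLite (through Python's standard `sqlite3` module) stands in for the operational data base, a plan step whose description starts with `SCAN` is treated as too expensive, and the `orders` table and the function names `plan_details`, `is_allowed`, and `monitored_query` are hypothetical. SQLite does not promise a stable wording for its plan text, so a production monitor would need a sturdier cost measure.

```python
import sqlite3

def plan_details(conn, sql, params=()):
    """Return the plan step descriptions SQLite reports for a query, without running it."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[3] for row in rows]  # the fourth column holds the readable description

def is_allowed(conn, sql, params=()):
    """Reject any query whose plan contains a full scan (a step starting with 'SCAN')."""
    return not any(step.startswith("SCAN") for step in plan_details(conn, sql, params))

def monitored_query(conn, sql, params=()):
    """Check the plan first; execute only if the monitor accepts it."""
    if not is_allowed(conn, sql, params):
        raise PermissionError("query rejected: it would scan a whole operational table")
    return conn.execute(sql, params).fetchall()

# Hypothetical operational table with an index on the customer column only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, amount REAL)")
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer)")
conn.executemany("INSERT INTO orders (customer, amount) VALUES (?, ?)",
                 [("acme", 10.0), ("acme", 25.0), ("zenith", 40.0)])

indexed = "SELECT amount FROM orders WHERE customer = ?"    # can use the index
unindexed = "SELECT customer FROM orders WHERE amount > ?"  # must read every row
```

Calling `monitored_query(conn, indexed, ("acme",))` returns rows, while the same call with `unindexed` raises `PermissionError` before the operational table is touched.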

64 M. Anvari Page 64 Distributed Data Warehouse

65 M. Anvari Page 65 Distributed Data Warehouse: A Distributed Data Warehouse is similar in most respects to a Central Data Warehouse, except that the data is distributed to separate mini-Data Warehouses (Data Marts) on local or specialized servers.
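
As a rough illustration of distributing data to a Data Mart, the sketch below copies one region's rows from a central store into its own database. Everything here is assumed for the example: the `sales` table, the regional split, the helper name `build_mart`, and the use of separate in-memory SQLite databases in place of separate servers.

```python
import sqlite3

# Hypothetical central warehouse holding sales for every region.
central = sqlite3.connect(":memory:")
central.execute("CREATE TABLE sales (region TEXT, product TEXT, amount REAL)")
central.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [("East", "Widget", 100.0), ("East", "Gadget", 150.0), ("West", "Widget", 80.0)],
)

def build_mart(central, region):
    """Copy one region's rows from the central warehouse into its own database."""
    mart = sqlite3.connect(":memory:")  # stands in for a local or specialized server
    mart.execute("CREATE TABLE sales (product TEXT, amount REAL)")
    rows = central.execute(
        "SELECT product, amount FROM sales WHERE region = ?", (region,)
    ).fetchall()
    mart.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    mart.commit()
    return mart

east_mart = build_mart(central, "East")
east_total = east_mart.execute("SELECT SUM(amount) FROM sales").fetchone()[0]
```

Queries against `east_mart` never touch the central store, which is the point of placing the mart close to its users.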

66 M. Anvari Page 66 Information Access Tools: Desktop DBs; Spreadsheets; 4GL/Desktop Query Tools; Decision Support Systems (DSS); Multi-dimensional DBs (MDDs); OLAP (On-line Analytical Processing); Executive Information Systems (EIS); Data Visualization Tools; Data Mining Tools; Business Modeling and Simulation Tools

67 M. Anvari Page 67 Data Warehousing Tools and Technology. Desktop Data Bases: Structured for Database Manipulation; Provides facility for selecting and loading Desktop DBs from Informational DBs; Provides ability to Create Highly “Personalized” Informational Systems. Examples: Access; Paradox; dBase/FoxPro/Clipper

68 M. Anvari Page 68 Enterprise Network Computer Architecture. Spreadsheets: Structured to get any subset of Information; Ability to Interface with standard Spreadsheet tools. Examples: Excel; 1-2-3; Quattro Pro

69 M. Anvari Page 69 Enterprise Network Computer Architecture. Ad Hoc Query Systems: Tailored for Flexible Reporting; Ability to do Sophisticated Analysis Functions; Aimed at a variety of users, from the casual to the power user. Examples: Focus for Windows (IBI); SAS; Business Objects; GQL (Andyne); Esperant (Software AG); Forest & Trees (Platinum); Visualizer (IBM); Impromptu (Cognos); Beacon (Prodea)

70 M. Anvari Page 70 Enterprise Network Computer Architecture. Multi-dimensional Databases (MDDB) / OLAP (On-line analytical processing): Highly Structured Data; Tailored for Financial Modeling; Tailored for “Power Users”; Ability to do Sophisticated Financial “What-if” Analysis; Ability to “drill-down” from high-level to Detail Data. Examples: Acumate (Kenan Tech.); Beacon (Prodea); CrossTarget (Dimensional Insight); eSSbase (Arbor); Oracle Express (Oracle)
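
The “drill-down” from high-level to detail data can be illustrated without any OLAP product. This is only a sketch under stated assumptions: the revenue facts, the year/quarter/month hierarchy, and the functions `roll_up` and `drill_down` are hypothetical and do not reflect how any of the listed tools work internally.

```python
from collections import defaultdict

# Hypothetical detail facts: (year, quarter, month, revenue).
facts = [
    (1996, "Q1", "Jan", 10),
    (1996, "Q1", "Feb", 12),
    (1996, "Q2", "Apr", 9),
    (1997, "Q1", "Jan", 14),
]

def roll_up(facts, depth):
    """Sum revenue grouped by the first `depth` levels of the time hierarchy."""
    totals = defaultdict(int)
    for *path, revenue in facts:
        totals[tuple(path[:depth])] += revenue
    return dict(totals)

def drill_down(facts, prefix):
    """Show the next level of detail beneath one aggregate cell."""
    prefix = tuple(prefix)
    next_level = roll_up(facts, len(prefix) + 1)
    return {key: total for key, total in next_level.items() if key[:len(prefix)] == prefix}

by_year = roll_up(facts, 1)                      # high-level view
quarters_1996 = drill_down(facts, (1996,))       # one level below 1996
months_1996_q1 = drill_down(facts, (1996, "Q1")) # detail beneath 1996 Q1
```

Each call to `drill_down` re-aggregates from the detail rows; a real multi-dimensional database would typically precompute such totals to keep the response fast.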

71 M. Anvari Page 71 Enterprise Network Computer Architecture. Executive Information Systems (EIS): Highly Structured Data; Tailored for Non-technical Users; Ability to “slice and dice” data; Ability to “drill-down”. Examples: Commander OLAP Server; Pilot (Lightship); VB; Powerbuilder

72 M. Anvari Page 72 Enterprise Network Computer Architecture. Data Visualization: Automatic Categorization; Visualization of Multi-dimensional data; Automatic Analysis and/or Indexing. Examples: WinViz (IBI); dbExpress (Computer Concepts); Data Explorer (IBM); ARC Info/ARC View; Strategic Mapping

73 M. Anvari Page 73 Enterprise Network Computer Architecture. Data Mining: High Speed Analysis of Detail Data; Constructs Business Patterns; Provides Statistical Support. Examples: IBM beta-test; Information Harvester; IDIS; d.b.Express; DataMind

74 M. Anvari Page 74 Enterprise Network Computer Architecture. Business Modeling and Simulation: Business Feedback Model; Direct Manipulation; Business Gaming; Management/Operations Training. Examples: SimRefinery; SimTelephone; iThink; Microworlds

75 M. Anvari Page 75 3. Meta-data Repository Layer. Data Dictionary/Repository: Meta-data Modeling; Meta-data Updating; Meta-data. Examples: Platinum; Rochade; MSP; Data Atlas (IBM); MS/TI

76 M. Anvari Page 76 3. Process (Systems) Management. Process Management: Scheduling; Execution; Subscription. Examples: Data Harvester; Data Hub; Detect and Alert (Comshare)

77 M. Anvari Page 77 3. Post-processing/Indexing Layer. Post-processing/Indexing Examples: Sybase IQ Accelerator; OMNIdex; Oracle 7.3; eSSbase; IRI Express

