Data Management: Warehousing, Analyzing, Mining & Visualization

Presentation transcript:

Chapter 11 Data Management: Warehousing, Analyzing, Mining & Visualization

Difficulties of Managing Data The amount of data increases exponentially. Data are scattered throughout organizations and are collected by many individuals using several methods and devices. Only small portions of an organization’s data are relevant for any specific decision. An ever-increasing amount of external data needs to be considered in making organizational decisions. Data are frequently stored in several servers and locations in an organization.

Difficulties, cont’d. Raw data may be stored in different computing systems, databases, formats, and human and computer languages. Legal requirements relating to data differ among countries and change frequently. Selecting data management tools can be a major problem because of the huge number of products available. Data security, quality, and integrity are critical yet are easily jeopardized.

Data Sources and Collection Internal Data: An organization’s internal data are about people, products, services, and processes. Personal Data: IS users or other corporate employees may document their own expertise by creating personal data. External Data: There are many sources for external data, ranging from commercial databases to sensors and satellites. The Internet & Commercial Database Services: Some external data flow to an organization through electronic data interchange (EDI), through other company-to-company channels, or the Internet.

Data Quality Data quality (DQ) is an extremely important issue, since quality determines the data’s usefulness as well as the quality of the decisions based on the data.

Data Quality Problems (Strong, et al., 1997) Intrinsic DQ: Accuracy, objectivity, believability, and reputation. Accessibility DQ: Accessibility and access security. Contextual DQ: Relevancy, value added, timeliness, completeness, amount of data. Representation DQ: Interpretability, ease of understanding, concise representation, consistent representation.

Object-Oriented Databases Last time we discussed hierarchical, network, and relational databases. An object-oriented database is a part of the object-oriented paradigm, which also includes object-oriented programming, operating systems, and modeling. Object-oriented databases are sometimes referred to as multimedia databases and are managed by special multimedia database management systems.

Document Management Document management is the automated control of electronic documents, page images, spreadsheets, word processing documents, and complex, compound documents through their entire life cycle within an organization, from initial creation to final archiving. Benefits of Document Management: greater control over production, storage, and distribution of documents; greater efficiency in the reuse of information; control of a document through a workflow process; reduction of product cycle times.

Data Processing Data processing in organizations can be viewed as either transactional or analytical. Transactional: The data in transaction processing systems (TPS) are organized mainly in a hierarchical structure and are centrally processed. The databases and processing systems involved are known as operational systems. Analytical: Analytical processing involves analysis of accumulated data, mainly by end-users. It includes decision support systems (DSS), executive information systems (EIS), Web applications, and other end-user activities.

Delivery Systems A good data delivery system should be able to support: Easy data access by the end-users. A quick decision-making process. Accurate and effective decision making. Flexible decision making.

Data Warehouses The purpose of a data warehouse is to establish a data repository that makes operational data accessible in a form readily acceptable for analytical processing activities (e.g., decision support, EIS). Data warehouses include a companion called metadata, meaning data about data.

Benefits of Data Warehousing The ability to reach data quickly, as they are located in one place. The ability to reach data easily, frequently by end-users themselves, using Web browsers.

Characteristics of Data Warehousing Organization: Data are organized by detailed subjects. Consistency: Data in different operational databases may be encoded differently. In the warehouse they will be coded in a consistent manner. Time variant: The data are kept for 5 to 10 years so they can be used for trends, forecasting, and comparisons over time. Non-volatile: Once entered into the warehouse, data are not updated. Relational: The data warehouse uses a relational structure. Client/Server: The data warehouse uses a client/server architecture to give end users easy access to its data.
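The consistency characteristic can be illustrated with a minimal sketch. It assumes two hypothetical operational sources (billing and crm) that encode gender differently; the code table, field names, and function names are all illustrative assumptions, not part of the original slides.

```python
# Hypothetical source encodings mapped to one warehouse code.
# Real operational systems would have their own, different values.
GENDER_CODES = {
    "m": "M", "male": "M", "1": "M",
    "f": "F", "female": "F", "0": "F",
}


def standardize_gender(raw):
    """Map a source-specific gender value to the warehouse code, or None if unknown."""
    return GENDER_CODES.get(str(raw).strip().lower())


def load_into_warehouse(*sources):
    """Merge records from several operational sources using one consistent coding."""
    warehouse = []
    for source in sources:
        for record in source:
            cleaned = dict(record)  # copy so the operational record is untouched
            cleaned["gender"] = standardize_gender(record["gender"])
            warehouse.append(cleaned)
    return warehouse


# Two made-up operational databases with inconsistent encodings.
billing = [{"customer_id": 1, "gender": "male"}, {"customer_id": 2, "gender": "F"}]
crm = [{"customer_id": 3, "gender": 1}, {"customer_id": 4, "gender": 0}]

warehouse_rows = load_into_warehouse(billing, crm)
print([row["gender"] for row in warehouse_rows])  # ['M', 'F', 'M', 'F']
```

Values that do not appear in the code table come back as None, so they can be flagged for review rather than silently guessed.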

Data Warehouse Framework

Data Warehouse Suitability Data warehousing is most appropriate for organizations in which some of the following apply: Large amounts of data need to be accessed by end-users. The operational data are stored in different systems. An information-based approach to management is in use. There is a large, diverse customer base. The same data are represented differently in different systems. Data are stored in highly technical formats that are difficult to decipher. Extensive end-user computing is performed.

Data Mart An alternative to data warehousing used by many smaller firms is the creation of a lower-cost, scaled-down version of a data warehouse, called a data mart. A data mart is a small warehouse designed for a strategic business unit (SBU) or a department.

Data Mart Types Replicated (dependent) Data Marts: Sometimes it is easier to work with a subset of the data warehouse. In such cases one can replicate functional subsets of the data warehouse in smaller databases. Stand-Alone Data Marts: A company can have one or more independent data marts without having a data warehouse.
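A replicated (dependent) data mart can be pictured as a copied functional subset of the warehouse. The sketch below assumes the warehouse is a simple list of records tagged by department; the data and the function name build_dependent_data_mart are hypothetical.

```python
import copy

# A made-up warehouse: each row is tagged with the department it belongs to.
warehouse = [
    {"department": "marketing", "metric": "campaign_cost", "value": 1200},
    {"department": "finance", "metric": "payroll", "value": 9000},
    {"department": "marketing", "metric": "leads", "value": 340},
]


def build_dependent_data_mart(warehouse_rows, department):
    """Replicate the rows for one department into a separate, smaller store."""
    return [
        copy.deepcopy(row)
        for row in warehouse_rows
        if row["department"] == department
    ]


marketing_mart = build_dependent_data_mart(warehouse, "marketing")
print(len(marketing_mart))  # 2
```

Because the rows are copied, changes made inside the mart do not flow back to the warehouse. A stand-alone data mart would instead be loaded directly from operational sources.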

Knowledge Discovery in Databases (KDD) KDD is the process of extracting useful knowledge from volumes of data. It is the subject of extensive research. KDD’s objective is to identify valid, novel, potentially useful, and ultimately understandable patterns in data. KDD is useful because it is supported by three technologies that are now sufficiently mature: Massive data collection Powerful multiprocessor computers Data mining algorithms

Tools and Techniques of KDD Ad-hoc queries allow users to request, in real time, information that is not available in the periodic reports. Online analytical processing (OLAP) refers to such end-user activities as DSS modeling using spreadsheets and graphics, which are done online. Ready-made Web-based analysis: Many vendors provide ready-made analytical tools, mostly in finance, marketing, and operations.
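An ad-hoc query can be sketched with Python's built-in sqlite3 module. The sales table, its columns, and the helper ad_hoc_total_by are assumptions made for illustration; a real system would query the organization's own warehouse.

```python
import sqlite3

# In-memory database with a small, made-up sales table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, product TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        ("East", "Widget", 120.0),
        ("East", "Gadget", 80.0),
        ("West", "Widget", 200.0),
        ("West", "Gadget", 50.0),
    ],
)


def ad_hoc_total_by(column):
    """Total sales grouped by a column the user picks at request time."""
    allowed = {"region", "product"}  # whitelist, since the column name is interpolated
    if column not in allowed:
        raise ValueError(f"unsupported column: {column}")
    query = f"SELECT {column}, SUM(amount) FROM sales GROUP BY {column} ORDER BY {column}"
    return conn.execute(query).fetchall()


print(ad_hoc_total_by("region"))   # [('East', 200.0), ('West', 250.0)]
print(ad_hoc_total_by("product"))  # [('Gadget', 130.0), ('Widget', 320.0)]
```

The point is that the grouping is chosen when the question is asked, rather than fixed in advance by a periodic report.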

Data Mining Data mining derives its name from the similarities between searching for valuable business information in a large database, and mining a mountain for valuable ore. Data mining technology can generate new business opportunities by providing these capabilities: Automated prediction of trends and behaviors. Data mining automates the process of finding predictive information in large databases. Automated discovery of previously unknown patterns. Data mining tools identify previously hidden patterns in one step.
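As a toy illustration of automated pattern discovery, the sketch below counts which pairs of items appear together in purchase baskets. This simple co-occurrence counting is a stand-in chosen for brevity, not the method used by any particular data mining tool, and the transactions are made up.

```python
from collections import Counter
from itertools import combinations

# Made-up purchase baskets.
transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "cereal"},
    {"bread", "butter", "cereal"},
]


def frequent_pairs(baskets, min_support):
    """Return item pairs that occur together in at least min_support baskets."""
    counts = Counter()
    for basket in baskets:
        # Sort so each pair has one canonical ordering.
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    return {pair: n for pair, n in counts.items() if n >= min_support}


print(frequent_pairs(transactions, min_support=3))  # {('bread', 'butter'): 3}
```

Nobody told the program to look at bread and butter; the pair surfaces from the data, which is the sense in which the pattern was previously unknown.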

Applications of Data Mining: Retailing & Sales; Banking; Manufacturing & Production; Brokerage & Securities trading; Computer hardware & software; Insurance; Police work; Government & Defense; Airlines; Health care; Broadcasting; Marketing.

Text Mining Text mining is the application of data mining to non-structured or less structured text files. Text mining helps organizations to do the following: Find the 'hidden' content of documents, including additional useful relationships. Group documents by common themes.
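Grouping documents by common themes can be sketched with simple keyword matching. The theme keyword sets and sample memos below are hypothetical, and real text mining tools use far richer techniques; this only shows the idea of assigning unstructured text to themes.

```python
import re
from collections import defaultdict

# Hypothetical themes, each defined by a handful of keywords.
THEMES = {
    "billing": {"invoice", "payment", "refund"},
    "shipping": {"delivery", "package", "courier"},
}


def tokenize(text):
    """Lowercase the text and return its distinct alphabetic words."""
    return set(re.findall(r"[a-z]+", text.lower()))


def group_by_theme(documents):
    """Assign each document to the theme whose keywords it mentions most."""
    groups = defaultdict(list)
    for name, text in documents.items():
        words = tokenize(text)
        scores = {theme: len(words & keywords) for theme, keywords in THEMES.items()}
        best = max(scores, key=scores.get)
        groups[best if scores[best] > 0 else "other"].append(name)
    return dict(groups)


documents = {
    "memo1": "Customer requests a refund for a duplicate invoice.",
    "memo2": "The courier lost the package before delivery.",
    "memo3": "Payment received; invoice closed.",
    "memo4": "Team lunch is on Friday.",
}

print(group_by_theme(documents))
```

With these inputs, memo1 and memo3 fall under billing, memo2 under shipping, and memo4 under other.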

Web Mining Web Mining refers to mining tools used to analyze a large amount of data on the Web, such as what customers are doing on the Web—that is, to analyze clickstream data.
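Clickstream analysis can be sketched as counting page-to-page transitions within each visitor session. The log format used here, (session id, page) tuples in time order, is an assumption for illustration; real Web server logs need parsing and session reconstruction first.

```python
from collections import Counter

# Made-up clickstream: (session id, page visited), already in time order.
clicks = [
    ("s1", "/home"),
    ("s1", "/products"),
    ("s1", "/cart"),
    ("s2", "/home"),
    ("s2", "/products"),
    ("s2", "/home"),
]


def count_transitions(clickstream):
    """Count how often visitors move from one page directly to another."""
    last_page = {}  # most recent page seen for each session
    transitions = Counter()
    for session, page in clickstream:
        if session in last_page:
            transitions[(last_page[session], page)] += 1
        last_page[session] = page
    return transitions


transitions = count_transitions(clicks)
print(transitions.most_common(1))  # [(('/home', '/products'), 2)]
```

The most frequent transitions show which paths customers actually follow through the site.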

Data Visualization Data visualization refers to the presentation of data by technologies such as digital images, geographical information systems, graphical user interfaces, multidimensional tables and graphs, virtual reality, three-dimensional presentations, and animation.

Multidimensionality Modern data and information may have several dimensions. For example, management may be interested in examining sales figures in a certain city by product, by time period, by salesperson, and by store. It is important to provide users with a technology that allows them to add, replace, or change dimensions quickly and easily in a table and/or graphical presentation. The technology of slicing, dicing, and similar manipulations of data is called multidimensionality.

Multidimensionality, cont’d. Three factors are considered in multidimensionality: dimensions, measures, and time.

Examples of Dimensions: Products; Salespeople; Market segments; Business units; Geographical locations; Distribution channels; Countries; Industries.

Examples of Measures: Money; Sales volume; Head count; Inventory; Profit; Actual vs. forecasted results.

Examples of Time: Daily; Weekly; Monthly; Quarterly; Yearly.
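Slicing and changing dimensions can be sketched in a few lines. The fact records below use two dimensions (product, city), a time period (quarter), and one measure (amount); the data and the function names slice_cube and roll_up are illustrative assumptions.

```python
from collections import defaultdict

# Made-up sales facts: dimensions (product, city), time (quarter), measure (amount).
sales = [
    {"product": "Widget", "city": "Boston", "quarter": "Q1", "amount": 100},
    {"product": "Widget", "city": "Boston", "quarter": "Q2", "amount": 150},
    {"product": "Gadget", "city": "Boston", "quarter": "Q1", "amount": 70},
    {"product": "Widget", "city": "Denver", "quarter": "Q1", "amount": 90},
]


def slice_cube(facts, **fixed):
    """Keep only the facts that match the fixed dimension values (a slice)."""
    return [
        fact for fact in facts
        if all(fact[dim] == value for dim, value in fixed.items())
    ]


def roll_up(facts, dimension, measure="amount"):
    """Total the measure along one chosen dimension."""
    totals = defaultdict(int)
    for fact in facts:
        totals[fact[dimension]] += fact[measure]
    return dict(totals)


boston = slice_cube(sales, city="Boston")
print(roll_up(boston, "product"))  # {'Widget': 250, 'Gadget': 70}
print(roll_up(boston, "quarter"))  # {'Q1': 170, 'Q2': 150}
```

Switching the view from product to quarter only requires changing one argument, which is the kind of quick dimension change the slides describe.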

Advantages of Multidimensionality Data can be presented and navigated with relative ease. Multidimensional databases are easier to maintain. *Multidimensional databases are significantly faster than relational databases as a result of the additional dimensions and the anticipation of how the data will be accessed by users.*

Geographical Information Systems (GIS) A geographical information system (GIS) is a computer-based system for capturing, storing, checking, integrating, manipulating, and displaying data using digitized maps. Every record or digital object has an identified geographical location.

Example of GIS in Action Banks are using GIS for plotting the following: branch and ATM locations; customer demographics; volume and traffic patterns of business activities; geographical area served by each branch; market potential for banking activities; strengths and weaknesses against the competition; branch performance.
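Because every GIS record carries a geographical location, questions such as which branch serves a given customer become distance calculations. The sketch below uses the haversine great-circle formula with made-up branch coordinates; the branch names and the function names are hypothetical, and a real GIS would offer much richer spatial operations.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius

# Made-up branch locations as (latitude, longitude) in degrees.
branches = {
    "Downtown": (40.7128, -74.0060),
    "Uptown": (40.8116, -73.9465),
}


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (
        math.sin(dphi / 2) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    )
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearest_branch(lat, lon):
    """Name of the branch closest to the given customer location."""
    return min(branches, key=lambda name: haversine_km(lat, lon, *branches[name]))


print(nearest_branch(40.73, -74.00))  # Downtown
```

Running the same lookup over all customers would give a rough picture of the geographical area served by each branch.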

GIS, cont’d. GIS Software varies in its capabilities, from simple computerized mapping systems to enterprise-wide tools for decision support data analysis. GIS Data are available from a wide variety of sources. Government sources (via the Internet and CD-ROM) provide some data, while vendors provide diversified commercial data as well. GIS & Decision Making: The graphical format of GIS makes it easy for managers to visualize the data and make decisions. GIS and the Internet or intranet: Most major GIS software vendors are providing Web access, such as embedded browsers, or a Web/Internet/intranet server that hooks directly into their software.

Visual Interactive Modeling (VIM) Visual interactive modeling (VIM) uses computer graphic displays to represent the impact of different management decisions on goals such as profit or market share. A VIM can be used both for supporting decisions & training. It can represent a static or a dynamic system.

Visual Interactive Simulation (VIS) Visual interactive simulation (VIS) is one of the most developed areas in VIM. It is a decision simulation in which the end-user watches the progress of the simulation model in an animated form using graphics terminals.

Virtual Reality (VR) Virtual reality (VR) is interactive, computer-generated, three-dimensional graphics. VR applications to date have been used to support decision making indirectly. Boeing has developed a virtual aircraft mock-up to test designs. At Volvo, VR is used to test virtual cars in virtual accidents. Data visualization helps financial decision makers by using visual, spatial & aural immersion virtual systems. Some stock brokerages have a VR application in which users surf over a landscape of stock futures, with color, hue, and intensity.

Data Mining and Warehousing Implementation Examples Alamo Rent-a-Car discovered that German tourists liked bigger cars. So now, when Alamo advertises its rental business in Germany, the ads include information about its larger models. Au Bon Pain Company discovered that they were not selling as much cream cheese as planned. When they analyzed point-of-sale data, they found that customers preferred small, one-serving packaging. AT&T and MCI sift through terabytes of customer phone data to fine-tune marketing campaigns and determine new discount calling plans.

CASE: Data Mining Powers Wal-Mart (p. 510) An interesting case study exploring how Wal-Mart uses data warehousing and data mining to get the right product on the appropriate shelf at the lowest cost.

Web-Based Data Management Systems Business intelligence activities, from data acquisition, through warehousing, to mining, can be performed with Web tools or are interrelated with Web technologies and e-Commerce (EC). e-Commerce software vendors are providing Web tools that connect the data warehouse with EC ordering and cataloging systems (e.g., Tradelink, a product of Hitachi). Data warehousing and decision support vendors are connecting their products with Web technologies and EC (e.g., Comshare’s DecisionWeb, Web Intelligence from Business Objects, and Cognos’s DataMerchant).

Managerial Issues Cost–benefit issues & justification. A cost–benefit analysis must be undertaken before any commitment to new technologies. Where to store data physically. Should data be distributed close to their sources? Or should data be centralized for easier control? Legal issues. Data mining gives rise to a variety of legal issues. The legacy data problem. What should be done with masses of information already stored in a variety of formats, often known as the legacy data acquisition problem?

Managerial Issues, cont’d. Disaster recovery. How well can an organization’s business processes recover after an information system disaster? Internal or external? Should a firm store & maintain its databases internally or externally? Data security and ethics. Are the company’s competitive data safe from external snooping or sabotage? Ethics. Should people have to pay for use of online data?

Managerial Issues, cont’d. Privacy. Collecting data in a warehouse and conducting data mining may result in the invasion of individual privacy. Data purging. When is it beneficial to “clean house” and purge information systems of obsolete or non–cost-effective data? Data delivery. A problem regarding how to move data efficiently around an enterprise also exists.