Data Management: Warehousing, Analyzing, Mining & Visualization

Presentation on theme: "Data Management: Warehousing, Analyzing, Mining & Visualization"— Presentation transcript:

1 Data Management: Warehousing, Analyzing, Mining & Visualization
Chapter 11 Data Management: Warehousing, Analyzing, Mining & Visualization

2 Difficulties of Managing Data
The amount of data increases exponentially. Data are scattered throughout organizations and are collected by many individuals using several methods and devices. Only small portions of an organization’s data are relevant for any specific decision. An ever-increasing amount of external data needs to be considered in making organizational decisions. Data are frequently stored in several servers and locations in an organization.

3 Difficulties, cont’d. Raw data may be stored in different computing systems, databases, formats, and human and computer languages. Legal requirements relating to data differ among countries and change frequently. Selecting data management tools can be a major problem because of the huge number of products available. Data security, quality, and integrity are critical yet are easily jeopardized.

4 Data Sources and Collection
Internal Data: An organization’s internal data are about people, products, services, and processes. Personal Data: IS users or other corporate employees may document their own expertise by creating personal data. External Data: There are many sources for external data, ranging from commercial databases to sensors and satellites. The Internet & Commercial Database Services: Some external data flow to an organization through electronic data interchange (EDI), through other company-to-company channels, or the Internet.

5 Data Quality
Data Quality (DQ) is an extremely important issue since quality determines the data’s usefulness as well as the quality of the decisions based on the data.

6 Data Quality Problems (Strong, et al., 1997)
Intrinsic DQ: Accuracy, objectivity, believability, and reputation. Accessibility DQ: Accessibility and access security. Contextual DQ: Relevancy, value added, timeliness, completeness, amount of data. Representation DQ: Interpretability, ease of understanding, concise representation, consistent representation.
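
As a rough illustration of one contextual DQ dimension listed above, completeness can be estimated as the share of records in which a field is actually filled in. The sketch below is only one possible measure; the record layout, the field names, and the function name `completeness` are assumptions made for the example and do not come from Strong et al.

```python
def completeness(records, fields):
    """Return, per field, the fraction of records with a non-empty value."""
    total = len(records)
    result = {}
    for field in fields:
        # Treat None and the empty string as missing values.
        filled = sum(1 for rec in records if rec.get(field) not in (None, ""))
        result[field] = filled / total if total else 0.0
    return result


# Hypothetical customer records with one missing e-mail address.
customers = [
    {"name": "Ann", "email": "ann@example.com"},
    {"name": "Bob", "email": ""},
    {"name": "Cho", "email": "cho@example.com"},
    {"name": "Dee", "email": "dee@example.com"},
]

scores = completeness(customers, ["name", "email"])
```

A field scoring well below 1.0 would be a candidate for cleanup before the data are used for decisions.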

7 Object-Oriented Databases
Last time we discussed hierarchical, network, and relational databases. An object-oriented database is part of the object-oriented paradigm, which also includes object-oriented programming, operating systems, and modeling. Object-oriented databases are sometimes referred to as multimedia databases and are managed by special multimedia database management systems.

8 Document Management
Document Management is the automated control of electronic documents, page images, spreadsheets, word processing documents, and complex, compound documents through their entire life cycle within an organization, from initial creation to final archiving. Benefits of Document Management: Greater control over production, storage, and distribution of documents. Greater efficiency in the reuse of information. Control of a document through a workflow process. Reduction of product cycle times.

9 Data Processing
Data processing in organizations can be viewed as either transactional or analytical. Transactional: The data in TPS are organized mainly in a hierarchical structure and are centrally processed. Databases and processing systems are known as operational systems. Analytical: Analytical processing involves analysis of accumulated data, mainly by end-users. It includes DSS, EIS, Web applications, and other end-user activities.

10 Delivery Systems A good data delivery system should be able to support: Easy data access by the end-users. A quick decision-making process. Accurate and effective decision making. Flexible decision making.

11 Data Warehouses
The purpose of a data warehouse is to establish a data repository that makes operational data accessible in a form readily acceptable for analytical processing activities (e.g., decision support, EIS). Data warehouses include a companion called metadata, meaning data about data.

12 Benefits of Data Warehousing
The ability to reach data quickly, as they are located in one place. The ability to reach data easily, frequently by end-users themselves, using Web browsers.

13 Characteristics of Data Warehousing
Organization: Data are organized by detailed subjects. Consistency: Data in different operational databases may be encoded differently. In the warehouse they will be coded in a consistent manner. Time variant: The data are kept for 5 to 10 years so they can be used for trends, forecasting, and comparisons over time. Non-volatile: Once entered into the warehouse, data are not updated. Relational: The data warehouse uses a relational structure. Client/Server: The data warehouse uses the client/server architecture to give end users easy access to its data.
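
The consistency characteristic can be sketched as a small recoding step applied while loading the warehouse. This is a minimal illustration, not a description of any particular product: the two source systems, the `gender` field, and the code table `GENDER_CODES` are all hypothetical.

```python
# Hypothetical mapping from the encodings used by different operational
# systems to the single code used in the warehouse.
GENDER_CODES = {
    "m": "M", "male": "M", "1": "M",
    "f": "F", "female": "F", "0": "F",
}


def to_warehouse_code(raw_value):
    """Map a source-specific value to the one warehouse encoding."""
    key = str(raw_value).strip().lower()
    return GENDER_CODES.get(key, "UNKNOWN")


def load_consistently(records):
    """Return copies of the records with the 'gender' field recoded."""
    return [{**rec, "gender": to_warehouse_code(rec["gender"])} for rec in records]


# Two operational systems that encode the same attribute differently.
billing_rows = [{"customer_id": 1, "gender": "male"}, {"customer_id": 2, "gender": "F"}]
crm_rows = [{"customer_id": 3, "gender": 1}, {"customer_id": 4, "gender": "n/a"}]

warehouse_rows = load_consistently(billing_rows + crm_rows)
```

Values that cannot be mapped are kept as "UNKNOWN" rather than dropped, so the gap stays visible to analysts.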

14 Data Warehouse Framework

15 Data Warehouse Suitability
Data warehousing is most appropriate for organizations in which some of the following apply: Large amounts of data need to be accessed by end-users. The operational data are stored in different systems. An information-based approach to management is in use. There is a large, diverse customer base. The same data are represented differently in different systems. Data are stored in highly technical formats that are difficult to decipher. Extensive end-user computing is performed.

16 Data Mart An alternative to data warehousing used by many smaller firms is the creation of a lower cost, scaled-down version of a data warehouse, called a data mart. A data mart refers to a small warehouse designed for a strategic business unit (SBU) or a department.

17 Data Mart Types Replicated (dependent) Data Marts: Sometimes it is easier to work with a subset of the data warehouse. In such cases one can replicate functional subsets of the data warehouse in smaller databases. Stand-Alone Data Marts: A company can have one or more independent data marts without having a data warehouse.
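
A replicated (dependent) data mart can be pictured as a copy of one functional subset of the warehouse. The sketch below uses Python's built-in sqlite3 module with an in-memory database; the table names `warehouse_sales` and `marketing_mart`, the columns, and the rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# A tiny stand-in for the enterprise data warehouse.
conn.execute(
    "CREATE TABLE warehouse_sales (region TEXT, department TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO warehouse_sales VALUES (?, ?, ?)",
    [
        ("east", "marketing", 120.0),
        ("west", "marketing", 80.0),
        ("east", "finance", 300.0),
    ],
)

# Replicate only the marketing subset into a smaller, departmental table.
conn.execute(
    "CREATE TABLE marketing_mart AS "
    "SELECT region, amount FROM warehouse_sales WHERE department = 'marketing'"
)

mart_rows = conn.execute(
    "SELECT region, amount FROM marketing_mart ORDER BY region"
).fetchall()
```

A stand-alone data mart would look the same from the user's side, except that its rows would be loaded directly from operational systems instead of from a warehouse table.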

18 Knowledge Discovery in Databases (KDD)
KDD is the process of extracting useful knowledge from volumes of data. It is the subject of extensive research. KDD’s objective is to identify valid, novel, potentially useful, and ultimately understandable patterns in data. KDD is useful because it is supported by three technologies that are now sufficiently mature: massive data collection, powerful multiprocessor computers, and data mining algorithms.

19 Tools and Techniques of KDD
Ad-hoc queries allow users to request, in real time, information that is not available in the periodic reports. Online analytical processing (OLAP) refers to such end-user activities as DSS modeling using spreadsheets and graphics, which are done online. Ready-made Web-based analysis: Many vendors provide ready-made analytical tools, mostly in finance, marketing, and operations.

20 Data Mining Data mining derives its name from the similarities between searching for valuable business information in a large database, and mining a mountain for valuable ore. Data mining technology can generate new business opportunities by providing these capabilities: Automated prediction of trends and behaviors. Data mining automates the process of finding predictive information in large databases. Automated discovery of previously unknown patterns. Data mining tools identify previously hidden patterns in one step.
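
To make "automated discovery of previously unknown patterns" a little more concrete, the sketch below counts which pairs of items are bought together often. It is a deliberately simple co-occurrence count, not a full data mining algorithm; the basket data, the function name `frequent_pairs`, and the `min_support` threshold are assumptions for the example.

```python
from collections import Counter
from itertools import combinations


def frequent_pairs(transactions, min_support):
    """Return item pairs that appear together in at least min_support baskets."""
    pair_counts = Counter()
    for basket in transactions:
        # Sort and de-duplicate so each pair is counted once per basket
        # and always appears in the same order.
        for pair in combinations(sorted(set(basket)), 2):
            pair_counts[pair] += 1
    return {pair: n for pair, n in pair_counts.items() if n >= min_support}


# Hypothetical point-of-sale baskets.
baskets = [
    ["bread", "butter", "milk"],
    ["bread", "butter"],
    ["milk", "eggs"],
    ["bread", "butter", "eggs"],
]

patterns = frequent_pairs(baskets, min_support=3)
```

Here only the bread and butter pair reaches the threshold, which is the kind of relationship a retailer might act on when placing products.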

21 Applications of Data Mining
Retailing & Sales; Banking; Manufacturing & Production; Brokerage & Securities trading; Computer hardware & software; Insurance; Police work; Government & Defense; Airlines; Health care; Broadcasting; Marketing.

22 Text Mining Text mining is the application of data mining to non-structured or less structured text files. Text mining helps organizations to do the following: Find the 'hidden' content of documents, including additional useful relationships. Group documents by common themes.
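
Grouping documents by common themes can be sketched with a simple keyword overlap. Real text mining tools use far richer methods; the themes, keywords, and sample documents below are hypothetical and serve only to show the idea.

```python
import re
from collections import defaultdict

# Hypothetical themes, each described by a few keywords.
THEME_KEYWORDS = {
    "billing": {"invoice", "payment", "refund"},
    "shipping": {"delivery", "package", "courier"},
}


def tokenize(text):
    """Lower-case the text and return its set of alphabetic words."""
    return set(re.findall(r"[a-z]+", text.lower()))


def group_by_theme(documents):
    """Assign each document to the theme sharing the most keywords with it."""
    groups = defaultdict(list)
    for doc_id, text in documents.items():
        words = tokenize(text)
        best_theme, best_overlap = "other", 0
        for theme, keywords in THEME_KEYWORDS.items():
            overlap = len(words & keywords)
            if overlap > best_overlap:
                best_theme, best_overlap = theme, overlap
        groups[best_theme].append(doc_id)
    return dict(groups)


docs = {
    "d1": "Refund requested because the invoice was wrong.",
    "d2": "The courier left the package at the wrong door.",
    "d3": "Payment failed twice.",
    "d4": "Great service overall.",
}

themes = group_by_theme(docs)
```

Documents that match no theme fall into "other", which is often where the unexpected content is found.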

23 Web Mining Web Mining refers to mining tools used to analyze a large amount of data on the Web, such as what customers are doing on the Web—that is, to analyze clickstream data.
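
A basic form of clickstream analysis is counting how often visitors move from one page to another. The sketch below assumes the click log is already ordered by time within each session; the session IDs, page paths, and the function name `page_transitions` are made up for illustration.

```python
from collections import Counter

# Hypothetical click log: (session id, page requested), in time order.
clicks = [
    ("s1", "/home"), ("s1", "/products"), ("s1", "/cart"),
    ("s2", "/home"), ("s2", "/products"),
    ("s3", "/home"), ("s3", "/products"), ("s3", "/cart"),
]


def page_transitions(click_log):
    """Count page-to-page moves within each session."""
    last_page = {}
    transitions = Counter()
    for session_id, page in click_log:
        if session_id in last_page:
            transitions[(last_page[session_id], page)] += 1
        last_page[session_id] = page
    return transitions


transitions = page_transitions(clicks)
```

Comparing the counts shows where visitors drop off: in this made-up log, all three sessions reach the product page but only two continue to the cart.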

24 Data Visualization Data visualization refers to the presentation of data by technologies such as digital images, geographical information systems, graphical user interfaces, multidimensional tables and graphs, virtual reality, three-dimensional presentations, and animation.

25 Multidimensionality
Modern data and information may have several dimensions. For example, management may be interested in examining sales figures in a certain city by product, by time period, by salesperson, and by store. It is important to provide the user with a technology that allows her to add, replace, or change dimensions quickly and easily in a table and/or graphical presentation. The technology of slicing, dicing, and similar manipulations of data is called multidimensionality.

26 Multidimensionality, cont’d.
Three factors are considered in multidimensionality: dimensions, measures, and time.

27 Examples of Dimensions
Products, salespeople, market segments, business units, geographical locations, distribution channels, countries, industries.

28 Examples of Measures
Money, sales volume, head count, inventory, profit, actual vs. forecasted results.

29 Examples of Time
Daily, weekly, monthly, quarterly, yearly.
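
Slicing and rolling up along dimensions can be sketched with a small in-memory fact table. This is only an illustration of the idea, not how a multidimensional database is implemented; the dimension names, the sales figures, and the helper names `slice_cube` and `roll_up` are assumptions.

```python
from collections import defaultdict

# Each fact: (product, city, quarter, sales). The first three are
# dimensions (quarter is the time dimension); sales is the measure.
DIMENSIONS = ("product", "city", "quarter")
facts = [
    ("laptop", "Boston", "Q1", 100),
    ("laptop", "Boston", "Q2", 150),
    ("laptop", "Denver", "Q1", 80),
    ("phone", "Boston", "Q1", 60),
    ("phone", "Denver", "Q2", 90),
]


def slice_cube(rows, **fixed):
    """Keep only the facts whose dimensions match every fixed value."""
    index = {name: i for i, name in enumerate(DIMENSIONS)}
    return [
        row for row in rows
        if all(row[index[dim]] == value for dim, value in fixed.items())
    ]


def roll_up(rows, by):
    """Total the sales measure for each value of one dimension."""
    position = DIMENSIONS.index(by)
    totals = defaultdict(int)
    for row in rows:
        totals[row[position]] += row[-1]
    return dict(totals)


boston = slice_cube(facts, city="Boston")        # slice: one city
boston_by_product = roll_up(boston, by="product")
sales_by_quarter = roll_up(facts, by="quarter")  # change the dimension
```

Swapping the `by` argument is the code equivalent of a user replacing one dimension with another in a table or chart.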

30 Advantages of Multidimensionality
Data can be presented and navigated with relative ease. Multidimensional databases are easier to maintain. *Multidimensional databases are significantly faster than relational databases as a result of the additional dimensions and the anticipation of how the data will be accessed by users.*

31 Geographical Information Systems (GIS)
A geographical information system (GIS) is a computer-based system for capturing, storing, checking, integrating, manipulating, and displaying data using digitized maps. Every record or digital object has an identified geographical location.
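
Because every record carries a geographical location, even a simple program can answer spatial questions such as "which branch is closest to this customer?". The sketch below uses the haversine great-circle formula with a mean Earth radius of 6371 km; the branch names and coordinates are approximate and chosen only for illustration, and a real GIS would offer much more than distance calculations.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, a common approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (
        math.sin(d_phi / 2) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    )
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearest_branch(lat, lon, branch_locations):
    """Return the name of the branch closest to the given point."""
    return min(
        branch_locations,
        key=lambda name: haversine_km(lat, lon, *branch_locations[name]),
    )


# Hypothetical branch records, each with a (latitude, longitude) location.
branches = {"Boston": (42.36, -71.06), "New York": (40.71, -74.01)}

customer = (41.82, -71.41)  # roughly Providence, Rhode Island
closest = nearest_branch(*customer, branches)
```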

32 Example of GIS in Action
Banks are using GIS for plotting the following: branch and ATM locations; customer demographics; volume and traffic patterns of business activities; geographical area served by each branch; market potential for banking activities; strengths and weaknesses against the competition; branch performance.

33 GIS, cont’d.
GIS software varies in its capabilities, from simple computerized mapping systems to enterprise-wide tools for decision support data analysis. GIS data are available from a wide variety of sources. Government sources (via the Internet and CD-ROM) provide some data, while vendors provide diversified commercial data as well. GIS and decision making: The graphical format of GIS makes it easy for managers to visualize the data and make decisions. GIS and the Internet or intranet: Most major GIS software vendors are providing Web access, such as embedded browsers, or a Web/Internet/intranet server that hooks directly into their software.

34 Visual Interactive Modeling (VIM)
Visual interactive modeling (VIM) uses computer graphic displays to represent the impact of different management decisions on goals such as profit or market share. A VIM can be used both for supporting decisions & training. It can represent a static or a dynamic system.

35 Visual Interactive Simulation (VIS)
Visual interactive simulation (VIS) is one of the most developed areas in VIM. It is a decision simulation in which the end-user watches the progress of the simulation model in an animated form using graphics terminals.

36 Virtual Reality (VR) Virtual reality (VR) is interactive, computer-generated, three-dimensional graphics. VR applications to date have been used to support decision making indirectly. Boeing has developed a virtual aircraft mock-up to test designs. At Volvo, VR is used to test virtual cars in virtual accidents. Data visualization helps financial decision makers by using visual, spatial & aural immersion virtual systems. Some stock brokerages have a VR application in which users surf over a landscape of stock futures, with color, hue, and intensity.

37 Data Mining and Warehousing Implementation Examples
Alamo Rent-a-Car discovered that German tourists liked bigger cars. So now, when Alamo advertises its rental business in Germany, the ads include information about its larger models. Au Bon Pain Company discovered that they were not selling as much cream cheese as planned. When they analyzed point-of-sale data, they found that customers preferred small, one-serving packaging. AT&T and MCI sift through terabytes of customer phone data to fine-tune marketing campaigns and determine new discount calling plans.

38 CASE: Data Mining Powers Wal-Mart (p. 510)
An interesting case study exploring how Wal-Mart uses data warehousing and data mining to get the right product on the appropriate shelf at the lowest cost.

39 Web-Based Data Management Systems
Business intelligence activities, from data acquisition through warehousing to mining, can be performed with Web tools or are interrelated with Web technologies and e-Commerce. e-Commerce software vendors are providing Web tools that connect the data warehouse with EC ordering and cataloging systems (e.g., Tradelink, a product of Hitachi). Data warehousing and decision support vendors are connecting their products with Web technologies and EC (e.g., Comshare’s DecisionWeb, Web Intelligence from Business Objects, and Cognos’s DataMerchant).

40 Managerial Issues
Cost-benefit issues and justification. A cost-benefit analysis must be undertaken before any commitment to new technologies. Where to store data physically. Should data be distributed close to their sources, or should data be centralized for easier control? Legal issues. Data mining gives rise to a variety of legal issues. The legacy data problem. What should be done with masses of information already stored in a variety of formats, often known as the legacy data acquisition problem?

41 Managerial Issues, cont’d.
Disaster recovery. How well can an organization’s business processes recover after an information system disaster? Internal or external? Should a firm store & maintain its databases internally or externally? Data security and ethics. Are the company’s competitive data safe from external snooping or sabotage? Ethics. Should people have to pay for use of online data?

42 Managerial Issues, cont’d.
Privacy. Collecting data in a warehouse and conducting data mining may result in the invasion of individual privacy. Data purging. When is it beneficial to “clean house” and purge information systems of obsolete or non–cost-effective data? Data delivery. A problem regarding how to move data efficiently around an enterprise also exists.

