Published by Lydia Martin. Modified over 9 years ago.
1
Foundations of Business Intelligence: Databases and Information Management
Chapter 6 (10e)
2
Problems with the Traditional File Environment
Data redundancy: the presence of duplicate data in multiple data files, so that the same data are stored in more than one place.
Data inconsistency: the same attribute may have different values.
Program-data dependence: the coupling of data stored in files and the specific programs required to update and maintain those files.
Lack of flexibility: traditional file systems can deliver routine scheduled reports, but cannot deliver ad hoc reports or respond to unanticipated requirements.
3
Problems with the Traditional File Environment (Continued)
Lack of data sharing and availability: information cannot flow freely across different functional areas or parts of the organization. Users find different values of the same piece of information in two different systems.
Poor security: because there is little control or management of data, management has no way of knowing who is accessing or changing the organization’s data.
4
Other Database Concepts
Object-oriented database model
  Successor to the relational model
  Integration of data and programs
  Handles wider variety of field types
Entity-relationship diagrams
  Graphical method of displaying relationships between tables
  Tool for IS professionals
5
Types of Database Models
Hierarchical
Network
Relational
Object-oriented
  Extension of the relational model
  Stores both data and the procedures that act on the data
  Stores more complex types of information (e.g., graphics)
6
Creating a Database Environment: An Entity-Relationship Diagram
[Figure 7-12: entity-relationship diagram; image not included in this transcript]
7
Physical versus Logical Views
Physical view: deals with the structure of information as it resides on various storage media.
Logical view: deals with how knowledge workers view their information needs, and includes such terms as:
  CHARACTER - the smallest unit of information.
  FIELD - group of related characters.
  RECORD - group of related fields.
  FILE - group of related records.
  DATABASE - group of logically associated files.
  DATA WAREHOUSE - information from many databases.
8
Other Logical Structures in a Database
DATA DICTIONARY - contains the logical structure of information in a database.
INTEGRITY CONSTRAINT - a rule that helps assure the quality of the information in a database.
  A registration database at your school includes integrity constraints concerning prerequisites for certain classes.
  Designating primary keys, enforcing referential integrity, using input masks, and defining validation rules are ways to establish integrity constraints.
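As a rough illustration of the integrity constraints listed above, the sketch below uses Python's built-in sqlite3 module. The registration schema, table names, and the 1-to-6 credit range are assumptions made for this example, not part of the slides; input masks (a data-entry feature) and the prerequisite rule itself are not attempted here.

```python
import sqlite3

# Hypothetical registration schema; table and column names are illustrative.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled

conn.executescript("""
CREATE TABLE course (
    course_id TEXT PRIMARY KEY,                                  -- primary key
    credits   INTEGER NOT NULL CHECK (credits BETWEEN 1 AND 6)   -- validation rule
);
CREATE TABLE enrollment (
    student_id INTEGER NOT NULL,
    course_id  TEXT NOT NULL REFERENCES course(course_id),       -- referential integrity
    PRIMARY KEY (student_id, course_id)
);
""")

conn.execute("INSERT INTO course VALUES ('MIS101', 3)")
conn.execute("INSERT INTO enrollment VALUES (1, 'MIS101')")


def try_insert(sql):
    """Run one INSERT and report whether the DBMS accepted it."""
    try:
        conn.execute(sql)
        return "accepted"
    except sqlite3.IntegrityError:
        return "rejected"


results = {
    "duplicate key": try_insert("INSERT INTO course VALUES ('MIS101', 4)"),
    "bad credits": try_insert("INSERT INTO course VALUES ('MIS102', 0)"),
    "unknown course": try_insert("INSERT INTO enrollment VALUES (1, 'MIS999')"),
    "valid row": try_insert("INSERT INTO enrollment VALUES (2, 'MIS101')"),
}
```

The point of the sketch is that the rules live in the database definition, so every program that writes to the tables is held to the same checks.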
9
Sample Data Dictionary Report
10
Components of a DBMS
DBMS engine - accepts logical requests from the various other DBMS subsystems, converts them to their physical equivalent, and actually accesses the database and data dictionary as they exist on a storage device.
Data Definition Language (DDL) - helps you create and maintain the data dictionary and define the structure of the files in a database.
  You use this subsystem to define the logical structure of the information when you first create a database.
  Once you’ve created a database, you use this subsystem to define new fields, delete fields, or change field properties.
11
More Components of a DBMS
Data Manipulation Language (DML)
  Helps you add, delete, or modify data in the database
  Provides a query language (QBE and SQL) and the ability to generate views
  Provides the ability to generate reports (in Access, see the Report Wizard)
  To confuse the issue, SQL is both a DDL and a DML
DATA ADMINISTRATION SUBSYSTEM - helps you manage the overall database environment by providing facilities for:
  Backup and recovery
  Security management
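The remark that SQL is both a DDL and a DML can be seen in a short sketch using Python's built-in sqlite3 module. The customer table and its rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL: define the structure, then change it later by adding a field.
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute("ALTER TABLE customer ADD COLUMN region TEXT")

# DML: add, modify, and delete data.
conn.executemany(
    "INSERT INTO customer (id, name, region) VALUES (?, ?, ?)",
    [(1, "Ames", "East"), (2, "Baker", "West"), (3, "Cruz", "East")],
)
conn.execute("UPDATE customer SET region = 'West' WHERE id = 3")
conn.execute("DELETE FROM customer WHERE id = 2")

# A view is defined with DDL and then queried like a table.
conn.execute(
    "CREATE VIEW west_customers AS SELECT id, name FROM customer WHERE region = 'West'"
)
west = conn.execute("SELECT id, name FROM west_customers ORDER BY id").fetchall()

# The column list comes from SQLite's own catalog of the table structure.
columns = [row[1] for row in conn.execute("PRAGMA table_info(customer)")]
```

Here `west` holds the rows visible through the view, and `columns` shows that the field added with ALTER TABLE is now part of the stored structure.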
13
Database Architectures: Centralized
Centralized databases use a single central processor or multiple processors in a client/server network.
The major feature is that the database is in a single physical location.
Advantages of this design are that security tends to be higher and risks are lower.
When data access demands are highly decentralized, this design tends to be costly and inflexible.
14
Database Architectures: Distributed
Databases can be decentralized either by partitioning or by replicating.
Partitioned database: the database is divided into segments or regions. For example, a customer database can be divided into Eastern customers and Western customers, and two separate databases maintained in the two regions.
Duplicated database: the database is duplicated at two or more locations. The separate databases are synchronized in off hours on a batch basis.
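The two approaches can be contrasted in a small in-memory sketch. The customer records, site names, and the idea of a single primary copy that overwrites the others are assumptions made for illustration; real products synchronize in more sophisticated ways.

```python
from copy import deepcopy

# Hypothetical customer records.
customers = [
    {"id": 1, "name": "Ames", "region": "East"},
    {"id": 2, "name": "Baker", "region": "West"},
    {"id": 3, "name": "Cruz", "region": "East"},
]


def partition_by_region(rows):
    """Partitioned design: each site holds only its own region's rows."""
    sites = {}
    for row in rows:
        sites.setdefault(row["region"], []).append(row)
    return sites


def batch_sync(primary, replicas):
    """Duplicated design: overwrite every replica with a full copy of the primary."""
    for name in replicas:
        replicas[name] = deepcopy(primary)


partitions = partition_by_region(customers)

replicas = {"site_a": [], "site_b": []}
batch_sync(customers, replicas)

# A daytime update reaches the primary only; replicas stay stale until the next batch.
customers.append({"id": 4, "name": "Diaz", "region": "West"})
stale_count = len(replicas["site_a"])
batch_sync(customers, replicas)
fresh_count = len(replicas["site_a"])
```

The gap between `stale_count` and `fresh_count` is the trade-off of off-hours batch synchronization: every site has all the data, but not always the latest version.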
15
Distributed Databases
16
Data Warehouse
Definition: a database, with tools (software), that stores current and historical data and is designed to support the business analysis activities and decision-making tasks of managers; typically a relational database model is used.
Benefits
  Improved access
  Improved information
  Isolation from operational systems
  Tools permit advanced data analysis
Users
Data marts
17
Comparison of Data in a Data Warehouse and Operational Data
Operational Data                  Warehouse Data
Data is on many systems           Integrated in one enterprise-wide system
Current operational data          Recent and historical data
Inconsistent data definitions     Consistent data definitions
Functionally organized data       Data are organized around business entities
Data are constantly changing      Data are stabilized
18
Building a Data Warehouse (ETL)
Extraction phase: create files on the computer that will store the data warehouse and move transaction data to this machine; data may come from many sources or parts of the organization.
Transformation phase: cleanse and standardize the data. Why is this necessary?
Load phase: transfer the data from the transformation phase into the data warehouse.
The ETL process is then automated to make regular transfers of transaction data into the data warehouse.
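A minimal sketch of the three phases, assuming two made-up source systems that spell the same customer and state differently. The field names, the cleansing rule (skip rows with no amount), and the sales_fact table are all hypothetical.

```python
import sqlite3

# Hypothetical extracts from two operational systems with inconsistent formats.
sales_system = [
    {"cust": " ames ", "state": "ny", "amount": "120.50"},
    {"cust": "Baker", "state": "CA", "amount": "80"},
]
web_system = [
    {"customer_name": "AMES", "st": "N.Y.", "total": 30.0},
    {"customer_name": "Cruz", "st": "tx", "total": None},  # missing amount
]


def extract():
    """Extraction: gather raw rows from every source, tagged with their origin."""
    rows = [("sales", r["cust"], r["state"], r["amount"]) for r in sales_system]
    rows += [("web", r["customer_name"], r["st"], r["total"]) for r in web_system]
    return rows


def transform(rows):
    """Transformation: standardize names, state codes, and amounts."""
    clean = []
    for source, name, state, amount in rows:
        if amount is None:
            continue  # cleansing rule assumed here: skip rows with no amount
        clean.append(
            (source, name.strip().title(), state.replace(".", "").upper(), float(amount))
        )
    return clean


def load(conn, rows):
    """Load: append the cleaned rows to the warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales_fact "
        "(source TEXT, customer TEXT, state TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO sales_fact VALUES (?, ?, ?, ?)", rows)


warehouse = sqlite3.connect(":memory:")
load(warehouse, transform(extract()))
totals = warehouse.execute(
    "SELECT customer, state, SUM(amount) FROM sales_fact "
    "GROUP BY customer, state ORDER BY customer"
).fetchall()
```

Without the transformation step, " ames " in "ny" and "AMES" in "N.Y." would be counted as two different customers, which is one answer to why cleansing and standardizing are necessary.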
19
Data-Mining and Data-Mining Tools
Data mining is the process of selecting, exploring, and modeling large amounts of data to discover previously unknown relationships that support decision making.
Traditional data-mining tools answer questions about variables that we think are related:
  Query languages (QBE or SQL)
  Report generators
  Multidimensional analysis tools (OLAP and pivot tables)
  Standard statistical procedures (regression, ANOVA)
Knowledge discovery data-mining tools look for relationships that are not discernible to the human eye (e.g., hidden patterns).
20
Types of Information Obtainable from Knowledge Discovery Data Mining Tools
Associations are occurrences linked to a single event. Often used to discover products that are unexpectedly bought together (market basket analysis).
Sequences are events that are linked over time. For example, Fingerhut found that people who purchase a new home spend more in the first six months of occupancy.
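Market basket analysis can be sketched by counting how often pairs of items appear in the same transaction. The baskets below are invented, and support and confidence are the usual textbook measures rather than anything named in the slides.

```python
from collections import Counter
from itertools import combinations

# Hypothetical transactions; each set is one shopping basket.
baskets = [
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk"},
]

pair_counts = Counter()
item_counts = Counter()
for basket in baskets:
    item_counts.update(basket)
    # Sorting gives each pair one canonical order, e.g. ("beer", "diapers").
    pair_counts.update(combinations(sorted(basket), 2))


def support(pair):
    """Share of all baskets that contain both items."""
    return pair_counts[tuple(sorted(pair))] / len(baskets)


def confidence(antecedent, consequent):
    """Share of baskets containing `antecedent` that also contain `consequent`."""
    both = pair_counts[tuple(sorted((antecedent, consequent)))]
    return both / item_counts[antecedent]
```

In this toy data, `confidence("diapers", "beer")` is higher than `confidence("bread", "milk")`, the kind of unexpected pairing an association tool is meant to surface.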
21
Types of Information Obtainable from Knowledge Discovery Data Mining Tools (continued)
Classification recognizes patterns that describe the group to which an item belongs by examining existing items that have been classified. The data warehouse for a bank or telephone company usually has a large number of customers who have left. Data-mining software exists that will look at the behavior patterns of these customers prior to their leaving and develop a profile. Existing customers who fit the profile can be targeted with promotions designed for retention.
Clustering works in a manner similar to classification except you don’t know the nature of the clusters beforehand. In other words, data mining groups items together that you would not expect to be grouped together.
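To make the clustering idea concrete, here is a tiny one-dimensional k-means sketch. K-means is swapped in as one common clustering technique, not one the slides name; the spending figures and starting centers are invented. No group labels are supplied, which is the difference from classification, where the tool would start from customers already labeled as having left.

```python
def k_means_1d(values, centers, rounds=10):
    """Tiny k-means: assign each value to its nearest center, then recompute centers."""
    for _ in range(rounds):
        groups = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            groups[nearest].append(v)
        # Keep the old center if a group ends up empty.
        centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
    return centers, groups


# Hypothetical monthly spend per customer; no labels are supplied.
monthly_spend = [12, 15, 14, 90, 95, 88, 13, 92]
centers, groups = k_means_1d(monthly_spend, centers=[0.0, 50.0])
```

The algorithm is told only how many groups to look for; the low-spend and high-spend segments emerge from the data.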
22
Multidimensionality
Multidimensional data analysis enables users to view data using various dimensions, measures, and time frames.
OLAP
  Dimensions: products, business units, country, industry (categories)
  Measures: money, unit sales, head count, variances
  Time: daily, weekly, monthly, quarterly, yearly
This type of analysis also provides the ability to view data in different ways (tables, charts, 3-D, geographically).
  OLAP tools provide for this
  Pivot tables in Excel or Access
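The dimension and measure vocabulary above can be illustrated with a small roll-up function over made-up fact rows. The products, countries, and unit sales are hypothetical, and a real OLAP tool would precompute and store such aggregates rather than scan a list.

```python
from collections import defaultdict

# Hypothetical fact rows: (product, country, quarter, unit_sales).
facts = [
    ("Widget", "US", "Q1", 100),
    ("Widget", "US", "Q2", 120),
    ("Widget", "DE", "Q1", 40),
    ("Gadget", "US", "Q1", 70),
    ("Gadget", "DE", "Q2", 55),
]
DIMENSIONS = {"product": 0, "country": 1, "quarter": 2}


def roll_up(rows, *dims, where=None):
    """Sum the measure by the chosen dimensions, optionally slicing with `where`."""
    where = where or {}
    totals = defaultdict(int)
    for row in rows:
        if all(row[DIMENSIONS[d]] == value for d, value in where.items()):
            key = tuple(row[DIMENSIONS[d]] for d in dims)
            totals[key] += row[3]
    return dict(totals)


by_product = roll_up(facts, "product")
by_product_quarter = roll_up(facts, "product", "quarter")           # drill down
us_by_quarter = roll_up(facts, "quarter", where={"country": "US"})  # slice
```

Each call answers a different question from the same rows, which is what a pivot table does when you drag a different field into its rows or filters.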
23
A Data Cube
24
Examples of OLAP Tools
Go to www.fedscope.opm.gov
  Under data cubes on the entry page, click on employment
  Demonstrate drill down and adding charts
  Data for this example comes from the Central Personnel Data File (CPDF) of the federal government
  The OLAP tool used to build this site is from a company named Cognos (PowerPlay)
OLAP tools based on Excel
25
Databases and the Web
Physical relationship of the hardware
  The role of middleware, which is software residing on the application server, is to (1) convert the HTML information captured by the Web server into SQL and (2) convert information from the database back to HTML so it can be returned through the Web server and displayed in the browser.
Using the Web
  The browser is a virtual standard and easy to use.
  The browser does not require training in a database query tool.
  The use of the browser requires no change to the internal database; this enables firms to provide access to internal databases at little cost, thus leveraging their investment in older systems.
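The two conversions performed by middleware can be sketched in a few lines. The product table, the max_price form field, and the handle_request function are hypothetical names for this example; real middleware would sit behind a Web server and handle far more than one query.

```python
import sqlite3
from html import escape
from urllib.parse import parse_qs

# Hypothetical internal database.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE product (name TEXT, price REAL)")
db.executemany(
    "INSERT INTO product VALUES (?, ?)", [("Lamp", 19.5), ("Desk <oak>", 240.0)]
)


def handle_request(query_string):
    """Turn a form submission into SQL, then turn the result rows into HTML."""
    params = parse_qs(query_string)
    max_price = float(params.get("max_price", ["1e9"])[0])
    # Step 1: request -> SQL. The value is bound as a parameter,
    # never pasted into the SQL text.
    rows = db.execute(
        "SELECT name, price FROM product WHERE price <= ? ORDER BY name",
        (max_price,),
    ).fetchall()
    # Step 2: rows -> HTML. Escape values so data is not interpreted as markup.
    cells = "".join(
        f"<tr><td>{escape(name)}</td><td>{price:.2f}</td></tr>" for name, price in rows
    )
    return f"<table>{cells}</table>"


page = handle_request("max_price=100")
```

The database itself is untouched by the Web layer, which matches the point that browser access requires no change to the internal database.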
26
Linking Internal Databases to the Web
27
Managing Data Resources -- Is there a problem?
Corporate and government databases have unexpectedly poor levels of data quality.
  National consumer credit reporting databases have error rates of 20-35%.
  32% of the records in the FBI’s Computerized Criminal History file are inaccurate, incomplete, or ambiguous.
  Gartner Group estimates that consumer data in corporate databases degrades at the rate of 2% a month.
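To get a feel for what 2% a month means over a year, the sketch below works the arithmetic both ways. The slide does not say whether the rate compounds, so both readings are shown as assumptions, starting from a fully accurate database.

```python
MONTHLY_DECAY = 0.02  # the 2% a month figure quoted above


def share_still_accurate(months, compounding=True):
    """Fraction of records still accurate after `months`, starting from 100% accurate."""
    if compounding:
        # Each month 2% of the records that are still good go bad.
        return (1 - MONTHLY_DECAY) ** months
    # Simple reading: 2% of the original records go bad every month.
    return max(0.0, 1 - MONTHLY_DECAY * months)


after_one_year = share_still_accurate(12)
after_one_year_simple = share_still_accurate(12, compounding=False)
```

Under either reading, roughly a fifth to a quarter of untouched consumer records would be out of date after twelve months, which is the case for treating data quality as an ongoing activity.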
28
Managing Data Resources
Step 1: Establish an information policy that defines rules for sharing, disseminating, acquiring, standardizing, classifying, and inventorying data.
  Data administration function
  Data governance from IBM
  Database administration group for databases
Step 2: Ensure data quality (ongoing activity)
  Data quality audits
  Data cleansing
29
Spreadsheets Versus DBMS
Linkage between elements
  Spreadsheet: between cells in the same table
  DBMS: between elements in different tables
Orientation
  Spreadsheet is oriented toward calculations
  DBMS is oriented toward organization and linkage of data elements in different tables
Capabilities
  DBMS has extensive querying and reporting power
  Spreadsheet is limited
Memory requirements
  Entire spreadsheet table must be in memory
  Not true for the database table
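The linkage between elements in different tables is what a join expresses. The sketch below uses Python's built-in sqlite3 module with made-up customer and orders tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orders (
    id INTEGER PRIMARY KEY,
    customer_id INTEGER REFERENCES customer(id),
    total REAL
);
INSERT INTO customer VALUES (1, 'Ames'), (2, 'Baker');
INSERT INTO orders VALUES (10, 1, 25.0), (11, 1, 40.0), (12, 2, 15.0);
""")

# The join links each order to its customer through the shared key.
spend_by_customer = conn.execute("""
    SELECT c.name, COUNT(o.id), SUM(o.total)
    FROM customer AS c JOIN orders AS o ON o.customer_id = c.id
    GROUP BY c.id, c.name
    ORDER BY c.name
""").fetchall()
```

A spreadsheet could hold both lists on separate sheets, but the relationship between them would have to be rebuilt with lookup formulas rather than declared once in the structure.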