1 Data and Knowledge Management (Chapter 5)
2 CHAPTER OUTLINE
5.1 Managing Data
5.2 The Database Approach
5.3 Database Management Systems
5.4 Data Warehouses and Data Marts
5.5 Knowledge Management
3 LEARNING OBJECTIVES
1. Identify three common challenges in managing data, and describe one way organizations can address each challenge using data governance.
2. Name six problems that can be minimized by using the database approach.
3. Demonstrate how to interpret relationships depicted in an entity-relationship diagram.
4. Discuss at least one main advantage and one main disadvantage of relational databases.
4 Learning Objectives (continued)
5. Identify the six basic characteristics of data warehouses, and explain the advantages of data warehouses and marts to organizations.
6. Demonstrate the use of a multidimensional model to store and analyze data.
7. List two main advantages of using knowledge management, and describe the steps in the knowledge management system cycle.
5 Big Data Case (pages 112 and 113)
Walmart processes over 1,000,000 transactions per hour.
From 2006 to 2010, IBM invested over $12,000,000,000 to set up business intelligence centers.
Using big data to spot trends before your competitors do can be a strategic advantage (Best Buy success, Nestle failure).
6 Annual Flood of Data from:
Credit card swipes
Digital video
Online TV
RFID tags
Blogs
Digital video surveillance
Radiology scans
Source: Media Bakery
8 5.1 Managing Data
The Difficulties of Managing Data
Data Governance
Difficulties in managing data:
The amount of data is increasing exponentially.
Data are scattered throughout organizations and collected by many individuals using various methods and devices.
Data come from many sources.
Data security, quality, and integrity are critical.
9 Difficulties in Managing Data
Data are difficult to manage for many reasons:
The amount of data increases exponentially over time.
Data are scattered throughout organizations.
Data are obtained from multiple internal and external sources.
Data degrade over time (e.g., customers change addresses).
Data are subject to data rot (problems with the media on which the data are stored).
Data security, quality, and integrity are critical, yet easily jeopardized.
Information systems that do not communicate with each other can produce inconsistent data.
Organizations must comply with federal regulations.
Source: Media Bakery
10 Data Governance
Big data can have big data errors.
Data governance: manage data across the entire organization.
Master data management: have all organizational processes access a single version of the data.
Master data: an enterprise system of core data.
Data governance is an approach to managing information across an entire organization.
Master data management is a process that spans all of an organization's business processes and applications.
Master data are a set of core data that span all of an enterprise's information systems.
11 Master Data Management
John Stevens registers for Introduction to Management Information Systems (ISMN 3140) from 10 AM until 11 AM on Mondays and Wednesdays in Room 41 Smith Hall, taught by Professor Rainer.
Transaction Data | Master Data
John Stevens | Student
Intro to Management Information Systems | Course
ISMN | Course No.
10 AM until 11 AM | Time
Mondays and Wednesdays | Weekday
Room 41 Smith Hall | Location
Professor Rainer | Instructor
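The split between transaction data and master data can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the keys (S001, I01, L41), the dictionaries, and the `Registration` and `describe` names are all hypothetical. Time and weekday are kept as plain values for brevity. The registration stores only keys, and every lookup resolves against the single master copy.

```python
from dataclasses import dataclass

# Master data (hypothetical keys): core records stored once and shared by every system.
STUDENTS = {"S001": "John Stevens"}
COURSES = {"ISMN 3140": "Introduction to Management Information Systems"}
INSTRUCTORS = {"I01": "Professor Rainer"}
LOCATIONS = {"L41": "Room 41 Smith Hall"}

@dataclass(frozen=True)
class Registration:
    """Transaction data: one registration event that points at master data by key."""
    student_id: str
    course_no: str
    instructor_id: str
    location_id: str
    weekdays: str
    time: str

def describe(reg: Registration) -> str:
    # Every lookup goes to the single master copy, so a correction made there
    # shows up in every transaction that references it.
    return (f"{STUDENTS[reg.student_id]} registers for {COURSES[reg.course_no]} "
            f"({reg.course_no}) from {reg.time} on {reg.weekdays} in "
            f"{LOCATIONS[reg.location_id]}, taught by {INSTRUCTORS[reg.instructor_id]}.")

reg = Registration("S001", "ISMN 3140", "I01", "L41", "Mondays and Wednesdays", "10 AM until 11 AM")
print(describe(reg))
```

Because the transaction holds only keys, fixing a misspelled name in `STUDENTS` corrects every registration at once. That single shared version is the goal of master data management.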
12 5.2 The Database Approach
Database management systems (DBMSs) minimize the following problems:
Data redundancy: the same data are stored in many places.
Data isolation: applications cannot access data associated with other applications.
Data inconsistency: various copies of the data do not agree.
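The chain from redundancy to inconsistency is easy to show with a toy example. The sketch below is hypothetical (the customer record and the `billing_app`/`shipping_app` names are invented). When two applications each keep their own copy and only one copy is updated, the copies disagree. When both applications read a single shared copy, as in the database approach, they cannot disagree.

```python
# Hypothetical example: two applications each keep their own copy of a
# customer's address (data redundancy).
billing_app = {"C42": {"name": "Ann Lee", "address": "12 Oak St"}}
shipping_app = {"C42": {"name": "Ann Lee", "address": "12 Oak St"}}

# The customer moves, but only the billing application is updated.
billing_app["C42"]["address"] = "7 Elm Ave"

def consistent(customer_id: str) -> bool:
    """True if both applications' copies of the customer's record agree."""
    return billing_app[customer_id] == shipping_app[customer_id]

print(consistent("C42"))  # False: the copies now disagree (data inconsistency)

# Database approach: one shared copy that every application reads.
shared_db = {"C42": {"name": "Ann Lee", "address": "12 Oak St"}}
billing_view = shipping_view = shared_db  # both applications reference the same data
shared_db["C42"]["address"] = "7 Elm Ave"
print(billing_view["C42"]["address"] == shipping_view["C42"]["address"])  # True
```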
13 Database Approach (continued)
DBMSs maximize the following:
Data security: keeping the organization's data safe from theft, modification, and/or destruction.
Data integrity: data must meet constraints (e.g., student grade point averages cannot be negative).
Data independence: applications and data are independent of one another; because they are not linked to each other, many applications can access the same data.
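A DBMS enforces integrity constraints like the GPA rule declaratively. The sketch below uses Python's built-in `sqlite3` module with an in-memory database. The table and column names are hypothetical, and the 4.0 upper bound is an added assumption (a common US scale); the slide only says GPAs cannot be negative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The CHECK clause is the integrity constraint: a GPA cannot be negative.
# The 4.0 upper bound is an assumption, not part of the slide's rule.
conn.execute("""
    CREATE TABLE student (
        student_id INTEGER PRIMARY KEY,
        name       TEXT NOT NULL,
        gpa        REAL CHECK (gpa >= 0.0 AND gpa <= 4.0)
    )
""")
conn.execute("INSERT INTO student VALUES (1, 'John Stevens', 3.2)")  # accepted

try:
    conn.execute("INSERT INTO student VALUES (2, 'Jane Doe', -1.5)")  # violates CHECK
except sqlite3.IntegrityError as err:
    print("Rejected:", err)

print(conn.execute("SELECT COUNT(*) FROM student").fetchone()[0])  # 1
```

Because the rule lives in the database rather than in each application, every program that writes to the table is held to it.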
15 Data Hierarchy
Bit: a zero or a one
Byte: 8 bits, a single character or number
Field: a column in a spreadsheet, like a name
Record: a row in a spreadsheet, like name, address, and phone #
File (or table): a collection of related records
Database: a collection of related files
A bit is a binary digit, or a "0" or a "1".
A byte is eight bits and represents a single character (e.g., a letter, number, or symbol).
A field is a group of logically related characters (e.g., a word, small group of words, or identification number).
A record is a group of logically related fields (e.g., a student in a university database).
A file is a group of logically related records.
A database is a group of logically related files.
17 Data Hierarchy (continued)
Bit (binary digit)
Byte (eight bits)
18 Data Hierarchy (continued) Example of Field and Record
19 Data Hierarchy (continued) Example of Field and Record
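Each level of the data hierarchy can be mapped onto a familiar Python structure. In this rough sketch, the student data and the dictionary layout are hypothetical, and the mapping is an analogy rather than how a DBMS physically stores data. The letter "J" becomes one byte (eight bits). A field is a string, a record is a dictionary of fields, a file is a list of records, and a database groups related files.

```python
# Bit and byte: the letter "J" stored as one byte (8 bits) in ASCII.
byte = "J".encode("ascii")[0]
bits = format(byte, "08b")
print(bits)  # 01001010

# Field -> record -> file (table) -> database, using hypothetical student data.
name_field = "John Stevens"                                        # field: related characters
record = {"student_id": 1001, "name": name_field, "major": "MIS"}  # record: related fields
student_file = [record,                                            # file: related records
                {"student_id": 1002, "name": "Jane Doe", "major": "Finance"}]
course_file = [{"course_no": "ISMN 3140", "title": "Intro to MIS"}]
database = {"students": student_file, "courses": course_file}     # database: related files

print(len(bits), len(database["students"]))  # 8 2
```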
20 Designing the Database
Data model
Entity
Attribute
Primary key
Secondary keys
The data model is a diagram that represents the entities in the database and their relationships.
An entity is a person, place, thing, or event about which information is maintained. A record generally describes an entity.
An attribute is a particular characteristic or quality of a particular entity.
The primary key is a field that uniquely identifies a record.
Secondary keys are other fields that have some identifying information but typically do not identify the record with complete accuracy.
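The difference between a primary key and a secondary key shows up directly in a query. In this minimal `sqlite3` sketch, the table and data are hypothetical and `last_name` is assumed to serve as a secondary key. A lookup by `student_id` returns exactly one record. A lookup by last name can return several. The index speeds up secondary-key searches but does not make the values unique.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE student (
        student_id INTEGER PRIMARY KEY,  -- primary key: uniquely identifies each record
        last_name  TEXT,                 -- secondary key: identifying, but not unique
        major      TEXT
    )
""")
conn.executemany("INSERT INTO student VALUES (?, ?, ?)", [
    (1001, "Stevens", "MIS"),
    (1002, "Stevens", "Finance"),
    (1003, "Doe", "MIS"),
])
# An index on the secondary key speeds up searches but does not enforce uniqueness.
conn.execute("CREATE INDEX idx_student_last_name ON student (last_name)")

by_primary = conn.execute("SELECT * FROM student WHERE student_id = 1002").fetchall()
by_secondary = conn.execute("SELECT * FROM student WHERE last_name = 'Stevens'").fetchall()
print(len(by_primary), len(by_secondary))  # 1 2
```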
21 Entity-Relationship Modeling
Database designers plan the database design in a process called entity-relationship (ER) modeling.
ER diagrams consist of entities, attributes, and relationships.
Entity classes
Instance
Identifiers
Entity classes are groups of entities of a certain type.
An instance of an entity class is the representation of a particular entity.
Entity instances have identifiers, which are attributes that are unique to that entity instance.
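One loose way to picture entity classes, instances, and identifiers is with a Python dataclass. The analogy is approximate: the `Student` class, its fields, and the data are hypothetical, and an ER model is a design diagram, not code. The class plays the role of the entity class, each object is an instance, and `student_id` is the identifier used to find a particular instance.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Student:        # entity class: the group of all STUDENT entities
    student_id: int   # identifier: unique to each instance
    name: str         # other attributes
    major: str

# Instances: representations of particular students.
s1 = Student(1001, "John Stevens", "MIS")
s2 = Student(1002, "Jane Doe", "Finance")

# Index instances by identifier so each one can be located unambiguously.
students = {s.student_id: s for s in (s1, s2)}
print(students[1002].name)  # Jane Doe
```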
22 Relationships Between Entities (see page 120)
Maximum number of instances (cardinality)
Minimum number of instances (modality)
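The maximum and minimum numbers on an ER relationship are rules that the stored data must satisfy. In this hypothetical sketch, the TEACHES relationship, the class sections, and the `within_bounds` helper are invented for illustration. A professor teaches one or many classes, and a class is taught by exactly one professor. The helper counts related instances and checks them against the minimum and maximum.

```python
from collections import Counter

# Hypothetical TEACHES relationship between PROFESSOR and CLASS, stored as pairs.
teaches = [
    ("Rainer", "ISMN 3140-01"),
    ("Rainer", "ISMN 3140-02"),
    ("Jones", "ACCT 2000-01"),
]
professors = ["Rainer", "Jones"]
classes = ["ISMN 3140-01", "ISMN 3140-02", "ACCT 2000-01"]

def within_bounds(pairs, instances, side, min_n, max_n=None):
    """Check that every instance on one side of the relationship (0 = professor,
    1 = class) is related to at least min_n and at most max_n instances on the
    other side; max_n=None means "many" (no upper limit)."""
    counts = Counter(pair[side] for pair in pairs)
    return all(counts[i] >= min_n and (max_n is None or counts[i] <= max_n)
               for i in instances)

# A professor teaches one or many classes (minimum 1, maximum many).
print(within_bounds(teaches, professors, side=0, min_n=1))        # True
# A class is taught by exactly one professor (minimum 1, maximum 1).
print(within_bounds(teaches, classes, side=1, min_n=1, max_n=1))  # True
# Adding a class with no professor violates the minimum of 1.
print(within_bounds(teaches, classes + ["FIN 3000-01"], side=1, min_n=1, max_n=1))  # False
```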
24 5.3 Database Management Systems
Database management system (DBMS) [defines both the data structure and the data relationships]
Relational database model
Structured Query Language (SQL)
Query by Example (QBE)
A single table is a "flat file"; it is the relationships between tables that make a database.
A database management system is a set of programs that provide users with tools to add, delete, access, and analyze data stored in one location.
The relational database model is based on the concept of two-dimensional tables.
Structured Query Language allows users to perform complicated searches by using relatively simple statements or keywords.
Query by example allows users to fill out a grid or template to construct a sample or description of the data they want.
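To see how one relatively simple SQL statement relates two tables, here is a minimal `sqlite3` sketch. The `student` and `enrollment` tables and their rows are hypothetical. The join on `student_id` is what turns two flat tables into related data, and the query counts each student's courses.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE student (student_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE enrollment (student_id INTEGER REFERENCES student(student_id),
                             course_no TEXT);
    INSERT INTO student VALUES (1001, 'John Stevens'), (1002, 'Jane Doe');
    INSERT INTO enrollment VALUES (1001, 'ISMN 3140'), (1002, 'ISMN 3140'),
                                  (1002, 'FIN 3000');
""")

# One SQL statement relates the two tables through the shared student_id column.
rows = conn.execute("""
    SELECT s.name, COUNT(*) AS courses
    FROM student AS s
    JOIN enrollment AS e ON e.student_id = s.student_id
    GROUP BY s.name
    ORDER BY s.name
""").fetchall()
print(rows)  # [('Jane Doe', 2), ('John Stevens', 1)]
```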
25 Student Database Example
Can you determine an attribute? A primary key? A secondary key? An instance?
26 Normalization
Normalization: minimum redundancy, maximum data integrity, best processing performance
Data are normalized when the attributes in each table depend only on that table's primary key.
Normalization is a method for analyzing and reducing a relational database to its most streamlined form for minimum redundancy, maximum data integrity, and best processing performance.
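A small before-and-after example shows what "depend only on the primary key" means in practice. In this hypothetical sketch, each enrollment row is identified by (student_id, course_no), yet name and major depend on student_id alone, so they repeat. Splitting them into their own table stores each fact once. This roughly corresponds to removing the partial dependencies that second normal form targets, and it is not a complete normalization procedure.

```python
# Hypothetical unnormalized rows: name and major repeat on every enrollment,
# because they depend on student_id alone, not on (student_id, course_no).
flat = [
    {"student_id": 1001, "name": "John Stevens", "major": "MIS", "course_no": "ISMN 3140"},
    {"student_id": 1001, "name": "John Stevens", "major": "MIS", "course_no": "ACCT 2000"},
    {"student_id": 1002, "name": "Jane Doe", "major": "Finance", "course_no": "ISMN 3140"},
]

# STUDENT table: attributes that depend only on the key student_id, stored once each.
students = {}
for row in flat:
    students[row["student_id"]] = {"name": row["name"], "major": row["major"]}

# ENROLLMENT table: only the (student_id, course_no) pair; names are looked up by key.
enrollments = [(row["student_id"], row["course_no"]) for row in flat]

print(len(students), len(enrollments))  # 2 3
```

After the split, changing John Stevens's major means updating one row instead of every enrollment row, which is where the gain in integrity comes from.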
32 5.4 Data Warehousing
Data Warehouses and Data Marts
Organized by business dimension or subject
Multidimensional
Historical
Use online analytical processing
A data warehouse is a repository of historical data organized by subject to support decision makers in the organization.
Historical data in data warehouses can be used for identifying trends, forecasting, and making comparisons over time.
Online analytical processing (OLAP) involves the analysis of accumulated data by end users (usually in a data warehouse).
In contrast to OLAP, online transaction processing (OLTP) typically involves a database, where data from business transactions are processed online as soon as they occur.
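The contrast between OLTP and OLAP can be sketched with one in-memory `sqlite3` table. This is a simplification: a real organization would typically run OLTP on an operational database and load the history into a separate warehouse. The `sales` table, the `record_sale` helper, and the figures are hypothetical. Each sale is committed as it occurs (OLTP), and the accumulated history is then summarized by year and region to look for trends (OLAP).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (sale_date TEXT, region TEXT, product TEXT, amount REAL)")

def record_sale(date, region, product, amount):
    """OLTP style: record one business transaction as soon as it occurs."""
    with conn:  # commits this single insert immediately
        conn.execute("INSERT INTO sales VALUES (?, ?, ?, ?)", (date, region, product, amount))

record_sale("2022-03-01", "East", "Nuts", 40.0)
record_sale("2022-07-09", "West", "Bolts", 30.0)
record_sale("2023-01-15", "East", "Nuts", 50.0)
record_sale("2023-02-20", "East", "Bolts", 20.0)
record_sale("2023-05-05", "West", "Nuts", 25.0)

# OLAP style: analyze accumulated history, e.g. totals by year and region.
trend = conn.execute("""
    SELECT substr(sale_date, 1, 4) AS year, region, SUM(amount)
    FROM sales
    GROUP BY year, region
    ORDER BY year, region
""").fetchall()
print(trend)
# [('2022', 'East', 40.0), ('2022', 'West', 30.0), ('2023', 'East', 70.0), ('2023', 'West', 25.0)]
```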
33 Data Warehouse Framework & Views This figure (Figure 4.9) shows the process of building and using a data warehouse.
34 Relational Databases
This is the first slide (Figure 3.10) of five showing the relationship between relational databases and a multidimensional data structure (or data cube).
35 Multidimensional Database Figure 3.11 a, b, and c.
36 Equivalence Between Relational and Multidimensional Databases Figure 3.12 a, b, and c.
37 Equivalence Between Relational and Multidimensional Databases
38 Equivalence Between Relational and Multidimensional Databases
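The equivalence between relational and multidimensional views can be imitated with a dictionary keyed by dimension values. In this hypothetical sketch, the products, regions, years, unit counts, and the `slice_year` and `roll_up_product` helpers are all invented, and a plain dictionary stands in for a real OLAP cube engine. Each relational row (product, region, year, units) becomes one cell of a three-dimensional cube. Slicing fixes one dimension, and rolling up sums across dimensions.

```python
# Relational form: one row per (product, region, year) with a measure (units sold).
rows = [
    ("Nuts", "East", 2022, 50), ("Nuts", "West", 2022, 60),
    ("Bolts", "East", 2022, 40), ("Bolts", "West", 2022, 70),
    ("Nuts", "East", 2023, 55), ("Nuts", "West", 2023, 65),
    ("Bolts", "East", 2023, 45), ("Bolts", "West", 2023, 80),
]

# Multidimensional form: a cube whose three dimensions are product, region, and year.
cube = {(p, r, y): units for p, r, y, units in rows}

def slice_year(year):
    """All product x region cells for one year (a 'slice' of the cube)."""
    return {(p, r): v for (p, r, y), v in cube.items() if y == year}

def roll_up_product(product):
    """Total units for one product across every region and year."""
    return sum(v for (p, _, _), v in cube.items() if p == product)

print(slice_year(2023)[("Bolts", "West")])  # 80
print(roll_up_product("Nuts"))              # 230
```

Both forms hold exactly the same facts; the cube layout simply makes slicing and summarizing by business dimension the natural operation.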
39 Benefits of Data Warehousing
End users can access data quickly and easily via Web browsers because the data are located in one place.
End users can conduct extensive analysis with data in ways that may not have been possible before.
End users have a consolidated view of organizational data.
40 Data Concepts
Metadata: data about data, such as table definitions or the relationships between tables
Data quality: data are seldom 100% "clean"
Data governance
Users include information producers and consumers
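Metadata is concrete in a relational DBMS: the table definitions and relationships are stored by the system and can be queried like ordinary data. The minimal `sqlite3` sketch below uses hypothetical tables. It reads SQLite's own catalog (`sqlite_master`) and its `table_info` and `foreign_key_list` pragmas. Other DBMSs expose similar metadata through different interfaces, such as an information schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE student (student_id INTEGER PRIMARY KEY, name TEXT NOT NULL, gpa REAL);
    CREATE TABLE enrollment (student_id INTEGER REFERENCES student(student_id), course_no TEXT);
""")

# Table definitions are stored by the DBMS itself and can be queried like data.
tables = conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name").fetchall()
print(tables)  # [('enrollment',), ('student',)]

# PRAGMA table_info: one row per column (cid, name, type, notnull, dflt_value, pk).
columns = [(name, col_type)
           for _, name, col_type, _, _, _ in conn.execute("PRAGMA table_info(student)")]
print(columns)  # [('student_id', 'INTEGER'), ('name', 'TEXT'), ('gpa', 'REAL')]

# PRAGMA foreign_key_list: metadata about relationships between tables.
# Row layout: (id, seq, table, from, to, on_update, on_delete, match).
fks = [(row[2], row[3], row[4]) for row in conn.execute("PRAGMA foreign_key_list(enrollment)")]
print(fks)  # [('student', 'student_id', 'student_id')]
```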