1
Data and Knowledge Management
5 Data and Knowledge Management
2
[LEARNING OBJECTIVES]
Discuss ways that common challenges in managing data can be addressed using data governance.
Discuss the advantages and disadvantages of relational databases.
Define Big Data, and discuss its basic characteristics.
Explain the elements necessary to successfully implement and maintain data warehouses.
Describe the benefits and challenges of implementing knowledge management systems in organizations.
3
5.1 Managing Data
The Difficulties of Managing Data
Data Governance
4
Difficulties in Managing Data
Data increases exponentially with time
The amount of data that organizations must manage increases exponentially over time.
5
Difficulties in Managing Data
Data increases exponentially with time
Multiple sources of data
Multiple sources of data: Data are scattered throughout organizations and are generated from multiple sources (internal, personal, and external). There are also new sources of data (e.g., blogs, podcasts, videocasts, and RFID tags and other wireless sensors).
6
Difficulties in Managing Data
Data increases exponentially with time
Multiple sources of data
Data rot, or data degradation
Data degradation: data become outdated over time (e.g., customers move to new addresses or change their names).
Data rot: refers primarily to problems with the media on which the data are stored. Over time, temperature, humidity, and exposure to light can cause physical problems with storage media and thus make it difficult to access the data. Leaving a compact disc in the sunlight, for example, can cause it to warp.
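To make data degradation concrete, here is a minimal Python sketch that flags customer records whose details have not been confirmed recently. The `last_verified` field, the one-year threshold, and the sample records are all hypothetical. Real systems set their own rules.

```python
from datetime import date

# Hypothetical customer records; "last_verified" is an assumed field that records
# when the customer last confirmed these details.
customers = [
    {"id": 1, "name": "Ana Ruiz", "address": "12 Oak St", "last_verified": date(2024, 1, 15)},
    {"id": 2, "name": "Ben Cho", "address": "9 Elm Ave", "last_verified": date(2019, 6, 2)},
    {"id": 3, "name": "Cy Park", "address": "4 Pine Rd", "last_verified": date(2022, 11, 30)},
]


def stale_records(records, today, max_age_days=365):
    """Return IDs of records not verified within max_age_days (likely degraded)."""
    return [r["id"] for r in records
            if (today - r["last_verified"]).days > max_age_days]


print(stale_records(customers, today=date(2024, 6, 1)))  # [2, 3]
```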
7
Difficulties in Managing Data
Data increases exponentially with time
Multiple sources of data
Data rot, or data degradation
Data security, quality, and integrity
The security, quality, and integrity of your data are all critical.
8
Difficulties in Managing Data
Data increases exponentially with time
Multiple sources of data
Data rot, or data degradation
Data security, quality, and integrity
Government regulation
In the United States, government regulations control what kinds of data you can collect and distribute. For example, the Sarbanes–Oxley Act of 2002 requires that (1) public companies evaluate and disclose the effectiveness of their internal financial controls and (2) they hire independent auditors to do this. Legal requirements change frequently and differ among countries and industries.
9
Multiple Sources of Data
Internal sources: corporate databases, company documents
Personal sources: personal thoughts, opinions, experiences
External sources: commercial databases, government reports, and corporate Web sites
These are the multiple sources of data. Have students read and copy.
10
Data Governance: An approach to managing information across an entire organization.
Master Data Management
Master Data
You have to understand how your role as a customer of a company has changed. Before we became an information-oriented society, a business made money by selling a customer a product or performing a service. Today, companies make money not only this way but also by selling information about customers to other companies. Because of this, there need to be rules and regulations about how this is done.
Data governance: the clear, unambiguous rules that apply across your entire organization for creating, collecting, handling, and protecting your customers’ information.
Master data management: your company’s strategy for storing, maintaining, exchanging, and synchronizing its master data.
Master data: any type of information that is needed to run the company on a day-to-day basis (e.g., bookkeeping, customer, product, employee, vendor, and geographic location data).
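A small example makes master data management easier to picture. The Python sketch below assumes two hypothetical systems (billing and shipping) that each keep a copy of the same customer. It builds one master record using an assumed rule in which the most recently updated copy wins, and then synchronizes that record back to both systems. Real MDM tools use much richer matching and survivorship rules.

```python
# Hypothetical copies of the same customer held by two different systems.
billing = {"C001": {"name": "Dana Lee", "email": "dana@old.example", "updated": 2021}}
shipping = {"C001": {"name": "Dana Lee", "email": "dana@new.example", "updated": 2023}}


def build_master(*systems):
    """Merge records by customer ID; the most recently updated copy wins (assumed rule)."""
    master = {}
    for system in systems:
        for cust_id, record in system.items():
            current = master.get(cust_id)
            if current is None or record["updated"] > current["updated"]:
                master[cust_id] = dict(record)
    return master


def synchronize(master, *systems):
    """Push a copy of each master record back so every system holds the same values."""
    for system in systems:
        for cust_id, record in master.items():
            system[cust_id] = dict(record)


master = build_master(billing, shipping)
synchronize(master, billing, shipping)
print(billing["C001"]["email"])  # dana@new.example
```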
11
5.2 The Database Approach
The Data Hierarchy
The Relational Database Model
12
Database Management Systems Minimize Three Main Problems
Data Redundancy – one storage place
Database management system (DBMS): a set of programs that provide users with tools to create and manage data. Putting all your data into a DBMS minimizes three problems:
Data redundancy: Are the same data stored in multiple locations? In a DBMS, each piece of data is stored in only one place.
13
Database Management Systems Minimize Three Main Problems
Data Redundancy – one storage place
Data Isolation – one access point
Data isolation: Can an application access data associated with another application? Without a DBMS, often it cannot. With a DBMS, all applications reach the data through one access point.
14
Database Management Systems Minimize Three Main Problems
Data Redundancy – one storage place
Data Isolation – one access point
Data Inconsistency
Data inconsistency: What happens when the various copies of the data do not agree? If the data are stored in only one place, this won’t happen.
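A short Python sketch can illustrate the three problems, using the standard `sqlite3` module as a stand-in DBMS. The customer ID, the addresses, and the two applications are hypothetical. In the first part, two applications keep separate copies of an address and the copies drift apart. In the second part, both applications read the single stored copy through the database.

```python
import sqlite3

# File-style approach: each application keeps its own copy of the customer's address.
sales_app = {"C001": "12 Oak St"}
support_app = {"C001": "12 Oak St"}
sales_app["C001"] = "77 Birch Ln"  # the customer moves; only one copy gets updated
inconsistent = sales_app["C001"] != support_app["C001"]  # True: the copies disagree

# DBMS approach: the address is stored once and every application reads that copy.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id TEXT PRIMARY KEY, address TEXT)")
conn.execute("INSERT INTO customer VALUES (?, ?)", ("C001", "12 Oak St"))
conn.execute("UPDATE customer SET address = ? WHERE id = ?", ("77 Birch Ln", "C001"))


def read_address(connection, customer_id):
    """Any application reads the same single copy through the DBMS."""
    row = connection.execute(
        "SELECT address FROM customer WHERE id = ?", (customer_id,)).fetchone()
    return row[0]


sales_view = read_address(conn, "C001")
support_view = read_address(conn, "C001")
print(inconsistent, sales_view == support_view)  # True True
```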
15
Database Management Systems Maximize Three Things
Data Security – automatic saving, warnings
There are three things that a database management system (DBMS) enables you to do well:
Data security: Because data are “put in one place” in databases, there is a risk of losing a lot of data at one time. Therefore, databases must have extremely high security measures in place to minimize mistakes and deter attacks.
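One widely used way to deter attacks is to pass user input to the database as a query parameter instead of pasting it into the SQL text. This prevents SQL injection. Below is a minimal sketch with Python's `sqlite3` module. The `account` table and its rows are hypothetical. Real deployments combine this with access control, encryption, and backups.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (username TEXT PRIMARY KEY, balance REAL)")
conn.executemany("INSERT INTO account VALUES (?, ?)",
                 [("alice", 120.0), ("bob", 75.5)])


def find_account(connection, username):
    # The "?" placeholder makes sqlite3 treat the input strictly as a value,
    # never as SQL code, so injection attempts cannot change the query.
    return connection.execute(
        "SELECT username, balance FROM account WHERE username = ?",
        (username,)).fetchall()


print(find_account(conn, "alice"))         # [('alice', 120.0)]
print(find_account(conn, "x' OR '1'='1"))  # [] : the attack string matches no user
```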
16
Database Management Systems Maximize Three Things
Data Security – automatic saving, warnings
Data Integrity – validation rules
Data integrity: Data meet certain constraints; for example, there are no alphabetic characters in a Social Security number field.
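A DBMS can enforce the Social Security number rule itself. In this sketch, SQLite's `CHECK` constraint with a `GLOB` pattern rejects any value that does not follow the digit pattern 123-45-6789. The table, the helper `try_insert`, and the dashed format are illustrative assumptions, not a standard.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The CHECK constraint accepts only values shaped like 123-45-6789 (digits only).
conn.execute("""
    CREATE TABLE employee (
        name TEXT NOT NULL,
        ssn  TEXT NOT NULL
             CHECK (ssn GLOB '[0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9][0-9][0-9]')
    )
""")


def try_insert(connection, name, ssn):
    """Return True if the DBMS accepts the row, False if it violates a constraint."""
    try:
        with connection:  # commits on success, rolls back on error
            connection.execute("INSERT INTO employee VALUES (?, ?)", (name, ssn))
        return True
    except sqlite3.IntegrityError:
        return False


print(try_insert(conn, "Ana Ruiz", "123-45-6789"))  # True
print(try_insert(conn, "Ben Cho", "12A-45-6789"))   # False: contains a letter
```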
17
Database Management Systems Maximize Three Things
Data Security
Data Integrity
Data Independence – same data for all apps
Data independence: Applications and data are independent of one another; that is, applications and data are not linked to each other, so all applications are able to access the same data.
18
Data Hierarchy
Bit
Bit (binary digit): the smallest unit of data a computer can process; it consists of only a 0 or a 1.
19
Data Hierarchy
Bit
Byte
Byte: A group of eight bits represents a single character (letter, number, or symbol).
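Printing the bits of a short string is a quick way to see that one byte holds one character. This Python sketch assumes ASCII text, in which each character is exactly one byte. Encodings such as UTF-8 use two or more bytes for many non-ASCII characters; for example, `"é".encode("utf-8")` is two bytes.

```python
def to_bits(text):
    """Show each character of an ASCII string as one 8-bit byte."""
    return [format(byte, "08b") for byte in text.encode("ascii")]


print(to_bits("A"))    # ['01000001']
print(to_bits("Hi!"))  # ['01001000', '01101001', '00100001']
```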
20
Data Hierarchy
Bit
Byte
Field
Field: a logical grouping of characters into a word, a small group of words, or a complete number (e.g., last name, Social Security number). In a table, each field appears as a column.
21
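Data Hierarchy
Bit
Byte
Field
Record
Data File or Table
Record: a logical grouping of related fields (e.g., a student’s name, ID number, and major).
Data file: a logical grouping of related records is called a data file or a table. Like a table in Excel or Access, it consists of multiple columns and multiple rows.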
22
Data Hierarchy
Bit
Byte
Field
Record
Data File or Table
Database
Database: a logical grouping of related data files (also called database tables).
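Ordinary Python data structures can mimic the levels of the hierarchy. This is only an illustration with hypothetical student and course data, not how a DBMS stores data internally. A field value is made of bytes and each byte of eight bits. Fields form records, records form tables, and tables form the database.

```python
# Toy model of the data hierarchy (illustrative only; all data are hypothetical).
# Record = related fields about one student; table = related records;
# database = related tables.
student_table = [
    {"student_id": 1001, "last_name": "Ruiz", "major": "Finance"},
    {"student_id": 1002, "last_name": "Cho", "major": "Marketing"},
]
course_table = [
    {"course_id": "ISM3011", "title": "Information Systems"},
]
database = {"students": student_table, "courses": course_table}

field = database["students"][0]["last_name"]  # one field value
field_bytes = field.encode("ascii")           # each ASCII character is one byte
field_bits = len(field_bytes) * 8             # each byte is eight bits

print(field, len(field_bytes), field_bits)  # Ruiz 4 32
```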
23
The Relational Database Model
Key Terms
Database Model
Database model: a diagram that represents the entities in a database and their relationships, that is, what information you are going to keep track of.
24
The Relational Database Model
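Key Terms
Database Model
Relational Database Model
Relational database model: data are stored in a set of related two-dimensional tables rather than in one big table (a flat file). The tables are linked through key fields, so a piece of data is updated in one place and that change is visible wherever the data are used.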
25
The Relational Database Model
Key Terms
Database Model
Relational Database Model
Entity
Entity: a person, place, thing, or event (e.g., a customer, an employee, or a product).
26
The Relational Database Model
Key Terms
Database Model
Relational Database Model
Entity
Instance
Instance of an entity: each row (or record) in a relational table, which is a specific, unique representation of the entity.
27
The Relational Database Model
Key Terms
Database Model
Relational Database Model
Entity
Instance
Attribute
Attribute: each characteristic or quality of a particular entity.
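In programming terms, an entity is much like a class and its attributes are like the class's fields. Each instance is like one object, which corresponds to one row of a table. Here is a minimal Python sketch using a hypothetical `Student` entity.

```python
from dataclasses import dataclass


@dataclass
class Student:        # the entity: a kind of thing we keep track of
    student_id: int   # attributes: characteristics of the entity
    name: str
    major: str


# Each object is one instance of the entity, i.e., one row in a STUDENT table.
ana = Student(1001, "Ana Ruiz", "Finance")
ben = Student(1002, "Ben Cho", "Marketing")
print(ana.major)  # Finance
```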
28
The Relational Database Model
Key Terms
Database Model
Relational Database Model
Entity
Instance
Attribute
Primary Key
Primary key: a field in a database table that uniquely identifies each record so that it can be retrieved, updated, and sorted (usually a number).
29
The Relational Database Model
Key Terms
Database Model
Relational Database Model
Entity
Instance
Attribute
Primary Key
Foreign or Secondary Keys
Secondary key: a field that has some identifying information but typically does not identify the record with complete accuracy (for example, a student’s major), so it cannot serve as the primary key.
Foreign key: a field in one table that matches the primary key of another table. You create a link between two tables by matching the primary key field in one table to the foreign key field in the other.
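SQL shows the difference between the key types clearly. This sketch uses Python's `sqlite3` module with hypothetical `student` and `enrollment` tables. `student_id` is the primary key of `student`, and `major` behaves like a secondary key because it matches several rows. `enrollment.student_id` is a foreign key that links each enrollment back to one student. SQLite enforces foreign keys only after `PRAGMA foreign_keys = ON`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.execute("CREATE TABLE student (student_id INTEGER PRIMARY KEY, name TEXT, major TEXT)")
conn.execute("""
    CREATE TABLE enrollment (
        student_id INTEGER REFERENCES student(student_id),  -- foreign key
        course     TEXT,
        grade      TEXT
    )
""")
conn.executemany("INSERT INTO student VALUES (?, ?, ?)",
                 [(1001, "Ana Ruiz", "Finance"), (1002, "Ben Cho", "Finance")])
conn.executemany("INSERT INTO enrollment VALUES (?, ?, ?)",
                 [(1001, "ISM3011", "A"), (1002, "ISM3011", "B")])

# The primary key identifies exactly one student ...
print(conn.execute("SELECT name FROM student WHERE student_id = 1001").fetchall())
# ... while the secondary key "major" matches several students.
print(conn.execute("SELECT name FROM student WHERE major = 'Finance' ORDER BY name").fetchall())
# The foreign key links each enrollment row back to its student.
print(conn.execute("""
    SELECT s.name, e.course, e.grade
    FROM enrollment e JOIN student s ON e.student_id = s.student_id
    ORDER BY s.name
""").fetchall())
```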
30
FIGURE 5.3 Student database example.
What is the key field for students on this campus?
31
5.3 Big Data
Defining Big Data
Characteristics of Big Data
Managing Big Data
Leveraging Big Data
32
Defining Big Data
Big Data is difficult to define.
Two descriptions of Big Data:
Big Data involves building a profile of a person, company, or transaction.
33
Defining Big Data
Big Data involves building a profile of a person, company, or transaction.
34
Defining Big Data
Big Data involves building a profile of a person, company, or transaction
Diverse, high-volume, high-velocity information assets
Diverse, high-volume, high-velocity information assets that require new forms of processing to enable enhanced decision making, insight discovery, and process optimization. Processing them requires very high-speed, sophisticated computing.
35
Defining Big Data
Big Data involves building a profile of a person, company, or transaction
Diverse, high-volume, high-velocity information assets
Do not fit neatly into traditional, structured, relational databases
Big Data enables you to discover insights and optimize processes. You look at numbers, for instance, and see patterns or trends in them. You then have to design an experiment to test what you think is going on. Trying to find a cure for cancer is the best example I can give of a situation that can benefit from Big Data: every time someone gets sick, a whole new set of data can be collected and analyzed.
36
Defining Big Data
Big Data generally consist of:
Traditional enterprise data
Machine-generated/sensor data
Social data
Images captured by billions of devices located around the world (digital cameras, camera phones, medical scanners, and security cameras)
Getting back to business, though, this is what Big Data involves. Have students read and copy.
37
Issues with Big Data
Untrusted data sources
Big Data is dirty
Big Data changes, especially in data streams
Here are three problems with Big Data:
Big Data can come from untrusted sources.
Big Data is dirty: Dirty data refers to inaccurate, incomplete, incorrect, duplicate, or erroneous data. A cancer patient, for instance, may also have diabetes or high blood pressure, either of which might be causing some of the symptoms.
Big Data changes, especially in data streams: Organizations must be aware that data quality in an analysis can change, or the data themselves can change, because the conditions under which the data are captured can change.
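Cleaning dirty data usually means normalizing values, rejecting incomplete or impossible entries, and removing duplicates. Below is a minimal Python sketch over hypothetical patient records. The validation rule (age must be a non-negative whole number) is an assumption for illustration. Real data-quality rules are domain specific.

```python
# Hypothetical patient records showing typical "dirty data" problems.
raw = [
    {"patient_id": "P1", "age": "54", "diagnosis": "Diabetes "},
    {"patient_id": "P1", "age": "54", "diagnosis": "diabetes"},      # duplicate
    {"patient_id": "P2", "age": "",   "diagnosis": "Hypertension"},  # incomplete
    {"patient_id": "P3", "age": "-7", "diagnosis": "Asthma"},        # incorrect
    {"patient_id": "P4", "age": "61", "diagnosis": "ASTHMA"},
]


def clean(records):
    """Normalize text, drop incomplete or impossible values, and remove duplicates."""
    seen, result = set(), []
    for r in records:
        diagnosis = r["diagnosis"].strip().lower()
        if not r["age"].isdigit():  # empty, negative, or non-numeric age: reject
            continue
        key = (r["patient_id"], int(r["age"]), diagnosis)
        if key in seen:             # duplicate once values are normalized
            continue
        seen.add(key)
        result.append({"patient_id": r["patient_id"], "age": int(r["age"]),
                       "diagnosis": diagnosis})
    return result


cleaned = clean(raw)
print(cleaned)
```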
38
Managing Big Data
When properly analyzed, Big Data can reveal valuable patterns and information.
Database environment
Traditional relational databases versus NoSQL (“not only SQL”) databases
Big Data is managed in a database environment using traditional relational databases as well as NoSQL databases. NoSQL databases can handle both structured and unstructured data, as well as inconsistent or missing data.
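NoSQL document stores accept records whose fields vary from one record to the next. This is one reason they suit Big Data that do not fit neatly into relational tables. The sketch below only imitates that document model with plain Python and the standard `json` module. It is not a NoSQL database, and the records and the `find` helper are hypothetical.

```python
import json

# Documents as a document-oriented store might hold them: each record can have
# different fields, which a fixed relational schema would not allow.
documents = [json.loads(s) for s in [
    '{"user": "ana", "type": "review", "stars": 5, "text": "Great!"}',
    '{"user": "ben", "type": "sensor", "temp_c": 21.5}',
    '{"user": "cy", "type": "review", "stars": 2}',
]]


def find(docs, **criteria):
    """Return documents whose fields equal all criteria (a missing field reads as None)."""
    return [d for d in docs if all(d.get(k) == v for k, v in criteria.items())]


reviews = find(documents, type="review")
avg_stars = sum(d["stars"] for d in reviews) / len(reviews)
print(len(reviews), avg_stars)  # 2 3.5
```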
39
Putting Big Data to Use There are five things to remember when putting big data to use
40
Putting Big Data to Use Making Big Data Available
Making Big Data Available: Making Big Data available for relevant stakeholders can help organizations gain value.
41
Putting Big Data to Use Making Big Data Available
Enabling Organizations to Conduct Experiments
Enabling organizations to conduct experiments: Big Data allows organizations to improve performance by conducting controlled experiments. For example, Amazon (and many other companies, such as Google and LinkedIn) constantly experiments by offering slightly different “looks” on its Web site.
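Teams usually decide whether a new look really performs better with a statistical test. The sketch below uses a two-proportion z-test, one common choice for comparing conversion rates. It is not necessarily what Amazon, Google, or LinkedIn use, and the visitor and conversion counts are made up.

```python
from math import erf, sqrt


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Compare conversion rates of page versions A and B; return (z, two-sided p)."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Standard normal CDF via the error function: Phi(x) = (1 + erf(x / sqrt(2))) / 2
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
    return z, p_value


# Hypothetical results: 10,000 visitors saw each version of the page.
z, p = two_proportion_z_test(conv_a=500, n_a=10_000, conv_b=580, n_b=10_000)
print(round(z, 2), round(p, 4))
```

With these hypothetical numbers, z is about 2.5 and the two-sided p-value is about 0.012. At the usual 5% level, the difference would be judged unlikely to be due to chance.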
42
Putting Big Data to Use Making Big Data Available
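Enabling Organizations to Conduct Experiments
Microsegmentation of Customers
Microsegmentation of customers: Segmentation of a company’s customers means dividing them into groups that share one or more characteristics. Microsegmentation takes this further, dividing customers into very small groups or even down to the individual customer.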
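Microsegmentation can be sketched as grouping customers on a combination of characteristics instead of just one. In this Python example, the customers and the attributes (`age_band`, `region`, `last_category`) are hypothetical. Adding attributes splits the same customers into smaller, more specific groups.

```python
from collections import defaultdict

# Hypothetical customers described by several characteristics.
customers = [
    {"id": 1, "age_band": "18-24", "region": "South", "last_category": "running shoes"},
    {"id": 2, "age_band": "18-24", "region": "South", "last_category": "running shoes"},
    {"id": 3, "age_band": "18-24", "region": "North", "last_category": "running shoes"},
    {"id": 4, "age_band": "45-54", "region": "South", "last_category": "garden tools"},
]


def segment(records, *attributes):
    """Group customer IDs by their combination of the given attributes."""
    groups = defaultdict(list)
    for r in records:
        groups[tuple(r[a] for a in attributes)].append(r["id"])
    return dict(groups)


broad = segment(customers, "age_band")
micro = segment(customers, "age_band", "region", "last_category")
print(len(broad), len(micro))  # 2 3
```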
43
Putting Big Data to Use Making Big Data Available
Enabling Organizations to Conduct Experiments
Microsegmentation of Customers
Creating New Business Models
Creating new business models: Companies are able to use Big Data to create new business models. For example, a commercial transportation company operated a large fleet of long-haul trucks. The company recently placed sensors on all its trucks. These sensors wirelessly communicate large amounts of information to the company, a process called telematics. The sensors collect data on vehicle usage (including acceleration, braking, and cornering), driver performance, and vehicle maintenance. By analyzing this Big Data, the company improved the condition of its trucks through near-real-time analysis that proactively suggested preventive maintenance.
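The preventive-maintenance idea can be sketched as simple threshold rules applied to telematics readings. Everything here is hypothetical: the fields, the trucks, and the limits. A real system would use its own rules and far more data.

```python
# Hypothetical telematics readings; fields and thresholds are illustrative assumptions.
readings = [
    {"truck": "T1", "hard_brakes": 2,  "engine_temp_c": 92},
    {"truck": "T2", "hard_brakes": 14, "engine_temp_c": 95},
    {"truck": "T3", "hard_brakes": 3,  "engine_temp_c": 118},
]

MAX_HARD_BRAKES = 10     # per reporting period (assumed)
MAX_ENGINE_TEMP_C = 110  # assumed safe operating limit


def maintenance_alerts(rows):
    """Suggest preventive maintenance when a reading crosses a threshold."""
    alerts = []
    for r in rows:
        if r["hard_brakes"] > MAX_HARD_BRAKES:
            alerts.append((r["truck"], "inspect brakes"))
        if r["engine_temp_c"] > MAX_ENGINE_TEMP_C:
            alerts.append((r["truck"], "inspect cooling system"))
    return alerts


print(maintenance_alerts(readings))
# [('T2', 'inspect brakes'), ('T3', 'inspect cooling system')]
```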
44
Putting Big Data to Use Making Big Data Available
Enabling Organizations to Conduct Experiments
Microsegmentation of Customers
Creating New Business Models
Organizations Can Analyze Far More Data
Organizations can analyze far more data: In some cases, organizations can even process all the data in a population relating to a particular phenomenon, meaning that they do not have to rely as much on sampling.
45
Big Data Used in the Functional Areas of the Organization
Human Resources
Product Development
Operations
Marketing
Government Operations
Ask students to give examples of how Big Data could be used.
Human resources: People who have a pattern of taking Mondays and Fridays off have a greater tendency to find a new job.
Product development: The story goes that Reese’s Peanut Butter Cups were invented when someone carrying a chocolate bar accidentally dipped it into a jar of peanut butter.
Operations: Stock people used to wait on loading docks for trucks to come in so that they could unload them. Now the trucks have GPS, which lets the loading dock manager know when a truck is a few miles away. The stock people can then go down to the loading dock.
Marketing: People now sign up for accounts that link their frequent-buyer cards to their addresses. This tells you exactly what each person likes to buy, so you can tell them about upcoming sales.
Government operations: People don’t have to visit government offices as frequently as they used to, because they can submit documents and check balances online. A company like ADP, which processes payroll, can provide earnings data automatically to the IRS to help people when it comes time to do their taxes.
46
Data Warehouses and Data Marts
5.4 Data Warehouses and Data Marts
Describing Data Warehouses and Data Marts
A Generic Data Warehouse Environment
There are times when your company doesn’t want to hold on to certain data itself. You don’t want customer credit card numbers sitting on your file servers: if those servers get compromised, your company is liable. So you may pay an outside provider that runs a data warehouse or data mart to hold on to the data for you.
47
Describing Data Warehouses & Data Marts
A repository of historical data that are organized by subject to support decision makers in the organization
Data warehouse: a repository of historical data that are organized by subject to support decision makers in the organization. A data warehouse holds on to the data so that your company can run complex analyses on them.
48
Describing Data Warehouses & Data Marts
A repository of historical data that are organized by subject to support decision makers in the organization
Data Mart
Data mart: a low-cost, scaled-down version of a data warehouse that is designed for the end-user needs in a strategic business unit (SBU) or an individual department.
49
FIGURE 5.4 Data warehouse framework.
This is what it looks like on a diagram
50
Describing Data Warehouses & Data Marts
Basic characteristics of data warehouses and data marts
Organized by business dimension or subject
Organized by business dimension or subject: Data are organized by subject, for example, by customer, vendor, product, price level, and region. This arrangement differs from transactional systems, in which data are organized by business process, such as order entry, inventory control, and accounts receivable.
51
Describing Data Warehouses & Data Marts
Basic characteristics of data warehouses and data marts
Organized by business dimension or subject
Use online analytical processing (OLAP)
Integrated
Time variant
Nonvolatile
Multidimensional
Use online analytical processing (OLAP): OLAP involves the analysis of accumulated data by end users, whereas operational databases use online transaction processing (OLTP) to handle day-to-day transactions.
Integrated: Data are collected from multiple systems and then integrated around subjects.
Time variant: Data warehouses and data marts maintain historical data (i.e., data that include time as a variable).
Nonvolatile: Data warehouses and data marts are nonvolatile; that is, users cannot change or update the data.
Multidimensional: Typically the data warehouse or mart uses a multidimensional data structure. Recall that relational databases store data in two-dimensional tables, whereas multidimensional structures store data in more than two dimensions, often represented as a data cube.
52
A Generic Data Warehouse Environment
Source Systems
Now let’s look at some of the things that make up a data warehouse.
Source systems: systems that provide a source of organizational data, such as your company’s transactions. Common examples of source systems include:
operational/transactional systems
enterprise resource planning (ERP) systems
Web site data
third-party data (e.g., customer demographic data)
operational databases
53
A Generic Data Warehouse Environment
Source Systems
Data Integration
Data integration: Typically, organizations need to extract, transform, and load (ETL) data from source systems into a data warehouse or data mart. Along the way, the data need to be cleaned up and formatted.
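Here is a minimal ETL sketch in Python. It extracts rows from a small in-memory CSV sample that stands in for a source-system export. It then transforms them by trimming spaces, standardizing capitalization, converting types, and skipping rows with a missing amount. Finally, it loads them into a SQLite table that stands in for the warehouse. The columns and cleaning rules are assumptions for illustration.

```python
import csv
import io
import sqlite3

# Extract: read rows from a source-system export (here, an in-memory CSV sample).
source_csv = """order_id,customer,amount,region
1, ana ruiz ,19.99,south
2,BEN CHO,5.00,North
3,cy park,,north
"""
rows = list(csv.DictReader(io.StringIO(source_csv)))

# Transform: clean and standardize formats; skip rows that are missing an amount.
clean_rows = [
    (int(r["order_id"]), r["customer"].strip().title(), float(r["amount"]),
     r["region"].strip().capitalize())
    for r in rows if r["amount"].strip()
]

# Load: write the cleaned rows into a table (SQLite stands in for the warehouse).
warehouse = sqlite3.connect(":memory:")
warehouse.execute("CREATE TABLE sales (order_id INTEGER, customer TEXT, amount REAL, region TEXT)")
warehouse.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)", clean_rows)

print(warehouse.execute("SELECT * FROM sales ORDER BY order_id").fetchall())
# [(1, 'Ana Ruiz', 19.99, 'South'), (2, 'Ben Cho', 5.0, 'North')]
```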
54
A Generic Data Warehouse Environment
Source Systems
Data Integration
Storing the Data
Storing the data: The integrated data then need to be stored in a database (the warehouse or mart) for the company.
55
A Generic Data Warehouse Environment
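Source Systems
Data Integration
Storing the Data
Metadata
Metadata: data about the data (e.g., what each field means, where it came from, and when it was loaded) must be maintained. IT staff need metadata to manage the warehouse as new data are added to the data already there, and users need it to find and understand the data.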
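Metadata comes in two broad forms, technical and business. In this sketch, SQLite's `PRAGMA table_info` reports technical metadata (column names, types, and key status) for a hypothetical `sales` table. A hand-made dictionary holds business metadata, such as what a column means and where it came from. The catalog contents are invented for illustration.

```python
import sqlite3

warehouse = sqlite3.connect(":memory:")
warehouse.execute("CREATE TABLE sales (order_id INTEGER PRIMARY KEY, amount REAL, region TEXT)")

# Business metadata kept alongside the data (a hypothetical, hand-made catalog).
catalog = {
    "sales.amount": {"meaning": "Order total in USD", "source": "order-entry system"},
    "sales.region": {"meaning": "Sales region of the customer", "source": "CRM"},
}

# Technical metadata: each row is (cid, name, type, notnull, default, pk).
columns = warehouse.execute("PRAGMA table_info(sales)").fetchall()
for cid, name, col_type, notnull, default, pk in columns:
    info = catalog.get(f"sales.{name}", {})
    print(name, col_type, "PK" if pk else "", info.get("meaning", ""))
```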
56
A Generic Data Warehouse Environment
Source Systems
Data Integration
Storing the Data
Metadata
Data Quality
Data quality: You have to make sure that the quality of the data in the warehouse meets users’ needs. If it doesn’t, users will not trust the data and ultimately will not use them.
57
A Generic Data Warehouse Environment
Source Systems
Data Integration
Storing the Data
Metadata
Data Quality
Data Governance
Governance: You have to make sure that your company can collect and use the data that you’re compiling, from both a legal and a company-policy standpoint.
58
A Generic Data Warehouse Environment
Source Systems
Data Integration
Storing the Data
Metadata
Data Quality
Data Governance
Users
Users: You have to consider the user or users for whom you’re collecting and analyzing the data. These could be IT developers; frontline workers; analysts; information workers; managers and executives; and suppliers, customers, and regulators.
59
FIGURE 5.5 Relational databases.
The first step is to take the data and organize it in a relational database like this one.
60
FIGURE 5.6 Data cube. You can then analyze it according to different dimensions.
61
FIGURE 5.7 Equivalence between relational and multidimensional databases.
This allows you to pick out the different trends or patterns in your data
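Python can sketch the equivalence between the two views. The same hypothetical sales figures are held as relational-style rows and as a cube keyed by (product, region, year). Slicing fixes one dimension, and rolling up totals the measure along a dimension; analysts use both to spot trends across regions or years. The product names, regions, and numbers are made up.

```python
from collections import defaultdict

# Relational form: one row per (product, region, year) with a units-sold measure.
rows = [
    ("Nuts",   "East", 2022, 50), ("Nuts",   "West", 2022, 60),
    ("Nuts",   "East", 2023, 55), ("Screws", "East", 2022, 40),
    ("Screws", "West", 2023, 70), ("Bolts",  "West", 2023, 30),
]

# Multidimensional form: a cube addressed by (product, region, year).
cube = {(product, region, year): units for product, region, year, units in rows}


def slice_year(cube, year):
    """Take one slice of the cube: every product and region for a single year."""
    return {(p, r): u for (p, r, y), u in cube.items() if y == year}


def roll_up(cube, dimension):
    """Total the measure along one dimension (0 = product, 1 = region, 2 = year)."""
    totals = defaultdict(int)
    for key, units in cube.items():
        totals[key[dimension]] += units
    return dict(totals)


print(slice_year(cube, 2023))  # {('Nuts', 'East'): 55, ('Screws', 'West'): 70, ('Bolts', 'West'): 30}
print(roll_up(cube, 1))        # {'East': 145, 'West': 160}
```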
62
5.5 Knowledge Management
Concepts and Definitions
Knowledge Management Systems
The KMS Cycle
63
Concepts & Definitions
Knowledge management (KM): a process that helps organizations manipulate important knowledge that comprises part of the organization’s memory, usually in an unstructured format.
Knowledge: information that is contextual, relevant, and useful. It is information in action. Intellectual capital (or intellectual assets) is another term for knowledge.
64
Concepts & Definitions
Knowledge Management (KM): A process that helps manipulate important knowledge that comprises part of the organization’s memory, usually in an unstructured format.
Explicit
Explicit knowledge: more objective, rational, and technical knowledge. In an organization, explicit knowledge consists of the policies, procedural guides, reports, products, strategies, goals, core competencies, and IT infrastructure of the enterprise.
65
Concepts & Definitions
Knowledge Management (KM): A process that helps manipulate important knowledge that comprises part of the organization’s memory, usually in an unstructured format.
Explicit & Tacit Knowledge
Tacit knowledge: the cumulative store of subjective or experiential learning. In an organization, tacit knowledge consists of an organization’s experiences, insights, expertise, know-how, trade secrets, skill sets, understanding, and learning. It is generally imprecise and costly to transfer.
66
Knowledge Management Systems (KMS)
Refers to the use of modern information technologies (the Internet, intranets, extranets, and databases) to systematize, enhance, and expedite intrafirm and interfirm knowledge management.
Best practices: the most effective and efficient ways of doing things.
Read the definition and have students copy it.
Intrafirm: within a company
Interfirm: between companies
67
FIGURE 5.8 The knowledge management system cycle.
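The KMS cycle consists of six steps:
1. Create knowledge
2. Capture knowledge
3. Refine knowledge
4. Store knowledge
5. Manage knowledge
6. Disseminate knowledge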
68
Data and Knowledge Management
5 Data and Knowledge Management The End