2 What’s “File Processing”? The “old” way of doing things; still often used in practice.Separate information stored on separate files.2
3 File Processing Example: SalesProductionMarketingKnows howmany ofProducts A,B, and C havebeen sold.File storesProd. Name,ProductionSchedule,and Sales.Knows howmuch ofProducts A,B, and C havebeen produced.File storesProd. Name,ProductionSchedule, andNumber Produced.Knows theprice ofProducts A,B, and C.File storesProd. Nameand ProductPrice.3
4 Any problems here? Duplication (redundancy). Inconsistency. Does anyone know how much money we made? No integration.Set format. Data dependence. Y2K!!4
5 Database Management Database Management System (DBMS) Provides one integrated repository for data to be stored and queried.Standards for data can be defined and enforced.Reports and queries are easy (er).SQL, etc.5
7 DATABASE MANAGEMENT SYSTEMS Four components of a DBMSThe components of the DBMS are discussed in detail on the following slidesA DBMS contains:Data definition component – helps create and maintain the data dictionary and the structure of the databaseData manipulation component – allows users to create, read, update, and delete information in a databaseApplication generation component – includes tools for creating visually appealing and easy-to-use applicationsData administration component – provides tools for managing the overall database environment by providing faculties for backup, recovery, security, and performance
9 Another Look (thanks to John Gallaugher, Boston College) Server - responds to client requestsDBMS - the program. Manages interaction with databases.requestresponseClient - makes requests of the DBMS serverdatabase - the collection of data.Created and defined to meet theneeds of the organization.Databasea collection of related data. Usually organized according to topics: e.g. customer info, products, transactionsDatabase Management System (DBMS)a program for creating & managing databases; ex. Oracle, MS-Access, SybaseWhen we talk about databases, we’re really talking about nothing more than lists of information.DBMS are used to create and manage databases (sometimes broadly called a database or database program)Show where DBMS & Databases relate to client/server computingThe request in a request / response might include things like:add a recordretrieve a recordmodify a recorddelete a record
10 A Simple Database File/Table Field/Column Record/Row Customers 5 shown: CUSTID, FIRST, LAST, CITY, STATERecord/Row5 shown: one for each customerWhy do we even need a customer ID?to uniquely identify a user - there may be two people with the same name
11 A More Complex Example Entry & Maintenance is complicated How many records do I have to change if Abby Johnson gets married & decides she wants to change her name?TwoHow many records do I change if Warren Buffet moves?ThreeYou get the picture - there’s got to be a better way to manage this & there is - it’s called the relational model.Entry & Maintenance is complicatedredundant data exists, increases chance of error, complicates updates/changes, takes up space
12 Normalize Data: Remove Redundancy Customer TableTransaction TableOneManyNormalization is nothing more than removing redundancy. There are forma l steps & levels of normal form, but those are for a database course (a grad database elective is offered by the CS department).One normalizes to:eliminate redundancysimplify maintenancesave spacepromote re-useFiles are related via a common key.How many transactions has Buffet made? Three. How do we tell that? The unique ID
13 Key Terms Relational DBMS SQL - Structured Query Language manages databases as a collection of files/tables in which all data relationships are represented by common values in related tables (referred to as keys).a relational system has the flexibility to take multiple files and generate a new file from the records that meet the matching criteria (join).SQL - Structured Query LanguageMost popular relational database standard. Includes a language for creating & manipulating data.Note: other types of databases exist (e.g. older - hierarchical. newer - OO). SQL is by far the most popular. It is a standard that most follow.Many desktop applications can generate SQL statements using a GUI without any programming (e.g. Excel, Access). These appellations act as a client & send database access requests (e.g. get records, add record, delete record) to the database server.
14 Using SQL for QueryingSQL (Structured Query Language) Data language English-like, nonprocedural, very user friendly language Free format Example: SELECT Name, Salary FROM Employees WHERE Salary >2000
15 THE VALUE OF QUALITY INFORMATION Five common characteristics of high-quality informationClass Activity: Break your students into groups and ask each group to provide an additional example of each of the five common characteristics of high-quality information that is not provided in Figure 2.4For example, Accuracy – does a purchase price on a bill match the item description on the bill? Item 1: Kids juice cup, cost $10, Chances are a kids juice cup would not cost $10,000 and this is an inaccurate item.
16 THE VALUE OF QUALITY INFORMATION Low-quality information exampleWalk-thru each of the six issues and have your students extrapolate a potential business problem that might be associated with each issue. The example does not state what type of database or spreadsheet this information is contained (sales, marketing, customer service, billing, etc), so allow your students use their imagination when they are extrapolating the potential business problemsIssue 1: Without a first name it would be impossible to correlate this customer with customers in other databases (Sales, Marketing, Billing, Customer Service) to gain a compete customer view (CRM)Issue 2: Without a complete street address there is no possible way to communicate with this customer via mail or deliveries. An order might be sitting in a warehouse waiting for the complete address before shipping. The company has spent time and money processing an order that might never be completedIssue 3: If this is the same customer, the company will waste money sending out two sets of promotions and advertisements to the same customers. It might also send two identical orders and have to incur the expense of one order being returnedIssue 4: This is a good example of where cleaning data is difficult because this may or may not be an error. There are many times when a phone and a fax have the same number. Since the phone number is also in the address field, chances are that the number is inaccurateIssue 5: The business would have no way of communicating with this customer viaIssue 6: The company could determine the area code based on the customer’s address. This takes time, which costs the company money. This is a good reason to ensure that information is entered correctly the first time. All incorrect information needs to be fixed, which costs time and money
17 THE VALUE OF QUALITY INFORMATION The four primary sources of low-quality information include:Online customers intentionally enter inaccurate information to protect their privacyInformation from different systems that have different information entry standards and formatsCall center operators enter abbreviated or erroneous information by accident or to save timeThird party and external information contains inconsistencies, inaccuracies, and errorsAddressing the above sources of information inaccuracies will significantly improve the quality of organizational informationAsk your students to determine a few additional sources of low-quality informationAns: A customer service representative could accidentally transpose a number in an address or misspell a last name
18 Understanding the Costs of Low-quality Information Potential business effects resulting from low-quality informationInability to accurately track customersDifficulty identifying valuable customersInability to identify selling opportunitiesMarketing to nonexistent customersDifficulty tracking revenue due to inaccurate invoicesInability to build strong customer relationships – which increases buyer powerAsk your students if they can list any additional business effects resulting from low-quality informationAsk your students to focus on organizational strategies such as SCM, CRM, and ERPAns: Low-quality information could cause the SCM system to order too much inventory from a supplier based on inaccurate ordersLow-quality information could cause a CRM system to send an expensive promotional item (such as a fruit basket) to the wrong address of one of its best customers
19 Structures Hierarchical: The old way. “Tree”. Access elements by moving down tree.One-to-many.Network: Criss-cross patterns.Many-to-many.Relational: a common element relates “tables” to one another. Permits “ad hoc”.Object-oriented: “objects” have data, processes, and properties “encapsulated” in them.8
21 Pros and Cons Obj. Relat. Ad Hoc Flexibility ==> Net. Hier. Speed ==>10
22 Data Dictionaries The Data Dictionary A reference work of data about data (metadata) compiled by the systemsanalyst to guide analysis and design.As a document, the data dictionary collects, coordinates, and confirms themeaning of data terms to various users throughout the organization.Uses of the Data DictionaryDocumentation, Elimination of data redundancyValidate the data flow diagram for completeness and accuracyProvide a starting point for developing screens and reportsDetermine contents of data stored in filesDevelop the logic for data flow diagram processes11
23 Data Flow Diagrams (“DFD”) ProcessFile or Data StoreSource or Entity
26 New Names, Same IdeasData Mining, OLAPData Warehousing
27 Data Miningautomated information discovery process, uncovers important patterns in existing datacan use neural networks or other approaches. Requires ‘clean’, reliable, consistent data. Historical data must reflect the current environment.e.g. “What are the characteristics that identify when we are likely to lose a customer?”OLAP is user-driven discoveryGIGO - garbage in, garbage out. Results are only as good as the data being mined.Data mining - knoweldge discovery rather than verification. Identifies and characterizes interrelationships among multi-variable dimensions without requiring a human to formulate specific answers. Data mining is about knowledge discover: extracting previously unknown & useful information. Usually use neural networks. Expert Systems - rules are identifiable . Neural Networks - rules are unknown & the machine discovers pattnerns.88% of global 2000 companies believe data mining will be very important or critical to their success by the year 2000.Examples: FLEET: 11th largest bank. Testing a $35 million data warehousing project. 1TB warehouse & two 500 GB marts. Data on consumer & small businesses is fed in from 36, mostly mainframe applications. A commercial mart is created from 34 applications. Annual support costs are $6.5 million for 100 users. Expects profits of $50 million / year from the effort.Banks - notorious for having unprofitable customers & products. Also, what’s more expensive - attracting new customers or retaining existing customers? New customers - so you don’t want existing custoemrs to leave (churn, online industry faces this problem).Reducing attrition by 2% among the top 1/3 of customers would save $20 million/year!Example: FedEx saw a 300% increase in direct mail response rates. Revenue growth from identified accounts increases at an 8-1 ratio compared with the cost of other marketing campaigns.
28 Warehouses & Marts Data Warehouse Data Mart a database designed to support decision-making in an organization. It is batch-updated and structured for fast online queries and exploration. Data warehouses may aggregate enormous amounts of data from many different operational systems.Data Marta database focused on addressing the concerns of a specific problem or business unit (e.g. Marketing, Engineering). Size doesn’t define data marts, but they tend to be smaller than data warehouses.
29 Data Warehouses & Data Marts 3rd party dataData Mart(Marketing)TPS& otheroperational systemsDataWarehouseData Mart(Engineering)= operational clients= query, OLAP, mining, etc.