1 1 DATABASES

2 2 Conventional Files versus the Database
File: a collection of similar records.
- Files are unrelated to each other except in the code of an application program.
- Data storage is built around the applications that use the files.
Database: a collection of interrelated files.
- Records in one file (or table) are physically related to records in another file (or table).
- Applications are built around the integrated database.

3 3 Files Versus Database

4 4 Pros and Cons of Conventional Files
Pros
- Easy to design because of their single-application focus
- Excellent performance due to optimized organization for a single application

5 5 Cons
- Harder to adapt to sharing across applications
- Harder to adapt to new requirements
- Need to duplicate attributes in several files

6 6 Pros and Cons of Databases
Pros
- Data independence from applications increases adaptability and flexibility
- Superior scalability
- Ability to share data across applications
- Less, and controlled, redundancy (total non-redundancy is not achievable)

7 7 Cons
- More complex than file technology
- Somewhat slower performance
- Investment in DBMS and database experts
- Need to adhere to design principles to realize benefits
- Increased vulnerability due to consolidating data in a centralized database

8 8 Data is stored in some combination of:
- Conventional files
- Operational databases: databases that support day-to-day operations and transactions for an information system. Also called transactional databases.

9 9
- Data warehouses: databases that store data extracted from operational databases, to support data mining
- Personal databases
- Work group databases

10 10 A Modern Data Architecture

11 11 Data administrator: a database specialist responsible for data planning, definition, architecture, and management.
Database administrator: a specialist responsible for database technology, database design, construction, security, backup and recovery, and performance tuning.
- A database administrator will administer one or more databases.

12 12 Why Use A Database? Data overload is a common problem in business today. Corporations and individuals have plenty of raw data, but can't always find it or aren't aware that they even have it. Raw data must be filtered and organized to become useful information. Databases are a primary tool for the task; a tool which takes advantage of the speed and power of modern computers.

13 13 Why Design a Database?
Goal:
- To produce an information system that adds value for the user
- Reduce costs
- Increase sales/revenue
- Provide competitive advantage
Objective:
- To understand the system
- To improve it
- To communicate with users and IT staff

14 14 Requirements Collection and Analysis This task results in a concise set of user requirements, which should be detailed and complete. The functional requirements should be specified, as well as the data requirements. Functional requirements consist of user operations that will be applied to the database, including retrievals and updates. Functional requirements can be documented using diagrams such as sequence diagrams, data flow diagrams etc.

15 15 Designing Systems
Designs are a model of existing and proposed systems:
- They provide a picture or representation of reality
- They are a simplification
- Someone should be able to read your design (model) and describe the features of the actual system.
You build models by talking with the users:
- Identify processes
- Identify objects
- Determine current problems and future needs
- Collect user documents (views)
Break complex systems into pieces and levels.

16 16 Conceptual Design Once the requirements are collected and analyzed, the designers create the conceptual schema (model). Conceptual schema: a concise description of the users' data requirements, including a detailed description of the entity types, relationships and constraints. The concepts do not include implementation details, so end users can understand them easily and they can be used as a communication tool. The conceptual schema is used to ensure that all user requirements are met and that they do not conflict.

17 17 Entity Relationship (ER) Model The most popular high-level conceptual data model is the ER model. It is frequently used for the conceptual design of database applications. The diagrammatic notation associated with the ER model is referred to as the ER diagram. ER diagrams show the basic data structures and constraints.

18 18 Entity Relationship (ER) cont… The basic object of an ER diagram is the entity. An entity represents a 'thing' in the real world. Examples of entities might be physical entities, such as a student, a house or a product, or conceptual entities, such as a company, a job position or a course. Entities have attributes, which are the properties/characteristics of a particular entity.

19 19 Entity Relationship (ER) cont…
Entity   Attributes   Values
Car      Color        Red
         Make         Volkswagen
         Model        Bora
         Year         2000

20 20 KEY An important constraint on entities of an entity type is the uniqueness constraint. A key attribute is an attribute whose values are distinct for each individual entity in the entity set. The values of the key attribute can be used to identify each entity uniquely. Sometimes a key can consist of several attributes together, where the combination of attributes is unique for a given entity. This is called a composite key.
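The uniqueness constraint can be tried out with Python's built-in sqlite3 module. This is only a sketch: the car and enrolment tables, their columns and the sample values are assumptions made up for illustration. The first table uses a single-attribute key, the second a composite key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Single-attribute key: reg_no must be distinct for every car (hypothetical table).
conn.execute("CREATE TABLE car (reg_no TEXT PRIMARY KEY, make TEXT, model TEXT)")

# Composite key: neither column is unique alone, but the pair is (hypothetical table).
conn.execute("""
    CREATE TABLE enrolment (
        student_id  INTEGER,
        course_code TEXT,
        grade       TEXT,
        PRIMARY KEY (student_id, course_code)
    )
""")

conn.execute("INSERT INTO car VALUES ('KBX 123A', 'Volkswagen', 'Bora')")
conn.execute("INSERT INTO enrolment VALUES (1, 'DB101', 'A')")
# Same student, different course: the combination is still unique, so this is allowed.
conn.execute("INSERT INTO enrolment VALUES (1, 'NET200', 'B')")


def try_insert(sql):
    """Run an INSERT and report whether the key constraint accepted it."""
    try:
        conn.execute(sql)
        return "accepted"
    except sqlite3.IntegrityError:
        return "rejected"


# Both statements repeat an existing key value, so both should be refused.
dup_single = try_insert("INSERT INTO car VALUES ('KBX 123A', 'Toyota', 'Vitz')")
dup_composite = try_insert("INSERT INTO enrolment VALUES (1, 'DB101', 'C')")
print(dup_single, dup_composite)
```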

21 21 Relationships Each time an attribute of one entity type refers to another entity type, some relationship exists. In ER diagrams, these references should be represented as relationships, rather than attributes. Relationships between entities are represented using a diamond shape.

22 22 Relationships… [Diagram: Employee and Department entities joined by a "Works for" relationship]

23 23 Summary of ER, EER Diagram Notation
- Strong Entities
- Weak Entities
- Attributes
- Multi Valued Attributes
- Composite Attributes
- Relationships
[Diagram: notation symbols, with the example labels "Entity Name" and "Relationship Name"]

24 24 Constraints
1:N - One customer buys many products, each product is purchased by only one customer.
N:1 - Each customer buys at most one product, each product can be purchased by many customers.
[Diagrams: Customer, Purchases, Product with cardinalities 1 and N; Customer, Purchases, Product with cardinalities N and 1]

25 25 ER DIAGRAM - Entity Types are: EMPLOYEE, DEPARTMENT, PROJECT, DEPENDENT

26 26 COMPANY ER Schema Diagram using (min, max) notation

27 27 Transforming an Entity Type to a Relation

28 28 Figure Representing a 1:N Relationship

29 29 CLASS DIAGRAMS
Class: Description of an entity that includes its attributes (properties) and behavior (methods).
Object: One instance of a class with specific data.
Property: A characteristic or description of a class or entity.
Method: A function that is performed by the class.
Association: A relationship between two or more classes.
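The five terms above map directly onto code. The sketch below is illustrative only: the Employee and Department classes, their properties and the raise_salary method are assumptions, not taken from the slides.

```python
class Employee:
    """Class: describes an entity with properties and methods."""

    def __init__(self, employee_id, name, salary):
        # Properties: characteristics that describe the entity.
        self.employee_id = employee_id
        self.name = name
        self.salary = salary

    def raise_salary(self, percent):
        """Method: a function performed by the class."""
        self.salary = round(self.salary * (1 + percent / 100), 2)
        return self.salary


class Department:
    def __init__(self, name):
        self.name = name
        # Association: a department is related to many employees.
        self.employees = []

    def hire(self, employee):
        self.employees.append(employee)


# Objects: instances of the classes, each holding specific data.
jones = Employee(1, "Jones", 1000.0)
sales = Department("Sales")
sales.hire(jones)
jones.raise_salary(10)
print(jones.salary, [e.name for e in sales.employees])
```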

30 30 Entities/Classes

31 31 Association Example
[Diagram: classes Employee (Name, ...), Component (CompID, Type, Name) and Product (ProductID, Type, Name), joined by the association class Assembly (EmployeeID, CompID, ProductID). Multiplicities shown: *, *, * and 1, 1, 1]
Multiplicity is defined as the number of items that could appear if the other N-1 objects are fixed. Almost always "many."

32 32 Example of relationships
Customer: CustomerID, Phone, FirstName, LastName, Address, ZipCode, CityID, BalanceDue
Customer Transaction: CustomerID, TransactionDate, EmployeeID, Amount, Description, Reference
Retail Store: StoreID, StoreName, Phone, ContactFirstName, ContactLastName, Address, ZipCode, CityID
Bicycle::Bicycle: BicycleID, …, CustomerID, StoreID, …
Multiplicities shown on the associations: 1…1, 0…*, 1…1, 0…*, 0…1

33 33 [Class diagram: tables and their columns]
Customer: CustomerID, Phone, FirstName, LastName, Address, ZipCode, CityID, BalanceDue
CustomerTrans: CustomerID, TransDate, EmployeeID, Amount, Description, Reference
RetailStore: StoreID, StoreName, Phone, ContactFirstName, ContactLastName, Address, ZipCode, CityID
StateTaxRate: State, TaxRate
Bicycle: SerialNumber, CustomerID, ModelType, PaintID, FrameSize, OrderDate, StartDate, ShipDate, ShipEmployee, FrameAssembler, Painter, Construction, WaterBottle, CustomName, LetterStyleID, StoreID, EmployeeID, TopTube, ChainStay, HeadTubeAngle, SeatTubeAngle, ListPrice, SalePrice, SalesTax, SaleState, ShipPrice, FramePrice, ComponentList
City: CityID, ZipCode, City, State, AreaCode, Population1990, Population1980, Country, Latitude, Longitude
ModelType: ModelType, Description, ComponentID
Paint: PaintID, ColorName, ColorStyle, ColorList, DateIntroduced, DateDiscontinued
Employee: EmployeeID, TaxpayerID, LastName, FirstName, HomePhone, Address, ZipCode, CityID, DateHired, DateReleased, CurrentManager, SalaryGrade, Salary, Title, WorkArea
BicycleTube: SerialNumber, TubeID, Quantity
ModelSize: ModelType, MSize, TopTube, ChainStay, TotalLength, GroundClearance, HeadTubeAngle, SeatTubeAngle
LetterStyle: LetterStyle, Description
PurchaseOrder: PurchaseID, EmployeeID, ManufacturerID, TotalList, ShippingCost, Discount, OrderDate, ReceiveDate, AmountDue
BikeTubes: SerialNumber, TubeName, TubeID, Length
BikeParts: SerialNumber, ComponentID, SubstituteID, Location, Quantity, DateInstalled, EmployeeID
PurchaseItem: PurchaseID, ComponentID, PricePaid, Quantity, QuantityReceived
Manufacturer: ManufacturerID, ManufacturerName, ContactName, Phone, Address, ZipCode, CityID, BalanceDue
Groupo: CompGroup, GroupName, BikeType, Year, EndYear, Weight
Component: ComponentID, ManufacturerID, ProductNumber, Road, Category, Length, Height, Width, Weight, Year, EndYear, Description, ListPrice, EstimatedCost, QuantityOnHand
ManufacturerTrans: ManufacturerID, TransactionDate, EmployeeID, Amount, Description, Reference
TubeMaterial: TubeID, Material, Description, Diameter, Thickness, Roundness, Weight, Stiffness, ListPrice, Construction
GroupCompon: GroupID, ComponentID
ComponentName: ComponentName, AssemblyOrder, Description

34 34 Planning and analysis
- Data modeling is preceded by planning and analysis.
- The effort devoted to this stage is proportional to the scope of the database.
- The planning and analysis of a database intended to serve the needs of an enterprise will require more effort than one intended to serve a small workgroup.

35 35 An accurate and up-to-date data model can serve as an important reference tool for DBAs, developers, and other members of a JAD (joint application development) team.

36 36
- By building quality into the project, the team reduces the overall time it takes to complete the project, which in turn reduces project development costs.
- An effective data model completely and accurately represents the data requirements of the end users. It is simple enough to be understood by the end user yet detailed enough to be used by a database designer to build the database.

37 37 The model eliminates redundant data, is independent of any hardware and software constraints, and can be adapted to changing requirements with a minimum of effort.

38 38
- Data modeling is a bottom up process. A basic model, representing entities and relationships, is developed first. Then detail is added to the model by including information about attributes and business rules.
- The information needed to build a data model is gathered during the requirements analysis.

39 39
- The requirements analysis is usually done at the same time as the data modeling.
- As information is collected, data objects are identified and classified as entities, attributes, or relationships; assigned names; and defined using terms familiar to the end users. The objects are then modeled and analyzed using an ER or class diagram.

40 40
- The diagram can be reviewed to determine its completeness and accuracy, and modified where necessary.
- The review and edit cycle continues until the model is certified as correct.

41 41 Points to note
a) Talk to the end users about their data in "real-world" terms. Users do not think in terms of entities, attributes, and relationships but about the actual people, things, and activities they deal with daily.

42 42 b) Take the time to learn the basics about the organization and its activities that you want to model. Having an understanding about the processes will make it easier to build the model. c) End-users typically think about and view data in different ways according to their function within an organization. Therefore, it is important to interview the largest number of people that time permits.

43 43 What makes an object an entity or attribute? For example, given the statement "employees work on projects", should employees be classified as an entity or an attribute? Very often, the correct answer depends upon the requirements of the database. In some cases, employee would be an entity; in others it would be an attribute.

44 44 Some commonly given guidelines are:
- entities contain descriptive information
- attributes either identify or describe entities
- relationships are associations between entities

45 45 Achieving a Well-Designed Database A table should have an identifier. A table should store only data for a single type of entity. A table should avoid nullable columns. A table should not have repeating values or columns.

46 46 Some Common Database Design Mistakes
1. Poor design/planning
2. Ignoring normalization
3. Poor naming standards
4. Lack of documentation
5. Lack of testing

47 47 1. Poor Design/Planning "If you don't know where you are going, any road will take you there" (George Harrison)

48 48 2. Ignoring Normalization Normalization defines a set of methods to break down tables to their constituent parts until each table represents one and only one "thing", and its columns serve to fully describe only the one "thing" that the table represents.

49 49 Normalization
Normalization is a database design approach that seeks the following four objectives:
i. minimization of data redundancy,
ii. minimization of data restructuring,
iii. minimization of I/O by reduction of transaction sizes, and
iv. enforcement of referential integrity.

50 50 Normalization… Consider the following example Customer table:
- A payment does not describe a Customer and should not be stored in the Customer table.
- Details of payments should be stored in a Payment table, in which you could also record extra information about the payment, like when the payment was made, and what the payment was for.
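The slide's example table is not reproduced in this transcript, so the sketch below uses assumed table and column names (customer, payment, paid_on, purpose) and made-up rows. It shows payments moved out of the customer table into their own table, with the customer's totals recovered by a join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- The customer table describes customers only.
    CREATE TABLE customer (
        customer_id INTEGER PRIMARY KEY,
        name        TEXT NOT NULL
    );

    -- Each payment is its own row, with room for when and why it was made.
    CREATE TABLE payment (
        payment_id  INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customer(customer_id),
        paid_on     TEXT NOT NULL,
        amount      INTEGER NOT NULL,   -- whole currency units, to keep sums exact
        purpose     TEXT
    );

    INSERT INTO customer VALUES (1, 'Jones');
    INSERT INTO payment VALUES (1, 1, '2001-02-02', 2500, 'Order 192');
    INSERT INTO payment VALUES (2, 1, '2001-02-09', 1000, 'Order 193');
""")

# Nothing about payments is stored on the customer row; it is derived on demand.
rows = conn.execute("""
    SELECT c.name, COUNT(p.payment_id), SUM(p.amount)
    FROM customer AS c
    JOIN payment AS p ON p.customer_id = c.customer_id
    GROUP BY c.customer_id
""").fetchall()
print(rows)
```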

51 51 3. Poor naming standards Consistency. The names you choose are not just to enable you to identify the purpose of an object, but to allow all future programmers, users, and so on to quickly and easily understand how a component part of your database was intended to be used, and what data it stores.

52 52 Poor naming standards…
- Present to the users clear, simple, descriptive names, such as Customer and Address.
- Avoid names such as colVarcharAddress and X304_DSCR. These mean nothing to the user.
- The usage of dashes, spaces, digits and special characters is discouraged.

53 53 4. Lack of Documentation Poorly documented code is a synonym for "job security." Your goal should be to provide enough information that when you turn the database over to a support programmer, they can figure out your minor bugs and fix them.

54 54 Lack of Documentation… In many cases, you may want to include sample values, where the need arose for the object, and anything else that you may want to know in a year or two when "future you" has to go back and make changes to the code.

55 55 5. Lack of Testing A proper test plan takes into consideration all possible types of failures, codes them into an automated test, and tries them over and over. Good testing won't find all of the bugs, but it will get you to the point where most of the issues that correspond to the original design are ironed out.

56 56 DATABASE SECURITY SECURITY CONCERNS AND MEASURES

57 57 Database Integrity Database integrity ensures that data entered into the database is accurate, valid, and consistent. Any applicable integrity constraints and data validation rules must be satisfied before permitting a change to the database. Business applications have several similar problems such as:
- Multiple users trying to change the same data
- Multiple changes need to be made concurrently

58 58 Database Integrity… For example: A customer uses the ATM and instructs it to transfer 20,000 shillings from the savings account to the current account. This transaction requires two steps:
1) subtracting money from the savings account
2) adding money to the current account
These are two updates or SQL statements. If the system crashes in between, the customer could lose their money.

59 59 Database Integrity… How does the computer know that both operations must be completed at the same time? As an application developer, you must tell the computer system what operations belong to a transaction. You do this by marking the start and the end of all transactions inside the code. This ensures that all the updates complete together or fail together.
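A minimal sketch of marking a transaction, using Python's built-in sqlite3 module and the 20,000 shilling transfer from the ATM example. The account table, the starting balances and the crash_midway flag are assumptions added for illustration. The `with conn:` block marks the transaction boundaries: it commits if the block finishes and rolls back if an exception escapes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER NOT NULL)")
conn.executemany("INSERT INTO account VALUES (?, ?)",
                 [("savings", 50000), ("current", 5000)])
conn.commit()


def transfer(amount, crash_midway=False):
    """Move money from savings to current as one all-or-nothing transaction."""
    try:
        with conn:  # start of transaction; commit on success, rollback on error
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE name = 'savings'",
                (amount,))
            if crash_midway:
                # Stand-in for a system failure between the two updates.
                raise RuntimeError("simulated crash")
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE name = 'current'",
                (amount,))
        return True
    except RuntimeError:
        return False


def balances():
    return dict(conn.execute("SELECT name, balance FROM account"))


failed = transfer(20000, crash_midway=True)
after_crash = balances()   # the first update was rolled back, nothing is lost
ok = transfer(20000)
after_ok = balances()      # both updates were applied together
print(after_crash, after_ok)
```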

60 60 Database Integrity… Concurrent access can also be problematic. An example is if two people try to change the same data at the same time. Some data could be overwritten and lost. One solution is to prevent concurrent access by forcing transactions to be completely isolated.

61 61 Concurrent Access
- Multiple users or processes changing the same data at the same time.
- Final data will be wrong!
Force sequential
- Locking
- Delayed, batch updates
Two processes
- Receive payment ($200)
- Place new order ($150)
Initial balance $800
- Result should be $800 - 200 + 150 = $750
- Interference result is either $600 or $950
Customers table: ID Jones, Balance $800, then $600, then $950
Receive Payment
1) Read balance: 800
2) Subtract pmt: -200
4) Save new bal.: 600
Place New Order
3) Read balance: 800
5) Add order: 150
6) Write balance: 950
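The six numbered steps can be replayed in plain Python to see where the update is lost. This is a single-threaded sketch that fixes the interleaving by hand; the function and variable names are assumptions.

```python
def run_interleaved():
    """Replay steps 1 to 6 in the order shown on the slide."""
    balance = {"Jones": 800}

    payment_copy = balance["Jones"]    # 1) Receive Payment reads 800
    payment_copy -= 200                # 2) subtracts the payment
    order_copy = balance["Jones"]      # 3) Place New Order reads 800 (now stale)
    balance["Jones"] = payment_copy    # 4) Receive Payment saves 600
    order_copy += 150                  # 5) Place New Order adds the order
    balance["Jones"] = order_copy      # 6) writes 950, overwriting the 600

    return balance["Jones"]


def run_sequential():
    """Same two changes, but the second starts only after the first has saved."""
    balance = {"Jones": 800}
    balance["Jones"] -= 200
    balance["Jones"] += 150
    return balance["Jones"]


print(run_interleaved(), run_sequential())
```

With the order of steps 4 and 6 swapped, the interleaved run would end at 600 instead, which is the other interference result listed above.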

62 62 Pessimistic Locks: Serialization
One answer to concurrent access is to prevent it. When a transaction needs to alter data, it places a SERIALIZABLE lock on the data used, so no other transactions can even read the data until the first transaction is completed.
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE, READ WRITE
Customers table: ID Jones, Balance $800, then $600
Receive Payment
1) Read balance: 800
2) Subtract pmt: -200
4) Save new bal.: 600
Place New Order
3) Read balance: receive error message that it is locked.
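The same idea can be imitated inside one Python program with threading.Lock. This is an analogy only, not a DBMS lock: the names are assumptions, and a blocked thread here waits for the lock instead of receiving an error message as on the slide.

```python
import threading

balance = {"Jones": 800}
lock = threading.Lock()


def apply_change(delta):
    """Read, modify and write the balance while holding an exclusive lock."""
    with lock:  # no other thread can read or write until this block ends
        current = balance["Jones"]
        balance["Jones"] = current + delta


# Receive payment (-200) and place new order (+150) run as separate threads.
threads = [threading.Thread(target=apply_change, args=(delta,))
           for delta in (-200, 150)]
for t in threads:
    t.start()
for t in threads:
    t.join()

final_balance = balance["Jones"]
print(final_balance)
```

Whichever thread gets the lock first, the other sees the saved result, so the final balance is the same as in the sequential case.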

63 63 Database Integrity The concept of integrity is fundamental to databases. One of the strengths of the database approach is that the DBMS has tools to handle the common problems. In terms of transactions, many of these concepts can be summarized in the acronym ACID. The following figure shows the meaning of the term.

64 64 ACID Transactions
- Atomicity: all changes succeed or fail together.
- Consistency: all data remain internally consistent (when committed) and can be validated by application checks.
- Isolation: the system gives each transaction the perception that it is running in isolation. There are no concurrent access issues.
- Durability: when a transaction is committed, all changes are permanently saved even if there is a hardware or system failure.

65 65 Referential Integrity Referential integrity is a property of data that applies (or fails to apply) to a database as a whole. In this sense, referential integrity means that in the database as a whole, things are set up in such a way that if a column exists in two or more tables in the database (typically as a primary key in one table and as a foreign key in one or more other tables), then any change to a value in that column in any one table will be reflected in corresponding changes to that value where it occurs in other tables. This means that the RDBMS must be set up so as to take appropriate actions to spread a change in one table from that table to the other tables where the change must also occur.
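One common way to have the RDBMS spread such a change is a foreign key declared with ON UPDATE CASCADE. The sketch below uses Python's sqlite3 module; the department and employee tables are assumptions, and SQLite only enforces foreign keys after `PRAGMA foreign_keys = ON`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves enforcement off by default
conn.executescript("""
    CREATE TABLE department (
        dept_id INTEGER PRIMARY KEY,
        name    TEXT
    );
    CREATE TABLE employee (
        emp_id  INTEGER PRIMARY KEY,
        name    TEXT,
        -- Foreign key: a change to department.dept_id is copied here.
        dept_id INTEGER REFERENCES department(dept_id) ON UPDATE CASCADE
    );
    INSERT INTO department VALUES (10, 'Sales');
    INSERT INTO employee VALUES (1, 'Jones', 10);
""")

# Change the primary key value in one table only.
conn.execute("UPDATE department SET dept_id = 20 WHERE dept_id = 10")

# The matching foreign key value in the other table follows automatically.
emp_dept = conn.execute(
    "SELECT dept_id FROM employee WHERE emp_id = 1").fetchone()[0]
print(emp_dept)
```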

66 66 Database Security
The major technical areas of computer security are usually represented by the initials CIA: confidentiality, integrity, and authentication or availability.
- Confidentiality means that information cannot be accessed by unauthorized parties. Confidentiality is also known as secrecy or privacy.
- Integrity means that information is protected against unauthorized changes that are not detectable to authorized users.
- Authentication means that users are who they claim to be.
- Availability means that resources are accessible by authorized parties.

67 67 Database Security Database security is the system, processes, and procedures that protect a database from unintended activity. Unintended activity can be categorized as authenticated misuse, malicious attacks or inadvertent mistakes made by authorized individuals or processes. Database security is also a specialty within the broader discipline of computer security

68 68 Database Security cont… Traditionally databases have been protected from external connections by firewalls on the network perimeter with the database environment existing on the internal network. Additional network security devices that detect and alert on malicious database protocol traffic include network intrusion detection systems along with host-based intrusion detection systems. Database security is more critical as networks have become more open.

69 69 Firewalls A firewall is a part of a computer system or network that is designed to block unauthorized access while permitting authorized communications. It is a device or set of devices that is configured to permit or deny network transmissions based upon a set of rules and other criteria.

70 70 Firewall cont… Firewalls can be implemented in either hardware or software, or a combination of both. Firewalls are frequently used to prevent unauthorized Internet users from accessing private networks connected to the Internet, especially intranets. All messages entering or leaving the intranet pass through the firewall, which inspects each message and blocks those that do not meet the specified security criteria.

71 71 Firewall

72 72 Vulnerability Assessments An important procedure when evaluating database security is performing vulnerability assessments against the database. A vulnerability assessment attempts to find vulnerability holes that could be used to break into the database. Database administrators or information security administrators run vulnerability scans on databases to discover a breach of controls, along with known vulnerabilities within the database software. The results of the scans should be used to harden the database in order to mitigate the threat of compromise by intruders.

73 73 Database Security cont… A database security program should include the regular review of permissions granted to individually owned accounts and accounts used by automated processes. The accounts used by automated processes should have appropriate controls around password storage, such as sufficient encryption and access controls, to reduce the risk of compromise.

74 74 Database Security cont… In conjunction with a sound database security program, an appropriate disaster recovery program should exist to ensure that service is not interrupted during a security incident or any other incident that results in an outage of the primary database environment. An example is replication of the primary databases to sites located in different geographical regions.

75 75 Database Security cont… Native database audit capabilities are also available for many database platforms. The native audit trails are extracted on a regular basis and transferred to a designated security system where the database administrators do not have access. This ensures a certain level of segregation of duties that may provide evidence that the native audit trails were not modified by authenticated administrators.

76 76 Database Forensics A forensic examination of a database may relate to the timestamps that apply to the update time of a row in a relational table being inspected and tested for validity in order to verify the actions of a database user. Alternatively, a forensic examination may focus on identifying transactions within a database system or application that indicate evidence of wrongdoing, such as fraud. The forensic study of relational databases requires a knowledge of the standard used to encode data on the computer disk.

77 77 Physical Security
Hardware
- Preventing problems: fire prevention, site considerations, building design
- Hardware backup facilities: continuous backup (mirror sites), hot sites, shell sites, "sister" agreements
- Telecommunication systems
- Personal computers
Data and software
- Backups
- Off-site backups
- Personal computers: policies and procedures, network backup
Disaster planning
- Write it down
- Train all new employees
- Test it once a year
- Telecommunications
Allowable time between disaster and business survival limits.

78 78 Threats The primary threat to any company comes from insiders. Employees must be trusted, because in order for them to do their jobs they need access to the computers and the database. Once they are granted access it becomes more difficult to control what they do. Another threat comes from programmers.

79 79 Threats… One technique used by programmers is to insert a time bomb in a program. A time bomb requires a programmer to enter a secret code every day. If the programmer is sacked or leaves work and cannot enter the code, the program starts deleting files. In other cases programmers have deliberately created programs that alter data or transfer money to their accounts.

80 80 Managerial Controls
"Insiders"
- Hiring
- Termination
- Monitoring
- Job segmentation
- Physical access limitations: locks, guards and video monitoring, badges and tracking

81 81 Managerial Controls…
Consultants and business alliances
- Limited data access
- Limited physical access
- Paired with employees

82 82 Logical Security
Unauthorized disclosure. Unauthorized modification. Unauthorized withholding.
Disclosure example
- Letting a competitor see the strategic marketing plans.
Modification example
- Letting employees change their salary numbers.
Withholding example
- Preventing a finance officer from retrieving data needed to get a bank loan.

83 83 Basic Security Ideas
Limit access to hardware
- Physical locks.
- Video monitoring.
- Fire and environment monitors.
- Employee logs / cards.
- Dial-back modems
Monitor usage
- Hardware logs.
- Access from network nodes.
- Software and data usage.
Background checks
- Employees
- Consultants
Dialback modem
- User calls modem
- Modem gets name, password
- Modem hangs up phone
- Modem calls back user
- Machine gets final password
[Diagram: user and modem connected through the phone company, with steps numbered 1 to 5 and a callback table: Jones 1111, Smith 2222, Olsen 3333, Araha 4444]

84 84 Separation of Duties
Supplier
SupplierID   Name          …
673          Acme Supply
772          Basic Tools
983          Common X
PurchaseOrder
OrderID   SupplierID
8882      772
8893      673
8895      009
Referential integrity
Clerk must use SupplierID from the Supplier table, and cannot add a new supplier. Purchasing manager can add new suppliers, but cannot add new orders.
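The referential integrity half of this control can be sketched with Python's sqlite3 module, reusing the supplier and order numbers from the slide. The place_order helper is an assumption. SQLite has no user accounts or GRANT statement, so the rule about who may insert into which table is not shown here and would need a server DBMS.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves enforcement off by default
conn.executescript("""
    CREATE TABLE Supplier (
        SupplierID INTEGER PRIMARY KEY,
        Name       TEXT
    );
    CREATE TABLE PurchaseOrder (
        OrderID    INTEGER PRIMARY KEY,
        SupplierID INTEGER NOT NULL REFERENCES Supplier(SupplierID)
    );
    INSERT INTO Supplier VALUES (673, 'Acme Supply'), (772, 'Basic Tools'), (983, 'Common X');
""")


def place_order(order_id, supplier_id):
    """Clerk's action: an order is accepted only for an existing supplier."""
    try:
        with conn:
            conn.execute("INSERT INTO PurchaseOrder VALUES (?, ?)",
                         (order_id, supplier_id))
        return "accepted"
    except sqlite3.IntegrityError:
        return "rejected"


# Order 8895 names supplier 009, which is not in the Supplier table.
results = [place_order(8882, 772), place_order(8893, 673), place_order(8895, 9)]
print(results)
```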

85 85 Encryption
Protection for open transmissions
- Networks
- The Internet
- Weak operating systems
Single key (AES)
Dual key
- Protection
- Authentication
[Diagram, single key (e.g., AES): plain text message, AES with key 9837362, encrypted text, AES, plain text message]

86 86 Dual Key Encryption
Using Bob's private key ensures it came from him. Using Alice's public key means only she can read it.
[Diagram: Alice and Bob. Public keys: Alice 29, Bob 17. Private keys: 13 and 37. Key steps shown: use Bob's public key, use Bob's private key, use Alice's public key, use Alice's private key. Stages: Message, Encrypt+T, Encrypt+T+M, Encrypt+M, Transmission]
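The two claims above can be checked with a toy version of RSA, one well-known dual key scheme. The slide does not name an algorithm, so RSA is an assumption here, as are the tiny primes and the helper names; keys this small offer no security, and the key numbers differ from those in the diagram. The three-argument pow with a negative exponent needs Python 3.8 or later.

```python
def make_keypair(p, q, e):
    """Build a toy RSA key pair from two small primes (illustration only)."""
    n = p * q
    phi = (p - 1) * (q - 1)
    d = pow(e, -1, phi)          # private exponent: inverse of e modulo phi
    return (e, n), (d, n)        # (public key, private key)


def apply_key(value, key):
    """Encrypting, decrypting, signing and verifying are all modular powers."""
    exponent, modulus = key
    return pow(value, exponent, modulus)


bob_public, bob_private = make_keypair(61, 53, 17)       # modulus 3233
alice_public, alice_private = make_keypair(89, 97, 5)    # modulus 8633

message = 65                     # must be smaller than Bob's modulus

# Sender (Bob): his private key proves origin, Alice's public key hides content.
signed = apply_key(message, bob_private)
sent = apply_key(signed, alice_public)

# Receiver (Alice): only her private key opens it, Bob's public key checks origin.
unlocked = apply_key(sent, alice_private)
received = apply_key(unlocked, bob_public)
print(received)
```

Alice's modulus is deliberately the larger one, so the value signed with Bob's key always fits inside it.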

87 87 Backup and Recovery
Backups are crucial! Offsite storage!
Scheduled backup.
- Regular intervals.
- Record time.
- Track backups.
Journals / logs
Checkpoint
Rollback / Roll forward
[Diagram labels: Snapshot, Changes, Journal/Log. Three successive states of the order table are shown:]
OrdID   Odate    Amount   ...
192     2/2/01   252.35   …
193     2/2/01   998.34   …

OrdID   Odate    Amount   ...
192     2/2/01   252.35   …
193     2/2/01   998.34   …
194     2/2/01   77.23    ...

OrdID   Odate    Amount   ...
192     2/2/01   252.35   …
193     2/2/01   998.34   …
194     2/2/01   77.23    …
195     2/2/01   101.52   …
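Roll forward can be sketched as replaying the journal on top of the last snapshot. The order numbers and amounts come from the slide; the dictionary layout, the journal entry format and the roll_forward function are assumptions.

```python
# Snapshot: the order table as it was at the last backup.
snapshot = {192: 252.35, 193: 998.34}

# Journal/log: every change made since the snapshot, in order.
journal = [
    ("insert", 194, 77.23),
    ("insert", 195, 101.52),
]


def roll_forward(snapshot, journal):
    """Rebuild the current table by replaying logged changes on the snapshot."""
    restored = dict(snapshot)    # work on a copy; the backup stays untouched
    for operation, order_id, amount in journal:
        if operation in ("insert", "update"):
            restored[order_id] = amount
        elif operation == "delete":
            restored.pop(order_id, None)
    return restored


restored = roll_forward(snapshot, journal)
print(sorted(restored))
```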

88 88 Database Security
Authorization, Access Control:
- protect intranet from hordes: Firewalls
Confidentiality, Data Integrity:
- protect contents against snoopers: Encryption
Authentication:
- both parties prove identity before starting transaction: Digital certificates
Non-repudiation:
- proof that the document originated from you and you only: Digital signature

89 89 What can go wrong? Security issues
Intruders
- Casual prying (read other people's e-mail, documents, etc.)
- Snooping by insiders
- Determined attempt to make money
- Commercial or military espionage
- Simply for fun or to prove it can be done
How to deal with intruders
- Identify every user
- Advise users to log off when they leave their desk
- Limit the privileges of users
- Log files to monitor users' activity
- Encryption
- Etc.

90 90 Insiders
What could some of the employees do?
- Read other people's emails
- Attempt to read documents and access information that is NOT intended for their eyes
- Commercial espionage
- Install unauthorised software

91 91 Insiders…
How to prevent all of the above?
- Each employee should log in to the system using a unique username / password
- Advise all employees not to disclose their password to anyone
- Advise all employees to log off when they leave their desk
- Advise all employees to change their password regularly

92 92 Insiders…
- Put in place a system that tracks employees' actions and network resources accessed
- Limit privileges of employees, allowing them to perform only authorised tasks and obtain only authorised information
- Encrypt or password protect all confidential documents / data
- Any other measures?

93 93 Outsiders
What could they do?
- As a hobby, prove that "it can be done"
- Commercial and military espionage
- Access bank accounts
- Access and use other people's credit card details
- Shut down systems, etc.

94 94 Outsiders…
How to prevent outsiders gaining access to resources
- Identify every user of the system
- Put in place a system that tracks users' actions and network resources accessed
- Encrypt confidential documents / data
- Put firewalls in place to protect the network
- Keep all software and operating systems up to date to prevent hackers from exploiting security holes

95 95 Have a security policy in place and ENFORCE IT
- Have clear guidelines as to how security should be implemented
- Management has to make sure that all IT technicians apply all the security measures
- Management has to make sure that all employees are aware of the security measures and apply them
Technology used to implement security guidelines
- Sophisticated tools used to analyse, interpret, configure and monitor the state of the network security

96 96 Identify each user…
- Install access control programs and physical security devices on all systems. Access control programs run extra checks on users before allowing access. Physical security devices include biometric scanning devices fitted to a computer which check a user's face, retina, fingerprint, hand, voice, typing rhythm, signature and so on against a set of stored data for all legitimate users.
- Make sure to delete the accounts of employees no longer working for the company.

97 97 Monitor the network
Security monitor
- Test and monitor the state of the network security
Technology used to monitor the network
- Network log files that record who logged in, for how long, from which computer, what resources they have accessed, etc.

98 98 Monitor the network…
- Network vulnerability scanners
- Antivirus software
- Disaster recovery backup technology
Check security logs and audit trails regularly
Regularly conduct a thorough risk analysis of the network
Have a disaster recovery plan

99 99 Monitor and restrict access from outside into the network
Monitor remote access into the network by
- Allowing only a limited number of attempts to log in
- Blocking the account if all attempts to log in are unsuccessful
- Using log files to monitor the resources accessed by remote users
Put firewalls in place before allowing Internet access
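The three remote access rules above can be sketched as a small in-memory guard. Everything here is an assumption made for illustration: the class name, the limit of three attempts and the plain-text password table. A real system would store salted password hashes and keep the log in a protected location.

```python
MAX_ATTEMPTS = 3  # assumed limit; the slide does not give a number


class RemoteLoginGuard:
    """Limits login attempts, blocks the account, and logs every attempt."""

    def __init__(self, passwords, max_attempts=MAX_ATTEMPTS):
        self.passwords = passwords        # illustration only: never store plain text
        self.max_attempts = max_attempts
        self.failures = {}                # user -> consecutive failed attempts
        self.blocked = set()
        self.log = []                     # (user, outcome) pairs for later review

    def login(self, user, password):
        if user in self.blocked:
            outcome = "blocked"
        elif self.passwords.get(user) == password:
            self.failures[user] = 0
            outcome = "ok"
        else:
            self.failures[user] = self.failures.get(user, 0) + 1
            if self.failures[user] >= self.max_attempts:
                self.blocked.add(user)    # all allowed attempts used up
                outcome = "blocked"
            else:
                outcome = "denied"
        self.log.append((user, outcome))
        return outcome


guard = RemoteLoginGuard({"jones": "s3cret"})
outcomes = [guard.login("jones", guess) for guess in ("a", "b", "c", "s3cret")]
print(outcomes)
```

Once the account is blocked, even the correct password is refused until an administrator intervenes, which is the behaviour the second rule asks for.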

100 100 Database Security Summary Stay aware of data security holes Explore possible third-party options Perform audit tests on your databases regularly Encryption of data in motion Encryption of data at rest within the database

101 101 Monitor your log files Implement Intrusion Detection P.S. Provide multiple levels of security The data stored in a database is managed by a Database Management System (DBMS).

102 102 Data Warehousing A data warehouse is where information is organized for quick retrieval. Data is drawn from different sources (usually databases) that were set up for different purposes

103 103 Differences from a Traditional Database Data is organized around major subjects rather than individual transactions Summarized data is used rather than detailed data Data is framed for long-term decision making Warehouses are organized for quick queries rather than for efficient storage

104 104 Optimized for complex queries known as OLAP (online analytical processing). Allows managers to look at a database along different dimensions Allows easy access via data mining software that searches for patterns and is able to identify relationships
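
To illustrate what looking at the same data along different dimensions means, here is a small Python sketch that totals one set of sales facts by region, by product and by quarter. The data and the function name `roll_up` are made up for the example; real OLAP tools do this over far larger, pre-summarized data.

```python
from collections import defaultdict

# Illustrative sales facts: (region, product, quarter, amount).
sales = [
    ("North", "Milk", "Q1", 120),
    ("North", "Bread", "Q1", 80),
    ("South", "Milk", "Q1", 200),
    ("South", "Milk", "Q2", 150),
    ("North", "Bread", "Q2", 90),
]

# Position of each dimension inside a fact tuple.
DIMENSIONS = {"region": 0, "product": 1, "quarter": 2}


def roll_up(facts, dimension):
    """Total the sales amount along one named dimension."""
    index = DIMENSIONS[dimension]
    totals = defaultdict(int)
    for fact in facts:
        totals[fact[index]] += fact[3]
    return dict(totals)


print(roll_up(sales, "region"))
print(roll_up(sales, "product"))
print(roll_up(sales, "quarter"))
```

Each call answers a different management question from the same facts, and the grand total is the same whichever dimension is chosen.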

105 105 Include multiple databases that have been processed so that data is uniform (clean data) They include data from outside sources as well as data generated internally Building a warehouse is complex. An analyst gathers information from a variety of sources and translates it into a common form, e.g. one database could record gender as “male” and “female”, another as “M” and “F”, and a third as “0” and “1”
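
The gender example can be sketched in Python as a simple lookup that translates every source's coding into one common form. Which of “0” and “1” stands for which gender is an assumption here, because the slide does not say; the function name `clean_gender` is also hypothetical.

```python
# Mapping from each source's coding to one common form ("M" / "F").
# Assumption: "0" means male and "1" means female in the third source.
# The slide does not say which is which, so check the source's own documentation.
GENDER_CODES = {
    "male": "M", "female": "F",
    "m": "M", "f": "F",
    "0": "M", "1": "F",
}


def clean_gender(raw):
    """Translate a raw gender value to "M" or "F"; return None if unrecognized."""
    return GENDER_CODES.get(str(raw).strip().lower())


records = ["male", "F", 1, " Female ", "0", "unknown"]
print([clean_gender(value) for value in records])
```

Values that cannot be translated come back as `None` instead of being guessed, so the analyst can review them before they enter the warehouse.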

106 106 Once clean, the analyst has to decide how to summarize data and predict the type of queries that might be asked (details are usually lost during summarization). The warehouse is then designed both logically and physically Note: the analyst must know a lot about the business. Because of its size, a warehouse is expensive

107 107 Data Mining Data mining can identify patterns that humans are unable to detect The data mining algorithms search data warehouses for patterns. It is also known as Knowledge Discovery in Databases (KDD).

108 108 Software for Data Mining Data mining tools, known as decision aids, include: Statistical analysis software Neural networks Fuzzy networks Intelligent agents Logic and data visualization

109 109 Patterns that decision makers try to identify include: Associations: Patterns that occur together at the same time. For example, a person who buys milk usually buys bread Sequences: Actions that take place over a period of time, e.g. if a family buys a house this year, they will most likely buy a fridge and cooker next year.

110 110 Clustering: A pattern that develops among a group of people, e.g. customers who live in a particular area tend to buy a particular product Trends: Patterns that are noticed over a period of time, e.g. customers may move from buying processed food to natural foods (herbal products) or African attire
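
Two of the patterns described above, associations and clustering, can be sketched in a few lines of Python. The baskets, areas and products are invented for illustration. Support and confidence are standard association-rule measures that the slides do not name, and the per-area count is a simple group-and-count rather than a full clustering algorithm.

```python
from collections import Counter, defaultdict

# Illustrative shopping baskets; the items are made up.
baskets = [
    {"milk", "bread"},
    {"milk", "bread", "eggs"},
    {"milk"},
    {"bread", "jam"},
    {"milk", "bread", "jam"},
]


def count_containing(baskets, items):
    """Number of baskets that contain every item in `items`."""
    return sum(1 for basket in baskets if items <= basket)


def support(baskets, items):
    """Fraction of all baskets that contain every item in `items`."""
    return count_containing(baskets, items) / len(baskets)


def confidence(baskets, antecedent, consequent):
    """Of the baskets with `antecedent`, the fraction that also hold `consequent`."""
    return (count_containing(baskets, antecedent | consequent)
            / count_containing(baskets, antecedent))


# Illustrative purchases: (customer's area, product bought).
purchases = [
    ("Area A", "herbal tea"), ("Area A", "herbal tea"), ("Area A", "soda"),
    ("Area B", "soda"), ("Area B", "soda"), ("Area B", "herbal tea"),
]


def favourite_product_by_area(purchases):
    """Most frequently bought product in each area."""
    counts = defaultdict(Counter)
    for area, product in purchases:
        counts[area][product] += 1
    return {area: c.most_common(1)[0][0] for area, c in counts.items()}


print(support(baskets, {"milk", "bread"}))
print(confidence(baskets, {"milk"}, {"bread"}))
print(favourite_product_by_area(purchases))
```

In this sample, three of the four baskets that contain milk also contain bread, which is the kind of evidence behind a statement such as "a person who buys milk usually buys bread".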

111 111 Data mining is also used to target customers, on the assumption that past behavior is a good predictor of future behavior. A large amount of data is captured about a particular person, and companies share this information. Credit companies have taken advantage of this to target customers.

112 112 Problems with Data Mining Cost could be too high to justify data mining Coordination of several customers or departments could be problematic Customers could resent their privacy being invaded and reject the offers that are coming their way Erroneous profiles could be made of people, stored, and not deleted. The police could act on these profiles without meeting the people

113 113 Ethical Issues Analysts should take responsibility for considering the ethical aspects of any data mining projects that are proposed. Length of time the material is kept Privacy safeguards should be installed Confidentiality of the material The uses to which inferences are put should be discussed and considered with the client.

114 114 The opportunities for abuse are apparent and must be guarded against. For consumers, data mining is a push technology, and if consumers do not want to be pushed, data mining efforts could backfire.

115 115 Data Warehousing [Diagram] Internal data sources (accounting, operational, customer, manufacturing and historical databases) and external data sources (external databases) feed data extraction and transformation (extract, filter, transform, classify, aggregate, summarize) into data warehouses (customer data, product data, sales data), which support data access and analysis (OLAP, data mining, querying, reporting) for business intelligence. Warehouse data is integrated, subject-oriented, time-variant and non-volatile.

116 116 END

