Information Systems for Health:


1 Information Systems for Health:
Health Informatics in Low- and Middle-Income Countries Short Course for Health Information System Professionals Information Systems for Health: Databases and Structured Query Language (SQL)

2 Databases and SQL Learning Objectives
Define and describe the purpose of databases Describe a relational database Describe data modeling and normalization The learning objectives for Databases and SQL are to: Define and describe the purpose of databases Describe a relational database Describe data modeling and normalization Describe the structured query language, or SQL

3 Data Storage Data management is a large component of computer systems.
Storing and retrieving data are important functions for: Efficiency Accessibility Speed This lecture explores various data storage options and the advantages and disadvantages of each. Data management is a large component of any computer system. Consider the information maintained on a personal computer; it might include programs, photos, music, videos, tax returns, and class papers. Some files may remain unchanged; others might be modified over time, such as revisions to a class paper. Now consider an electronic health record, or EHR, system that may contain information for thousands or tens of thousands of patients. With this volume of information, it is important that the information is stored efficiently, is easily accessible, and can be updated without too much difficulty.

4 Data Storage Options Text and data files Spreadsheets Databases
Data can be stored electronically in different ways. One way is to store them in a simple text file. Another is to store them in a spreadsheet, which is more powerful than a simple text file. Finally, data can be stored in databases, which is the topic of this unit. Before discussing databases, this lecture will provide information about the other options for data storage and when it is appropriate to use them.

5 Files A collection of data stored electronically in a single location
Can store text or data Can have different formats A file is a collection of data, stored in a single electronic location. How that information is stored in files is important. Files can contain text or data that are not readable by humans. If the data are to be accessed by a person, then they need to be human-readable. However, a computer system may use a different format; it just needs to know how to interpret the data. For example, an audio file and a text file contain information that is stored in different formats. A text editor cannot edit an audio file, and a music player cannot play a text file. An audio file is not readable by humans, but its data can be interpreted by a music player and converted to the music that humans listen to.

6 Advantages and Disadvantages of Files
Easy to create and store Easy to share Used by many applications Disadvantages Limited security Multiple user access is not supported Redundant and inconsistent data Files are stored within file systems, specific for each operating system. Files can be easily created and shared; and shared drives are some of the sharing options. Files can be accessed and used by different applications. For example, genomic data are often stored in large data files that are searched and parsed by different programs. Files have limitations. The security of files is limited to that of the file system. Also, by default, multiple users cannot use the same file at the same time. Usually, only one user can open the file for editing; any additional users open a read-only copy of the file. Finally, using files to store structured data with relationships can result in redundancy and inconsistency, as shown in the following example.

7 Contact Information Example File with contact information: Sriveni Sharma, 1312 Main, Portland, OR, Community Hospital, Inc. Walter Chen, 14 12th St., Oakland, CA, Oakland Providers LLC Rachel Cohen, 1414 Main St., Oakland, CA, Oakland Providers LLC Karthik Subramanian, 1312 Main Street, Portland, OR, Community Hospital Incorporated Kelly David, 14 12th Street, Oakland, CA, Oakland Providers LLC This slide shows a file that contains names and contact information of some individuals.

8 Quick! Do Sriveni and Karthik work for the same company?
Is there an issue with Kelly’s and Walter’s information? Can a computer application tell? Give me a contact list sorted by last name. Imagine if the list had 10,000 contacts! After reviewing the previous slide, answer the following questions: Do Sriveni and Karthik work for the same company? If a computer application was looking at the data, would it be able to tell that there was an issue with the addresses for Kelly and Walter? Can you sort this list by last name? Could you sort a list of 10,000 contacts?

9 Quick! Answers Sriveni and Karthik work for the same company, but it is represented differently. Kelly and Walter have the same address, but it is represented differently. It is hard for a computer application to tell. You CAN sort by hand—but it is a challenge. While it is easy to see that Sriveni and Karthik work for the same company, note that Sriveni’s company name is Community Hospital Inc. and Karthik’s is Community Hospital Incorporated. There is a similar issue with Kelly’s and Walter’s information. Kelly’s address is 14 12th Street with “street” spelled out. Walter has the same address, but notice in his case the word “street” is abbreviated to St. Humans can easily handle these variations in data and determine that they are the same. However, a computer system, even one with an artificial intelligence system, would have significantly greater challenges in determining that the companies and addresses are the same. And while sorting these few entries may be feasible just looking at the list, sorting a file with 10,000 contacts would be extremely time-consuming without the use of software.
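
The difficulty can be sketched in a few lines of Python. The contact data below are transcribed from the example file; the helper names `same_company` and `same_street` are made up for illustration, and exact text comparison is only an assumption about how a simple application might compare entries.

```python
# Contacts from the example file: (name, street, city, state, company).
contacts = [
    ("Sriveni Sharma", "1312 Main", "Portland", "OR", "Community Hospital, Inc."),
    ("Walter Chen", "14 12th St.", "Oakland", "CA", "Oakland Providers LLC"),
    ("Rachel Cohen", "1414 Main St.", "Oakland", "CA", "Oakland Providers LLC"),
    ("Karthik Subramanian", "1312 Main Street", "Portland", "OR",
     "Community Hospital Incorporated"),
    ("Kelly David", "14 12th Street", "Oakland", "CA", "Oakland Providers LLC"),
]
by_name = {c[0]: c for c in contacts}


def same_company(a, b):
    """Naive check: the company text must match character for character."""
    return by_name[a][4] == by_name[b][4]


def same_street(a, b):
    """Naive check: the street text must match character for character."""
    return by_name[a][1] == by_name[b][1]


# A human sees the same company and the same street; exact matching does not.
sriveni_karthik = same_company("Sriveni Sharma", "Karthik Subramanian")
kelly_walter = same_street("Kelly David", "Walter Chen")

# Sorting is the easy part for software (assuming the last word is the last name).
sorted_names = sorted(by_name, key=lambda name: name.split()[-1])

print(sriveni_karthik, kelly_walter)
print(sorted_names)
```

Both comparisons come back false even though a person would say the entries match, while the sort that is tedious by hand takes a single line.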

10 Another Problem What do you do if “Community Hospital” becomes “Community General”? Find every instance of “Community Hospital” or variation thereof. Change EVERY entry. A bigger challenge might be if “Community Hospital, Inc.” becomes “Community General”. Whether this change is made manually or with an automated system, every single instance of “Community Hospital” would have to be identified in the data. Additionally, every name variation of “Community Hospital”, for example, “Community Hospital, Inc.” and “Community Hospital Incorporated”, would also need to be located; and there is no guarantee that the word “Community” was spelled correctly in every instance. When all of the entries are identified, each one needs to be modified to correctly read “Community General”. If done manually, this has the potential for human data-entry error and in a large system would be very time-consuming. If it is done using a simple automated search-and-replace function, it may not take as long, but it may also make unintended partial changes to other existing records, for example, changing Portland Community Hospital to Portland Community General.
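
A minimal sketch of the search-and-replace risk, using Python's built-in `str.replace`. The three record strings are shortened for illustration, and the third record (a contact at Portland Community Hospital) is hypothetical.

```python
records = [
    "Sriveni Sharma, Community Hospital, Inc.",
    "Karthik Subramanian, Community Hospital Incorporated",
    # Hypothetical contact at a different organization with a similar name.
    "Maria Perez, Portland Community Hospital",
]

# Simple automated search and replace over every record.
updated = [r.replace("Community Hospital", "Community General") for r in records]

for line in updated:
    print(line)
```

The first two records keep their inconsistent suffixes ("Inc." and "Incorporated"), and the third record, which belongs to a different organization, is changed as well.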

11 Another Solution: Spreadsheets
Spreadsheet applications store, manipulate, and present data. Spreadsheets provide more functionality than plain text files by offering: Calculations Sorting Filtering Data analysis Spreadsheet applications were first developed for businesses to automate accounting tasks. Today, spreadsheets are widely used for storing, manipulating, and presenting data. Today's spreadsheet applications perform calculations using predefined or user-created formulas. They provide features for easily sorting and filtering data and can even perform data analysis. Advanced spreadsheet users can create very complex calculations and relationships between data. Spreadsheets have become very powerful tools for data analysis and manipulation. However, they still have the same limitations as plain text files, as shown on the following slide.

12 Spreadsheet Example Figure 1: OpenOffice Calc spreadsheet example
Here is an example of an OpenOffice Calc spreadsheet. Other spreadsheet applications include Microsoft Excel, Zoho Sheet, Sheetster, Apple iWork Numbers, and Google Sheets. The data are organized into numbered rows and lettered columns; column names can be seen in the first row. The data themselves do not look very different from the data in the simple text file; however, there are many options for manipulating and presenting these data, shown on the toolbars. Unlike plain text files, spreadsheets let us sort the data easily and quickly. However, spreadsheets have the same problems as the text file: data are defined multiple times (for example, company name and address), which is inefficient and error-prone. Figure 1: OpenOffice Calc spreadsheet example (Barratt, 2016)

13 Advantages and Disadvantages of Spreadsheets
Widely available Powerful calculations Basic sorting and filtering Disadvantages Limited security Multiple user access is not supported May contain redundant and inconsistent data Because spreadsheets are just a special type of file, they have similar advantages and disadvantages. While spreadsheets do require a special application, such as Microsoft Excel, these applications are widely available. Spreadsheet applications provide powerful calculations and basic sorting and filtering. But like files, they offer limited security, do not support access by multiple users at the same time, and may contain redundant and inconsistent data. Spreadsheets are good for doing calculations on static snapshots of data sets, but they aren't the best solution for long-term storage and access of data.

14 Database vs. File Systems
File system challenges: Data redundancy (different and conflicting versions of the same data). Results of uncontrolled data redundancy: data anomalies (modification, insertion, deletion), data inconsistency, and lack of data integrity. A database consists of logically related data stored in a single repository. It provides advantages over the file system management approach: it eliminates inconsistency, data anomalies, data dependency, and structural dependency problems, and it stores data structures, relationships, and access paths.

15 Databases Definition:
Structured data collection accessed electronically Files can be seen as simple databases Relational databases maintain relationships between data So what exactly is a database? A database is a structured data collection that is accessed electronically. The text file used in this lecture that contained contact information can be considered to be a very simple database. It contains organized, though not necessarily consistent, information, which might be accessed through a text editor. A relational database is one that maintains relationships between data elements and is the focus of this unit.

16 Relational Databases Introduced by Dr. Edgar Codd of IBM Research Laboratory in 1970 “Future users of large data banks must be protected from having to know how the data is organized in the machine (the internal representation).” Definition: An organized collection of data accessible by electronic means in which the information type and information relationships are maintained The concept of a relational database was first published by E. F. Codd in the journal “Communications of the ACM” in June 1970. Codd held the view that users should not have to keep track of how the information is stored in a computer in order to use it. To quote Codd, “Future users of large data banks must be protected from having to know how the data is organized in the machine (the internal representation).” So a relational database is an organized collection of data accessible by electronic means where the information type and information relationships are maintained internally by the system itself.

17 Database Design Process
Conceptual database design Process of constructing a model of information used in an enterprise, independent of all physical considerations. Logical database design Process of constructing a model of information used in an enterprise based on a specific data model (e.g. relational), but independent of a particular DBMS and other physical considerations. Physical database design Process of producing a description of the implementation of the database on secondary storage; it describes the base relations, file organizations, and indexes design used to achieve efficient access to the data, and any associated integrity constraints and security measures.

18 Advantages and Disadvantages of Relational Databases
Secure Multiple user access Relationships prevent redundancy and inconsistency Optimized operations Complex queries Disadvantages Expertise required Limited data calculations A relational database has quite a number of advantages over files and spreadsheets. Database systems are designed to be highly secure; control of the data can be precisely defined. In addition, databases are designed to be accessed and modified by multiple users at the same time. Relationships between tables support organized data that prevents data redundancy and inconsistency. The highly optimized underlying data structures used by the relational database result in highly efficient and fast access. Because a database system is designed for the specific purpose of data organization, the basic operations of retrieving, adding, modifying, and deleting data are more efficient than in general-purpose applications and storage such as spreadsheets and files. Furthermore, relationships and efficient access allow for complex queries and searches of data. However, databases are complex systems that require expertise to install, maintain, and use. There are free, open-source databases, but commercially available databases are expensive. In comparison, files and spreadsheets are more widely available and easy to use. Also, data in databases are not as easily analyzed using complex data calculations. Instead, data are usually exported from databases into a spreadsheet or data file for statistical software.

19 Database Management System Application Design
Steps in database design (Jerry Post, 2007): 1 Identify business rules. 2 Define tables and relationships. 3 Create input forms and reports. 4 Combine as applications for users. The business rules and data are used to define database tables. Forms are used to enter new data. The database system retrieves data to answer queries and produce reports. Users see only the application in terms of forms and reports.

20 Relational Database Contents
Tables contain multiple rows and columns of data. A relational database contains tables. Rows contain data of specified types in a column order. Data and type are independent. Row order does not matter, but column order does. A table is a collection of information organized into rows and columns. Each table contains one or more rows of data. A relational database consists of one or more tables defined by the database designer in a meaningful fashion. The data in a row is ordered by columns, and each column is of a known and specified type where the data and type are independent. The order of rows in the table is irrelevant, but the order of the columns in the row is significant.

21 Representing Data—1 Identify entities (tables)
Identify attributes (columns) Identify table relationships This lecture introduces the topics of data relationship types and structures, database normalization, and building a database. A database contains a set of tables each containing data. Each table is modeled by an entity—some unit of data that can be classified and have stated relationships to other entities. For example, a table can be a list of contacts. Within a table, rows are the entries, and columns are attributes that need to be defined. The relationships between different tables also need to be defined.

22 Representing Data—2 Identify entities (tables)
Identify attributes (columns) Identify table relationships Data can be modeled using an entity-relationship diagram, also known as an ER diagram or ERD. These diagrams show the entities, the attributes, and the relationships between entities. The relationships are shown using lines and can also be represented in words, such as "has a." For example, if there are two tables, one for storing information about a patient... Figure 1.1: Entity-relationship diagram

23 Representing Data—3 Identify entities (tables)
Identify attributes (columns) Identify table relationships Data can be modeled using an entity-relationship diagram, also known as an ER diagram or ERD. These diagrams show the entities, the attributes, and the relationships between entities. The relationships are shown using lines and can also be represented in words, such as "has a." For example, if there are two tables, one for storing information about a patient... Figure 1.2: Entity-relationship diagram with Patient table

24 Representing Data—4 Identify entities (tables)
Identify attributes (columns) Identify table relationships …and the other for storing information about an insurance company, then the relationship between them can be described as a "has a" relationship, since a patient has an insurance provider. This relationship can be represented as a simple line, but the next slide will show how it can be represented differently. Figure 1.3: Entity-relationship diagram with patient and insurance company tables

25 Relationships—1 One-to-one One-to-many Many-to-many
Relationships can be of three types or cardinalities: one-to-one, one-to-many, and many-to-many. In a one-to-one relationship, each entry in a table is linked to exactly one entry in another table and vice-versa. One-to-one relationships are not very common in databases, because two such tables would usually be merged into one. A one-to-one relationship is represented by a simple line. The one-to-many relationship is commonly used in databases…

26 Relationships—2 One-to-one One-to-many Many-to-many
…In this case, each entry in the first table is linked to one entry in the second, but each entry in the second table can be linked to multiple entries in the first table. Figure 1.4: Entity-relationship diagram with one-to-many relationship

27 Relationships—3 One-to-one One-to-many Many-to-many Patient
In the example of a patient and an insurance company, it can be assumed that a patient may have only ONE insurance provider. Therefore, each entry in the patient table is linked to just one entry in the second table. But, there may be multiple patients in the patient table who have the same insurance provider. One-to-many relationships are represented by a line with a crow's foot on one end. That crow’s foot points to the table that has the multiple links back to the other table. In this example, multiple patients can have one insurance provider, so the crow's foot is pointing to the patient table. Patient Insurance Company Figure 1.5: Entity-relationship diagram with one-to-many relationship between Insurance Company table and Patient table

28 Relationships—4 One-to-one One-to-many Many-to-many Patient
Now if a patient has more than one insurance provider, the relationship is considered to be many-to-many. Patient Insurance Company Figure 1.6: Entity-relationship diagram with tables for patient and insurance company

29 Relationships—5 One-to-one One-to-many Many-to-many Patient
In this case, an entry in the patient table can be linked to multiple entries in the insurance table and each entry in the insurance table can be linked to multiple entries in the patient table. Many-to-many relationships are represented by a line with a crow's foot at each end. Patient Insurance Company Figure 1.7: Entity-relationship diagram with many-to-many relationship between Patient and Insurance Company tables
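
The slides do not say how a many-to-many relationship is stored. One common approach, assumed here rather than taken from the lecture, is a third linking table that holds one row per patient and insurer pair. The sketch below uses Python's built-in `sqlite3` module; the table name `PatientInsurance`, the column names, and the sample rows are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Patient (PatientKey INTEGER PRIMARY KEY, Name VARCHAR(100));
CREATE TABLE InsuranceCompany (InsuranceKey INTEGER PRIMARY KEY, Name VARCHAR(100));
-- Hypothetical linking table: one row per (patient, insurer) pair.
CREATE TABLE PatientInsurance (
    PatientKey   INTEGER REFERENCES Patient(PatientKey),
    InsuranceKey INTEGER REFERENCES InsuranceCompany(InsuranceKey),
    PRIMARY KEY (PatientKey, InsuranceKey)
);
INSERT INTO Patient VALUES (1, 'Patient A'), (2, 'Patient B');
INSERT INTO InsuranceCompany VALUES (10, 'Insurer X'), (20, 'Insurer Y');
INSERT INTO PatientInsurance VALUES (1, 10), (1, 20), (2, 10);
""")

# One patient linked to many insurers...
insurers_for_a = [row[0] for row in conn.execute("""
    SELECT i.Name
    FROM PatientInsurance link
    JOIN InsuranceCompany i ON i.InsuranceKey = link.InsuranceKey
    WHERE link.PatientKey = 1
    ORDER BY i.Name
""")]

# ...and one insurer linked to many patients.
patients_for_x = [row[0] for row in conn.execute("""
    SELECT p.Name
    FROM PatientInsurance link
    JOIN Patient p ON p.PatientKey = link.PatientKey
    WHERE link.InsuranceKey = 10
    ORDER BY p.Name
""")]

print(insurers_for_a)
print(patients_for_x)
```

Each side of the linking table is an ordinary one-to-many relationship, which is why the crow's foot appears at both ends of the many-to-many line in the diagram.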

30 Requirements: Patient Visit
Identify the Entities, Attributes, and Relationship Patients make visits to doctors located in medical centers. Patients and medical centers have addresses that are stored in an Addresses table for easy validation. Medical procedures such as x-rays are carried out on patients. Each procedure has a cost, and the total cost of procedures for each visit is derived and stored. During a visit, the doctor might prescribe medication for the patient. Each medication has a cost, and the total cost is derived and stored. Source: Barry Williams DatabaseAnswers.org

31 Entities Patients Doctors Visits Medical centers Addresses
Medical procedures Medication

32 2/16/2019 Samuel Dery

33 Simple Database Structure
One entity/table Create a unique row identifier Attributes Key (integer) Person’s first name (varchar) Person’s last name (varchar) Company name (varchar) Company address (varchar) Company city (varchar) Company state (char) Figure 2: Contact attributes Consider a database containing contact data: for each contact, the name and company information are recorded. A simple database structure can be created using a single entity that looks like the contact file. The data about a contact comprise the attributes of the contact entity. A key, or identification number, is then added to uniquely identify each row. The contact’s name is not necessarily unique, so it cannot be used to identify each entry. For example, a user must be able to tell the difference between a “Maria Perez” who works at “Community Hospital” and a “Maria Perez” who works at “Main Medical Center.” The data type of each attribute also needs to be specified. For the names and addresses, a character data type, char or varchar, should be used; the key is usually an integer value, since it is more efficient to do numeric comparisons than text comparisons.
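
As a rough sketch, the single-table structure could be declared as follows with Python's built-in `sqlite3` module. The column names and lengths are assumptions chosen to mirror the attribute list; only the two "Maria Perez" entries come from the example above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite accepts VARCHAR/CHAR as type hints and does not enforce the lengths;
# the lengths here are illustrative guesses.
conn.execute("""
CREATE TABLE Contact (
    ContactKey     INTEGER PRIMARY KEY,  -- unique row identifier
    FirstName      VARCHAR(50),
    LastName       VARCHAR(50),
    CompanyName    VARCHAR(100),
    CompanyAddress VARCHAR(100),
    CompanyCity    VARCHAR(50),
    CompanyState   CHAR(2)
)
""")

# Two different people with the same name; the key is assigned automatically.
conn.executemany(
    "INSERT INTO Contact (FirstName, LastName, CompanyName) VALUES (?, ?, ?)",
    [
        ("Maria", "Perez", "Community Hospital"),
        ("Maria", "Perez", "Main Medical Center"),
    ],
)

keys = [row[0] for row in conn.execute(
    "SELECT ContactKey FROM Contact "
    "WHERE FirstName = 'Maria' AND LastName = 'Perez' ORDER BY ContactKey"
)]
print(keys)
```

The names are identical, but each row has its own key, so the two contacts can still be told apart.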

34 Contact Table Problems Company data are stored in multiple locations.
Figure 3: Contact table This slide shows an example of a contact table with some entries in it. The attributes of the table are the columns. Each entry in the table is a row. However, the company data are stored in multiple locations, and as a result, it is easy for the company data to be entered inconsistently. Additionally, if the general information about the company changes, such as a relocation to a new address or a name change, the entries in all the rows will need to be changed. Problems Company data are stored in multiple locations. Company data may be inconsistent. Significant work is needed to update data when company information changes.

35 Data Normalization Primarily used as a tool to validate and improve a logical design so that it satisfies certain constraints that avoid unnecessary duplication of data The process of decomposing relations with anomalies to produce smaller, well-structured relations

36 Well-Structured Relations
A relation that contains minimal data redundancy and allows users to insert, delete, and update rows without causing data inconsistencies Goal is to avoid anomalies Insertion anomaly—adding new rows forces user to create duplicate data Deletion anomaly—deleting rows may cause a loss of data that would be needed for other future rows Modification anomaly—changing data in a row forces changes to other rows because of duplication Rule of thumb: A table should not pertain to more than one entity type.
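
A small sketch of two of these anomalies, again using Python's `sqlite3` module. The single `Contact` table is cut down to three hypothetical columns, and the rows are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Contact ("
    "ContactKey INTEGER PRIMARY KEY, FirstName VARCHAR(50), CompanyName VARCHAR(100))"
)
conn.executemany(
    "INSERT INTO Contact (FirstName, CompanyName) VALUES (?, ?)",
    [
        ("Bill", "Community Hospital, Inc."),
        ("Albert", "Community Hospital, Inc."),
        ("Walter", "Oakland Providers LLC"),
    ],
)

# Modification anomaly: the company is renamed, but only one row is updated,
# so the same company now appears under two names.
conn.execute(
    "UPDATE Contact SET CompanyName = 'Community General' WHERE FirstName = 'Bill'"
)
names_after_update = sorted(
    row[0] for row in conn.execute("SELECT DISTINCT CompanyName FROM Contact")
)

# Deletion anomaly: removing the only contact at a company also removes
# everything the table knew about that company.
conn.execute("DELETE FROM Contact WHERE FirstName = 'Walter'")
oakland_rows = conn.execute(
    "SELECT COUNT(*) FROM Contact WHERE CompanyName = 'Oakland Providers LLC'"
).fetchone()[0]

print(names_after_update)
print(oakland_rows)
```

Both problems arise because this one table describes two entity types, people and companies, which is what the rule of thumb warns against.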

37 Database Normalization
Prevent data inconsistency Prevent update errors Eliminate data redundancy To address these issues, the database can be normalized. Normalization is a process that helps ensure that the database design is optimized and the data are consistent, by reducing the possibility of update errors and reducing data redundancy. Early concepts of data normalization were defined by E. F. Codd, in the journal “Communications of the ACM,” in 1970.

38 Normalized Database Structure
In this case, errors can be reduced by looking again at the data. The relationship between the people and their companies is a simple relationship; however, separating this information into two sets—one for the information about the person and the other for information about the company—would provide more distinct access to the details of each set. By separating the single database table into two tables, there are no longer multiple copies of the company information within one table. This action meets some of the requirements of database normalization by eliminating redundant copies of the company information and by creating a structure that can prevent the problem of inconsistent company data in the database. As with the single table design, there needs to be a key, or unique identifier, for each of the two tables. Just as it is possible to have two people with the same name at different companies, it is possible to have two different companies with the same name in different parts of the country. Therefore, the company name should not be used as the unique identifier. Instead, an integer key will be used for the company table and shorter names for some of the attributes in both tables can be used now that the attributes that belong to a person and to a company have been separated. One more step is needed to connect the person to the company. To do this, a field can be added to the person table for the integer key representing the company, or the CompanyKey, associated with that particular contact. In the person table, this CompanyKey would be considered a foreign key which refers to the primary key in the company table. A foreign key is a field or group of fields in one table that uniquely identifies a row in a different table. Specifying foreign keys in tables is how relationships are built within a database. Note that the cardinality of this relationship is one to many. Each person has one company, but each company may have more than one person related to it. 
Figure 4: Normalized database structure Two tables: Person and Company Each has a unique row identifier—a primary key Need to link the company to the contact
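
The two-table design might be declared along the following lines. This is a sketch using Python's built-in `sqlite3` module, not the lecture's own schema: the column names are assumptions, the people's last names are omitted, and the contact with company key 99 is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled
conn.executescript("""
CREATE TABLE Company (
    CompanyKey INTEGER PRIMARY KEY,
    Name       VARCHAR(100),
    Address    VARCHAR(100),
    City       VARCHAR(50),
    State      CHAR(2)
);
CREATE TABLE Person (
    PersonKey  INTEGER PRIMARY KEY,
    FirstName  VARCHAR(50),
    CompanyKey INTEGER REFERENCES Company(CompanyKey)  -- foreign key
);
INSERT INTO Company (CompanyKey, Name) VALUES (1, 'Community Hospital, Inc.');
INSERT INTO Person (FirstName, CompanyKey) VALUES ('Bill', 1), ('Albert', 1);
""")

# The company name is stored once, so a rename is a single-row change.
conn.execute("UPDATE Company SET Name = 'Community General' WHERE CompanyKey = 1")
employers = conn.execute("""
    SELECT p.FirstName, c.Name
    FROM Person p
    JOIN Company c ON c.CompanyKey = p.CompanyKey
    ORDER BY p.PersonKey
""").fetchall()

# The foreign key rejects a person who points at a company that does not exist.
try:
    conn.execute("INSERT INTO Person (FirstName, CompanyKey) VALUES ('Maria', 99)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

print(employers)
print(rejected)
```

Because every person row refers to the company by key, both contacts see the new name after one update, and the database itself refuses a reference to a missing company.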

39 New Tables This slide shows what the two tables would look like now using the same previous data. The person table now has fewer columns than the old contact table; instead of storing the company name, company address, company city, and company state, it just stores the company key. The company table includes a key along with the name, address, city, and state.  The person table has two entries, like the old contact table: one for Bill and one for Albert. The company table has only one entry, however, for Community Hospital, Inc. Both contacts are from the same company, so they both have the same value in the CompanyKey column. Presumably for a larger data set there would be multiple entries in the company table. Figure 5: Top: New Person table using same data as previous Contact table Bottom: New Company table using same data as previous Contact table

40 How Do We Do This? Database Management System Create Maintain Use
Many available (NOT an exhaustive list) Oracle Microsoft SQL Server IBM DB2 MySQL PostgreSQL This lecture has shown some examples of data models and diagrams as well as sample tables. There are several steps to turn this design into a database. First, a database program and its database management system, or DBMS, need to be installed on a server. This database management system will allow the database administrator to create and maintain the database, and users to access the database. There are several different database management systems, listed on this slide: commercial ones like Oracle, Microsoft SQL Server, and IBM DB2, and open source ones like MySQL and PostgreSQL, or Postgres, for short.

41 Building a Database Database management system tools Create tables
Create relationships Control access More… Structured Query Language (SQL) Database management Access to data After the database and DBMS are installed, then tables and relationships can be created. DBMSs often include software tools that allow users to create tables, add relationships, control access to the tables, and other functionality such as adding procedures and triggers. The Structured Query Language, or SQL, is a querying language that also provides database management functionality as well as ways to access, insert, update, and delete data stored in the database. SQL will be covered in more detail in another lecture.

42 Why Do We Need Queries Natural languages (English) are too vague.
With complex questions, it can be hard to verify that the question was interpreted correctly and that the answer we received is truly correct. Consider the question: Who are our best customers? We need a query system with more structure. We need a standardized system so users and developers can learn one method that works on any (most) systems. SQL

43 Four Questions to Create a Query
What output do you want to see? What do you already know (or what constraints are given)? What tables are involved? How are the tables joined together? According to Jerry Post (2007), every query is built by asking these four questions. They are vital; keep them in mind whenever you write SQL.
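
One way to see the four questions at work is to map each one onto a clause of a SELECT statement. The sketch below uses Python's `sqlite3` module with hypothetical `Person` and `Company` tables filled from the earlier contact list; the question asked, "Which contacts work for a company in Oakland?", is also made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Company (CompanyKey INTEGER PRIMARY KEY, Name VARCHAR(100), City VARCHAR(50));
CREATE TABLE Person (
    PersonKey  INTEGER PRIMARY KEY,
    FirstName  VARCHAR(50),
    LastName   VARCHAR(50),
    CompanyKey INTEGER REFERENCES Company(CompanyKey)
);
INSERT INTO Company VALUES
    (1, 'Community Hospital, Inc.', 'Portland'),
    (2, 'Oakland Providers LLC', 'Oakland');
INSERT INTO Person (FirstName, LastName, CompanyKey) VALUES
    ('Sriveni', 'Sharma', 1), ('Walter', 'Chen', 2),
    ('Rachel', 'Cohen', 2), ('Kelly', 'David', 2);
""")

query = """
SELECT p.LastName, p.FirstName, c.Name   -- 1. the output we want to see
FROM Person p                            -- 3. the tables involved
JOIN Company c
  ON c.CompanyKey = p.CompanyKey         -- 4. how the tables are joined
WHERE c.City = :city                     -- 2. what we already know
ORDER BY p.LastName
"""
oakland = conn.execute(query, {"city": "Oakland"}).fetchall()

for row in oakland:
    print(row)
```

The clauses are written in the order SQL requires, which differs from the order in which the questions are asked; the numbered comments tie each clause back to its question.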

44 Critical Success Factors in Database Design
Work interactively with users as much as possible. Follow a structured method throughout the data modeling process. Employ a data-driven approach. Incorporate structural and integrity considerations into the data models. Use diagrams to represent as much of the data models as possible. Build a data dictionary to supplement the data model diagrams. Be willing to repeat steps.

45 Databases and SQL Summary—1
Data can be stored in files, spreadsheets, or databases Files and spreadsheets - Widely available - Good for computations Databases - Secure - Optimized for speed - Multiple user access - Store relationships This concludes Lecture A of Databases and SQL. There are several options for data storage including files, spreadsheets, or databases. Files and spreadsheets are widely available and are good for data computations. Databases are secure and optimized systems for storing, accessing, and modifying data over the long term. Multiple users can access and modify data at the same time. Furthermore, relationships are stored in a database along with the data, which allows for less data redundancy and inconsistency as well as for complex queries.

46 Databases and SQL Summary—2
Databases can be modeled using entity-relationship diagrams. Relationships can have three different types of cardinality. Normalization avoids duplicate data. Relationships are created using foreign keys. In summary, this lecture demonstrated how databases can be modeled using entity-relationship diagrams and how to use those diagrams to design a simple database. Data relationships can have three different cardinalities: one-to-one, one-to-many, and many-to-many. The design of a database can be improved through normalization: a structure that removes duplicate data that would have otherwise been stored in multiple rows. Examples were provided of one-to-many relationships and how relationships are created using foreign keys.

47 Databases and SQL References
American National Standards Institute. (2007). Information Systems - Coded Character Sets - 7-Bit American National Standard Code for Information Interchange (7-Bit ASCII). (No. ANSI INCITS (R2007)). Chen, P. P. (1976). The entity-relationship model—Toward a unified view of data. ACM Transactions on Database Systems, 1(1). Codd, E. F. (1970). A relational model of data for large shared data banks. Communications of the ACM, 13(6), International Organization for Standardization. (2008). Information technology -- Database languages -- SQL (No. ISO/IEC 9075-(1-4,9- 11,13,14)). Kent, W. (1983). A simple guide to five normal forms in relational database theory. Communications of the ACM, 26(2), References slide. No audio.

48 Databases and SQL References
Figures Figure 1: Baratt, A. (2016). OpenOffice Calc spreadsheet example. Public domain. Figures 1.1–1.7: Entity-relationship diagrams. (2012). Public domain. Figure 2: Contact attributes. (2012). Public domain. Figure 3: Contact table. (2012). Public domain. Figure 4: Normalized database structure. (2012). Public domain. Figure 5: Top: New Person table using same data as previous Contact table. Bottom: New Company table using same data as previous Contact table. (2012). Public domain. References slide. No audio.

49 This material was developed by Oregon Health & Science University, funded by the Department of Health and Human Services, Office of the National Coordinator for Health Information Technology under Award Number 90WT0001. This presentation was produced with the support of the United States Agency for International Development (USAID) under the terms of MEASURE Evaluation cooperative agreement AID-OAA-L MEASURE Evaluation is implemented by the Carolina Population Center, University of North Carolina at Chapel Hill in partnership with ICF International; John Snow, Inc.; Management Sciences for Health; Palladium; and Tulane University. Views expressed are not necessarily those of USAID or the United States government. No audio.

