Introduction to Computer Science


1 Introduction to Computer Science
Databases and SQL Welcome to Introduction to Computer Science: Databases and Structured Query Language, or SQL. This is lecture a. The component, Introduction to Computer Science, is a basic overview of computer architecture; data organization, representation and structure; structure of programming languages; networking and data communication. It also includes the basic terminology of computing. Lecture a This material (Comp 4 Unit 5) was developed by Oregon Health & Science University, funded by the Department of Health and Human Services, Office of the National Coordinator for Health Information Technology under Award Number 90WT0001. This work is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License. To view a copy of this license, visit Health IT Workforce Curriculum Version 4.0

2 Databases and SQL Learning Objectives - 1
Define and describe the purpose of databases (Lecture a) Describe a relational database (Lecture a) Describe data modeling and normalization (Lecture b) Describe the structured query language (SQL) (Lecture c) The learning objectives for Databases and SQL are to: Define and describe the purpose of databases Describe a relational database Describe data modeling and normalization Describe the structured query language, or SQL

3 Databases and SQL Learning Objectives - 2
Define the basic data operations for relational databases and how to implement them in SQL (Lecture c) Design a simple relational database and create corresponding SQL commands (Lecture c) Examine the structure of a health care database component (Lecture d) Define the basic data operations for relational databases and how to implement them in SQL Design a simple relational database and create corresponding SQL commands And, examine the structure of a health care database component

4 Data Storage Large component of computer systems is data management
Storing and retrieving data are important functions Efficiency Accessibility Speed This lecture explores various data storage options and the advantages and disadvantages of each. A large component of computer systems is data management. Consider the information maintained on a personal computer – this might include programs, photos, music, videos, tax returns and class papers, just to name a few. Some files may remain unchanged; others might be modified over time, such as revisions to a class paper. Now consider an electronic health care record, or EHR, system that may contain information for thousands or tens of thousands of patients. With this volume of information, it is important that the information is stored efficiently, is easily accessible, and can be updated without too much difficulty.

5 Data Storage Options Text/data files Spreadsheets Databases
Data can be stored electronically in different ways. One way is to store it in a simple text file. Another is to store it in a spreadsheet, which is more powerful than a simple text file. Finally, data can be stored in databases, which is the topic of this unit. Before discussing databases, this lecture will provide information about the other options for data storage and when it is appropriate to use them.

6 Files A collection of data stored electronically in a single location
Can store text or data Files have different formats A file is a collection of data, stored in a single electronic location. How that information is stored in files is important. Files can contain text or data that is not readable by humans. If the data is to be accessed by a human, then it needs to be human-readable; however, a computer system may use a different format – it just needs to know how to interpret the data. For example, an audio file and a text file contain information that is stored in different formats. A text editor cannot edit an audio file, and a music player cannot play a text file. An audio file is not readable by humans, but its data can be interpreted by a music player and converted to the music that humans listen to.
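The difference between a text format and a non-text format can be sketched in a few lines of Python. This is only an illustration: the sentence and the four "audio samples" are made-up data, and the helper name looks_like_text is hypothetical. The same bytes that a text tool cannot read are easily interpreted by a program that knows the format.

```python
import struct

# Made-up data: a short sentence, and four 16-bit audio samples
# packed in little-endian binary form (as a music player might store them).
text_bytes = "Patient notes".encode("utf-8")
audio_bytes = struct.pack("<4h", -1, -2, 1200, -1200)

def looks_like_text(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8 text."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

print(looks_like_text(text_bytes))   # True
print(looks_like_text(audio_bytes))  # False

# A program that knows the format can still interpret the binary data.
samples = struct.unpack("<4h", audio_bytes)
print(samples)  # (-1, -2, 1200, -1200)
```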

7 Advantages/Disadvantages of Files
Advantages Easy to create and store Easy to share Used by many applications Disadvantages Limited security Multiple user access is not supported Redundant and inconsistent data Files are stored within file systems, which are specific to each operating system. Files can be easily created and shared; shared drives are one of the sharing options. Files can be accessed and used by different applications. For example, genomic data is often stored in large data files that are searched and parsed by different programs. On the other hand, files have limitations. The security of files is limited to that of the file system. Also, by default, multiple users cannot edit the same file at the same time. Usually, only one user can open the file for editing; any additional users open a read-only copy of the file. Finally, using files to store structured data with relationships can result in redundancy and inconsistency, as shown in the following example.

8 Contact Information Example File with contact information: Sriveni Sharma, 1312 Main, Portland, OR, Community Hospital, Inc. Walter Chen, 14 12th St., Oakland, CA, Oakland Providers LLC Rachel Cohen, 1414 Main St., Oakland, CA, Oakland Providers LLC Karthik Subramanian, 1312 Main Street, Portland, OR, Community Hospital Incorporated Kelly David, 14 12th Street, Oakland, CA, Oakland Providers LLC This slide shows a file that contains names and contact information of some individuals.

9 Quick! Do Sriveni and Karthik work for the same company?
Is there an issue with Kelly’s and Walter’s information? Can a computer application tell? Give me a contact list sorted by last name Imagine with 10,000 contacts! After reviewing the previous slide, answer the following questions: Do Sriveni and Karthik work for the same company? If a computer application was looking at the data, would it be able to tell that there was an issue with the addresses for Kelly and Walter? Can you sort this list by last name? Could you sort a list of 10,000 contacts?

10 Quick! Answers Sriveni and Karthik work for the same company, but it is represented differently Kelly and Walter have the same address, but it is represented differently It is hard for a computer application to tell You CAN sort by hand – but it is a challenge While it is easy to see that Sriveni and Karthik work for the same company, note that Sriveni’s company name is Community Hospital Inc. and Karthik’s is Community Hospital Incorporated. There is a similar issue with Kelly’s and Walter’s information. Kelly’s address is 14 12th Street with “street” spelled out. Walter has the same address, but notice in his case the word “street” is abbreviated to St. Humans can easily handle these variations in data and determine that they are the same. However, a computer system, even one with an artificial intelligence system, would have significantly greater challenges in determining that the companies and addresses are the same. And while sorting these few entries may be feasible just looking at the list, sorting a file with 10,000 contacts would be extremely time-consuming without the use of software.
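The difficulty for a computer application can be seen in a minimal Python sketch. It assumes the contact file uses a comma and a space between fields, as on the slide; the parse function and the field names are hypothetical. A plain equality test, which is what a simple program would use, treats the two spellings as different values.

```python
lines = [
    "Sriveni Sharma, 1312 Main, Portland, OR, Community Hospital, Inc.",
    "Walter Chen, 14 12th St., Oakland, CA, Oakland Providers LLC",
    "Rachel Cohen, 1414 Main St., Oakland, CA, Oakland Providers LLC",
    "Karthik Subramanian, 1312 Main Street, Portland, OR, Community Hospital Incorporated",
    "Kelly David, 14 12th Street, Oakland, CA, Oakland Providers LLC",
]

def parse(line):
    # Split on the first four separators only, because the company
    # name may itself contain a comma ("Community Hospital, Inc.").
    name, street, city, state, company = line.split(", ", 4)
    return {"name": name, "street": street, "city": city,
            "state": state, "company": company}

contacts = {c["name"]: c for c in map(parse, lines)}

same_company = (contacts["Sriveni Sharma"]["company"]
                == contacts["Karthik Subramanian"]["company"])
same_street = (contacts["Kelly David"]["street"]
               == contacts["Walter Chen"]["street"])

print(same_company)  # False: "..., Inc." vs "... Incorporated"
print(same_street)   # False: "Street" vs "St."
```

Note that even reading the file takes care: the comma inside "Community Hospital, Inc." would break a program that simply split every line on commas.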

11 Another Problem What do you do if “Community Hospital” becomes “Community General”? Find every instance of “Community Hospital” or variation thereof Change EVERY entry A bigger challenge might be if “Community Hospital, Inc.” becomes “Community General”. Whether this change is made manually or with an automated system, every single instance of “Community Hospital” has to be identified in the data. Additionally, every name variation of “Community Hospital”, for example, “Community Hospital, Inc.” and “Community Hospital Incorporated”, also needs to be located; and there is no guarantee that the word “Community” was spelled correctly in every instance. When all of the entries are identified, each one needs to be modified to correctly read “Community General”. If done manually, this has the potential for human data-entry error and, in a large system, would be very time-consuming. If it is done using a simple automated search-and-replace function, it may not take as long, but it may result in unintended changes to other existing records, for example, Portland Community Hospital being changed to Portland Community General.
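The risks of a simple search and replace can be sketched in Python. The first two company names come from the contact file; the misspelled entry is a made-up example of the spelling problem described above, and Portland Community Hospital stands for a different organization that should not be renamed.

```python
records = [
    "Community Hospital, Inc.",
    "Community Hospital Incorporated",
    "Comunity Hospital",            # made-up misspelling
    "Portland Community Hospital",  # a different organization
]

# Naive approach: replace the text wherever it appears.
updated = [r.replace("Community Hospital", "Community General")
           for r in records]

for before, after in zip(records, updated):
    print(f"{before!r} -> {after!r}")
# 'Community Hospital, Inc.' -> 'Community General, Inc.'
# 'Community Hospital Incorporated' -> 'Community General Incorporated'
# 'Comunity Hospital' -> 'Comunity Hospital'
# 'Portland Community Hospital' -> 'Portland Community General'
```

None of the four results is the desired outcome: the first two keep their old suffixes, the misspelled entry is missed entirely, and the unrelated organization is renamed by mistake.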

12 Another Solution: Spreadsheets
Spreadsheet applications store, manipulate, and present data Provide more functionality than plain text files Calculations Sorting Filtering Data analysis Spreadsheet applications were first developed for businesses to automate accounting tasks. Today, spreadsheets are widely used for storing, manipulating, and presenting data. Today's spreadsheet applications perform calculations using predefined or user-created formulas. They provide features for easily sorting and filtering data and can even perform data analysis. Advanced spreadsheet users can create very complex calculations and relationships between data. Spreadsheets have become very powerful tools for data analysis and manipulation. However, they still have the same limitations as plain text files, as shown on the following slide.

13 Spreadsheet Example Here is an example of an OpenOffice Calc spreadsheet. Other spreadsheet applications include Microsoft Excel, Zoho Sheet, Sheetster, Apple iWork Numbers and Google Sheets. The data is organized into numbered rows and lettered columns; column names can be seen in the first row. The data itself does not look very different from the data in the simple text file; however, the tool bars offer many options for manipulating and presenting this data. We can sort the data very easily and quickly, unlike with plain text files. However, spreadsheets have the same problems as the text file – some data, for example company name and address, is entered multiple times, which is inefficient and error prone. Figure 1: OpenOffice Calc spreadsheet example. (Barratt, 2016, PD-US)
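The kind of sorting and filtering a spreadsheet offers, and the redundancy it leaves in place, can be imitated with Python's csv module. The column names below are assumptions, since the figure is not reproduced here; the rows reuse the contact data from the earlier slide.

```python
import csv
import io
from collections import Counter

# Assumed column layout; the data is the contact list from the text file.
sheet = io.StringIO(
    'First,Last,Street,City,State,Company\n'
    'Sriveni,Sharma,1312 Main,Portland,OR,"Community Hospital, Inc."\n'
    'Walter,Chen,14 12th St.,Oakland,CA,Oakland Providers LLC\n'
    'Rachel,Cohen,1414 Main St.,Oakland,CA,Oakland Providers LLC\n'
    'Karthik,Subramanian,1312 Main Street,Portland,OR,Community Hospital Incorporated\n'
    'Kelly,David,14 12th Street,Oakland,CA,Oakland Providers LLC\n'
)
rows = list(csv.DictReader(sheet))

# Sorting: one step once the data is in columns.
by_last = [r["Last"] for r in sorted(rows, key=lambda r: r["Last"])]
print(by_last)  # ['Chen', 'Cohen', 'David', 'Sharma', 'Subramanian']

# Filtering: keep only the Oakland contacts.
oakland = [r["First"] for r in rows if r["City"] == "Oakland"]
print(oakland)  # ['Walter', 'Rachel', 'Kelly']

# Redundancy remains: each company name is typed once per contact,
# so the two spellings of the hospital count as separate companies.
company_counts = Counter(r["Company"] for r in rows)
print(len(company_counts))  # 3
```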

14 Advantages/Disadvantages of Spreadsheets
Advantages Widely available Powerful calculations Basic sorting and filtering Disadvantages Limited security Multiple user access is not supported May contain redundant and inconsistent data Because spreadsheets are just a special type of file, they have similar advantages and disadvantages. While spreadsheets do require a special application, such as Microsoft Excel, these applications are widely available. Spreadsheet applications provide powerful calculations and basic sorting and filtering. But like files, they have limited security, do not support multiple user access, and may contain redundant and inconsistent data. Spreadsheets are good for doing calculations on static snapshots of datasets, but they aren't the best solution for long term storage and access of data.

15 Databases
Definition Structured data collection accessed electronically Files can be seen as simple databases Relational databases maintain relationships between data So what exactly is a database? A database is a structured data collection which is accessed electronically. The text file used in this lecture that contained contact information can be considered to be a very simple database – it contains organized, though not necessarily consistent, information, that might be accessed through a text editor. A relational database is a database that maintains relationships between data elements and is the focus of this unit.

16 Relational Databases Introduced by Dr. Edgar Codd of IBM Research Laboratory in 1970 “Future users of large data banks must be protected from having to know how the data is organized in the machine (the internal representation)” Definition An organized collection of data accessible by electronic means where the information type and information relationships are maintained The concept of a relational database was first published by E.F. Codd in the journal “Communications of the ACM” in June 1970. Codd held the view that users should not have to keep track of how the information is stored in a computer in order to use it. To quote Codd, “Future users of large data banks must be protected from having to know how the data is organized in the machine, the internal representation.” So a relational database is an organized collection of data accessible by electronic means where the information type and information relationships are maintained internally by the system itself.

17 Relational Database Contents
Tables contain multiple rows and columns of data A relational database contains tables Rows contain data of specified type(s) in a column order Data and type are independent Row order does not matter, but column order does A table is a collection of information organized into rows and columns. Each table contains one or more rows of data. A relational database consists of one or more tables defined by the database designer in a meaningful fashion. The data in a row is ordered by columns, and each column is of a known and specified type where the data and type are independent. The order of rows in the table is irrelevant, but the order of the columns in the row is significant.
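A table with typed columns can be sketched with SQLite, a small relational database that ships with Python. This is an illustration only: the table name contact and its columns are assumptions, and SQLite enforces column types more loosely than many other database systems. SQL itself is covered in Lecture c.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # temporary in-memory database

# Each column has a name and a declared type.
conn.execute(
    "CREATE TABLE contact ("
    " contact_id INTEGER PRIMARY KEY,"
    " name TEXT NOT NULL,"
    " city TEXT NOT NULL,"
    " state TEXT NOT NULL)"
)

# Each row supplies its values in the listed column order.
conn.executemany(
    "INSERT INTO contact (name, city, state) VALUES (?, ?, ?)",
    [
        ("Sriveni Sharma", "Portland", "OR"),
        ("Walter Chen", "Oakland", "CA"),
        ("Rachel Cohen", "Oakland", "CA"),
        ("Karthik Subramanian", "Portland", "OR"),
        ("Kelly David", "Oakland", "CA"),
    ],
)

# Stored row order carries no meaning; a query states the order it wants.
names = [row[0] for row in
         conn.execute("SELECT name FROM contact ORDER BY name")]
print(names)
```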

18 Advantages/Disadvantages of Relational Databases
Advantages Secure Multiple user access Relationships prevent redundancy and inconsistency Optimized operations Complex queries Disadvantages Expertise required Limited data calculations A relational database has quite a number of advantages over files and spreadsheets. Database systems are designed to be highly secure; control of the data can be precisely defined. In addition, databases are designed to be accessed and modified by multiple users at the same time. Relationships between tables support organized data that prevents data redundancy and inconsistency. The highly optimized underlying data structures used by the relational database result in highly efficient and fast access. Because a database system is designed for the specific purpose of data organization, the basic operations of retrieving, adding, modifying, and deleting data are more efficient than in general-purpose applications and storage such as spreadsheets and files. Furthermore, relationships and efficient access allow for complex queries and searches of data. On the other hand, databases are complex systems that require expertise to install, maintain, and use. There are free, open-source databases, but commercially available databases can be very expensive. In comparison, files and spreadsheets are more widely available and easy to use. Also, data in databases is not as easily analyzed using complex data calculations. Instead, data is usually exported from databases into a spreadsheet or data file for statistical software.
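How a relationship prevents redundancy can be sketched with the contact data from earlier in this lecture, again using SQLite from Python. The table and column names are assumptions chosen for illustration, and the PRAGMA line is specific to SQLite. Each company name is stored exactly once, and every contact refers to its company by number.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite-specific: enforce references

conn.executescript("""
CREATE TABLE company (
    company_id INTEGER PRIMARY KEY,
    name       TEXT NOT NULL UNIQUE
);
CREATE TABLE contact (
    contact_id INTEGER PRIMARY KEY,
    name       TEXT NOT NULL,
    company_id INTEGER NOT NULL REFERENCES company(company_id)
);
INSERT INTO company (company_id, name) VALUES
    (1, 'Community Hospital, Inc.'),
    (2, 'Oakland Providers LLC');
INSERT INTO contact (name, company_id) VALUES
    ('Sriveni Sharma', 1), ('Karthik Subramanian', 1),
    ('Walter Chen', 2), ('Rachel Cohen', 2), ('Kelly David', 2);
""")

# Renaming the company is a single change in a single row.
conn.execute("UPDATE company SET name = ? WHERE company_id = ?",
             ("Community General", 1))

# Every contact of that company sees the new name through the relationship.
staff = conn.execute(
    "SELECT contact.name, company.name"
    " FROM contact JOIN company USING (company_id)"
    " WHERE company.name = ?"
    " ORDER BY contact.name",
    ("Community General",),
).fetchall()
print(staff)
# [('Karthik Subramanian', 'Community General'),
#  ('Sriveni Sharma', 'Community General')]
```

With this design, the question of whether Sriveni and Karthik work for the same company becomes a comparison of two numbers, and no variation in spelling can creep in.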

19 Databases and SQL Summary – Lecture a
Data can be stored in files, spreadsheets, or databases Files and spreadsheets Widely available Good for computations Databases Secure Optimized for speed Multiple user access Store relationships This concludes lecture a of Databases and SQL. There are several options for data storage including files, spreadsheets, or databases. Files and spreadsheets are widely available and are good for data computations. Databases are very secure and optimized systems for storing, accessing, and modifying data over the long term. Multiple users can access and modify data at the same time. Furthermore, relationships are stored in a database along with the data which allows for less data redundancy and inconsistency as well as for complex queries.

20 Databases and SQL References – Lecture a
American National Standards Institute. (2007). Information Systems - Coded Character Sets - 7-Bit American National Standard Code for Information Interchange (7-Bit ASCII). (No. ANSI INCITS (R2007)).
Codd, E. F. (1970). A relational model of data for large shared data banks. Communications of the ACM, 13(6).
Figures
Figure 1: OpenOffice Calc spreadsheet example. Baratt, A. (2016). Public Domain.
References slide. No audio.


