Introduction to Information and Computer Science


1 Introduction to Information and Computer Science
Databases and SQL, Lecture a

Welcome to Introduction to Information and Computer Science: Databases and SQL. This is Lecture (a). The component, Introduction to Information and Computer Science, is a basic overview of computer architecture; data organization, representation and structure; the structure of programming languages; and networking and data communication. It also includes basic terminology of computing.

This material (Comp4_Unit6a) was developed by Oregon Health and Science University, funded by the Department of Health and Human Services, Office of the National Coordinator for Health Information Technology under Award Number IU24OC

2 Databases and SQL Learning Objectives
Define and describe the purpose of databases (Lecture a)
Define a relational database (Lecture a)
Describe data modeling and normalization (Lecture b)
Describe the structured query language (SQL) (Lecture c)
Define the basic data operations for relational databases and how to implement them in SQL (Lecture c)
Design a simple relational database and create corresponding SQL commands (Lecture c)
Examine the structure of a healthcare database component (Lecture d)

The objectives for Databases and SQL are to: define and describe the purpose of databases; define a relational database; describe data modeling and normalization; describe the structured query language (SQL); define the basic data operations for relational databases and how to implement them in SQL; design a simple relational database and create corresponding SQL commands; and examine the structure of a healthcare database component.

Health IT Workforce Curriculum Version 3.0/Spring 2012 Introduction to Information and Computer Science Databases and SQL Lecture a

3 Data Representation
It’s all 1s and 0s
01000001 can mean: 65 as a binary number; ‘A’ as an alphanumeric character (ASCII); many other options, including CPU instructions and multimedia data

It is important to review some computer science basics in order to understand the details of information storage. Remember that for a computer, data consists of ones and zeros. In other words, every data value is represented by a combination of binary ones and zeros, or simply values of on and off, for example, the number 01000001 on this slide. When a computer system examines this value, it does not know what it represents. The meaning depends on the application’s knowledge of the underlying data. If the application is a text editor, it knows that this value represents a capital ‘A’ as defined by the American Standard Code for Information Interchange, better known as ASCII. On the other hand, in the context of data in a spreadsheet, the cell formatting may indicate that the stored binary number actually represents the number 65. That binary data may also be used in many other ways, including as a central processing unit instruction or as part of an audio or video file.
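
As a small illustration of the point above, the following Python sketch reads the same eight bits as a number, as an ASCII character, and as a raw byte. The variable names are chosen here for illustration only; they do not come from the lecture.

```python
# The same eight bits, interpreted three different ways.
bits = "01000001"

as_number = int(bits, 2)          # read the bits as an unsigned binary integer
as_character = chr(as_number)     # read the same value as an ASCII code point
as_raw_byte = bytes([as_number])  # the single byte as it would sit in a file

print(as_number)     # 65
print(as_character)  # A
print(as_raw_byte)   # b'A'
```

Nothing in the bits themselves says which reading is correct; the program chooses the interpretation, just as the text editor and the spreadsheet do in the narration.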

4 Data Storage
A large component of computer systems is the management of data
Storing and retrieving data are important functions: efficiency; speed

One very large component of computer systems is the management of data. Consider the information maintained on a personal computer: this might include programs, photos, music, videos, tax returns and class papers, just to name a few. Some files may remain unchanged; others might be modified over time, such as revisions to a class paper. Now consider an electronic healthcare record (EHR) system that may contain information for thousands or tens of thousands of patients. With this volume of information, it is important that information is stored efficiently, is quickly accessible, and can be updated.

5 Data Storage Options
Text/data files
Spreadsheets
Databases

Data can be stored electronically in different ways. The first way is to store it in a simple text file. Another is to store it in a spreadsheet, which is more powerful than a simple text file. Finally, data can be stored in databases, which is the topic of this unit. Before discussing databases, this lecture will provide information about the other options for data storage and when they are appropriate to use.

6 Files
A collection of information stored electronically in a single location
Can store text or data
Files have different formats

A file is a collection of information, or data, stored in a single electronic location. How that information is stored in files is also important. Files can contain text or data that is not readable by humans. If data is to be accessed by a person, then it needs to be human-readable; however, a computer system may use a different format, as long as it knows how to interpret the data. For example, an audio file and a text file contain information that is stored in different formats. A text editor cannot edit an audio file, and a music player cannot play a text file. An audio file is not readable by humans, but its data can be interpreted by a music player and converted to the music that humans listen to.

7 Advantages/Disadvantages of Files
Advantages: easy to create and store; easy to share; used by many applications; input or output data from scientific computations
Disadvantages: limited security; multiple user access isn't supported; redundant and inconsistent data

Files are stored on file systems, which every computer has. Because of that, every computer has the ability to create, use and store files. Files can be easily shared; shared drives are one option for sharing. They're also used by many applications; for example, a PowerPoint presentation is stored in a file. Also, many scientific applications and instruments use input data files and/or generate output data files. For example, genomic data is often stored in large data files that are searched and parsed by special programs. On the other hand, files have limitations. The security of files is limited to that of the file system. Also, by default, multiple users cannot use the same file at the same time. Usually, one user can open the file for editing; any additional users open a read-only copy of the file. Finally, using files to store structured data with relationships can result in redundancy and inconsistency, as shown in the following example.

8 Contact Information Example
File with contact information:
Bill Robeson, 1312 Main, Portland, OR, Community Hospital, Inc.
Walter Schmidt, 14 12th St., Oakland, CA, Oakland Providers LLC
Mary Stahl, 14 12th St., Oakland, CA, Oakland Providers LLC
Albert Brookings, 1312 Main, Portland, OR, Community Hospital Incorporated
Catherine David, 14 12th Street, Oakland, CA, Oakland Providers LLC

This slide shows a file that contains contact information for individuals and their organizations. It contains names, such as Bill Robeson, Walter Schmidt and Mary Stahl, and the corresponding organization names and addresses. There are only two different organizations in this small sample.

9 Quick!
Do Bill and Albert work for the same company?
Is there an issue with Catherine and Walter?
Can a computer application tell?
Give me a contact list sorted by last name
Imagine with 10,000 contacts!

After reviewing the previous slide, answer the following questions: Do Bill and Albert work for the same company? What is the difference between Catherine and Walter’s information? If a computer application were looking at the data, would it be able to tell that there was an issue with the addresses for Catherine and Walter? Can you sort this list by last name? Could you sort a list of 10,000 contacts?

10 Quick! Answers
Bill and Albert work for the same company, but it’s represented differently
Catherine and Walter have the same address, again represented differently
It’s hard for a computer application to tell
You CAN sort by hand, but it’s a challenge

While it is easy to see that Bill and Albert work for the same company, note that in the file, Bill’s company name is Community Hospital, Inc. and Albert’s is Community Hospital Incorporated. There is a similar issue with Catherine and Walter’s information. Catherine’s address is 14 12th Street with street spelled out. Walter has the same address, but in his case 14 12th St. uses the abbreviation for street. Humans can easily handle these variations in data and determine that they are the same. However, a computer system, even one with an artificial intelligence component, would have significantly greater difficulty determining that the companies and addresses are the same. And while sorting these few entries by eye may be feasible, sorting a file with 10,000 contacts would be extremely time-consuming without the use of technology.
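
The answers above can be checked with a short Python sketch. It is only an illustration under simplifying assumptions: the records are reduced to name, street address and organization, and the last word of each name is assumed to be the surname.

```python
# Contact records taken from the example file: (name, street address, organization).
contacts = [
    ("Bill Robeson", "1312 Main", "Community Hospital, Inc."),
    ("Walter Schmidt", "14 12th St.", "Oakland Providers LLC"),
    ("Mary Stahl", "14 12th St.", "Oakland Providers LLC"),
    ("Albert Brookings", "1312 Main", "Community Hospital Incorporated"),
    ("Catherine David", "14 12th Street", "Oakland Providers LLC"),
]

by_name = {name: (address, org) for name, address, org in contacts}

# Exact string comparison treats the two spellings as different values.
same_company = by_name["Bill Robeson"][1] == by_name["Albert Brookings"][1]
same_address = by_name["Catherine David"][0] == by_name["Walter Schmidt"][0]

# Sorting is easy for a program, assuming the last word of the name is the surname.
sorted_names = sorted(by_name, key=lambda name: name.split()[-1])

print(same_company)  # False
print(same_address)  # False
print(sorted_names)
```

The program sorts the list without effort, but a plain comparison reports that Bill and Albert work for different companies, which is exactly the inconsistency the slide describes.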

11 Another Problem
What do you do if “Community Hospital” becomes “Community General”?
Find every instance of “Community Hospital” or variation thereof
Change EVERY entry

A bigger challenge arises if “Community Hospital, Inc.” becomes “Community General”. Whether the change is made manually or with an automated system, every single instance of “Community Hospital” has to be identified in the data. Additionally, every different representation of “Community Hospital”, for example, “Community Hospital, Inc.” and “Community Hospital Incorporated”, also needs to be located; and there is no guarantee that the word “Community” was spelled correctly in every instance. Once all of the entries are identified, each one needs to be modified to correctly read “Community General”. If done manually, this has the potential for human data-entry error and in a large system would be very time-consuming. If it is done using a simple automated search-and-replace function, it may not take as long, but it may make unintended partial changes to other existing records, for example, Portland Community Hospital being changed to Portland Community General.
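
The risk of a simple search and replace can be sketched in a few lines of Python. The third record is hypothetical (the name "Dana Example" is invented here) and stands in for a contact at a different organization, Portland Community Hospital.

```python
# Records as they might appear in the text file (organization names only).
records = [
    "Bill Robeson, Community Hospital, Inc.",
    "Albert Brookings, Community Hospital Incorporated",
    "Dana Example, Portland Community Hospital",  # hypothetical, a different organization
]

# Naive approach: replace the substring wherever it occurs.
updated = [r.replace("Community Hospital", "Community General") for r in records]

for line in updated:
    print(line)
# Bill Robeson, Community General, Inc.
# Albert Brookings, Community General Incorporated
# Dana Example, Portland Community General
```

The first two records still differ from each other because the suffixes were left behind, and the third record was changed even though it refers to another organization.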

12 Another Solution: Spreadsheets
Spreadsheet applications store, manipulate and present data
Provide more functionality than plain text files: calculations; sorting; filtering; data analysis

Spreadsheet applications were first developed for businesses to automate accounting tasks. Today, spreadsheets are widely used for storing, manipulating and presenting data. Today's spreadsheet applications perform calculations using predefined or user-created formulas. They provide features for easily sorting and filtering data and can even perform data analysis. Advanced spreadsheet users can create very complex calculations and relationships between data. Spreadsheets have become very powerful tools for data analysis and manipulation. However, they still have the same limitations as plain text files, as shown on the following slide.

13 OpenOffice Calc spreadsheet example. (PD-US, 2011).
Here is an example of an OpenOffice Calc spreadsheet. (Other spreadsheet applications include Microsoft Excel, IBM Lotus Symphony, Corel Quattro Pro, Apple Numbers and Google Docs spreadsheets.) The data is organized into numbered rows and lettered columns; column names can be provided in the first row. The data itself does not look very different from the data in the simple text file; however, the menus above the data offer many options for manipulating and presenting it. Unlike plain text files, the data can be sorted very easily and quickly. Even so, spreadsheets have the same problem as the text file: some data (company name and address) is defined multiple times, which is inefficient and error prone.

14 Advantages/Disadvantages of Spreadsheets
Advantages: widely available; powerful calculations; basic sorting and filtering
Disadvantages: limited security; multiple user access isn't supported; redundant and inconsistent data

Since spreadsheets are just a special type of file, they have similar advantages and disadvantages. While spreadsheets require a special application such as Microsoft Excel, these applications are widely available. Spreadsheet applications provide powerful calculations and basic sorting and filtering. But like files, they have limited security, lack support for multiple user access, and have a potential for redundant and inconsistent data. Spreadsheets are good for doing calculations on static snapshots of datasets, but they aren't the best solution for long-term storage and access of data.

15 Databases
Definition: structured data collection accessed electronically
Files are simple databases
Relational databases maintain relationships between data

So what exactly is a database? A database is a structured data collection which is accessed electronically. In other words, it is information stored on a computer for access through a computer program. The text file used in this lecture that contained the contact information can be considered to be a very simple database: it contains organized, though not necessarily consistent, information that might be accessed through a text editor. A relational database is a database that maintains relationships between data elements, and relational databases are the focus of this unit.

16 Relational Database
Introduced by Dr. Edgar Codd of IBM Research Laboratory in 1970
“Future users of large data banks must be protected from having to know how the data is organized in the machine (the internal representation).”
Definition: an organized collection of data accessible by electronic means where the information type and information relationships are maintained

The concept of a relational database was first published by E.F. Codd in “Communications of the ACM” in June 1970. Codd held the view that users should not have to keep track of how the information is stored in a computer in order to use it. To quote Codd, “Future users of large data banks must be protected from having to know how the data is organized in the machine (the internal representation).” In other words, users should not have to know whether the binary bits discussed previously in this lecture represent a capital A or the number 65, or even what they are related to. Rather, the system should keep track of that information once it is provided by the user. So a relational database is an organized collection of data accessible by electronic means where the information type and information relationships are maintained internally by the system itself.

17 Relational Database Contents
A relational database contains tables
Tables contain multiple rows of data
Rows contain data of specified type(s) in a column order
Data and type are independent
Row order does not matter, but column order does

A relational database consists of one or more tables defined by the database designer in a meaningful fashion. A table is a collection of information organized into rows and columns. Each table contains one or more rows of data. The data in a row is ordered by columns, and each column is of a known and specified type where the data and type are independent. The order of rows in the table is irrelevant, but the order of the columns in the row is significant.
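
A minimal sketch of these ideas, using Python's built-in sqlite3 module with an in-memory database. The table and column names (contact, contact_id, name, city) are assumptions made for this example and are not part of the lecture; note also that SQLite enforces column types more loosely than many other database systems.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # temporary database that lives only in memory

# A table is defined by its columns, and each column has a declared type.
conn.execute(
    "CREATE TABLE contact ("
    " contact_id INTEGER PRIMARY KEY,"
    " name TEXT NOT NULL,"
    " city TEXT NOT NULL)"
)

# Each inserted tuple becomes one row; values are matched to columns by position.
conn.executemany(
    "INSERT INTO contact (name, city) VALUES (?, ?)",
    [("Bill Robeson", "Portland"), ("Walter Schmidt", "Oakland"), ("Mary Stahl", "Oakland")],
)

# Rows have no inherent order, so a query asks for one explicitly.
rows = conn.execute("SELECT name, city FROM contact ORDER BY name").fetchall()

# The column list of the query fixes the order of the values inside every row.
columns = [d[0] for d in conn.execute("SELECT name, city FROM contact").description]

print(columns)  # ['name', 'city']
print(rows)
```

Because the row order is requested in the query rather than stored in the table, the same table can be listed by name, by city, or in any other order without changing the data.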

18 Advantages/Disadvantages of Relational Databases
Advantages: secure; multiple user access; relationships prevent redundancy and inconsistency; optimized operations; complex queries
Disadvantages: expertise required; limited data calculations

A relational database has a number of advantages over files and spreadsheets. Database systems are designed to be highly secure; access to data can be precisely controlled. In addition, databases are designed to be accessed and modified by multiple users at the same time. Relationships between tables support organized data, which prevents data redundancy and inconsistency. The highly optimized underlying data structures used by the relational database result in efficient and fast access. Because a database system is designed for the specific purpose of data organization, the basic operations of retrieving, adding, modifying and deleting data are more efficient than in general-purpose applications and storage such as spreadsheets and files. Furthermore, relationships and efficient access allow for complex queries and searches of data. On the other hand, databases are complex systems that require expertise to install, maintain and use. There are free, open-source databases, but commercially available databases can be very expensive. In comparison, files and spreadsheets are more widely available and easy to use. Also, data in databases is not as easily analyzed using complex data calculations. Instead, data is usually exported from databases into a spreadsheet or a data file for statistical software.
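
To illustrate how relationships prevent redundancy, here is a sketch using Python's built-in sqlite3 module. The schema (tables organization and contact, linked by org_id) is an assumption made for this example, not a design given in the lecture. Each organization name is stored once, so renaming it is a single update.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # temporary in-memory database
conn.executescript("""
    CREATE TABLE organization (
        org_id INTEGER PRIMARY KEY,
        name   TEXT NOT NULL
    );
    CREATE TABLE contact (
        contact_id INTEGER PRIMARY KEY,
        name       TEXT NOT NULL,
        org_id     INTEGER NOT NULL REFERENCES organization(org_id)
    );
    INSERT INTO organization (org_id, name) VALUES
        (1, 'Community Hospital, Inc.'),
        (2, 'Oakland Providers LLC');
    INSERT INTO contact (name, org_id) VALUES
        ('Bill Robeson', 1),
        ('Albert Brookings', 1),
        ('Walter Schmidt', 2);
""")

# The organization name lives in exactly one row, so one UPDATE renames it everywhere.
conn.execute("UPDATE organization SET name = 'Community General' WHERE org_id = 1")

# A join follows the relationship from each contact to its organization.
rows = conn.execute("""
    SELECT contact.name, organization.name
    FROM contact JOIN organization ON contact.org_id = organization.org_id
    ORDER BY contact.name
""").fetchall()

for contact_name, org_name in rows:
    print(contact_name, "works for", org_name)
```

Compared with the text file, there is no variant spelling to hunt for and no risk of touching an unrelated organization, because contacts refer to the organization by its identifier rather than by a copy of its name.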

19 Databases and SQL Summary – Lecture a
Data can be stored in files, spreadsheets or databases
Files and spreadsheets: widely available; good for computations
Databases: secure; optimized for speed; multiple user access; store relationships

This concludes Lecture (a) of Databases and SQL. There are several options for data storage, including files, spreadsheets and databases. Files and spreadsheets are widely available and are good for data computations. Databases are very secure and optimized systems for storing, accessing and modifying data over the long term. Multiple users can access and modify data at the same time. Furthermore, relationships are stored in a database along with the data, which allows for less data redundancy and inconsistency as well as for complex queries.

20 Databases and SQL References – Lecture a
American National Standards Institute. (2007). Information Systems - Coded Character Sets - 7-Bit American National Standard Code for Information Interchange (7-Bit ASCII) (No. ANSI INCITS (R2007)).
Codd, E. F. (1970). A relational model of data for large shared data banks. Communications of the ACM, 13(6).

Images
Slide 13: OpenOffice Calc spreadsheet example. (PD-US, 2011).

References slide. No audio.

