Published by Corinne Rickert
© 2003 By Default! A Free sample background from www.powerpointbackgrounds.com

Slide 1
Data Management Using R – Interfacing with the Structured Query Language
STAT 7550 – Statistical Computing
Utah State University
November 21, 2008
Bill Welbourn

Slide 2: Objectives of the Project
- Introduce the notion of the database.
- Database applications.
- General overview of the SQLite Relational Database Management System (RDBMS).
- Explain how R [1] (and other programming languages) interfaces with the SQLite RDBMS.
- Highlights of the R commands for the interface to the SQLite RDBMS, the RSQLite library.
- A working example, demonstrating the procedure for storing and retrieving an R data frame within a SQLite database.
- Further motivation for the use of the R-SQLite interface: working with “massive” databases.
[1] R Development Core Team (2008), Version 2.8.0.

Slide 3: What is a Database?
- Essentially a series of structured files on a computer, organized in a highly efficient manner.
- The organization is hierarchical, from the top down, as shown in Figure 1.
Figure 1: The anatomy of a database. [Diagram: the database branches into tables; each table branches into columns, and each column into rows and fields.]

Slide 4: Components of a Table
- As Figure 1 suggests, at the highest level, a database consists of a series of tables.
- Each table is made up of a series of columns. Think of the columns as characteristics (variables) collected for a study.
- Data are stored in the rows of the table, where each row is called a record. Records of a table are essentially synonymous with observations for a study.
- The location where a row intersects a column is known as a field.
- Each table contains specific, common data.
- A table of a database is analogous to a worksheet within an Excel workbook.

Slide 5: What is a Relational Database?
- It is a database made up of tables which relate to one another.
- The table relationships are based on “Key fields.”
- To illustrate, consider two relational tables within a database. A column of each table is given the same name, say “ID.” Each field within these columns holds a unique (key) label, so that a one-to-one mapping (relationship) between the tables is obtained.

Table i
         ID      Column 2       …   Column k
Row 1    Key 1   Field i(1,2)   …   Field i(1,k)
Row 2    Key 2   Field i(2,2)   …   Field i(2,k)
…        …       …              …   …
Row n    Key n   Field i(n,2)   …   Field i(n,k)

Table j
         Column 1       ID      …   Column m
Row 1    Field j(1,1)   Key 1   …   Field j(1,m)
Row 2    Field j(2,1)   Key 2   …   Field j(2,m)
…        …              …       …   …
Row n    Field j(n,1)   Key n   …   Field j(n,m)

Figure 2: Example of two relational tables
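
To make the key-field idea concrete, here is a minimal sketch in R of two tables that share an “ID” column and are matched through it. The table names (TableI, TableJ), the variables (height, weight) and the values are hypothetical, and an in-memory database is assumed so that nothing is written to disk.

```r
library(DBI)
library(RSQLite)

# In-memory database: an assumption made to keep the sketch self-contained
con <- dbConnect(RSQLite::SQLite(), ":memory:")

# Two hypothetical tables sharing the key column "ID"
table_i <- data.frame(ID = 1:3, height = c(170.2, 165.5, 180.1))
table_j <- data.frame(weight = c(81.0, 68.4, 59.9), ID = c(3L, 1L, 2L))
dbWriteTable(con, "TableI", table_i)
dbWriteTable(con, "TableJ", table_j)

# Match the rows of the two tables through the shared key
joined <- dbGetQuery(con, "
  SELECT TableI.ID, TableI.height, TableJ.weight
  FROM TableI
  INNER JOIN TableJ ON TableI.ID = TableJ.ID
  ORDER BY TableI.ID")
print(joined)

dbDisconnect(con)
```

Because each key appears exactly once in each table, the join returns one row per key, regardless of the order in which the rows were stored.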

Slide 6: Three Types of Relationships
One-to-One
- A record in table one must have a corresponding record in table two, and vice versa.
- Example: More variables (columns) were collected in a study than can be stored in a single database table. Study participants (observations/records) are labeled with unique IDs, which allow the table-to-table relationship to be established.
One-to-Many
- A record in table one has many corresponding records in table two, while each record in table two corresponds to a single record in table one.
- Example: Study participant identifiers (unique IDs) are stored in table one, while repeated measurements are stored in table two.
Many-to-Many
- As in the one-to-many relationship, a record in table one can have many corresponding records in table two. Unlike the one-to-many relationship, a record in table two can also have many corresponding records in table one.
- Example: Customer product orders. Each order can contain multiple products, and one product can be in many orders.
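
The customer-orders example can be sketched in R as follows. The slides do not say how a many-to-many relationship is stored; the linking table used here (order_items, one row per order-product pair) is a common approach and is an assumption of this sketch, as are all table names, column names and values.

```r
library(DBI)
library(RSQLite)

con <- dbConnect(RSQLite::SQLite(), ":memory:")

# Hypothetical tables: orders, products, and a linking table pairing them
orders   <- data.frame(order_id = 1:2, customer = c("Ann", "Bob"))
products <- data.frame(product_id = 1:3, name = c("pen", "ink", "paper"))
order_items <- data.frame(order_id   = c(1L, 1L, 2L, 2L),
                          product_id = c(1L, 3L, 2L, 3L))
dbWriteTable(con, "orders", orders)
dbWriteTable(con, "products", products)
dbWriteTable(con, "order_items", order_items)

# One row per (order, product) pair, resolved through the linking table
pairs <- dbGetQuery(con, "
  SELECT o.order_id, o.customer, p.name
  FROM order_items AS oi
  JOIN orders   AS o ON o.order_id   = oi.order_id
  JOIN products AS p ON p.product_id = oi.product_id
  ORDER BY o.order_id, p.product_id")
print(pairs)

dbDisconnect(con)
```

Each order appears on several rows (many products), and the product “paper” appears under both orders (many orders).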

Slide 7: Database Applications
- World Wide Web.
- Medical data.
- Data analysis situations that warrant considering a database:
  - You possess one or more flat (text) files with an inordinate number of observations.
  - You have collected a very large number of characteristics for your observations (e.g., genetic data).
  - You are ready to execute a large simulation analysis.
  - You need to prepare a portable file, so that another statistician has easy access to your data.
- Any time there are data in your possession. It is fairly straightforward to maintain a SQL database in R.

Slide 8: The SQLite RDBMS
- Created by D. Richard Hipp.
- Version 1.0 released August 17, 2000.
- Most recent version, 3.6.4, released October 15, 2008.
- An ACID (Atomicity, Consistency, Isolation, Durability) compliant RDBMS. In computer science, ACID is a set of properties which guarantee that database transactions (logical operations) are processed reliably.
- Contained in a relatively small (~500 kB) C programming library.
- It is not a database, but rather a system which manages databases. Microsoft Access, in contrast, is simply a program used to create a database.
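
A small sketch of what atomicity means in practice, using the DBI transaction functions dbBegin(), dbRollback() and dbCommit(). The accounts table and its values are hypothetical.

```r
library(DBI)
library(RSQLite)

con <- dbConnect(RSQLite::SQLite(), ":memory:")
dbWriteTable(con, "accounts",
             data.frame(id = 1:2, balance = c(100, 50)))

# Start a transaction, make a change, then abandon it
dbBegin(con)
dbExecute(con, "UPDATE accounts SET balance = balance - 30 WHERE id = 1")
dbRollback(con)

# The rolled-back update leaves the table unchanged
after_rollback <- dbGetQuery(con, "SELECT balance FROM accounts ORDER BY id")

# A transfer made of two updates, committed together as one unit
dbBegin(con)
dbExecute(con, "UPDATE accounts SET balance = balance - 30 WHERE id = 1")
dbExecute(con, "UPDATE accounts SET balance = balance + 30 WHERE id = 2")
dbCommit(con)
after_commit <- dbGetQuery(con, "SELECT balance FROM accounts ORDER BY id")

print(after_rollback)
print(after_commit)
dbDisconnect(con)
```

Either both updates of the transfer take effect or neither does; a half-completed transfer is never visible in the table.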

Slide 9: The SQLite RDBMS (cont.)
- The columns of a table in a SQL database are typically assigned a “type” (e.g., string, integer, float, double). This is analogous to declaring variables in the C programming language. SQLite, however, (automatically) assigns types to individual values.
- Allows for multithreaded reading of a database. Writing to a database can occur only when no other access to the database is present.
- Interfaces with programming languages (e.g., BASIC, C, C++, Perl, Ruby, and R).
- Most widely deployed SQL RDBMS.
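
The per-value typing can be observed directly with SQLite's built-in typeof() function. In this sketch the table t and its single untyped column x are hypothetical.

```r
library(DBI)
library(RSQLite)

con <- dbConnect(RSQLite::SQLite(), ":memory:")

# A column declared without a type; each stored value keeps its own type
dbExecute(con, "CREATE TABLE t (x)")
dbExecute(con, "INSERT INTO t VALUES (1), ('one'), (1.5)")

# Ask SQLite for the storage type of each value, in insertion order
types <- dbGetQuery(con, "SELECT typeof(x) AS type FROM t ORDER BY rowid")
print(types)

dbDisconnect(con)
```

The three values in the same column are reported as integer, text and real, respectively.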

Slide 10: Interfacing the SQLite RDBMS
- Consists of three components: the application (such as R) which requires access to the database; an interface; and the RDBMS.
- An interface acts as an interpreter, translating commands from the application so that the database is accessible to the user. In R, the interface lies within the DBI library.
- The interface communicates with the database via the applicable database driver. The database driver knows how to “talk” to the database. In R, the SQLite database driver and the source (C library) for the SQLite engine are included within the RSQLite library.

Slide 11: The R-SQLite Interface
Figure 3: The process flow between the application and the database. [Diagram boxes, in order: Application (e.g., R); Interface (e.g., DBI library in R); Database Driver (e.g., SQLite in R).]

Slide 12: Accessing a SQLite DB
A five-step cycle:
1) Connect to the database. In R, to establish the connection to a database, issue the commands dbDriver() and dbConnect().

    setwd("c:/SQL")                          # folder holding the database file
    library(DBI)
    library(RSQLite)
    dbfile <- "DATA.dbsql"                   # name of the database file
    drv <- dbDriver("SQLite")                # load the SQLite driver
    con <- dbConnect(drv, dbname = dbfile)   # open the connection

2) Issue a query or command to the database. To issue a query in R, the command dbSendQuery() is (typically) used. Queries consist of SQL commands.

    # selected columns, restricted to the records where v1 equals 1
    rs <- dbSendQuery(con, "select v1, v2, v3 from Table1 where v1 = 1")
    # alternatively (once any earlier result has been cleared), every column and record
    rs <- dbSendQuery(con, "select * from Table1")

Brief Summary of SQL Commands
SQL Command   Tabular Parameters Required of SQL Command
Select        Column Label(s)
From          Table Name(s)
Where         Specific Values for Column Label(s)
Order by      Column Label(s)

Slide 13: Accessing a SQLite DB (cont.)
3) If a query was issued, we need to retrieve the applicable recordset. To do this in R, we use the command fetch(). The recordset will exist as a data frame in R.

    d1 <- fetch(rs, n = -1)   # n = -1 retrieves all pending records

4) Clear the query result, manipulate the recordset, and update the database. To clear a query in R, use the command dbClearResult(). To update a database, issue the R command dbWriteTable(), which takes the connection, the table name, a data frame, and the options append, row.names and overwrite.

    dbClearResult(rs)
    # the option values here are illustrative; use overwrite = TRUE to replace the table instead
    dbWriteTable(con, "Table", d1, row.names = FALSE, append = TRUE)

5) Close the connection to the database. To do this in R, use the command dbDisconnect().

    dbDisconnect(con)
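
The five steps can be put together in one self-contained sketch that stores and retrieves an R data frame. Several things here are assumptions rather than part of the slides: a temporary file stands in for DATA.dbsql, the data frame df and the table name Table1_subset are made up, the driver is passed directly as RSQLite::SQLite(), and dbFetch() is used, which is the name current versions of DBI give to fetch().

```r
library(DBI)
library(RSQLite)

# 1) Connect (a temporary file stands in for the slides' DATA.dbsql)
dbfile <- tempfile(fileext = ".sqlite")
con <- dbConnect(RSQLite::SQLite(), dbname = dbfile)

# Store a hypothetical data frame as Table1
df <- data.frame(v1 = c(1, 1, 2),
                 v2 = c(0.5, 1.5, 2.5),
                 v3 = c("a", "b", "c"))
dbWriteTable(con, "Table1", df)

# 2) Issue a query
rs <- dbSendQuery(con, "SELECT v1, v2, v3 FROM Table1 WHERE v1 = 1")

# 3) Retrieve the recordset as a data frame
d1 <- dbFetch(rs, n = -1)

# 4) Clear the result, modify the recordset, write it back to the database
dbClearResult(rs)
d1$v2 <- d1$v2 * 10
dbWriteTable(con, "Table1_subset", d1, overwrite = TRUE)

# 5) Close the connection
dbDisconnect(con)

print(d1)
```

Writing the modified recordset to a separate table keeps the original Table1 intact, which matters because the database itself offers no undo once a table has been overwritten.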

Slide 14: Example 1
- You have recorded n (unique) observations (records) for a study, and collected m-1 characteristics (excluding the unique observation identifier) for each observation.
- Having the option to store your data as a flat file or as a (single-table) SQL database, which should you choose?
- To address this issue, you decide to conduct a (small) simulation analysis, investigating data retrieval times for the two types of data repositories.
- Figure 4, shown on the subsequent slide, displays the results from a simulation where m=50 and each of the 49 characteristics is of type “double.” A total of 100 distinct values of n, n_1, …, n_100, were chosen in accordance with the rule [equation not preserved in this transcript].
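
A single run of such a comparison might look like the sketch below. It is not the author's simulation code: the value of n is arbitrary (the rule for choosing n is not available here), the flat file is assumed to be a CSV read with read.csv(), and the timings depend entirely on the machine.

```r
library(DBI)
library(RSQLite)

set.seed(1)
n <- 2000   # hypothetical number of observations
m <- 50     # identifier plus 49 characteristics of type double
dat <- data.frame(ID = seq_len(n), matrix(rnorm(n * (m - 1)), nrow = n))

# Store the same data as a flat file and as a single-table SQLite database
csv_file <- tempfile(fileext = ".csv")
db_file  <- tempfile(fileext = ".sqlite")
write.csv(dat, csv_file, row.names = FALSE)
con <- dbConnect(RSQLite::SQLite(), dbname = db_file)
dbWriteTable(con, "Table1", dat)

# Time a full read from each repository
t_flat <- system.time(from_flat <- read.csv(csv_file))["elapsed"]
t_db   <- system.time(from_db <- dbGetQuery(con, "SELECT * FROM Table1"))["elapsed"]
dbDisconnect(con)

print(c(flat_file = unname(t_flat), database = unname(t_db)))
```

Repeating this over a grid of n values and plotting the two timings against n would give a comparison in the spirit of Figure 4.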

Slide 15: Example 1 (cont.)
Figure 4: Flat file – DB comparison

Slide 16: Example 2
- You have recorded data for n (unique) participants of a study, and collected m-1 characteristics (excluding the unique observation identifier) for each participant. Further, for the i-th participant, you have recorded a total of i record(s).
- Having the option to store your data as a flat file or as a (single-table) SQL database, which should you choose?
- Given the unique identifier for a participant, suppose it is desirable to have quick access to the records for each participant.
- Figures 5 and 6, shown on the subsequent slides, display the data retrieval times for n=200 and n=500, respectively, with m=50 and each variable of type “double.” The value on the vertical axis is the time required to read in the i records for the i-th participant.
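
The per-participant read can be sketched as a query restricted by the identifier. This is an illustration only: two characteristics are stored instead of 49 to keep it short, the index on ID is an assumption (the slides do not say whether the timed database was indexed), and participant 17 is an arbitrary choice.

```r
library(DBI)
library(RSQLite)

set.seed(1)
n <- 200
ids <- rep(seq_len(n), times = seq_len(n))   # participant i contributes i records
dat <- data.frame(ID = ids,
                  x1 = rnorm(length(ids)),
                  x2 = rnorm(length(ids)))

con <- dbConnect(RSQLite::SQLite(), ":memory:")
dbWriteTable(con, "Table1", dat)

# An index on ID lets SQLite locate a participant's rows without a full scan
dbExecute(con, "CREATE INDEX idx_table1_id ON Table1 (ID)")

# Read only the records of one participant, passing the ID as a parameter
recs <- dbGetQuery(con, "SELECT * FROM Table1 WHERE ID = ?",
                   params = list(17L))
print(nrow(recs))

dbDisconnect(con)
```

With a flat file, the same request generally means reading the whole file and subsetting it in R, which is the cost the marginal-read figures compare against.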

Slide 17: Example 2 (cont.)
Figure 5: Flat file – DB comparison, marginal read I

Slide 18: Example 2 (cont.)
Figure 6: Flat file – DB comparison, marginal read II

Slide 19: Example 3
- You have recorded data for n (unique) participants of a study, and collected allele types at more than two million (2×10^6) single nucleotide polymorphism (SNP) sites in the human nuclear genome.
- How could we use a relational database as the repository for these data?
- It turns out that a SQLite database table was limited to 999 columns in this setting. (SQLite's default column limit is 2,000; 999 was its default limit on bound parameters per statement at the time, which is likely the constraint met when writing a table from R.) So, we simply create a sufficient number of tables (each with, say, m columns) and populate their columns with the SNP data, making sure to create a “Key” column for each table.
- Figures 7 and 8 display the time required (by database table) to retrieve the first two columns (the “Key” along with a column of data) for SQL databases of size n=2,500 and n=25,000, respectively. For each database, a total of 2,106 tables were created, with m=951 columns per table. That is, these two databases comprise slightly more than five billion and 50 billion fields, respectively.
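
The table-splitting strategy can be sketched at toy scale. Everything specific here is hypothetical: the sizes (5 participants, 10 SNP columns, m = 4 columns per table), the 0/1/2 allele coding, and the names Table1, Table2, … and snp1, snp2, …. This is a sketch of the idea, not the author's code.

```r
library(DBI)
library(RSQLite)

set.seed(1)
n <- 5        # participants
n_snp <- 10   # SNP columns in total
m <- 4        # columns per table, including the Key column

# Hypothetical genotype data coded 0/1/2
snps <- as.data.frame(matrix(sample(0:2, n * n_snp, replace = TRUE), nrow = n))
names(snps) <- paste0("snp", seq_len(n_snp))
key <- seq_len(n)

con <- dbConnect(RSQLite::SQLite(), ":memory:")

# Assign SNP columns to tables, m - 1 data columns per table
groups <- split(names(snps), ceiling(seq_len(n_snp) / (m - 1)))
for (g in seq_along(groups)) {
  chunk <- data.frame(Key = key, snps[, groups[[g]], drop = FALSE])
  dbWriteTable(con, paste0("Table", g), chunk)
}

tables <- dbListTables(con)

# Retrieve the Key and the first data column of the second table
first_two <- dbGetQuery(con, 'SELECT "Key", snp4 FROM Table2')
print(tables)
print(first_two)

dbDisconnect(con)
```

Because every table carries the same Key column, the pieces belonging to one participant can be matched up again across tables whenever they are needed together.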

Slide 20: Example 3 (cont.)
Figure 7: Retrieval Time for a Massive SQL Database I

Slide 21: Example 3 (cont.)
Figure 8: Retrieval Time for a Massive SQL Database II

Slide 22: Example 3 (cont.)
Figure 9: Retrieval Time for a Massive SQL Database III

Slide 23: Conclusion
Database advantages
- Ability to store an extraordinary quantity of data. On a Windows NT platform, a SQL table can be as large as 2 TB; on 64-bit operating systems, there is virtually no limit to the size of a SQL table.
- A single file could be used as a central data warehouse.
- Ability to create tabular relationships.
- Database indexing makes for very fast data retrieval.
- Portability.
- Interfacing with programming languages.
- Multithreading.
Database disadvantages
- It can be a bit of a challenge to recall which data live in which table.
- There is no “safety net” when it comes to overwriting data in a database.
- Essentially, you have to learn a new programming (querying) language.
- Database administration in industry requires continuous, careful maintenance.

Slide 24: References
- Hogan, R (2002). A practical guide to database design. Prentice Hall, Englewood Cliffs, NJ.
  This is a good resource for obtaining a working knowledge of what a database is all about. It covers the issues (e.g., who will use the database, and what the database should contain) that, say, an employer would consider prior to implementing a database in practice. It does not, however, discuss how to create and maintain the database from a programming point of view (i.e., the SQL commands).
- Maslakowski M, Butcher T (2000). SAMS teach yourself MySQL in 21 days. Macmillan USA, Indianapolis, IN.
  Although the Windows version of R does not have the RMySQL binary package, this book is an excellent resource for “getting your feet wet” with the SQL programming language. Unlike Hogan (2002), it does not discuss strategies for database design.
- R Special Interest Group on Databases (R-SIG-DB). A common database interface (DBI). Software version 0.2-4, retrieved from http://cran.r-project.org/ on September 27, 2008.
  The DBI package is necessary (but not sufficient) to interface any database with R (see slide 11). The DBI.pdf (available at the web link provided above) is a good document to review prior to creating your first database in R. It provides an overview of the DBI package. As with most programming in R, learning (effective) database management in R will require practice through working with data frames.