Applied CyberInfrastructure Concepts Fall 2017

Slides:



Advertisements
Similar presentations
MSc IT UFCE8K-15-M Data Management Prakash Chatterjee Room 2Q18
Advertisements

Accounting System Design
Introduction to Structured Query Language (SQL)
Introduction to Structured Query Language (SQL)
Fundamentals, Design, and Implementation, 9/e Chapter 6 Introduction to Structured Query Language (SQL)
Introduction to Structured Query Language (SQL)
Chapter 4 Relational Databases Copyright © 2012 Pearson Education, Inc. publishing as Prentice Hall 4-1.
1 Chapter 2 Reviewing Tables and Queries. 2 Chapter Objectives Identify the steps required to develop an Access application Specify the characteristics.
Database Management Systems (DBMS)
Chapter 4 Relational Databases Copyright © 2012 Pearson Education 4-1.
1 Applied CyberInfrastructure Concepts ISTA 420/520 Fall Nirav Merchant Bio Computing & iPlant Collaborative Eric Lyons.
Database Lecture # 1 By Ubaid Ullah.
Chapter 9 SQL and RDBMS Part C. SQL Copyright 2005 Radian Publishing Co.
Data at the Core of the Enterprise. Objectives  Define of database systems.  Introduce data modeling and SQL.  Discuss emerging requirements of database.
ASP.NET Programming with C# and SQL Server First Edition
PHP Programming with MySQL Slide 8-1 CHAPTER 8 Working with Databases and MySQL.
Database Technical Session By: Prof. Adarsh Patel.
Chapter 7 Working with Databases and MySQL PHP Programming with MySQL 2 nd Edition.
© 2007 by Prentice Hall (Hoffer, Prescott & McFadden) 1 Introduction to SQL.
© 2009 Pearson Education, Inc. Publishing as Prentice Hall 1 UNIT 6: Chapter 7: Introduction to SQL Modern Database Management 9 th Edition Jeffrey A.
Lecture 7 Integrity & Veracity UFCE8K-15-M: Data Management.
DAY 12: DATABASE CONCEPT Tazin Afrin September 26,
SQL SQL Server : Overview SQL : Overview Types of SQL Database : Creation Tables : Creation & Manipulation Data : Creation & Manipulation Data : Retrieving.
FALL 2004CENG 351 File Structures and Data Management1 Relational Model Chapter 3.
7 1 Chapter 7 Introduction to Structured Query Language (SQL) Database Systems: Design, Implementation, and Management, Seventh Edition, Rob and Coronel.
DataBase Management System What is DBMS Purpose of DBMS Data Abstraction Data Definition Language Data Manipulation Language Data Models Data Keys Relationships.
SQL Fundamentals  SQL: Structured Query Language is a simple and powerful language used to create, access, and manipulate data and structure in the database.
Week 8-9 SQL-1. SQL Components: DDL, DCL, & DML SQL is a very large and powerful language, but every type of SQL statement falls within one of three main.
©Silberschatz, Korth and Sudarshan1 Structured Query Language (SQL) Data Definition Language Domains Integrity Constraints.
Starting with Oracle SQL Plus. Today in the lab… Connect to SQL Plus – your schema. Set up two tables. Find the tables in the catalog. Insert four rows.
Chapter 3: Relational Databases
7 1 Database Systems: Design, Implementation, & Management, 7 th Edition, Rob & Coronel 7.6 Advanced Select Queries SQL provides useful functions that.
CSC314 DAY 8 Introduction to SQL 1. Chapter 6 © 2013 Pearson Education, Inc. Publishing as Prentice Hall SQL OVERVIEW  Structured Query Language  The.
LECTURE TWO Introduction to Databases: Data models Relational database concepts Introduction to DDL & DML.
SQL Basics Review Reviewing what we’ve learned so far…….
Understanding Core Database Concepts Lesson 1. Objectives.
Fundamental of Database Systems
Getting started with Accurately Storing Data
Fundamentals of DBMS Notes-1.
SQL Query Getting to the data ……..
Understanding SQL Statements
Fundamentals & Ethics of Information Systems IS 201
Database Management System
Introduction To Database Systems
Quiz Questions Q.1 An entity set that does not have sufficient attributes to form a primary key is a (A) strong entity set. (B) weak entity set. (C) simple.
Chapter 4 Relational Databases
Databases and Information Management
What is a Database and Why Use One?
SQL 101.
Introduction to Database Management System
Chapter 2 Database Environment Pearson Education © 2009.
Basic Concepts in Data Management
Chapter 2 Database Environment.
Accounting System Design
Chapter 8 Working with Databases and MySQL
Normalization Referential Integrity
Database Fundamentals
Teaching slides Chapter 8.
Data Model.
CMPT 354: Database System I
Accounting System Design
Databases and Information Management
SQL-1 Week 8-9.
Contents Preface I Introduction Lesson Objectives I-2
Introduction to Databases & SQL
Indexes and more Table Creation
DATABASE Purpose of database
Understanding Core Database Concepts
Chapter 2 Database Environment Pearson Education © 2009.
INTRODUCTION A Database system is basically a computer based record keeping system. The collection of data, usually referred to as the database, contains.
Presentation transcript:

Applied CyberInfrastructure Concepts Fall 2017

Databases Why do we need them Spreadsheets (good and bad) Working with bulk data for clean up (Openrefine) SQL and NoSQL (and specialty databases)

Relational Database A relational database is a collection of data items organized as a set of formally-described tables from which data can be accessed or reassembled in many different ways without having to reorganize the database tables The relational database was invented by E. F. Codd at IBM in 1970. A relational database is a set of tables containing data fitted into predefined categories Each table contains one or more data categories in columns. Each row contains a unique instance of data for the categories defined by the columns. The standard user and application program interface to a relational database is the structured query language (SQL)

Tables and Relations

Buzz words you must know Schemas or conceptual view Describes the overall organization / structure of the database Domains Describes what values can be stored in the column of a given table Constraints Rules that govern what values can be stored in a column Many Many more to follow !!

Structured Query Language (SQL) Standard interactive and programming language for getting information from and updating a database SQL is both an ANSI and an ISO standard Was a non procedural language but from SQL:1999 onwards it became procedural SQL can be considered a special purpose language it needs a wrapper to talk to database i.e Python,Perl, C, Java Every vendor has its own unique implementation of SQL, even though they all follow the SQL standard there are subtle variances and supported/unsupported calls. You Query a database using SQL, if a match is found the data is returned

SQL components Data Definition Language (DDL) Deals with structural aspect of the database : creation, modification, deletion of tables Data Manipulation Language (DML) This allows modification of the data contained in the tables: insertion, deletion, selection,changing (even aggregation i.e count,sum,average ) Data Control Language (DCL) This deals with maintaining the integrity of the database using permissions, transactions etc.

Some design concepts ! Database design is not software or database specific Basic steps include: Defining the problem or objective Researching the current database Designing the data structures Constructing relationships Implementing rules and constraints Creating views and reports Implementing the design

Normalization Is your database normalized ? Is that BCNF ? If you hesitate in answering you are not worthy ! ( BCNF: Boyce-Codd Normal Form) Normalization is a way to efficiently organizing data in your database (almost like closet cleaning) The goal is to: Eliminate redundancy in data Ensure data dependencies

Keys A Key is a column or a collection of columns that uniquely identifies a row in a table 2 types of keys: Primary (composite key is a collection of columns) Foreign In many cases, data table keys are constructed by simply adding an additional field to function as the key Can primary key be NULL or have duplicate values ? Foreign key is a column or a collection of columns in a table that reference a primary key in another table

Index Data listed in a table is based on the order it was entered As the amount of data increases (number of rows), the database has to sort through more information (becoming slow) Index is supplementary to a table and keeps track of the corresponding rows Syntax: create index <index name> on <TABLE> ( columns to index) create index by_id on patients (id,dob);

Getting to know “sqlite” Log on to your account on: ssh to hpc.arizona.edu (use ocelote) or use your machine Lets get a sample database http://amadeus.biosci.arizona.edu/~nirav/genotypes.sqlite Now lets open the genotype.sqlite with sqlite3 sqlite3 genotype.sqlite Type .help Type .tables what do you see ?

Some SQL basics To store data the database uses tables Tables consists of rows and columns Column names have to be unique CREATE is for generating tables ALTER for making changes to the tables DROP for deleting the tables SELECT is for ? UPDATE JOIN DELETE

Some Common Column types (SQLite) Check: http://www.hwaci.com/sw/sqlite/datatype3.html For details NULL. The value is a NULL value. INTEGER REAL. The value is a floating point value, TEXT. BLOB.

Your first query ! When writing a SQL query, it is common practice to write SQL commands in uppercase. The -- command indicates a comment, and the database ignores everything else on the rest of the line. The SELECT command tells the database which data fields to retrieve. The FROM command tells the database which table to fetch the data from. Some databases care about table and column name case, but others don’t, so it’s best to always use the correct case when referencing tables and column names. The end of a query is always marked with a semicolon ;. -- This query selects the data in all columns -- from the table 'loci' SELECT * FROM loci;

Typical Errors Incorrect relations. Duplicate field names. Spreadsheet design. Too much data. Compound fields. Missing keys. Bad keys. Missing relations. Unnecessary relationships. Incorrect relations. Duplicate field names. Cryptic field and table names. Missing or incorrect business rules. Referential integrity. Database security. International issues.

End the torture …give me a GUI Plenty of GUI …your mileage may vary ..so try them out http://sqlitestudio.pl/ Command line is reliable and easy to automate  More tutorial for sqlite: http://www.sqlitetutorial.net

Home work We will import data from a file into the database http://amadeus.biosci.arizona.edu/~nirav/cds_product.txt Create database analysis.db using sqlite3 analysis.db Now create a table my_results to store analysis create table my_results (locus TEXT ,secondary_tag TEXT , start INTEGER , stop INTEGER); sqlite> .mode tabs sqlite> .import cds_product.txt my_results Have fun with SQL statements select distinct(locus) from my_results; select locus, start from my_results where start > 100; Turn in 3 different sql statement doing something with this data Export data for “select locus, start from my_results where start > 10000;” as csv delimited into a file