Relational Databases: Basic Concepts. BCHB524 2014, Lecture 21, 11/12/2014. BCHB524 - 2014 - Edwards.

1 Relational Databases: Basic Concepts
  BCHB524 2014, Lecture 21
  11/12/2014 - BCHB524 - 2014 - Edwards

2 Outline
  - What is a (relational) database?
  - When are relational databases used?
  - Commonly used database management systems
  - Using existing databases
  - Creating and populating new databases
  - Python and relational databases
  - Exercises

3 (Relational) Databases
  - Databases store information
  - Bioinformatics has lots of file-based information:
      - FASTA sequence databases
      - Genbank format sequences
  - These files store sequence, annotation, and references
      - Good as an archive or comprehensive reference
      - Poor for retrieving a few items
  - Relational databases also store information
      - Good for a few items at a time
      - Flexible on which items

4 Relational Databases
  - Store information in a table
  - Rows represent items
  - Columns represent items' properties or attributes

  Name           Continent      Region                     Surface Area  Population  GNP
  Brazil         South America                             8547403       170115000   776739
  Indonesia      Asia           Southeast Asia             1904569       212107000   84982
  India          Asia           Southern and Central Asia  3287263       1013662000  447114
  China          Asia           Eastern Asia               9572900       1277558000  982268
  Pakistan       Asia           Southern and Central Asia  796095        156483000   61289
  United States  North America                             9363520       278357000   8510700
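The row/column idea can be sketched in Python with the standard sqlite3 module. This is only an illustration under assumptions: the database is in-memory, the table name Country and the column spelling SurfaceArea are chosen here, and only three rows from the slide are loaded.

```python
import sqlite3

# In-memory database, so nothing is written to disk (an assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.execute("""
    create table Country (
        Name TEXT, Continent TEXT, Region TEXT,
        SurfaceArea INTEGER, Population INTEGER, GNP INTEGER
    )
""")

# Each tuple is one row (an item); each position is one column (an attribute).
rows = [
    ("Indonesia", "Asia", "Southeast Asia", 1904569, 212107000, 84982),
    ("India", "Asia", "Southern and Central Asia", 3287263, 1013662000, 447114),
    ("China", "Asia", "Eastern Asia", 9572900, 1277558000, 982268),
]
conn.executemany("insert into Country values (?, ?, ?, ?, ?, ?)", rows)

cur = conn.execute("select * from Country")
# cursor.description carries the column names of the result
column_names = [d[0] for d in cur.description]
table = cur.fetchall()

print(column_names)
for row in table:
    print(row)
```

Every row comes back as a tuple with one value per column, in the order the columns were declared.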

5 Relational Databases
  - Tables can be millions of rows
  - Can access a few rows fast:
      - Countries with more than 100,000,000 in population?
      - Countries on the “Asia” continent?
      - Countries that start with “U”?
      - Countries with GNP = 776739?
  (Shown with the same Country table as on slide 4.)

6 When are Relational Databases Used?
  - LARGE datasets
      - Does the data fit in memory?
  - Store data first... ask questions later
  - Lookup or sort by many keys
      - For a single key, simple data structures often work
  - Store results of expensive compute or data-cleanup
      - Compute once and return results many times
  - "Random" or unknown access patterns
  - Specialized data-structures not appropriate
      - Use string/sequence indexes for sequence data
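The "single key" point can be illustrated with plain Python data structures. This is a rough sketch, not part of the lecture: the tuples reuse values from the Country table, and the variable names are made up for the example.

```python
# (name, continent, population) tuples copied from the Country table
countries = [
    ("Brazil", "South America", 170115000),
    ("Indonesia", "Asia", 212107000),
    ("India", "Asia", 1013662000),
    ("China", "Asia", 1277558000),
    ("Pakistan", "Asia", 156483000),
    ("United States", "North America", 278357000),
]

# One key: a dictionary gives direct lookup by name.
by_name = {name: (continent, population) for name, continent, population in countries}
brazil = by_name["Brazil"]

# A second key means a full scan (or another hand-built structure per key).
in_asia = [name for name, continent, _ in countries if continent == "Asia"]
largest = max(countries, key=lambda rec: rec[2])[0]

print(brazil, in_asia, largest)
```

Once lookups by several different attributes are needed, maintaining one structure per key by hand becomes tedious, which is where a relational database starts to pay off.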

7 Common DBMS
  - Oracle
      - Commercial, market leader, widely used in businesses
  - MySQL
      - Free, open-source, widely used in bioinformatics, suitable for large scale deployment
  - SQLite
      - Free, open-source, minimal installation requirements, no user accounts, suitable for small scale deployment

8 Let's look at some examples
  - We'll use a third-party program to "look at" SQLite databases:
      - SQLiteStudio (Linux), SQLiteSpy (Windows), …
  - Download examples:
      - World.db3, taxa.db3 from the Course data folder
  - Use SQLiteStudio to look at the examples:
      - World.db3, taxa.db3

9 Using existing databases
  - Use the "select" SQL command to find relevant rows

      select * from Country where Population > 100000000;
      select * from Country where Continent = 'Asia';
      select * from Country where Name like 'U%';
      select * from Country where GNP = 776739;

  - Each command ends in a semicolon ";".
  - "where" specifies the condition/constraint/rule.
  - "*" asks for all attributes from the relevant rows.
  - Let's experiment with the world and taxa databases.
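The four select statements can be tried without the course files. The sketch below assumes an in-memory SQLite table holding only the six rows and four of the columns shown on the earlier slide; the helper `names` is a hypothetical convenience for this example, not part of any library.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table Country (Name TEXT, Continent TEXT, Population INTEGER, GNP INTEGER)"
)
conn.executemany("insert into Country values (?, ?, ?, ?)", [
    ("Brazil", "South America", 170115000, 776739),
    ("Indonesia", "Asia", 212107000, 84982),
    ("India", "Asia", 1013662000, 447114),
    ("China", "Asia", 1277558000, 982268),
    ("Pakistan", "Asia", 156483000, 61289),
    ("United States", "North America", 278357000, 8510700),
])

def names(where):
    """Return the sorted country names matching a where clause (hypothetical helper)."""
    # Concatenation is tolerable here only because every clause is a fixed literal.
    cur = conn.execute("select Name from Country where " + where)
    return sorted(row[0] for row in cur)

populous = names("Population > 100000000")
asia = names("Continent = 'Asia'")
starts_with_u = names("Name like 'U%'")
by_gnp = names("GNP = 776739")

print(populous)
print(asia)
print(starts_with_u)
print(by_gnp)
```

With only these six rows, every country passes the population test, four are in Asia, and the last two queries each match a single row.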

10 Using existing databases
  - Select can combine (“join”) multiple tables
  - Use the where condition to match rows from each table and “link” corresponding rows…

      select * from taxonomy, name
      where taxonomy.rank = 'species' and
            name.name_class = 'misspelling' and
            name.tax_id = taxonomy.tax_id;
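A small sketch of what the join condition does, assuming an in-memory database with the taxonomy and name tables from the course schema. All rows here are illustrative placeholders; in particular the misspelled names are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table taxonomy (tax_id INTEGER PRIMARY KEY, scientific_name TEXT,
                           rank TEXT, parent_id INT);
    create table name (id INTEGER PRIMARY KEY, tax_id INT, name TEXT, name_class TEXT);

    -- Illustrative rows only
    insert into taxonomy values (9606, 'Homo sapiens', 'species', 9605);
    insert into taxonomy values (9605, 'Homo', 'genus', 207598);
    insert into name (tax_id, name, name_class) values (9606, 'Homo sapeins', 'misspelling');
    insert into name (tax_id, name, name_class) values (9606, 'human', 'common name');
    insert into name (tax_id, name, name_class) values (9605, 'Hmo', 'misspelling');
""")

# Without any where condition, every taxonomy row is paired with every name row.
all_pairs = conn.execute("select count(*) from taxonomy, name").fetchone()[0]

# The tax_id equality keeps only pairs that describe the same taxon.
joined = conn.execute("""
    select taxonomy.scientific_name, name.name
    from taxonomy, name
    where taxonomy.rank = 'species'
      and name.name_class = 'misspelling'
      and name.tax_id = taxonomy.tax_id
""").fetchall()

print(all_pairs)
print(joined)
```

The equality on tax_id is what "links" the tables: dropping it would return every species row combined with every misspelling row, related or not.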

11 Using existing databases
  - Select can sort and/or return top 10

      select * from taxonomy limit 10;
      select * from taxonomy order by scientific_name;
      select * from taxonomy order by tax_id desc limit 10;
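Sorting and limiting can be sketched on a tiny stand-in for the taxonomy table. The assumptions: an in-memory database, only two of the columns, and four rows chosen for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table taxonomy (tax_id INTEGER PRIMARY KEY, scientific_name TEXT)")
conn.executemany("insert into taxonomy values (?, ?)", [
    (9606, "Homo sapiens"),
    (10090, "Mus musculus"),
    (7227, "Drosophila melanogaster"),
    (7955, "Danio rerio"),
])

# order by sorts ascending unless desc is given
by_name = [row[1] for row in
           conn.execute("select * from taxonomy order by scientific_name")]

# order by ... desc limit N returns the N largest values
top2 = [row[0] for row in
        conn.execute("select * from taxonomy order by tax_id desc limit 2")]

print(by_name)
print(top2)
```

A limit without an order by returns some rows, but SQL does not promise which ones, so combine the two whenever "top N" is meant.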

12 Using existing databases
  - Select can count and do string matching.
  - "like" uses special symbols:
      - % matches zero or more symbols
      - _ matches exactly one symbol
  - Some RDBMS support regular expressions
      - MySQL, for example.

      select count(*) from taxonomy where scientific_name like 'D%';
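The two wildcard symbols can be compared side by side. This sketch assumes an in-memory taxonomy table with four illustrative rows; `count_like` is a hypothetical helper written for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table taxonomy (tax_id INTEGER PRIMARY KEY, scientific_name TEXT)")
conn.executemany("insert into taxonomy values (?, ?)", [
    (9606, "Homo sapiens"),
    (10090, "Mus musculus"),
    (7227, "Drosophila melanogaster"),
    (7955, "Danio rerio"),
])

def count_like(pattern):
    """Count rows whose scientific_name matches a like pattern (hypothetical helper)."""
    cur = conn.execute(
        "select count(*) from taxonomy where scientific_name like ?", (pattern,)
    )
    return cur.fetchone()[0]

starts_with_d = count_like("D%")           # % stands for any run of symbols
one_symbol = count_like("Danio r_rio")     # _ stands for exactly one symbol
two_symbols = count_like("Danio r__rio")   # two underscores need two symbols

print(starts_with_d, one_symbol, two_symbols)
```

The last pattern matches nothing because "rerio" has only one symbol between the "r" and "rio".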

13 Creating databases
  - Use the "create" SQL command to create tables

      CREATE TABLE taxonomy (
          tax_id INTEGER PRIMARY KEY,
          scientific_name TEXT,
          rank TEXT,
          parent_id INT
      );

      CREATE TABLE name (
          id INTEGER PRIMARY KEY,
          tax_id INT,
          name TEXT,
          name_class TEXT
      );
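The same create statements can be issued from Python and then checked. This is a sketch assuming an in-memory database; `sqlite_master` and `pragma table_info` are SQLite-specific ways to inspect the schema and would differ in other DBMS.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# executescript runs several semicolon-separated statements in one call
conn.executescript("""
    CREATE TABLE taxonomy (
        tax_id INTEGER PRIMARY KEY,
        scientific_name TEXT,
        rank TEXT,
        parent_id INT
    );
    CREATE TABLE name (
        id INTEGER PRIMARY KEY,
        tax_id INT,
        name TEXT,
        name_class TEXT
    );
""")

# SQLite records every table it knows about in sqlite_master
tables = sorted(row[0] for row in
                conn.execute("select name from sqlite_master where type = 'table'"))

# table_info rows are (cid, name, type, notnull, dflt_value, pk)
columns = [row[1] for row in conn.execute("pragma table_info(taxonomy)")]

print(tables)
print(columns)
```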

14 Populating databases
  - Use the "insert" SQL command to add rows to tables
  - Usually, the special id column is initialized automatically

      INSERT INTO name (tax_id,name,name_class) VALUES (9606,'H. sapiens','synonym');
      SELECT * from name where tax_id = 9606;
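How the id column gets filled in can be seen from Python. A minimal sketch, assuming an in-memory database and the name table from the previous slide; the second inserted row is an extra illustrative value.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE name (id INTEGER PRIMARY KEY, tax_id INT, name TEXT, name_class TEXT)"
)

cur = conn.cursor()
# The id column is left out of the insert; SQLite assigns it.
cur.execute("INSERT INTO name (tax_id, name, name_class) VALUES (?, ?, ?)",
            (9606, "H. sapiens", "synonym"))
first_id = cur.lastrowid
cur.execute("INSERT INTO name (tax_id, name, name_class) VALUES (?, ?, ?)",
            (9606, "human", "common name"))
second_id = cur.lastrowid

# For a database file on disk, commit makes the inserts permanent.
conn.commit()

rows = conn.execute("SELECT * FROM name WHERE tax_id = 9606 ORDER BY id").fetchall()
print(first_id, second_id)
print(rows)
```

In SQLite, a column declared INTEGER PRIMARY KEY receives the next unused integer when no value is supplied, and `lastrowid` reports which one was used.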

15 Python and Relational Databases
  - Issue select statements from python and iterate through the results
  - Sometimes it is easiest to make Python do some of the work!

      import sqlite3

      # Open the SQLite database file
      conn = sqlite3.connect('taxa.db3')
      c = conn.cursor()
      c.execute("""
          select * from name
          where name like 'D%'
          limit 10;
      """)
      # Each result row is a tuple of column values
      for row in c:
          print(row)
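One way to "make Python do some of the work" is to let SQL narrow the rows and then apply a test that is awkward in SQL, such as a regular expression. The sketch below assumes an in-memory name table with three illustrative rows; the pattern is made up for the example.

```python
import re
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table name (id INTEGER PRIMARY KEY, tax_id INT, name TEXT, name_class TEXT)"
)
conn.executemany("insert into name (tax_id, name, name_class) values (?, ?, ?)", [
    (7227, "Drosophila melanogaster", "scientific name"),
    (7955, "Danio rerio", "scientific name"),
    (9606, "Homo sapiens", "scientific name"),
])

# SQL does the coarse filtering ...
c = conn.cursor()
c.execute("select name from name where name like 'D%'")

# ... and Python applies the finer test: two words, the second starting with "m".
pattern = re.compile(r"^D\w+ m\w+$")
hits = [row[0] for row in c if pattern.match(row[0])]

print(hits)
```

SQLite has no built-in regular expression function by default, so splitting the work this way is a common workaround.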

16 Python and Relational Databases
  - Use parameter substitution for run-time values

      import sys
      import sqlite3

      # Taxonomy id supplied on the command line
      tid = int(sys.argv[1])

      conn = sqlite3.connect('taxa.db3')
      # One value for each "?" in the query, in order
      params = [tid, 'scientific name']
      c = conn.cursor()
      c.execute("""
          select * from name
          where tax_id = ? and name_class = ?;
      """, params)
      for row in c:
          print(row)
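A sketch of why parameter substitution is preferred over pasting values into the SQL text. The assumptions: an in-memory name table, a placeholder tax_id of 1, and an organism name chosen because it contains an apostrophe.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table name (id INTEGER PRIMARY KEY, tax_id INT, name TEXT, name_class TEXT)"
)
# tax_id 1 is a placeholder value for this illustration
conn.execute("insert into name (tax_id, name, name_class) values (?, ?, ?)",
             (1, "Grevy's zebra", "common name"))

user_value = "Grevy's zebra"

# Pasting the value into the SQL text breaks on the apostrophe.
try:
    conn.execute("select tax_id from name where name = '%s'" % user_value)
    formatted_ok = True
except sqlite3.OperationalError:
    formatted_ok = False

# "?" placeholders take values by position ...
positional = conn.execute(
    "select tax_id from name where name = ?", (user_value,)
).fetchall()

# ... and ":key" placeholders take values by name from a dictionary.
named = conn.execute(
    "select tax_id from name where name = :n and name_class = :c",
    {"n": user_value, "c": "common name"},
).fetchall()

print(formatted_ok, positional, named)
```

With placeholders the sqlite3 module passes the value separately from the SQL text, so quotes inside the value cannot change the meaning of the query.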

17 Next time: Object-relational mappers
  - Set up python to treat tables as classes, rows as objects

      # Set up data-model
      from model import *

      hs = Taxonomy.get(9606)
      for n in hs.names:
          print(n.name, "|", n.nameClass)

      condition = Name.q.name.startswith('Da')
      for n in Name.select(condition):
          print(n.name, "|", n.nameClass)

18 Lab exercises
  - Read through an online course in SQL
      - sqlcourse.com, sql-tutorial.net,...
  - Write a python program to look up the scientific name for a user-supplied organism name.

