Presentation is loading. Please wait.

Presentation is loading. Please wait.

MSc Bioinformatics 2006 Biocomputing Databases and Perl 13 th February 2006.

Similar presentations


Presentation on theme: "MSc Bioinformatics 2006 Biocomputing Databases and Perl 13 th February 2006."— Presentation transcript:

1 MSc Bioinformatics 2006 Biocomputing Databases and Perl 13 th February 2006

2 MSc Bioinformatics 2006 Client HTTP Web Server CGI Application Server Request Response

3 MSc Bioinformatics 2006 e.g. PC + Apache +PERL (with CGI module) + MySQL Client HTTP Web Server CGI DB API DB Server +DB Application Server

4 MSc Bioinformatics 2006 The Database Tier stores and retrieves data. manages updates, allows simultaneous (concurrent) access from web servers, provides security, ensures the integrity of data requires complex software. –Database Management Systems (DBMSs) searches and manages data that's stored in databases a database is a collection of related data, and an application can have more than one database –interface is accessed using SQL (Standard Query Language) that's used to define and manipulate databases and data

5 MSc Bioinformatics 2006 Why use a database server? more than one user who needs to access the data at the same time. a moderate amount of data relationships between the stored data items more than one kind of data object constraints that must be rigidly enforced on the data, such as field lengths, field types etc.. If data must be queried to produce reports or results. large amount of data that must be searched quickly. If security is important When adding, deleting, or modifying data is a complex process.

6 MSc Bioinformatics 2006 Relational databases Entities and Attributes –An entity is a person, place, event, or thing about which data is collected –An entity set is a named collection of entities sharing common characteristics –Attributes are characteristics of the entity –Attributes are also called fields Tables –A table holds related entities or an entity set –Also called relations –Comprised of rows and columns

7 MSc Bioinformatics 2006 Tables Two-dimensional structure with rows and columns Each row (tuple) represents a single entity Columns represent attributes Row/column intersection represents single value Tables must have an attribute to uniquely identify each row, a unique key Column values all have same data format Each column has range of values called attribute domain Order of the rows and columns is immaterial to the DBMS

8 MSc Bioinformatics 2006 Fingerprint Motif Sequence

9 MSc Bioinformatics 2006 Entity-relationship model

10 MSc Bioinformatics 2006 Relational logical model

11 MSc Bioinformatics 2006

12 mySQL Command-line interpreter –UNIX/Mac /usr/local/bin/mysql -uhugh -pshhh –Windows Start > All Programs > mySQL > mySQL Server 5.0 > mySQL command line client

13 MSc Bioinformatics 2006 Managing databases Creating databases mysql> CREATE DATABASE phyloprints; To use that database use phyloprints ; Creating tables CREATE TABLE "fingerprint" ( "fprint_accn" varchar(15), "identifier" varchar(15), "motifs" int2, "date" date, "up_date" date, "family_title" text, "family_doc“ varchar(18), "category" text, "second_accn" varchar(15), "pseudo" bool ); SHOW CREATE TABLE wine;

14 MSc Bioinformatics 2006 Table = fingerprint +----------------------------------+----------------------------------+-------+ | Field | Type | Length| +----------------------------------+----------------------------------+-------+ | fprint_accn | varchar() | 15 | | identifier | varchar() | 15 | | motifs | int2 | 2 | | date | date | 4 | | up_date | date | 4 | | family_title | text | var | | family_doc | varchar() | 18 | | category | text | var | | second_accn | varchar() | 15 | | pseudo | bool | 1 | +----------------------------------+----------------------------------+-------+ Table = motif +----------------------------------+----------------------------------+-------+ |Field | Type |Length | +----------------------------------+----------------------------------+-------+ |fprint_accn | varchar() | 18 | | motif | int2 | 2 | | repeat | varchar() | 4 | | seqn_accn | varchar() | 15 | | seqn_fragment | varchar() | 35 | | start_position | int4 | 4 | | inter_motif_dist | int4 | 4 | | initial | bool | 1 | | final | bool | 1 | +----------------------------------+----------------------------------+-------+ attributes

15 MSc Bioinformatics 2006 Modifiers Are applied to attributes –NOT NULL (row can’t exist without this attribute having a value) identifier varchar(15) NOT NULL, –DEFAULT (sets the data to the value that follows when no data is supplied) pseudo" bool DEFAULT f,

16 MSc Bioinformatics 2006 Keys Primary key – one or more attributes that uniquely identify a row in a table –It is essential that every table has one PRIMARY KEY (fprint_accn), key names*fprint_accn,identifier) type=MyISAM;

17 MSc Bioinformatics 2006 Removing databases and tables DROP statement (make sure you want to do this – it doesn’t check!) DROP TABLE seqn; DROP DATABASE phyloprints;

18 MSc Bioinformatics 2006 Inserting/Updating/Deleting Data Inserting Data INSERT INTO fingerprint VALUES (‘PR90008’, ‘CRYSTALLIN’,0,2000-10-11,2000-10-11,’Crystallin pseudo- signature’,’PDOC90008’,’Others’,t); Inserting lots of data rows INSERT INTO fingerprint VALUES (‘PR90008’, ‘CRYSTALLIN’,0,2000-10-11,2000-10-11,’Crystallin pseudo- signature’,’PDOC90008’,’Others’,t),(‘PR00240’,’ADRENRGCA 1DR’,7,1996-08-14,2000-02-18,’Alpha-1D adrenergic receptor signature’,’PDOC00240’, f);

19 MSc Bioinformatics 2006 DELETE statement –All data DELETE FROM fingerprint; –Deleting specific rows – use WHERE DELETE FROM fingerprint WHERE identifer = ‘CRYSTALLIN’; UPDATE statement UPDATE fingerprint SET fprint_accn = ‘PR00001’ WHERE identifier=‘CRYSTALLIN’;

20 MSc Bioinformatics 2006 Querying with SQL Key operators – SELECT – PROJECT – JOIN Other operators – INTERSECT – UNION (union compatible tables) – DIFFERENCE – PRODUCT – DIVIDE

21 MSc Bioinformatics 2006 Querying databases with SQL SELECT statement SELECT identifier, fprint_accn FROM fingerprint; SELECT * FROM fingerprint; WHERE clause SELECT * from fingerprint WHERE fprint_accn=‘PR00001’;

22 MSc Bioinformatics 2006 select identifier,fprint_accn from fingerprint where pseudo='t' ORDER BY fprint_accn; select fprint_accn,COUNT(*) from fingerprint where pseudo='t' GROUP BY fprint_accn; select DISTINCT seqn_accn from motif; select distinct seqn_accn from motif LIMIT 10;

23 MSc Bioinformatics 2006 Join queries output data that's based on relationships between two or more tables

24 MSc Bioinformatics 2006 Diagram taken from Jane Mabey

25 MSc Bioinformatics 2006 Diagram taken from Jane Mabey

26 MSc Bioinformatics 2006 Querying databases using Perl #! /usr/bin/perl -w # intro6.pl - connect to MySQL, retrieve data, write plain text output use strict; use DBI; my ($dbh, $sth, $count); $dbh = DBI->connect ("DBI:mysql:host=localhost;database=webdb", "webdev", "webdevpass", {PrintError => 0, RaiseError => 1}); $sth = $dbh->prepare ("SELECT name, wins, losses FROM teams"); $sth->execute (); $count = 0; while (my @val = $sth->fetchrow_array ()) { printf "name = %s, wins = %d, losses = %d\n", $val[0], $val[1], $val[2]; ++$count; } print "count rows total\n"; $sth->finish (); $dbh->disconnect (); exit (0);

27 MSc Bioinformatics 2006 %./intro6.pl name = Fargo-Moorhead Twins, wins = 36, losses = 16 name = Winnipeg Maroons, wins = 24, losses = 26 name = Minot Why Nots, wins = 19, losses = 23 name = Warren Wanderers, wins = 16, losses = 30 4 rows total

28 MSc Bioinformatics 2006 Perl example… use DBI; my $dbh = DBI->connect('DBI:MySql:payroll') or die "Couldn't connect to database: ". DBI->errstr; my $sth = $dbh->prepare('SELECT * FROM people WHERE lastname = ?') or die "Couldn't prepare statement: ". $dbh->errstr; print "Enter name> "; while ($lastname = <>) { # Read input from the user my @data; chomp $lastname; $sth->execute($lastname) # Execute the query or die "Couldn't execute statement: ". $sth->errstr; # Read the matching records and print them out while (@data = $sth->fetchrow_array()) { my $firstname = $data[1]; my $id = $data[2]; print "\t$id: $firstname $lastname\n"; } if ($sth->rows == 0) { print "No names matched `$lastname'.\n\n"; } $sth->finish; print "\n"; print "Enter name> "; } $dbh->disconnect;

29 MSc Bioinformatics 2006 Query databases over the Web #! /usr/bin/perl -w # intro7.pl - connect to MySQL, retrieve data, write HTML output use strict; use DBI; use CGI qw(:standard); my ($dbh, $sth, $count); $dbh = DBI->connect ("DBI:mysql:host=localhost;database=webdb", "webdev", "webdevpass", {PrintError => 0, RaiseError => 1}); $sth = $dbh->prepare ("SELECT name, wins, losses FROM teams"); $sth->execute (); print header(), start_html ("team data"); $count = 0; while (my @val = $sth->fetchrow_array ()) { print p (sprintf ("name = %s, wins = %d, losses = %d\n", $val[0], $val[1], $val[2])); ++$count; } print p ("$count rows total"), end_html (); $sth->finish (); $dbh->disconnect (); exit (0);

30 MSc Bioinformatics 2006 More about data retrieval with DBI.pm fetchrow_arrayref $sth = $dbh->prepare ("SELECT name, wins, losses FROM teams"); $sth->execute (); while (my $ref = $sth->fetchrow_arrayref ()) { printf "name = %s, wins = %d, losses = %d\n", $ref->[0], $ref->[1], $ref->[2]; } $sth->finish ();

31 MSc Bioinformatics 2006 #!/usr/bin/perl use DBI; my $db = “sequence”; my $server = ‘localhost’; my $user = ‘root’; my $passwd = ‘passwd’; my $dbConnection = DBI- >connect(“dbi::mysql:$db:$server”,$user,$passwd) my $query = “show tables”; my $sql = $dbConnection->prepare($query); $sql ->execute(); while(my $row = $sql->fetchrow_arrayref){ print join(“\t”,@$row),”\n”; } $dbConnection->disconnect; exit;

32 MSc Bioinformatics 2006 Higher retrieval methods selectrow_array() –to retrieve a single row or a single column value selectcol_arrayref() –returns the first column of a result set, as a reference to an array of values. selectall_arrayref() –retrieves the entire result set as a matrix and returns a reference to it

33 MSc Bioinformatics 2006 Inserting data in the database sub new_employee { # Arguments: database handle; first and last names of new employee; # department ID number for new employee's work assignment my ($dbh, $first, $last, $department) = @_; my ($insert_handle, $update_handle); my $insert_handle = $dbh->prepare_cached('INSERT INTO employees VALUES (?,?,?)'); my $update_handle = $dbh->prepare_cached('UPDATE departments SET num_members = num_members + 1 WHERE id = ?'); die "Couldn't prepare queries; aborting" unless defined $insert_handle && defined $update_handle; $insert_handle->execute($first, $last, $department) or return 0; $update_handle->execute($department) or return 0; return 1; # Success }

34 MSc Bioinformatics 2006 Inserting data in the database sub new_employee { # Arguments: database handle; first and last names of new employee; # department ID number for new employee's work assignment my ($dbh, $first, $last, $department) = @_; my ($insert_handle, $update_handle); my $insert_handle = $dbh->prepare_cached('INSERT INTO employees VALUES (?,?,?)'); my $update_handle = $dbh->prepare_cached('UPDATE departments SET num_members = num_members + 1 WHERE id = ?'); die "Couldn't prepare queries; aborting" unless defined $insert_handle && defined $update_handle; my $success = 1; $success &&= $insert_handle->execute($first, $last, $department); $success &&= $update_handle->execute($department); my $result = ($success ? $dbh->commit : $dbh->rollback); unless ($result) { die "Couldn't finish transaction: ". $dbh->errstr } return $success; }

35 MSc Bioinformatics 2006 A short cut for DELETING, UPDATING and INSERTING data use the do statement $dbh->do('DELETE FROM people WHERE age > 65');

36 MSc Bioinformatics 2006 Reminders Executable CGI files Configuring Apache to read cgi files from cgi-bin In httpd.conf ScriptAlias /cgi-bin/ "C:/Program Files/Apache Group/Apache2/cgi-bin/"


Download ppt "MSc Bioinformatics 2006 Biocomputing Databases and Perl 13 th February 2006."

Similar presentations


Ads by Google