Presentation is loading. Please wait.

Presentation is loading. Please wait.

1 Ensembl Modules and MySQL. SQL and Database Tables Quick Examples 2.

Similar presentations


Presentation on theme: "1 Ensembl Modules and MySQL. SQL and Database Tables Quick Examples 2."— Presentation transcript:

1 1 Ensembl Modules and MySQL

2 SQL and Database Tables Quick Examples 2

3 3 exon transcript_id exon_num = 3 sequence_start sequence_stop intron transcript_id intron_num = 3 sequence_start sequence_stop primer_pair id transcript_id left_primer_id right_primer_id transcript id sequence_id source = Ensembl source_id sequence id target_id type = nucleotide sequence = ATG… chr_name = 15 strand = 1 genomic_start = 15,123,120 genomic_stop = 16,378,131 source source_id refresh target id date gene_name = BBS4 description accession status project id name = pro1 description date set_table id project_id name =testset date description target_set_info set_id target_id rank = 5 cas_rank cas_options select id from target where gene_name = "BBS4";

4 4 MySQL Demo with Ensembl mysql -u anonymous -h ensembldb.ensembl.org show databases; show databases like "%omo%core%"; use homo_sapiens_core_47_36i; show tables; select count(*) from exon; show columns from gene; select * from xref limit 10 select * from xref where dbprimary_acc = "NM_000777"; select stable_id, gene.gene_id from gene_stable_id, gene, transcript, object_xref, xref where gene_stable_id.gene_id = gene.gene_id and gene.gene_id = transcript.gene_id and transcript.transcript_id = object_xref.ensembl_id and xref.xref_id = object_xref.xref_id and xref.dbprimary_acc = 'NM_000777'; select * from transcript where gene_id = 17393; select * from exon_transcript where transcript_id = 33341; select * from exon where exon_id = 193252;

5 5 Ensembl Schema Core Schema http://www.ensembl.org/info/docs/api/core/s chema/index.html#exon_stable_id API Tutorial: http://www.ensembl.org/info/docs/api/core/c ore_tutorial.html

6 6 Code Development 1) Generate random sequence ATGCCCGCTGAGT 2) Generate formatted random sequence 1 ATGCCCGCTT TGACCCTTTA 20 3) Generate random sequence, translated into protein, and formatted… Code revision –adding functionality and features –may introduce bugs that are not discovered until much later –useful to examine the changes to code, that may have caused bugs

7 7 Code Development Solutions May retain a copy of every version of every file –have complete record –redundant and waste of space –responsibility on developer to maintain revision history –Example (V1, V2, V3 experiment, V4 unfinished feature, return in 6 months?)

8 8 Multi-coder Environment Developers D1, D2, and D3 Source code S1, S2, S3, S4, S5. D1 copies S1 and makes changes D2 copies S1 and makes changes D2 returns S1 D1 returns S1 Clearly, this is ineffective for managing and integrating changes

9 9 Brief Overview of CVS CVS – Concurrent Versions System CVS –only stores differences between files/versions –uses repository structure check out check in lock branching, merging etc Reference –http://www.gnu.org/software/cvs/http://www.gnu.org/software/cvs/ –https://www.cvshome.org/https://www.cvshome.org/

10 10 Installing Ensembl Modules Sample program – ens4.pl (simple demo program that obtains exons for a particular gene from Ensembl database, for given accession number, and Ensembl Gene ID) When connected to Ensembl's MySQL database % mysql -u anonymous -h ensembldb.ensembl.org To get a list of their current databases. Find the most recent (highest numbers) version of the homo_sapiens core database. type % show databases; Example: homo_sapiens_core_47_36i Example: homo_sapiens_core_25_36 The final two numbers represent the Ensembl code version and the NCBI human build, respectively (i.e. Ensembl modules 25 and NCBI Human Build 36). In this case, you should be using Ensembl code 47 to do the following:

11 11 NO LONGER VALID (for CSS) %touch ~/.cvspass %chmod 755 ~/.cvspass create the directory %mkdir Ensembl_modules-41 enter the directory %cd Ensembl_modules-41 type the following: %cvs -d :pserver:cvsuser@cvsro.sanger.ac.uk:/cvsroot/CVSmaster login (when prompted, the password is CVSUSER) -- yes, in all CAPS %cvs -d :pserver:cvsuser@cvsro.sanger.ac.uk:/cvsroot/CVSmaster checkout -r branch-ensembl-41 ensembl %cvs -d :pserver:cvsuser@cvsro.sanger.ac.uk:/cvsroot/CVSmaster checkout -r branch-ensembl-41 ensembl-external %cvs -d :pserver:cvsuser@cvsro.sanger.ac.uk:/cvsroot/CVSmaster checkout -r branch-ensembl-41 ensembl-lite Note this is all about 9 Meg Make symbolic link called "Ensembl_modules-current" to point to your newly created directory of modules: %cd.. %ln -s Ensembl_modules-41 Ensembl_modules-current

12 12 http://www.ensembl.org/info/software/api_installation.html # -- Clearly this assumes a Unix flavor -- Create an installation directory $ cd $ mkdir src $ cd src $ cvs -d :pserver:cvs@code.open-bio.org:/home/repository/bioperl login Logging in to :pserver:cvs@code.open-bio.org:2401/home/repository/bioperl CVS password: cvs Install BioPerl (version 1.2.3) $ cvs -d :pserver:cvs@code.open-bio.org:/home/repository/bioperl checkout -r bioperl-release-1-2-3 bioperl-live Log into the Ensembl CVS server at Sanger (using a password of CVSUSER): $ cvs -d :pserver:cvsuser@cvs.sanger.ac.uk:/cvsroot/ensembl login CVS password: CVSUSER Install the Ensembl Core Perl API for version 47 $ cvs -d :pserver:cvsuser@cvs.sanger.ac.uk:/cvsroot/ensembl checkout -r branch-ensembl-47 ensembl If required, install the Ensembl Variation Perl API for version 47 $ cvs -d :pserver:cvsuser@cvs.sanger.ac.uk:/cvsroot/ensembl checkout -r branch-ensembl-47 ensembl-variation If required, install the Ensembl Compara Perl API for verion 47 $ cvs -d :pserver:cvsuser@cvs.sanger.ac.uk:/cvsroot/ensembl checkout -r branch-ensembl-47 ensembl-compara NB: You can install as many Ensembl APIs as you need in this way.

13 13 To install Ensembl modules -- assumes you do need to have BioPerl modules installed (used to be separate step)

14 Run Program Now put the following program into your "src" directory – and it should run. 14

15 15 #!/usr/local/bin/perl use lib "bioperl-live"; # you MAY have to use: use lib "bioperl-live/bioperl-live"; use lib "ensembl/modules"; # use lib "ensembl/ensembl/modules"; use Bio::EnsEMBL::DBSQL::DBAdaptor; #my $host = "kaka.sanger.ac.uk"; my $host = "ensembldb.ensembl.org"; my $user = "anonymous"; #my $dbname = "homo_sapiens_core_41_36c"; my $dbname = "homo_sapiens_core_47_36i"; my $accession_num = "NM_000777"; my $Ensembl_gene_id = "ENSG00000106258"; my $flank_length = 5000; my $db = new Bio::EnsEMBL::DBSQL::DBAdaptor( -host => $host, -user => $user, -dbname => $dbname); my $gene_adaptor = $db->get_GeneAdaptor(); my @genes = @{$gene_adaptor->fetch_all_by_external_name('NM_000777')}; foreach my $gene (@genes) { my $string = feature2string($gene); print "$string\n"; } sub feature2string { my $f = shift; my $stable_id = $f->stable_id(); my $name = $f->external_name(); my $seq_region = $f->slice->seq_region_name(); my $start = $f->start(); my $end = $f->end(); my $strand = $f->strand(); return "$stable_id: $name $seq_region:$start-$end ($strand)"; }

16 16 Output ENSG00000106258: CYP3A5 7:99083759-99115557 (-1) Doesn't seem like much, but remember: 1)Using the language "perl" 2)Using other peoples software (modules) 3)Accessing genomic data in a database in England 4)Accessing data programatically

17 17 Look at API Ensembl API (full): http://www.ensembl.org/info/docs/api/Pdoc/index.html Ensembl->gene_adaptor->fetch_all_by_external_name @genes = @{$gene_adaptor->fetch_all_by_external_name('BRCA2')};

18 18 From the API… # Fetch all clones from a slice adaptor (returns a list reference) my $clones_ref = $slice_adaptor->fetch_all('clone'); # If you want a copy of the contents of the list referenced by # the $clones_ref reference... my @clones = @{$clones_ref}; # Get the first clone from the list via the reference: my $first_clone = $clones_ref->[0];

19 19 Object adaptors have internal knowledge of the underlying database schema and use this knowledge to fetch, store and remove objects (and data) from the database. This way you can write code and use the Ensembl Core API without having to know anything about the underlying databases you are using. Object adaptors are obtained from the Registry via a method named get_adaptor(). To obtain a Slice adaptor or a Gene adaptor (which retrieve Slice and Gene objects respectively) for Human, do the following after having loaded the Registry, here called $registry, as above: my $gene_adaptor = $registry->get_adaptor( 'Human', 'Core', 'Gene' ); my $slice_adaptor = $registry->get_adaptor( 'Human', 'Core', 'Slice' ); Don't worry if you don't immediately see how useful this could be. Just remember that you don't need to know anything about how the database is structured, but you can retrieve the necessary data (neatly packaged in objects) by asking for it from the correct adaptor. Throughout the rest of this document we are going to work through the ways the Ensembl objects can be used to derive the information you want.

20 UCSC http://genome.ucsc.edu/FAQ/FAQdownloa ds#download29http://genome.ucsc.edu/FAQ/FAQdownloa ds#download29 20

21 21 #### genome-mysql.cse.ucsc.edu use DBI; my ($dsn) = "DBI:mysql:hg18:genome-mysql.cse.ucsc.edu"; my ($username) = "genome"; my ($passwd) = ""; my ($query); my $dbh = DBI->connect ($dsn, $username, $passwd,{RaiseError=>1}); if (! defined $dbh) { print "\nConnect to database(Human_annot_mar06): FAILED\n"; } else { print "\nConnect to database (gh18): SUCCESS\n"; } $gene = "BBS4"; my $string = "SELECT geneName, name, exonStarts, exonEnds, chrom, strand FROM refFlat WHERE geneName = '$gene'"; my $sth = $dbh->prepare($string); $sth->execute(); while(my @row = $sth->fetchrow_array) { $GENENAME = $row[0]; $NAME = $row[1]; $EXONSTARTS = $row[2]; $STRAND = $row[5]; } $sth->finish(); print "geneName = $GENENAME\n"; print "Name = $NAME\n"; print "strand = $STRAND\n";

22 22 End

23 23 Output Intron: 11 -9247 -6928 Exon: 12 -6927 -6768 sequence_start = -6927 sequence_stop = -6768 exon length= 160 exon start, exon_stop 6768 6927 exon sequence: CTGTGTTTCTTTACAAGGTTTGAAGGAGAAGTTCTGAAGGACTCTGATTAGAGCAAGTTTCAT GTTCATGAGAGCAAACCTCATGCCAATGCAGTTTCTGGGTCCAGTTCCAAAGGGTGTGTATA TGTAAGGATCTATGCTGTCCTTCTTCTTACTGAAC Intron: 12 -6767 -5096 Exon: 13 -5095 -5000 sequence_start = -5095 sequence_stop = -5000 exon length= 96 exon start, exon_stop 5000 5095 exon sequence: TCATTCTCCACTTAGGGTTCCATCTCTTGAATCCACCTTTAGAACAATGGGTTTTTCTGGTTGA AGAAGTCCTTGCGTGTCTAATTTCAAGGGGAT chr = chr7 seq length= 41692

24 24 Installing bioperl (Linux) 3.5) mkdir ~/perl 3.6) mkdir ~/perl/bioperl 3.8) cd bioperl-1.2.3 4)perl Makefile.PL LIB=~/perl/bioperl (Do it this way -- with "LIB" -- recently changed slide) make test make install (see installing in private space on next slides) To uninstall, just delete ~/perl/bioperl and ~/perl/bioperl-1.2.3 Note: version -1.2.3 was the current version when I made this slide -- it may have updated since.

25 25 5) To use: #!/usr/local/bin/perl use lib "~/local/bioperl/"; # this is supposed to work,but did NOT on CSS use Bio::Tools::BPlite; # Need -- LIB prefix for this to work. csh 5.1) setenv PERL5LIB ~/perl/bioperl bash 5.1) PERL5LIB=~/perl/bioperl; export PERL5LIB mac (bash) 5.1) PERL5LIB=~/perl/bioperl; export PERL5LIB 6) To make docs work (I would just put this in your.cshrc file: set path = ($path ~/perl/bioperl/lib/site_perl/5.8.1) PATH=$PATH:~/perl/bioperl/lib/site_perl/5.8.1; export PATH Test with: cd perldoc Bio::SearchIO FINALLY, please note that the version numbers change over time, and the actual paths may very a little between CPAN and/or bioperl.org It make take some trial and error (it usually does for me). NOTE TO SELF -- check out the CPAN installer (its much easier)

26 26 Using Modules Finally, need DBI.pm % mkdir modules % cd modules % ftp ftp.cpan.org (login: ftp passwd: login@address.edu)ftp.cpan.orglogin@address.edu % bin % cd /pub/CPAN/modules/by-module/DBI % get DBI-1.53.tar.gz % cd../DBD % get DBD-mysql-3.0008.tar.gz % gunzip DBI-1.53.tar.gz % tar –xvf DBI-1.53.tar % cd DBI-1.53 % perl Makefile.PL LIB=~/modules (**** changed this slide) % make % make install (set up Environment for DBI -- next slide), then install DBD

27 27 Connecting /w Perl % mkdir modules (put modules in this dir) Need DBI, DBD-mysql gunzip, and tar (do this for both modules) perl Makefile.PL LIB=~/modules make make install csh5.0) setenv PERL5LIB "$HOME/modules:$HOME/perl/bioperl" bash 5.1) PERL5LIB=$HOME/modules:$HOME/perl/bioperl; export PERL5LIB (note CSS has upgraded perl from 5.6.0 – used the last time)

28 28 Using Modules CSH setenv PERL5LIB "$HOME/local/bioperl/lib/site_perl/5.8.1:$HOME/modules/lib/site_perl/5.8.1:$HOME/Ensembl_mo dules-41/ensembl/modules:$HOME/Ensembl_modules-41/ensembl- external/modules:$HOME/Ensembl_modules-41/ensembl-lite/modules :$HOME/modules:$HOME/perl/bioperl" BASH PERL5LIB="$HOME/local/bioperl/lib/site_perl/5.8.1:$HOME/modules/lib/site_perl/5.8.1:$HOME/Ense mbl_modules-41/ensembl/modules:$HOME/Ensembl_modules-41/ensembl- external/modules:$HOME/Ensembl_modules-41/ensembl- lite/modules:$HOME/modules:$HOME/perl/bioperl" export PERL5LIB This (below) would work if we used the LIB prefix -- but that makes it a pain to install DBD. So just rely on environment settings. NOTE -- if you log out -- and don’t save the environment setting somewhere (such as.chsrc, or.bashrc, you will have to re-type the command). DBI used with: use lib "~/modules/lib/perl5/site_perl/5.8.0/i386-linux-thread-multi/"; Ensembl modules can then be used in a perl program with: use lib "~/Ensembl_modules-current/ensembl/modules"; use lib "~/Ensembl_modules-current/ensembl-external/modules"; use lib "~/Ensembl_modules-current/ensembl-lite/modules";

29 29 Another Module Finally, need DBD.pm % cd modules % ftp ftp.cpan.org (login: ftp passwd: login@address.edu)ftp.cpan.orglogin@address.edu % bin % cd /pub/CPAN/modules/by-module/DBD % get DBD-mysql-2.9003.tar.gz % quit % gunzip DBD-mysql-2.9003.tar.gz % tar –xvf DBD-mysql-2.9003.tar.gz % cd DBD-mysql-2.9003.tar.gz % perl Makefile.PL LIB=~/modules % make % make install

30 30 Does not work on CSS Concluded either –version of Perl incompatible –port blocking./ens3.pl current core DB: homo_sapiens_core_18_34 -------------------- EXCEPTION -------------------- MSG: Could not connect to database homo_sapiens_core_18_34 user anonymous using [DBI:mysql:database=homo_sapiens_core_18_34;host=ensembldb.ensembl.org;port =3306] as a locator STACK Bio::EnsEMBL::DBSQL::DBConnection::new /user/eng/tbraun/Ensembl_modules- 18/ensembl/modules/Bio/EnsEMBL/DBSQL/DBConnection.pm:125 STACK Bio::EnsEMBL::DBSQL::DBAdaptor::new /user/eng/tbraun/Ensembl_modules- 18/ensembl/modules/Bio/EnsEMBL/DBSQL/DBAdaptor.pm:79 STACK main::dbconnect_Ensembl./ens3.pl:150 STACK toplevel./ens3.pl:26 -------------------------------------------

31 31 However… Installed local version of MySQL Needed modules –DBI (perl database interface) –DBD (database specific interface – mysql) Realized that I had failed to install DBD with Ensembl modules

32 End 32

33 No longer need to install BioPerl separately – Ensembl install instructions installs BioPerl now. 33

34 34 Install BioPerl I'll assume Windows XP, Eclipse (if you are using Linux/Unix, then the default documentation with Bioperl is better than these slides www.bioperl.org). Dowload Bioperl: http://pdb.eng.uiowa.edu/~tabraun/biotech/2007/modules/bioperl- 1.4.ziphttp://pdb.eng.uiowa.edu/~tabraun/biotech/2007/modules/bioperl- 1.4.zip The "official version can be found from here: (http://code.open-bio.org/cgi/viewcvs.cgi/bioperl-live/bioperl- live.tar.gz?tarball=1) Move this zip file into your Eclipse "workspace" directory and unzip it (mine is H:\windowsdata\workspace) You will need an "unzip" program. Most default versions of XP comes with one. If you don't have one, you can download a free one: –http://www.download.com/jZip/3000-2250_4- 10761563.html?tag=lst-6http://www.download.com/jZip/3000-2250_4- 10761563.html?tag=lst-6 Now in your perl program -- you will need to add line: use lib "H:\windowsdata\workspace\bioperl-live";

35 35 BioPerl continued Depending on if your "zip" program creates a directory for you, you may have to put in: use lib "H:\windowsdata\workspace\bioperl-live\bioperl-live"; You will also need 2 other modules (DBD and DBI). These are used by the Ensembl modules to allow a perl program to connect to a mySql database. DBI - Database independent interface for Perl DBD::mysql - MySQL driver for the Perl5 Database Interface (DBI) I tried to compile a library for Windows to make availabe -- but was unable to get it to work. Therefore I asked CSS to install these two modules for me -- since I do not have administrative permission on CSS nodes.


Download ppt "1 Ensembl Modules and MySQL. SQL and Database Tables Quick Examples 2."

Similar presentations


Ads by Google