
1 Login: BITseminar Pass: BITseminar2011

2 BIOINFORMATICS

3 Bioinformatics Combination of: – Theory and methods (algorithms, statistical methods, machine learning, …) – Applications (sequence analysis, genome assemblies, databases, …) – Different kinds of datasets (sequence data, microarray, next-gen data, …)

4 Biology Core Concepts Molecular biology Systems biology Evolutionary theory Common lab techniques Sequence comparison Phylogenetic analysis

5 Computer science Programming Database querying Data mining Visualization Machine learning Modeling …

6 Data exceeds analysis Bioinformatician data

7 How to survive? Knowledge of Linux/Unix Scripting: Perl/Python Network-based data storage Knowledge of biology, genomics Database structures Try to keep up with all new tools!

8 Benefit of using (Bio)Perl, example: You have 1000 sequences to BLAST and analyse… You can do this manually, or… use a Perl script to do this for you and present you the final results!
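The kind of script this slide has in mind can be sketched in Python (one of the scripting languages named on the previous slide) rather than Perl. The sketch assumes the BLAST searches were already run with tabular output (BLAST+ `-outfmt 6`) and covers only the "present you the final results" step, keeping the best-scoring hit per query. The function name `best_hits` and the sample rows are hypothetical, made up for illustration.

```python
import csv
import io

# Default column order of BLAST+ tabular output (-outfmt 6).
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def best_hits(tabular_text):
    """Return {query_id: hit_dict}, keeping the highest bit score per query."""
    best = {}
    reader = csv.reader(io.StringIO(tabular_text), delimiter="\t")
    for row in reader:
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        hit = dict(zip(FIELDS, row))
        hit["pident"] = float(hit["pident"])
        hit["evalue"] = float(hit["evalue"])
        hit["bitscore"] = float(hit["bitscore"])
        current = best.get(hit["qseqid"])
        if current is None or hit["bitscore"] > current["bitscore"]:
            best[hit["qseqid"]] = hit
    return best


# Made-up example rows (not real BLAST results).
SAMPLE = (
    "seq1\tsubjA\t98.5\t200\t3\t0\t1\t200\t10\t209\t1e-100\t360\n"
    "seq1\tsubjB\t90.0\t180\t18\t0\t5\t184\t1\t180\t1e-60\t220\n"
    "seq2\tsubjC\t85.0\t150\t22\t1\t1\t150\t30\t179\t2e-40\t160\n"
)

if __name__ == "__main__":
    for query, hit in sorted(best_hits(SAMPLE).items()):
        print(query, hit["sseqid"], hit["pident"], hit["evalue"])
```

With 1000 queries the same loop applies unchanged; only the input file grows.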

9 Good journals to keep up the pace Bioinformatics ( http://bioinformatics.oxfordjournals.org/ ) BMC Bioinformatics ( http://www.biomedcentral.com/bmcbioinformatics/ ) PLoS Computational Biology ( http://www.ploscompbiol.org/ ) …

10 DATABASES

11 Types of databases DNA databases Protein databases Genome databases Microarray databases Next-Gen seq databases

12 What to find in databases? Sequences Motifs Mutations, SNPs Gene interaction profiles Interactions (protein-protein interactions) Transcription factor binding sites Etc…

13 Databases? Good reference: http://nar.oxfordjournals.org (annual edition)

14 NCBI: lot of options… feed the need

15 Amino acid databases UniProt – SWISS-PROT – TrEMBL – PIR

16 UniProt http://www.uniprot.org Good quality, curated Minimal redundancy Extensive cross-linking to useful databases

17 Structural databases Structure leads to function! – Protein Data Bank (PDB) http://www.pdb.org – SCOP & CATH databases (structural classification) http://scop.mrc-lmb.cam.ac.uk/scop/ ; http://www.cathdb.info/

18 Structure prediction (modeling) – SWISS-MODEL & Repository ( http://swissmodel.expasy.org/ ) – MODELLER & MODBASE ( http://salilab.org ) – Study of interactions (docking) & drug design

19 SNPs and pharma To collect, encode, and disseminate knowledge about the impact of human genetic variations on drug response. http://www.pharmgkb.org/

20 DNA Microarray Databases Standard: MIAME = Minimum Information About a Microarray Experiment Databases: – ArrayExpress (EBI) http://www.ebi.ac.uk/arrayexpress/ – GEO (NCBI) http://www.ncbi.nlm.nih.gov/geo/ Check the database before planning an experiment!

21 Next-gen data databases http://www.ncbi.nlm.nih.gov/Traces/sra http://www.ebi.ac.uk/ena http://www.ddbj.nig.ac.jp/sub/trace_sra-e.html

22 GENOME BROWSERS

23 Human reference sequences Celera HuRef GRCh37 Three reference genomes. Keep this in mind when browsing databases!

24 Useful Genome Browsers Ensembl: http://www.ensembl.org/ NCBI Map Viewer: http://www.ncbi.nlm.nih.gov/mapview/map_search.cgi? UCSC: http://genome.ucsc.edu/

25 Genome browser: Ensembl

26 EMBL Problems Lots of redundancy Wrong or old annotations Vector contamination Errors in sequences

27 RefSeq Better option, NCBI reference Curated Annotations are controlled No redundancy

28 NCBI: GenBank vs RefSeq http://www.ncbi.nlm.nih.gov/RefSeq/ Sequence records are created by scientists who submit sequence data to GenBank. As an archival database, GenBank may contain hundreds of records for the same gene. In addition, because there is no independent review system, the types of information may vary from record to record, and GenBank sequence data may contain errors and contaminant vector DNA. To address some of the problems associated with GenBank sequence records, NCBI developed its RefSeq database.

29 RefSeq accession numbers NM_ mRNA (provisional, predicted, reviewed) NP_ protein (provisional, predicted, reviewed) NR_ non-coding RNA (provisional, reviewed) NG_ human genes (provisional, reviewed) NC_ chromosomes, complete genomes (provisional, reviewed)

30 RefSeq accession numbers (2) XM_ predicted mRNA (model) XP_ predicted protein (model) XR_ predicted non-coding RNA (model) NT_ human and mouse genomic contigs (model) NW_ mouse supercontigs (model)
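The prefix table on these two slides lends itself to a simple lookup. The following is a minimal sketch that maps an accession string to the molecule type listed above; the function name `describe_accession` and the accession strings in the example are placeholders invented for illustration (only the prefix matters), and the table covers only the prefixes shown on the slides.

```python
# Prefix meanings exactly as listed on the two slides above.
REFSEQ_PREFIXES = {
    "NM_": "mRNA",
    "NP_": "protein",
    "NR_": "non-coding RNA",
    "NG_": "human genes",
    "NC_": "chromosomes, complete genomes",
    "XM_": "predicted mRNA (model)",
    "XP_": "predicted protein (model)",
    "XR_": "predicted non-coding RNA (model)",
    "NT_": "human and mouse genomic contigs (model)",
    "NW_": "mouse supercontigs (model)",
}


def describe_accession(accession):
    """Return the slide's description for a RefSeq-style accession prefix."""
    prefix = accession.strip()[:3].upper()
    return REFSEQ_PREFIXES.get(prefix, "unknown")


if __name__ == "__main__":
    # Placeholder accessions, not real records.
    for acc in ["NM_000001.1", "XP_000002.1", "AB000003"]:
        print(acc, "->", describe_accession(acc))
```

An accession without one of these prefixes (for example a plain GenBank record) falls through to "unknown", which is a quick way to spot non-RefSeq entries in a mixed list.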

31 Genome browser: NCBI

32 Genome browser: UCSC Example: UCSC Good tutorial: http://www.openhelix.com/downloads/ucsc/ucsc_home.shtml

33 SNPS AND DISEASE RESEARCH

34 SNPs and disease research Association analysis, disease related (?), mapping genome variation… Reference = dbSNP database

35 Example NCBI SNP database, SNP rs33957964

36 Other useful SNP databases Genome Variation Server http://gvs.gs.washington.edu/GVS/ HapMap (Ensembl) http://hapmap.org/ List of all: http://www.hgvs.org/dblist/ccent.html

37 Clinical Bioinformatics Microarrays, omics data (genomics, proteomics, interactomics, metabolomics, …) Combination of bioinformatics and medical informatics

38 ALGORITHMS AND TOOLS

39 Algorithms Foundations of bioinformatics tools – Implemented in 'front-end tools' (websites, Java applications): can be slow; good for smaller analyses and quick mining – Scripts and programs used on the command line (e.g. local BLAST): usually installed locally on a server; faster; suited to large queries and analyses that need long run times; knowledge of Linux/Unix essential
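As an illustration of the command-line route, here is a small sketch that assembles a local BLAST call from Python. It assumes the BLAST+ suite (`blastn`) is installed on the server; the file and database names are hypothetical, and the helper names `build_blastn_command` and `run_if_available` are invented for this example.

```python
import shutil
import subprocess


def build_blastn_command(query_fasta, database, output_path,
                         evalue=1e-5, threads=1):
    """Build a BLAST+ blastn command line as an argument list."""
    return [
        "blastn",
        "-query", query_fasta,
        "-db", database,
        "-out", output_path,
        "-outfmt", "6",              # tabular output, easy to parse in scripts
        "-evalue", str(evalue),
        "-num_threads", str(threads),
    ]


def run_if_available(command):
    """Run the command only if its executable is on PATH; return True if it ran."""
    if shutil.which(command[0]) is None:
        return False
    subprocess.run(command, check=True)
    return True


if __name__ == "__main__":
    # Hypothetical file and database names; the command is only printed here.
    cmd = build_blastn_command("queries.fasta", "my_local_db", "hits.tsv", threads=4)
    print(" ".join(cmd))
```

Passing the arguments as a list rather than one shell string avoids quoting problems when file names contain spaces.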

40 Hall of Fame Linux operating system, MySQL database (Bio)Perl: programming language → making your life easier! BLAST/BLAT: comparing sequences PHYLIP: phylogenetic analysis, tree building ClustalW: multiple alignment MEGA5: multiple alignment and editing sequences HMMER: comparative genomics EMBOSS: combining several tools for sequence analysis Open source → free to use and develop

41 Tools? Good reference: http://nar.oxfordjournals.org/ (annual edition)

42 Analysing next gen sequencing data Different tools for different formats – Roche – Applied Biosystems – Illumina

43 Next gen tools FastQC: quality assessment of FASTQ files
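To give a feel for what "quality assessment" means at the file level, the sketch below reads FASTQ records and computes the mean Phred score per read. It is not how FastQC itself is implemented; it illustrates one simple statistic. It assumes four-line records and Phred+33 quality encoding (older Illumina pipelines used a different offset), and the reads in `SAMPLE` are made up.

```python
import io


def read_fastq(handle):
    """Yield (read_id, sequence, quality) from a 4-line-per-record FASTQ handle."""
    while True:
        header = handle.readline()
        if not header:
            return  # end of file
        header = header.rstrip("\r\n")
        if not header:
            continue  # tolerate blank lines between records
        sequence = handle.readline().rstrip("\r\n")
        handle.readline()  # '+' separator line, ignored
        quality = handle.readline().rstrip("\r\n")
        if not header.startswith("@") or len(sequence) != len(quality):
            raise ValueError(f"malformed FASTQ record near {header!r}")
        yield header[1:].split(" ")[0], sequence, quality


def mean_phred(quality, offset=33):
    """Mean Phred score of one quality string (offset 33 is an assumption)."""
    if not quality:
        return 0.0
    return sum(ord(char) - offset for char in quality) / len(quality)


# Made-up reads: 'I' encodes Phred 40 and '!' encodes Phred 0 under Phred+33.
SAMPLE = "@read1\nACGT\n+\nIIII\n@read2\nACGT\n+\n!!!!\n"

if __name__ == "__main__":
    for read_id, _seq, qual in read_fastq(io.StringIO(SAMPLE)):
        print(read_id, mean_phred(qual))
```

Reads with a low mean score are candidates for trimming or filtering before assembly or mapping.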

44 Assembly tools next gen A number of specialized tools exist: ABySS, gap4, Geneious, Mira, Newbler, SSAKE, SOAPdenovo, Velvet, …

45 Galaxy! http://galaxy.psu.edu/ Galaxy provides a web-based application for the analysis of sequence data Includes many tools, including tools for NGS data Makes your life easier, less Linux knowledge needed

46 On the cloud

47 Structure Galaxy

48 Login: BITseminar Pass: BITseminar2011 So this is why you need a bioinformatician in the lab!!

