Introduction to bioinformatics

Barbera van Schaik
Bioinformatics Laboratory, Academic Medical Centre (AMC)

Schedule 2013 See (Education) for course material

Subjects in bioinformatics
Genome analysis
Bioimage informatics
Sequence analysis
Databases and ontologies
Phylogenetics
Data and text mining
Systems biology
Structural bioinformatics
Gene expression
Genetics and population analysis

What is not in the schedule?
Proteomics (MS, MS/MS, LC-MS)
Systems Biology (mathematical modeling)
Information Management
(Multivariate) statistics
Phylogenetics
Genotype – Phenotype analysis
Text mining
Visualization
(High-throughput) biomolecular imaging
Comparative genomics
Protein modeling / protein docking
Experimental design
etc.

Other courses http://www.amc.nl/graduateschool/
Computing in R – October 2013
DNA technology – March 2013
Practical biostatistics – September 2013
Introduction to next generation sequencing
Analysis of microarray gene expression data
NGS data analysis
EBI roadshow at AMC – 9+10 April 2013

Bioinformatics

Lost in translation (biology and informatics)
Inheritance, Protocol, NGS, Infrastructure, String, Sequence, Sonic hedgehog, Virus

Information Management / e-Science
Clinical genomics Health care Molecular Biology/ Genomics Information Management / e-Science

Definitions of bioinformatics
Adapted definition according to Wikipedia:
The application of information technology and statistics to the field of molecular biology.
The creation and advancement of databases, algorithms, computational and statistical techniques, and theory to solve formal and practical problems arising from the management, analysis and interpretation of biological data.

Bioinformatics: extraction of biological knowledge from complex data

Bioinformatics and information management
Data: experimental, from public databases
Knowledge
*Convert data to knowledge
*Generate new hypotheses
*Design new experiments

How does molecule A interact with protein B?
A schematic visual model of the oxygen-binding process, showing all four monomers and hemes, and protein chains only as diagrammatic coils, to facilitate visualization into the molecule.

Study migration

Study evolution

Which gene(s) causes disease X?
Study of families with a particular disease. Some people are affected (in grey).
Search for mutations or genes that are involved (bioinformatics).
Estimate the chance that a particular gene is important (biostatistics).

Bioinformatics Laboratory (AMC) Medical Bioinformatics & e-Bioscience
Part of the KEBB
You are welcome if you need bioinformatics expertise

Netherlands Bioinformatics Centre (NBIC)
Research, Support and Education

Other bioinformatics organisations
European Bioinformatics Institute (EBI)
National Center for Biotechnology Information (NCBI)
EMBnet
International Society for Computational Biology (ISCB)

Profile of a bioinformatician
(General) knowledge of biology and genome sciences
Translation biology <-> informatics
Knowledge of Unix-based operating systems
Programming skills (Java, Python, Shell/Perl scripting, R)
(Parallel) computing environments
Data storage and database technology
Statistics
Mathematics
Freely adapted from Richter et al. (2009), PLoS Computational Biology

People who contribute to bioinformatics
Michael Eisen: cluster analysis of microarrays (background: biology)
Lincoln Stein: Reactome, WormBase, BioPerl, ENCODE, GMOD, cloud computing (background: medicine and cell biology)
Jim Kent: human genome project (background: graphics)
Robert Gentleman: R, Bioconductor (background: statistics)
Carol Goble: e-science (background: computer science)
Ewan Birney: InterPro, BioPerl, ENCODE project (background: biochemistry)
Women in bioinformatics: ~25%

History of bioinformatics
1965 Margaret Dayhoff's Atlas of Protein Sequences
1970 Needleman-Wunsch algorithm (global alignment)
1977 DNA sequencing and software to analyze it (Staden)
1981 Smith-Waterman algorithm developed (local sequence alignment)
1981 The concept of a sequence motif (Doolittle)
1982 GenBank made public
1983 Sequence database searching algorithm (Wilbur-Lipman)
1987 Perl (Practical Extraction and Report Language) is released by Larry Wall.
1988 National Center for Biotechnology Information (NCBI) created at NIH/NLM
1988 EMBnet network for database distribution
1990 BLAST: fast sequence similarity searching
1990 The HTTP 1.0 specification is published. First HTML document.
1990 Grid computing as a metaphor for making computer power as easy to access as an electric power grid.
1994 EMBL European Bioinformatics Institute (EBI), Hinxton, UK
1995 Microsoft version 1.0 of IE. Sun version 1.0 of Java. Version 1.0 of Apache.
1997 PSI-BLAST
1997 International Society for Computational Biology was founded
1998 Worm (multicellular) genome completely sequenced
1999 e-Science was introduced by John Taylor, the Director General of the United Kingdom's Office of Science and Technology
2000 Gene Ontology (GO)
2001 The human genome (3 Giga base pairs) is published.
2001 Minimum information about a microarray experiment (MIAME; Brazma).
2001 Genetical Genomics (Ritsert Jansen)
2002 BioMoby. Web-service repository
2003 myGrid: personalised bioinformatics on the information grid (e.g., Taverna).
2004 Bioconductor: open software development for computational biology and bioinformatics
2005 Reactome: knowledge base of biological pathways

Publication history

1965 Margaret Dayhoff's Atlas of Protein Sequences

1970 Needleman-Wunsch algorithm (global alignment)

Global alignment (toy example)
CATGATGA
CTGAGAT
Can you “align” these two sequences? That is, introduce “gaps” in the two sequences such that you maximize the number of matching nucleotides.

Global alignment (toy example)
Unaligned:
CATGATGA
CTGAGAT
Aligned (gaps shown as -):
CATGATGA-
C-TGA-GAT
Helps us to understand the function of 'new' DNA
Dynamic programming gives the optimal solution… but is slow.
Often heuristic methods are used (BLAST, BLAT)
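The dynamic programming idea behind the toy example can be sketched in a few lines of Python. This is a minimal illustration of Needleman-Wunsch style global alignment, not the code used in the course. The function name and the scoring values (match +1, mismatch -1, gap -1) are assumptions chosen for the example; real tools use substitution matrices and more refined gap penalties.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Global alignment of strings a and b (illustrative scoring values)."""
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + pair,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,       # gap in b
                score[i][j - 1] + gap,       # gap in a
            )
    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        pair = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


if __name__ == "__main__":
    total, top, bottom = needleman_wunsch("CATGATGA", "CTGAGAT")
    print(total)
    print(top)
    print(bottom)
```

The table has (n+1) x (m+1) cells, so time and memory grow with the product of the two sequence lengths. That is why searching a whole database this way is slow and heuristics such as BLAST are used instead. Several alignments can share the best score, so the traceback may return a different but equally good alignment than the one on the slide.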

1978 Paulien Hogeweg (1943)
Dutch theoretical biologist and complex systems researcher studying biological systems as dynamic information processing systems at many interconnected levels.
Together with Ben Hesper she coined the term Bioinformatics in 1978 as the study of informatic processes in biotic systems.
Hogeweg, P. (1978). Simulating the growth of cellular forms. Simulation 31, 90-96; Hogeweg, P. and Hesper, B. (1978) Interactive instruction on population interactions. Comput Biol Med 8:

1981 Smith-Waterman algorithm developed (local sequence alignment)

1967 Scientific director of NBIC was born

1988 EMBnet network for database distribution

1998 Worm (multicellular) genome completely sequenced

2001 Genetical Genomics (Ritsert Jansen, Jan Peter Nap)
Expression profiling and marker-based fingerprinting of each individual of a segregating population
Quantitative trait loci analysis

Bioinformatics in the Netherlands
1976 Paulien Hogeweg (theoretical biology)
1979 Gert Vriend (proteins)
1985 Computer Assisted Organic Synthesis/Computer Assisted Molecular Modelling Centre (CAOS/CAMM) was founded (Nijmegen, Jan Noordik)
1989 Jack Leunissen (first Dutch researcher with PhD in Bioinformatics)
1990s Driving forces: Herman Berendsen, Charles Buys, Jacob de Vlieg
1999 CAOS/CAMM was reorganized; Gert Vriend becomes director of CMBI.
1999 KNAW committee (chaired by Berendsen) wrote the report ‘Bioexact’ in which strong stimulation of bioinformatics was recommended.
2000 KNCV working group bioinformatics
2000 NWO-BMI (Biomolecular informatics); program committee chaired by De Vlieg
2001 NWO/KNAW workshop ‘The future of bioinformatics in the Netherlands’
2002 Position paper ‘De toekomst van de bioinformatica in Nederland’ representing the vision of the NWO/KNAW
2003 NBIC was founded
2003 First BioRange proposal (Vriend, Berendsen, Hertzberger, Tellegen)
2005 Start of BioRange (NBIC-I)
2008 ……………

Open Source & Open Access, Open Standards, WIKI

Bioinformatics tools and databases
Many different bioinformatics tools are freely available: BLAST, EMBOSS, Ensembl, GenScan, Bioconductor
Many different biological databases are freely available: GenBank, UniProtKB, KEGG
Many publications in open access journals: BMC Bioinformatics, PLoS Computational Biology
Also many commercial software packages available: Spotfire, Rosetta Resolver, Genelogic
Bioinformaticians write their own tools for specialized tasks

Nucleic Acids Research – annual web server issue

Cytoscape: Lesser General Public License (LGPL)

Open Access
Unrestricted access to data
Allows researchers to use data and make discoveries
Discoveries are not necessarily open access
Open access is applicable to any kind of data you want to apply it to:
Sequence data (DNA, RNA or protein)
Gene expression data
Protein-protein interaction data
Literature

Why Open Standards?
Freedom to operate
No vendor lock-in
Share
Collaborations
Flexibility
Examples
Protocols: TCP/IP, SOAP
File formats: ODF, PNG, HTML

The Original Wiki-Wiki
Shuttle bus in Honolulu (wiki wiki = quick)

What is a wiki?
Open source
Management of: issues, projects, documents, knowledge
Collaboration: add, edit, delete web page content

Wikipedia

www.bioinformaticslaboratory.nl == TWIKI

WikiPathways

Bioinformatics: extraction of biological knowledge from complex data
@AMC: genomics, metabolomics and proteomics data

What is genomics?
The application of high-throughput automated technologies to molecular biology.
OR
The experimental study of complete genomes.
On a large scale: genomics (for the last ~15 years)

DNA microarrays
Short summary: each spot is one gene (actually a few spots per gene)
Intensity according to amount of sample RNA
Each spot consists of single-stranded nucleotide sequences
Each spot represents a gene (if course participants ask: actually a few spots per gene; draw a gene with its probes on the whiteboard)
One sample (e.g. healthy tissue) is labelled green, the other sample (e.g. disease tissue) red
The mixture is spread over the plate
Competition between the two samples
Intensity is measured
Red = more RNA of disease tissue
Green = more RNA of healthy tissue
Yellow = equal amount, no change
Black = no RNA from either sample
Other techniques: one sample at a time, e.g. Affymetrix; not a "ratio" measure, but a quantitative measure per sample
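The red/green/yellow/black reading of a two-colour spot can be expressed as a log ratio of the two channel intensities. The sketch below is only an illustration: the function names, the background level of 50 and the two-fold cutoff are assumptions made up for this example, not values from the slides or from any particular microarray platform.

```python
import math


def log2_ratio(red, green):
    """log2(red / green): positive means more signal in the red channel."""
    return math.log2(red / green)


def spot_colour(red, green, background=50.0, fold=2.0):
    """Classify one spot from its red and green intensities.

    background and fold are hypothetical thresholds for illustration only.
    """
    if red < background and green < background:
        return "black"  # no RNA from either sample
    # A pseudocount of 1 avoids division by zero when one channel is empty.
    ratio = log2_ratio(red + 1.0, green + 1.0)
    cutoff = math.log2(fold)
    if ratio >= cutoff:
        return "red"    # more RNA from the red-labelled (disease) sample
    if ratio <= -cutoff:
        return "green"  # more RNA from the green-labelled (healthy) sample
    return "yellow"     # roughly equal amounts, no change


if __name__ == "__main__":
    for red, green in [(800, 100), (100, 800), (500, 500), (5, 5)]:
        print(red, green, spot_colour(red, green))
```

Working on a log scale makes a two-fold increase and a two-fold decrease symmetric around zero, which is one reason ratios are usually reported as log2 values.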

Automated DNA sequencing
Sanger sequencing. Capillary system. Developments in the sequencing process were very important for the Human Genome Project (HGP).

High throughput sequencing
Roche: 454
Applied Biosystems: SOLiD
Illumina: Solexa
Nowadays even faster sequencing machines
At the moment 3 types
At the AMC we have the SOLiD machines
One run takes... produces... costs...
Analysis not doable by hand -> bioinformatics

Sample storage
Samples have to be stored and kept track of (LIMS).

Confused by genomics?
Genomics, Transcriptomics, Proteomics, Metabolomics, Nutrigenomics, Pharmacogenomics, Epigenomics, Infectomics, Patientomics, other 'omics'
Genomics: study of genomes on a large scale
Transcriptomics: study of gene expression on a large scale
Proteomics: study of proteins on a large scale
Etc.

Anyway...
You have loads of data
Therefore: Bioinformatics to the rescue
Image credit: Digital Vision, PhotoDisc, Matt Ray/EHP
Sign @ Wellcome-Sanger, Cambridge, UK
