Overview of I519 & Introduction to Bioinformatics.



Structure of I519
• Two classes and one lab each week
• Python, C (and a little R)
• Textbook: Understanding Bioinformatics
• Homework assignments (~5 in total)
• Grading: midterm exam (25%) + final exam (25%) + assignments (30%) + class project (15%) + attendance (5%)
• Course webpage:

What’s Bioinformatics?
• “Bioinformatics is the field of science in which biology, computer science, and information technology merge into a single discipline. There are three important sub-disciplines within bioinformatics: the development of new algorithms and statistics with which to assess relationships among members of large data sets; the analysis and interpretation of various types of data including nucleotide and amino acid sequences, protein domains, and protein structures; and the development and implementation of tools that enable efficient access and management of different types of information.” (NCBI)
• “I do not think all biological computing is bioinformatics, e.g. mathematical modelling is not bioinformatics, even when connected with biology-related problems. In my opinion, bioinformatics has to do with management and the subsequent use of biological information, particular genetic information.” (Durbin)

Bioinformatics vs Computational Biology
• Almost interchangeable
• Computational biology may be broader
  – Computational biology is an interdisciplinary field that applies the techniques of computer science, applied mathematics and statistics to address biological problems (Wikipedia)
  – Includes bioinformatics

Impacts of Bioinformatics
• On biological sciences (and medical sciences)
  – Large-scale experimental techniques
  – Information growth
• On computational sciences
  – Biology has become a large source of new algorithmic and statistical problems!

Related Fields
• Proteomics / genomics (metagenomics) / comparative genomics / structural genomics
• Chemical informatics
• Health informatics / biomedical informatics
• Complex systems
• Systems biology
• Biophysics
• Mathematical biology
  – Tackles biological problems using methods that need not be numerical and need not be implemented in software or hardware

Bioinformatics Problems/Applications
[Figure from “Bioinformatics For Dummies”]

Biology Primer
[Figure 1-1, Molecular Biology of the Cell; labels: multicellular organisms, eggs, cell divisions]
Underlying the diversity of life is a striking unity: DNA is the universal genetic language, and cells are the basic units of structure and function.

Cells are the Basic Unit of Life
• Cell theory
  – All organisms are made up of cells
  – The cell is the basic living unit of organization for all organisms
  – All cells come from pre-existing cells by division
  – Cells contain hereditary information, which is passed from cell to cell during cell division
  – All cells are basically the same in chemical composition
  – All energy flow (metabolism and biochemistry) of life occurs within cells
• Organisms can consist of a single cell or of multiple cells (multicellular organisms)
  – Most living organisms are single cells (e.g., E. coli, yeast)
  – Multicellular organisms (e.g., a human has more than [...] cells. Have no idea about this number? The world population as of July 2008 is [...] billion; 1 billion = 10^9)

Cell Structures
[Figures: animal cell structure; prokaryotic cell structure]

Scale Down to the Atomic Level
[Figure 9-1, Molecular Biology of the Cell; Figure 9-2, Cell]

The Central Dogma
[Diagram: DNA → RNA (transcription) → protein (translation); additional labels: RNA virus, retrovirus]
The flow of genetic information in cells is from DNA to RNA to protein. All cells, from bacteria to humans, express their genetic information in this way, a principle so fundamental that it is termed the central dogma of molecular biology.
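
As a rough illustration of the DNA-to-RNA step, transcription can be modeled as a simple string operation. This is a minimal sketch that assumes the input is the coding (sense) strand written 5'→3' and contains only A, C, G, and T; the function names are hypothetical and not taken from the course materials.

```python
# Illustrative sketch only: function names are hypothetical.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of a DNA strand (read 5'->3')."""
    return "".join(COMPLEMENT[base] for base in reversed(dna.upper()))


def transcribe(coding_strand: str) -> str:
    """Return the mRNA for a coding strand: same sequence with T replaced by U."""
    return coding_strand.upper().replace("T", "U")


dna = "ATGGCCTAA"
print(transcribe(dna))          # AUGGCCUAA
print(reverse_complement(dna))  # TTAGGCCAT
```

The mRNA has the same sequence as the coding strand because it is synthesized as the complement of the template strand, which is itself the reverse complement of the coding strand.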

DNA and Replication
[Figure 1-2, Molecular Biology of the Cell, Fifth Edition]

From DNA (to RNA) to Protein

The Genetic Code
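
The genetic code can be represented as a lookup table from three-base codons to amino acids. The sketch below builds the standard genetic code from the usual compact 64-letter encoding (bases in the order T, C, A, G) and translates a coding DNA sequence until the first stop codon. The function name and the choice to translate from DNA rather than RNA are illustrative assumptions.

```python
# Illustrative sketch: standard genetic code as a dictionary (names are hypothetical).
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"  # codons starting with T
    "LLLLPPPPHHQQRRRR"  # codons starting with C
    "IIIMTTTTNNKKSSRR"  # codons starting with A
    "VVVVAAAADDEEGGGG"  # codons starting with G
)
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODON_TABLE = dict(zip(CODONS, AMINO_ACIDS))  # "*" marks a stop codon


def translate(coding_dna: str) -> str:
    """Translate a coding DNA sequence in frame 0, stopping at the first stop codon."""
    protein = []
    seq = coding_dna.upper()
    for i in range(0, len(seq) - 2, 3):
        amino_acid = CODON_TABLE.get(seq[i:i + 3], "X")  # "X" for unrecognized codons
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


print(translate("ATGGCCTAA"))     # MA
print(translate("ATGAAATGGTGA"))  # MKW
```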

Genome
• Definition
  – The genome of an organism is its whole hereditary information, encoded in the form of DNA (or, for some viruses, RNA)
  – Chromosome: a structure composed of a long DNA molecule and associated proteins; humans have 46 chromosomes
• DNA sequences can be determined by various sequencing techniques
• “Sequence first. Ask questions later”
  – Cell Oct 4;111(1):13-6

Three (Super)Kingdoms

Characteristic                      Archaea    Bacteria   Eukaryotes
Predominantly multicellular         No         No         Yes
DNA structure                       circular   circular   linear
Cytoplasm is compartmentalized      No         No         Yes
Introns are present in most genes   No         No         Yes
Photosynthesis with chlorophyll     No         Yes        Yes
Histone proteins present in cell    Yes        No         Yes

Organisms at Pivotal Positions in the Tree of Life
[Figure labels, genome sequencing years: E. coli: 1997; worm: 1998; fly: 2000. Source: Cell Oct 4;111(1):13-6]

Model Organisms
• A model organism is a species that is extensively studied to understand particular biological phenomena, with the expectation that discoveries made in the model organism will provide insight into the workings of other organisms.
• Types of model organisms: genetic models (with short generation times, such as the fruit fly and nematode worm), experimental models, and genomic models (with a pivotal position in the tree of life)

Escherichia coli (E. coli)
• A common gut bacterium and the most widely used organism in molecular genetics
• Some strains of E. coli are capable of causing disease under certain conditions
• Different strains of E. coli have been extensively studied
• The whole genomes of several E. coli strains have been sequenced (e.g., K-12, O157:H7, HS)

The Genome of E. coli K-12
[Figure 1-29, Molecular Biology of the Cell, Fifth Edition (© Garland Science 2008); labels: protein-coding genes, RNA genes]
• Circular DNA: a single, closed loop
• The whole genome was sequenced in 1997
• Total: 4,639,221 bp
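
Because the chromosome is a closed loop, a region can run past the last base and continue at the first. One way to handle this in code is modular indexing, as in the minimal sketch below. The function name and the 0-based coordinate convention are assumptions made for illustration, and the toy genome is invented.

```python
# Illustrative sketch: extracting a region from a circular genome (names are hypothetical).
def circular_slice(genome: str, start: int, length: int) -> str:
    """Return `length` bases beginning at 0-based `start`, wrapping around the end."""
    n = len(genome)
    if n == 0:
        raise ValueError("genome must not be empty")
    return "".join(genome[(start + i) % n] for i in range(length))


toy_genome = "ACGTTGCA"  # invented 8-base "genome" for demonstration
print(circular_slice(toy_genome, 6, 4))  # CAAC (positions 6, 7, 0, 1)
```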

Caenorhabditis elegans
• C. elegans is a eukaryote (a nematode, or roundworm)
• Has a small genome (~97 megabases); whole genome sequenced in 1998
• Easy to maintain in the laboratory (in petri dishes), with a fast and convenient life cycle
  – The life span is 2-3 weeks
  – A tiny (1 mm in length), transparent organism; the developmental pattern of all 959 of its somatic cells has been traced
  – Somatic cell: any cell of a plant or animal other than cells of the germ line (from Greek soma, body)

Caenorhabditis elegans (Cont.)
• Discovery of the mechanism of RNA interference in C. elegans (1998)
  – Andrew Fire and Craig C. Mello shared the Nobel Prize in Physiology or Medicine in 2006
  – Silencing was triggered efficiently by injected dsRNA, but weakly or not at all by sense or antisense single-stranded RNAs

Drosophila melanogaster (fruit fly)
• Used as a model organism for over 100 years; widely used to study genetics and developmental biology
  – Small, with a simple diet
  – Short life cycle: about two weeks
  – Has large polytene chromosomes, whose barcode patterns of light and dark bands allow genes to be mapped accurately
• Chosen in 1990 as one of the model organisms to be studied under the auspices of the federally funded Human Genome Project
• Whole genome sequenced in 2000
• More than 10 Drosophila genomes have been sequenced
• FlyBase:

Species Classification
• Classification is the arrangement of organisms into orderly groups based on their similarities
• Also known as taxonomy
• Provides an accurate and uniform naming system

Linnaean System of Classification
• Carolus Linnaeus (the “father of taxonomy”) introduced the first widely accepted hierarchical scheme, which today consists of 7 categories (kingdom, phylum, class, order, family, genus, and species), not including domain
• Species is the most basic unit of biological classification (“species” means “kind” in Latin)
  – Each species is different, and reproduces itself faithfully
  – Heredity is a central part of the definition of life
• The Linnaean system uses two Latin names, genus and species, to designate each type of organism
  – Example: Salmonella saintpaul (which caused the latest food-borne disease outbreak)
  – Capitalize the genus, but not the species; italicize both in print

Homo sapiens
• Domain: Eukaryotes
• Kingdom: Metazoa (many-celled animals)
• Phylum: Chordata (characterized by a notochord, nerve cord, and gill slits) (subphylum: Vertebrata)
• Class: Mammalia (warm-blooded vertebrates)
• Order: Primates
• Family: Hominidae
• Genus: Homo
• Species: sapiens
Mnemonic: King Philip Came Over For Gooseberry Soup

Gene/Protein Family
• A protein/gene family is a group of evolutionarily related proteins/genes
• Genes/proteins of the same family typically have similar functions (and, for proteins, similar structures) and share sequence similarity
• There are far more genes/proteins than families, which is why grouping genes/proteins into families is useful

Evolution of Genes
• New genes are generated from preexisting genes
  – Intragenic mutation (a gene is modified by changes in its DNA sequence, e.g., errors that occur during DNA replication)
  – Gene duplication (the two copies may then diverge in the course of evolution)
  – Segment shuffling
  – Horizontal transfer

More on what’s bioinformatics

Analysis of Gene/Protein Families – Key Problems in Bioinformatics
• Homolog detection
• Alignment (the residue-level mapping among homologous genes/proteins)
• Application of the alignments
  – Detect the conserved residues (functional sites)
  – Prediction of protein structures
  – Motif finding (cis-elements)
• Phylogeny
• Function annotation
None of these problems has been solved!
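
To make the alignment problem concrete, here is a minimal sketch of global pairwise alignment by dynamic programming (the Needleman-Wunsch approach). The slides do not name a specific algorithm, so this choice is an assumption for illustration, as are the function name and the simple scoring scheme (match +1, mismatch -1, linear gap penalty -1). Practical tools use substitution matrices and affine gap penalties.

```python
# Illustrative sketch: global alignment with a linear gap penalty (names are hypothetical).
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Return (score, aligned_a, aligned_b) for an optimal global alignment."""
    n, m = len(a), len(b)

    def sub(i, j):
        # Substitution score for aligning a[i-1] with b[j-1].
        return match if a[i - 1] == b[j - 1] else mismatch

    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(i, j),  # align two residues
                score[i - 1][j] + gap,            # gap in b
                score[i][j - 1] + gap,            # gap in a
            )

    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            out_a.append(a[i - 1]); out_b.append(b[j - 1])
            i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-")
            i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


print(needleman_wunsch("ACGT", "AGT"))  # (2, 'ACGT', 'A-GT')
```

The table has (n+1) × (m+1) cells, so time and memory grow with the product of the sequence lengths. That is fine for a single pair of genes but too slow for database-scale searches, which is one reason heuristic methods exist.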

Is Protein A Related/Similar to Protein B?
• Sequence similarity (alignment!)
• Structure similarity (structural comparison)
• Co-expression (microarray data analysis)
• Any type of correlation (operon structure, etc.)
You will see this question again and again!
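
One simple way to quantify sequence similarity once two sequences are aligned is percent identity. Several definitions are in use; the sketch below assumes one common convention (identical columns divided by columns where neither sequence has a gap). The function name is hypothetical.

```python
# Illustrative sketch: percent identity over ungapped alignment columns.
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of ungapped alignment columns in which the two residues are identical."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(x, y) for x, y in zip(aligned_a, aligned_b) if x != "-" and y != "-"]
    if not columns:
        return 0.0
    matches = sum(1 for x, y in columns if x == y)
    return 100.0 * matches / len(columns)


print(percent_identity("ACGT", "AGGT"))  # 75.0
print(percent_identity("ACGT", "A-GT"))  # 100.0
```

The second example shows why the convention matters: ignoring gapped columns reports 100% identity even though one sequence has a deletion.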

Guilt by Association

Computational Abstractions: Biological Sequences as Strings
[Diagram labels: DNA, RNA, protein, phylotype]
• DNA: a string in a four-letter alphabet
• RNA
• Protein
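
Treating sequences as strings over a fixed alphabet makes basic checks straightforward. The sketch below assumes the unambiguous alphabets (A, C, G, T for DNA; A, C, G, U for RNA; the 20 standard amino acid letters for protein) and ignores ambiguity codes such as N. The function name is hypothetical, and short strings can be ambiguous: "ACG" is valid in all three alphabets and is reported as DNA here only because DNA is checked first.

```python
# Illustrative sketch: classifying a sequence by its alphabet (names are hypothetical).
DNA_ALPHABET = set("ACGT")
RNA_ALPHABET = set("ACGU")
PROTEIN_ALPHABET = set("ACDEFGHIKLMNPQRSTVWY")


def guess_sequence_type(seq: str) -> str:
    """Return 'DNA', 'RNA', 'protein', or 'unknown' based on the letters used."""
    letters = set(seq.upper())
    if not letters:
        return "unknown"
    if letters <= DNA_ALPHABET:
        return "DNA"
    if letters <= RNA_ALPHABET:
        return "RNA"
    if letters <= PROTEIN_ALPHABET:
        return "protein"
    return "unknown"


print(guess_sequence_type("ATGGCCTAA"))  # DNA
print(guess_sequence_type("AUGGCCUAA"))  # RNA
print(guess_sequence_type("MKWVTF"))     # protein
```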

Computational Abstractions: Networks (and Others) as Graphs
• Protein-protein interaction network
• Protein structures presented as graphs
• Gene functions presented as graphs (Gene Ontology)
• Metabolic pathways as graphs (directed)
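
As a small illustration of the graph abstraction, an undirected interaction network can be stored as an adjacency dictionary and explored with breadth-first search. The protein names and interactions below are invented for the example, and the function names are hypothetical.

```python
# Illustrative sketch: an undirected interaction network as an adjacency dictionary.
from collections import deque

# Invented example interactions, not real data.
INTERACTIONS = [("A", "B"), ("B", "C"), ("C", "D"), ("E", "F")]


def build_graph(edges):
    """Build an undirected graph as {node: set of neighbours}."""
    graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set()).add(u)
    return graph


def shortest_path_length(graph, source, target):
    """Return the number of edges on a shortest path, or None if unreachable."""
    if source not in graph or target not in graph:
        return None
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == target:
            return dist
        for neighbour in graph[node]:
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, dist + 1))
    return None


network = build_graph(INTERACTIONS)
print(shortest_path_length(network, "A", "D"))  # 3
print(shortest_path_length(network, "A", "E"))  # None
```

A directed graph, as for metabolic pathways, would add each edge in one direction only.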

Large Scale Data Analysis
• Genome scale
  – Genome, proteome, transcriptome
• Metagenome scale
  – Metagenome, metaproteome, metatranscriptome

More than Implementation
• Find old/new biological problems
  – Remember that biology has become a large source of new algorithmic and statistical problems
• Formulate each as a computational problem
  – Define inputs and outputs
  – (though many papers work on already well-defined bioinformatics problems)
• Apply existing algorithms and/or tools to solve your problem
• Develop new ones if necessary
• Implement your algorithms in appropriate programming language(s)
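
The "define inputs and outputs" step can be illustrated with a deliberately small problem. The question "how GC-rich is this sequence?" becomes: input, a DNA string; output, the fraction of its bases that are G or C. The example problem and the function name are chosen for illustration and do not come from the slides.

```python
# Illustrative sketch: a biological question stated as input -> output.
def gc_content(dna: str) -> float:
    """Input: a non-empty DNA string. Output: fraction of bases that are G or C."""
    dna = dna.upper()
    if not dna:
        raise ValueError("sequence must not be empty")
    return (dna.count("G") + dna.count("C")) / len(dna)


print(gc_content("ATGC"))  # 0.5
print(gc_content("GGCC"))  # 1.0
```

Writing the input and output down first makes it easier to tell whether an existing tool already solves the problem or a new algorithm is needed.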

Where Can I Get the Biological Data?
• Sequences
  – NCBI GenBank
  – Swiss-Prot
• Structures
  – PDB
• Genomes
  – NCBI, IMG, GOLD
  – Specialized genome resources, e.g., Ensembl (selected eukaryotic genomes)
• Others
  – KEGG, SEED (biological pathways)

Dealing with Databases
• Databases are the backbone of bioinformatics research
• Flat files were the first type of database and are still used today
• Relational databases are good for searching purposes
• Databases can contain data and annotations of data
  – Primary and derived (secondary) data
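
The contrast between flat files and relational databases can be sketched with Python's standard library alone. The example below parses a small FASTA-formatted string (FASTA is one common flat-file format) and loads the records into an in-memory SQLite table so they can be searched with SQL. The records, table layout, and function names are invented for illustration.

```python
# Illustrative sketch: from a flat file (FASTA text) to a searchable relational table.
import sqlite3

# Invented example records, not real database entries.
FASTA_TEXT = """>seq1 hypothetical protein
MKTAYIAKQR
QISFVKSHFS
>seq2 another hypothetical record
MADEEKLPPG
"""


def parse_fasta(text):
    """Return a list of (header, sequence) tuples from FASTA-formatted text."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def load_into_sqlite(records):
    """Store records in an in-memory SQLite table and return the connection."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE sequences (id TEXT PRIMARY KEY, description TEXT, residues TEXT)"
    )
    for header, seq in records:
        seq_id, _, description = header.partition(" ")
        conn.execute("INSERT INTO sequences VALUES (?, ?, ?)", (seq_id, description, seq))
    return conn


conn = load_into_sqlite(parse_fasta(FASTA_TEXT))
rows = conn.execute(
    "SELECT id, LENGTH(residues) FROM sequences WHERE residues LIKE 'MK%'"
).fetchall()
print(rows)  # [('seq1', 20)]
```

The flat file is easy to read and exchange, while the table supports queries by any column without rescanning the whole file.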

Readings
• Biology primer (available at the course website)
• Anything about Python and/or C (if you have no programming experience at all)
• What’s in the textbook?
  – Chapter 1 (The Nucleic Acid World)
  – Chapter 2 (Protein Structure)
  – Chapter 3 (Dealing With Databases)