1 Topics
The topics:
basic concepts of molecular biology
more on Perl
overview of the field
biological databases and database searching
sequence alignments
phylogenetic trees
protein structure prediction
microarray data analysis

2 The Human Genome Project
The human genome sequence is (almost) complete: approximately 3 billion base pairs on 23 chromosomes. The project started in 1990.
Some of these slides are adapted from the lecture notes of Stuart M. Brown at NYU.

3 Whole genome sequencing has now become routine

4 How does the human genome stack up?
Organism                              Genome Size (Bases)   Estimated Genes
Human (Homo sapiens)                  3.2 billion           25,000
Laboratory mouse (M. musculus)        2.6 billion
Mustard weed (A. thaliana)            100 million
Roundworm (C. elegans)                97 million            19,000
Fruit fly (D. melanogaster)           137 million           13,000
Yeast (S. cerevisiae)                 12.1 million          6,000
Bacterium (E. coli)                   4.6 million           3,200
Human immunodeficiency virus (HIV)    9700                  9
Source: U.S. Department of Energy Genome Programs, Genomics and Its Impact on Science and Society, 2003
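One way to read the comparison on this slide is as gene density, that is, estimated genes per million bases. The sketch below is only an illustration: the function and variable names are made up for this example, and the mouse and mustard weed rows are left out because their gene estimates do not appear in this transcript.

```python
# Genome size (bases) and estimated gene count, copied from the slide above.
# Rows without a gene estimate in the transcript are omitted.
GENOMES = {
    "Human (Homo sapiens)": (3_200_000_000, 25_000),
    "Roundworm (C. elegans)": (97_000_000, 19_000),
    "Fruit fly (D. melanogaster)": (137_000_000, 13_000),
    "Yeast (S. cerevisiae)": (12_100_000, 6_000),
    "Bacterium (E. coli)": (4_600_000, 3_200),
    "Human immunodeficiency virus (HIV)": (9_700, 9),
}


def genes_per_megabase(size_bases, genes):
    """Return the estimated number of genes per million bases."""
    return genes / (size_bases / 1_000_000)


def density_table(genomes=GENOMES):
    """Map each organism name to its gene density (genes per megabase)."""
    return {
        name: genes_per_megabase(size, genes)
        for name, (size, genes) in genomes.items()
    }


# Print organisms from the sparsest to the densest genome.
for name, density in sorted(density_table().items(), key=lambda kv: kv[1]):
    print(f"{name:38s} {density:8.1f} genes/Mb")
```

With these figures the human genome comes out as the sparsest of the six and the HIV genome as the densest, which is one sense in which the larger genomes do not "stack up" in proportion to their size.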

5 The Path Forward
How does DNA impact health? Identify and understand the differences in DNA sequence (A, T, C, G) among human populations.
What do all the genes do? Discover the functions of human genes by experimentation and by finding genes with similar functions in the model organisms.
What are the functions of nongene areas? Identify important elements in the nongene regions of DNA.
How does information in the genome enable life? Explore life at the ultimate level of the whole organism instead of single genes or proteins.
U.S. Department of Energy, 2005

6 Diverse applications
Medicine: customized treatments, …
Microbes for energy and the environment: generate clean energy sources, clean up toxic wastes, …
Bioanthropology: human lineage
Agriculture, livestock breeding, bioprocessing: crops and animals more resistant to diseases, efficient industrial processes, …
DNA identification: implicate people accused of crimes, identify contaminants in air, water, …
U.S. Department of Energy, 2005

7 Genomics: Journey to the Center of Biology
Without doubt, the greatest achievement in biology over the past millennium has been the elucidation of the mechanism of heredity. The instructions for assembling every organism on the planet are all specified in DNA sequences that can be translated into digital information and stored in a computer for analysis. As a consequence of this revolution, biology in the 21st century is rapidly becoming an information science. Powerful new types of bioinformatics will clearly be required to assimilate and interpret the data that will issue from various types of genomics research. Eric Lander & Robert Weinberg, Science, 2000
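As a small illustration of the remark that DNA sequences can be translated into digital information, the sketch below packs each base into two bits. The encoding table and the function names are assumptions made for this example, not a standard file format, and it handles only the four unambiguous bases A, C, G and T.

```python
# Hypothetical 2-bit code for the four DNA bases (an assumption for this sketch).
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
BASES = "ACGT"


def encode(seq):
    """Pack a DNA string into an integer, two bits per base.

    The length is returned as well, because leading 'A' bases encode
    as zero bits and would otherwise be lost.
    """
    value = 0
    for base in seq.upper():
        value = (value << 2) | CODE[base]
    return value, len(seq)


def decode(value, length):
    """Unpack an integer produced by encode() back into a DNA string."""
    bases = []
    for _ in range(length):
        bases.append(BASES[value & 3])  # read the lowest two bits
        value >>= 2
    return "".join(reversed(bases))


packed, n = encode("GATTACA")
print(packed, n, decode(packed, n))
```

At two bits per base, a sequence of about 3 billion bases needs roughly 750 megabytes before any compression or annotation.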

8 Nucleic Acid Sequence Databases
The principal nucleic acid sequence databases are GenBank, EMBL and DDBJ. Each collects a portion of the total sequence data reported worldwide, and they exchange new and updated entries on a daily basis.
Nucleic acid sequence databases:
EMBL (European Molecular Biology Laboratory)
GenBank (USA)
DDBJ (DNA Data Bank of Japan)
ENSEMBL (a project between EMBL-EBI and the Sanger Institute to produce and maintain automatic annotation on selected eukaryotic genomes)
dbEST (division of GenBank)
GSDB (Genome Sequence DataBase, division of GenBank)

9 GenBank
Once upon a time, GenBank sent out sequence updates on CD-ROM disks a few times per year.

12 Specialised Genomic Resources
In addition to the comprehensive DNA sequence DBs, there is a variety of more specialised genomic resources. These so-called boutique DBs bring focus to species-specific genomics and to particular sequencing techniques.
SGD: Saccharomyces Genome Database
UniGene: gene-oriented clusters from GenBank
TIGR: databases of The Institute for Genomic Research
ACeDB: A C. elegans DataBase

13 Protein Information Resources
Levels of protein sequence and structural organisation: primary, secondary, tertiary.
The primary structure of a protein is its amino acid sequence.
The secondary structure of a protein corresponds to regions of local regularity (e.g., α-helices and β-strands).
The tertiary structure of a protein arises from the packing of its secondary structure elements, which may form discrete domains within a fold.

14 Primary Protein Databases
The primary structure of a protein is its amino acid sequence. These are stored in primary databases as linear alphabets that denote the constituent residues.
Protein sequence databases:
SWISS-PROT: protein knowledgebase
TrEMBL: computer-annotated supplement to Swiss-Prot
PIR: Protein Information Resource
MIPS: Munich Information Centre for Protein Sequences
NRL-3D: produced by PIR
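The idea of a sequence stored as a linear alphabet can be sketched in a few lines, assuming the one-letter codes of the 20 standard amino acids. The example sequence and the function names are hypothetical, and real database entries may also contain ambiguity codes such as B, Z or X, which this sketch rejects.

```python
from collections import Counter

# One-letter codes of the 20 standard amino acids.
AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")


def is_valid_protein(seq):
    """True if seq is non-empty and uses only the 20 standard residue letters."""
    return bool(seq) and set(seq.upper()) <= AMINO_ACIDS


def composition(seq):
    """Count how many times each residue letter occurs in the sequence."""
    return Counter(seq.upper())


example = "MKTAYIAKQR"  # made-up sequence, for illustration only
print(is_valid_protein(example))
print(composition(example).most_common(3))
```

Because the primary structure is just a string, checks like these and most database searches reduce to ordinary text operations.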

15 Structure Classification DBs
Contain 3D structures available from crystallographic and spectroscopic studies.
Structure classification databases:
PDB: Protein Data Bank
CATH: Class, Architecture, Topology, Homology
SCOP: Structural Classification of Proteins

16 PDB: Growth (2006)

17 Databases concerning Mutations
dbSNP
HGBASE (Human Genome Variation Database)
The SNP Consortium (TSC)

18 Literature Databases
PubMed http://www.ncbi.nlm.nih.gov/entrez/query
Bioinformatics Online
Nature
Science

19 Systems Biology
Integrate different levels of information to understand how biological systems function.
Use computational and mathematical models to analyze, model and simulate cellular networks, interactions and pathways.

20 Microarray
The DNA microarray is a new technology for measuring the levels of the mRNA gene products of a living cell.

21 Affymetrix GeneChip® Probe Arrays
[Figure: GeneChip probe array and a hybridized probe cell. Labels: oligonucleotide probe; single-stranded, fluorescently labeled cRNA target; dimensions 24~50 µm and 1.28 cm; image of hybridized probe array. BGT108_DukeUniv]
Each probe cell or feature contains millions of copies of a specific oligonucleotide probe.

22 Bioinformatics Tools
Database & searching
Computational algorithms
Alignment
Similarity
Clustering
Pattern searching
Structure predictions
Statistical methods
Data visualization

23 Bioinformatics
Bioinformatics is the research, development, or application of computational tools and approaches for expanding the use of biological, medical, behavioral or health data, including those to acquire, store, organize, archive, analyze, or visualize such data.
Computational biology is the development and application of data-analytical and theoretical methods, mathematical modeling and computational simulation techniques to the study of biological, behavioral, and social systems.

