Genomics: READING genome sequences ASSEMBLY of the sequence ANNOTATION of the sequence carry out dideoxy sequencing connect seqs. to make whole chromosomes.

Presentation transcript:

Genomics:
- READING genome sequences: carry out dideoxy sequencing
- ASSEMBLY of the sequence: connect seqs. to make whole chromosomes
- ANNOTATION of the sequence: find the genes!
For Bioinformatics, start with:

The Human Genome E. coli Genome

Reading: Shotgun DNA sequencing of whole genome (WGS)
DNA target sample -> SHEAR -> LIGATE & CLONE into vector -> SEQUENCE from primer -> reads

Reading to Assembly:

Assembly: The challenge of eukaryotic genomes
- E. coli genome: 4 million bp
- The human genome: 3 billion bp; 50% of the genome is repeat sequences!

Assembly of the sequence of each chromosome from end to end
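The idea of connecting overlapping reads into longer sequences can be illustrated with a toy greedy merge. This is only a sketch under stated assumptions: the function names, the `min_len` overlap threshold, and the example reads are hypothetical, reads are assumed error-free and from one strand, and real assemblers use far more elaborate methods.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of a that equals a prefix of b (0 if below min_len)."""
    max_len = min(len(a), len(b))
    for k in range(max_len, min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the pair of sequences with the largest suffix/prefix overlap."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicate reads, keep order
    while len(contigs) > 1:
        best_k, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break  # no overlaps left: the remaining contigs are separated by gaps
        merged = contigs[best_i] + contigs[best_j][best_k:]
        contigs = [c for idx, c in enumerate(contigs) if idx not in (best_i, best_j)]
        contigs.append(merged)
    return contigs


# Hypothetical reads sampled from the toy sequence ATGGCGTACGTTAGC
reads = ["ATGGCGTA", "GCGTACGTT", "ACGTTAGC"]
print(greedy_assemble(reads))
```

When a repeat is longer than a read, several reads share the same overlap and a greedy merge like this one can join the wrong pair, which is one way to see why repeat-rich genomes are harder to assemble.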

Genomics:
- READING genome sequences: robotically do dideoxy-dye data collection
- ASSEMBLY of the sequence: whole genome shotgun OR ordered clones
- ANNOTATION of the sequence: find the genes!

Annotation: find the genes!
1. ab initio
2. by evidence

Annotation: ORFs are MOST of the prokaryotic genome. For bacterial genomes, ab initio is adequate.
ab initio: “from the beginning”, יש מאין (“something from nothing”), from first principles…

Annotation: ab initio, finding ORFs
- 85-88% of the nucleotides are associated with coding sequence in the bacterial genomes that have been completely sequenced.
- Example: in Escherichia coli there are 4288 genes that have an average of 950 bp of coding sequence and are separated by an average of just 118 bp.
So first, to find genes in prokaryotic DNA, search for ORFs!
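A search for ORFs can be sketched as a scan of all six reading frames for an ATG followed in frame by a stop codon. This is a minimal illustration, not any particular gene finder: the function names are hypothetical, only ATG is treated as a start codon (a simplification), and coordinates are reported on the strand being scanned. The default of 60 codons is the threshold quoted in these slides.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(seq: str, min_codons: int = 60):
    """Return (strand, frame, start, end, orf) tuples for ATG...stop ORFs.

    start/end are 0-based, end-exclusive positions on the scanned strand;
    min_codons counts codons before the stop codon.
    """
    seq = seq.upper()
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i  # first in-frame ATG opens the ORF
                elif start is not None and codon in STOP_CODONS:
                    if (i - start) // 3 >= min_codons:
                        orfs.append((strand, frame, start, i + 3, s[start:i + 3]))
                    start = None  # look for the next ORF in this frame
    return orfs


# Toy sequence with a 4-codon ORF; the threshold is lowered only for the demo
print(find_orfs("CCATGAAATTTGGGTAACC", min_codons=3))
```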

Annotation: ab initio, beyond ORFs
- Prokaryotes have short, simple promoters that are easy to recognize.
- Transcriptional terminators often consist of short inverted repeats followed by a run of Ts.
- Therefore, programs that find prokaryotic genes search for:
  - ORFs 60 or more codons long, and codon usage
  - promoters at the 5' end
  - terminators at the 3' end
  - homology to known genes from other prokaryotes
  - Shine-Dalgarno sequences
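One of the signals listed above, the Shine-Dalgarno sequence, can be checked with a simple motif scan upstream of a candidate start codon. The sketch below is illustrative only: the function name, the `AGGAGG` consensus used as the motif, the 20-nucleotide window, and the one-mismatch tolerance are all assumptions chosen for the example, not parameters taken from any real gene finder.

```python
def has_shine_dalgarno(seq: str, start: int, motif: str = "AGGAGG",
                       window: int = 20, max_mismatches: int = 1) -> bool:
    """Report whether the window upstream of `start` contains an approximate motif match.

    `start` is the 0-based index of the first base of the candidate start codon.
    """
    upstream = seq[max(0, start - window):start].upper()
    for i in range(len(upstream) - len(motif) + 1):
        site = upstream[i:i + len(motif)]
        mismatches = sum(1 for a, b in zip(site, motif) if a != b)
        if mismatches <= max_mismatches:
            return True
    return False


# Hypothetical sequence: a motif a few bases upstream of an ATG
seq = "TTTAGGAGGTTTACCATGAAA"
print(has_shine_dalgarno(seq, seq.index("ATG")))
```

A gene finder would combine a check like this with ORF length, codon usage, and the other signals in the list rather than rely on any single one.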

Annotation: ab initio, automated
Prokaryotic gene finder examples:
- Glimmer: Interpolated Markov Model method
- GrailII: Neural Network method
(See BioInfo text, Fig 8.8)
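To give a feel for Markov-model scoring, here is a much-simplified sketch: a fixed-order Markov chain trained on example coding and non-coding sequences, with a log-odds score for a candidate region. It is not Glimmer's interpolated Markov model, which combines several context lengths; the function names, the order of 2, the pseudocount, and the tiny training strings are all hypothetical.

```python
import math
from collections import defaultdict

ALPHABET = "ACGT"


def train_markov(seqs, order: int = 2, pseudocount: float = 1.0):
    """Count which base follows each `order`-length context; return a probability function."""
    counts = defaultdict(lambda: defaultdict(float))
    for s in seqs:
        s = s.upper()
        for i in range(order, len(s)):
            counts[s[i - order:i]][s[i]] += 1

    def prob(context: str, base: str) -> float:
        ctx = counts.get(context, {})
        total = sum(ctx.values()) + pseudocount * len(ALPHABET)
        return (ctx.get(base, 0.0) + pseudocount) / total  # smoothed estimate

    return prob


def log_odds(seq: str, coding, noncoding, order: int = 2) -> float:
    """Sum of log(P_coding / P_noncoding) over the sequence; > 0 favors coding."""
    seq = seq.upper()
    score = 0.0
    for i in range(order, len(seq)):
        ctx, base = seq[i - order:i], seq[i]
        score += math.log(coding(ctx, base) / noncoding(ctx, base))
    return score


# Toy training data, invented purely for illustration
coding = train_markov(["ATGGCGGCGGCGGCG" * 3])
noncoding = train_markov(["ATATATATATATATAT" * 3])
print(log_odds("GCGGCGGCG", coding, noncoding))
print(log_odds("ATATATATA", coding, noncoding))
```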

results Annotation:

Annotation: Multicellular eukaryotes, done too

Annotation: 2 ways to annotate eukaryotic genomes:
- ab initio gene finders: work on basic biological principles:
  - Open reading frames
  - Codon usage
  - Consensus splice sites
  - Met start codons …..
- Genes based on previous knowledge: EVIDENCE
  - cDNA sequence of the gene's message
  - cDNA of a closely related gene's message sequence
  - Protein sequence of the known gene:
    - Same gene's
    - Same gene's from another species
    - Related gene's protein…….
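One of the ab initio signals above, consensus splice sites, can be sketched as a scan for candidate introns that begin with GT and end with AG, the canonical consensus dinucleotides. This is an illustration only: the function name and the `min_len` and `max_len` bounds are hypothetical, and real gene finders score much longer site profiles and combine them with the other signals.

```python
def candidate_introns(seq: str, min_len: int = 20, max_len: int = 1000):
    """Return (start, end) pairs, 0-based and end-exclusive, for GT...AG candidates."""
    seq = seq.upper()
    # Positions where a candidate intron could begin (donor site, GT)
    donors = [i for i in range(len(seq) - 1) if seq[i:i + 2] == "GT"]
    # Positions just past a candidate intron end (acceptor site, AG)
    acceptors = [i + 2 for i in range(len(seq) - 1) if seq[i:i + 2] == "AG"]
    return [(d, a) for d in donors for a in acceptors
            if min_len <= a - d <= max_len]


# Toy gene: two short exons around one 24-bp intron
exon1, intron, exon2 = "ATGGCC", "GT" + "C" * 20 + "AG", "CCCTAA"
gene = exon1 + intron + exon2
for start, end in candidate_introns(gene):
    print(start, end, gene[start:end])
```

On real sequence this produces many false candidates, which is why evidence such as cDNA or protein alignments is so useful for choosing among them.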

- Homology based exon predictions
- Consensus gene structure (both strands)
- Start and stop site predictions
- Splice site predictions
- Computational exon predictions
- Tracking information
- Unique identifiers

Automatically generated annotation

A zebrafish hit shows a gene-model protein encoded by a 6-exon gene. This gene structure (intron/exon) is seen in other species, as is the protein size. The proteins, if they correspond to MSP in S. gal., must be heavily glycosylated (likely). At least some have a signal peptide.

The zebrafish hit can be viewed at higher resolution, and…

The zebrafish hit can be viewed down to nucleotide resolution

Genomics:
- READING genome sequences: carry out dideoxy sequencing, 700 bp each read, MAX
- ASSEMBLY of the sequence: connect seqs. to make whole chromosomes
- ANNOTATION of the sequence
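The 700 bp read limit makes the scale of a sequencing project easy to estimate. The sketch below uses the genome sizes quoted earlier in these slides (4 million bp for E. coli, 3 billion bp for human); the 8-fold coverage is purely an assumed value for illustration, and the function name is hypothetical.

```python
def reads_needed(genome_bp: int, read_bp: int = 700, coverage: int = 8) -> int:
    """Reads required so total sequenced bases equal coverage x genome size (rounded up)."""
    return (coverage * genome_bp + read_bp - 1) // read_bp


for name, size in (("E. coli", 4_000_000), ("Human", 3_000_000_000)):
    print(name, reads_needed(size))
```

Every read is at its maximum length in this estimate, so it is a lower bound on the number of reads for the chosen coverage.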

Annotation: cDNAs & ESTs (Expressed Sequence Tags)
RNA target sample -> cDNA library -> SEQUENCE from primer -> end reads (mates)
Each cDNA provides sequence from the two ends: two ESTs
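The "two ESTs per cDNA" idea can be sketched by taking one read from each end of a cDNA sequence. This is a toy model: the function names and the `read_len` default are hypothetical, reads are assumed error-free, and the 3' read is assumed to come from the opposite strand, so it is returned as a reverse complement.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def ests_from_cdna(cdna: str, read_len: int = 500):
    """Return the (5' EST, 3' EST) pair obtained by reading inward from each end."""
    cdna = cdna.upper()
    five_prime = cdna[:read_len]
    three_prime = reverse_complement(cdna[-read_len:])  # read from the other strand
    return five_prime, three_prime


# Short invented cDNA with a short read length, for demonstration only
print(ests_from_cdna("ATGCCCGGGTTTAAA", read_len=5))
```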

Who gets sequenced? Models, pathogens, agriculturals

Array analysis: see animation from Griffiths

Protein Structure Database See Swiss-pdb viewer

RNA for ALL C. elegans genes

RNAi for every C. elegans gene too! Results on the web.
Projects to systematically knock out (or pseudo-knock out) every gene, in order to establish the phenotype of each gene -> function of each gene