Work Presentation Novel RNA genes in A. thaliana Gaurav Moghe Oct, 2008-Nov, 2008.

Source: Nature (Commentary on ENCODE

Starting databases
Putative Unique Transcripts (PUTs)
Expressed Sequence Tags (ESTs)

ESTs vs PUTs
42% of the total EST sequences in GenBank assembled into PUTs.
82% of the ESTs can be mapped to a unique genomic region, vs 72% of the PUTs.
Table (values not recovered): Percentile vs. No. of ESTs/PUT

Download PUT sequences
Map them to the genome using GMAP
Map to protein-coding regions
Map to AT RNA genes
Yes?
Map to other AT features
No?
BLASTn against all known CDS sequences + GeneWise to confirm alignment on translated CDS sequences
BLASTx against all known proteins to verify absence of any protein in the sequences
Coding Index to double-verify absence of protein-like sequence
BLASTn against Repetitive Sequence Database
No match?
~324, ,
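The filtering steps above can be read as a cascade in which each stage removes PUTs that hit a known feature. The sketch below is a minimal illustration under that assumption. The `Put` record, its boolean fields and the stage names are hypothetical, and the exact Yes/No branching of the original flowchart is not reproduced. In practice each flag would come from parsing GMAP, BLAST or GeneWise output.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List, Tuple


@dataclass
class Put:
    """One PUT with precomputed filter results (all fields are hypothetical)."""
    put_id: str
    overlaps_cds: bool = False         # GMAP placement overlaps a protein-coding region
    overlaps_known_rna: bool = False   # overlaps an annotated A. thaliana RNA gene
    blastn_cds_hit: bool = False       # BLASTn hit against known CDS sequences
    blastx_protein_hit: bool = False   # BLASTx hit against known proteins
    coding_index_flag: bool = False    # flagged as protein-like by a coding index
    repeat_hit: bool = False           # BLASTn hit against the repeat database


# Each stage is (name, predicate); a PUT is discarded when the predicate is True.
FILTERS: List[Tuple[str, Callable[[Put], bool]]] = [
    ("protein-coding region", lambda p: p.overlaps_cds),
    ("known RNA gene", lambda p: p.overlaps_known_rna),
    ("BLASTn vs known CDS", lambda p: p.blastn_cds_hit),
    ("BLASTx vs known proteins", lambda p: p.blastx_protein_hit),
    ("coding index", lambda p: p.coding_index_flag),
    ("BLASTn vs repeat database", lambda p: p.repeat_hit),
]


def run_cascade(puts: Iterable[Put]) -> Tuple[List[Put], Dict[str, int]]:
    """Apply the stages in order; return survivors and the count dropped per stage."""
    survivors = list(puts)
    dropped: Dict[str, int] = {}
    for name, rejects in FILTERS:
        kept = [p for p in survivors if not rejects(p)]
        dropped[name] = len(survivors) - len(kept)
        survivors = kept
    return survivors, dropped
```

Recording how many PUTs each stage removes makes it easier to see which filter dominates and to spot a stage that removes suspiciously few or many candidates.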

Issues
PUT sequences are not of very good quality.
    Use the sequence of the genomic region where these PUTs map. Use EST sequences?
BLAST against the database does not give all hits.
    BLAST against a different database, of a different size.
PUTs extremely close to genes may be part of extended UTR regions.
    Remove ridiculously close ones. Check directions of other PUTs.
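The last issue, PUTs that sit very close to an annotated gene, can be checked with a simple distance test. The sketch below is one possible way to do it, not the method used in the presentation. The `Interval` type, the function names and the 500 bp default are assumptions, since the slides give no threshold. Strand is ignored because PUT directionality has not yet been assigned.

```python
from typing import Iterable, NamedTuple, Optional


class Interval(NamedTuple):
    chrom: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive


def gap(a: Interval, b: Interval) -> Optional[int]:
    """Bases separating two intervals; 0 if they overlap or abut, None if on different chromosomes."""
    if a.chrom != b.chrom:
        return None
    if a.end < b.start:
        return b.start - a.end - 1
    if b.end < a.start:
        return a.start - b.end - 1
    return 0


def too_close(put: Interval, genes: Iterable[Interval], min_gap: int = 500) -> bool:
    """True if any gene lies within min_gap bases of the PUT (min_gap is a hypothetical cutoff)."""
    for gene in genes:
        d = gap(put, gene)
        if d is not None and d < min_gap:
            return True
    return False
```

A linear scan over all genes is adequate for illustration. For a whole genome, sorting genes per chromosome and using binary search or an interval tree would be the usual choice.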

What if… A sequence passes through all filters… but still is a protein sequence?
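One inexpensive extra check for this case is to look for a long open reading frame in the candidate sequence. The sketch below is an illustrative assumption, not the Coding Index used in the pipeline: it reports the longest complete ORF (ATG to stop) across all six frames. What length counts as suspicious is left to the user.

```python
STOPS = {"TAA", "TAG", "TGA"}
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA string; non-ACGT characters are left unchanged."""
    return seq.translate(_COMPLEMENT)[::-1]


def longest_orf_codons(seq: str) -> int:
    """Length in codons (ATG included, stop excluded) of the longest complete ORF in six frames.

    ORFs that run off the end of the sequence without a stop codon are not counted.
    """
    best = 0
    forward = seq.upper()
    for strand in (forward, reverse_complement(forward)):
        for frame in range(3):
            in_orf = False
            run = 0
            for i in range(frame, len(strand) - 2, 3):
                codon = strand[i:i + 3]
                if in_orf:
                    if codon in STOPS:
                        best = max(best, run)
                        in_orf = False
                        run = 0
                    else:
                        run += 1
                elif codon == "ATG":
                    in_orf = True
                    run = 1
    return best
```

Because PUTs can be partial transcripts, a stricter variant might also count ORFs that are open at either end. This version deliberately reports only complete ones.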

Issues
Most of these PUTs do not show conservation. Does that mean they are non-functional?
Most of these PUTs do not seem to have an RNA-like secondary structure. Does that mean they are not RNA genes?

Plans for the next month
Get the final list of novel PUTs
Assign them directionality and estimate assembly error rates using EST mapping
Conservation
Secondary structure
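The directionality step could, for example, be approached as a majority vote over the strands of the ESTs that map to each PUT. The sketch below assumes that EST strands have already been normalized to transcript orientation. The function name and the 0.8 support threshold are hypothetical. The fraction of disagreeing ESTs is only a rough proxy for assembly error, not the estimate planned in the presentation.

```python
from collections import Counter
from typing import Iterable, Optional, Tuple

VALID_STRANDS = {"+", "-"}


def assign_strand(est_strands: Iterable[str],
                  min_support: float = 0.8) -> Tuple[Optional[str], float]:
    """Return (strand, support) by majority vote over mapped ESTs.

    support is the fraction of usable ESTs agreeing with the majority strand.
    strand is None when there are no usable ESTs or support is below min_support.
    """
    counts = Counter(s for s in est_strands if s in VALID_STRANDS)
    total = sum(counts.values())
    if total == 0:
        return None, 0.0
    strand, n = counts.most_common(1)[0]
    support = n / total
    if support < min_support:
        return None, support
    return strand, support
```

For a PUT whose ESTs map as `["+", "+", "+", "-"]`, the support for `+` is 0.75, so the PUT would be left unassigned at the default threshold and assigned `+` at a threshold of 0.7.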