Q1-Q3 results. Roderic Guigó's lab, April 11th 2007 conference call.

Presentation transcript:

1 Q1-Q3 results. Roderic Guigó's lab, April 11th 2007 conference call

2 Pool-unspecific RACEfrags in Q3

3 Pool-unspecific unique RF (USPP-filtered)
Most pool-unspecific unique RF are:
Q1: internal exonic (72%)
Q2: internal exonic (87%)
Q3: external (91%), of which 63% are exonic
20 unique RF are in more than 4 pools

4 Q1-Q3: Number of pools a unique RF appears in (unfiltered/filtered) [Figure panels: Q3, Q1, Q2]

5 Pool-unspecific RFs in Q3
Possibly due to cross-hybridization?
Is there a correlation between the number of pools an RF is found in and the number of non-unique probes it overlaps? No.
By the way, 135,380 / 2,191,331 (6%) of probes from the chr21/22 chip have multiple perfect matches in the genome.
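The cross-hybridization check on this slide can be sketched as a simple correlation test. This is only an illustration: the slide does not say which statistic was used, so Pearson's r is an assumption, and the per-RF lists below are hypothetical toy values, not the lab's data. Only the probe counts come from the slide.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / sqrt(sxx * syy)

# Hypothetical toy data: for each RF, the number of pools it is found in
# and the number of non-unique probes it overlaps.
pools_per_rf = [1, 1, 2, 3, 5, 1, 4]
nonunique_probes_per_rf = [0, 2, 1, 0, 1, 3, 0]
r = pearson(pools_per_rf, nonunique_probes_per_rf)
print(f"r = {r:.2f}")

# Probe counts quoted on the slide: probes with multiple perfect genome matches.
multi_match_fraction = 135_380 / 2_191_331
print(f"{multi_match_fraction:.1%} of probes have multiple perfect matches")
```

A value of r near zero on the real data would correspond to the "no" answer reported on the slide.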

6 Pool-unspecific RFs in Q3 Possibly due to high GC content? -> Answer: NO!

7 Pool-unspecific RFs in Q3
Possibly due to mis-priming on unknown transcripts of chr21 or chr22 (missed by the simulator)?
4 - Genuine chimeric transcripts?
5 - Pooling errors: the same gene is present in >1 pool because it has 2 different identifiers (UCSC Known Genes / RefSeq nomenclature discrepancy). We found a few cases like this; not sure yet how widespread it is (systematic survey to come).

8 RF position when compared to genes and exons

9 Q1-Q2-Q3: Projected filtered RF distribution (internal = overlaps target gene; projection done by pool)
[Three panels labeled Q1, Q3, Q2; the label-to-panel assignment is not preserved in the transcript]
First panel:
  39% internal: 47% exonic, 53% intronic
  61% external:
    71% genic: 79% exonic (22% overlap most 5' exon of transcript), 21% intronic
    29% intergenic
Second panel:
  86% internal: 88% exonic, 12% intronic
  14% external:
    78% genic: 88% exonic (47% overlap most 5' exon of transcript), 12% intronic
    22% intergenic
Third panel:
  21% internal: 47% exonic, 53% intronic
  79% external:
    78% genic: 69% exonic (23% overlap most 5' exon of transcript), 31% intronic
    22% intergenic
-> chimeric transcripts?
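The categories on this slide (internal/external, then exonic/intronic/intergenic) can be illustrated with a small interval classifier. This is a minimal sketch, not the lab's pipeline: the `Gene` type, the `classify_rf` function, and the coordinates are hypothetical, intervals are assumed half-open on a single chromosome, and strand is ignored.

```python
from typing import List, NamedTuple, Tuple

Interval = Tuple[int, int]  # half-open [start, end), assumed single chromosome

class Gene(NamedTuple):
    name: str
    span: Interval
    exons: List[Interval]

def overlaps(a: Interval, b: Interval) -> bool:
    """True if two half-open intervals share at least one base."""
    return a[0] < b[1] and b[0] < a[1]

def classify_rf(rf: Interval, target: Gene, other_genes: List[Gene]) -> Tuple[str, str]:
    """Return (location, context) for one RF.

    location: 'internal' if the RF overlaps its target gene, else 'external'.
    context:  'exonic', 'intronic', or (external RFs only) 'intergenic'.
    """
    if overlaps(rf, target.span):
        location, genes = "internal", [target]
    else:
        location = "external"
        genes = [g for g in other_genes if overlaps(rf, g.span)]
    if not genes:
        return location, "intergenic"
    exonic = any(overlaps(rf, exon) for g in genes for exon in g.exons)
    return location, "exonic" if exonic else "intronic"

# Hypothetical toy annotation.
target = Gene("TARGET", (100, 500), [(100, 200), (400, 500)])
other = Gene("OTHER", (1000, 1500), [(1000, 1100)])

for rf in [(150, 180), (250, 300), (1050, 1080), (1200, 1300), (2000, 2100)]:
    print(rf, classify_rf(rf, target, [other]))
```

Counting the returned pairs per pool and dividing by the totals would give percentages of the kind shown in the panels.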

10 Why are Q3 RF mostly external (79%)?
Existence of a systematic swap between certain pairs of pools?
For each RF we computed the overlap with all genes of Q3 and then compared the RF pool with the pool of the gene it overlaps.
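The comparison described on this slide amounts to a cross-tabulation of RF pool against overlapping-gene pool. The sketch below assumes the overlaps have already been computed; the function names, identifiers, and pool labels are hypothetical.

```python
from collections import Counter

def pool_crosstab(rf_pools, gene_pools_by_rf):
    """Count (RF pool, overlapping gene pool) pairs.

    rf_pools:          dict mapping RF id -> pool the RF was detected in
    gene_pools_by_rf:  dict mapping RF id -> pools of the genes it overlaps
    """
    counts = Counter()
    for rf_id, rf_pool in rf_pools.items():
        for gene_pool in gene_pools_by_rf.get(rf_id, ()):
            counts[(rf_pool, gene_pool)] += 1
    return counts

def dominant_gene_pool(counts):
    """For each RF pool, return the gene pool it overlaps most often."""
    best = {}
    for (rf_pool, gene_pool), n in counts.items():
        if rf_pool not in best or n > counts[(rf_pool, best[rf_pool])]:
            best[rf_pool] = gene_pool
    return best

# Hypothetical toy input: rf4 overlaps no gene at all.
rf_pools = {"rf1": "A", "rf2": "A", "rf3": "B", "rf4": "B"}
gene_pools_by_rf = {"rf1": ["A"], "rf2": ["A", "B"], "rf3": ["B"], "rf4": []}

counts = pool_crosstab(rf_pools, gene_pools_by_rf)
print(dominant_gene_pool(counts))
```

A systematic swap between two pools would show up as each pool's dominant gene pool being the other one; if every pool maps to itself, there is no sign of a swap.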

11 RF overlapping gene pool

12 Q3 RF compared to Q3 genes
-> Q3 RF overlap genes of their own pool more often than genes of other pools (no clear pool swap)

13 RF lengths

14 Filtered RF length distribution

15 Q1 filtered RF length distribution

16 Q2 filtered RF length distribution

17 Q3 filtered RF length distribution