1 Detection of nTARs in the mouse intestinal transcriptome BMC Genomics, 2011.

2 nTARs - workflow (nTAR = novel transcriptionally active region)
[Flowchart] Steps and decision nodes: mapping (GEM); covered regions; known exon?; linked to known regions/other nTARs; quality check (bad: discard hit; OK: continue); check for splice sites; UTR?; reading frame (RF) + start codon?; reading frame (RF)?; check neighborhood.
Outcomes: discard hit; new exon?; new gene?

3 nTARs – raw output
position: chromosome, start, end
quality: avg. mapping quality, avg. base quality
neighborhood/overlap: closest genes + distance; overlapping known regions (e.g. introns)
connections: gene, isoform, region type, #supporting pairs, #supporting split reads
other: longest RF (both strands); earliest start codon position (both RFs); sequence around the start and end points of split reads (possible splice site)
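The "longest RF (both strands)" and "earliest start codon position" fields could be computed roughly as follows. This is a minimal sketch, not the authors' implementation: it assumes an RF is the longest stop-free run of codons in any of the three frames, and all function names are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def longest_rf(seq):
    """Longest stop-free codon run on one strand.

    Returns (length_in_codons, frame, start_index). Assumption: an RF is
    any run of codons without a stop codon; no start codon is required.
    """
    best = (0, 0, 0)
    for frame in range(3):
        run_start, run_len = frame, 0
        for i in range(frame, len(seq) - 2, 3):
            if seq[i:i + 3] in STOP_CODONS:
                # A stop codon ends the current run; the next run starts after it.
                run_start, run_len = i + 3, 0
            else:
                run_len += 1
                if run_len > best[0]:
                    best = (run_len, frame, run_start)
    return best


def earliest_start_in_rf(seq, start, n_codons):
    """Index of the first in-frame ATG inside the given RF, or None."""
    for i in range(start, start + 3 * n_codons, 3):
        if seq[i:i + 3] == "ATG":
            return i
    return None


def longest_rf_both_strands(seq):
    """Report the longest RF for the forward and the reverse strand."""
    return {
        "forward": longest_rf(seq),
        "reverse": longest_rf(reverse_complement(seq)),
    }
```

Coordinates for the reverse strand refer to the reverse-complemented sequence; mapping them back to genome positions is left out of this sketch.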

4 nTARs – example (NA06984.1.M_111124_4.bam)
known regions shown: RP3-395M20.7, RP3-395M20.9
nTAR: length = 528 bp, with a 120 aa RF at the end; avg. base quality = 32.67; avg. mapping quality = 195.61
link between the nTAR and RP3-395M20.9 supported by 15 split reads / 21 pairs; junction sequence: CAG|GTGGGGCAG|G
discarded hits: short reading frames; no link to a known TAR; avg. base quality < 19 for one part (dashed line)
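The "avg. base quality < 19 for one part" criterion can be illustrated with a windowed check. The threshold of 19 comes from the slide. The fixed window size and the function name are assumptions made only for this sketch, since the slide does not say how "one part" is delimited.

```python
def low_quality_windows(base_quals, window=50, min_avg=19):
    """Return (start, end) index pairs of windows whose mean base quality
    is below min_avg. The window size is a hypothetical choice."""
    flagged = []
    for start in range(0, len(base_quals), window):
        chunk = base_quals[start:start + window]
        if sum(chunk) / len(chunk) < min_avg:
            flagged.append((start, start + len(chunk)))
    return flagged


def should_discard(base_quals, window=50, min_avg=19):
    """Discard the hit if any part of it falls below the quality threshold."""
    return bool(low_quality_windows(base_quals, window, min_avg))
```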

5 nTARs – outlook
identification of exact nTAR borders (RF, split reads, ...)
how many nTARs overlap with known exons?
SNP-dependent transcriptional regions => "eQTL" for nTARs (1000 Genomes SNPs); find SNP-specific nTARs at the population sequencing level
potential synergy with the splice analysis part of the main paper
[Figure labels: known exon, nTAR, SNP, coverage, genome position]

6 Subtle splice events (Hiller et al. 2006)
Permutations of the topic: include GYN(N)nGYN and NAG(N)nNAG, with n < 14 in our analysis
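To make the two motif families concrete, the sketch below scans a sequence for pairs of AG (acceptor side) or GY (donor side, Y = C or T) dinucleotides separated by the spacing that NAG(N)nNAG and GYN(N)nGYN imply, with n < 14. The function name and the returned tuple layout are illustrative assumptions.

```python
ACCEPTOR = {"AG"}        # NAG(N)nNAG
DONOR = {"GT", "GC"}     # GYN(N)nGYN, Y = C or T


def tandem_motifs(seq, dinucleotides, max_n=13):
    """Return (first_pos, second_pos, n) for every pair of dinucleotides
    that fits the tandem pattern with 0 <= n <= max_n.

    In both motif families the two dinucleotides are separated by n + 1
    bases, so n = second_pos - first_pos - 3.
    """
    positions = [i for i in range(len(seq) - 1) if seq[i:i + 2] in dinucleotides]
    hits = []
    for idx, first in enumerate(positions):
        for second in positions[idx + 1:]:
            n = second - first - 3
            if n < 0:
                continue  # dinucleotides overlap or are directly adjacent
            if n > max_n:
                break     # positions are sorted, so later ones are farther away
            hits.append((first, second, n))
    return hits
```

For example, `tandem_motifs("CAGCAG", ACCEPTOR)` reports the classic NAGNAG case with n = 0.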

7 read Q-filter (input: BAM)
  attribute            value
  QV (uniqueness)      >= 150
  edit distance        <= 6
  FailedQC             FALSE
  Mapped               TRUE
  mate-reference ID    same as current reference ID
total number of 'passed filter' reads
VCF (1000 Genomes SNVs): detection of SNVs in proximity of exon ends (e.g. 30 bp) (remember genotypes)
list of target 'splice-sites'
detection of split-reads overlapping 'splice-sites' (each split-read pattern = potential isoform)
generate 'normalized' split-read count for all three possible genotypes: ref/ref, ref/alt, alt/alt
Fisher's exact test for all isoform combinations and genotypes
identification of 'subtle splice'-affecting SNVs
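The filter table and the final test step can be sketched in plain Python. Several things here are assumptions rather than details from the slides: the dictionary keys describing a read are hypothetical, normalization is taken to mean scaling by the total number of passed-filter reads, and Fisher's exact test is shown for a single 2x2 table of raw counts (for instance, two isoform groups against two genotypes).

```python
from math import comb

# Thresholds from the filter table; the field names are hypothetical.
MIN_QV = 150
MAX_EDIT_DISTANCE = 6


def passes_filter(read):
    """Apply the read Q-filter to a read described as a plain dict."""
    return bool(
        read["mapped"]
        and not read["failed_qc"]
        and read["qv"] >= MIN_QV
        and read["edit_distance"] <= MAX_EDIT_DISTANCE
        and read["mate_ref_id"] == read["ref_id"]
    )


def normalized_count(split_reads, total_passed_reads, scale=1_000_000):
    """Split-read count per `scale` passed-filter reads (assumed scheme)."""
    return split_reads * scale / total_passed_reads


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more likely than the observed one.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    total = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))
    return min(1.0, total)
```

The test needs integer counts, so in this sketch it is applied to raw split-read counts, while the normalized values serve only for comparison across samples.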

8 Comparison of every isoform with all possible combinations of the other isoforms
example: N = 4 isoforms (isoform 1, isoform 2, isoform 3, isoform 4), one bit per isoform in the binary string (e.g. 0000)

group 1: 2^(N-1) - 1 combinations (= 8 - 1 = 7)
  combination  binary string
  0            0000
  1            0001
  2            0010
  3            0011
  4            0100
  5            0101
  6            0110
  7            0111

group 2: 2^N - 2 combinations (= 16 - 2 = 14)
  combination  binary string
  0            0000
  1            0001
  2            0010
  3            0011
  4            0100
  5            0101
  6            0110
  7            0111
  8            1000
  9            1001
  10           1010
  11           1011
  12           1100
  13           1101
  14           1110
  15           1111

ignore 0000 (all samples in group 2) and 1111 (all samples in group 1)
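The two counts on this slide can be reproduced by enumerating the binary strings directly. This is only a sketch of the counting; the function names are hypothetical and the bit-to-isoform order is not specified in the source.

```python
def group2_strings(n_isoforms):
    """All binary strings of length N except all-0 and all-1: 2^N - 2."""
    return [format(k, f"0{n_isoforms}b") for k in range(1, 2 ** n_isoforms - 1)]


def group1_strings(n_isoforms):
    """Non-zero strings whose leading bit is 0: 2^(N-1) - 1."""
    return [format(k, f"0{n_isoforms}b") for k in range(1, 2 ** (n_isoforms - 1))]
```

For N = 4 these return 14 and 7 strings, matching 16 - 2 and 8 - 1.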

9 Possible combinations for N = 4 isoforms
[Grid: column headers 0001, 0010, 0011, 0100, 0101, 0110, 0111; row labels 0 (0000) to 15 (1111); black shaded = duplicated combination]
= 25 unique combinations for 4 isoforms
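One reading that reproduces the figure of 25 is to count unordered pairs of disjoint, non-empty isoform sets, so that swapping the two groups counts as a duplicate. The sketch below enumerates them as bitmasks. This interpretation is an assumption, not something the slide states, and the function name is hypothetical.

```python
def unique_group_pairs(n_isoforms):
    """Unordered pairs {A, B} of non-empty, disjoint isoform sets,
    each set encoded as an integer bitmask."""
    pairs = set()
    for a in range(1, 2 ** n_isoforms):
        for b in range(1, 2 ** n_isoforms):
            if a & b == 0:  # no isoform may sit in both groups
                pairs.add(frozenset((a, b)))
    return pairs
```

`len(unique_group_pairs(4))` gives 25, in line with the closed form (3^N - 2^(N+1) + 1) / 2.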

10 Consequences of SNVs on tandem splicing (1000 Genomes SNVs)
…NAGCGAG CTCGATGTGTGATT…
…NAG CGAXCTCGATGTGAGATT… (destruction of 'usually' realized AG; generation of novel AG)
…NAG CGAGCTCGATGTGTGATT…
prediction phase
Do these SNVs actually generate new isoforms (or remove existing ones) in individuals carrying the variant?
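The prediction step can be illustrated by comparing AG positions before and after a substitution. The sketch is minimal and makes three assumptions: the SNV is a single-base substitution, positions are 0-based within the given string, and the leading N of the slide's sequence is replaced by C in the example. The function names are hypothetical.

```python
def ag_positions(seq):
    """0-based start positions of every AG dinucleotide in seq."""
    return {i for i in range(len(seq) - 1) if seq[i:i + 2] == "AG"}


def ag_changes(ref_seq, pos, alt_base):
    """Return (destroyed, created) AG positions caused by substituting
    alt_base at pos in ref_seq."""
    alt_seq = ref_seq[:pos] + alt_base + ref_seq[pos + 1:]
    before, after = ag_positions(ref_seq), ag_positions(alt_seq)
    return sorted(before - after), sorted(after - before)


# Slide sequence with the leading N set to C for illustration.
REF = "CAGCGAGCTCGATGTGTGATT"
```

With this reference, `ag_changes(REF, 6, "T")` removes the second AG of the tandem, and `ag_changes(REF, 16, "A")` creates a new AG downstream, mirroring the two cases on the slide.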

11 Population genetics
…NAGGYN… (splice-acceptor / splice-donor): CGGAGCT
…NAGGYN… (splice-acceptor / splice-donor): CGGAGGT
[Figure: nucleotide columns A, C, G, T over the sequence KAGSGAGST; label: important nucleotide for isoform]

