RBP1 Splicing Regulation in Drosophila Melanogaster 03-711 - Fall 2005 Jacob Joseph, Ahmet Bakan, Amina Abdulla This presentation available at

Presentation transcript:


Alternative Splicing in Drosophila

RBP1 Regulation
- Involved in dsx splicing and Rbp1 auto-regulation
- Suspected in many other related pathways

Genome Data
- Sequence of all introns of known splice variants
- Two annotated genomes available
  - D. melanogaster
  - D. pseudoobscura
- Because gene names differ between D. melanogaster and D. pseudoobscura, a list of gene orthologs was also obtained

Computational Approach
- Create a profile HMM for each motif (B-B, B-A)
- Select the end of every intron (~50 bases)
- Perform an HMM search on each intron segment, in both D. melanogaster and D. pseudoobscura
- Keep matches found in both species
- Keep matches at the end of introns (~15 bases)
- Return the alignment of both species
- Examine the biological similarity of matches
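The filtering steps above can be sketched in Python. This is only an illustration under stated assumptions: the `Hit` record layout, the function names, and the example scores and distances are hypothetical, and the HMM search itself is taken as already done. The 50-base tail and the 15-base cutoff come from the slide.

```python
from dataclasses import dataclass

TAIL_LEN = 50   # bases kept from the 3' end of each intron (slide: ~50)
MAX_DIST = 15   # a match must end within this many bases of the 3' splice site


@dataclass(frozen=True)
class Hit:
    """One motif match in an intron tail (hypothetical record layout)."""
    gene: str          # gene identifier in that species' own naming scheme
    dist_to_3ss: int   # bases between the end of the match and the splice site
    score: float       # score reported by the search program


def intron_tail(intron_seq: str, tail_len: int = TAIL_LEN) -> str:
    """Return the last tail_len bases of an intron (all of it if shorter)."""
    return intron_seq[-tail_len:]


def conserved_hits(mel_hits, pse_hits, orthologs, max_dist=MAX_DIST):
    """Pair each D. melanogaster hit with hits in the orthologous
    D. pseudoobscura gene, keeping only matches close to the splice site."""
    pse_by_gene = {}
    for hit in pse_hits:
        if hit.dist_to_3ss <= max_dist:
            pse_by_gene.setdefault(hit.gene, []).append(hit)
    pairs = []
    for hit in mel_hits:
        if hit.dist_to_3ss > max_dist:
            continue
        # orthologs maps a D. melanogaster gene name to its D. pseudoobscura name
        for partner in pse_by_gene.get(orthologs.get(hit.gene), []):
            pairs.append((hit, partner))
    return pairs


# Illustrative input only: the scores and distances are made up.
mel = [Hit("CG30271", 11, -6.0), Hit("geneX", 30, -5.0)]
pse = [Hit("GA15740", 11, -6.0)]
orthologs = {"CG30271": "GA15740", "geneX": "GA00000"}
pairs = conserved_hits(mel, pse, orthologs)
```

Filtering on the ortholog table before pairing keeps the two species' naming schemes separate, which is why the slide on genome data mentions obtaining an ortholog list.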

Data Summary

Profile Hidden Markov Model (HMM) and HMMer
- We needed a profile HMM builder and search program.
- HMMer uses a revised version of the Krogh/Haussler model, called Plan 7
- Not limited to global alignment

HMMer Advantages
- Possible alignments
  - Classic global alignment
  - Classic local alignment
  - Global profile, local sequence alignment
  - Fully local "multihit" alignment
- Scoring
  - Raw alignment score
  - E-value, showing the significance of the alignment
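To make the relation between a raw score and an E-value concrete, here is a minimal sketch. It assumes that scores of random sequences follow a Gumbel (extreme value) distribution, a common null model for alignment scores; the slides do not state how their E-values were computed. The parameters `mu` and `lam` and the number of searched sequences `n_targets` are illustrative placeholders.

```python
import math


def evd_pvalue(score: float, mu: float, lam: float) -> float:
    """P(S >= score) under a Gumbel null with location mu and scale lam."""
    # -expm1(-x) equals 1 - exp(-x) but stays accurate when x is tiny
    return -math.expm1(-math.exp(-lam * (score - mu)))


def evalue(score: float, mu: float, lam: float, n_targets: int) -> float:
    """Expected number of chance hits scoring at least `score`
    when n_targets sequences are searched."""
    return n_targets * evd_pvalue(score, mu, lam)


# Illustrative parameters only: a higher score gives a smaller E-value.
weak = evalue(score=1.0, mu=0.0, lam=1.0, n_targets=1000)
strong = evalue(score=5.0, mu=0.0, lam=1.0, n_targets=1000)
```

An E-value depends on the size of the search, so the same raw score is less significant when more intron segments are scanned.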

HMMer
- Create an HMM from the multiple alignment of each B-B and B-A motif
- The genome is scanned for high-scoring matches
- Only hits within 15 base pairs of the 3' splice site are considered
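The build-then-scan idea can be illustrated without HMMer itself. The sketch below swaps in a plain position-specific log-odds matrix, which is a simplified stand-in for a profile HMM (no insert or delete states), so it is not the method used in the presentation. The four training instances are the uppercase B-A matches from the first three result pairs, reused here purely for illustration. The uniform background and the pseudocount are assumptions.

```python
import math

BASES = "ACGT"


def build_log_odds(instances, pseudocount=1.0, background=0.25):
    """Column-wise log2-odds scores from equal-length, ungapped motif instances."""
    length = len(instances[0])
    if any(len(seq) != length for seq in instances):
        raise ValueError("all motif instances must have the same length")
    matrix = []
    for col in range(length):
        counts = {base: pseudocount for base in BASES}
        for seq in instances:
            counts[seq[col].upper()] += 1
        total = sum(counts.values())
        matrix.append({b: math.log2(counts[b] / total / background) for b in BASES})
    return matrix


def scan_tail(tail, matrix, max_dist=15):
    """Score every window that ends within max_dist bases of the 3' end.
    Returns (score, start, distance_to_3prime_end) tuples, best first.
    Assumes the tail contains only A, C, G and T."""
    tail = tail.upper()
    width = len(matrix)
    hits = []
    for start in range(len(tail) - width + 1):
        dist = len(tail) - (start + width)
        if dist > max_dist:
            continue
        window = tail[start:start + width]
        score = sum(col[base] for col, base in zip(matrix, window))
        hits.append((score, start, dist))
    return sorted(hits, reverse=True)


instances = ["GTCGACAATTGTT", "TTCTACTATTTTT", "ATCTACAATTTCT", "TTCCACAATTTTT"]
matrix = build_log_odds(instances)
tail = "ctgttgaatcacttggaaagcaatcaGTCGACAATTGTTtacttttacag"
best = scan_tail(tail, matrix)[0]
# best window starts at index 26 and ends 11 bases before the 3' end
```

A real profile HMM additionally models insertions and deletions, which matters when motif instances differ in length.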

Results: B-A Motif
CG30271-RC-in_5 ( ), GA15740-in_5 ( ) score: -6
  ctgttgaatcacttggaaagcaatcaGTCGACAATTGTTtacttttacag
  cctttgaatcactcggaaagcaatcaGTCGACAATTGTTtacttttacag
CG30020-RA-in_3 ( ), GA15581-in_9 ( ) score: -8
  ccgtcccagtgacttacaatacgaTTCTACTATTTTTtgtacgcttacag
  taaggctcttcatactttatcaaATCTACAATTTCTcaatgtaattgcag
Klp3A-RA-in_3 ( ), GA21186-in_3 ( ) score: -9
  ttgaagttcgaaaactcctgaaactaattgTTCCACAATTTTTttttatt
  tgttcaattcttaaataaaaccaatTTCGACTCTTTTTctcttctttcag
na-RB-in_0 ( ), GA13546-in_2 ( ) score: -9
  tctggtgcactgagagaaatgccatctacttcATCGATACTCTTTtgcag
  tgtaaacactcgttgcaaacacaaATTTACAATCAATttccatgttttat
CG30428-RA-in_2 ( ), GA15840-in_1 ( ) score: -9
  ggtaaggaagcgtaaaaataaattctttttttATCACCAATATTTttcag
  aaaatatcaagccgaaacaaatttATGTACAATTTTTtttttatggaaag
CG2199-RB-in_0 ( ), GA15296-in_0 ( ) score: -10
  ttgctactgccattataggtagtttaaaaactgttTTCTACACTCTTTct
  aacaaaaacaaaaatatggccctctgataattGGGGACACTTTATttcag

Results: B-B Motif
ps-RD-in_4 ( ), GA20847-in_4 ( ) score: -11
  catttaatatcttgaaaatatttaacataaATCTGATGCAAAtattccag
  attactattcttaaaatatatttaacataaATCTGATGCAAAtattccag
fru-RE-in_6 ( ), GA12896-in_5 ( ) score: -13
  cccacccccacagtgatgacgcctaATATGAACCAAGcaaatgtttgcag
  tgctaaataaaccaaattccaaaCTCTGATCAAAAaataccgataaaaag
Ptp52F-RA-in_0 ( ), GA14851-in_14 ( ) score: -13
  tactctttgaaaaataagcatatggatgtcactgataATATGATATTAAt
  tctaaatcgtattcaaatcgaattgaaacataaATCGAATCCAAAaacag
CG9455-RA-in_0 ( ), GA21800-in_0 ( ) score: -13
  aatagtggctttgttttaataacaatgtaatATCTGATATTTAttctcag
  cagagcgtgccccgtctgatgatccgAACTGATCTGATgtttttcggtag
CG8709-RA-in_2 ( ), GA21271-in_9 ( ) score: -13
  acaaatcttaggaaataccaaagttgttctacgATCTTATCTATGgagtc
  gccccatcagtgtcagtggcagctgaccccaccATTTGATCTATTtgcag
CG7966-RA-in_0 ( ), GA20727-in_4 ( ) score: -13
  tatatgtacacattgtactgcaaacacatgccctgaATCTTTGATAAAga
  gtgttgaatgaaagaatacacttgaATCGGTTCTAAAttgcatcgcacag

Biomolecular Activity: B-A

Biomolecular Activity: B-B

Biomolecular activity analysis
- The fru gene, which is regulated by the tra and tra2 genes and is expressed at the same time as the dsx gene, helps validate our results.
- Expected presence of sxl and tra genes.
- Functional similarity:
  - B-A motif: SNF4Agamma, rdgc, qtc.
  - B-B motif: ps, ptp, CG9455.

Difficulties & Future Directions
- Support Vector Machines were applied
- Lack of significant training data.
- Lack of direct experimental data for cross-validation.
- Since the current D. pseudoobscura genome has far fewer intron sequences, reliance upon orthologs introduces many false negatives.

Alternate Approach: Support Vector Machines (SVM)
- Used for data classification
- Creates a hyperplane that separates data into two classes with maximum margin
- Appropriate for multidimensional classification problems
- Examples
  - Article classification
  - Protein classification
- Critical points
  - Feature selection
  - Training
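As a rough illustration of maximum-margin classification, the sketch below trains a linear SVM by plain subgradient descent on the regularised hinge loss. It is a teaching example rather than the SVM software used in the presentation, which the slides do not name. The toy points and the hyperparameters `lam`, `epochs` and `lr` are made up.

```python
def train_linear_svm(points, labels, lam=0.01, epochs=200, lr=0.1):
    """Minimise lam/2 * |w|^2 + mean hinge loss by subgradient descent.
    Labels must be +1 or -1. Returns (weights, bias)."""
    dim = len(points[0])
    n = len(points)
    w = [0.0] * dim
    b = 0.0
    for _ in range(epochs):
        grad_w = [lam * wi for wi in w]   # gradient of the regulariser
        grad_b = 0.0
        for x, y in zip(points, labels):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            if margin < 1:                # inside the margin or misclassified
                for i in range(dim):
                    grad_w[i] -= y * x[i] / n
                grad_b -= y / n
        w = [wi - lr * gi for wi, gi in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict(w, b, x):
    """Return +1 or -1 depending on the side of the hyperplane."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1


# Toy, linearly separable data (illustrative only).
points = [(2, 2), (3, 3), (2, 3), (-2, -2), (-3, -3), (-2, -3)]
labels = [1, 1, 1, -1, -1, -1]
w, b = train_linear_svm(points, labels)
```

Only points with a margin below 1 contribute to the update, which is how the support vectors end up determining the separating hyperplane.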

HMM and SVM
- HMMer is used to generate features
- The whole genome is searched for the A and B consensus sequences
- Search results for each intron are combined to create features
- Features
  - Scores of the two motifs in the upstream region (2)
  - Distance of the motifs to the splice site (1)
  - Length of consensus sequence overlap (1)
  - Length of motif (1)
  - Whether consensus sequence B precedes A (1)
- Number of features = 6
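One possible reading of the six features is sketched below. The slide gives only their names, so the exact definitions here (for example, measuring distance from the end of the combined motif, and taking motif length as the span covered by both consensus matches) are assumptions, as are the `MotifHit` layout and the example coordinates.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MotifHit:
    """Best match of one consensus (A or B) in an intron tail (hypothetical)."""
    score: float
    start: int   # 0-based start within the tail
    end: int     # 0-based exclusive end within the tail


def intron_features(hit_a: MotifHit, hit_b: MotifHit, tail_len: int):
    """Build the six-element feature vector for one intron."""
    overlap = max(0, min(hit_a.end, hit_b.end) - max(hit_a.start, hit_b.start))
    motif_start = min(hit_a.start, hit_b.start)
    motif_end = max(hit_a.end, hit_b.end)
    return [
        hit_a.score,                                  # score of consensus A
        hit_b.score,                                  # score of consensus B
        float(tail_len - motif_end),                  # distance to the splice site
        float(overlap),                               # consensus overlap length
        float(motif_end - motif_start),               # length of the combined motif
        1.0 if hit_b.start < hit_a.start else 0.0,    # does B precede A?
    ]


# Illustrative coordinates and scores only.
features = intron_features(MotifHit(-3.0, 30, 39), MotifHit(-4.0, 26, 32), tail_len=50)
```

Each intron then becomes one six-dimensional point that can be passed to a classifier.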

Summary
- Profile HMM used for modeling
- Comparative analysis with the D. pseudoobscura genome
- High-scoring alignments for both motifs were further analyzed for biomolecular activity
- The existence of fru and other close matches helps to validate our results