Presentation is loading. Please wait.

Presentation is loading. Please wait.

Algorithms in Bioinformatics: A Practical Introduction

Similar presentations


Presentation on theme: "Algorithms in Bioinformatics: A Practical Introduction"— Presentation transcript:

1 Algorithms in Bioinformatics: A Practical Introduction
Project: Motif finding using ChIP-seq peak data

2 Transcriptional Control (I)

3 Transcriptional Control (II)
TATAAT is the motif!

4 Motif model TTGACA TCGACA TTGAAA ATGACA GTGACA TTGACT TTGACC Consensus Pattern TTGACA Positional Weight Matrix (PWM) Motif can be described in two ways based on the binding sites discovered

5 ChIP experiment Chromatin immunoprecipitation experiment
Detect the interaction between protein (transcription factor) and DNA.

6 Peak data Peak data represents the locations where a particular TF binding. The data tells us the locations and intensities. (Note that due to experimental error, peaks of low intensity may be noise.) ChIP-seq data for Human (MCF7) E2 treatment at 45min chr1:883, ,485

7 Our aim Given the DNA sequences of those peaks, find motifs which occur in those peak regions. For the example below, we have two motifs: TTGACA and GCATC. Note that each instance has at most 1 mutation. GCACGCGGTATCGTTAGCTTGACAATGAAGAATCCCCCCGCTCGACAGT GCATACTTTGACACTGACTTCGCTTCTTTAATGTTTAATGAAACATGCG CCCTCTGGAAATTAGTGCGGCATCTCACAACCCGAGGAATGACCAAATG GTATTGAAAGTAAGGCAACGGTGATCCCCATGACACCAAAGATGCTAAG CAACGCTCAGGCAACGTTGACAGGTGACACGTTGACTGCGGCCTCCTGC GTCTCTTGACCGCTTAATCCTAAAGGCCTCCTATTAGTATCCGCAATGT GAACAGGAGCGCGAGCCATCAATTGAAGCGAAGTTGACACCTAATAACT

8 Input (I) From every peak, we get approximately +/-200 DNA sequence
>cmyc_1_chr1_ _ _range_chr1_ _ _intensity_20 CCTCCATACCAGCCCCAATGTTCTGCGTTCCCGAATGAAAGACACACAACACAGCCTTTATATTTTGATATGCCTAAAACTGCTCAATGGCTGGGCCACTTCCTAGCTAGTATCCACGTGGCTATCCCACCTCTCTCTGATATTCCCAAGTCATTACTTACTAAAATCTGTAATTACATCTTTGCTGCCCTAGGCCCAATCTGGCAGCCCTCCTGTGGCCCCTCAGGCTACTACATGGCAGCTAAGCTCTCTGACCCACATCTTCTCAGGCACCGTGCCTCCTCTTCTCCACCTTATTCAAACATGGTGGCTCTCCTTCCTCCTTCTTCCTGTCTGTCCCCAGCCTGGGAATTCTAAAAGTCCCACCTCTGTCTGCCCTGTTCAGCCATTGGCTGTCGGCATCTTTATTTACGAG >cmyc_2_chr1_ _ _range_chr1_ _ _intensity_15 GGTCATAAACCAAGCTTCTTCAAAGATTTTTGGCTTTTTGGCACCAGTGGCCTGCAGGGTGGCGAGCTCTGCCAGTTTGAAGTGACCAAGTTAAGTGGCCTGGGAAAGGCCATTTGGTGCGCGGTCCAGCAGTTTTGGGCGCTCTCGGCTTCCGCCCTCAGCTGCGGTCACGTGCGGCTGCTCACGTGCCAGACGCTGCTGTCACTTCGTAGCTGTTCCGGCTTCCTCTGAGTGAGGCTCGCAACGTCTCCCACGGAGTCGCCTTCGTTCTGCTCTGGGTCTCCCGTGGCCACTGAGACCTCGGAGCTCGACCGGCGCCTGCCCGCCCGTGCGGCCCTCACTCCCCGAGGCTATCCAGGTGAGGCCGCCTGGGGTCCCTCCCCGGCTCCGGAGAGCCGACTGGTTTCCCTGCCG >cmyc_3_chr1_ _ _range_chr1_ _ _intensity_36 GTAGTCCCAACCAGGTCCTGAGCTGGTTAGCCAACCCTCAGCGCCAGTCGGGCCAACATCCGGTGACGAATCCAAGTCCCGCCTCTAAGCCCATCTGCTGTCCAATGCCGCCCTCTGCCGGTCTTTACCTCCCCGCCTAGCTGTGAGCCGCTTCCAGACAACCCGGAAGTGATCTTTCCTCTTCCGGATTACGGGTCCGGACGTCCGCACGTGGTTGCCGGTTTAGGGTGCTGCTGTAGTGGCGATACGTCCCGCCGCTGTCCCGAAGTGAGGGATCCGAGCCGCAGCGAGAGCCATGGAGGGCCAGCGCGTGGAGGAGCTGCTGGCCAAGGCAGAGCAGGAGGAGGCGGAGAAGCTGCAGCGCATCACGGTGCACAAGGAGCTGGAGCTGGAGTTCGACCTGGGCAACC ……………

9 Input (II) A set of sequences which are likely containing no motif.
AACAAGGGAAAGAGTAGTGAGTGCTTCTTTCTATTCAGAGGGAGGGGAAGTTGCTGTTAGCTAAGACAGTCAGGACTGAGAAGGGGGGGGGGGGTTTAACTCTCCTGGAGGGAGCTGAGAGGTAAAGGGAGGGGCGTGAGGTAGAACAAGCCGAGAACACAGGGCAGGTTGGTCTGACTCCAGAGCACAGTGCAGGAGCCCGGAAGTTGACTCAGTTCAGTTAGCAAGTATTTTCACACAAGGCGTGAACACTGAAGACAAAAGCAAGAGACACAGCTCTATCTCTAAGAAGATTTTCAGAGCCAAGATCGATGGGGCACACCTGTTAATCCCAGCACTTAGGAGGCTGAGGCAGGAGGATCCCAAGTTCAAAACCAGCCTGGACTTGTTTTAAGGAAAA >SEQ_2 AAAAAAAAAAAAAAGACTTCCAGTTTAATAAATGACCAATTCAGGAATGGAGATTAGGGCTGGATGACAAGTTTTTAATTGTCAAGGACTCAATTCTGTTTATCAGTTGGTATGGAATTATGTAAGCTTTTAGCGATATGACCGCACGGAGCAGTGTAGAGAGTGATCTGAGAGACGCTTGGGGGTCAGGATGGAGATAGAACTCCCTCTCTATTAGAAGGTGTTTGGTGGTAGGTAACCCTGGGCTAGCATGGTGGGTCTCTTCTTACTTAGGCTTCCATCTTTGTGGTTCAAATCCAAGAAGGACCTGCGTTCCCTCCCTCCTTGTGATCAGCTGATTGCTAGAGCATAACTCATCTTAACTTCTCATGTACTCTCCGGGTACAGGAAGGGAGGGGGC >SEQ_3 CCACTGCTGACAGTGGAGCATGAAACGACCGGCTTCCTGACTATGTTGGTACCCTTTCAGGAGCCTAAAACAGTGCTTTCAATACTTGTGTCTATGTCTGTTAGCCACAACTTTCTAGTTTCCCAGAGAGATTTTGAAGTGTAGTTTTGTATTTGCTCAAATATATATTCATATGGTGAGGTGCACATTTTTTATATTATATTTTTATTCATTTATTTTTGGTGCTTGGGAATTATACTCTAGGAATAAAGCGCCTGGTAGAAAGTGGCACACATCTTTAATCCCAGCACTCAGGAAGCAGAGGCAGACAAATCTCTGCGTTCCAGGACAGCCTGGTCTATAGAGCAAGGTCCAAGCCAGCCAGGTTTACACAAAGAAACCTAGTGTGGAAAAGACAAAA ……………

10 Output You need to output a list of candidate (ranked) motifs.
You can model the motif as PWM or consensus sequence. If you model the motif as a PWM, one of the answer for the previous dataset is You may also return other significant motifs.

11 Aim of the project Given a sample file and a background file,
you need to implement a method which output a list of motifs. You need to take advantage of the fact that this is a ChIP-seq dataset Hint: Read papers on ChIP-seq and understand its properties.


Download ppt "Algorithms in Bioinformatics: A Practical Introduction"

Similar presentations


Ads by Google