Bioinformatics, Lecture 3: Finding Motifs. Dr. Aladdin Hamwieh, Khalid Al-shamaa, Abdulqader Jighly. 2010-2011. Aleppo University, Faculty of Technical Engineering.


1 Bioinformatics
Lecture 3: Finding Motifs
Dr. Aladdin Hamwieh, Khalid Al-shamaa, Abdulqader Jighly
2010-2011
Aleppo University, Faculty of Technical Engineering, Department of Biotechnology

2 Main Lines
– Definition
– Motif types
– Motifs problem
– Motifs: Profiles and Consensus
– Motif Logo
– Motif Search in Local Database

3 Definition
A motif is a short conserved sequence pattern associated with distinct functions of a protein or DNA.

4 Motif Types
1. Regulatory sequences

5 Combinatorial Gene Regulation
A microarray experiment showed that when gene X is knocked out, 20 other genes are not expressed.
– How can one gene have such drastic effects?

6 Regulatory Protein
– Gene X encodes a regulatory protein, a.k.a. a transcription factor (TF).
– The 20 unexpressed genes rely on gene X’s TF to induce transcription.
– A single TF may regulate multiple genes.

7 Regulatory Regions
– Every gene contains a regulatory region (RR), typically stretching 100-1000 bp upstream of the transcriptional start site.
– Located within the RR are the Transcription Factor Binding Sites (TFBS), also known as motifs, specific for a given transcription factor.
– TFs influence gene expression by binding to a specific location in the respective gene’s regulatory region: the TFBS.

8 Transcription Factor Binding Sites
– A TFBS can be located anywhere within the Regulatory Region.
– TFBS may vary slightly across different regulatory regions since non-essential bases could mutate.

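9 Motifs and Transcriptional Start Sites
[Figure: five genes, each drawn with a slightly different instance of the same motif in its regulatory region]
gene 1: ATCCCG
gene 2: TTCCGG
gene 3: ATCCCG
gene 4: ATGCCG
gene 5: ATGCCC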

10 Consensus considerations
Promoter layout: -35 hexamer (TTGACA), spacer of 15 - 19 bases, -10 hexamer (TATAAT), interval of 5 - 9 bases, transcription start site.
A weight matrix contains more information (based on ~450 known promoters).

-35 hexamer
     1    2    3    4    5    6
A   0.1  0.1  0.1  0.5  0.2  0.5
T   0.7  0.7  0.2  0.2  0.2  0.2
G   0.1  0.1  0.5  0.1  0.1  0.2
C   0.1  0.1  0.2  0.2  0.5  0.1

-10 hexamer
     1    2    3    4    5    6
A   0.1  0.7  0.2  0.6  0.5  0.1
T   0.7  0.1  0.5  0.2  0.2  0.8
G   0.1  0.1  0.1  0.1  0.1  0.0
C   0.1  0.1  0.2  0.1  0.1  0.1
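As a rough illustration of why a weight matrix carries more information than a consensus string, the sketch below scores candidate hexamers against the two matrices from the slide. The scoring rule (product of per-position frequencies) and the function names `site_probability` and `best_site` are assumptions made for this example, not something the lecture specifies.

```python
# Weight matrices transcribed from slide 10 (rows A, T, G, C; columns 1-6).
PWM_35 = {
    "A": [0.1, 0.1, 0.1, 0.5, 0.2, 0.5],
    "T": [0.7, 0.7, 0.2, 0.2, 0.2, 0.2],
    "G": [0.1, 0.1, 0.5, 0.1, 0.1, 0.2],
    "C": [0.1, 0.1, 0.2, 0.2, 0.5, 0.1],
}
PWM_10 = {
    "A": [0.1, 0.7, 0.2, 0.6, 0.5, 0.1],
    "T": [0.7, 0.1, 0.5, 0.2, 0.2, 0.8],
    "G": [0.1, 0.1, 0.1, 0.1, 0.1, 0.0],
    "C": [0.1, 0.1, 0.2, 0.1, 0.1, 0.1],
}


def site_probability(pwm, site):
    """Multiply the per-position frequencies of each base in the site."""
    p = 1.0
    for i, base in enumerate(site.upper()):
        p *= pwm[base][i]
    return p


def best_site(pwm, sequence):
    """Return (start, site, score) of the highest-scoring window."""
    width = len(pwm["A"])
    starts = range(len(sequence) - width + 1)
    # max() keeps the first window in case of a tie.
    start = max(starts, key=lambda i: site_probability(pwm, sequence[i:i + width]))
    site = sequence[start:start + width]
    return start, site, site_probability(pwm, site)


if __name__ == "__main__":
    print(site_probability(PWM_35, "TTGACA"))  # consensus -35 hexamer
    print(site_probability(PWM_35, "TTGACT"))  # one substitution, lower score
    print(best_site(PWM_10, "GGCTATAATGC"))
```

A consensus string would treat TTGACT as a simple mismatch, whereas the matrix assigns it a graded score that depends on which position changed.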

11 Example
GAL4 in Yeast
– Activator of galactose-induced genes (convert galactose to glucose)
– Protein structure determines motif
– DNA-protein interactions require certain bases at specified locations
– Motif reflects homodimer structure

12 Motif Types
2. Motifs in protein structure

13 Importance
– Functional relationships between proteins cannot always be detected through a simple BLAST or FASTA database search.
– Proteins often perform multiple functions that cannot be fully described using a single annotation.
– To resolve these issues, identification of the motifs and domains becomes very useful.

14 Random Sample
atgaccgggatactgataccgtatttggcctaggcgtacacattagataaacgtatgaagtacgttagactcggcgccgccg
acccctattttttgagcagatttagtgacctggaaaaaaaatttgagtacaaaacttttccgaatactgggcataaggtaca
tgagtatccctgggatgacttttgggaacactatagtgctctcccgatttttgaatatgtaggatcattcgccagggtccga
gctgagaattggatgaccttgtaagtgttttccacgcaatcgcgaaccaacgcggacccaaaggcaagaccgataaaggaga
tcccttttgcggtaatgtgccgggaggctggttacgtagggaagccctaacggacttaatggcccacttagtccacttatag
gtcaatcatgttcttgtgaatggatttttaactgagggcatagaccgcttggcgcacccaaattcagtgtgggcgagcgcaa
cggttttggcccttgttagaggcccccgtactgatggaaactttcaattatgagagagctaatctatcgcgtgcgtgttcat
aacttgagttggtttcgaaaatgctctggggcacatacaagaggagtcttccttatcagttaatgctgtatgacactatgta
ttggcccattggctaaaagcccaacttgacaaatggaagatagaatccttgcatttcaacgtatgccgaaccgaaagggaag
ctggtgagcaacgacagattcttacgtgcattagctcgcttccggggatctaatagcacgaagcttctgggtactgatagca

15 Implanting Motif AAAAAAAAGGGGGGG
atgaccgggatactgatAAAAAAAAGGGGGGGggcgtacacattagataaacgtatgaagtacgttagactcggcgccgccg
acccctattttttgagcagatttagtgacctggaaaaaaaatttgagtacaaaacttttccgaataAAAAAAAAGGGGGGGa
tgagtatccctgggatgacttAAAAAAAAGGGGGGGtgctctcccgatttttgaatatgtaggatcattcgccagggtccga
gctgagaattggatgAAAAAAAAGGGGGGGtccacgcaatcgcgaaccaacgcggacccaaaggcaagaccgataaaggaga
tcccttttgcggtaatgtgccgggaggctggttacgtagggaagccctaacggacttaatAAAAAAAAGGGGGGGcttatag
gtcaatcatgttcttgtgaatggatttAAAAAAAAGGGGGGGgaccgcttggcgcacccaaattcagtgtgggcgagcgcaa
cggttttggcccttgttagaggcccccgtAAAAAAAAGGGGGGGcaattatgagagagctaatctatcgcgtgcgtgttcat
aacttgagttAAAAAAAAGGGGGGGctggggcacatacaagaggagtcttccttatcagttaatgctgtatgacactatgta
ttggcccattggctaaaagcccaacttgacaaatggaagatagaatccttgcatAAAAAAAAGGGGGGGaccgaaagggaag
ctggtgagcaacgacagattcttacgtgcattagctcgcttccggggatctaatagcacgaagcttAAAAAAAAGGGGGGGa

16 The Challenge
Hard to identify
– Relatively short sequences (as small as 6 bases)
– Many positions not well conserved
Factors improving identification
– Usually localized in certain proximity of a gene (search within 3 kb upstream)
– Some positions highly conserved
– Use other data (Microarray?)

17 Challenge Problem
Find a motif in a sample of:
– 20 “random” sequences (e.g. 600 nt long),
– each sequence containing one implanted pattern of length 15,
– each implanted copy carrying 4 mismatches relative to the pattern, i.e. a (15,4) motif.
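The challenge problem can be made concrete by generating such a sample. The sketch below is one possible way to do it, under assumptions the slide does not state: background bases are drawn uniformly from A, C, G, T, the implanted copy overwrites a random window (so each sequence keeps its length), and every copy receives exactly 4 substitutions. The names `mutate` and `implanted_sample` are hypothetical.

```python
import random


def mutate(pattern, d, rng):
    """Return a copy of pattern with exactly d positions substituted."""
    chars = list(pattern)
    for pos in rng.sample(range(len(pattern)), d):
        # Pick a replacement base that differs from the current one.
        chars[pos] = rng.choice([b for b in "ACGT" if b != chars[pos]])
    return "".join(chars)


def implanted_sample(pattern, t=20, n=600, d=4, seed=0):
    """Build t random sequences of length n, each hiding one mutated copy.

    Returns a list of (sequence, start) pairs, where start is the
    0-based position of the implanted copy.
    """
    rng = random.Random(seed)
    sample = []
    for _ in range(t):
        background = [rng.choice("ACGT") for _ in range(n)]
        start = rng.randrange(n - len(pattern) + 1)
        background[start:start + len(pattern)] = mutate(pattern, d, rng)
        sample.append(("".join(background), start))
    return sample


if __name__ == "__main__":
    for seq, start in implanted_sample("AAAAAAAAGGGGGGG", t=3, n=60):
        print(start, seq)
```

Keeping the start positions alongside the sequences makes it possible to check later whether a motif finder recovered the implanted sites.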

18 Where is the Motif???
atgaccgggatactgatagaagaaaggttgggggcgtacacattagataaacgtatgaagtacgttagactcggcgccgccg
acccctattttttgagcagatttagtgacctggaaaaaaaatttgagtacaaaacttttccgaatacaataaaacggcggga
tgagtatccctgggatgacttaaaataatggagtggtgctctcccgatttttgaatatgtaggatcattcgccagggtccga
gctgagaattggatgcaaaaaaagggattgtccacgcaatcgcgaaccaacgcggacccaaaggcaagaccgataaaggaga
tcccttttgcggtaatgtgccgggaggctggttacgtagggaagccctaacggacttaatataataaaggaagggcttatag
gtcaatcatgttcttgtgaatggatttaacaataagggctgggaccgcttggcgcacccaaattcagtgtgggcgagcgcaa
cggttttggcccttgttagaggcccccgtataaacaaggagggccaattatgagagagctaatctatcgcgtgcgtgttcat
aacttgagttaaaaaatagggagccctggggcacatacaagaggagtcttccttatcagttaatgctgtatgacactatgta
ttggcccattggctaaaagcccaacttgacaaatggaagatagaatccttgcatactaaaaaggagcggaccgaaagggaag
ctggtgagcaacgacagattcttacgtgcattagctcgcttccggggatctaatagcacgaagcttactaaaaaggagcgga

19 Why is Finding a (15,4) Motif Difficult?
Two implanted instances compared position by position (| = match, . = mismatch):
AgAAgAAAGGttGGG
..|..|||.|..|||
cAAtAAAAcGGcGGG

atgaccgggatactgatAgAAgAAAGGttGGGggcgtacacattagataaacgtatgaagtacgttagactcggcgccgccg
acccctattttttgagcagatttagtgacctggaaaaaaaatttgagtacaaaacttttccgaatacAAtAAAAcGGcGGGa
tgagtatccctgggatgacttAAAAtAAtGGaGtGGtgctctcccgatttttgaatatgtaggatcattcgccagggtccga
gctgagaattggatgcAAAAAAAGGGattGtccacgcaatcgcgaaccaacgcggacccaaaggcaagaccgataaaggaga
tcccttttgcggtaatgtgccgggaggctggttacgtagggaagccctaacggacttaatAtAAtAAAGGaaGGGcttatag
gtcaatcatgttcttgtgaatggatttAAcAAtAAGGGctGGgaccgcttggcgcacccaaattcagtgtgggcgagcgcaa
cggttttggcccttgttagaggcccccgtAtAAAcAAGGaGGGccaattatgagagagctaatctatcgcgtgcgtgttcat
aacttgagttAAAAAAtAGGGaGccctggggcacatacaagaggagtcttccttatcagttaatgctgtatgacactatgta
ttggcccattggctaaaagcccaacttgacaaatggaagatagaatccttgcatActAAAAAGGaGcGGaccgaaagggaag
ctggtgagcaacgacagattcttacgtgcattagctcgcttccggggatctaatagcacgaagcttActAAAAAGGaGcGGa
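The difficulty can be quantified with the Hamming distance. The two instances compared on this slide each differ from the implanted pattern AAAAAAAAGGGGGGG in 4 positions, yet they differ from each other in 7, so pairwise comparison of instances gives a much weaker signal than comparison against the (unknown) pattern. The sketch below reproduces that comparison; `hamming` and `match_line` are names chosen for this example.

```python
def hamming(a, b):
    """Number of positions at which two equal-length strings differ.

    Comparison ignores case, since the slide uses lowercase letters
    only to highlight mutated positions.
    """
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a.upper(), b.upper()))


def match_line(a, b):
    """Build the '|' / '.' line shown between two aligned instances."""
    return "".join("|" if x == y else "." for x, y in zip(a.upper(), b.upper()))


if __name__ == "__main__":
    pattern = "AAAAAAAAGGGGGGG"
    first = "AgAAgAAAGGttGGG"
    second = "cAAtAAAAcGGcGGG"
    print(first)
    print(match_line(first, second))
    print(second)
    print("to pattern:", hamming(first, pattern), hamming(second, pattern))
    print("to each other:", hamming(first, second))
```

In general, two instances of a (15,4) motif can differ in up to 8 positions; here one mutated position is shared, which gives 7.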

20 Motifs: Profiles and Consensus
– Line up the patterns by their start indexes s = (s1, s2, …, st)
– Construct matrix profile with frequencies of each nucleotide in columns
– Consensus nucleotide in each position has the highest score in column

Alignment    a G g t a c T t
             C c A t a c g t
             a c g t T A g t
             a c g t C c A t
             C c g t a c g G
             _______________
Profile   A  3 0 1 0 3 1 1 0
          C  2 4 0 0 1 4 0 0
          G  0 1 4 0 0 0 3 1
          T  0 0 0 5 1 0 1 4
             _______________
Consensus    A C G T A C G T
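The three steps on this slide (align, count, take the column maximum) can be sketched in a few lines. The example below uses the five aligned patterns from the slide; the function names `profile` and `consensus` are chosen for illustration, and ties between bases, which do not occur in this alignment, are broken in A, C, G, T order as an arbitrary assumption.

```python
from collections import Counter


def profile(alignment):
    """Count each nucleotide per column of equal-length aligned patterns."""
    columns = zip(*(row.upper() for row in alignment))
    counts = [Counter(col) for col in columns]
    # A missing base in a Counter reads as 0, so every row has full width.
    return {base: [c[base] for c in counts] for base in "ACGT"}


def consensus(prof):
    """Pick the most frequent nucleotide in every column of the profile."""
    width = len(prof["A"])
    return "".join(max("ACGT", key=lambda b: prof[b][i]) for i in range(width))


if __name__ == "__main__":
    alignment = ["aGgtacTt", "CcAtacgt", "acgtTAgt", "acgtCcAt", "CcgtacgG"]
    prof = profile(alignment)
    for base in "ACGT":
        print(base, *prof[base])
    print("Consensus:", consensus(prof))
```

Summing the column maxima of the profile (3+4+4+5+3+4+3+4 = 30 here) gives a score that is commonly used to compare candidate sets of start positions.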

21 Motif Search in Local Database
