Introduction to Bioinformatics - Tutorial no. 5 MEME – Discovering motifs in sequences MAST – Searching for motifs in databanks TRANSFAC – The Transcription Factor DB

WebLogo (http://weblogo.berkeley.edu) - Input: aligned sequences (e.g. output of ClustalW), then RUN!

WebLogo - Output [Figure: sequence logos for Genes and for Proteins]
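The height of each stack in a sequence logo reflects how conserved that alignment column is. As a rough sketch (not WebLogo's own code), the information content of a column can be computed as the maximum possible entropy minus the observed entropy. The function name `column_information` and the toy alignment are made up for illustration, and no small-sample correction is applied.

```python
import math
from collections import Counter

def column_information(aligned, alphabet="ACGT"):
    """Return the information content (bits) of each alignment column.

    Assumes a uniform background and ignores symbols outside `alphabet`
    (e.g. gaps). No small-sample correction is applied.
    """
    length = len(aligned[0])
    if any(len(s) != length for s in aligned):
        raise ValueError("sequences must be aligned to the same length")
    max_bits = math.log2(len(alphabet))  # 2 bits for DNA
    bits = []
    for i in range(length):
        counts = Counter(s[i] for s in aligned if s[i] in alphabet)
        total = sum(counts.values())
        if total == 0:
            bits.append(0.0)
            continue
        entropy = -sum((c / total) * math.log2(c / total) for c in counts.values())
        bits.append(max_bits - entropy)
    return bits

# Hypothetical toy alignment, not taken from the tutorial data.
toy_alignment = ["TATAAT", "TATGAT", "TACAAT", "TATATT"]
info = column_information(toy_alignment)
for position, value in enumerate(info, start=1):
    print(position, round(value, 2))
```

Fully conserved columns reach the maximum (2 bits for DNA), while mixed columns score lower, which is what makes conserved positions stand tall in the logo.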

MEME (http://meme.sdsc.edu/)
- Motif discovery from unaligned sequences
  - Genomic or protein sequences
- Identifies profile motifs
  - Multiple motifs for any input
- Flexible model of motif presence
  - Motif can be absent in some sequences
  - Can appear several times in one sequence

MEME Input: Email address; Multiple input sequences; How many times in each sequence?; How many motifs?; How many sites?; Range of motif lengths

MEME Output (1): Motif length; Number of times; Like BLAST; “Position-Specific Probability Matrix” = Motif Profile; Divergence of motif position from background; Most popular symbols
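To make the terms on this slide concrete, here is a minimal sketch of how a position-specific probability matrix, the most popular symbol per position, and the departure of each position from the background could be derived from a set of motif instances. This is not MEME's algorithm (MEME fits the motif by expectation maximization). The helper names, the toy sites and the uniform background are assumptions for illustration.

```python
import math

def probability_matrix(sites, alphabet="ACGT", pseudocount=0.0):
    """Build a position-specific probability matrix from equal-length sites."""
    width = len(sites[0])
    matrix = []
    for i in range(width):
        column = [s[i] for s in sites]
        total = len(column) + pseudocount * len(alphabet)
        matrix.append({a: (column.count(a) + pseudocount) / total for a in alphabet})
    return matrix

def relative_entropy(column, background):
    """Relative entropy (bits) of one matrix column versus the background."""
    return sum(p * math.log2(p / background[a]) for a, p in column.items() if p > 0)

def most_popular(matrix):
    """Most probable symbol at each position, joined into a string."""
    return "".join(max(col, key=col.get) for col in matrix)

# Hypothetical motif instances and a uniform background (both assumptions).
toy_sites = ["ACGT", "ACGA", "ACTT", "TCGT"]
background = {a: 0.25 for a in "ACGT"}

pspm = probability_matrix(toy_sites)
consensus = most_popular(pspm)
divergence = [relative_entropy(col, background) for col in pspm]
print(consensus, [round(d, 2) for d in divergence])
```

A position where every site agrees gets the highest relative entropy, and a position that looks like the background gets a value near zero.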

MEME Output (2): Sequence names; Reverse complement (genomic input only); Position in sequence; Strength of match; Motif within sequence

MEME Output (3): Overall strength of motif matches; Original sequence lengths; Motif instance

MAST
- Searches for motifs (one or more) in sequence databases:
  - Like BLAST, but with motifs as input
  - Similar to iterations of PSI-BLAST
- Profile defines strength of match
  - Multiple motif matches per sequence
  - Combined E value for all motifs
- MEME uses MAST to summarize results:
  - Each MEME result is accompanied by the MAST result of searching the given sequences for the discovered motifs.
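The idea that a profile defines the strength of a match can be illustrated with a simple log-odds scan over both strands of a DNA sequence. This is only a sketch of the general technique. MAST's real scoring, p-values and combined E values are more involved. The toy matrix, the uniform background and all function names are assumptions, and the code expects sequences containing only A, C, G and T.

```python
import math

COMPLEMENT = str.maketrans("ACGT", "TGCA")

def toy_matrix(motif, high=0.85):
    """Hypothetical profile: `high` for the motif base, the rest split evenly."""
    low = (1 - high) / 3
    return [{a: (high if a == base else low) for a in "ACGT"} for base in motif]

def log_odds(matrix, background):
    """Convert probabilities to log2(probability / background) scores."""
    return [{a: math.log2(col[a] / background[a]) for a in col} for col in matrix]

def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]

def best_match(seq, scores):
    """Return (score, start, strand) of the best-scoring window on either strand."""
    width = len(scores)
    best = None
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for i in range(len(s) - width + 1):
            total = sum(scores[j][s[i + j]] for j in range(width))
            # Report minus-strand hits in forward-strand coordinates.
            start = i if strand == "+" else len(seq) - width - i
            if best is None or total > best[0]:
                best = (total, start, strand)
    return best

background = {a: 0.25 for a in "ACGT"}
scores = log_odds(toy_matrix("TGAC"), background)
print(best_match("AATGACGG", scores))
print(best_match("GGGTCAAA", scores))
```

The second call finds the motif on the reverse strand, which corresponds to the "motif and orientation" information reported for genomic input.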

MAST Input: Email address; Database (like BLAST); Motif file (e.g. MEME output); Consider matched sequence length; E value threshold

MAST Output (1): Matched accession; Match E value; Length of sequence; Link to GenBank

MAST Output (2) Motif diagram

MAST Output (3): Position of each instance; P value of instance; Matched parts of sequence; Motif ‘consensus’; Motif and orientation

TRANSFAC
Database of eukaryotic DNA transcription regulation:
- Individual regulatory sites (SITES table)
  - Genes to which they belong
  - Proteins which bind them
- Proteins which bind sites (FACTORS table)
  - Cellular source of protein
  - Nucleotide motif profile for binding
  - Some grouping and classification
- Classification of factors (CLASS table)
- Position-specific matrices for select factors (MATRIX table)
- Cell localization (CELL table)

Searching TRANSFAC (www.gene-regulation.com)
- Search a single table
  - By identifier, factor name, gene name
  - By species, author
- Browse your way from table to table
- Search within a sequence
  - MatInspector, TFScan (EMBOSS package)

TRANSFAC Factor
- DT  Date; author
- FA  Factor name
- GE  Encoding gene
- SF  Structural features
- CP  Cell specificity (positive)
- CN  Cell specificity (negative)
- EX  Expression pattern
- FF  Functional features
- IN  Interacting factors
- MX  Matrix
- BS  Binding SITE
- DR  External databases
References:
- RN  Reference no.
- RX  MEDLINE ID
- RA  Reference authors
- RT  Reference title
- RL  Reference data
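Entries built from two-letter field codes like these are easy to read programmatically. The sketch below assumes an EMBL-style flat-file layout: one field per line starting with its two-letter code, `XX` as a spacer line and `//` ending the record. That layout, the function name `parse_record` and the sample entry (including its accession and factor name) are illustrative assumptions, not real TRANSFAC data.

```python
from collections import defaultdict

def parse_record(text):
    """Group the lines of one flat-file record by their two-letter field code.

    Assumes 'CODE  value' lines, 'XX' spacer lines and '//' as terminator.
    """
    fields = defaultdict(list)
    for line in text.splitlines():
        line = line.rstrip()
        if not line or line == "XX":
            continue  # skip blank and spacer lines
        if line == "//":
            break  # end of this record
        code, _, value = line.partition(" ")
        fields[code].append(value.strip())
    return dict(fields)

# Entirely made-up record, for illustration only.
record = """\
AC  T99999
XX
FA  ExampleFactor
XX
CP  liver
CP  kidney
//
"""

parsed = parse_record(record)
print(parsed["FA"], parsed["CP"])
```

Keeping each field as a list preserves repeated codes, such as several CP lines for one factor.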

TRANSFAC Matrix: Accession; Position Specific Matrix; Statistical basis; Consensus (IUPAC subset symbols)
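A consensus in IUPAC symbols can be derived from a count matrix column by column. The sketch below uses one commonly cited convention: a single base if it accounts for more than 50% of the counts and at least twice the second most frequent base, a two-base symbol if the top two bases together exceed 75%, and N otherwise. These thresholds are an assumption here, since the slide does not state the rule TRANSFAC applies, and the count values are invented.

```python
# IUPAC symbols for single bases and two-base ambiguity sets.
IUPAC = {
    frozenset("A"): "A", frozenset("C"): "C", frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("AC"): "M",
    frozenset("GT"): "K", frozenset("AT"): "W", frozenset("CG"): "S",
}

def consensus_symbol(counts):
    """Return the IUPAC symbol for one column of base counts."""
    total = sum(counts.values())
    ranked = sorted(counts, key=counts.get, reverse=True)
    first, second = ranked[0], ranked[1]
    # Rule 1 (assumed): one clearly dominant base.
    if counts[first] > 0.5 * total and counts[first] >= 2 * counts[second]:
        return first
    # Rule 2 (assumed): two bases that together dominate.
    if counts[first] + counts[second] > 0.75 * total:
        return IUPAC[frozenset((first, second))]
    return "N"

def consensus(count_matrix):
    return "".join(consensus_symbol(col) for col in count_matrix)

# Hypothetical count matrix with three positions.
toy_counts = [
    {"A": 10, "C": 0, "G": 0, "T": 0},
    {"A": 5, "C": 0, "G": 5, "T": 0},
    {"A": 3, "C": 3, "G": 2, "T": 2},
]
print(consensus(toy_counts))
```

Only single bases, two-base codes and N are produced, which matches the slide's note that a subset of the IUPAC symbols is used.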

TRANSFAC Site (1): Accession number; DNA or RNA; Gene; Gene region; Sequence of regulatory element; Position range of factor binding site

TRANSFAC Site (2): Binding factor accession; Factor name; Binding ‘quality’ (1 functionally confirmed, 2 binding of pure protein, 3 immunologically characterized extract, 4 via known binding sequence, 5 extract protein binding to bona fide element, 6 unassigned); Organism; Cellular source; Methods of identifying site; External links

TRANSFAC Factor (1): AC: Accession number; FA: Factor name; SY: Other names; OS: Organism; OC: Taxonomy; HO: Homologs; CL: Classification; SZ: Size; SQ: Amino acid sequence

TRANSFAC Factor (2): Protein sequence reference; Features and positions; Structural features; Cell specificity

Question: A biologist at your university has found 15 target genes that she thinks are co-regulated. She gives you 15 upstream regions, each 50 base pairs long, in FASTA format (file DNASample50.txt) and asks you to identify the motif and, if possible, the potential regulating protein. She tells you the sequences are from Homo sapiens and that, by intuition, she feels the motif is of length 8. She wants you to suggest only the best possible candidate motif.

Question: After you have run all the programs, your biologist friend confesses that she is not sure whether her intuition about the motif length was correct. Re-run the tool without knowledge of the motif length. Do you get the same results? Determine a potential DNA-binding protein using TRANSFAC.
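The tutorial uses the MEME web form, but the two runs in these questions could also be set up with a locally installed MEME. The sketch below only builds the command lines, once with a fixed width of 8 and once with the width left open. The helper `meme_command`, the choice of the `zoops` model (zero or one site per sequence) and the output directory name are assumptions. The options shown (`-dna`, `-revcomp`, `-mod`, `-nmotifs`, `-w`, `-minw`, `-maxw`, `-oc`) should be checked against the installed MEME version.

```python
def meme_command(fasta_path, width=None, min_width=None, max_width=None,
                 n_motifs=1, model="zoops", out_dir="meme_out"):
    """Build an argument list for a local MEME run on DNA sequences."""
    cmd = ["meme", fasta_path, "-dna", "-revcomp",
           "-mod", model, "-nmotifs", str(n_motifs), "-oc", out_dir]
    if width is not None:
        cmd += ["-w", str(width)]  # fixed motif width
    else:
        # Leave the width open, optionally bounded by a range.
        if min_width is not None:
            cmd += ["-minw", str(min_width)]
        if max_width is not None:
            cmd += ["-maxw", str(max_width)]
    return cmd

fixed = meme_command("DNASample50.txt", width=8)  # first question
free = meme_command("DNASample50.txt")            # second question
print(" ".join(fixed))
print(" ".join(free))

# To actually run it (requires MEME on the PATH):
# import subprocess
# subprocess.run(fixed, check=True)
```

Asking for a single motif (`n_motifs=1`) mirrors the request to report only the best candidate, and comparing the two outputs shows whether the length-8 intuition changes the result.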
