UK MRC Human Genome Mapping Project Resource Centre
EMBOSS: an application suite for bioinformatics
Lisa Mullan
EMBOSS stands for: European Molecular Biology Open Software Suite
Large collection of gene and protein analysis tools: sequence retrieval, alignments, primer design, restriction mapping, protein domain searching, translation
Workflow overview: DNA Sequence 1 and DNA Sequence 2 (dotplot, translation); protein Sequence 1 and protein Sequence 2 (local/global alignment, multiple sequence alignment, motif and domain searching, physico-chemical properties)
Dotplots
>SEQ1.fasta
ATGGGTCGTGAAGAGAATGCTCCTCCTTTGGAATCTTAA
>SEQ2.fasta
ATGGCTCCTCCCTTAGAATCTTAG
For an exact match:
Unix% dottup SEQ1.fasta SEQ2.fasta -window 10 &
For a similarity match:
Unix% dotmatcher SEQ1.fasta SEQ2.fasta -window 10 -threshold 17 &
Dotplots: dottup looks for regions of exact match.
Sequences drawn on the plot axes: ATGGGTCGTGAAGAGAATGCTCCTCCTTTGGAATCT and TCTAAGATTCCCTCCTCGGTA
There are no regions of exact match spanning a window size of 10 anywhere between the two sequences, therefore nothing is placed on the output graph.
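
The exact-match rule described above can be sketched in a few lines of Python. This is only an illustration of the windowing idea, not dottup's actual implementation. The function name `exact_window_matches` is made up for this sketch, and the two sequences are SEQ1 and SEQ2 as they are written on the later plotorf/getorf slide.

```python
def exact_window_matches(seq_a, seq_b, window):
    """Return 0-based (i, j) start positions where a window of the given
    length is identical in seq_a (at i) and seq_b (at j)."""
    # Index every window of seq_b by its text so lookups are cheap.
    positions = {}
    for j in range(len(seq_b) - window + 1):
        positions.setdefault(seq_b[j:j + window], []).append(j)

    hits = []
    for i in range(len(seq_a) - window + 1):
        for j in positions.get(seq_a[i:i + window], []):
            hits.append((i, j))
    return hits


# Sequences copied from the plotorf/getorf slide (assumed to be the
# same SEQ1.fasta and SEQ2.fasta used for the dotplots).
SEQ1 = "ATGGGTCGTGAAGAGAATGCTCCTCCTTTGGAATCTTAA"
SEQ2 = "ATGGCTCCTCCCTTAGAATCTTAG"

print(exact_window_matches(SEQ1, SEQ2, 10))  # → []
print(exact_window_matches(SEQ1, SEQ2, 8))   # → [(18, 3), (30, 15)]
```

With a window of 10 the list is empty, which is why the plot stays blank. Shortening the window to 8 reveals the two shared words GCTCCTCC and GAATCTTA.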
Dotplots: Identity Matrix over the bases A, T, G, C (matrix figure)
Dotplots: scoring with the identity matrix (bases A, T, G, C)
CCTCCTTTGG   Score =
CCTCCTTTGG
CCTCCCTTAG   Score = 32
ProLeu ProLeu
Dotplots: using a window size of 10 and a threshold value of 25 (same two sequences on the plot axes)
Dotplots: using a window size of 10 and a threshold value of 35 (same two sequences on the plot axes)
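
The effect of the threshold can be illustrated with a small Python sketch. It is not dotmatcher's code. The scoring values (+5 for a match, -4 for a mismatch) are an assumption: the slides do not state the matrix values, and these were chosen only because they reproduce the score of 32 shown for CCTCCTTTGG against CCTCCCTTAG (other value pairs could do so too). The function names and the "score >= threshold" rule are likewise assumptions made for this sketch.

```python
def window_score(a, b, match=5, mismatch=-4):
    """Sum per-position scores for two equal-length windows.
    The +5/-4 values are assumed, not taken from the slides."""
    if len(a) != len(b):
        raise ValueError("windows must have the same length")
    return sum(match if x == y else mismatch for x, y in zip(a, b))


def threshold_dots(seq_a, seq_b, window, threshold):
    """Return (i, j, score) for every window pair whose score reaches
    the threshold (assumed rule: score >= threshold)."""
    dots = []
    for i in range(len(seq_a) - window + 1):
        for j in range(len(seq_b) - window + 1):
            score = window_score(seq_a[i:i + window], seq_b[j:j + window])
            if score >= threshold:
                dots.append((i, j, score))
    return dots


# Sequences copied from the plotorf/getorf slide.
SEQ1 = "ATGGGTCGTGAAGAGAATGCTCCTCCTTTGGAATCTTAA"
SEQ2 = "ATGGCTCCTCCCTTAGAATCTTAG"

print(window_score("CCTCCTTTGG", "CCTCCCTTAG"))  # → 32
print(len(threshold_dots(SEQ1, SEQ2, 10, 25)))
print(len(threshold_dots(SEQ1, SEQ2, 10, 35)))
```

Under these assumptions the window pair scoring 32 is plotted at a threshold of 25 but disappears at 35, which is the kind of difference the two threshold slides contrast.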
>SEQ1.fasta
ATGGGTCGTGAAGAGAATGCTCCTCCTTTGGAATCTTAA
>SEQ2.fasta
ATGGCTCCTCCCTTAGAATCTTAG
Unix% plotorf SEQ1.fasta -stop TAA,TAG -out GA.plot &
Unix% getorf SEQ1.fasta -minsize 5 -table 0 -find 1 -out GA.getorf &
Forward strand: ATGGGTCGTGAAGAGAATGCTCCTCCTTTGGAATCTTAA
Complementary strand: TACCCAGCACTTCTCTTACGAGGAGGAAACCTTAGAATT
Frames plotted: -3, -2, -1, 1, 2, 3
Start and stop codons are located according to the instructions given to the program, and the area between a start and a stop codon is plotted as an open reading frame.
Indication of full coding sequence? Alternative splice form?
Using getorf (min ORF size = 5):
>_1 [ ]
MLLLWNL
>_2 [1 - 36]
MGREENAPPLES*
In MGREENAPPLES*, M is the start methionine and * marks the stop codon.
getorf output:
>_1 [ ] MLLLWNL
>_2 [1 - 36] MGREENAPPLES*
Knowledge from the literature suggests that this protein is post-translationally modified by cleavage of the initial methionine residue, so translation starts at base 4:
Unix% transeq SEQ1.fasta -frame 1 -table 0 -sbegin 4 -send 36 -out GA.fasta &
>GA.fasta
GREENAPPLES
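
The translation step can be checked by hand with a short Python sketch. It is not how getorf or transeq are implemented. It assumes the standard genetic code and only looks at the three forward frames. The names `CODON_TABLE` and `translate` are made up for this illustration.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order
# (TTT, TTC, TTA, TTG, TCT, ...). '*' marks a stop codon.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna):
    """Translate a DNA string codon by codon, ignoring a trailing
    partial codon."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


SEQ1 = "ATGGGTCGTGAAGAGAATGCTCCTCCTTTGGAATCTTAA"

# Forward frames 1 to 3 start at bases 1, 2 and 3 of the sequence.
frames = {frame: translate(SEQ1[frame - 1:]) for frame in (1, 2, 3)}
print(frames[1])  # → MGREENAPPLES*
print(frames[2])  # → WVVKRMLLLWNL

# Drop the stop symbol and the initial methionine to get the peptide
# that the slides store in GA.fasta.
mature = frames[1].rstrip("*")[1:]
print(mature)     # → GREENAPPLES
```

Frame 1 gives the full reading frame reported by getorf, and the shorter MLLLWNL ORF appears inside frame 2, starting at its methionine.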
Alignments
>GA.fasta
GREENAPPLES
>A.fasta
APPLES
For a global alignment:
Unix% needle GA.fasta A.fasta -gapopen 10 -gapextend 0.5 -matrix EPAM250 &
For a local alignment:
Unix% water GA.fasta A.fasta -gapopen 10 -gapextend 0.5 -matrix EPAM250 &
Alignments (figure, built up over several slides): PAM250 substitution matrix with axes A R N D C Q E G H I L K M F P S T W Y V B Z X, and the alignment grid of GREENAPPLES against APPLES. Using PAM250 matrix; Gap Open Penalty = 10; Gap Extension Penalty = 0.5
Alignments: to align two or more sequences in a biologically significant way.
Sequences: GREENAPPLES and APPLES
Methods: Local (water), Global (needle); Gap penalty = 10; Extension penalty = 0.5
Aligned region: APPLES
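
To make the local alignment concrete, here is a minimal Smith-Waterman sketch in Python. It deliberately simplifies what the slides use: instead of the PAM250 matrix with gap open 10 and gap extension 0.5, it assumes a flat score of +2 for a match, -1 for a mismatch and a linear gap penalty of -2. Those values and the function name `smith_waterman` are assumptions for illustration only, so the scores will not match water's output.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Local alignment with a simple match/mismatch score and a linear
    gap penalty. Returns (best score, aligned a, aligned b)."""
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)

    # Fill the matrix; scores never drop below zero in a local alignment.
    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(0,
                          H[i - 1][j - 1] + sub,  # align a[i-1] with b[j-1]
                          H[i - 1][j] + gap,      # gap in b
                          H[i][j - 1] + gap)      # gap in a
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the best cell until a zero score is reached.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        sub = match if a[i - 1] == b[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + sub:
            out_a.append(a[i - 1]); out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-")
            i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


score, top, bottom = smith_waterman("GREENAPPLES", "APPLES")
print(score)   # → 12
print(top)     # → APPLES
print(bottom)  # → APPLES
```

The local alignment keeps only the shared APPLES region and ignores the unmatched GREEN prefix, which a global alignment would have to pair with gaps.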
GREENAPPLES and APPLES: it looks like the “apples” motif may be part of a larger domain. Next steps: physicochemical properties, pattern searching
Physico-chemical properties
Isoelectric point:
Unix% iep GA.fasta -plot -step 0.5 -out GA.IEP &
General properties:
Unix% pepinfo GA.fasta -hwindow 8 -generalplot -hydropathyplot &
Physico-chemical properties (figure): Venn diagram placing the residues D, Y, F, W, H, K, R, E, Q, N, M, A, G, C, S, P, I, V, L, T in the classes Aliphatic, Aromatic, Hydrophobic, Tiny, Small, Charged, Positive, Polar. The pepinfo graph of properties is based on this diagram.
Physico-chemical properties (annotations on the pepinfo plot): non-polar region with small residues; polar region to one side of non-charged region
Hydropathy plot (Kyte & Doolittle scale)
I 4.50, V 4.20, L 3.80, F 2.80, C 2.50, M 1.90, A 1.80, T -0.70, S -0.80, H -3.20, E -3.50, Q -3.50, D -3.50, N -3.50, K -3.90, R -4.50
G, W, Y, P: values not given
GREENAPPLES, window size = 5; first window GREEN: hydropathy value = -3.08
Hydropathy plot for GREENAPPLES, window size = 5
Windows: GREEN, REENA, EENAP, ENAPP, NAPPL, APPLE, PPLES
No truly hydrophobic regions: the plot does not rise above 0.
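
The sliding-window calculation can be reproduced with a short Python sketch. It assumes that the plotted value is the plain mean of the Kyte & Doolittle values in each window, which is consistent with the -3.08 shown for the first window but is not confirmed to be pepinfo's exact method. The values for G, P, W and Y are taken from the published Kyte & Doolittle scale, and the function name `hydropathy_profile` is made up here.

```python
# Kyte & Doolittle hydropathy scale.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(protein, window):
    """Return (window text, mean hydropathy rounded to 2 decimals) for
    every window position along the protein."""
    profile = []
    for start in range(len(protein) - window + 1):
        segment = protein[start:start + window]
        mean = sum(KYTE_DOOLITTLE[aa] for aa in segment) / window
        profile.append((segment, round(mean, 2)))
    return profile


for segment, value in hydropathy_profile("GREENAPPLES", 5):
    print(segment, value)
```

Every window mean comes out negative for this peptide, from -3.08 for GREEN up to -0.22 for NAPPL and APPLE, so no window reaches the hydrophobic side of the scale.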
Pattern searching
>GA.fasta GREENAPPLES
>GL.fasta GREENLEAVES
>RA.fasta REDAPPLES
>RL.fasta REDLEAVES
Multiple alignment:
GREENAPPL---ES
-RE-DAPPL---ES
GREEN---LEAVES
-RE-D---LEAVES
Pattern: [G](0,1)-R-[E](1,2)-[ND]-X(3)-L-X(3)-E-S
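
A pattern like this can be tried against the four sequences by converting it to a regular expression. The Python sketch below is a simplified converter written for this illustration. It handles only the elements that appear on the slide and is not fuzzpro's parser. The name `pattern_to_regex` and the relaxed variant with X(0,3) are assumptions. The relaxed variant is included because X(3) read literally means exactly three residues, whereas the gapped alignment has either three residues or none at those positions.

```python
import re


def pattern_to_regex(pattern):
    """Convert a dash-separated pattern such as [ND]-X(3)-L into a
    regular expression. Supports single letters, [..] classes, X as a
    wildcard, and (n) or (n,m) repeat counts only."""
    element_re = re.compile(r"(\[[A-Z]+\]|[A-Zx])(?:\((\d+)(?:,(\d+))?\))?")
    parts = []
    for element in pattern.split("-"):
        m = element_re.fullmatch(element.strip())
        if m is None:
            raise ValueError("unsupported pattern element: %r" % element)
        residue, low, high = m.groups()
        token = "." if residue in ("x", "X") else residue
        if low is None:
            repeat = ""
        elif high is None:
            repeat = "{%s}" % low
        else:
            repeat = "{%s,%s}" % (low, high)
        parts.append(token + repeat)
    return "".join(parts)


SEQUENCES = ["GREENAPPLES", "GREENLEAVES", "REDAPPLES", "REDLEAVES"]

# Pattern as written on the slide, and a relaxed variant (assumption).
SLIDE_PATTERN = "[G](0,1)-R-[E](1,2)-[ND]-X(3)-L-X(3)-E-S"
RELAXED_PATTERN = "[G](0,1)-R-[E](1,2)-[ND]-X(0,3)-L-X(0,3)-E-S"

for name, pattern in (("slide", SLIDE_PATTERN), ("relaxed", RELAXED_PATTERN)):
    regex = re.compile(pattern_to_regex(pattern))
    matched = [seq for seq in SEQUENCES if regex.search(seq)]
    print(name, regex.pattern, matched)
```

In this sketch the pattern as written matches none of the four sequences, while the relaxed variant matches all four, so the repeat counts are worth checking before concluding that a database holds nothing similar.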
Pattern searching
Search a protein database:
Unix% fuzzpro sptr:* pattern.fruit -mismatch 0 -out GA.fuzzpro &
pattern.fruit: [G](0,1)-[R]-[E](1,2)-[ND]-x(3)-[L]-x(3)-[E]-[S]
Nothing resembling this pattern is found in the database, but we could try scanning PRINTS (pscan) and PROSITE (patmatmotifs) with one of our sequences.