
1 The Poor Beginners’ Guide to Bioinformatics

2 What we have – and don’t have...
- a computer connected to the Internet (incl. Web browser)
- a text editor (Notepad or better)
- public databases of genomic sequences
- public databases of cDNA + EST
- public databases of protein sequences, structures and motifs
- (don’t have) money for specialised software packages
- public servers capable of (almost) anything we wish to do

3 Dealing with a sequence: model tasks
- basic (DNA) sequence manipulation: restriction analysis, translation…
- sequence similarity and pattern/motif searches
- gene building: modelling exon-intron structures
- protein domain searches, structure analysis
- construction and interpretation of sequence alignments
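The first model task (translation and restriction analysis) can be tried without any server. The following is a minimal sketch, assuming the standard genetic code and a plain DNA string; the function names `translate` and `find_sites` are made up for this illustration, and EcoRI (GAATTC) is used only as an example recognition site.

```python
# Standard genetic code, built from the conventional T, C, A, G ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna, frame=0):
    """Translate a DNA string in the given forward frame (0, 1 or 2).

    Codons containing anything other than A, C, G, T become 'X';
    stop codons become '*'. An incomplete trailing codon is ignored.
    """
    seq = dna.upper()
    protein = []
    for pos in range(frame, len(seq) - 2, 3):
        protein.append(CODON_TABLE.get(seq[pos:pos + 3], "X"))
    return "".join(protein)


def find_sites(dna, site):
    """Return 0-based start positions of every occurrence of a recognition site."""
    seq, site = dna.upper(), site.upper()
    positions = []
    start = seq.find(site)
    while start != -1:
        positions.append(start)
        start = seq.find(site, start + 1)  # step by one so overlaps are found
    return positions


if __name__ == "__main__":
    print(translate("ATGGCCTAA"))
    print(find_sites("AAGAATTCGGGAATTC", "GAATTC"))
```

Only the forward strand is handled here; a real restriction analysis would also scan the reverse complement for non-palindromic sites.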

4 Notes on basic sequence handling
- Make sure you have the correct format.
- FASTA format is (almost) always correct:
  >sequencename
  thisisasequenceinfastaformat
- If not, you can always use raw data.
- If things don’t work, check for gaps in sequence, empty lines, and file extension.
- BEWARE OF MICROSOFT!
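The checks listed on this slide (stray gaps, empty lines) can be automated before pasting a sequence into a server. The following is a rough sketch, assuming a plain-text FASTA file already read into a string; `parse_fasta` is a hypothetical helper name, not part of any package mentioned in the course.

```python
def parse_fasta(text):
    """Parse FASTA text into a dict mapping sequence name to sequence.

    Empty lines are skipped and whitespace inside sequence lines is removed,
    which covers the "gaps" and "empty lines" problems. str.splitlines()
    also copes with Windows-style line endings.
    """
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].strip()
            if name in records:
                raise ValueError(f"duplicate sequence name: {name!r}")
            records[name] = []
        elif name is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            records[name].append("".join(line.split()))
    return {key: "".join(parts) for key, parts in records.items()}


if __name__ == "__main__":
    example = ">sequencename\r\nthisisasequence\r\n\r\ninfasta format\r\n"
    print(parse_fasta(example))
```

If no header line is present, the input is probably raw sequence data, which the function reports as an error rather than guessing a name.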

6 Model tasks continued …
- basic (DNA) sequence manipulation: restriction analysis, translation…
- sequence similarity and pattern/motif searches
- gene building: modelling exon-intron structures
- protein domain searches, structure analysis
- construction and interpretation of sequence alignments

7 Defining a gene family…
- By overall domain structure (diagram labels: FH1, FH2, FH3?)
- By domain sequence
- Based on a peptide motif: L-X-X-G-N-X-[ML]-N
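A peptide motif written this way maps almost directly onto a regular expression, which is handy for scanning a few local sequences before turning to a pattern-search server. The following is a minimal sketch, assuming PROSITE-like syntax (X for any residue, [..] for alternatives, {..} for exclusions, (n) or (n,m) for repeats); the function names are invented for this example and anchors such as < and > are not handled.

```python
import re


def prosite_to_regex(pattern):
    """Convert a dash-separated, PROSITE-like pattern into a regex string."""
    parts = []
    for element in pattern.upper().split("-"):
        if not element:
            continue  # tolerate a stray leading or trailing dash
        # Split off an optional repeat count such as X(2) or X(2,4).
        match = re.fullmatch(r"(.+?)(?:\((\d+(?:,\d+)?)\))?", element)
        core, repeat = match.group(1), match.group(2)
        if core == "X":
            core = "."                        # any residue
        elif core.startswith("{") and core.endswith("}"):
            core = "[^" + core[1:-1] + "]"    # excluded residues
        parts.append(core + ("{" + repeat + "}" if repeat else ""))
    return "".join(parts)


def find_motif(protein, pattern):
    """Return (start, matched_text) for each non-overlapping motif hit."""
    regex = re.compile(prosite_to_regex(pattern))
    return [(m.start(), m.group()) for m in regex.finditer(protein.upper())]


if __name__ == "__main__":
    print(prosite_to_regex("L-X-X-G-N-X-[ML]-N"))
    print(find_motif("AAALKRGNSMNQ", "L-X-X-G-N-X-[ML]-N"))
```

Like the servers, this gives hits without any statistics, so short patterns will match by chance in large databases.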

8 Sequence comparison-based searches
Entrez “related sequences”
- easy identification of “false starts”
- (!) no organism selection
BLAST/FASTA
- all DNA/protein combinations
- taxonomy selection possible
- statistical data provided
- domain structure comparison available
- (!) divergent motifs may be missed
Two methods are better than one.

9 Notes on all sequence comparisons, searches, alignments…
- Start with defaults (the authors know what they are doing)…
- … BUT don’t be afraid to vary the parameters
- Choose a reasonable scoring matrix:
  Distant sequences: low BLOSUM, high PAM
  Closely related sequences: low PAM, high BLOSUM

11 Motif-based searches
- sensitive
- (!) no statistics
- (!) only protein databases can be searched
TAIR PatMatch
- (!) Arabidopsis-specific
- (!) problematic user interface
ISREC - INSECTS
- admirable technology
- access to SwissProt and TrEMBL
- (!) no organism selection

13 Model tasks continued …
- basic (DNA) sequence manipulation: restriction analysis, translation…
- sequence similarity and pattern/motif searches
- gene building: modelling exon-intron structures
- protein domain searches, structure analysis
- construction and interpretation of sequence alignments

14 Some genes are more alike than others…
- A number of splicing prediction servers are available
- Agreement of different methods is a good sign but no absolute measure
- Always align ESTs if possible
- Beware of non-conventional intron boundaries (GC-AG instead of GT-AG)
- Plant data for predicting transcription start sites and transcription factor binding sites are limited
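When a gene model has been built from an EST alignment, the intron boundaries are worth a quick check. The following is a small sketch, assuming each intron is given as a DNA string running from its first to its last base on the coding strand; `classify_intron` is a hypothetical helper name.

```python
def classify_intron(intron):
    """Classify an intron by its first two and last two bases."""
    seq = intron.upper()
    if len(seq) < 4:
        raise ValueError("intron too short to have distinct boundaries")
    donor, acceptor = seq[:2], seq[-2:]
    if (donor, acceptor) == ("GT", "AG"):
        return "canonical GT-AG"
    if (donor, acceptor) == ("GC", "AG"):
        return "non-conventional GC-AG"
    return f"other ({donor}-{acceptor})"


if __name__ == "__main__":
    for intron in ("gtaagtttttcag", "GCAAGTTTTAG", "ATAAGTTTTAC"):
        print(intron, "->", classify_intron(intron))
```

An "other" result is more often a sign of a misplaced exon boundary than of a truly unusual intron, so it is a prompt to re-inspect the alignment.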

15 Model tasks continued …
- basic (DNA) sequence manipulation: restriction analysis, translation…
- sequence similarity and pattern/motif searches
- gene building: modelling exon-intron structures
- protein domain searches, structure analysis
- construction and interpretation of sequence alignments

16 Searching for known domains/motifs
- Searching for PROSITE patterns – allowing ambiguities
- PROSITE and Pfam profile searches
- SMART, CDsearch (domains and more)

18 Predicting protein localisation
- transmembrane segments prediction
- predicting signal peptides/anchors
- 2 methods available
- possibility to predict organelle localisation
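The slide does not say which algorithms the prediction servers use. As an illustration of the general idea only, the sketch below swaps in a classic sliding-window hydropathy average on the Kyte-Doolittle scale. The window length of 19 and the cut-off of 1.6 are commonly quoted values for this scale and are assumptions here, as are the function names.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(protein, window=19):
    """Mean hydropathy of every full window along the sequence.

    Raises KeyError for residues outside the 20 standard amino acids.
    Returns an empty list if the sequence is shorter than the window.
    """
    scores = [KYTE_DOOLITTLE[aa] for aa in protein.upper()]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]


def candidate_tm_windows(protein, window=19, threshold=1.6):
    """0-based start positions of windows whose mean reaches the threshold."""
    profile = hydropathy_profile(protein, window)
    return [i for i, value in enumerate(profile) if value >= threshold]


if __name__ == "__main__":
    toy = "L" * 19 + "D" * 19  # artificial hydrophobic then acidic stretch
    print(candidate_tm_windows(toy))
```

A hydrophobic window is only a candidate; dedicated servers combine more evidence, so this is best treated as a sanity check on their output.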

19 Model tasks continued …
- basic (DNA) sequence manipulation: restriction analysis, translation…
- sequence similarity and pattern/motif searches
- gene building: modelling exon-intron structures
- protein domain searches, structure analysis
- construction and interpretation of sequence alignments

20 Alignment: “manual” or automated?
“Manual”
- locally installed, free, for Mac and PC
- interactive domain definition
- statistical data provided
- (!) may produce false-positive blocks (read the on-line manual!)
Automated
- “objective” results
- a number of servers available
- (!) recommended for well-conserved proteins
- (!) empirical parameters (e.g. gap penalties)
- (!) bad for divergent sequences
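The remark about empirical parameters can be made concrete with a toy global alignment score. The following is a simplified sketch, assuming a Needleman-Wunsch style recurrence with a linear gap penalty and a flat match/mismatch score instead of a real substitution matrix; the function name and default values are illustrative only and do not reproduce any particular server.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-2):
    """Best global alignment score of two sequences (linear gap penalty)."""
    # prev[j] holds the best score for aligning a[:i-1] with b[:j].
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # aligning a[:i] against an empty prefix of b
        for j in range(1, len(b) + 1):
            diagonal = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diagonal, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    for gap in (-1, -2, -5):
        print(gap, global_alignment_score("ACGT", "AGT", gap=gap))
```

Changing only the gap penalty changes the score and, in a full implementation, possibly the alignment itself, which is why the parameters matter most for divergent sequences.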

21 Phylogenetic analyses
- Two methods are better than one.
- Your phylogeny cannot be better than your alignment.
- Gaps are no data.
- Always do bootstrapping (100-500 cycles).
- Certain questions cannot be answered from an unrooted tree.
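Bootstrapping in this context means resampling alignment columns with replacement and rebuilding the tree from each pseudo-alignment. The following is a minimal sketch of the resampling step only, assuming the alignment is a dict of equal-length strings; the function names and the fixed seed are choices made for this example, and tree building is left to whichever phylogeny program is in use.

```python
import random


def bootstrap_alignment(alignment, rng):
    """Return one pseudo-alignment made of columns drawn with replacement."""
    names = list(alignment)
    length = len(alignment[names[0]])
    if any(len(alignment[name]) != length for name in names):
        raise ValueError("all aligned sequences must have the same length")
    # The same column indices are used for every sequence, so columns stay intact.
    columns = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(alignment[name][c] for c in columns) for name in names}


def bootstrap_replicates(alignment, n_replicates=100, seed=0):
    """Generate n_replicates pseudo-alignments reproducibly."""
    rng = random.Random(seed)
    return [bootstrap_alignment(alignment, rng) for _ in range(n_replicates)]


if __name__ == "__main__":
    toy = {"seqA": "ACGTAC", "seqB": "ACGAAC", "seqC": "TCGAAC"}
    for replicate in bootstrap_replicates(toy, n_replicates=3):
        print(replicate)
```

The support value for a branch is then the fraction of replicate trees in which that branch appears.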

22 Points to take off...
- go to the Bioinformatics page http://www2.rhul.ac.uk/~ujba110/Bioinfo.htm
- select your exercise (A,B,C,D,E)
- … and enjoy it!
If you mean it seriously: create your own bookmarks (seed provided on the course web page)

