1
The Poor Beginners’ Guide to Bioinformatics
2
What we have – and don’t have...
We have:
- a computer connected to the Internet (incl. Web browser)
- a text editor (Notepad or better)
- public databases of genomic sequences
- public databases of cDNA + EST
- public databases of protein sequences, structures and motifs
- public servers capable of (almost) anything we wish to do
We don’t have:
- money for specialised software packages
3
Dealing with a sequence: model tasks
- basic (DNA) sequence manipulation: restriction analysis, translation…
- sequence similarity and pattern/motif searches
- gene building: modelling exon-intron structures
- protein domain searches, structure analysis
- construction and interpretation of sequence alignments
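As an illustration of the first model task, here is a minimal Python sketch of restriction-site search and translation. The function names and the example sequence are made up for this note; EcoRI (GAATTC) is used only as a familiar example, and the translation assumes the standard genetic code. The public servers do far more than this.

```python
# Standard genetic code, built from the conventional T, C, A, G ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of a DNA string (A, C, G, T, N only)."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(complement[base] for base in reversed(dna.upper()))


def find_sites(dna: str, site: str) -> list[int]:
    """Return 1-based start positions of every occurrence of `site` in `dna`."""
    dna, site = dna.upper(), site.upper()
    positions = []
    start = dna.find(site)
    while start != -1:
        positions.append(start + 1)
        start = dna.find(site, start + 1)
    return positions


def translate(dna: str) -> str:
    """Translate frame 1 of `dna`; unknown codons become 'X', stops become '*'."""
    dna = dna.upper()
    codons = (dna[i:i + 3] for i in range(0, len(dna) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


if __name__ == "__main__":
    seq = "ccGAATTCatgaaaGAATTCtt"      # made-up example sequence
    print(find_sites(seq, "GAATTC"))     # positions of the EcoRI site
    print(reverse_complement("GAATTC"))  # palindromic site: same on both strands
    print(translate("ATGAAATAA"))
```

Because most restriction sites are palindromic, searching one strand is enough for them; for a non-palindromic site, search for `reverse_complement(site)` as well.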
4
Notes on basic sequence handling
- Make sure you have the correct format.
- FASTA format is (almost) always correct:
  >sequencename
  thisisasequenceinfastaformat
- If not, you can always use raw data.
- If things don’t work, check for gaps in sequence, empty lines, and file extension.
- BEWARE OF MICROSOFT!
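The checks listed on this slide can be sketched in a few lines of Python. This is only an illustration, not a tool from the course: the function name `read_fasta` and the fallback record name `raw` are invented here. It reads FASTA (or raw) text, skips empty lines, copes with Windows line endings, and drops spaces, digits and gap characters from the sequence.

```python
def read_fasta(text: str) -> dict[str, str]:
    """Parse FASTA or raw sequence text into {name: sequence}."""
    records: dict[str, str] = {}
    name = None
    for line in text.splitlines():   # splitlines() also handles Windows \r\n
        line = line.strip()
        if not line:                 # skip empty lines
            continue
        if line.startswith(">"):     # header line: first word is the name
            fields = line[1:].split()
            name = fields[0] if fields else "unnamed"
            records[name] = ""
            continue
        if name is None:             # raw sequence without a header line
            name = "raw"
            records[name] = ""
        # keep letters only: removes spaces, digits and '-' gap characters
        records[name] += "".join(ch for ch in line if ch.isalpha())
    return records


if __name__ == "__main__":
    messy = ">sequencename some comment\r\nACGT ACGT\r\n\r\n 11 ttga\r\n"
    print(read_fasta(messy))
```

Saving the cleaned sequence as plain text (not as a word-processor document) avoids most of the format problems mentioned above.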
6
Model tasks continued …
- basic (DNA) sequence manipulation: restriction analysis, translation…
- sequence similarity and pattern/motif searches
- gene building: modelling exon-intron structures
- protein domain searches, structure analysis
- construction and interpretation of sequence alignments
7
Defining a gene family…
- By overall domain structure
- By domain sequence
- Based on a peptide motif: L-X-X-G-N-X-[ML]-N
(Domain labels from the slide figure: FH1, FH2, FH3?)
8
Sequence comparison-based searches
Entrez “related sequences”:
- easy identification of “false starts”
- no organism selection
BLAST/FASTA:
- all DNA/protein combinations
- taxonomy selection possible
- statistical data provided
- domain structure comparison available
- divergent motifs may be missed
Two methods are better than one.
9
Notes on all sequence comparisons, searches, alignments…
- Start with defaults (the authors know what they are doing)…
- … BUT don’t be afraid to vary the parameters.
- Choose a reasonable scoring matrix:
  - Distant sequences: low BLOSUM, high PAM
  - Closely related sequences: low PAM, high BLOSUM
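The matrix rule of thumb can be written down as a small Python sketch. The 50 % identity cutoff and the particular matrices returned (BLOSUM45/PAM250 versus BLOSUM80/PAM30) are illustrative assumptions chosen for this note, not recommendations from the slides; the function names are hypothetical too.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent identity over aligned columns where neither sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have the same length")
    pairs = [(x, y) for x, y in zip(a.upper(), b.upper()) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)


def suggest_matrices(identity: float, cutoff: float = 50.0) -> tuple[str, str]:
    """Return a (BLOSUM, PAM) pair; `cutoff` is an arbitrary illustrative value."""
    if identity < cutoff:
        return ("BLOSUM45", "PAM250")   # distant: low BLOSUM, high PAM
    return ("BLOSUM80", "PAM30")        # close: high BLOSUM, low PAM


if __name__ == "__main__":
    identity = percent_identity("MKT-LLVA", "MKSALIVA")
    print(identity, suggest_matrices(identity))
```

The opposite numbering comes from how the matrices are built: a higher PAM number models more accumulated change, while a higher BLOSUM number was derived from more similar sequence blocks.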
11
Motif-based searches
- sensitive
- no statistics
- only protein databases can be searched
TAIR PatMatch:
- Arabidopsis-specific
- problematic user interface
ISREC - INSECTS:
- admirable technology
- access to SwissProt and TrEMBL
- no organism selection
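To make the idea of a motif search concrete, the sketch below converts a pattern written like the one on slide 7 (L-X-X-G-N-X-[ML]-N) into a regular expression and scans a protein sequence with it. It is an assumption-laden illustration: it handles only single residues, X wildcards and bracketed alternatives, not the full PROSITE syntax, and the function names and test sequence are made up. Like the servers above, it reports hits without any statistics.

```python
import re


def pattern_to_regex(pattern: str) -> str:
    """Convert a simple dash-separated motif (residues, X, [..]) to a regex."""
    parts = []
    for element in pattern.split("-"):
        element = element.strip().upper()
        # X matches any residue; residues and [..] classes are valid regex as-is
        parts.append("." if element == "X" else element)
    return "".join(parts)


def find_motif(protein: str, pattern: str) -> list[tuple[int, str]]:
    """Return (1-based start, matched text) for every hit, overlaps included."""
    regex = re.compile("(?=(" + pattern_to_regex(pattern) + "))")
    return [(m.start() + 1, m.group(1)) for m in regex.finditer(protein.upper())]


if __name__ == "__main__":
    motif = "L-X-X-G-N-X-[ML]-N"
    protein = "AAALAAGNAMNAA"            # made-up sequence containing one hit
    print(pattern_to_regex(motif))
    print(find_motif(protein, motif))
```

The lookahead `(?=...)` is used so that overlapping occurrences are not skipped, which a plain `finditer` on the pattern would do.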
13
Model tasks continued …
- basic (DNA) sequence manipulation: restriction analysis, translation…
- sequence similarity and pattern/motif searches
- gene building: modelling exon-intron structures
- protein domain searches, structure analysis
- construction and interpretation of sequence alignments
14
Some genes are more alike than others…
- A number of splicing prediction servers are available.
- Agreement of different methods is a good sign, but not an absolute measure.
- Always align ESTs if possible.
- Beware of non-conventional intron boundaries (GC-AG instead of GT-AG).
- Plant data for predicting transcription start sites and transcription factor binding sites are limited.
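Once an EST alignment or a prediction server has proposed an intron, its boundaries are easy to check by eye or with a few lines of Python. The sketch below is only an illustration with an invented function name and invented example sequences; it classifies the first and last two bases of a candidate intron as the conventional GT-AG, the GC-AG variant mentioned above, or something else.

```python
def classify_intron(intron: str) -> str:
    """Classify an intron by its donor (first 2) and acceptor (last 2) bases."""
    seq = intron.upper().replace("U", "T")   # accept RNA-style input as well
    if len(seq) < 4:
        raise ValueError("intron too short to have both boundaries")
    donor, acceptor = seq[:2], seq[-2:]
    if (donor, acceptor) == ("GT", "AG"):
        return "canonical GT-AG"
    if (donor, acceptor) == ("GC", "AG"):
        return "non-conventional GC-AG"
    return f"other ({donor}-{acceptor})"


if __name__ == "__main__":
    # made-up candidate introns
    for candidate in ("GTAAGTttttcAG", "gcaagtttcag", "ATAAAC"):
        print(candidate, "->", classify_intron(candidate))
```

An "other" result does not prove the model is wrong, but it is a good reason to re-inspect the alignment around that exon boundary.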
15
Model tasks continued …
- basic (DNA) sequence manipulation: restriction analysis, translation…
- sequence similarity and pattern/motif searches
- gene building: modelling exon-intron structures
- protein domain searches, structure analysis
- construction and interpretation of sequence alignments
16
Searching for known domains/motifs
- Searching for PROSITE patterns – allowing ambiguities
- PROSITE and Pfam profile searches
- SMART, CDsearch (domains and more)
18
Predicting protein localisation
- transmembrane segments prediction
- predicting signal peptides/anchors
- 2 methods available
- possibility to predict organelle localisation
19
Model tasks continued …
- basic (DNA) sequence manipulation: restriction analysis, translation…
- sequence similarity and pattern/motif searches
- gene building: modelling exon-intron structures
- protein domain searches, structure analysis
- construction and interpretation of sequence alignments
20
Alignment: “manual” or automated?
“Manual”:
- locally installed, free, for Mac and PC
- interactive domain definition
- statistical data provided
- may produce false-positive blocks (read the on-line manual!)
Automated:
- “objective” results
- a number of servers available
- recommended for well-conserved proteins
- empiric parameters (e.g. gap penalties)
- bad for divergent sequences
21
Phylogenetic analyses
- Two methods are better than one.
- Your phylogeny cannot be better than your alignment.
- Gaps are not data.
- Always do bootstrapping (100-500 cycles).
- Certain questions cannot be answered from an unrooted tree.
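What a bootstrap cycle does to the data can be shown with a short Python sketch. It covers only the resampling step: alignment columns are drawn with replacement to build each pseudo-alignment, and building a tree from every replicate is left to the phylogeny program. The function names, the `seed` parameter and the toy alignment are assumptions made for this illustration.

```python
import random


def bootstrap_alignment(alignment: dict[str, str], rng: random.Random) -> dict[str, str]:
    """Resample alignment columns with replacement (one bootstrap cycle)."""
    if not alignment:
        raise ValueError("alignment is empty")
    names = list(alignment)
    length = len(alignment[names[0]])
    if any(len(alignment[name]) != length for name in names):
        raise ValueError("all aligned sequences must have the same length")
    # the same column choices are applied to every sequence
    columns = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(alignment[name][c] for c in columns) for name in names}


def bootstrap_replicates(alignment: dict[str, str], cycles: int = 100,
                         seed: int = 0) -> list[dict[str, str]]:
    """Return `cycles` pseudo-alignments; `seed` makes the run repeatable."""
    rng = random.Random(seed)
    return [bootstrap_alignment(alignment, rng) for _ in range(cycles)]


if __name__ == "__main__":
    toy = {"seqA": "ACGTAC", "seqB": "ACGAAC", "seqC": "TCGAAT"}  # made-up data
    for replicate in bootstrap_replicates(toy, cycles=3):
        print(replicate)
```

The bootstrap value of a branch is then the fraction of replicate trees in which that branch appears, which is why a poor alignment cannot be rescued by more cycles.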
22
Points to take off...
- go to the Bioinformatics page: http://www2.rhul.ac.uk/~ujba110/Bioinfo.htm
- select your exercise (A, B, C, D, E)
- … and enjoy it!
If you are serious about it: create your own bookmarks (seed provided on the course web page)