Basic Overview of Bioinformatics Tools and Biocomputing Applications II Dr Tan Tin Wee Director Bioinformatics Centre.


2 Common Computational Analyses
Sequence assembly
Simple sequence analysis
– Translation and reverse complement; open reading frames (ORFs)
– Composition statistics (protein and DNA)
– Molecular mass
– Total charge and pI; local hydropathy
– Simple determination of secondary structure
– Restriction site analysis
– Internal repeat analysis
Detection of active sites, functional residues, characteristic structures, substrates and processing signals
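Several of the simple sequence analyses listed above (reverse complement, translation, ORF detection, base composition) can be sketched in a few lines. This is a minimal illustration, not any particular package's implementation: it assumes the standard genetic code, reports only ATG-to-stop ORFs that end in a stop codon, and the names `find_orfs` and `min_codons` (default 30) are hypothetical choices.

```python
from collections import Counter

# Standard genetic code: amino acids listed in TCAG codon order; '*' marks a stop.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Complement each base, then reverse the string."""
    return dna.upper().translate(COMPLEMENT)[::-1]


def translate(dna: str) -> str:
    """Translate codon by codon from position 0; unknown codons become 'X'."""
    dna = dna.upper()
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3))


def gc_content(dna: str) -> float:
    """Fraction of G + C bases, a simple composition statistic."""
    counts = Counter(dna.upper())
    return (counts["G"] + counts["C"]) / len(dna) if dna else 0.0


def find_orfs(dna: str, min_codons: int = 30):
    """Report ATG...stop ORFs in all six reading frames.

    Returns (strand, start, end, protein) tuples; coordinates are 0-based on
    that strand's own 5'->3' sequence, and min_codons excludes the stop codon.
    """
    orfs = []
    for strand, seq in (("+", dna.upper()), ("-", reverse_complement(dna))):
        for frame in range(3):
            start = None
            for i in range(frame, len(seq) - 2, 3):
                codon = seq[i:i + 3]
                if start is None and codon == "ATG":
                    start = i
                elif start is not None and CODON_TABLE.get(codon) == "*":
                    if (i - start) // 3 >= min_codons:
                        orfs.append((strand, start, i + 3, translate(seq[start:i])))
                    start = None
    return orfs
```

For example, `find_orfs("ATGGCCTAA", min_codons=1)` returns `[("+", 0, 9, "MA")]`.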

3 Common Computational Analyses
Database sequence search
Multiple alignment
Secondary (2°) and tertiary (3°) structure prediction; transmembrane helix detection
Structure modeling
Docking prediction and design
Hidden Markov model searches
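Local hydropathy and transmembrane helix detection are often introduced together through a sliding-window hydropathy plot. The sketch below assumes the Kyte-Doolittle hydropathy scale with the commonly quoted window of 19 residues and a mean-hydropathy cutoff of 1.6; both defaults are assumptions, and real transmembrane predictors use considerably more information than this.

```python
# Kyte-Doolittle hydropathy values per amino acid (positive = hydrophobic).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(protein: str, window: int = 19) -> list[float]:
    """Mean hydropathy of each full window; entry i covers protein[i:i + window]."""
    values = [KYTE_DOOLITTLE[aa] for aa in protein.upper()]
    return [sum(values[i:i + window]) / window for i in range(len(values) - window + 1)]


def candidate_tm_segments(protein: str, window: int = 19, threshold: float = 1.6):
    """Merge overlapping windows whose mean hydropathy exceeds threshold into (start, end) spans."""
    segments = []
    for i, mean in enumerate(hydropathy_profile(protein, window)):
        if mean > threshold:
            start, end = i, i + window
            if segments and start <= segments[-1][1]:
                # Overlaps the previous span: extend it instead of starting a new one.
                segments[-1] = (segments[-1][0], end)
            else:
                segments.append((start, end))
    return segments
```

On a toy protein with a 25-residue leucine stretch flanked by aspartates, `candidate_tm_segments("D" * 10 + "L" * 25 + "D" * 10)` returns `[(5, 40)]`; the span is wider than the leucine run because windows only need a hydrophobic majority to pass the cutoff.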

4 Database Searching
Text-based database searching: using a text string to match an annotation in a sequence database record, i.e. a keyword search
Sequence-based database searching: using a biological sequence to match all or part of it against the sequences in every record of a sequence database

5 Text-Based Database Searching
Examples: Entrez, SRS, DBGET, AceDB (common integrated database systems)
Search concepts
– Boolean search: AND, OR, NOT
– Broadening the search
– Narrowing the search
– Proximity searching, soundex
– Wildcards and stemming, e.g. Thala* for thalasemia, thalassemia, thalassemic
Uses standard string search algorithms, Boolean operations and vocabulary matches
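The Boolean and wildcard concepts above can be illustrated with a toy keyword search over annotation text. This is only a sketch: the records are hypothetical stand-ins for database entries, and real systems such as Entrez or SRS add indexing, field restriction and controlled vocabularies. Wildcards here use Python's shell-style `fnmatch` matching, which is an assumption about syntax, not any database's query language.

```python
import fnmatch
import re

# Hypothetical toy annotation records standing in for database entries.
RECORDS = {
    "R1": "Human beta-globin gene, thalassemia associated",
    "R2": "Mouse period homolog, circadian rhythm",
    "R3": "Human period circadian protein",
    "R4": "Thalassemic erythrocyte membrane protein, human",
}


def tokenize(text: str) -> set[str]:
    """Lower-case word tokens, so matching is case-insensitive."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def term_matches(term: str, tokens: set[str]) -> bool:
    """Exact token match, or wildcard match (e.g. 'thala*') for stemming."""
    term = term.lower()
    if any(ch in term for ch in "*?"):
        return any(fnmatch.fnmatchcase(tok, term) for tok in tokens)
    return term in tokens


def boolean_search(records, all_of=(), any_of=(), none_of=()):
    """AND over all_of, OR over any_of (ignored if empty), NOT over none_of."""
    hits = []
    for record_id, text in records.items():
        tokens = tokenize(text)
        if (all(term_matches(t, tokens) for t in all_of)
                and (not any_of or any(term_matches(t, tokens) for t in any_of))
                and not any(term_matches(t, tokens) for t in none_of)):
            hits.append(record_id)
    return hits
```

Here `boolean_search(RECORDS, all_of=["thala*"])` returns `["R1", "R4"]`. A query for `"human"` AND `"per"` finds nothing, while broadening it to `"per*"` returns `["R3"]`, and `all_of=["period"], none_of=["human"]` narrows to `["R2"]`.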

6 Text-based Database Searching
Example: to find the human homolog of the Drosophila per (period) gene
Procedure
– Go to Entrez on the Web
– All Fields: enter "human" "per"
– The hits returned are irrelevant, so broaden the search
– "human" "period" gives more hits
– Check every one; find the human RIGUI gene
Hit and miss, clever guesswork: free-form or controlled vocabulary (MeSH terms)? Use Boolean searches?

7 Sequence-based Database Searching
Homology search
Global or local sequence alignment
– Needleman-Wunsch algorithm
– Smith-Waterman algorithm
– Lipman-Pearson FASTA
– Altschul's BLAST
Take a sequence and make a pairwise comparison with each sequence in the database
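The basic search strategy above, a pairwise comparison of the query with every database sequence, can be sketched with a Smith-Waterman local alignment score used to rank hits. The scoring values (match +2, mismatch -1, linear gap -2) and the toy database are assumptions for illustration; production tools use substitution matrices, affine gaps and E-value statistics.

```python
def local_score(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -2) -> int:
    """Best Smith-Waterman local alignment score with a linear gap penalty.

    Keeps only two DP rows, so memory is O(len(b)); time is O(len(a) * len(b)).
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Clamping at 0 lets a local alignment start afresh at any cell.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


def search_database(query: str, database: dict[str, str]) -> list[tuple[str, int]]:
    """Compare the query with every database sequence; rank by score, best first."""
    hits = [(name, local_score(query, seq)) for name, seq in database.items()]
    return sorted(hits, key=lambda hit: (-hit[1], hit[0]))
```

With a hypothetical three-entry database, `search_database("ACGT", {"s1": "TTACGTTT", "s2": "GGGGGG", "s3": "ACGA"})` ranks `s1` (8) above `s3` (6) and `s2` (2). Because every database entry is aligned in full, the cost grows with the total database size, which is what the FASTA and BLAST heuristics try to reduce.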

8 Sequence-based Database Searching
Basic assumptions:
– Sequences of homologous genes/proteins diverge over time even though structure and/or function change little
– Significant sequence similarity is taken as evidence of potential structural/functional similarity or common evolutionary origin
– Based on a well-characterised protein, infer the function of an unknown sequence at the gene or protein sequence level

9 Sequence-based Database Searching
Global alignment forces a complete, end-to-end alignment of the two input sequences
Local alignment looks for local stretches of similarity and tries to align the most similar segments
The algorithms used may be similar, but the outputs differ, and statistics are needed to assess the results
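The global/local distinction above comes down to two changes in the same dynamic-programming recurrence: a global (Needleman-Wunsch) alignment charges for leading gaps in the first row and column and reads the score from the last cell, while a local (Smith-Waterman) alignment clamps cells at zero and takes the best cell anywhere. The sketch below assumes simple unit scores (match +1, mismatch -1, linear gap -1) rather than a substitution matrix.

```python
def alignment_score(a: str, b: str, local: bool = False,
                    match: int = 1, mismatch: int = -1, gap: int = -1) -> int:
    """Needleman-Wunsch (global) or Smith-Waterman (local) optimal score."""
    rows, cols = len(a) + 1, len(b) + 1
    F = [[0] * cols for _ in range(rows)]
    if not local:
        # Global: aligning a prefix against nothing costs one gap per residue.
        for i in range(1, rows):
            F[i][0] = i * gap
        for j in range(1, cols):
            F[0][j] = j * gap
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            s = match if a[i - 1] == b[j - 1] else mismatch
            cell = max(F[i - 1][j - 1] + s, F[i - 1][j] + gap, F[i][j - 1] + gap)
            if local:
                cell = max(cell, 0)  # a local alignment may restart anywhere
                best = max(best, cell)
            F[i][j] = cell
    # Global alignments must end at the bottom-right cell; local ones anywhere.
    return best if local else F[-1][-1]
```

For `"ACGT"` against `"TTACGTTT"`, the local score is 4 (the exact `ACGT` match), but the global score is 0, because the four overhanging residues of the longer sequence must be paid for as gaps.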

10 Sequence-based Database Searching
Alignment scoring
– Substitution score and substitution matrix: PAM, BLOSUM
– Affine gap costs/gap penalty and gap scores
Optimal alignments, dynamic programming
– Needleman-Wunsch algorithm, Smith-Waterman algorithm (SSEARCH)
Additional heuristics to speed up the search: FASTA, BLAST
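To make the scoring terms above concrete, the sketch scores an existing gapped alignment with a substitution matrix and affine gap costs, where a gap run of length k costs `gap_open + gap_extend * k`. The DNA matrix (match +5, transition -1, transversion -4) and the default gap values are hypothetical illustrations; PAM and BLOSUM are protein matrices with empirically derived values that are not reproduced here.

```python
PURINES, PYRIMIDINES = set("AG"), set("CT")


def dna_matrix(match: int = 5, transition: int = -1, transversion: int = -4) -> dict:
    """Hypothetical DNA substitution matrix that penalises transversions more than transitions."""
    sub = {}
    for x in "ACGT":
        for y in "ACGT":
            if x == y:
                sub[x, y] = match
            elif {x, y} <= PURINES or {x, y} <= PYRIMIDINES:
                sub[x, y] = transition
            else:
                sub[x, y] = transversion
    return sub


def score_alignment(top: str, bottom: str, sub: dict,
                    gap_open: int = 10, gap_extend: int = 1) -> int:
    """Score two equal-length gapped rows ('-' = gap) with affine gap costs."""
    if len(top) != len(bottom):
        raise ValueError("aligned rows must have equal length")
    score, gap_row = 0, None  # gap_row: which row the current gap run is in
    for x, y in zip(top, bottom):
        if x == "-" and y == "-":
            raise ValueError("column with gaps in both rows")
        if x == "-" or y == "-":
            row = "top" if x == "-" else "bottom"
            if gap_row != row:
                score -= gap_open  # opening a new gap run
            score -= gap_extend    # every gapped residue pays the extension cost
            gap_row = row
        else:
            score += sub[x, y]
            gap_row = None
    return score
```

With these values, `score_alignment("ACGTT", "A--TT", dna_matrix())` is 3 (one gap run of length 2 costs 12), while `score_alignment("A-C-T", "AGCGT", dna_matrix())` is -7 because two separate single-residue gaps each pay the opening penalty.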

11 Some definitions
Affine gap costs: a scoring system for gaps within alignments that charges a penalty for opening a gap plus an additional per-residue penalty proportional to the length of the gap
Alignment score: a numerical value indicating the overall quality of an alignment; the higher the score, the better the alignment
Algorithm: a fixed procedure embodied in a computer program
Heuristics: a computer science term for guesses made by a program to approximate results, usually based on arbitrary or predefined rules
Gapped alignment: an alignment of sequences in which gaps are permitted

12 Computational Genefinding
A major challenge in genome projects
Given a DNA sequence, where does a gene begin and end? (ORFs)
Where are the exons and introns?
Where are the transcription elements?
What is the gene structure, and where are the other regulatory elements?

13 Genomic Elements
Intron-exon splice sites
Start and stop codons
Branch points
Promoters and terminators of transcription
Polyadenylation sites
Ribosomal binding sites
Topoisomerase II binding sites
Topoisomerase I cleavage sites
Transcription factor binding sites

14 Detecting Genomic Elements
Local sites and motifs/patterns for such elements: signals and signal sensors
Extended variable-length regions, e.g. exons and introns: contents and content sensors
Linguistic technique: gene structure described in a formal grammar, e.g. the GeneLang genefinding program

15 Signal sensors
Simple consensus sequences
Use of pattern-matching algorithms
Weight matrices: a weighted score for each position of the site is summed to give the sensor's overall score
Use of artificial neural networks (ANN)
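A weight-matrix signal sensor of the kind described above can be sketched as a log-odds position weight matrix: per-position weights are estimated from aligned example sites, and a candidate window's score is the sum of the weights for its bases. The training sites, pseudocount of 1, uniform 0.25 background and threshold are all hypothetical choices for illustration.

```python
import math


def build_pwm(sites, pseudocount: float = 1.0, background: float = 0.25):
    """Log-odds (bits) weight matrix from equal-length aligned DNA sites."""
    n = len(sites)
    pwm = []
    for pos in range(len(sites[0])):
        column = [site[pos] for site in sites]
        pwm.append({
            base: math.log2((column.count(base) + pseudocount) / (n + 4 * pseudocount) / background)
            for base in "ACGT"
        })
    return pwm


def score_window(pwm, window: str) -> float:
    """Signal score = sum of the per-position weights of the observed bases."""
    return sum(column[base] for column, base in zip(pwm, window))


def scan(pwm, seq: str, threshold: float):
    """Return (position, score) for every window scoring at least threshold."""
    width = len(pwm)
    hits = []
    for i in range(len(seq) - width + 1):
        s = score_window(pwm, seq[i:i + width])
        if s >= threshold:
            hits.append((i, s))
    return hits
```

Trained on five toy sites (`TATAAT`, `TATAAT`, `TACAAT`, `TATATT`, `GATAAT`), a scan of `"GGGGTATAATGGGG"` with a threshold of 5.0 reports a single hit at position 4, where the consensus-like window sits.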

16 Content Sensors
Long ORFs, for bacteria
Statistical models, e.g. Markov models: GeneMark uses statistical models of nucleotide frequencies and dependencies in codon structure
Neural nets, e.g. GRAIL: exon detection by a neural network combined with signal sensors for exon-intron splice sites
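The Markov-model content sensor idea can be sketched as a log-likelihood ratio between two Markov chains, one trained on sequence of the type being sought and one on background sequence. This is a deliberately simplified stand-in: it uses a homogeneous first-order chain and toy training strings, whereas GeneMark's models take codon position into account and are trained on real coding and non-coding data.

```python
import math

BASES = "ACGT"


def train_markov(seqs, pseudocount: float = 1.0):
    """First-order Markov chain: P(next base | current base), with pseudocounts."""
    counts = {a: {b: pseudocount for b in BASES} for a in BASES}
    for seq in seqs:
        for cur, nxt in zip(seq, seq[1:]):
            counts[cur][nxt] += 1
    return {a: {b: counts[a][b] / sum(counts[a].values()) for b in BASES} for a in BASES}


def log_odds(seq: str, model_a, model_b) -> float:
    """Log2 likelihood ratio of seq under model_a versus model_b.

    Positive values favour model_a; the first base is ignored for simplicity.
    """
    return sum(math.log2(model_a[cur][nxt] / model_b[cur][nxt])
               for cur, nxt in zip(seq, seq[1:]))
```

Training one model on a GC-rich toy string and the other on an AT-rich one, `log_odds("GCGCGC", gc_model, at_model)` comes out positive and `log_odds("ATATAT", gc_model, at_model)` negative; a genefinder would instead compare coding and non-coding models over sliding windows.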

17 Some Definitions
Artificial neural nets: a statistical pattern recognition method, a type of nonlinear regression
Markov models: statistical models for sequences in which the probability of each residue depends on the residues preceding it
Dynamic programming: a type of algorithm widely used for constructing sequence alignments and for evaluating all possible candidate gene structures

18 Other Genefinding methods
Use of dynamic programming
Linguistic rules for functional features
Parameters of a Markov process on hidden variables: hidden Markov models (HMMs)
HMM genefinders: EcoParse, Xpound, GeneMark HMM, Veil, HMMgene, GenScan
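The HMM approach combines the Markov-model and dynamic-programming ideas above: the Viterbi algorithm finds the most probable path of hidden states (for instance, region types) given the observed bases. The sketch below is a toy two-state model with hypothetical parameters, "L" for low-GC and "H" for high-GC regions; the genefinders listed use far richer state structures (exons, introns, splice sites, codon positions).

```python
import math


def viterbi(seq, states, start_p, trans_p, emit_p):
    """Most probable hidden-state path (as a string) and its log probability."""
    # V[t][s]: best log probability of any path ending in state s at position t.
    V = [{s: math.log(start_p[s]) + math.log(emit_p[s][seq[0]]) for s in states}]
    back = []
    for ch in seq[1:]:
        col, ptr = {}, {}
        for s in states:
            prev, score = max(((r, V[-1][r] + math.log(trans_p[r][s])) for r in states),
                              key=lambda t: t[1])
            col[s] = score + math.log(emit_p[s][ch])
            ptr[s] = prev
        V.append(col)
        back.append(ptr)
    # Trace back from the best final state.
    last = max(states, key=lambda s: V[-1][s])
    path = [last]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return "".join(reversed(path)), V[-1][last]


# Hypothetical toy parameters: sticky states with GC- or AT-biased emissions.
STATES = ("L", "H")
START = {"L": 0.5, "H": 0.5}
TRANS = {"L": {"L": 0.9, "H": 0.1}, "H": {"L": 0.1, "H": 0.9}}
EMIT = {"L": {"A": 0.4, "T": 0.4, "C": 0.1, "G": 0.1},
        "H": {"A": 0.1, "T": 0.1, "C": 0.4, "G": 0.4}}
```

For `"ATATATATAT" + "GCGCGCGCGC" + "ATATATATAT"`, the decoded path is ten `L`s, ten `H`s and ten `L`s: each switch costs about log(0.1/0.9), which is outweighed by the emission gain over a ten-base run.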

