BASICS OF BIOINFORMATICS Biotechnology Division North-East Institute of Science & Technology (Council of Scientific & Industrial Research) Jorhat 785 006,


1 BASICS OF BIOINFORMATICS. Biotechnology Division, North-East Institute of Science & Technology (Council of Scientific & Industrial Research), Jorhat 785 006, Assam. Salam Pradeep. Email: salampradeep@gmail.com

2 Bioinformatics: the use of techniques from applied mathematics, informatics, statistics, computer science, artificial intelligence, chemistry and biochemistry to solve biological problems at the molecular level.

3 Major Research Efforts & Applications

4 Sequence analysis & alignment: comparison of sequences in order to find similar sequences. Alignment is a way of arranging DNA, RNA or amino acid sequences to identify regions of similarity that may be a consequence of functional, structural or evolutionary relationships. Sequence analysis also covers the identification of gene structures, reading frames, the distribution of introns and exons, and regulatory elements.
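Reading frames can be illustrated with a short sketch. The Python below scans the three forward frames of a DNA string for open reading frames, taken here to mean an ATG start codon followed in the same frame by TAA, TAG or TGA. The function name `find_orfs` and the `min_codons` cut-off are illustrative assumptions; real gene finders also scan the reverse strand and model introns, splice sites and codon usage.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna: str, min_codons: int = 2) -> list[tuple[int, int, int]]:
    """Return (frame, start, end) for ATG...stop ORFs on the forward strand.

    start is the 0-based index of the ATG; end is one past the stop codon,
    so dna[start:end] is the whole ORF, stop codon included.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None  # index of the currently open ATG in this frame, if any
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((frame, start, i + 3))
                start = None  # close this ORF and look for the next ATG
    return orfs


print(find_orfs("CCATGTTTTGACC"))  # [(2, 2, 11)]
```

Each reported ORF includes its stop codon; nested ATGs inside an open ORF are not reported separately in this simplified version.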

5 Genome annotation: the process of marking the genes and other biological features in a DNA sequence. The first genome annotation software system was designed in 1995 by Dr. Owen White, in connection with the first genome of a free-living organism to be decoded, that of the bacterium Haemophilus influenzae. White's software system finds the genes (places in the DNA sequence that encode a protein), the transfer RNAs and other features.

6 Computational evolutionary biology: tracing the evolution of a large number of organisms by measuring changes in their DNA, rather than through physical taxonomy or physiological observations alone. Comparing entire genomes permits the study of more complex evolutionary events, such as gene duplication, horizontal gene transfer and speciation. It also makes it possible to track and share information on an increasingly large number of species and organisms.

7 Measuring biodiversity: biodiversity databases collect species names, descriptions, distributions, genetic information, the status and size of populations, habitat needs, and how each organism interacts with other species. Computer simulations model such things as population dynamics, or calculate the cumulative genetic health of a breeding pool (in agriculture) or an endangered population (in conservation). Entire DNA sequences, or genomes, of endangered species can be preserved, allowing the results of Nature's genetic experiment to be remembered in silico and possibly reused in the future, even if the species is eventually lost.
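One simple indicator of the genetic diversity of a breeding pool or endangered population is expected heterozygosity, He = 1 - sum(p_i^2), where p_i are the allele frequencies at a locus. The sketch below illustrates that single measure under the assumption that allele counts are already available; it is not a full population-genetics model, and the allele labels are hypothetical genotyping results.

```python
from collections import Counter


def expected_heterozygosity(alleles):
    """He = 1 - sum(p_i ** 2) over the allele frequencies p_i at one locus."""
    counts = Counter(alleles)
    n = sum(counts.values())
    if n == 0:
        raise ValueError("no alleles sampled")
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def mean_heterozygosity(loci):
    """Average He across several loci (one list of sampled alleles per locus)."""
    return sum(expected_heterozygosity(a) for a in loci) / len(loci)


# Hypothetical allele samples at two loci.
locus_1 = ["A1", "A1", "A2", "A2"]  # two equally common alleles -> He = 0.5
locus_2 = ["B1", "B1", "B1", "B1"]  # a fixed locus -> He = 0.0
print(mean_heterozygosity([locus_1, locus_2]))  # 0.25
```

A falling average He across generations is one sign that a small population is losing genetic diversity.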

8 Prediction of protein structure Protein structure prediction is one of the most important goals pursued by bioinformatics and theoretical chemistry. Its aim is the prediction of the three-dimensional structure of proteins from their amino acid sequences. In other words, it deals with the prediction of a protein's tertiary structure from its primary structure. Protein structure prediction is of high importance in medicine (for example, in drug design) and biotechnology (for example, in the design of novel enzymes).

9 Comparative genomics: the study of the relationship of genome structure and function across different biological species or strains. Gene finding is an important application of comparative genomics, as is the discovery of new, non-coding functional elements of the genome. Computational approaches to genome comparison have recently become a common research topic in computer science.
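Many computational genome-comparison methods start from shared short substrings (k-mers). A minimal sketch, assuming that the Jaccard index of two k-mer sets is an acceptable rough similarity measure: it ignores k-mer order, copy number and the reverse strand, so it is a quick screening heuristic rather than an alignment. The function names and the default `k` are choices made for this sketch.

```python
def kmer_set(seq: str, k: int) -> set[str]:
    """All distinct substrings of length k in seq (case-insensitive)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard_similarity(a: str, b: str, k: int = 4) -> float:
    """Size of the shared k-mer set divided by the size of the combined set."""
    ka, kb = kmer_set(a, k), kmer_set(b, k)
    union = ka | kb
    if not union:
        raise ValueError("both sequences are shorter than k")
    return len(ka & kb) / len(union)


print(jaccard_similarity("ACGTA", "ACGTC", k=3))  # 0.5
```

Here the shared 3-mers are ACG and CGT, out of four distinct 3-mers overall.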

10 Modeling biological systems: systems biology involves the use of computer simulations of cellular subsystems (such as the networks of metabolites and enzymes that make up metabolism, signal transduction pathways and gene regulatory networks) to both analyze and visualize the complex connections of these cellular processes. Artificial life, or virtual evolution, attempts to understand evolutionary processes via the computer simulation of simple (artificial) life forms.
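To give a feel for what such simulations involve, the sketch below integrates the simplest enzyme-metabolite subsystem: one enzyme converting substrate S into product P with Michaelis-Menten kinetics, dS/dt = -Vmax*S/(Km + S). The parameter values are arbitrary assumptions, and forward Euler integration is used only for brevity; real systems-biology models couple many such equations and normally use adaptive ODE solvers.

```python
def simulate_michaelis_menten(s0, vmax, km, dt=0.01, steps=1000):
    """Forward-Euler integration of dS/dt = -vmax * S / (km + S).

    Returns a list of (time, substrate, product) tuples; product is
    s0 - S because every substrate molecule consumed becomes product.
    """
    s = s0
    trajectory = [(0.0, s, 0.0)]
    for step in range(1, steps + 1):
        rate = vmax * s / (km + s)   # current reaction velocity
        s = max(s - rate * dt, 0.0)  # substrate cannot go negative
        trajectory.append((step * dt, s, s0 - s))
    return trajectory


# Arbitrary example parameters (concentration and time units unspecified).
traj = simulate_michaelis_menten(s0=10.0, vmax=1.0, km=2.0)
t, s, p = traj[-1]
print(f"t={t:.1f}  S={s:.3f}  P={p:.3f}")
```

Adding more species and reactions turns the single rate expression into a system of coupled equations, which is where network structure starts to matter.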

11 Protein-protein interaction & docking: protein-protein interactions involve the association of protein molecules. These associations are studied from the perspectives of biochemistry, signal transduction and networks. Wet-lab techniques: co-immunoprecipitation, FRET, bimolecular fluorescence complementation. Protein-protein docking: as of 2006, predicting protein-protein interactions from three-dimensional protein structures alone was not satisfactory.

12 Biological Sequence Database

13 Primary Sequence Databases: the International Nucleotide Sequence Database (INSD) consists of the following databases: DDBJ (DNA Data Bank of Japan), the EMBL Nucleotide Sequence Database (European Molecular Biology Laboratory) and GenBank (National Center for Biotechnology Information). They exchange the stored information and are the source for many other databases.

14 NCBI: the National Center for Biotechnology Information is part of the United States National Library of Medicine (NLM), a branch of the National Institutes of Health. It was founded in 1988 through legislation sponsored by Senator Claude Pepper. NCBI has been responsible for making the GenBank DNA sequence database available since 1992. In addition to GenBank, NCBI provides OMIM, MMDB (3D protein structures), dbSNP, the Unique Human Gene Sequence Collection, a Gene Map of the Human Genome, a Taxonomy Browser, etc.
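NCBI's databases can also be queried programmatically through the Entrez E-utilities web service. The sketch below only builds an `efetch` request URL asking for a nucleotide record in FASTA format; the accession is just an example, and a real script should follow NCBI's E-utilities usage guidelines (request-rate limits, identifying the tool and a contact e-mail) before sending requests.

```python
from urllib.parse import urlencode

EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"


def efetch_url(db: str, ids: list[str], rettype: str = "fasta", retmode: str = "text") -> str:
    """Build an E-utilities efetch URL for one or more record identifiers."""
    params = {
        "db": db,             # e.g. "nuccore" for nucleotide records
        "id": ",".join(ids),  # comma-separated accessions or UIDs
        "rettype": rettype,
        "retmode": retmode,
    }
    return EUTILS_BASE + "efetch.fcgi?" + urlencode(params)


url = efetch_url("nuccore", ["NM_000546"])
print(url)
# Sending the request is left to the caller, e.g. urllib.request.urlopen(url).read()
```

Keeping URL construction separate from the network call makes the query easy to inspect and test offline.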

15

16 DDBJ

17 EMBL

18 Protein Sequence Database

19 UniProt - Universal Protein Resource

20 Swiss-Prot - Protein Knowledgebase

21 Protein Information Resource

22 Pfam

23 Protein Structure Databases

24 Protein Data Bank (PDB)

25

26 PDB Statistics

27 NCBI Molecular Modeling Database

28

29 Genome Databases

30 Corn

31

32 ERIC (Enteropathogen Resource Integration Center)

33

34 Flybase

35

36 MGI Mouse Genome

37

38 Viral Bioinformatics Resource Center

39

40 Saccharomyces Genome Database

41

42 National Microbial Pathogen Data Resource

43 Other Databases: protein-protein interactions (BioGRID, STRING, DIP, etc.); metabolic pathways (KEGG, BioCyc, MANET, etc.); microarrays (ArrayExpress, Stanford Microarray Database, GEO).

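44 Sequence File Formats: FASTA – each record starts with a header line beginning with > (the greater-than symbol), followed by the sequence lines. GenBank – a series of keyword lines (LOCUS, DEFINITION, … ORIGIN); the sequence follows the ORIGIN line. EMBL – each entry begins with an ID line, and every line of the entry starts with a two-letter line code.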
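Because every FASTA record begins with a `>` header line, a reader only needs to split the text at those lines. A minimal sketch, assuming the whole file fits in memory and that blank lines can be ignored; production code would normally use an established parser such as Biopython's `SeqIO`.

```python
def parse_fasta(text: str) -> list[tuple[str, str]]:
    """Return (header, sequence) pairs; the leading '>' is stripped from headers."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        elif header is None:
            raise ValueError("FASTA data must start with a '>' header line")
        else:
            chunks.append(line)  # sequence may be wrapped over several lines
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


print(parse_fasta(">seq1 demo\nACGT\nGG\n>seq2\nMKV\n"))
# [('seq1 demo', 'ACGTGG'), ('seq2', 'MKV')]
```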

45 FASTA Format

46 GenBank Format
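In a GenBank record the sequence comes after the ORIGIN line, printed as numbered lines of ten-base blocks and terminated by `//`. The sketch below extracts just that sequence from a single record; the record text is a shortened, made-up example, and header fields such as LOCUS and DEFINITION are ignored.

```python
def genbank_sequence(record_text: str) -> str:
    """Return the sequence from the ORIGIN section of a single GenBank record."""
    in_origin = False
    blocks = []
    for line in record_text.splitlines():
        if line.startswith("ORIGIN"):
            in_origin = True
            continue
        if line.startswith("//"):
            break  # end of record
        if in_origin:
            # Each line is a base position followed by blocks of 10 bases.
            blocks.extend(tok for tok in line.split() if not tok.isdigit())
    return "".join(blocks).upper()


# Hypothetical, heavily shortened record for illustration only.
toy_record = """LOCUS       TOY0001                   24 bp    DNA     linear   SYN
DEFINITION  Toy record.
ORIGIN
        1 gatcctccat atacaacggt atct
//
"""
print(genbank_sequence(toy_record))  # GATCCTCCATATACAACGGTATCT
```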

47

48 EMBL Format

49

50 Inside NCBI

51 Sitemap

52

53 Taxonomy Browser

54 NCBI Taxonomy Browser Statistics

55 Genome Projects

56 Genome Projects Statistics

57 Map Viewer

58

59 Sequence analysis & Sequence alignment

60 Sequence analysis & alignment: comparison of sequences in order to find similar sequences; a way of arranging DNA, RNA or protein sequences to identify regions of similarity that may be a consequence of functional, structural or evolutionary relationships. Aligned sequences of nucleotide or amino acid residues are typically represented as rows within a matrix.
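Once sequences sit as equal-length rows of the alignment matrix, simple statistics can be read off column by column. A sketch of percent identity between two aligned rows, assuming '-' marks a gap and that columns gapped in both rows are skipped; published tools differ in which denominator they use, so values are not directly comparable across programs.

```python
def percent_identity(row_a: str, row_b: str, gap: str = "-") -> float:
    """Percentage of alignment columns in which both rows carry the same residue."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    # Skip columns that are gaps in both rows (common after multiple alignment).
    columns = [(x, y) for x, y in zip(row_a, row_b) if not (x == gap and y == gap)]
    if not columns:
        return 0.0
    matches = sum(1 for x, y in columns if x == y and x != gap)
    return 100.0 * matches / len(columns)


# Two rows of a small alignment matrix (hypothetical sequences).
print(percent_identity("AC-T", "ACGT"))  # 75.0
```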

61 Representations in sequence alignment: conservative substitution, semi-conservative substitution

62 Global and local alignments: global alignments attempt to align every residue in every sequence. They are most useful when the sequences in the query set are similar and of roughly equal size. Local alignments are useful for dissimilar sequences that are suspected to contain regions of similarity or similar sequence motifs within their larger sequence context. For sufficiently similar sequences, local and global alignments give essentially the same result.

63 Needleman-Wunsch algorithm: a general global alignment technique based on dynamic programming. Smith-Waterman algorithm: a general local alignment method, also based on dynamic programming.
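Both algorithms fill a dynamic-programming matrix; the local (Smith-Waterman) variant additionally floors every cell at zero and reports the best cell anywhere in the matrix, while the global (Needleman-Wunsch) variant reports the bottom-right cell. A minimal score-only sketch, assuming a simple match/mismatch scheme and a linear gap penalty with arbitrary values; practical tools use substitution matrices such as BLOSUM62, affine gap penalties and a traceback step to recover the alignment itself.

```python
def alignment_score(a: str, b: str, match: int = 1, mismatch: int = -1,
                    gap: int = -2, local: bool = False) -> int:
    """Optimal global (Needleman-Wunsch) or local (Smith-Waterman) alignment score."""
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]
    if not local:
        # Global: leading gaps are penalised along the first row and column.
        for i in range(1, rows):
            H[i][0] = i * gap
        for j in range(1, cols):
            H[0][j] = j * gap
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score = max(diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if local:
                score = max(score, 0)  # a local alignment may start afresh anywhere
                best = max(best, score)
            H[i][j] = score
    return best if local else H[-1][-1]


print(alignment_score("GATTACA", "GATTACA"))                # 7
print(alignment_score("TTTACGTTT", "GGACGGG", local=True))  # 3 (the shared ACG)
```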

64 Pairwise alignment: used to find the best-matching piecewise (local) or global alignments of two query sequences; it can only be used between two sequences at a time. Pairwise alignments are efficient to calculate and are often used for tasks such as searching a database for sequences with high similarity to a query. The primary methods of producing pairwise alignments are dot-matrix methods, dynamic programming and word methods.
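The dot-matrix method is the easiest of the three to sketch: place one sequence along each axis and mark a cell when the residues match. The version below adds a sliding window and a stringency (minimum number of matches per window), a common way to suppress random single-residue matches; both parameter names are choices made for this sketch. Diagonal runs of '*' indicate similar regions.

```python
def dot_matrix(a: str, b: str, window: int = 1, stringency: int = 1) -> list[str]:
    """Row i, column j is '*' if a[i:i+window] and b[j:j+window] share
    at least `stringency` identical positions, otherwise '.'."""
    rows = []
    for i in range(len(a) - window + 1):
        row = []
        for j in range(len(b) - window + 1):
            matches = sum(a[i + k] == b[j + k] for k in range(window))
            row.append("*" if matches >= stringency else ".")
        rows.append("".join(row))
    return rows


for line in dot_matrix("ACGT", "ACGT"):
    print(line)
# *...
# .*..
# ..*.
# ...*
```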

65

66 Multiple sequence alignment: MSA incorporates more than two sequences at a time, aligning all of the sequences in a given query set. It is often used to identify conserved sequence regions across a group of sequences and aids in establishing evolutionary relationships by constructing phylogenetic trees.
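Identifying conserved regions in an existing multiple alignment is straightforward once the rows have equal length. The sketch below marks columns in which every sequence carries the same residue, using '*' as Clustal-style output does for fully conserved positions; it does not reproduce Clustal's classes for similar (non-identical) residues, and the three sequences are made up.

```python
def conserved_columns(alignment: list[str], gap: str = "-") -> list[int]:
    """Indices of columns where all rows share one residue (gap columns excluded)."""
    lengths = {len(row) for row in alignment}
    if len(lengths) != 1:
        raise ValueError("alignment must be a non-empty list of equal-length rows")
    conserved = []
    for col in range(lengths.pop()):
        residues = {row[col] for row in alignment}
        if len(residues) == 1 and gap not in residues:
            conserved.append(col)
    return conserved


def conservation_line(alignment: list[str], gap: str = "-") -> str:
    """A line with '*' under fully conserved columns and spaces elsewhere."""
    conserved = set(conserved_columns(alignment, gap))
    return "".join("*" if col in conserved else " " for col in range(len(alignment[0])))


msa = ["MKV-LA", "MRVQLA", "MKVELA"]  # hypothetical aligned protein fragments
for row in msa:
    print(row)
print(conservation_line(msa))  # "* * **"
```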

67

68 Sequence Similarity Search

69 NCBI BLAST: an algorithm for comparing primary biological sequence information, such as the amino acid sequences of proteins or the nucleotide sequences of DNA. A BLAST search enables a researcher to compare a query sequence with a library or database of sequences and to identify library sequences that resemble the query above a certain threshold. The BLAST program was designed by Eugene Myers, Stephen Altschul, Warren Gish, David J. Lipman and Webb Miller at the NIH and was published in J. Mol. Biol. in 1990.
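BLAST is a word method: it first finds short exact "words" shared by the query and a database sequence, then extends those seeds into longer alignments. The sketch below shows only the seeding step for nucleotides, plus grouping of seeds by diagonal; the word size is an arbitrary choice here, and real BLAST also scores neighbourhood words for proteins, extends seeds with a drop-off score and reports E-value statistics.

```python
from collections import defaultdict


def build_word_index(subject: str, w: int) -> dict[str, list[int]]:
    """Map every length-w word of the subject sequence to its start positions."""
    index = defaultdict(list)
    for i in range(len(subject) - w + 1):
        index[subject[i:i + w]].append(i)
    return index


def find_seeds(query: str, subject: str, w: int = 4) -> list[tuple[int, int]]:
    """Return (query_pos, subject_pos) pairs where a length-w word matches exactly."""
    index = build_word_index(subject, w)
    hits = []
    for q in range(len(query) - w + 1):
        for s in index.get(query[q:q + w], []):
            hits.append((q, s))
    return hits


def group_by_diagonal(hits):
    """Seeds on the same diagonal (subject_pos - query_pos) are candidates for one extension."""
    diagonals = defaultdict(list)
    for q, s in hits:
        diagonals[s - q].append((q, s))
    return dict(diagonals)


hits = find_seeds("ACGTAC", "TTACGTAA", w=4)
print(hits)                     # [(0, 2), (1, 3)]
print(group_by_diagonal(hits))  # {2: [(0, 2), (1, 3)]}
```

Indexing the database words once and then streaming query words past the index is what lets the seeding step avoid filling a full dynamic-programming matrix.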

70 BLAST Types: blastn: nucleotide query vs nucleotide database; blastp: protein query vs protein database; blastx: nucleotide query translated in all six frames vs protein database; tblastn: protein query vs nucleotide database translated in all six frames; tblastx: six-frame translated nucleotide query vs six-frame translated nucleotide database; megablast: designed for large numbers of query sequences (and highly similar sequences).
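The translated searches (blastx, tblastn, tblastx) rely on translating nucleotide sequence in all six reading frames: three on the given strand and three on its reverse complement. A sketch using the standard genetic code; the frame labels "+1".."+3" and "-1".."-3" here simply mean offsets 0 to 2 on each strand, which is one convention among several, and codons containing ambiguous bases are translated as "X".

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    return dna.upper().translate(COMPLEMENT)[::-1]


def translate(dna: str) -> str:
    """Translate complete codons from position 0; '*' marks stop codons."""
    dna = dna.upper()
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3))


def six_frame_translation(dna: str) -> dict[str, str]:
    rc = reverse_complement(dna)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(dna[offset:])
        frames[f"-{offset + 1}"] = translate(rc[offset:])
    return frames


print(six_frame_translation("ATGGCC"))
```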

71

72 BLASTn

73 BLASTp

74 BLASTn: Search Set

75 BLASTp: Search Set

76 BLASTn: Program Selection

77 BLASTp: Program Selection

78 BLASTn Result

79 BLASTn: Graphic Summary

80 BLASTn Description

81 BLASTn Alignment

82 BLASTn Tree View

83 PDB BLASTp

84 BLASTp: Graphic Summary

85 PDB BLASTp Description

86 PDB BLASTp Alignment

87 BLASTp Tree View

88 Multiple Sequence Alignment

89 EBI ClustalW Server

90 Preparing Multiple Sequences

91

92

93

94

95 Phylogenetic Analysis

96 Cladogram: a branching diagram (tree) assumed to be an estimate of a phylogeny in which the branches are drawn with equal length; cladograms therefore show common ancestry but do not indicate the amount of evolutionary "time" separating taxa.

97 Phylogram: a branching diagram (tree) assumed to be an estimate of a phylogeny in which branch lengths are proportional to the amount of inferred evolutionary change.
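Branch lengths of the kind a phylogram displays can be estimated from pairwise distances. A minimal sketch using p-distance (the fraction of differing aligned sites) and UPGMA clustering, chosen here for brevity; UPGMA assumes a constant rate of change and produces an ultrametric tree, so methods such as neighbour-joining or maximum likelihood are usually preferred in practice. The aligned sequences are hypothetical. Drawn with these branch lengths the result is a phylogram; ignoring them gives a cladogram.

```python
def p_distance(a: str, b: str) -> float:
    """Fraction of aligned positions at which two equal-length sequences differ."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be aligned, non-empty and of equal length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def upgma(names: list[str], dist: list[list[float]]) -> str:
    """Cluster taxa with UPGMA and return a Newick string with branch lengths."""
    # cluster id -> (Newick text, number of taxa, node height)
    clusters = {i: (name, 1, 0.0) for i, name in enumerate(names)}
    d = {(i, j): dist[i][j] for i in range(len(names)) for j in range(i + 1, len(names))}
    next_id = len(names)
    while len(clusters) > 1:
        (i, j), dij = min(d.items(), key=lambda item: item[1])  # closest pair
        text_i, size_i, h_i = clusters.pop(i)
        text_j, size_j, h_j = clusters.pop(j)
        height = dij / 2
        merged = f"({text_i}:{height - h_i:g},{text_j}:{height - h_j:g})"
        # Distance from the merged cluster to each remaining one: size-weighted mean.
        for k in clusters:
            d_ik = d[(min(i, k), max(i, k))]
            d_jk = d[(min(j, k), max(j, k))]
            d[(k, next_id)] = (d_ik * size_i + d_jk * size_j) / (size_i + size_j)
        d = {key: v for key, v in d.items() if i not in key and j not in key}
        clusters[next_id] = (merged, size_i + size_j, height)
        next_id += 1
    return next(iter(clusters.values()))[0] + ";"


# Hypothetical aligned sequences.
seqs = {"A": "ACGTACGT", "B": "ACGTACGA", "C": "ACGAAGGA"}
names = list(seqs)
matrix = [[p_distance(seqs[x], seqs[y]) for y in names] for x in names]
print(upgma(names, matrix))  # (C:0.15625,(A:0.0625,B:0.0625):0.09375);
```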

98 Jalview – Java applet

99 Thank You

