Bioinformatics
What is bioinformatics?
  - Bioinformatics = biology + informatics, a convergence discipline
  - An interdisciplinary field of science that develops methods and software tools for understanding biological data
  - Draws on biology, computer science, statistics, mathematics, …
  - 1970: Hogeweg & Hesper coined the term "bioinformatics"
History of bioinformatics
  Sequencing
    - 1950s: amino acid sequencing (Sanger, insulin)
    - 1970s: nucleotide sequencing (Sanger, Maxam-Gilbert)
    - 1990s: genome sequencing
        Haemophilus influenzae (1995, 1.8 Mb), human genome (2003, 3 Gb)
  Databases
    - 1972: Dayhoff, whom David Lipman called the "mother and father of bioinformatics", compiled the first protein sequence databases
    - NCBI (National Center for Biotechnology Information)
    - 1982: creation of the GenBank database (NCBI, NIH, USA)
    - 1988: NCBI/EMBL/DDBJ
        GenBank at NCBI
        EMBL (European Molecular Biology Laboratory) at EBI
        DDBJ (DNA Data Bank of Japan) at CIB (NIG, Japan)
        * Together these form the INSDC (International Nucleotide Sequence Database Collaboration)
What is bioinformatics used for?
  The goal of bioinformatics
    - To increase the understanding of biological processes
    - Focus on developing and applying computationally intensive techniques to achieve that goal (pattern recognition, data mining, machine learning algorithms, visualization)
  Major research efforts
    - Sequence alignment, gene finding, genome assembly, prediction of gene expression
    - Protein structure alignment
    - Prediction of protein structure
    - Prediction of protein–protein interactions
    - Modeling of evolution
    - Drug design & drug discovery
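Sequence alignment, the first research effort listed above, can be illustrated with a small dynamic-programming sketch. The slides do not name an algorithm; the example below assumes the classic Needleman-Wunsch global alignment, and the function name and scoring values (match +1, mismatch -1, gap -2) are illustrative choices rather than recommended settings.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-2):
    """Return the best global alignment score of sequences a and b.

    Needleman-Wunsch dynamic programming, keeping only two rows of the
    score matrix. The scoring values are illustrative assumptions.
    """
    # Row 0: aligning an empty prefix of a against b[:j] costs j gaps.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        # Column 0: aligning a[:i] against an empty prefix of b costs i gaps.
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            # Align a[i-1] with b[j-1] (match or mismatch).
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Align a[i-1] with a gap in b.
            up = prev[j] + gap
            # Align b[j-1] with a gap in a.
            left = curr[j - 1] + gap
            curr.append(max(diag, up, left))
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    print(global_alignment_score("GATTACA", "GATTACA"))
    print(global_alignment_score("ACGT", "AGT"))
```

This returns only the score. Recovering the aligned sequences would require keeping the full matrix and tracing back from the final cell. Production tools such as BLAST use heuristics instead of exhaustive dynamic programming so that they can search large databases.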
Omics
  -ome, -omics
    - Data-driven (exploratory) approach vs. hypothesis-driven approach
  Genome          Genomics
  Proteome        Proteomics
  Transcriptome   Transcriptomics
  Metabolome      Metabolomics
  …
Databases
  Primary nucleotide sequence databases
    - Original sequence data
    - INSD (International Nucleotide Sequence Database) includes:
        GenBank at NCBI
        EMBL (European Molecular Biology Laboratory) at EBI
        DDBJ (DNA Data Bank of Japan) at NIG
  Meta databases
    - Collect data from different sources and make them available in a new and more convenient form, or with an emphasis on a particular disease or organism
    - RefSeq (NCBI)
    - Enzyme Portal (European Bioinformatics Institute)
    - mGen: contains the databases GenBank, RefSeq, EMBL and DDBJ
Genome databases
  - EcoCyc: E. coli K-12
  - Saccharomyces Genome Database: budding yeast (Saccharomyces cerevisiae)
  - EzGenome: archaea and bacteria
Protein sequence databases
  - UniProt: Universal Protein Resource
  - Swiss-Prot: Protein Knowledgebase
  - Pfam: protein families database
  - Entrez: information retrieval system (NCBI)
  - ExPASy (Expert Protein Analysis System): proteomics resource from SIB (Swiss Institute of Bioinformatics)
  - BLAST: sequence search program (NCBI)
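Entrez, listed above as NCBI's retrieval system, can also be queried programmatically through the NCBI E-utilities web service. The slides do not cover this, so the following is only a minimal sketch: it builds an `esearch` request URL and sends nothing over the network. The helper name `build_esearch_url` and the example query are hypothetical. The base URL and the `db`, `term`, `retmax` and `retmode` parameters are taken from the public E-utilities interface.

```python
from urllib.parse import urlencode

# Base address of the NCBI E-utilities service (not given in the slides).
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def build_esearch_url(db, term, retmax=20):
    """Build an Entrez esearch URL for a database name and a query term.

    Hypothetical helper for illustration; it only formats the URL and
    does not contact NCBI.
    """
    params = {
        "db": db,           # Entrez database, e.g. "protein" or "pubmed"
        "term": term,       # search expression
        "retmax": retmax,   # maximum number of record IDs to return
        "retmode": "json",  # ask for a JSON response
    }
    return f"{EUTILS_BASE}/esearch.fcgi?{urlencode(params)}"


if __name__ == "__main__":
    print(build_esearch_url("protein", "insulin AND human", retmax=5))
```

The resulting URL could then be fetched with any HTTP client. NCBI asks users to limit request rates, so heavier use would normally go through an API key or a maintained client library.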
Useful websites
  NCBI       http://www.ncbi.nlm.nih.gov/
  BLAST      http://www.ncbi.nlm.nih.gov/Blast.cgi
  PubMed     http://www.ncbi.nlm.nih.gov/pubmed/
  EMBL-EBI   http://www.ebi.ac.uk/
  ExPASy     http://www.expasy.org/
  BRIC       http://ibric.org/