Practical Application of Bioinformatics Analysis Tools, with Hands-on Practice. 김형용, Development Team, Insilicogen, Inc.

2 Contents: Introduction to biological sequences; Pairwise alignment (BLAST); Multiple alignment (ClustalW); Phylogenetic analysis (Phylip); Genome analysis (Apollo)

3 Rosetta Stone: the same text written in hieroglyphic, Demotic Egyptian, and Greek. How can I translate it?

4 Biological sequence: a kind of language. DNA: “AGTCAGTCAGTCAGTCAGTTTCCCAAA”; protein: “PEEKSAVTALWGKVNVDEVGGEALGRLLVVYPWT”. Formats: FASTA format; GenBank (EMBL, DDBJ) format; XML.

5 FASTA format
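
A FASTA record is a header line beginning with '>' followed by one or more lines of sequence. The minimal Python sketch below reads such records into a dictionary. It assumes the identifier is the first word of the header, does not validate residue letters, and lets a duplicate identifier overwrite the earlier record.

```python
from typing import Dict, Iterable, List, Optional


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA text into {identifier: sequence}.

    Assumption: the identifier is the first whitespace-separated word after '>',
    and sequence data may be wrapped over several lines.
    """
    records: Dict[str, str] = {}
    current_id: Optional[str] = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if current_id is not None:
                records[current_id] = "".join(chunks)  # close the previous record
            words = line[1:].split()
            current_id = words[0] if words else ""
            chunks = []
        elif current_id is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            chunks.append(line.upper())
    if current_id is not None:
        records[current_id] = "".join(chunks)  # close the last record
    return records


example = """>seq1 example DNA
AGTCAGTCAGTCAGTC
AGTTTCCCAAA
>seq2 example protein
PEEKSAVTALWGKVNVDEVGGEALGRLLVVYPWT
"""
records = parse_fasta(example.splitlines())
print(records["seq1"])  # AGTCAGTCAGTCAGTCAGTTTCCCAAA
```

The wrapped DNA lines are joined back into the single sequence shown on slide 4.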

6 Transformational grammars: regular grammar: [A|G](C.+)*; context-free grammar: DNA palindromes, or the Korean palindrome “다시합창합시다” (“let's sing in chorus again”); context-sensitive grammar; unrestricted grammar: natural language.
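
The grammar levels differ in what they can recognize. A regular expression can express the slide's pattern, but recognizing palindromes of arbitrary length needs nested (context-free) matching. The sketch below therefore checks palindromes directly by string reversal instead of building a grammar parser. A DNA palindrome is taken here in the molecular-biology sense: a sequence equal to its own reverse complement, as in the restriction site GAATTC. The function names are illustrative.

```python
import re

# Slide example of a regular grammar, [A|G](C.+)*, written here as [AG](C.+)*:
# inside a regex character class, '|' would be matched as a literal character.
REGULAR_PATTERN = re.compile(r"[AG](C.+)*")

COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (A<->T, C<->G)."""
    return seq.translate(COMPLEMENT)[::-1]


def is_dna_palindrome(seq: str) -> bool:
    """True if the sequence reads the same as its reverse complement, e.g. GAATTC."""
    s = seq.upper()
    return s == reverse_complement(s)


def is_text_palindrome(text: str) -> bool:
    """Ordinary palindrome: the string equals its own reverse."""
    return text == text[::-1]


print(bool(REGULAR_PATTERN.fullmatch("ACGTCA")))  # True
print(bool(REGULAR_PATTERN.fullmatch("CA")))      # False
print(is_dna_palindrome("GAATTC"))                 # True
print(is_text_palindrome("다시합창합시다"))          # True
```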

7 Sequence analysis methods: sequence-to-sequence comparison: alignment; pattern search: using regular grammars; RNA secondary structure modeling: using context-free grammars.
Example alignment of ADCNYRQCLCRPM and AYCYNRCKCRDP:
ADCNY-RQCLCR-PM
AYC-YNR-CKCRDP-
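
To illustrate the context-free idea, the sketch below uses the Nussinov algorithm, a classic dynamic program that maximizes the number of nested base pairs. Nested (non-crossing) pairs are the kind of structure a context-free grammar can describe and a regular grammar cannot. This is a swapped-in textbook example, not a tool from these slides. It assumes Watson-Crick plus G-U wobble pairs and a minimum hairpin loop of 3 unpaired bases. Practical structure predictors minimize free energy rather than counting pairs.

```python
from typing import List

# Allowed base pairs: Watson-Crick plus G-U wobble (an assumption of this sketch).
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def nussinov(seq: str, min_loop: int = 3) -> int:
    """Maximum number of nested base pairs; hairpin loops keep >= min_loop unpaired bases."""
    seq = seq.upper()
    n = len(seq)
    if n == 0:
        return 0
    # best[i][j] = maximum number of pairs within seq[i..j]
    best: List[List[int]] = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            value = max(best[i + 1][j], best[i][j - 1])  # i or j left unpaired
            if (seq[i], seq[j]) in PAIRS:
                value = max(value, best[i + 1][j - 1] + 1)  # i pairs with j
            for k in range(i + 1, j):  # split into two independent substructures
                value = max(value, best[i][k] + best[k + 1][j])
            best[i][j] = value
    return best[0][n - 1]


print(nussinov("GGGAAAUCC"))  # 3
```

In the example, G-C, G-C and G-U pairs close a hairpin around the AAA loop.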

8 Substitution matrices: for DNA and for protein. Protein matrices: BLOSUM (BLOcks SUbstitution Matrix) and PAM (Percent Accepted Mutation, also expanded as Point Accepted Mutation).
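
An alignment score is the sum of the substitution-matrix entries for each aligned pair plus gap penalties. The sketch below shows that arithmetic with a small DNA scoring scheme that distinguishes transitions from transversions. All the numbers (+2, -1, -2, gap -3) are hypothetical. For proteins, a published BLOSUM or PAM table would be loaded in place of these values.

```python
# Hypothetical DNA substitution scores, chosen only for illustration
# (not a published matrix): match +2, transition -1, transversion -2.
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}
GAP_PENALTY = -3  # hypothetical linear gap penalty


def dna_score(a: str, b: str) -> int:
    """Score one aligned pair of bases."""
    a, b = a.upper(), b.upper()
    if a == b:
        return 2
    pair = {a, b}
    if pair <= PURINES or pair <= PYRIMIDINES:
        return -1  # transition (purine<->purine or pyrimidine<->pyrimidine)
    return -2      # transversion


def alignment_score(row1: str, row2: str) -> int:
    """Sum substitution scores over an existing pairwise alignment ('-' = gap)."""
    if len(row1) != len(row2):
        raise ValueError("aligned rows must have equal length")
    total = 0
    for a, b in zip(row1, row2):
        if a == "-" or b == "-":
            total += GAP_PENALTY
        else:
            total += dna_score(a, b)
    return total


print(alignment_score("ACGT-", "ATGTC"))  # 2 + (-1) + 2 + 2 + (-3) = 2
```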

10 Sequence alignment

11 Alignment example: ADCNYRQCLCRPM and AYCYNRCKCRDP aligned as
ADCNY-RQCLCR-PM
AYC-YNR-CKCRDP-

12 Pairwise alignment: global alignment (Needleman & Wunsch algorithm); local alignment (Smith & Waterman algorithm); repeated matches; overlap matches.
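
The two algorithms use the same dynamic-programming recurrence and differ in two details. Smith-Waterman floors every cell at zero and traces back from the best-scoring cell. Needleman-Wunsch instead fills the borders with gap penalties and traces back from the bottom-right corner. The sketch below assumes simple match/mismatch scores and a linear gap penalty, with hypothetical defaults of 1, -1 and -2. Production tools use substitution matrices and affine gap penalties.

```python
from typing import List, Optional, Tuple


def align(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -2,
          local: bool = False) -> Tuple[int, str, str]:
    """Needleman-Wunsch (local=False) or Smith-Waterman (local=True) with a linear gap penalty.

    Returns (score, aligned a, aligned b).
    """
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    move: List[List[Optional[str]]] = [[None] * (m + 1) for _ in range(n + 1)]
    if not local:
        # global alignment: leading gaps are penalized
        for i in range(1, n + 1):
            score[i][0], move[i][0] = i * gap, "up"
        for j in range(1, m + 1):
            score[0][j], move[0][j] = j * gap, "left"
    best, best_cell = 0, (0, 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            value, step = max(
                [(score[i - 1][j - 1] + s, "diag"),
                 (score[i - 1][j] + gap, "up"),
                 (score[i][j - 1] + gap, "left")],
                key=lambda option: option[0],
            )
            if local and value <= 0:
                value, step = 0, None  # Smith-Waterman: start a fresh local alignment
            score[i][j], move[i][j] = value, step
            if local and value > best:
                best, best_cell = value, (i, j)
    i, j = best_cell if local else (n, m)
    final = best if local else score[n][m]
    out_a: List[str] = []
    out_b: List[str] = []
    # trace back until the origin (global) or a zero-score cell (local)
    while move[i][j] is not None:
        step = move[i][j]
        if step == "diag":
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif step == "up":
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return final, "".join(reversed(out_a)), "".join(reversed(out_b))


print(align("ACGT", "AGT"))                       # (1, 'ACGT', 'A-GT')
print(align("TTACGTT", "GGACGGG", local=True))  # (3, 'ACG', 'ACG')
```

On ties the diagonal move wins, so the function returns one deterministic alignment among equally good ones.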

13 BLAST: compare an unknown (query) sequence against a database of known sequences.
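
BLAST avoids running full dynamic programming against every database entry. It first looks up short word matches ("seeds") between the query and the database and extends only those. The sketch below shows just the seeding step, using exact DNA words of a hypothetical length k. BLAST itself also uses scored neighborhood words for proteins, extends seeds with a scoring scheme, and reports E-values.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def build_word_index(database: Dict[str, str], k: int) -> Dict[str, List[Tuple[str, int]]]:
    """Map every length-k word in the database to its (sequence id, 0-based position) hits."""
    index: Dict[str, List[Tuple[str, int]]] = defaultdict(list)
    for seq_id, seq in database.items():
        for pos in range(len(seq) - k + 1):
            index[seq[pos:pos + k]].append((seq_id, pos))
    return dict(index)


def find_seeds(query: str, index: Dict[str, List[Tuple[str, int]]],
               k: int) -> List[Tuple[int, str, int]]:
    """Return (query position, subject id, subject position) for each exact word match."""
    hits = []
    for qpos in range(len(query) - k + 1):
        for seq_id, spos in index.get(query[qpos:qpos + k], []):
            hits.append((qpos, seq_id, spos))
    return hits


database = {"s1": "TTACGTACGG", "s2": "GGGGGG"}  # hypothetical database
index = build_word_index(database, k=4)
print(find_seeds("ACGTA", index, k=4))  # [(0, 's1', 2), (1, 's1', 3)]
```

Indexing the database once and reusing it for many queries is what makes this step cheap compared with aligning against every sequence.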

14 NCBI toolkit: BLAST analysis on your own computer. Download: ftp://ftp.ncbi.nih.gov/blast/executables/LATEST/ncbiz.exe. Programs: formatdb, blastall, bl2seq.
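
The three legacy programs are normally chained: formatdb indexes a FASTA database, blastall searches it, and bl2seq compares two sequences directly. The sketch below only assembles their command lines from Python. The file names are hypothetical, and the `run` helper executes a command only if the program is on PATH. Recent NCBI releases replace these programs with BLAST+ (makeblastdb, blastn, blastp, and others), whose options differ.

```python
import shutil
import subprocess
from typing import List

# Command lines for the legacy NCBI toolkit programs named on the slide.
# File names used below are hypothetical examples.


def formatdb_cmd(fasta: str, protein: bool) -> List[str]:
    # -i input FASTA, -p T = protein / F = nucleotide, -o T = parse and index SeqIds
    return ["formatdb", "-i", fasta, "-p", "T" if protein else "F", "-o", "T"]


def blastall_cmd(program: str, db: str, query: str, out: str,
                 evalue: float = 1e-5) -> List[str]:
    # -p program (blastn, blastp, blastx, ...), -d formatted database,
    # -i query FASTA, -e E-value cutoff, -m 8 tabular output, -o report file
    return ["blastall", "-p", program, "-d", db, "-i", query,
            "-e", str(evalue), "-m", "8", "-o", out]


def bl2seq_cmd(program: str, first: str, second: str, out: str) -> List[str]:
    # -i first sequence, -j second sequence, -p program, -o output file
    return ["bl2seq", "-p", program, "-i", first, "-j", second, "-o", out]


def run(cmd: List[str]) -> None:
    """Run a command if its executable is on PATH; raise otherwise."""
    if shutil.which(cmd[0]) is None:
        raise FileNotFoundError(f"{cmd[0]} is not installed or not on PATH")
    subprocess.run(cmd, check=True)


print(" ".join(formatdb_cmd("mydb.fasta", protein=False)))
print(" ".join(blastall_cmd("blastn", "mydb.fasta", "query.fasta", "hits.txt")))
```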

15 Multiple alignment. Purposes: predicting protein structure and function; phylogenetic analysis; confirming SNPs and other polymorphisms. Criteria: structural similarity; evolutionary similarity; functional similarity; sequence similarity.

16 Multiple alignment, main applications: extrapolation; phylogenetic analysis; pattern identification; domain identification; DNA regulatory elements; structure prediction; PCR analysis.
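
Several of these applications, such as pattern identification and checking SNPs and other polymorphisms, start from per-column statistics of the alignment. The sketch below uses hypothetical example rows. It computes the most frequent symbol in each column, a majority consensus, and the columns that vary. The 50% threshold and the 'X' placeholder are arbitrary choices. When two symbols tie, `Counter.most_common` keeps the one encountered first.

```python
from collections import Counter
from typing import List, Tuple


def column_profile(alignment: List[str]) -> List[Tuple[str, float]]:
    """For each column: (most frequent symbol, its fraction). Gaps '-' count as symbols."""
    if not alignment:
        raise ValueError("empty alignment")
    length = len(alignment[0])
    if any(len(row) != length for row in alignment):
        raise ValueError("rows must be aligned to the same length")
    profile = []
    for column in zip(*alignment):
        symbol, count = Counter(column).most_common(1)[0]
        profile.append((symbol, count / len(column)))
    return profile


def consensus(alignment: List[str], threshold: float = 0.5, unknown: str = "X") -> str:
    """Majority consensus; columns below the threshold (or gap-dominated) become `unknown`."""
    return "".join(
        symbol if fraction >= threshold and symbol != "-" else unknown
        for symbol, fraction in column_profile(alignment)
    )


def variable_columns(alignment: List[str]) -> List[int]:
    """0-based indices of columns that are not identical in every row."""
    return [i for i, column in enumerate(zip(*alignment)) if len(set(column)) > 1]


aln = ["ACGTAC", "ACGTTC", "ACCTAC"]  # hypothetical aligned rows
print(consensus(aln))         # ACGTAC
print(variable_columns(aln))  # [2, 4]
```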

17 Example of a multiple alignment: the cellulose-binding domain of cellobiohydrolase I (30-35 residues).

19 Multiple alignment formats: MSF: Multiple Sequence alignment Format; Selex: extended version of MSF; ALN: default output of ClustalW; Phylip: variant of ALN. Format conversion: Fmtseq: http://bioweb.pasteur.fr/seqanal/interfaces/fmtseq.html
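
As a small example of converting between these layouts, the sketch below writes already-aligned sequences in sequential PHYLIP layout. That layout is a count line, then one line per sequence with the name padded or truncated to 10 characters. The input names are hypothetical, and truncation can make long names collide. PHYLIP variants (interleaved or sequential, strict or relaxed names) differ between programs, so a dedicated converter such as Fmtseq is the safer route for real files.

```python
from typing import Dict


def to_phylip(alignment: Dict[str, str]) -> str:
    """Write aligned sequences in sequential PHYLIP layout.

    Header: '<number of sequences> <alignment length>'; each following line has
    the name padded or truncated to 10 characters, then the aligned sequence.
    """
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("need at least one sequence, all of the same aligned length")
    n_sites = lengths.pop()
    lines = [f"{len(alignment)} {n_sites}"]
    for name, seq in alignment.items():
        lines.append(f"{name[:10]:<10}{seq}")
    return "\n".join(lines) + "\n"


print(to_phylip({"human": "ACGT-A", "mouse": "ACGTTA"}), end="")
# 2 6
# human     ACGT-A
# mouse     ACGTTA
```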

20 ClustalW: (1) For every pair of sequences, compute the evolutionary distance using Kimura's model, and collect the results in a (triangular) distance matrix. (2) Build a guide tree with the neighbor-joining clustering algorithm. (3) Align the sequences in order of decreasing similarity. Download for Windows: ftp://ftp.ebi.ac.uk/pub/software/dos/clustalw/
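
The first ClustalW step needs a distance for every pair of sequences. As a sketch of a Kimura-style distance, the code below implements the Kimura two-parameter model for pre-aligned DNA: d = -1/2 ln(1 - 2P - Q) - 1/4 ln(1 - 2Q), where P and Q are the proportions of transitions and transversions. ClustalW's own distance calculation starts from pairwise alignments and also handles proteins, so it may differ in detail. Skipping gapped sites is an assumption of this sketch.

```python
import math
from itertools import combinations
from typing import Dict, Tuple

PURINES = set("AG")
PYRIMIDINES = set("CT")


def k2p_distance(s1: str, s2: str) -> float:
    """Kimura two-parameter distance between two aligned DNA sequences.

    Sites with a gap or a non-ACGT symbol in either sequence are skipped.
    """
    transitions = transversions = compared = 0
    for a, b in zip(s1.upper(), s2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue
        compared += 1
        if a == b:
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    if transitions == transversions == 0:
        return 0.0
    p = transitions / compared    # P: proportion of transitions
    q = transversions / compared  # Q: proportion of transversions
    w1, w2 = 1 - 2 * p - q, 1 - 2 * q
    if w1 <= 0 or w2 <= 0:
        return math.inf  # too divergent for the model's correction
    return -0.5 * math.log(w1) - 0.25 * math.log(w2)


def distance_matrix(seqs: Dict[str, str]) -> Dict[Tuple[str, str], float]:
    """K2P distances for every unordered pair of names (one half of the matrix)."""
    return {(x, y): k2p_distance(seqs[x], seqs[y]) for x, y in combinations(seqs, 2)}


seqs = {"s1": "AAAAAAAAAA", "s2": "GAAAAAAAAA", "s3": "CAAAAAAAAA"}  # hypothetical
for pair, d in distance_matrix(seqs).items():
    print(pair, round(d, 4))
```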

21 Phylogenetic analysis: phylogeny inference (“tree building”); character and rate analysis. Practical approach: multiple FASTA file (*.fasta) → multiple sequence alignment (*.msf, *.aln, *.phy, *.nex) → tree file (*.tre) → result image (*.ps, *.png, *.jpg).

22 Common phylogenetic tree terminology

25 Types of tree

26 Phylogenetic tree building method

27 Types of data: character-based methods; distance-based methods.

28 Similarity vs. evolutionary relationship. Similar: having likeness or resemblance (an observation). Related: genetically connected (a historical fact).

29 Parsimony method: the ‘most parsimonious’ tree is the one that requires the fewest evolutionary events. Advantages: simple, intuitive, logical; can be used to infer the sequences of extinct ancestors. Disadvantages: derived from medieval logic (Occam's razor), not from statistics.
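
The number of evolutionary events a given tree requires can be counted site by site with Fitch's algorithm. At each internal node, take the intersection of the children's state sets if it is non-empty. Otherwise take their union and count one change. The sketch below scores fixed, rooted binary trees written as nested tuples, a representation chosen for illustration. It does not search tree space, which programs such as protpars or dnapenny do.

```python
from typing import Dict, Set, Tuple, Union

# A rooted binary tree as nested tuples: a leaf is a name, an internal node is (left, right).
Tree = Union[str, Tuple["Tree", "Tree"]]


def fitch(tree: Tree, states: Dict[str, str]) -> Tuple[Set[str], int]:
    """Fitch's algorithm for one site: (possible states at this node, changes below it)."""
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, right = tree
    left_set, left_cost = fitch(left, states)
    right_set, right_cost = fitch(right, states)
    shared = left_set & right_set
    if shared:
        return shared, left_cost + right_cost
    return left_set | right_set, left_cost + right_cost + 1  # one extra change needed


def tree_length(tree: Tree, alignment: Dict[str, str]) -> int:
    """Parsimony score: minimum number of changes summed over all sites."""
    n_sites = len(next(iter(alignment.values())))
    return sum(
        fitch(tree, {name: seq[i] for name, seq in alignment.items()})[1]
        for i in range(n_sites)
    )


aln = {"A": "ACGT", "B": "ACGA", "C": "TCGA", "D": "TCGT"}  # hypothetical data
print(tree_length((("A", "B"), ("C", "D")), aln))  # 3
print(tree_length((("A", "C"), ("B", "D")), aln))  # 4
```

Parsimony prefers the first topology over the second. For this toy data the third quartet, ((A,D),(B,C)), also needs 3 changes, which shows that parsimony can return ties.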

30 Maximum likelihood method: the tree with the highest likelihood value is chosen. Advantages: statistical and based on an explicit evolutionary model; the most ‘consistent’ method; can be used to infer ancestral sequences. Disadvantages: computationally very intensive (limits the number of taxa and the sequence length).
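
For two sequences under the Jukes-Cantor (JC69) model, the maximum likelihood idea fits in a few lines: write the log-likelihood as a function of the distance d and pick the d that maximizes it. The sketch below does this with a plain grid search, an assumption made for simplicity. It then compares the result with the closed-form JC distance d = -3/4 ln(1 - 4p/3), which is the maximum likelihood estimate for this model. A full tree likelihood, as in dnaml, must also optimize the topology and many branch lengths, and that is what makes it expensive.

```python
import math


def jc_log_likelihood(d: float, same: int, diff: int) -> float:
    """Log-likelihood of `same` identical and `diff` differing sites at distance d (JC69).

    The constant stationary-frequency factor (1/4 per site) is omitted.
    """
    e = math.exp(-4.0 * d / 3.0)
    p_same = 0.25 + 0.75 * e  # probability the base is unchanged along d
    p_diff = 0.25 - 0.25 * e  # probability of one particular different base
    return same * math.log(p_same) + diff * math.log(p_diff)


def ml_distance(s1: str, s2: str, step: float = 1e-4, d_max: float = 5.0) -> float:
    """Maximize the JC69 likelihood over a grid of distances (a deliberately simple search)."""
    pairs = [(a, b) for a, b in zip(s1.upper(), s2.upper()) if a in "ACGT" and b in "ACGT"]
    diff = sum(a != b for a, b in pairs)
    same = len(pairs) - diff
    grid = [step * k for k in range(1, int(d_max / step) + 1)]
    return max(grid, key=lambda d: jc_log_likelihood(d, same, diff))


def jc_distance(p: float) -> float:
    """Closed-form JC69 distance for a proportion p of differing sites (p < 0.75)."""
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


s1 = "A" * 100
s2 = "C" * 20 + "A" * 80  # hypothetical pair: 20 of 100 sites differ
print(round(ml_distance(s1, s2), 3), round(jc_distance(0.2), 3))
```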

31 Minimum evolution method: the tree with the smallest sum of branch lengths is chosen as the best tree. Advantages: can use indirectly measured distances (immunological, hybridization); usually faster than character-based methods; has an objective function. Disadvantages: information is lost when characters are transformed into distances; slower than clustering methods.

32 Clustering methods (UPGMA & Neighbor-Joining): the algorithm itself builds ‘the’ tree. Advantages: can use indirectly measured distances (immunological, hybridization); fastest (can process very large data sets quickly). Disadvantages: similarity and relationship are not necessarily the same thing; no explicit optimization criterion.
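
UPGMA is short enough to sketch in full. Repeatedly merge the two closest clusters and place the new node at half their distance. Then replace their distances to every other cluster with the size-weighted average. The Python below follows that rule and returns a Newick string, the text format commonly stored in *.tre files. The input matrix is hypothetical. UPGMA assumes a constant evolutionary rate (a molecular clock), so it can produce wrong topologies when rates vary. This is one reason Neighbor-Joining is often preferred.

```python
from typing import Dict, List, Tuple


def _key(a: int, b: int) -> Tuple[int, int]:
    """Distance-table key with the smaller cluster id first."""
    return (a, b) if a < b else (b, a)


def upgma(names: List[str], matrix: List[List[float]]) -> str:
    """UPGMA clustering of a symmetric distance matrix; returns a rooted Newick tree."""
    # cluster id -> (Newick text, number of leaves, height of the cluster's root)
    clusters: Dict[int, Tuple[str, int, float]] = {
        i: (name, 1, 0.0) for i, name in enumerate(names)
    }
    dist: Dict[Tuple[int, int], float] = {
        (i, j): matrix[i][j] for i in range(len(names)) for j in range(i + 1, len(names))
    }
    next_id = len(names)
    while len(clusters) > 1:
        (i, j), d_ij = min(dist.items(), key=lambda item: item[1])  # closest pair
        text_i, size_i, h_i = clusters.pop(i)
        text_j, size_j, h_j = clusters.pop(j)
        height = d_ij / 2.0  # the new node sits at half the pair's distance
        node = f"({text_i}:{height - h_i:g},{text_j}:{height - h_j:g})"
        del dist[(i, j)]
        for k in clusters:
            # size-weighted average distance from the merged cluster to cluster k
            d_ik = dist.pop(_key(i, k))
            d_jk = dist.pop(_key(j, k))
            dist[_key(k, next_id)] = (size_i * d_ik + size_j * d_jk) / (size_i + size_j)
        clusters[next_id] = (node, size_i + size_j, height)
        next_id += 1
    return next(iter(clusters.values()))[0] + ";"


names = ["A", "B", "C", "D"]
matrix = [  # hypothetical distances
    [0, 2, 6, 10],
    [2, 0, 6, 10],
    [6, 6, 0, 10],
    [10, 10, 10, 0],
]
print(upgma(names, matrix))  # (D:5,(C:3,(A:1,B:1):2):2);
```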

33 Phylip: Phylogeny Inference Package. Main programs: Dnaml, proml: maximum likelihood; Dnapenny, protpars: parsimony method; Fitch, neighbor: distance methods; Drawgram, drawtree: tree drawing.

34 Other programs: PAUP: generates *.tre files; TreeView: views *.tre files; BioEdit: performs most of these tasks in a GUI environment (fastdnaml is useful).

35 Genome analysis: genome sequencing; transcriptome sequencing (EST); microsatellites, SNPs, genotyping.

36 EST: Expressed Sequence Tag

37 Eukaryotic gene structure

38 Genome annotation: repeat identification: RepeatMasker; gene prediction: GenScan, FGENESH; other regions: tRNAScan-SE, CpG islands; regulatory regions: TESS; BLAST (dbEST, other genomes, known genes).
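
CpG islands, one of the "other regions" above, are commonly screened with a sliding window using the Gardiner-Garden and Frommer criteria. A window qualifies if it is at least 200 bp long, its GC content is above 50%, and its observed/expected CpG ratio is above 0.6. The ratio is computed as (CpG count * window length) / (C count * G count). The sketch below flags windows that meet these thresholds. Unlike real annotation tools, it does not merge overlapping hits into islands, and the test sequences are hypothetical.

```python
from typing import List, Tuple


def cpg_stats(window: str) -> Tuple[float, float]:
    """Return (GC fraction, observed/expected CpG ratio) for one window."""
    w = window.upper()
    c, g = w.count("C"), w.count("G")
    cpg = w.count("CG")  # "CG" occurrences cannot overlap, so str.count is exact
    gc_fraction = (c + g) / len(w)
    obs_exp = (cpg * len(w)) / (c * g) if c and g else 0.0
    return gc_fraction, obs_exp


def cpg_windows(seq: str, window: int = 200, step: int = 1,
                min_gc: float = 0.5, min_obs_exp: float = 0.6) -> List[Tuple[int, int]]:
    """0-based, end-exclusive windows that pass both thresholds (not merged into islands)."""
    hits = []
    for start in range(0, len(seq) - window + 1, step):
        gc_fraction, obs_exp = cpg_stats(seq[start:start + window])
        if gc_fraction > min_gc and obs_exp > min_obs_exp:
            hits.append((start, start + window))
    return hits


print(cpg_windows("CG" * 100))  # [(0, 200)]
print(cpg_windows("AT" * 150))  # []
```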

39 Gene modeling
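
A gene model is essentially a list of exon coordinates. The sketch below splices exons into a coding sequence and translates it with the standard genetic code. It assumes 1-based inclusive coordinates on the forward strand (as in GFF) and a coding sequence that starts in frame. The toy genome is hypothetical.

```python
from typing import List, Tuple

BASES = "TCAG"
# Standard genetic code in TCAG order (first, second, third codon position).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def splice(genome: str, exons: List[Tuple[int, int]], strand: str = "+") -> str:
    """Join exon sequences given as 1-based inclusive (start, end) forward-strand coordinates."""
    cds = "".join(genome[start - 1:end] for start, end in sorted(exons))
    if strand == "-":
        cds = cds.translate(COMPLEMENT)[::-1]  # minus-strand gene: reverse complement
    return cds.upper()


def translate(cds: str) -> str:
    """Translate codon by codon until the first stop codon; unknown codons become 'X'."""
    protein = []
    for i in range(0, len(cds) - 2, 3):
        amino_acid = CODON_TABLE.get(cds[i:i + 3], "X")
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


# Hypothetical toy genome: exon 1, a 12-nt intron (GT...AG), exon 2, flanking sequence.
genome = "ATGGCC" + "GTAAGTCCCCAG" + "AAATAA" + "CCC"
cds = splice(genome, [(1, 6), (19, 24)])
print(cds, translate(cds))  # ATGGCCAAATAA MAK
```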

40 Genome browsers: Ensembl; UCSC Genome Browser; AceDB; Apollo; GAVI

41 Apollo: genome browser and annotation tool. Input data: XML (GAME, Chado); Ensembl (GFF, direct MySQL connection); GenBank, EMBL; analysis results (BLAST, sim4, blat, FgenesH, Genscan, tRNAScan-SE). http://www.fruitfly.org/annot/apollo/
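
Many of the analysis results listed above can be exchanged as GFF. The sketch below reads GFF3, whose feature lines have nine tab-separated columns: seqid, source, type, start, end, score, strand, phase, and attributes. The attributes column holds `key=value` pairs separated by ';'. The parser assumes GFF3 rather than the older GFF2/GTF attribute syntax, does not unescape percent-encoded values, and stops at an embedded ##FASTA section. The example lines are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Iterator, Optional


@dataclass
class Feature:
    seqid: str
    source: str
    type: str
    start: int   # 1-based, inclusive
    end: int     # 1-based, inclusive
    score: Optional[float]
    strand: str
    phase: Optional[int]
    attributes: Dict[str, str]


def parse_gff3(lines: Iterable[str]) -> Iterator[Feature]:
    """Yield one Feature per GFF3 feature line."""
    for raw in lines:
        line = raw.rstrip("\n")
        if line.startswith("##FASTA"):
            break  # sequence section: no more feature lines
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, directives and comments
        cols = line.split("\t")
        if len(cols) != 9:
            raise ValueError(f"expected 9 tab-separated columns, got {len(cols)}")
        seqid, source, ftype, start, end, score, strand, phase, attrs = cols
        attributes: Dict[str, str] = {}
        for item in attrs.split(";"):
            if "=" in item:
                key, value = item.split("=", 1)
                attributes[key.strip()] = value.strip()
        yield Feature(
            seqid, source, ftype, int(start), int(end),
            None if score == "." else float(score),
            strand,
            None if phase == "." else int(phase),
            attributes,
        )


gff_lines = [  # hypothetical Genscan-style annotation
    "##gff-version 3",
    "chr1\tGenscan\tgene\t1000\t2500\t.\t+\t.\tID=gene1;Name=predicted_1",
    "chr1\tGenscan\texon\t1000\t1200\t0.95\t+\t0\tParent=gene1",
]
for feature in parse_gff3(gff_lines):
    print(feature.type, feature.start, feature.end, feature.attributes)
```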

42 GAVI (Genome Ajax Viewer): Insilicogen's web service. Add your own features manually; zoom in/out, move left/right; import analysis results (Genscan, RepeatMasker).

43 Hands-on practice: pairwise alignment: bl2seq; BLAST searching against your own data: blastall; multiple alignment of a protein of interest: ClustalW; phylogenetic tree drawing: Phylip; genome annotation: Apollo, GAVI.

