Application of Bioinformatics in Genetic Research Instructors: Dr. Henry Baker Dr. Luciano Brocchieri Dr. Michele Tennant Dr. Lei Zhou


1 Application of Bioinformatics in Genetic Research Instructors: Dr. Henry Baker Dr. Luciano Brocchieri Dr. Michele Tennant Dr. Lei Zhou http://159.178.28.30/GMS6014/home.htm

2 Application of Bioinformatics in Genetic Research Time and location: Monday: 12:00-12:50 in CGRC-291. Wednesday: 12:00-12:50 or 11:40-12:30 in CGRC-291. Fridays (11/18, 12/2): 12:00-12:50 in CGRC-391 or 11:40-12:30 in CGRC-291.

3 Evaluation 50% classroom participation 50% homework

4 History of bioinformatics – sequence analysis Sequence comparison Similarity search Phylogenetic analysis Structure prediction Gene prediction

5 Bioinformatics in the post-genome era Information Representation - many new types of data, such as Function, Location, Interaction, Regulatory pathway, Expression profile, etc., need to be recorded. Data Management - Infrastructure for inputting, managing, accessing, and retrieving relevant information in a “sea of databases”. Cloud computing. Systematics. The opportunity provided by genome sequence and genomic / proteomic technology is matched by the challenge to bioinformatics / computational biology.

6 Bioinformatics in the post-genome era SNP and genome-wide association studies. Genomic expression profiling (RNA and protein levels). Comparative genomics, Epigenomics … Individual genomes, epigenomes, transcriptomes. Regulatory pathway simulation – systems biology. $1,000 genome and … $500,000 analysis?

7 Objectives of GMS6014 Basic skills for retrieving and storing data, using web-based applications. Ability to install and run standalone local applications. Understanding the basis of bioinformatics applications, using sequence similarity search as the example. A brief survey of available bioinformatics tools and an introduction to functional genomics and systems biology.

8 Sequence Representation - nucleotide N G R C W T G Y C Y A G A C A T G C C C C G T T T G T For complete list, see table 2.1, Mount 2nd Ed., or http://www.ncbi.nlm.nih.gov/blast/fasta.shtml
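
The ambiguity letters on this slide (N, R, W, Y) belong to the standard IUPAC nucleotide code. As a minimal sketch, the Python below turns an ambiguous pattern into a regular expression and tests whether a concrete sequence is compatible with it. The function names `to_regex` and `matches` are made up for this illustration, and pairing the first ten letters of the slide with the next ten is an assumption about how the slide was laid out.

```python
import re

# Standard IUPAC nucleotide codes: each letter stands for a set of bases.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

def to_regex(pattern):
    """Translate an IUPAC pattern into a regular expression string."""
    parts = []
    for code in pattern.upper():
        bases = IUPAC[code]
        parts.append(bases if len(bases) == 1 else "[" + bases + "]")
    return "".join(parts)

def matches(pattern, sequence):
    """Return True if the whole sequence is compatible with the pattern."""
    return re.fullmatch(to_regex(pattern), sequence.upper()) is not None

# Assumed pairing of the slide's letters: ambiguous pattern, then an instance.
print(to_regex("NGRCWTGYCY"))
print(matches("NGRCWTGYCY", "AGACATGCCC"))
```

A regular expression is used here only because it is the shortest way to express "one of these bases at this position"; a set lookup per position would work equally well.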

9 Sequence Representation - amino acids Q: What’s the common property of these amino acids? 1. D, E 2. I, L, V, M, F 3. A, S, P

10 Sequence Representation - amino acids Example: Coloring based on aa property. WDLLAQILCYALRIY WRFLATVVLETLRQY WKFLAITMCKVLKQF RCLLCNKLYYLLRKV LNRLLAELYEVLCHI LRLLQQQQMVLQRQY
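
Since the slide's colors do not survive in text, the sketch below shows the same idea with symbols instead of colors: each residue is replaced by a one-character label for its property class. The grouping used here is one common textbook convention chosen for illustration; it is an assumption and not necessarily the scheme used on the slide, and the name `property_string` is hypothetical.

```python
# One possible residue grouping (an assumption; several schemes exist).
GROUPS = {
    "-": "DE",        # acidic
    "+": "KRH",       # basic
    "p": "STNQ",      # polar, uncharged
    "h": "AVLIMFWY",  # hydrophobic / aromatic
    "s": "CGP",       # special cases
}

# Invert the table so each residue maps to its class symbol.
SYMBOL = {aa: sym for sym, residues in GROUPS.items() for aa in residues}

def property_string(seq):
    """Replace each residue with its class symbol ('?' if unknown)."""
    return "".join(SYMBOL.get(aa, "?") for aa in seq.upper())

# The six sequences shown on the slide.
alignment = [
    "WDLLAQILCYALRIY", "WRFLATVVLETLRQY", "WKFLAITMCKVLKQF",
    "RCLLCNKLYYLLRKV", "LNRLLAELYEVLCHI", "LRLLQQQQMVLQRQY",
]
for seq in alignment:
    print(seq, property_string(seq))
```

Printing the symbol rows under each other makes columns with a conserved property easy to spot, which is the purpose of property-based coloring in an alignment viewer.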

11 Representation of sequence – sequence file format 1.) FASTA – simple and clean > gene_name, (other info) MASASASKJHKLJLKJLDSDFSF SSDSASFSFD… Practice / DIY: retrieve a sequence in FASTA format and save the file on the local computer.
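
To make the FASTA layout concrete, here is a minimal sketch of a reader: a line starting with ">" opens a record, and the lines that follow are joined into its sequence. The function name `parse_fasta` and the two demo records are invented for this example; established libraries offer more robust parsers.

```python
def parse_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue                      # skip blank lines
        if line.startswith(">"):
            if header is not None:        # close the previous record
                yield header, "".join(chunks)
            header, chunks = line[1:].strip(), []
        elif header is not None:
            chunks.append(line)           # sequence may span several lines
    if header is not None:                # close the last record
        yield header, "".join(chunks)

# Hypothetical two-record file content, for illustration only.
demo = """>gene_a demo protein
MASASASK
LDSDFSF
>gene_b
ACGTACGT
"""
for name, seq in parse_fasta(demo.splitlines()):
    print(name, len(seq), seq)
```

For a file saved during the practice exercise, the same function can be called as `parse_fasta(open(path))`, where `path` is wherever the .txt file was stored.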

12 How to store sequence files The .txt format is clean and allows downstream sequence analysis. .doc or .rtf allows formatting during annotation – however, extra information is inserted, so these formats are NOT suitable for computational analysis.

13 Practice – file types Use Windows Explorer (on your own computer) or IE with “C:\” in the address window. Change “Tools > Folder Options” so that the file extensions (.xxx) are revealed. Edit the downloaded sequence file in MS Word, highlight a section of the sequence with bold font or color, and save as .doc. Open the .doc file in NotePad – observe the inserted characters.

14 Practice – file types (Cont.) Load the .doc file into Webcutter using “Browse” and then “Upload sequence file”. - Notice that the “sequence” in the sequence box consists of nonsense characters. Clear input; Browse and then load the .txt file. Run an analysis. Always keep your sequences in .txt files for downstream analysis.
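
The same check can be done before uploading anything. The sketch below reads raw bytes and reports whether they look like plain-text sequence data: the content must decode as ASCII, and every non-header line may contain only letters, "*" or "-". The function name and the allowed-character set are assumptions made for this illustration. The eight bytes used in the demo are the signature that opens legacy binary .doc files.

```python
ALLOWED = set("ABCDEFGHIJKLMNOPQRSTUVWXYZ*-")

def is_clean_sequence_text(data):
    """Return True if the bytes look like a plain-text sequence file."""
    try:
        text = data.decode("ascii")
    except UnicodeDecodeError:
        return False                      # binary content, e.g. a .doc file
    seen_sequence = False
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(">"):
            continue                      # blank lines and FASTA headers
        if not set(line.upper()) <= ALLOWED:
            return False                  # formatting or control characters
        seen_sequence = True
    return seen_sequence

plain = b">gene_name\nACGTNNACGT\n"
word_like = b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1" + b"ACGTNNACGT"
print(is_clean_sequence_text(plain))
print(is_clean_sequence_text(word_like))
```

For a real file, pass `open(path, "rb").read()`, with `path` pointing at the saved file.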

15 Representation of sequence The need to include annotations and functional information with each sequence. Structured data entry: GenBank, EMBL / SwissProt. Observe: The differences in data structure between SwissProt, NCBI Protein, and NCBI Gene.

16 Representation of sequence The need to represent associated info with sequence Structured data entry Specialized databases - 3-D structure - Mutation / Diseases - Protein family / Protein domain - Interaction - Pathway - ….

17 Representation of sequence The need to represent associated info with sequence Structured data entry Specialized databases Complex / customized data structure - Object-oriented data representation (Mount, pp. 44-45)

18 XML – Extensible Markup Language Defines highly structured data for sharing and exchange. Observe: 1.) The differences between the XML format and the GenPept format. 2.) The differences among XML, TinySeqXML, and INSDXML.
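
What "highly structured" buys in practice is that a program can pull out named fields without guessing at layout. The sketch below parses a small hand-written record with Python's standard `xml.etree.ElementTree` module. The element names are modeled on the TinySeq style and the record content is made up; treat both as assumptions and check them against a record actually downloaded from NCBI.

```python
import xml.etree.ElementTree as ET

# Hand-written record; element names and values are illustrative assumptions.
EXAMPLE = """<TSeqSet>
  <TSeq>
    <TSeq_seqtype value="protein"/>
    <TSeq_defline>hypothetical example protein</TSeq_defline>
    <TSeq_length>8</TSeq_length>
    <TSeq_sequence>MASASASK</TSeq_sequence>
  </TSeq>
</TSeqSet>"""

def read_records(xml_text):
    """Return one dict per sequence element found in the XML text."""
    root = ET.fromstring(xml_text)
    records = []
    for node in root.iter("TSeq"):
        records.append({
            "type": node.find("TSeq_seqtype").get("value"),
            "defline": node.findtext("TSeq_defline"),
            "length": int(node.findtext("TSeq_length")),
            "sequence": node.findtext("TSeq_sequence"),
        })
    return records

for record in read_records(EXAMPLE):
    print(record)
```

Compare this with a flat-file format such as GenPept, where the same fields have to be located by keyword and column position.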

19

20 Bioinformatics / Computational biology Bioinformatics - Research, development, or application of computational tools and approaches for expanding the use of biological, medical, behavioral or health data, including those to acquire, store, organize, archive, analyze, or visualize such data. Computational Biology - The development and application of data-analytical and theoretical methods, mathematical modeling and computational simulation techniques to the study of biological, behavioral, and social systems. (Working Definition of Bioinformatics and Computational Biology - July 17, 2000). NIH / BISTI

21 Genetic code Codon usage Special code – mitochondrial genes
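
As an illustration of why the special code matters, the sketch below translates the same DNA with the standard genetic code and with the vertebrate mitochondrial code, which differs at four codons (TGA, ATA, AGA, AGG). The slide does not say which mitochondrial code it has in mind, so the choice of the vertebrate table is an assumption, as are the function name `translate` and the demo sequence.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
STANDARD = {"".join(codon): aa
            for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# Vertebrate mitochondrial code: four codons are read differently.
VERTEBRATE_MITO = {**STANDARD, "TGA": "W", "ATA": "M", "AGA": "*", "AGG": "*"}

def translate(dna, table=STANDARD):
    """Translate DNA codon by codon; '*' marks a stop codon."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3      # ignore a trailing partial codon
    return "".join(table[dna[i:i + 3]] for i in range(0, usable, 3))

demo = "ATGTGAATAAGA"                     # made-up sequence for illustration
print(translate(demo))                    # standard code
print(translate(demo, VERTEBRATE_MITO))   # vertebrate mitochondrial code
```

With the standard table the demo reads M, stop, I, R; with the mitochondrial table it reads M, W, M, stop. Codons containing ambiguity letters raise a `KeyError` in this sketch; a fuller version would map them to "X".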
