CS5263 Bioinformatics Lecture 1: Introduction Outline Administravia What is bioinformatics Why bioinformatics Topics in bioinformatics What you will.

2 CS5263 Bioinformatics Lecture 1: Introduction

3 Outline Administrivia What is bioinformatics Why bioinformatics Topics in bioinformatics What you will & will not learn Introduction to molecular biology

4 Student info Your name Email Enrollment status Academic background Interests

5 Course Info Instructor: Jianhua Ruan Office: S.B. 4.01.48 Phone: 458-6819 Email: jruan@cs.utsa.edu Office hours: Tues 6:30-7:30, Wed 3-4pm Web: http://www.cs.utsa.edu/~jruan/teaching/cs5263_fall_2007/

6 Course description A survey of algorithms and methods in bioinformatics, approached from a computational viewpoint. Discussions balanced between algorithmic analyses and biological applications. Prerequisite: –Knowledge of algorithms and data structures –Programming experience –Basic understanding of statistics and probability –Appetite to learn some biology

7 Textbooks Required: –An Introduction to Bioinformatics Algorithms by Jones and Pevzner Recommended: –Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids by Durbin, Eddy, Krogh and Mitchison Additional resources –See course website

8 Grading Attendance: 10% –At most 2 classes missed without affecting grade Homeworks: 50% –No late submission accepted –Read the collaboration policy! Final project and presentation: 40%

9 What is bioinformatics National Institutes of Health (NIH): –Research, development, or application of computational tools and approaches for expanding the use of biological, medical, behavioral or health data, including those to acquire, store, organize, archive, analyze, or visualize such data.

10 What is bioinformatics National Center for Biotechnology Information (NCBI): –the field of science in which biology, computer science, and information technology merge to form a single discipline. The ultimate goal of the field is to enable the discovery of new biological insights as well as to create a global perspective from which unifying principles in biology can be discerned.

11 What is bioinformatics Wikipedia –Bioinformatics refers to the creation and advancement of algorithms, computational and statistical techniques, and theory to solve formal and practical problems posed by or inspired from the management and analysis of biological data.

12 Why bioinformatics Modern biology generates huge amounts of data –Human genome sequence has 3 billion bases Complex relationships among different types of data –Challenges to integrate and analyze data Algorithmic challenges –Biologists trained in programming are probably not sufficient Tremendous needs in both academia and industry –Job opportunities You get the chance to learn something different

13 Some examples of central role of CS in bioinformatics

14 1. Genome sequencing AGTAGCACAGA CTACGACGAGA CGATCGTGCGA GCGACGGCGTA GTGTGCTGTAC TGTCGTGTGTG TGTACTCTCCT 3x10^9 nucleotides ~500 nucleotides

15 1. Genome sequencing AGTAGCACAGA CTACGACGAGA CGATCGTGCGA GCGACGGCGTA GTGTGCTGTAC TGTCGTGTGTG TGTACTCTCCT 3x10^9 nucleotides A big puzzle: ~60 million pieces Computational Fragment Assembly Introduced ~1980 1995: assemble up to 1,000,000 long DNA pieces 2000: assemble whole human genome
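The "big puzzle" of fragment assembly can be illustrated with a toy greedy merge: repeatedly join the two fragments that share the longest suffix-prefix overlap. This is only a minimal sketch under strong assumptions (error-free reads, one strand, no repeats); the function names and the `min_len` threshold are hypothetical, and real assemblers use far more elaborate methods.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of a that equals a prefix of b.

    Returns 0 if the longest such overlap is shorter than min_len.
    """
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(fragments: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the pair of fragments with the largest overlap."""
    frags = list(dict.fromkeys(fragments))  # drop exact duplicates, keep order
    while len(frags) > 1:
        best_k, best_i, best_j = 0, -1, -1
        for i, a in enumerate(frags):
            for j, b in enumerate(frags):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break  # nothing overlaps any more; return the contigs found so far
        merged = frags[best_i] + frags[best_j][best_k:]
        frags = [f for idx, f in enumerate(frags) if idx not in (best_i, best_j)]
        frags.append(merged)
    return frags


if __name__ == "__main__":
    genome = "AGTAGCACAGACTACGACGAGA"  # first two rows of the slide's sequence
    reads = [genome[0:10], genome[6:16], genome[12:22]]
    print(greedy_assemble(reads))
```

The pairwise search is quadratic in the number of fragments per merge, which is fine for a handful of reads but hints at why assembling tens of millions of pieces is an algorithmic challenge.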

16 Where are the genes? 2. Gene Finding In humans: ~22,000 genes ~1.5% of human DNA

17 2. Gene Finding 5’ Start codon ATG Exon 1 Intron 1 Exon 2 Intron 2 Exon 3 Stop codon TAG/TGA/TAA 3’ Splice sites Hidden Markov Models (Well studied for many years in speech recognition)
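The start and stop codons named on this slide are enough for a very crude gene-finding baseline: scan each reading frame for a stretch that begins with ATG and ends at the first in-frame TAG, TGA, or TAA. This is not the Hidden Markov Model approach the slide refers to, and it ignores introns and splice sites entirely; it is only a sketch, and `find_orfs` and `min_codons` are hypothetical names.

```python
STOP_CODONS = {"TAG", "TGA", "TAA"}


def find_orfs(dna: str, min_codons: int = 2) -> list[tuple[int, int]]:
    """Return (start, end) index pairs of ATG...stop stretches on one strand.

    end is exclusive and includes the stop codon, so dna[start:end] is the
    whole open reading frame. Only the given strand is scanned.
    """
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first start codon opens the frame
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None  # look for the next start codon
    return orfs


if __name__ == "__main__":
    dna = "CCATGAAATGGTAGCC"
    for start, end in find_orfs(dna):
        print(start, end, dna[start:end])
```

To cover both strands, the same scan could be run on the reverse complement of the sequence.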

18 3. Protein Folding The amino-acid sequence of a protein determines the 3D fold The 3D fold of a protein determines its function Can we predict the 3D fold of a protein given its amino-acid sequence? –Holy grail of compbio: a 40-year-old problem –Molecular dynamics, computational geometry, machine learning, robotics

19 4. Sequence Comparison: Alignment
Unaligned sequences:
AGGCTATCACCTGACCTCCAGGCCGATGCCC
TAGCTATCACGACCGCGGTCGATTTGCCCGAC
Aligned (| marks a match, x a mismatch, - a gap):
-AGGCTATCACCTGACCTCCAGGCCGA--TGCCC---
 || |||||||  ||||x|  || |||  |||||
TAG-CTATCAC--GACCGC--GGTCGATTTGCCCGAC
Sequence Alignment Introduced ~1970 BLAST: 1990, one of the most cited papers in history Still very active area of research (figure: query, BLAST, DB) Efficient string matching algorithms Fast database index techniques
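Alignments like the one on this slide are classically computed by dynamic programming. The sketch below scores a global alignment in the Needleman-Wunsch style; the scoring values (match +1, mismatch -1, gap -1) and the function name are assumptions chosen for illustration, and BLAST itself relies on faster heuristics rather than this full table.

```python
def global_alignment_score(a: str, b: str,
                           match: int = 1,
                           mismatch: int = -1,
                           gap: int = -1) -> int:
    """Best global alignment score of a and b under a linear gap penalty.

    Only two rows of the DP table are kept, so memory is O(len(b)).
    """
    # Row 0: aligning the empty prefix of a against prefixes of b (all gaps).
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # column 0: prefix of a against the empty string
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap        # a[i-1] aligned to a gap
            left = curr[j - 1] + gap  # b[j-1] aligned to a gap
            curr.append(max(diag, up, left))
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    print(global_alignment_score("ACGT", "ACGT"))
    print(global_alignment_score("ACGT", "AGT"))
```

Running time is proportional to the product of the two lengths, which is why database search tools add indexing and heuristics on top of this idea.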

20 Sequence comparison is key to Finding genes Determining function Uncovering the evolutionary processes Sequence conservation implies function

21 5. Evolution More than 200 complete genomes have been sequenced

22 5. Evolution

23 6. Microarray analysis Clinical prediction of Leukemia type 2 types –Acute lymphoid (ALL) –Acute myeloid (AML) Different treatment & outcomes Predict type before treatment? Bone marrow samples: ALL vs AML Measure amount of each gene

24 Some goals of biology for the next 50 years List all molecular parts that build an organism –Genes, proteins, other functional parts Understand the function of each part Understand how parts interact Study how function has evolved across all species Find genetic defects that cause diseases Design drugs rationally Sequence the genome of every human, use it for personalized medicine Bioinformatics is an essential component for all the goals above

25 Major conferences ISMB (Summer every year) RECOMB (and its satellites) (Spring every year) PSB (Jan every year, Hawaii) ECCB (Europe) CSB (July every year, Stanford) Conferences in computer science –ICDM (conference on data mining) –ICML (conference on machine learning) –AAAI (conference on AI)

26 Major journals Bioinformatics Journal of Computational Biology PLoS Computational Biology BMC Bioinformatics Genome Biology Genome Research Nucleic Acids Research IEEE Trans on Computational Biology Science, Nature, PNAS, Cell, Nature Genetics, Nature Biotech, …

27 Major Bioinfo research topics

28 Covered topics Sequence analysis –Alignment –Motif finding –Pattern matching –Phylogenetic tree Sequence-based predictions –Gene components –RNA structure Functional Genomics –Microarray analysis –Biological networks

29 What you will learn? Basic concepts in molecular biology and genetics Selected topics in bioinformatics and challenges Algorithms: –DP, graph, string algorithms –Statistical learning algorithms: HMM, EM, Gibbs sampling –Data mining: clustering / classification

30 What you will not learn? Existing tools / databases Design / perform biological experiments Protein structure prediction (commonly avoided by most bioinfo researchers…) Building bioinformatics software tools (GUI, database, Perl / Python, …)

31 Goals Basis of sequence analysis and other computational biology algorithms Overall picture about the field Read / criticize research articles Think about the sub-field that best suits your background to explore Communicate and exchange ideas with (computational) biologists

32 Computer Scientists vs Biologists (courtesy Serafim Batzoglou, Stanford)

33 Biologists vs computer scientists (almost) Everything is true or false in computer science (almost) Nothing is ever true or false in Biology

34 Biologists vs computer scientists Biologists seek to understand the complicated, messy natural world Computer scientists strive to build their own clean and organized virtual world

35 Biologists vs computer scientists Computer scientists are obsessed with being the first to invent or prove something Biologists are obsessed with being the first to discover something

36 Biologists vs computer scientists Biologists are comfortable with the idea that all data have errors, and every rule has exceptions Computer scientists are not

37 Biologists vs computer scientists Computer scientists get high-paid jobs after graduation Biologists typically have to complete one or more 5-year post-docs...

38 Molecular biology 101 Cell DNA, RNA, Protein Genome, chromosome, gene Central dogma

39 Life Categories –Prokaryotes (e.g. bacteria) Unicellular No nucleus –Eukaryotes (e.g. fungi, plants, animals) Unicellular or multicellular Has nucleus The most important distinction among groups of organisms

40 Prokaryote vs Eukaryote Eukaryotes have many membrane-bounded compartments inside the cell –Different biological processes occur at different cellular locations

41 Chemical contents of cell Small molecules –Sugar –Ions (Na+, K+, Ca2+, Cl-, …) –… Macromolecules (polymers): –DNA –RNA –Protein –… Polymers: “strings” made by linking monomers from a specified set (alphabet)

42 Polymer / Monomer: DNA / Deoxyribonucleotides; RNA / Ribonucleotides; Protein / Amino acids

43 DNA DNA: forms the genetic material of all living organisms –Can be replicated and passed to descendants –Contains information to produce proteins To computer scientists, DNA is a string made from alphabet {A, C, G, T} –e.g. ACAGAACGTAGTGCCGTGAGCG Each letter is called a base –A deoxyribonucleotide Length varies. From hundreds to billions

44 RNA Historically thought to be information carrier only –DNA => RNA => Protein –New roles have since been found for RNA To computer scientists, RNA is a string made from alphabet {A, C, G, U} –e.g. ACAGAACGUAGUGCCGUGAGCG Each letter is called a base –A ribonucleotide Length varies. From tens to thousands

45 Protein Protein: the actual “worker” for almost all processes in the cell –Enzymes: speed up reactions –Signaling: information transduction –Structural support –Production of other macromolecules –Transport To computer scientists, protein is a string built from 20 letters –E.g. MGDVEKGKKIFIMKCSQCHTVEKGGKHKTGP Each letter is called an amino acid Lengths: from tens to thousands

46 Central dogma of molecular biology
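Treating DNA, RNA, and protein as strings, the flow DNA => RNA => Protein can be sketched in a few lines. The sketch assumes the input is the coding strand, that translation starts at the first base and stops at the first stop codon, and that the standard genetic code applies; `transcribe` and `translate` are hypothetical helper names, not from the lecture.

```python
# Standard genetic code, laid out with bases in the order T, C, A, G
# for each of the three codon positions ("*" marks a stop codon).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def transcribe(dna: str) -> str:
    """Simplified transcription: copy the coding strand, writing U for T."""
    return dna.replace("T", "U")


def translate(rna: str) -> str:
    """Read codons from position 0 until a stop codon or the end of the string."""
    protein = []
    for i in range(0, len(rna) - 2, 3):
        codon = rna[i:i + 3].replace("U", "T")  # table is keyed by DNA letters
        amino_acid = CODON_TABLE[codon]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


if __name__ == "__main__":
    rna = transcribe("ATGAAATGGTAG")
    print(rna, translate(rna))
```

Each output letter is one of the 20 amino acids, which is the "string built from 20 letters" view of a protein.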

47 DNA/RNA zoom-in Commonly referred to as Nucleic Acid DNA: Deoxyribonucleic acid RNA: Ribonucleic acid Found mainly in the nucleus of a cell (hence “nucleic”) Contain phosphoric acid as a component (hence “acid”) They are made up of nucleotides

48 Nucleotides A nucleotide has 3 components –Sugar (ribose in RNA, deoxyribose in DNA) –Phosphoric acid –Nitrogen base Adenine (A) Guanine (G) Cytosine (C) Thymine (T) or Uracil (U)

49 Monomers of RNA A ribonucleotide has 3 components –Sugar - Ribose –Phosphate group –Nitrogen base Adenine (A) Guanine (G) Cytosine (C) Uracil (U)

50 Monomers of DNA A deoxyribonucleotide has 3 components –Sugar - Deoxyribose –Phosphoric acid –Nitrogen base Adenine (A) Guanine (G) Cytosine (C) Thymine (T)

52 Polymerization: Nucleotides => nucleic acids Phosphate Sugar Nitrogen Base Phosphate Sugar Nitrogen Base Phosphate Sugar Nitrogen Base

53 DNA (figure labels: Phosphate, Sugar, Base, positions 1 to 5) 5’-AGCGACTG-3’, or simply AGCGACTG Many biological processes go from 5’ to 3’, e.g. DNA replication, transcription, etc.

54 RNA (figure labels: Phosphate, Sugar, Base, positions 1 to 5) 5’-AGUGACUG-3’, or simply AGUGACUG Many biological processes go from 5’ to 3’, e.g. transcription.

55 Base-pair: A = T, G = C Forward (+) strand: 5’-AGCGACTG-3’ Backward (-) strand: 3’-TCGCTGAC-5’ One strand is said to be reverse-complementary to the other

56 Reverse-complementary sequences 5’-ACGTTACAGTA-3’ The reverse complement is: 3’-TGCAATGTCAT-5’ => 5’-TACTGTAACGT-3’ Or simply written as TACTGTAACGT
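The reverse complement is a two-step string operation: complement each base (A with T, C with G), then reverse the result so it reads 5’ to 3’. A minimal sketch, assuming an uppercase sequence over {A, C, G, T}; the function name is hypothetical.

```python
# Translation table mapping each base to its pairing partner.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Complement every base, then reverse so the result reads 5' to 3'."""
    return dna.translate(COMPLEMENT)[::-1]


if __name__ == "__main__":
    print(reverse_complement("ACGTTACAGTA"))  # the slide's example sequence
```

Applying the function twice returns the original sequence, which is a quick sanity check for any implementation.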

57 DNA double helix

58 Orientation of the double helix Double helix is anti-parallel –5’ end of each strand at 3’ end of the other –5’ to 3’ motion in one strand is 3’ to 5’ in the other Double helix has no orientation –Biology has no “forward” and “reverse” strand –Relative to any single strand, there is a “reverse complement” or “reverse strand” –Information can be encoded by either strand or both strands 5’TTTTACAGGACCATG 3’ 3’AAAATGTCCTGGTAC 5’

59 RNA Secondary structures RNAs are normally single-stranded Can form complex structure by self-base-pairing A=U, C=G
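Self-base-pairing can be explored with a small dynamic program that counts the largest number of nested A=U and C=G pairs a sequence can form, in the style of the Nussinov algorithm. This is a sketch only: it counts pairs rather than modelling energies, it allows just the two pair types listed on the slide, and `min_loop` (the minimum number of unpaired bases enclosed by a pair) is an assumed parameter.

```python
from functools import lru_cache

PAIRS = {("A", "U"), ("U", "A"), ("C", "G"), ("G", "C")}


def max_base_pairs(rna: str, min_loop: int = 3) -> int:
    """Maximum number of nested base pairs in rna (short sequences only).

    A pair (k, j) is allowed only if at least min_loop bases lie between
    positions k and j. Uses recursion, so very long inputs may hit
    Python's recursion limit.
    """

    @lru_cache(maxsize=None)
    def best(i: int, j: int) -> int:
        if j - i <= min_loop:
            return 0  # too short to hold any pair
        result = best(i, j - 1)  # case 1: base j stays unpaired
        for k in range(i, j - min_loop):  # case 2: base j pairs with base k
            if (rna[k], rna[j]) in PAIRS:
                result = max(result, best(i, k - 1) + 1 + best(k + 1, j - 1))
        return result

    return best(0, len(rna) - 1) if rna else 0


if __name__ == "__main__":
    print(max_base_pairs("GGGAAAUCCC"))  # a small hairpin-like example
```

For the example, the three G bases at the 5’ end can pair with the three C bases at the 3’ end, leaving the middle bases as a loop.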

