Welcome to CS374 Algorithms in Biology

1 Welcome to CS374 Algorithms in Biology

2 Overview
Administrivia
Molecular Biology and Computation
  DNA, proteins, cells, evolution
Some examples of CS in biology
Computer Scientists vs Biologists

3 CS374: Algorithms in Biology cs374.stanford.edu
Attendance
  At most 2 classes missed without affecting grade
Lectures
  Most important requirement
  Select an available topic and a day, send to Serafim
  Read papers, meet with Serafim (1 hr) 1-2 weeks before lecture
  Schedule long (2 hr) meeting the day before lecture
  Slides due at noon before lecture

4 CS374: Algorithms in Biology cs374.stanford.edu
Scribing
  Please sign up on a first-come, first-served basis
  Due 1 week after lecture; edited & distributed 2 weeks after lecture
  Relly will help you edit
Summaries
  Select 1 lecture among first 10, 1 lecture among rest
  Find one relevant paper
  Write a 1-page summary of the paper
    Paper reference
    Abstract
    Discussion
  Ask Relly for questions/feedback
Have fun!

5 Structure of DNA double helix
[Figure: nucleotide components: phosphate group, sugar, nitrogenous base (A, C, G, T); photo captions: Physicist, Ornithologist]

6 DNA to RNA, and genes
DNA: ~3x10^9 nucleotides long in humans; contains ~22,000 genes
RNA (G, A, U, C): carries the “message” for “translating”, or “expressing”, one gene
transcription, translation, folding
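The transcription and translation steps on this slide can be sketched in a few lines of Python. This is a toy illustration, not part of the lecture: the function names are made up, the input is assumed to be the coding strand, and the codon table is deliberately partial (the real genetic code has 64 codons).

```python
# Illustrative sketch only: names and the partial codon table are assumptions.
CODON_TABLE = {
    "AUG": "M",  # methionine, also the start codon
    "GCU": "A",
    "UGG": "W",
    "AAA": "K",
    "UAA": "*", "UAG": "*", "UGA": "*",  # stop codons
}


def transcribe(coding_strand: str) -> str:
    """DNA coding strand -> mRNA: same letters, with T replaced by U."""
    return coding_strand.upper().replace("T", "U")


def translate(mrna: str) -> str:
    """Read codons from the first AUG until a stop codon; unknown codons become '?'."""
    start = mrna.find("AUG")
    if start == -1:
        return ""
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE.get(mrna[i:i + 3], "?")
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


mrna = transcribe("ATGGCTTGGTAA")
print(mrna, translate(mrna))
```

Folding, the third step, has no comparably short sketch; it is the hard problem discussed later in the deck.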

7 Structure of proteins
Composed of a chain of amino acids.
Amino acid: H2N-CH(R)-COOH, where R is one of 20 possible groups
Sequence of amino acids folds to form a complex 3-D structure.
The structure of a protein is intimately connected to its function.

8 All living organisms are composed of cells

9 Genetics in the 20th Century

10 21st Century Technology drives an information revolution
AGTAGCACAGACTACGACGAGACGATCGTGCGAGCGACGGCGTAGTGTGCTGTACTGTCGTGTGTGTGTACTCTCCTCTCTCTAGTCTACGTGCTGTATGCGTTAGTGTCGTCGTCTAGTAGTCGCGATGCTCTGATGTTAGAGGATGCACGATGCTGCTGCTACTAGCGTGCTGCTGCGATGTAGCTGTCGTACGTGTAGTGTGCTGTAAGTCGAGTGTAGCTGGCGATGTATCGTGGT

11 DNA to RNA, and genes
DNA: ~3x10^9 nucleotides long in humans; contains ~22,000 genes
RNA: carries the “message” for “translating”, or “expressing”, one gene
transcription, translation, folding
[Diagram: step 1 marked]

12 Some examples of central role of CS 1. Sequencing
AGTAGCACAGACTACGACGAGACGATCGTGCGAGCGACGGCGTAGTGTGCTGTACTGTCGTGTGTGTGTACTCTCCT
3x10^9 nucleotides
~500 nucleotides

13 Some examples of central role of CS 1. Sequencing
AGTAGCACAGACTACGACGAGACGATCGTGCGAGCGACGGCGTAGTGTGCTGTACTGTCGTGTGTGTGTACTCTCCT
3x10^9 nucleotides
A big puzzle: ~60 million pieces
Computational Fragment Assembly
  Introduced ~1980
  1995: assemble DNA pieces up to 1,000,000 nucleotides long
  2000: assemble whole human genome
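To make the "puzzle" concrete, here is a minimal greedy overlap-merge sketch. It is a toy, not the assembler used for any real genome: the function names and the `min_len` threshold are assumptions, reads are assumed error-free and from one strand, and repeats (the main difficulty in practice) are ignored.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of a that equals a prefix of b (0 if shorter than min_len)."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads, min_len: int = 3):
    """Repeatedly merge the pair of reads with the largest overlap."""
    reads = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while len(reads) > 1:
        best_k, best_i, best_j = 0, None, None
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break  # no remaining overlaps: leave the contigs separate
        merged = reads[best_i] + reads[best_j][best_k:]
        reads = [r for idx, r in enumerate(reads) if idx not in (best_i, best_j)]
        reads.append(merged)
    return reads


# Three overlapping reads cut from the start of the sequence on the slide.
reads = ["ACGACGAGACG", "AGTAGCACAGAC", "CAGACTACGACG"]
print(greedy_assemble(reads))
```

The pairwise search is quadratic in the number of reads, which is fine for three reads and hopeless for ~60 million; real assemblers index the reads instead.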

14 Complete genomes today
More than 300 complete genomes have been sequenced

15 DNA to RNA, and genes
DNA: ~3x10^9 nucleotides long in humans; contains ~22,000 genes
RNA: carries the “message” for “translating”, or “expressing”, one gene
transcription, translation, folding
[Diagram: steps 1 and 2 marked]

16 2. Gene Finding
Where are the genes?
In humans: ~22,000 genes, ~1.5% of human DNA

17 2. Gene Finding
[Figure: gene structure from 5’ to 3’: start codon ATG, Exon 1, Intron 1, Exon 2, Intron 2, Exon 3, stop codon TAG/TGA/TAA; splice sites at the exon-intron boundaries]
Predicting genes means giving coordinates for the exon boundaries. The first kind of information that prediction algorithms use is the regular structure of a gene. Every gene starts with an ATG codon, and then exons alternate with introns; at the exon-intron boundaries (the splice sites) there are short words that are approximately conserved.
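The simplest signal named above, an ATG followed in the same reading frame by one of the stop codons TAG/TGA/TAA, can be scanned for directly. The sketch below is only that first step: `find_orfs` and `min_codons` are hypothetical names, only the forward strand is scanned, and introns and splice sites are ignored entirely, so this is not a gene finder for human DNA.

```python
STOP_CODONS = {"TAG", "TGA", "TAA"}


def find_orfs(dna: str, min_codons: int = 2):
    """Return (start, end) coordinates of ATG...stop stretches in the three forward frames.

    Coordinates are 0-based and half-open, so dna[start:end] includes the stop codon.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        i = frame
        while i + 3 <= len(dna):
            if dna[i:i + 3] == "ATG":
                j = i + 3
                # Walk codon by codon until a stop codon or the end of the sequence.
                while j + 3 <= len(dna) and dna[j:j + 3] not in STOP_CODONS:
                    j += 3
                if j + 3 <= len(dna):  # a stop codon was found in frame
                    if (j - i) // 3 >= min_codons:
                        orfs.append((i, j + 3))
                    i = j + 3
                    continue
            i += 3
    return sorted(orfs)


dna = "CCATGAAATAGGG"
for start, end in find_orfs(dna):
    print(start, end, dna[start:end])
```

Real predictors combine this kind of structural signal with splice-site models and, as in the CS374 topic on the next slide, with comparison across species.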

18 Topic in CS374: Finding genes by comparing genomes of different species
atg caggtg ggtgag cagatg ggtgag cagttg ggtgag caggcc ggtgag tga
atg caggtg ggtgag cagatg ggtgag cagttg ggtgag caggcc ggtgag tga

19 DNA to RNA, and genes
DNA: ~3x10^9 nucleotides long in humans; contains ~22,000 genes
RNA: carries the “message” for “translating”, or “expressing”, one gene
transcription, translation, folding
[Diagram: steps 1, 2, and 3 marked; the label “easy” appears twice]

20 3. Protein Folding
The amino-acid sequence of a protein determines the 3D fold
The 3D fold of a protein determines its function
Can we predict the 3D fold of a protein given its amino-acid sequence?
  Holy grail of compbio: a 35-year-old problem
  Molecular dynamics, robotics, machine learning, computational geometry
Topics on Proteins in CS374
1. Protein Structure
  Finding the α-helix motif
  Protein Domains
  Molecular Dynamics & Drug Targets
2. Protein Classification
  Machine Learning
  Graph Flow techniques
3. Protein Comparison
  Latest multiple alignment tools

21 More than 200 complete genomes have been sequenced

22 Evolution

23 Evolution at the DNA level
[Figure labels: next generation; OK, OK, OK, X, X, Still OK?]

24 4. Sequence Comparison
Sequence conservation implies function
Sequence comparison is key to
  Finding genes
  Determining function
  Uncovering the evolutionary processes

25 Sequence Comparison: Alignment
Input sequences:
AGGCTATCACCTGACCTCCAGGCCGATGCCC
TAGCTATCACGACCGCGGTCGATTTGCCCGAC
Alignment (| match, x mismatch, - gap):
-AGGCTATCACCTGACCTCCAGGCCGA--TGCCC---
 || |||||||  ||||x|  ||x|||  |||||
TAG-CTATCAC--GACCGC--GGTCGATTTGCCCGAC
[Figure labels: query, DB, BLAST]
Sequence Alignment
  Introduced ~1970
  BLAST: 1990, among the most cited papers in history
  Still very active area of research
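An alignment like the one on this slide can be computed by dynamic programming. The sketch below is the classic global alignment recurrence (Needleman-Wunsch), not BLAST, which is a faster heuristic for database search. The scoring values (match +1, mismatch -1, gap -1) and the function name are assumptions chosen for illustration, so the output need not reproduce the slide's alignment exactly.

```python
def global_align(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -1):
    """Return (score, aligned_a, aligned_b) for an optimal global alignment."""
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


s, top, bottom = global_align("AGGCTATCACCTGACCTCCAGGCCGATGCCC",
                              "TAGCTATCACGACCGCGGTCGATTTGCCCGAC")
print(s)
print(top)
print(bottom)
```

The table has (n+1) x (m+1) cells, so time and memory grow with the product of the sequence lengths; that cost is what motivates heuristics such as BLAST when the second sequence is an entire database.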

26 Comparison of Human, Mouse, and Rat

27 More DNA is coming…
Topics on Genomics in CS374
1. Indexing Large Databases
  Newest encoding techniques
2. Genomic Rearrangements
  Finding the order of shuffles between two genomes
3. Repeat Detection
  Identifying selfish sequences that replicate across DNA
4. Gene Finding
  Finding genes by comparing DNA of different mammals
5. Finding conserved elements
  How do we quantify how much evolution “likes” a given region?

28 5. Clustering of Microarrays
Clinical prediction of Leukemia type
  2 types
    Acute lymphoid (ALL)
    Acute myeloid (AML)
  Different treatment & outcomes
  Predict type before treatment?
Bone marrow samples: ALL vs AML
Measure amount of each gene

29 6. Protein networks
Topics on Protein Networks in CS374
1. Integration
  Build networks from multiple sources
2. Alignment
  Compare networks across species
3. Mathematical properties
  Modular, scale free
4. Systems Biology
  The cell as a dynamic system
5. Graph Algorithms
Fresh research area
  Construct networks from multiple data sources
  Navigate networks
  Compare networks across organisms
Statistics, machine learning, graph algorithms, databases

30 Some goals of biology for the next 50 years
List all molecular parts that build an organism
  Genes, proteins, other functional parts
Understand the function of each part
Understand how parts interact
Study how function has evolved across all species
Find genetic defects that cause diseases
Design drugs rationally
Sequence the genome of every human, use it for personalized medicine

31 Computer Scientists vs Biologists

32 Computer scientists vs Biologists
(Almost) nothing is ever true or false in biology.
Everything is true or false in computer science.

33 Computer scientists vs Biologists
Biologists strive to understand the complicated, messy natural world.
Computer scientists seek to build their own clean and organized virtual worlds.

34 Computer scientists vs Biologists
Biologists are obsessed with being the first to discover something.
Computer scientists are obsessed with being the first to invent or prove something.

35 Computer scientists vs Biologists
Biologists are comfortable with the idea that all data have errors.
Computer scientists are not.

36 Computer scientists vs Biologists
Computer scientists get high-paid jobs after graduation.
Biologists typically have to complete one or more 5-year post-docs...

37 Computer Science is to Biology what Mathematics is to Physics
“Antedisciplinary” Science
What is computational biology?

