STAT115 STAT215 BIO512 BIST298 Introduction to Computational Biology and Bioinformatics Spring 2016 Xiaole Shirley Liu.


1 STAT115 STAT215 BIO512 BIST298 Introduction to Computational Biology and Bioinformatics Spring 2016 Xiaole Shirley Liu

2 The Protein Sequence and Structure Wave
1955: Sanger sequenced bovine insulin
1970: Smith-Waterman algorithm
1973: PDB
1990: BLAST
1994: BLOCKS database
1994-: CASP
1997-: Proteomics

3 The Microarray Wave
A microarray contains hundreds to millions of tiny probes
The probes simultaneously detect how much each gene is expressed

4 ALL vs AML (Golub et al., Science 1999)

5 ALL vs AML

6 “Microarrays” Today
Infer the expression value of all the genes from 1000 probes
High-throughput drug screen

7 The DNA Sequencing Wave
1953: DNA structure
1972: Recombinant DNA
1977: Sanger sequencing
1985: PCR
1988: NCBI
1990: BLAST

8 Sequencing in the 1970s

9 The Human Genome Race
Human Genome Project: 1990-2003
–Originally 1990-2005
–Boosted by technology improvement and automation
–Competition from Celera

10 Human Genome Sequencing
Clone-by-clone and whole-genome shotgun

11 The Human Genome Race
Human Genome Project: 1990-2003
–Originally 1990-2005
–Boosted by technology improvement and automation
–Competition from Celera
Informatics essential for both the public and private sequencing efforts
–Sequence assembly and gene prediction
–Working draft finished simultaneously in spring 2000

12 Sequencing in 2001

13 Sequencing in 2007

14 Sequencing Today
Personal genome sequencing
HiSeq X
–900GB data / flow cell in < 3 days, 10 * 30X human genomes, at ~$1500 / sample
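The throughput figure on this slide can be sanity-checked with a short calculation. This is a rough sketch, not part of the original slides: it assumes "900GB" means about 900 billion sequenced bases and that the human genome is about 3.1 billion bases long. The function and constant names are made up for illustration.

```python
def genomes_per_flow_cell(yield_bases, genome_size_bases, coverage):
    """Number of genomes that can be sequenced at the given average coverage."""
    return yield_bases / (genome_size_bases * coverage)


# Assumption: the slide's "900GB" is read as ~900 billion bases per flow cell.
FLOW_CELL_YIELD = 900e9
# Assumption: approximate haploid human genome size (not stated on the slide).
HUMAN_GENOME_SIZE = 3.1e9
# From the slide: 30X coverage per genome.
TARGET_COVERAGE = 30

n_genomes = genomes_per_flow_cell(FLOW_CELL_YIELD, HUMAN_GENOME_SIZE, TARGET_COVERAGE)
print(f"About {n_genomes:.1f} genomes at {TARGET_COVERAGE}X per flow cell")
```

Under these assumptions the result is a little under 10, consistent with the slide's "10 * 30X human genomes".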

15 Personalized Disease Susceptibility Test and Treatment
Break

16 Big Data Challenges

17 “All biology is becoming computational, much the same way it has become molecular … Otherwise ‘low input, high throughput and no output science’” (Sydney Brenner, 2002 Nobel Prize)

18 Bioinformatics and Computational Biology
Interdisciplinary
–Statistics, Biology, Computer Science
Applied
–From freshman to postdocs
–Useful training for many
–The more you practice, the better you get
Moves with technology development

19 Is This Class for Me?
Computer:
–R and Python
Biology:
–Molecular biology, genomics
Statistics:
–Hypothesis testing, distributions, intuition

20 Class Information
Course website:
–https://canvas.harvard.edu/courses/10740
–Video recording, slides, reading online
–Office hours, auditing
–Background: CS, Stats, Biology
Roughly 6 modules (HW each)
–Transcriptomes (microarrays and RNA-seq)
–Gene regulation (transcriptional & epigenetic regulation)
–Human genetics and disease (GWAS / cancer)

21 Class Information
Teaching Fellows: Zhirui Hu, Zack McCaw
Labs:
–Wed 6 – 8pm, Science Center B09
–Thur 6 – 8pm, HSPH Kresge LL6
–Next Wed: Odyssey account and Linux tutorial!

22 HW and Grading
Discussion on Canvas by HW
Submission on Canvas by HW
HW: 6 * 15 (STAT115) or 6 * 20 (graduate)
Quiz for each module: 6
Final exams: 20
Class participation: 5 (extra)
Algorithm videos: 5 (extra)
Late days
Break

23 Gene Expression Microarrays

24 Expression Microarrays
Grow cells under a certain condition, collect the mRNA population, and label it
A microarray has high-density (thousands to millions) sequence-specific probes with a known location for each gene/RNA
The sample hybridizes to the microarray probes by DNA (A-T, G-C) base pairing; non-specific binding is washed off
Measure each sample mRNA value by checking the labeled signal at each probe location
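The base-pairing rule behind hybridization (A-T, G-C) can be sketched in a few lines of Python. This is an illustrative toy, not from the slides: it treats hybridization as an exact reverse-complement match and ignores real thermodynamics and partial matches. The function names and sequences are invented for the example.

```python
# Watson-Crick pairing from the slide: A pairs with T, G pairs with C.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the strand that would base-pair with `seq`, read 5' to 3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def hybridizes(probe, target):
    """Toy criterion: the probe binds if its exact reverse complement occurs in the target."""
    return reverse_complement(probe) in target.upper()


probe = "ATGC"                 # hypothetical short probe (real probes are much longer)
print(reverse_complement(probe))
print(hybridizes(probe, "TTGCATAA"))   # target contains the complementary stretch
print(hybridizes(probe, "TTTTTTTT"))   # no complementary stretch
```

Real probes tolerate some mismatches and bind with sequence-dependent strength, which is why the wash step and control probes matter.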

25 Affymetrix GeneChip Arrays

26 Labeled Samples Hybridize to DNA Probes on GeneChip

27 Shining Laser Light Causes Tagged Fragments to Glow

28 Perfect Match (PM) vs MisMatch (MM) (control for cross hybridization)
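The PM/MM idea can be illustrated with a small sketch. It assumes the common GeneChip design in which the MM probe is the PM probe with its middle base swapped for the complement, and it summarizes a probe set as the plain average of PM minus MM. That average is a deliberately simple stand-in, not the vendor's actual summarization algorithm. The names and intensity values are hypothetical.

```python
BASE_COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def mismatch_probe(pm_probe):
    """Build an MM probe by complementing the middle base of the PM probe."""
    mid = len(pm_probe) // 2   # index 12 (the 13th base) for a 25-mer
    return pm_probe[:mid] + BASE_COMPLEMENT[pm_probe[mid]] + pm_probe[mid + 1:]


def average_difference(pm_signals, mm_signals):
    """Mean of PM - MM over a probe set: a crude cross-hybridization correction."""
    diffs = [pm - mm for pm, mm in zip(pm_signals, mm_signals)]
    return sum(diffs) / len(diffs)


pm = "ACGTACGTACGTACGTACGTACGTA"        # hypothetical 25-mer PM probe
mm = mismatch_probe(pm)
print(pm)
print(mm)

# Hypothetical intensities for three probe pairs of one gene.
print(average_difference([100, 200, 300], [50, 100, 150]))
```

Subtracting the MM signal assumes that MM probes capture only non-specific binding; in practice they also pick up some true signal, which is one reason later methods moved away from a plain PM minus MM.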

29 NimbleGen Arrays

30 Agilent Arrays

31 Microarrays
Array comparison:
–# probes / array, # probes / gene, probe length
–Flexibility vs data reuse
Why do we bother learning about microarrays now?
–RNA-seq is probably more cost-effective now
–The amount of useful public data
–The data analysis techniques

