[Bejerano Aut07/08] 1 MW 11:00-12:15 in Redwood G19 Profs: Serafim Batzoglou, Gill Bejerano TA: Cory McLean.

Presentation transcript:

[Bejerano Aut07/08] 1 MW 11:00-12:15 in Redwood G19 Profs: Serafim Batzoglou, Gill Bejerano TA: Cory McLean

[Bejerano Aut07/08] 2 Lecture 5 UCSC Source Tree Genome Assemblies Genomic Variation Repeats

[Bejerano Aut07/08] 3 UCSC Resources: Data, Tools & Code Underlying Database (MySQL): visualize, search & download

4 History of the Code Started in 1999 in C after Java proved hopelessly unportable across browsers. Early modules include a Worm genome browser (Intronerator), and GigAssembler, which produced the working draft of the human genome. In 2001 a few other grad students started working on the code. In 2002 hired staff to help with the Genome Browser. Currently the project employs ~20 full-time people. [Jim Kent, 2004]

5 Lagging Edge Software C language - compilers still available! CGI Scripts - portable if not pretty. SQL database - at least MySQL is free.

6 Problems with C Missing booleans and strings. No real objects. Must free things.

7 Advantages of C Very fast at runtime. Very portable. Language is simple. No tangled inheritance hierarchy. Excellent free tools are available. Libraries and conventions can compensate for language weaknesses.

8 Coping with Missing Data Types in C
#define boolean int
Fixing lack of real string type much harder:
– lineFile/common modules and autoSql code generator make parsing files relatively painless
– dyString module not a horrible string ‘class’
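To make the slide's point concrete, here is a minimal sketch of the two workarounds it names: a `boolean` macro and a growable string "class". The `growString` type and its functions are hypothetical names invented for this illustration. They are not the dyString API from the UCSC tree, whose actual interface is not shown on the slide.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define boolean int   /* As on the slide. */
#define TRUE 1
#define FALSE 0

/* Hypothetical growable string, for illustration only. */
struct growString {
    char *text;       /* Null-terminated contents. */
    size_t length;    /* Characters in use, excluding the terminator. */
    size_t capacity;  /* Bytes allocated for text. */
};

/* Allocate an empty growable string. Allocation failure is only
 * caught by assert here, which is acceptable for a sketch. */
struct growString *growStringNew(void) {
    struct growString *gs = malloc(sizeof(*gs));
    assert(gs != NULL);
    gs->capacity = 16;
    gs->length = 0;
    gs->text = malloc(gs->capacity);
    assert(gs->text != NULL);
    gs->text[0] = '\0';
    return gs;
}

/* Append a C string, doubling the buffer until it fits. */
void growStringAppend(struct growString *gs, const char *more) {
    size_t moreLen = strlen(more);
    size_t needed = gs->length + moreLen + 1;
    if (needed > gs->capacity) {
        size_t newCapacity = gs->capacity;
        while (newCapacity < needed)
            newCapacity *= 2;
        gs->text = realloc(gs->text, newCapacity);
        assert(gs->text != NULL);
        gs->capacity = newCapacity;
    }
    memcpy(gs->text + gs->length, more, moreLen + 1);
    gs->length += moreLen;
}

/* Return TRUE if nothing has been appended yet. */
boolean growStringIsEmpty(const struct growString *gs) {
    return gs->length == 0 ? TRUE : FALSE;
}

/* Free the string and set the caller's pointer to NULL. */
void growStringFree(struct growString **pGs) {
    struct growString *gs = *pGs;
    if (gs != NULL) {
        free(gs->text);
        free(gs);
        *pGs = NULL;
    }
}
```

A caller would create the string with `growStringNew()`, call `growStringAppend()` as often as needed, read `gs->text`, and finish with `growStringFree(&gs)`. Note that the "must free things" problem from the earlier slide remains.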

9 Object Oriented Programming in C Build objects around structures. Make families of functions with names that start with the structure name, and that take the structure as the first argument. Implement polymorphism/virtual functions with function pointers in structure. Inheritance is still difficult. Perhaps this is not such a bad thing.
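The conventions on this slide can be sketched in a few lines of C. Everything below (`struct feature`, `struct gappedFeature`, and their functions) is a hypothetical example made up for illustration, not code from the UCSC tree. It shows functions named after the structure, the structure passed as the first argument, and a function pointer acting as a virtual method. Embedding the base structure as the first member is one common workaround for the missing inheritance, assumed here rather than taken from the slide.

```c
#include <assert.h>

/* 'Base class': a genomic feature on the half-open interval [start, end). */
struct feature {
    int start;
    int end;
    int (*baseCount)(struct feature *feature);  /* 'Virtual' method. */
};

/* Default method: every position in the span counts as a base. */
int featureSpanBaseCount(struct feature *feature) {
    return feature->end - feature->start;
}

/* 'Derived' structure: the base must be the first member so that a
 * pointer to it can be converted back to the derived type. */
struct gappedFeature {
    struct feature base;
    int gapSize;  /* Positions inside the span that are assembly gap. */
};

/* Overriding method: subtract the gap from the span. */
int gappedFeatureBaseCount(struct feature *feature) {
    struct gappedFeature *gapped = (struct gappedFeature *)feature;
    return feature->end - feature->start - gapped->gapSize;
}

/* Initialize a plain feature with the default method. */
void featureInit(struct feature *feature, int start, int end) {
    assert(start <= end);
    feature->start = start;
    feature->end = end;
    feature->baseCount = featureSpanBaseCount;
}

/* Initialize a gapped feature, then install the overriding method. */
void gappedFeatureInit(struct gappedFeature *gapped, int start, int end,
                       int gapSize) {
    featureInit(&gapped->base, start, end);
    assert(gapSize >= 0 && gapSize <= end - start);
    gapped->gapSize = gapSize;
    gapped->base.baseCount = gappedFeatureBaseCount;
}

/* Polymorphic call: dispatch through the function pointer. */
int featureBaseCount(struct feature *feature) {
    return feature->baseCount(feature);
}
```

Code that only knows about `struct feature *` can call `featureBaseCount()` on either kind of object and get the appropriate behavior.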

10
struct dnaSeq
/* A dna sequence in one-letter-per-base format. */
    {
    struct dnaSeq *next;  /* Next in list. */
    char *name;           /* Sequence name. */
    char *dna;            /* a's c's g's and t's. Null terminated */
    int size;             /* Number of bases. */
    };

struct dnaSeq *dnaSeqFromString(char *string);
/* Convert string containing sequence and possibly
 * white space and numbers to a dnaSeq. */

void dnaSeqFree(struct dnaSeq **pSeq);
/* Free dnaSeq and set pointer to NULL. */

void dnaSeqFreeList(struct dnaSeq **pList);
/* Free list of dnaSeq's. */
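The slide shows only the declarations. As a rough sketch of what could sit behind them, the code below gives one possible implementation written for this note. It is not the UCSC source. Two behaviors are assumptions, since the slide does not specify them: every alphabetic character is kept and lower-cased, and the sequence name is left NULL.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Structure as declared on the slide. */
struct dnaSeq {
    struct dnaSeq *next;  /* Next in list. */
    char *name;           /* Sequence name. */
    char *dna;            /* a's c's g's and t's. Null terminated. */
    int size;             /* Number of bases. */
};

/* Sketch: copy the letters of string, skipping white space, digits and
 * punctuation. Assumption: letters are lower-cased and name stays NULL. */
struct dnaSeq *dnaSeqFromString(char *string) {
    struct dnaSeq *seq = calloc(1, sizeof(*seq));
    assert(seq != NULL);
    seq->dna = malloc(strlen(string) + 1);
    assert(seq->dna != NULL);
    int size = 0;
    for (char *s = string; *s != '\0'; ++s) {
        if (isalpha((unsigned char)*s))
            seq->dna[size++] = (char)tolower((unsigned char)*s);
    }
    seq->dna[size] = '\0';
    seq->size = size;
    return seq;
}

/* Free dnaSeq and set pointer to NULL. */
void dnaSeqFree(struct dnaSeq **pSeq) {
    struct dnaSeq *seq = *pSeq;
    if (seq == NULL)
        return;
    free(seq->name);
    free(seq->dna);
    free(seq);
    *pSeq = NULL;
}

/* Free list of dnaSeq's, saving each next pointer before freeing. */
void dnaSeqFreeList(struct dnaSeq **pList) {
    struct dnaSeq *seq = *pList;
    while (seq != NULL) {
        struct dnaSeq *next = seq->next;
        dnaSeqFree(&seq);
        seq = next;
    }
    *pList = NULL;
}
```

Passing a pointer to the pointer lets the free functions reset the caller's variable to NULL, so a stale pointer is harder to reuse by accident.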

[Bejerano Aut07/08] 11 UCSC Code Tree Summary To conclude: Source tree is installed for you. All programs under utils/ should work. Code under hg/ requires the MySQL DB (or at least it thinks it does). Very useful resource: If in trouble, use the “contact us” link to search Q&A. Then come ask us or shoot the UCSC helpdesk an email.

[Bejerano Aut07/08] 12 Lights, Action, Rolling 2001: HGC, Celera

[Bejerano Aut07/08] 13 The Sequencing of the Human Genome Lander: So the genes from which most of the work was done come from Buffalo, New York. Krulwich: From Buffalo, New York? Lander: Yes. It's mostly a guy from Buffalo and a woman from Buffalo. But that's because the laboratory that was making--... Lander: The laboratory that prepared the large DNA libraries that were used was a laboratory in Buffalo. And so they put an ad in the Buffalo newspapers, and they got random volunteers from Buffalo, and they got about 20 of them. They then erased all the labels and chose at random this sample and that sample and that sample. So nobody knows who they are. We don't have any links back to who they are, and that's deliberate. Eric Lander, NOVA interview, 2001

[Bejerano Aut07/08] 14 Meet Your Genome [Human Molecular Genetics, 3rd Edition]

[Bejerano Aut07/08] 15 Heterochromatin as an example

[Bejerano Aut07/08] 16 The Human Genome is “Finished” [HGC, 2004]

[Bejerano Aut07/08] 17 “Unfinished Business in a Finished Genome” 341 remaining gaps: 33 heterochromatic, 35 euchromatic boundaries, 273 euchromatic interior regions. Centromeric, telomeric gaps. Acrocentric, rDNA clusters: chr. 13, 14, 15, 21, 22

[Bejerano Aut07/08] 18 Assembly Gap Types

[Bejerano Aut07/08] 19 Mind the Gap

[Bejerano Aut07/08] 20 Fluorescent in situ hybridization (FISH) [Eichler et al, 2004]

[Bejerano Aut07/08] 21 Euchromatic Interior Gap, Unplaced Sequence ? 2

[Bejerano Aut07/08] 22 The _random Chromosomes

[Bejerano Aut07/08] 23 hg18.chr1_random... Some genomes are in much worse shape. Some have _random chroms that are (sadly) called some other name (but look the same). _randoms are a great place to meet contaminants: pieces of local technician DNA; sequence from the vector used in the protocol; the odd tube from another genome project being sequenced at the same genome center.

[Bejerano Aut07/08] 24 Mistaking (Haplotype) Variation for Segmental Dups

[Bejerano Aut07/08] 25 Wave of the Future [Shendure et al, 2004]

[Bejerano Aut07/08] 26 SNPs A Single Nucleotide Polymorphism is a source of variance in a genome. A SNP ("snip") is a single base mutation in DNA. SNPs are the simplest and most common source of genetic polymorphism in the human genome (90% of all human DNA polymorphisms). Not any more... [Hegele, 2004]

[Bejerano Aut07/08] 27 Larger Scale DNA Mutation We knew this was happening to DNA, at all length scales. We did not know how frequent, nor how prevalent in the human population these changes are... you are here

[Bejerano Aut07/08] 28 Copy Number Variation (CNVs) so... how representative is the reference genome? [Redon et al, 2006]

[Bejerano Aut07/08] 29 J.C. Venter Goes to Buffalo serious representation problem [Khaja et al, 2006]

[Bejerano Aut07/08] 30 Large Scale Variation & Disease [Lupski, 2007]

[Bejerano Aut07/08] 31 Don’t Panic G E N O M E

[Bejerano Aut07/08] 32 Meanwhile, back in Your Genome

[Bejerano Aut07/08] 33 [Adapted from Lunter]

[Bejerano Aut07/08] 42 Inferring Phylogeny Using Repeats [Nishihara et al, 2006]

[Bejerano Aut07/08] 43 Simple Repeats Every possible motif of mono-, di-, tri- and tetranucleotide repeats is vastly overrepresented in the human genome. These are called microsatellites. Longer repeating units are called minisatellites. The really long ones are called satellites. Highly polymorphic in the human population. Highly heterozygous in a single individual. As a result, microsatellites are used in paternity testing, forensics, and the inference of demographic processes. There is no clear definition of how many repetitions make a simple repeat, nor how imperfect the different copies can be. Highly variable between genomes: e.g., using the same search criteria, the mouse & rat genomes have 2-3 times more microsatellites than the human genome. They’re also longer in mouse & rat.
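Because the slide stresses that there is no agreed definition of a simple repeat, any detector has to pick its own search criteria. The sketch below is one illustrative choice, not a standard tool or the method used for the mouse/rat comparison. It reports the first perfect tandem repeat whose unit is at most `maxUnitSize` bases and which has at least `minCopies` copies. The function names, the `struct simpleRepeat` type, and both thresholds are assumptions made for this example, and imperfect copies are deliberately ignored.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical result record for one perfect tandem repeat. */
struct simpleRepeat {
    int start;     /* Zero-based start of the first copy. */
    int unitSize;  /* Length of the repeating unit (1 = mono, 2 = di, ...). */
    int copies;    /* Number of consecutive perfect copies. */
};

/* Count consecutive perfect copies of the unit that begins at start. */
int tandemCopies(const char *dna, int dnaSize, int start, int unitSize) {
    int copies = 0;
    int pos = start;
    while (pos + unitSize <= dnaSize &&
           strncmp(dna + start, dna + pos, (size_t)unitSize) == 0) {
        ++copies;
        pos += unitSize;
    }
    return copies;
}

/* Scan left to right and report the first repeat that meets the criteria.
 * At each position shorter units are tried first. Returns 1 and fills
 * *hit when a repeat is found, otherwise returns 0. */
int findSimpleRepeat(const char *dna, int maxUnitSize, int minCopies,
                     struct simpleRepeat *hit) {
    assert(maxUnitSize >= 1 && minCopies >= 2);
    int dnaSize = (int)strlen(dna);
    for (int start = 0; start < dnaSize; ++start) {
        for (int unitSize = 1; unitSize <= maxUnitSize; ++unitSize) {
            int copies = tandemCopies(dna, dnaSize, start, unitSize);
            if (copies >= minCopies) {
                hit->start = start;
                hit->unitSize = unitSize;
                hit->copies = copies;
                return 1;
            }
        }
    }
    return 0;
}
```

Changing `minCopies` or `maxUnitSize` changes which stretches count as microsatellites, which is exactly why comparisons between genomes are only meaningful when the same search criteria are used for each.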
