Of Sea Urchins, Birds and Men Algorithmic Functions of Computational Biology – Course 1 Professor Istrail.


Darwin’s Finches 2 and Coco

The Father of All Dot Plots
The Human Genome

The Synteny Problem
- Between distant species: can reveal function
  Conservation reveals selective pressure
- Between near species
  Conservation reveals evolutionary history
- Between similar or the same species
  Recent events in subpopulations
  Phenotypic differences

Matching, Chaining, Extension
Matching Phase
Chaining Phase
Extension Phase

Dot Plots 101
- a, b, c, d stand for letters; A, B, C, D stand for words
- Where letters match, put a dot
- Where words match, put a line (words can be reverse-complemented)
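The dot-plot rules above can be sketched in a few lines of Python. This is only an illustration, not the tool used to produce the plots in these slides: the function names `word_matches` and `render`, the fixed `word_size`, and the choice of `\` for direct matches and `/` for reverse-complemented matches are all assumptions made for the example.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string over A, C, G, T."""
    return seq.translate(COMPLEMENT)[::-1]


def word_matches(horizontal, vertical, word_size):
    """List (i, j, strand) for every word of `vertical` found in `horizontal`.

    i is the word start in `horizontal`, j the word start in `vertical`.
    strand is "+" for a direct match and "-" when the reverse-complemented
    word matches.
    """
    # Index every word of the horizontal sequence by its start positions.
    index = {}
    for i in range(len(horizontal) - word_size + 1):
        index.setdefault(horizontal[i:i + word_size], []).append(i)

    matches = []
    for j in range(len(vertical) - word_size + 1):
        word = vertical[j:j + word_size]
        matches.extend((i, j, "+") for i in index.get(word, []))
        matches.extend(
            (i, j, "-") for i in index.get(reverse_complement(word), [])
        )
    return matches


def render(horizontal, vertical, matches, word_size):
    """Draw the plot as text: rows follow `vertical`, columns `horizontal`."""
    grid = [["." for _ in horizontal] for _ in vertical]
    for i, j, strand in matches:
        for k in range(word_size):
            # A reversed word runs right-to-left along the horizontal axis.
            column = i + k if strand == "+" else i + word_size - 1 - k
            grid[j + k][column] = "\\" if strand == "+" else "/"
    return ["".join(row) for row in grid]


if __name__ == "__main__":
    seq = "GATTACA"
    rc = reverse_complement(seq)
    for row in render(seq, rc, word_matches(seq, rc, 4), 4):
        print(row)
```

With `word_size=1` every matching letter becomes a single dot; with longer words, runs of matches show up as diagonal lines, and a reversed segment appears as an anti-diagonal.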

Dot Plots 101
- When words line up
- Reversed
- Misplaced
- Something gained (relative to horizontal)
- Something lost (relative to horizontal)

Some large reversals in GP

NCBI has more of the centromere than anyone else (or is that N’s?)

Many reversals in GP; a piece of the end is reordered to the middle; Celera assemblies are boringly good.

Again, everyone misses the first 10 Mb (or are those N’s?) of NCBI31

Rube Goldberg’s Innovation
GENOMIC REGULATORY SYSTEMS
Mixed character of the problem: continuous mathematics, discrete mathematics

Rube Goldberg’s Pencil Sharpener invention
Open window (A) and fly kite (B). String (C) lifts small door (D) allowing moths (E) to escape and eat red flannel shirt (F). As weight of shirt becomes less, shoe (G) steps on switch (H) which heats electric iron (I) and burns hole in pants (J). Smoke (K) enters hole in tree (L), smoking out opossum (M) which jumps into basket (N), pulling rope (O) and lifting cage (P), allowing woodpecker (Q) to chew wood from pencil (R), exposing lead. Emergency knife (S) is always handy in case opossum or the woodpecker gets sick and can't work.

A Tale of Two Networks
Sea Urchin, Drosophila

A Proposal for Nobel Prize
“Programs built into the DNA of every animal.”
Eric H. Davidson, Genomic Regulatory Systems
One gene, 30 years of study, 300 docs and postdocs

The Dogma

Genomic Regulatory Regions

TF Binding Site Complexity

Genome Complexity
1 billion DNA bases
20,000 genes

cis-Regulatory Modules Complexity
200,000 cis-Modules

The DNA program that regulates the expression of endo16 in sea urchin
- THE FIRST GENE

- THE FIRST NETWORK

The View from the Genome

The View from the Nucleus

Building Protein-DNA Assemblies
- Inter-cismodule linkage
- Insulation
- Communication
- cismodule
- DNA
- Cooperativity
- Linear-amp
- Gates
- Potentiality

The Building Blocks
- Protein
- Free Energy
- DNA
- Protein-DNA Binding (free energy)
Free energy is the “GLUE”

Information Processing

Boolean Circuit
- Synchronous input and output
- Completely defined gates

- Boolean Circuit: synchronous input and output; completely defined gates
- Boolinear Circuit: asynchronous input and output; incompletely defined gates

OR AND NOT OR 1
IF (x1 = 1 AND x2 = 1) THEN …..
GTAGGATTAAG …...
CATCCTAATTC …….
GTATCTAGAAG …….
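A minimal sketch of how the gates named on this slide behave as 0/1 functions. The slide does not say what x1 and x2 stand for or what follows THEN, so the code models only the condition `x1 = 1 AND x2 = 1`; the three-input circuit `example_circuit` is a hypothetical composition added purely for illustration.

```python
from itertools import product


def AND(a, b):
    """1 only when both inputs are 1."""
    return int(a == 1 and b == 1)


def OR(a, b):
    """1 when at least one input is 1."""
    return int(a == 1 or b == 1)


def NOT(a):
    """Invert a 0/1 input."""
    return 1 - a


def rule_condition(x1, x2):
    """Condition part of the slide's rule: IF (x1 = 1 AND x2 = 1)."""
    return AND(x1, x2)


def example_circuit(x1, x2, x3):
    """Hypothetical composition of the three gate types (not from the slide)."""
    return OR(AND(x1, x2), NOT(x3))


def truth_table(gate, n_inputs):
    """Evaluate a gate on every 0/1 input combination."""
    return {bits: gate(*bits) for bits in product((0, 1), repeat=n_inputs)}


if __name__ == "__main__":
    for bits, out in truth_table(rule_condition, 2).items():
        print(bits, "->", out)
```

A circuit like this is "completely defined" in the sense that every input combination has an output, and it evaluates all inputs at once.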

Web page: edu/~chyuh/cathy- mirsky-info.html
Caltech, Davidson Lab, October 2004

Introduction SNPs, HAPLOTYPES

Single Nucleotide Polymorphism (SNP)
A SNP is a position in a genome at which two or more different bases occur in the population, each with a frequency >1%.
GATTTAGATCGCGATAGAG
GATTTAGATCTCGATAGAG
The two alleles at the site are G and T
- The most abundant type of polymorphism
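As a small illustration of the definition, the sketch below locates the differing position in the two example sequences and applies the >1% frequency rule to a list of observed alleles. The function names and the sample allele counts are assumptions for the example. Note that comparing two sequences only finds a candidate site; the frequency test needs a population sample.

```python
from collections import Counter


def differing_sites(seq_a, seq_b):
    """Return (position, base_a, base_b) for each mismatch (0-based positions)."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned and of equal length")
    return [
        (pos, a, b)
        for pos, (a, b) in enumerate(zip(seq_a, seq_b))
        if a != b
    ]


def is_snp(alleles, min_frequency=0.01):
    """True if at least two alleles each exceed `min_frequency` in the sample."""
    counts = Counter(alleles)
    total = len(alleles)
    common = [base for base, n in counts.items() if n / total > min_frequency]
    return len(common) >= 2


if __name__ == "__main__":
    print(differing_sites("GATTTAGATCGCGATAGAG", "GATTTAGATCTCGATAGAG"))
    # Hypothetical sample: 98 chromosomes carry G, 2 carry T.
    print(is_snp("G" * 98 + "T" * 2))
```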

tttctccatttgtcgtgacacctttgttgacaccttcatttctgcattctcaattctatttcactggtctatggcagagaacacaaaatatggccagtggc ctaaatccagcctactaccttttttttttttttgtaacattttactaacatagccattcccatgtgtttccatgtgtctgggctgcttttgcactctaatggcag agttaagaaattgtagcagagaccacaatgcctcaaatatttactctacagccctttataaaaacagtgtgccaactcctgatttatgaacttatc attatgtcaataccatactgtctttattactgtagttttataagtcatgacatcagataatgtaaatcctccaactttgtttttaatcaaaagtgttttggcc atcctagatatactttgtattgccacataaatttgaagatcagcctgtcagtgtctacaaaatagcatgctaggattttgatagggattgtgtagaat ctatagattaattagaggagaatgactatcttgacaatactgctgcccctctgtattcgtgggggattggttccacaacaacacccaccccccac tcggcaacccctgaaacccccacatcccccagcttttttcccctgctaccaaaatccatggatgctcaagtccatataaaatgccatactatttgc atataacctctgcaatcctcccctatagtttagatcatctctagattacttataatactaataaaatctaaatgctatgtaaatagttgctatactgtgtt gagggttttttgttttgttttgttttatttgtttgtttgtttgtattttaagagatggtgtcttgctttgttgcccaggctggagtgcagtggtgagatcatagctt actgcagcctcaaactcctggactcaaacagtcctcccacctcagcctcccaaagtgctgggatacaggtgtgacccactgtgcccagttatt attttttatttgtattattttactgttgtattatttttaattattttttctgaatattttccatctatagttggttgaatcatggatgtggaacaggcaaatatggag ggctaactgtattgcatcttccagttcatgagtatgcagtctctctgtttatttaaagttttagtttttctcaaccatgtttacttttcagtatacaagactttg acgttttttgttaaatgtatttgtaagtattttattatttgtgatgttatttaaaaagaaattgttgactgggcacagtggctcacgcctgtaatcccagca ctttgggaggctgaggcgggcagatcacgaggtcaggagatcaagaccatcctggctaacatggtaaaaccccgtctctactaaaaataga aaaaaattagccaggcgtggtggcgagtgcctgtagtcccagctactcgggaggctgaggcaggagaatggtgtgaacctgggaggcgg agcttgcagtgagctgagatcgtgccactgcattccagcctgcgtgacagagcgagactctgtcaaaaaaataaataaaatttaaaaaaag aagaagaaattattttcttaatttcattttcaggttttttatttatttctactatatggatacatgattgatttttgtatattgatcatgtatcctgcaaactagct aacatagtttattatttctctttttttgtggattttaaaggattttctacatagataaataaacacacataaacagttttacttctttcttttcaacctagactg gatgcattttttgtttttgtttgtttgtttgctttttaacttgctgcagtgactagagaatgtattgaagaatatattgttgaacaaaagcagtgagagtgg 
acatccctgctttccccctgattttagggggaatgttttcagtctttcactatttaatatgattttagctataggtttatcctagatccctgttatcatgttga ggaaattcccttctatttctagtttgttgagattttttaattcatgtgattgcgctatctggctttgctctca
- The human genome contains ~3 G base pairs arranged in 46 chromosomes.
- Two individuals are 99.9% the same, i.e., they differ in ~3 M base pairs.
- SNPs occur once every ~600 bp.
- The average gene in the human genome spans ~27 kb: ~50 SNPs per gene.
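The figures on this slide follow from simple arithmetic, sketched here with the slide's own rounded inputs. The variable names are illustrative; the inputs are the slide's approximations, not measured values.

```python
GENOME_SIZE_BP = 3_000_000_000   # ~3 G base pairs
DIFFERENCE_PER_MILLE = 1         # 99.9% identical means 0.1% = 1 per 1,000 differs
BP_PER_SNP = 600                 # one SNP every ~600 bp
AVERAGE_GENE_BP = 27_000         # average gene spans ~27 kb

# Base pairs at which two individuals differ: 0.1% of 3 G.
differing_bp = GENOME_SIZE_BP * DIFFERENCE_PER_MILLE // 1000

# Expected SNPs within an average gene.
snps_per_gene = AVERAGE_GENE_BP / BP_PER_SNP

if __name__ == "__main__":
    print(f"differing base pairs: {differing_bp:,}")
    print(f"SNPs per average gene: {snps_per_gene:.0f}")
```

The second result comes out at 45, which the slide rounds to ~50 SNPs per gene.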

SNP Haplotype
Two individuals:
G C T C G A C A A C A G
G T T C G T C A A C A G
Haplotypes:
C A G
T T G

Mutations Infinite Sites Assumption: Each site mutates at most once

Haplotype Pattern
C A G T T T G A C A T G C T G T
At each SNP site, label the two alleles as 0 and 1. The choice of which allele is 0 and which is 1 is arbitrary.
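The 0/1 labeling can be sketched as follows. Because the choice is arbitrary, this example simply calls the allele carried by the first haplotype 0 at every site; that convention, the function name, and the third sample haplotype are assumptions for illustration.

```python
def encode_haplotypes(haplotypes):
    """Encode equal-length haplotype strings as 0/1 strings.

    At each site the allele of the first haplotype is labeled 0 and the
    other allele 1. Sites with more than two alleles are rejected.
    """
    if not haplotypes:
        return []
    length = len(haplotypes[0])
    if any(len(h) != length for h in haplotypes):
        raise ValueError("haplotypes must have equal length")

    for site in range(length):
        alleles = {h[site] for h in haplotypes}
        if len(alleles) > 2:
            raise ValueError(f"site {site} has more than two alleles")

    reference = haplotypes[0]
    return [
        "".join("0" if base == ref else "1" for base, ref in zip(h, reference))
        for h in haplotypes
    ]


if __name__ == "__main__":
    # "CAG" and "TTG" appear on an earlier slide; "CTA" is a made-up third row.
    for row in encode_haplotypes(["CAG", "TTG", "CTA"]):
        print(row)
```

Swapping the labels at any site flips that column but leaves the pattern of which haplotypes share an allele unchanged.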

Recombination
G T T C G A C T A T T A
G T T C G A C A A C A T
A C G T A T C T A T T A

Recombination
G T T C G A C T A T T A
G T T C G A C A A C A T
A C G T A T C T A T T A
The two alleles are linked, i.e., they are “traveling together”.
Recombination disrupts the linkage.
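A single crossover can be sketched as a prefix of one chromosome joined to the suffix of another. In the slide's three sequences, the first equals the first seven bases of the second followed by the last five bases of the third. Treating the second and third as the parents, and the breakpoint at position 7, is a reading of the figure rather than something the slide states.

```python
def crossover(parent_a, parent_b, breakpoint):
    """Return the recombinant: parent_a before `breakpoint`, parent_b after."""
    if len(parent_a) != len(parent_b):
        raise ValueError("parents must have equal length")
    if not 0 <= breakpoint <= len(parent_a):
        raise ValueError("breakpoint out of range")
    return parent_a[:breakpoint] + parent_b[breakpoint:]


if __name__ == "__main__":
    parent_a = "GTTCGACAACAT"
    parent_b = "ACGTATCTATTA"
    print(crossover(parent_a, parent_b, 7))
```

Alleles on opposite sides of the breakpoint that used to sit on the same chromosome end up on different ones, which is the sense in which recombination disrupts linkage.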

Linkage Disequilibrium (LD)
Variations in Chromosomes Within a Population
Figure labels: Common Ancestor; Emergence of Variations Over Time; time; present; Disease Mutation

Extent of Linkage Disequilibrium
Figure labels: 2,000 gens. ago; 1,000 gens. ago; Time = present; Disease-Causing Mutation

A Data Compression Problem
- Select SNPs to use in an association study
  Would like to associate single nucleotide polymorphisms (SNPs) with disease.
- Very large number of candidate SNPs
  Chromosome-wide studies, whole-genome scans. For cost effectiveness, select only a subset.
- Closely spaced SNPs are highly correlated
  It is less likely that there has been a recombination between two SNPs if they are close to each other.
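One common way to quantify how correlated two SNPs are is the r² statistic, computed from 0/1-encoded haplotypes as D² / (p_A(1 − p_A) p_B(1 − p_B)) with D = p_AB − p_A p_B. The slides do not say which measure was used, so take this as one standard choice; the function name and sample columns are illustrative.

```python
def r_squared(site_a, site_b):
    """Squared correlation between two biallelic sites.

    `site_a` and `site_b` are equal-length lists of 0/1 alleles, one entry
    per haplotype (chromosome) in the sample.
    """
    if len(site_a) != len(site_b) or not site_a:
        raise ValueError("sites must be non-empty and of equal length")
    n = len(site_a)
    p_a = sum(site_a) / n                       # frequency of allele 1 at A
    p_b = sum(site_b) / n                       # frequency of allele 1 at B
    p_ab = sum(1 for a, b in zip(site_a, site_b) if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b                        # disequilibrium coefficient D
    denominator = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denominator == 0:
        raise ValueError("r^2 is undefined when a site is monomorphic")
    return d * d / denominator


if __name__ == "__main__":
    print(r_squared([0, 0, 1, 1], [0, 0, 1, 1]))  # identical columns
    print(r_squared([0, 0, 1, 1], [0, 1, 0, 1]))  # independent columns
```

A pair with r² near 1 carries almost the same information, so typing one of the two SNPs is nearly as good as typing both; this is what makes selecting a subset feasible.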

Disease Associations

Association studies
Figure labels: Disease / Responder; Control / Non-responder; Allele 0; Allele 1
Marker A is associated with Phenotype
Marker A: Allele 0 = Allele 1 =

Association studies
- Evaluate whether nucleotide polymorphisms associate with phenotype
TA GA A CG GA A CG TA A TA TC G TG TA G TG GA G
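One simple way to evaluate association at a single marker is a chi-square test on the 2×2 table of allele counts in cases and controls. The slides do not specify a test, so this is a generic sketch; the function names and the counts are hypothetical.

```python
def allele_table(samples):
    """Tally (group, allele) pairs into (a, b, c, d).

    a, b = allele 0 / allele 1 counts among cases;
    c, d = allele 0 / allele 1 counts among controls.
    """
    counts = {("case", 0): 0, ("case", 1): 0,
              ("control", 0): 0, ("control", 1): 0}
    for group, allele in samples:
        counts[(group, allele)] += 1
    return (counts[("case", 0)], counts[("case", 1)],
            counts[("control", 0)], counts[("control", 1)])


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic (1 degree of freedom) for a 2x2 table."""
    n = a + b + c + d
    row_1, row_2 = a + b, c + d
    col_1, col_2 = a + c, b + d
    if 0 in (row_1, row_2, col_1, col_2):
        raise ValueError("every row and column needs at least one observation")
    return n * (a * d - b * c) ** 2 / (row_1 * row_2 * col_1 * col_2)


if __name__ == "__main__":
    # Hypothetical counts: 60/40 in cases versus 40/60 in controls.
    samples = ([("case", 0)] * 60 + [("case", 1)] * 40
               + [("control", 0)] * 40 + [("control", 1)] * 60)
    statistic = chi_square_2x2(*allele_table(samples))
    # 3.841 is the 5% critical value for chi-square with 1 degree of freedom.
    print(statistic, statistic > 3.841)
```

In a genome-wide scan the same test is repeated at many markers, so a stricter threshold than the single-test 5% level would be needed.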


Data Compression
Haplotype Blocks based on LD (Method of Gabriel et al. 2002)
ACGATCGATCATGAT
GGTGATTGCATCGAT
ACGATCGGGCTTCCG
ACGATCGGCATCCCG
GGTGATTATCATGAT
Selecting Tagging SNPs in blocks
A------A---TG--
G------G---CG--
A------G---TC--
A------G---CC--
G------A---TG--
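The compression on this slide can be checked mechanically. The sketch below masks each haplotype down to the four tagged positions (0-based sites 0, 7, 11, 12, read off the slide), confirms that the five haplotypes remain distinguishable, and groups sites whose alleles split the haplotypes identically. This is not the Gabriel et al. block method or any published tag-selection algorithm, just a consistency check with illustrative function names.

```python
HAPLOTYPES = [
    "ACGATCGATCATGAT",
    "GGTGATTGCATCGAT",
    "ACGATCGGGCTTCCG",
    "ACGATCGGCATCCCG",
    "GGTGATTATCATGAT",
]
TAG_SITES = [0, 7, 11, 12]  # 0-based positions kept on the slide


def mask(haplotype, sites):
    """Keep the bases at `sites` and replace everything else with '-'."""
    keep = set(sites)
    return "".join(b if i in keep else "-" for i, b in enumerate(haplotype))


def distinguishes_all(haplotypes, sites):
    """True if the tag sites alone separate every distinct haplotype."""
    projected = {"".join(h[s] for s in sites) for h in haplotypes}
    return len(projected) == len(set(haplotypes))


def equivalent_site_groups(haplotypes):
    """Group sites that partition the haplotypes in exactly the same way."""
    groups = {}
    for site in range(len(haplotypes[0])):
        column = [h[site] for h in haplotypes]
        # Canonical pattern: each allele becomes the index of its first use.
        pattern = tuple(column.index(base) for base in column)
        groups.setdefault(pattern, []).append(site)
    return list(groups.values())


if __name__ == "__main__":
    for h in HAPLOTYPES:
        print(mask(h, TAG_SITES))
    print(distinguishes_all(HAPLOTYPES, TAG_SITES))
    print(equivalent_site_groups(HAPLOTYPES))
```

On this data, each of the four tagged sites falls in a different group of mutually redundant sites. The one remaining group is site 8 alone, which carries three alleles and is not among the slide's tags.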

Real Haplotype Data
Two different runs of the Gabriel et al. block detection method + Zhang et al. SNP selection algorithm
Our block-free algorithm
A region of Chr Caucasian samples