Lectures 8 – Oct 24, 2011 CSE 527 Computational Biology, Fall 2011 Instructor: Su-In Lee TA: Christopher Miles Monday & Wednesday 12:00-1:20 Johnson Hall.

Slides:

Advertisements

Similar presentations

. Exact Inference in Bayesian Networks Lecture 9.

Advertisements

Lectures 9 – Oct 26, 2011 CSE 527 Computational Biology, Fall 2011 Instructor: Su-In Lee TA: Christopher Miles Monday & Wednesday 12:00-1:20 Johnson Hall.

Tutorial #1 by Ma’ayan Fishelson

SNP Applications statwww.epfl.ch/davison/teaching/Microarrays/snp.ppt.

Unit 5 Genetics Terry Kotrla, MS, MT(ASCP)BB. Terminology  Genes  Chromosomes  Autosome  Sex chromosome  Locus  Alleles  Homozygous  Heterozygous.

METHODS FOR HAPLOTYPE RECONSTRUCTION

CSE Fall. Summary Goal: infer models of transcriptional regulation with annotated molecular interaction graphs The attributes in the model.

Genetic linkage analysis Dotan Schreiber According to a series of presentations by M. Fishelson.

Basics of Linkage Analysis

. Parametric and Non-Parametric analysis of complex diseases Lecture #6 Based on: Chapter 25 & 26 in Terwilliger and Ott’s Handbook of Human Genetic Linkage.

Linkage Analysis: An Introduction Pak Sham Twin Workshop 2001.

Pedigree Analysis.

Parallel Genehunter: Implementation of a linkage analysis package for distributed memory architectures Michael Moran CMSC 838T Presentation May 9, 2003.

What’s Your Blood Type? A B AB O.

Intro to Genetics. We can predict what traits are possible among the offspring from certain parents by using a Punnett square. A: True B: False.

An quick survey of human 2-point linkage analysis Terry Speed Division of Genetics & Bioinformatics, WEHI Department of Statistics, UCB NWO/IOP Genomics.

Tutorial by Ma’ayan Fishelson Changes made by Anna Tzemach.

. Learning Bayesian networks Slides by Nir Friedman.

Introduction to Linkage Analysis March Stages of Genetic Mapping Are there genes influencing this trait? Epidemiological studies Where are those.

Tutorial #5 by Ma’ayan Fishelson

Linkage and LOD score Egmond, 2006 Manuel AR Ferreira Massachusetts General Hospital Harvard Medical School Boston.

Lecture 5: Segregation Analysis I Date: 9/10/02  Counting number of genotypes, mating types  Segregation analysis: dominant, codominant, estimating segregation.

Standardization of Pedigree Collection. Genetics of Alzheimer’s Disease Alzheimer’s Disease Gene 1 Gene 2 Environmental Factor 1 Environmental Factor.

Introduction to BST775: Statistical Methods for Genetic Analysis I Course master: Degui Zhi, Ph.D. Assistant professor Section on Statistical Genetics.

1 Mendelian genetics in Humans: Autosomal and Sex- linked patterns of inheritance Obviously examining inheritance patterns of specific traits in humans.

Learning Structure in Bayes Nets (Typically also learn CPTs here) Given the set of random variables (features), the space of all possible networks.

Pedigree Analysis.

Non-Mendelian Genetics

Calculation of IBD State Probabilities Gonçalo Abecasis University of Michigan.

Introduction to Linkage Analysis Pak Sham Twin Workshop 2003.

Genetics and Inheritance

Lectures 2 – Oct 3, 2011 CSE 527 Computational Biology, Fall 2011 Instructor: Su-In Lee TA: Christopher Miles Monday & Wednesday 12:00-1:20 Johnson Hall.

Lecture 19: Association Studies II Date: 10/29/02  Finish case-control  TDT  Relative Risk.

Module networks Sushmita Roy BMI/CS 576 Nov 18 th & 20th, 2014.

Lectures 9 – Oct 26, 2011 CSE 527 Computational Biology, Fall 2011 Instructor: Su-In Lee TA: Christopher Miles Monday & Wednesday 12:00-1:20 Johnson Hall.

Lecture 13: Linkage Analysis VI Date: 10/08/02  Complex models  Pedigrees  Elston-Stewart Algorithm  Lander-Green Algorithm.

Lecture 12: Linkage Analysis V Date: 10/03/02  Least squares  An EM algorithm  Simulated distribution  Marker coverage and density.

Tutorial #10 by Ma’ayan Fishelson. Classical Method of Linkage Analysis The classical method was parametric linkage analysis  the Lod-score method. This.

Lecture 15: Linkage Analysis VII

1 B-b B-B B-b b-b Lecture 2 - Segregation Analysis 1/15/04 Biomath 207B / Biostat 237 / HG 207B.

FINE SCALE MAPPING ANDREW MORRIS Wellcome Trust Centre for Human Genetics March 7, 2003.

Lecture 24: Quantitative Traits IV Date: 11/14/02  Sources of genetic variation additive dominance epistatic.

 a visual tool for documenting biological relationships in families and the presence of diseases  A pedigree is a family tree or chart made of symbols.

Pedigree and linkage analysis

Practical With Merlin Gonçalo Abecasis. MERLIN Website Reference FAQ Source.

Nonlinear differential equation model for quantification of transcriptional regulation applied to microarray data of Saccharomyces cerevisiae Vu, T. T.,

Lecture 22: Quantitative Traits II

What is Genetics? Genetics is the scientific study of heredity.

Lectures 7 – Oct 19, 2011 CSE 527 Computational Biology, Fall 2011 Instructor: Su-In Lee TA: Christopher Miles Monday & Wednesday 12:00-1:20 Johnson Hall.

Computational methods for inferring cellular networks II Stat 877 Apr 17 th, 2014 Sushmita Roy.

Go to Section: Unit 6D & ESex Linked Traits & Pedigrees **This needs to be separated into two sections and the sex link traits need to be expanded to include.

Linkage and Mapping Bonus #2 due now. The relationship between genes and traits is often complex Complexities include: Complex relationships between alleles.

Chi Square Pg 302. Why Chi - Squared ▪Biologists and other scientists use relationships they have discovered in the lab to predict events that might happen.

Lecture 17: Model-Free Linkage Analysis Date: 10/17/02  IBD and IBS  IBD and linkage  Fully Informative Sib Pair Analysis  Sib Pair Analysis with Missing.

Mendelian genetics in Humans: Autosomal and Sex- linked patterns of inheritance Obviously examining inheritance patterns of specific traits in humans.

Gonçalo Abecasis and Janis Wigginton University of Michigan, Ann Arbor

Difference between a monohybrid cross and a dihybrid cross

PEDIGREE ANALYSIS AND PROBABILITY

Biology MCAS Review: Mendelian Genetics

Recombination (Crossing Over)

PLANT BIOTECHNOLOGY & GENETIC ENGINEERING (3 CREDIT HOURS)

1 Department of Engineering, 2 Department of Mathematics,

1 Department of Engineering, 2 Department of Mathematics,

1 Department of Engineering, 2 Department of Mathematics,

Pedigree Analysis.

gene mapping & pedigrees

Linkage Analysis Problems

Sex-linked and Polygenic Traits

Genetic linkage analysis

Presentation transcript:

Lectures 8 – Oct 24, 2011 CSE 527 Computational Biology, Fall 2011 Instructor: Su-In Lee TA: Christopher Miles Monday & Wednesday 12:00-1:20 Johnson Hall (JHN) 022 Genetic Linkage Analysis 1

Outline Review: disease association studies Association vs linkage analysis Genetic linkage analysis Pedigree-based gene mapping Elston-Stewart algorithm Systems biology basics Gene regulatory network 2

Genome-Wide Association Studies Any disadvantages? Hypothesis-free: we search the entire genome for associations rather than focusing on small candidate areas. The need for extremely dense searches. The massive number of statistical tests performed presents a potential for false-positive results (multiple hypothesis testing) 3 Case …ACTCGGTGGGCATAAATTCGGCCCGGTCAGATTCCATCCAGTTTGTTCCATGG… …ACTCGGTGGGCATAAATTCGGCCCGGTCAGATTCCATCCAGTTTGTACCATGG… : : …ACTCGGTGGGCATAAATTCGGCCCGGTCAGATTCCATCCAGTTTGTACCATGG… …ACTCGGTGGGCATAAATTCTGCCCGGTCAGATTCCATCCAGTTTGTTCCATGG… Control …ACTCGGTAGGCATAAATTCGGCCCGGTCAGATTCCATACAGTTTGTACCATGG… …ACTCGGTGGGCATAAATTCGGCCCGGTCAGATTCCATACAGTTTGTTCCATGG… …ACTCGGTAGGCATAAATTCGGCCCGGTCAGATTCCATACAGTTTGTACCATGG… : : …ACTCGGTGGGCATAAATTCTGCCCGGTCAGATTCCATCCAGTTTGTACCATGG… …ACTCGGTGGGCATAAATTCTGCCCGGTCAGATTCCATACAGTTTGTTCCATGG… genetic markers on 0.1-1M SNPs GGGTTGGGGTGGGTTGGGGT P-value = 0.2 AAACACCCCCAAACACCCCC P-value = 1.0e-7

Association vs Linkage Analysis 4 Any disadvantages? Hypothesis-free: we search the entire genome for associations rather than focusing on small candidate areas. The need for extremely dense searches. The massive number of statistical tests performed presents a potential for false-positive results (multiple hypothesis testing) Alternative strategy – Linkage analysis It acts as systematic studies of variation, without needing to genotype at each region. Focus on a family or families.

Basic Ideas Neighboring genes on the chromosome have a tendency to stick together when passed on to offspring. Therefore, if some disease is often passed to offspring along with specific marker-genes, we can conclude that the gene(s) responsible for the disease are located close on the chromosome to these markers. 5

Outline Review: disease association studies Association vs linkage analysis Genetic linkage analysis Pedigree-based gene mapping Elston-Stewart algorithm Systems biology basics Gene expression data Gene regulatory network 6

Genetic linkage analysis Data Pedigree: set of individuals of known relationship Observed marker genotypes Phenotype data for individuals Genetic linkage analysis Goal – Relate sharing of specific chromosomal regions to phenotypic similarity Parametric methods define explicit relationship between phenotypic and genetic similarity Non-parametric methods test for increased sharing among affected individuals 7

Reading a Pedigree Circles are female, squares are males Shaded symbols are affected, half-shaded are carriers Genotypes of certain gene(s) are given for some individuals What is the probability to observe a pedigree above? 8

Elements of Pedigree Likelihood Prior probabilities For founder genotypes Transmission probabilities For offspring genotypes, given parents Penetrances For individual phenotypes, given genotype 9

Probabilistic model for a pedigree: (1) Founder (prior) probabilities Founders are individuals whose parents are not in the pedigree They may or may not be typed. Either way, we need to assign probabilities to their actual or possible genotypes. This is usually done by assuming Hardy-Weinberg equilibrium (HWE). If the frequency of D is.01, HW says Genotypes of founder couples are (usually) treated as independent Dd P(father Dd) = 2 x.01 x.99 Dd P(father Dd, mother dd) = (2 x.01 x.99) x (.99) dd

Probabilistic model for a pedigree: (2) Transmission probabilities I According to Mendel’s laws, children get their genes from their parents’ genes independently: The inheritances are independent for different children. 11 Dd P(children 3 dd | father Dd, mother dd) = ½ x ½ 1 2 Dd 3 dd

Probabilistic model for a pedigree: (2) Transmission probabilities II The factor 2 comes from summing over the two mutually exclusive and equiprobable ways 4 get a D and a d. 12 Dd P(3 dd, 4 Dd, 5DD | 1 Dd, 2 dd) = (½ x ½) x (2 x ½ x ½) x (½ x ½) 1 2 Dd dd DdDD

Probabilistic model for a pedigree: (3) Penetrance probabilities I Independent penetrance model Pedigree analyses usually suppose that, given the genotype at all loci, and in some cases age and sex, the chance of having a particular phenotype depends only on genotype at one locus, and is independent of all other factors: genotypes at other loci, environment, genotypes and phenotypes of relative, etc Complete penetrance Incomplete penetrance 13 DD P(affected | DD) = 1 P(affected | DD) =.8 DD

Probabilistic model for a pedigree: (3) Penetrance probabilities II Age & sex-dependent penetrance 14 DD (45) P(affected | DD, male, 45 y.o.) =.6

Probabilistic model for a pedigree: Putting all together I Assumptions Penetrance probabilities: P(affected | dd)=0.1, p(affected | Dd)=0.3, P(affected | DD)=0.8 Allele frequency of D is.01 The probability of this pedigree is the product: (2 x.01 x.99 x.7) x (2 x.01 x.99 x.3) x (½ x ½ x.9) x (2 x ½ x ½ x.7) x (½ x ½ x.8) 15 Dd 1 2 dd DdDD

Elements of pedigree likelihood Prior probabilities For founder genotypes e.g. P(g1), P(g2) Transmission probabilities For offspring genotypes, given parents e.g. P(g4|g1,g2) Penetrance For individual phenotypes, given genotype e.g. P(x1|g1) g1 x1 g2 x2 g4 x4 g3 x3 g5 x5 Bayesian network representation A pedigree

Elements of pedigree likelihood Overall pedigree likelihood g1 x1 g2 x2 g4 x4 g3 x3 g5 x5 Bayesian network representation A pedigree Probability of founder genotypes Probability of offspring given parents Probability of phenotypes given genotypes

Probabilistic model for a pedigree: Putting all together II To write the likelihood of a pedigree given complete data: We begin by multiplying founder gene frequencies, followed by transmission probabilities of non-founders given their parents, next penetrance probabilities of all the individuals given their genotypes. What if there are missing or incomplete data? We must sum over all mutually exclusive possibilities compatible with the observed data. All possible genotypes of individual 1 If the individual i’s genotype is known to be g i, then G i = {g i } 18

Probabilistic model for a pedigree: Putting all together II What if there are missing or incomplete data? We must sum over all mutually exclusive possibilities compatible with the observed data. ?? 1 2 Dd dd DdDD 19

Computationally … To write the likelihood of a pedigree: Computation rises exponentially with # people n. Computation rises exponentially with # markers Challenge is summation over all possible genotypes (or haplotypes) for each individual. ??

Computationally … Two algorithms: The general strategy of beginning with founders, then non-founders, and multiplying and summing as appropriate, has been codified in what is known as the Elston-Stewart algorithm for calculating probabilities over pedigrees. It is one of the two widely used approaches. The other is termed the Lander-Green algorithm and takes a quite different approach. 21

Elston and Stewart’s insight… Focus on “special pedigree” where Every person is either Related to someone in the previous generation Marrying into the pedigree No consanguineous marriages Process nuclear families, by fixing the genotype for one parent Conditional on parental genotypes, offsprings are independent 22 GfGf GmGm G o1 G on …

Elston and Stewart’s insight… Conditional on parental genotypes, offsprings are independent Thus, avoid nested sums, and produce likelihood whose cost increases linearly with the number of offspring 23 GfGf GmGm G o1 G on …

Successive Conditional Probabilities Starting at the bottom of the pedigree… Calculate conditional probabilities by fixing genotypes for one parent Specifically, calculate H k (G k ) Probability of descendants and spouse for person k Conditional on a particular genotype G k 24 G spouse G parent G o1 G on G GG G =G k

Formulae … So for each parent, calculate By convention, for individuals with no descendants 25 G spouse G parent G o1 G on G GG G Probability of o’s spouse and descendants when it’s genotype is G o

Final likelihood After processing all nuclear family units Simple sum gives the overall pedigree likelihood How? 26 P(X, given genotypes | G founder )=H founder (G founder )

What next? Computation of the pedigree likelihood For every marker, we want to Compute the pedigree likelihood for each marker and choose the marker that is closely linked to the disease gene. 27

Outline Review: disease association studies Association vs linkage analysis Genetic linkage analysis Pedigree-based gene mapping Elston-Stewart algorithm Systems biology basics Review: gene regulation Gene expression data Gene regulatory network 28

Review: Gene Regulation AGATATGTGGATTGTTAGGATTTATGCGCGTCAGTGACTACGCATGTTACGCACCTACGACTAGGTAATGATTGATC DNA AUGUGGAUUGUU AUGCGCGUC AUGUUACGCACCUAC AUGAUUGAU RNA Protein MWIV MRV MLRTY MID Gene AGATATGTGGATTGTTAGGATTTATGCGCGTCAGTGACTACGCATGTTACGCACCTACGACTAGGTAATGATTGATC Genes regulate each others’ expression and activity. AUGCGCGUC MRV Genetic regulatory network gene RNA degradation MID AUGAUUAU AUGAUUGAU MID “Gene Expression” a switch! (“transcription factor binding site”) Gene regulation transcription translation

Gene expression data 30 Experiments (samples) Genes Induced Repressed j i E ij - RNA level of gene j in experiment i Down- regulated Up- regulated Co-expression genes? ⇒ functionally related?

31 Goal: Inferring regulatory networks “Expression data” e1e1 eQeQ e6e6 … Infer the regulatory network that controls gene expression Causality relationships among e 1-Q Bayesian networks Q≈2x10 4 (for human) A and B regulate the expression of C (A and B are regulators of C) A B C Experimental conditions

32 Clustering expression profiles Data instances

33 Hierarchical agglomerative Compute all pairwise distances Data instances u Merge closest pair

34 Clustering expression profiles Data instances Co-regulated genes cluster together Infer gene function Limitations: No explanation on what caused expression of each gene (No regulatory mechanism) Limitations: No explanation on what caused expression of each gene (No regulatory mechanism)

35 Goal: Inferring regulatory networks “Expression data” e1e1 eQeQ e6e6 … Infer the regulatory network that controls gene expression Causality relationships among e 1-Q Bayesian networks Q≈2x10 4 (for human) A and B regulate the expression of C (A and B are regulators of C) A B C Experimental conditions

36 Regulatory network Bayesian network representation Xi: expression level of gene i Val(Xi): continuous Interpretation Conditional independence Causal relationships Joint distribution P(X) = X1X1 X3 X4 X5X6 X2 Conditional probability distribution (CPD)?

37 Context specificity of gene expression Context A Basal expression level Upstream region of target gene (X5) RNA level Activator (X3) activator binding site Context B Activator induces expression repressor binding site Repressor (X4) activator binding site Activator (X3) Context C Activator + repressor decrease expression X1X1 X3 X4 X5X6 X2 ?

38 Context specificity of gene expression Context A Basal expression level Upstream region of target gene (X5) RNA level Activator (X3) activator binding site Context B Activator induces expression repressor binding site Repressor (X4) activator binding site Activator (X3) Context C Activator + repressor decrease expression X1X1 X3 X4 X5X6 X2 ? true false X4  -3 P(Level) Level... truefalse 3 P(Level) Level 0 P(Level) Level X3  Context A Context CContext B

39 Parameterization Tree conditional probability distributions (CPDs) mean (μ) & variance (σ 2 ) of the normal distribution in each context X1X1 X3 X4 X5X6 X2 Tree CPD true false X4  -3 P(Level) Level... truefalse 3 P(Level) Level 0 P(Level) Level X3  Context A Context BContext C (μA,σA)(μA,σA) (μB,σB)(μB,σB)(μC,σC)(μC,σC)

40 Reconstructing the network Training data Gene expression data Goal Learn the structure & tree CPDs parameters X1X1 X3 X4 X5X6 X2 true false X4  -3 P(Level) Level... truefalse 3 P(Level) Level 0 P(Level) Level X3  Context A Context B Context C (μA,σA)(μA,σA) (μB,σB)(μB,σB) (μC,σC)(μC,σC)

41 Learning Structure learning [Koller & Friedman] Constraint based approaches Score based approaches Bayesian model averaging Scoring function Likelihood score Bayesian score X1X1 X3 X4 X5X6 X2 Given a set of all possible network structures and the scoring function that measures how well the model fits the observed data, we try to select the highest scoring network structure.