Estimating the distribution of the incubation period of HIV/AIDS Marloes H. Maathuis Joint work with: Piet Groeneboom and Jon A. Wellner.

Presentation transcript:

Estimating the distribution of the incubation period of HIV/AIDS Marloes H. Maathuis Joint work with: Piet Groeneboom and Jon A. Wellner

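Incubation period: the time between HIV infection and onset of AIDS. Example: HIV infection in 1985, onset of AIDS in 1996, incubation period 11 years.

Figure: timeline starting in 1980 with HIV infection and onset of AIDS marked.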

Censored data: only an interval of HIV infection and an interval of onset of AIDS are observed. Example: lower bound of incubation period 6 years, upper bound of incubation period 13 years.
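A small worked example of how such bounds arise: if infection is known to lie in [x_lo, x_hi] and onset of AIDS in [y_lo, y_hi], the incubation period lies between y_lo - x_hi (floored at 0) and y_hi - x_lo. The years below are hypothetical, chosen only so that the bounds match the 6 and 13 years on the slide; the function name is also an assumption.

```python
def incubation_bounds(x_lo, x_hi, y_lo, y_hi):
    """Bounds on Y - X given X in [x_lo, x_hi] and Y in [y_lo, y_hi]."""
    lower = max(0, y_lo - x_hi)  # latest possible infection, earliest possible onset
    upper = y_hi - x_lo          # earliest possible infection, latest possible onset
    return lower, upper


# Hypothetical intervals (not taken from the slides).
bounds = incubation_bounds(1983, 1986, 1992, 1996)
print(bounds)  # (6, 13)
```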

Figure: observation rectangle in the plane with axes X (HIV) and Y (AIDS), starting at 1980; its sides are the interval of HIV infection and the interval of onset of AIDS.

Distribution functions. Goal: estimate the distribution function of the incubation period of HIV/AIDS. Why? This is important for predicting the future course of the epidemic. Strategy: first estimate the 2-dimensional distribution.

Main focus: nonparametric maximum likelihood estimator (MLE) for the 2-dimensional distribution. Computational aspects. Theoretical properties (consistency).

Computation of the MLE. Parameter reduction: determine the inner rectangles. Optimization: determine the amounts of mass assigned to the inner rectangles.

Inner rectangles (figure: observation rectangles in the plane with axes X (HIV) and Y (AIDS)). The MLE is insensitive to the distribution of mass within the inner rectangles. This gives non-uniqueness.

Figure: masses α_1, α_2, α_3, α_4 assigned to the inner rectangles in the plane with axes X (HIV) and Y (AIDS), with a maximization problem subject to (s.t.) two constraints (formulas not preserved). The α_i's are not always uniquely determined: second type of non-uniqueness.

Graph theory. Set of rectangles: R1, R2, R3, R4, R5. Intersection graph: one vertex per rectangle, with an edge whenever two rectangles intersect. The maximal cliques correspond to the inner rectangles. Maximal cliques: {R1,R2,R3}, {R3,R4}, {R4,R5}, {R2,R5}.
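The correspondence between maximal cliques and inner rectangles can be checked by brute force on a small example. The sketch below builds the intersection graph and enumerates maximal cliques with a basic Bron-Kerbosch recursion; it is an illustration only and is not the MaxCliqueFinder or SimpleCliqueFinder algorithm. The coordinates are hypothetical, chosen so that the rectangles reproduce the clique structure listed on the slide, and all function names are assumptions.

```python
def intersects(a, b):
    """True if closed rectangles a, b = (x1, x2, y1, y2) share a point."""
    return (max(a[0], b[0]) <= min(a[1], b[1])
            and max(a[2], b[2]) <= min(a[3], b[3]))


def maximal_cliques(adj):
    """Enumerate maximal cliques with basic Bron-Kerbosch (no pivoting)."""
    found = []

    def expand(r, p, x):
        if not p and not x:
            found.append(frozenset(r))
            return
        for v in sorted(p):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(frozenset(), frozenset(adj), frozenset())
    return found


def inner_rectangle(clique, rects):
    """Common intersection of all rectangles in a clique."""
    members = [rects[name] for name in clique]
    return (max(m[0] for m in members), min(m[1] for m in members),
            max(m[2] for m in members), min(m[3] for m in members))


# Hypothetical coordinates (x1, x2, y1, y2), not taken from the slides.
rects = {
    "R1": (1, 3, 1, 3),
    "R2": (0, 10, 0, 2),
    "R3": (0, 2, 0, 10),
    "R4": (0, 10, 8, 10),
    "R5": (8, 10, 0, 10),
}
adj = {u: {v for v in rects if v != u and intersects(rects[u], rects[v])}
       for u in rects}
cliques = maximal_cliques(adj)
for clique in cliques:
    print(sorted(clique), inner_rectangle(clique, rects))
```

For axis-parallel rectangles, pairwise intersection implies a common intersection, so each clique printed above yields a non-empty inner rectangle.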

Existing reduction algorithms: Betensky and Finkelstein (1999); Gentleman and Vandal (2001, 2002); Song (2001). These algorithms are slow, with complexity O(n^4) to O(n^5).

New algorithms. MaxCliqueFinder: complexity ≤ O(n^2 log n). SimpleCliqueFinder: complexity O(n^2).

Segment tree (figure: rectangles R1, R2, R3, R4, R5).

Maximal cliques: {R5,R2}, {R3,R1,R2} (figure: rectangles R1, R2, R3, R4, R5).

SimpleCliqueFinder

Computation of the MLE. Parameter reduction: determine the inner rectangles. Optimization: determine the amounts of mass assigned to the inner rectangles.

Optimization High-dimensional convex constrained optimization problem
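One standard way to approach a problem of this kind is the self-consistency (EM) iteration for mixture proportions. The slides do not say which optimizer was used, so this is a swapped-in illustration, not the authors' method. It assumes the usual censored-data likelihood, in which observation i contributes the total mass of the inner rectangles contained in its observation rectangle: maximize the product over i of sum_j c_ij α_j, subject to α_j ≥ 0 and sum_j α_j = 1, where c_ij = 1 if inner rectangle j lies inside observation rectangle R_i. The matrix below encodes the five-rectangle example from the graph theory slide; the function names are assumptions.

```python
import math


def em_masses(C, n_iter=500):
    """Self-consistency (EM) iteration for the masses on the inner rectangles.

    C[i][j] is 1 if inner rectangle j lies inside observation rectangle i.
    """
    n, m = len(C), len(C[0])
    alpha = [1.0 / m] * m  # start from the uniform distribution
    for _ in range(n_iter):
        # Mass that each observation rectangle receives under the current alpha.
        p = [sum(C[i][j] * alpha[j] for j in range(m)) for i in range(n)]
        # Multiplicative update; it keeps alpha non-negative and summing to 1.
        alpha = [alpha[j] * sum(C[i][j] / p[i] for i in range(n)) / n
                 for j in range(m)]
    return alpha


def log_likelihood(C, alpha):
    return sum(math.log(sum(c * a for c, a in zip(row, alpha))) for row in C)


# Rows: R1..R5. Columns: {R1,R2,R3}, {R3,R4}, {R4,R5}, {R2,R5}.
C = [
    [1, 0, 0, 0],
    [1, 0, 0, 1],
    [1, 1, 0, 0],
    [0, 1, 1, 0],
    [0, 0, 1, 1],
]
alpha = em_masses(C)
print([round(a, 3) for a in alpha])  # approximately [0.6, 0.0, 0.4, 0.0]
```

EM is simple but can be slow on large problems; it is used here only because it fits in a few lines.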

Amsterdam Cohort Study among injecting drug users. Open cohort study. Data available from 1985 to [end year missing]. [Number missing] individuals were enrolled. 216 individuals tested positive for HIV during the study.

Model. X: time of HIV infection. Y: time of onset of AIDS. Z = Y - X: incubation period. U_1, U_2: observation times for X. C: censoring variable for Y. (X, Y) and (U_1, U_2, C) are independent.

Figure: time axis with the observation times u_1 and u_2 for HIV and the time t = min(c, y) for AIDS. We observe: W = (U_1, U_2, T = min(C, Y), Δ).
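A rough sketch of the region of the (X, Y) plane that one such observation is consistent with. Several things here are assumptions not stated on the slides: that Δ = 1 means Y was observed exactly (Y ≤ C) and Δ = 0 means Y was censored at C, and that the interval containing X is recorded by a separate code, here called x_case. The function name is hypothetical as well.

```python
import math


def observation_set(u1, u2, t, x_case, delta):
    """Return ((x_lo, x_hi), (y_lo, y_hi)) for one observation.

    x_case (hypothetical encoding): 0 if X <= u1, 1 if u1 < X <= u2, 2 if X > u2.
    delta (assumed meaning): 1 if Y = t was observed, 0 if Y was censored at t.
    """
    x_ranges = [(-math.inf, u1), (u1, u2), (u2, math.inf)]
    x_lo, x_hi = x_ranges[x_case]
    if delta == 1:
        y_lo, y_hi = t, t         # Y known exactly: the set is a horizontal line segment
    else:
        y_lo, y_hi = t, math.inf  # Y only known to exceed t
    return (x_lo, x_hi), (y_lo, y_hi)


# Hypothetical observation: X between the two observation times, AIDS observed at 1996.
print(observation_set(1986, 1988, 1996, x_case=1, delta=1))
```

Under these assumptions, an uncensored Y produces a set of zero height rather than a rectangle with positive area.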

Inconsistency of the naive MLE

Methods to repair inconsistency: transform the lines into strips; MLE on a sieve of piecewise constant densities; Kullback-Leibler approach.

How to estimate P(Y - X ≤ z)? (Figure: plane with axes X (HIV) and Y (AIDS).)

The distribution function of the incubation period cannot be estimated consistently. What we can estimate consistently is P(Z ≤ z, Y ≤ 1997).

Conclusions (1). We found the graph theoretic framework very useful. Our algorithms for the parameter reduction step are significantly faster than other methods. We proved that in general the naive MLE is an inconsistent estimator for our AIDS model.

Conclusions (2). We explored several methods to repair the inconsistency. The MLE can be very sensitive to small changes in the data. There is not enough information to estimate the incubation period consistently without making additional assumptions.