Semiparametric Methods for Colonic Crypt Signaling Raymond J. Carroll Department of Statistics Faculty of Nutrition and Toxicology Texas A&M University.

Slides:



Advertisements
Similar presentations
EigenFaces and EigenPatches Useful model of variation in a region –Region must be fixed shape (eg rectangle) Developed for face recognition Generalised.
Advertisements

Lesson 10: Linear Regression and Correlation
Brief introduction on Logistic Regression
ECE 8443 – Pattern Recognition ECE 8423 – Adaptive Signal Processing Objectives: The Linear Prediction Model The Autocorrelation Method Levinson and Durbin.
ECE 8443 – Pattern Recognition ECE 8423 – Adaptive Signal Processing Objectives: Periodograms Bartlett Windows Data Windowing Blackman-Tukey Resources:
Slide 1 Bayesian Model Fusion: Large-Scale Performance Modeling of Analog and Mixed- Signal Circuits by Reusing Early-Stage Data Fa Wang*, Wangyang Zhang*,
Bayesian inference “Very much lies in the posterior distribution” Bayesian definition of sufficiency: A statistic T (x 1, …, x n ) is sufficient for 
Lecture 6 (chapter 5) Revised on 2/22/2008. Parametric Models for Covariance Structure We consider the General Linear Model for correlated data, but assume.
Ch11 Curve Fitting Dr. Deshi Ye
Raymond J. Carroll Texas A&M University Non/Semiparametric Regression and Clustered/Longitudinal Data.
Data mining and statistical learning - lecture 6
Basis Expansion and Regularization Presenter: Hongliang Fei Brian Quanz Brian Quanz Date: July 03, 2008.
A Similarity Analysis of Curves: A Comparison of the Distribution of Gangliosides in Brains of Old and Young Rats. Yolanda Munoz Maldonado Department of.
Raymond J. Carroll Texas A&M University Postdoctoral Training Program: Non/Semiparametric.
Raymond J. Carroll Texas A&M University Postdoctoral Training Program: Non/Semiparametric.
Raymond J. Carroll Texas A&M University Nonparametric Regression and Clustered/Longitudinal Data.
Raymond J. Carroll Department of Statistics and Nutrition Texas A&M University Non/Semiparametric Regression.
Curve-Fitting Regression
Nonparametric Regression and Clustered/Longitudinal Data
Score Tests in Semiparametric Models Raymond J. Carroll Department of Statistics Faculties of Nutrition and Toxicology Texas A&M University
Raymond J. Carroll Texas A&M University Postdoctoral Training Program: Non/Semiparametric.
1 Simple Linear Regression Chapter Introduction In this chapter we examine the relationship among interval variables via a mathematical equation.
Deterministic Solutions Geostatistical Solutions
Model Selection in Semiparametrics and Measurement Error Models Raymond J. Carroll Department of Statistics Faculty of Nutrition and Toxicology Texas A&M.
Chapter 11 Multiple Regression.
1 Analysis of DNA Damage and Repair in Colonic Crypts Raymond J. Carroll Texas A&M University Postdoctoral.
Space-time Modelling Using Differential Equations Alan E. Gelfand, ISDS, Duke University (with J. Duan and G. Puggioni)
1 An Introduction to Nonparametric Regression Ning Li March 15 th, 2004 Biostatistics 277.
Statistical Tools for Environmental Problems NRCSE.
Correlation and Linear Regression
Objectives of Multiple Regression
Linear Regression and Correlation
Session 4. Applied Regression -- Prof. Juran2 Outline for Session 4 Summary Measures for the Full Model –Top Section of the Output –Interval Estimation.
With many thanks for slides & images to: FIL Methods group, Virginia Flanagin and Klaas Enno Stephan Dr. Frederike Petzschner Translational Neuromodeling.
Kalman filtering techniques for parameter estimation Jared Barber Department of Mathematics, University of Pittsburgh Work with Ivan Yotov and Mark Tronzo.
Interactive Graphics Lecture 9: Slide 1 Interactive Graphics Lecture 9: Introduction to Spline Curves.
Copyright © The McGraw-Hill Companies, Inc. Permission required for reproduction or display. 1 Part 4 Curve Fitting.
Applications The General Linear Model. Transformations.
CJT 765: Structural Equation Modeling Class 7: fitting a model, fit indices, comparingmodels, statistical power.
ECE 8443 – Pattern Recognition ECE 8423 – Adaptive Signal Processing Objectives: Deterministic vs. Random Maximum A Posteriori Maximum Likelihood Minimum.
McGraw-Hill/Irwin Copyright © 2010 by The McGraw-Hill Companies, Inc. All rights reserved. Chapter 13 Linear Regression and Correlation.
Curve-Fitting Regression
Chapter 4 Linear Regression 1. Introduction Managerial decisions are often based on the relationship between two or more variables. For example, after.
Machine Learning Weak 4 Lecture 2. Hand in Data It is online Only around 6000 images!!! Deadline is one week. Next Thursday lecture will be only one hour.
Multiple Regression. Simple Regression in detail Y i = β o + β 1 x i + ε i Where Y => Dependent variable X => Independent variable β o => Model parameter.
Wavelet-based Nonparametric Modeling of Hierarchical Functions in Colon Carcinogenesis Jeffrey S. Morris M.D. Anderson Cancer Center Joint work with Marina.
Estimation Method of Moments (MM) Methods of Moment estimation is a general method where equations for estimating parameters are found by equating population.
Review of fundamental 1 Data mining in 1D: curve fitting by LLS Approximation-generalization tradeoff First homework assignment.
Correlation & Regression Analysis
Effect of the Reference Set on Frequency Inference Donald A. Pierce Radiation Effects Research Foundation, Japan Ruggero Bellio Udine University, Italy.
Gaussian Process and Prediction. (C) 2001 SNU CSE Artificial Intelligence Lab (SCAI)2 Outline Gaussian Process and Bayesian Regression  Bayesian regression.
G Lecture 71 Revisiting Hierarchical Mixed Models A General Version of the Model Variance/Covariances of Two Kinds of Random Effects Parameter Estimation.
Statistical Methods. 2 Concepts and Notations Sample unit – the basic landscape unit at which we wish to establish the presence/absence of the species.
©The McGraw-Hill Companies, Inc. 2008McGraw-Hill/Irwin Linear Regression and Correlation Chapter 13.
Sequence Kernel Association Tests (SKAT) for the Combined Effect of Rare and Common Variants 統計論文 奈良原.
Methods of multivariate analysis Ing. Jozef Palkovič, PhD.
CS Statistical Machine learning Lecture 7 Yuan (Alan) Qi Purdue CS Sept Acknowledgement: Sargur Srihari’s slides.
Estimating standard error using bootstrap
Chapter 4 Basic Estimation Techniques
Chapter 7. Classification and Prediction
ECE3340 Numerical Fitting, Interpolation and Approximation
The general linear model and Statistical Parametric Mapping
CJT 765: Structural Equation Modeling
Two-Variable Regression Model: The Problem of Estimation
Where did we stop? The Bayes decision rule guarantees an optimal classification… … But it requires the knowledge of P(ci|x) (or p(x|ci) and P(ci)) We.
The general linear model and Statistical Parametric Mapping
Product moment correlation
Parametric Methods Berlin Chen, 2005 References:
Linear Regression and Correlation
Presentation transcript:

Semiparametric Methods for Colonic Crypt Signaling Raymond J. Carroll Department of Statistics Faculty of Nutrition and Toxicology Texas A&M University

Outline Problem: Modeling correlations among functions and apply to a colon carcinogenesis experiment Biological background Semiparametric framework Nonparametric Methods Asymptotic summary Analysis Summary

Biologist Who Did The Work Meeyoung Hong Postdoc in the lab of Joanne Lupton and Nancy Turner Data collection represents a year of work

Basic Background Apoptosis: Programmed cell death Cell Proliferation: Effectively the opposite p27: Differences in this marker are thought to stimulate and be predictive of apoptosis and cell proliferation Our experiment: understand some of the structure of p27 in the colon when animals are exposed to a carcinogen

Data Collection Structure of Colon Note the finger-like projections These are colonic crypts We measure expression of cells within colonic crypts

Data Collection p27 expression : Measured by staining techniques Brighter intensity = higher expression Done on a cell by cell basis within selected colonic crypts Very time intensive

Data Collection Animals sacrificed at 4 times : 0 = control, 12hr, 24hr and 48hr after exposure Rats: 12 at each time period Crypts: 20 are selected Cells: all cells collected, about 30 per crypt p27: measured on each cell, with logarithmic transformation

Nominal Cell Position X = nominal cell position Differentiated cells: at top, X = 1.0 Proliferating cells: in middle, X=0.5 Stem cells: at bottom, X=0

Standard Model Hierarchical structure: cells within crypts within rats within times

Standard Model Hierarchical structure: cell locations of cells within crypts within rats within times In our experiment, the residuals from fits at the crypt level are essentially white noise However, we also measured the location of the colonic crypts

Crypt Locations at 24 Hours, Nominal zero Scale: 1000’s of microns Our interest: relationships at between microns

Standard Model Hypothesis: it is biologically plausible that the nearer the crypts to one another, the greater the relationship of overall p27 expression. Expectation: The effect of the carcinogen might well alter the relationship over time Technically: Functional data where the functions are themselves correlated

Basic Model Two-Levels: Rat and crypt level functions Rat-Level: Modeled either nonparametrically or semiparametrically Semiparametrically: low-order regression spline Nonparametrically: some version of a kernel fit

Basic Model Crypt-Level: A regression spline, with few knots, in a parametric mixed-model formulation Linear spline: In practice, we used a quadratic spline. The covariance matrix of the quadratic part used the Pourahmadi- Daniels construction

Advertisement Please buy our book! Or steal it David Ruppert Matt Wand

Basic Model Crypt-Level: We modeled the functions across crypts semiparametrically as regression splines The covariance matrix of the parameters of the spline modeled as separable Matern family used to model correlations

Basic Model Crypt-Level: regression spline, few knots Separable covariance structure

Xihong Lin Theory for Parametric Version The semiparametric approach we used fits formally into a new general theory Recall that we fit the marginal function at the rat level “nonparametrically”. At the crypt level, we used a parametric mixed-model representation of low-order regression splines

General Formulation Y ij = Response X ij, Z ij = cell and crypt locations Likelihood (or criterion function) The key is that the function is evaluated multiple times for each rat This distinguishes it from standard semiparametric models The goal is to estimate and efficiently

General Formulation: Overview Likelihood (or criterion function) For iid versions of these problems, we have developed constructive kernel-based methods of estimation with Asymptotic expansions and inference available If the criterion function is a likelihood function, then the methods are semiparametric efficient. Methods avoid solving integral equations

General Formulation: Overview We also show The equivalence of profiling and backfitting Pseudo-likelihood calculations The results were submitted 7 months ago, and we confidently expect 1 st reviews within the next year or two, depending on when the referees open their mail

General Formulation: Overview In the application, modifications necessary both for theoretical and practical purposes Techniques described in the talk of Tanya Apanasovich can be used here as well to cut down on the computational burden due to the correlated functions.

Nonparametric Fits Equal Spacing: Assume cell locations are equally spaced Define = covariance between crypt- level functions that are  apart, one at location x 1 and the other at location x 2 Assume separable covariance structure

Nonparametric Fits Discrete Version: Pretend  x 1 and x 2 take on a small discrete set of values (we actually use a kernel-version of this idea) Form the sample covariance matrix per rat at  x 1 and x 2, then average across rats. Call this estimate

Nonparametric Fits Separability: Now use the separability to get a rough estimate of the correlation surface.

Nonparametric Fits The estimate is not a proper correlation function We fixed it up using a trick due to Peter Hall (1994, Annals), thus forming, a real correlation function Basic idea is to do a Fourier transform, force it to be non-negative, then invert Slower rates of convergence than the parametric fit, more variability, etc. Asymptotics worked out (non-trivial)

Semiparametric Method Details In the example, a cubic polynomial actually suffices for the rat-level functions Maximize in the covariance structure of the crypt-level splines, including Matern-order The covariance structure includes Matern order m (gridded) Matern parameter  Spline smoothing parameter Quadratic part’s covariance matrix (Pourahmadi- Daniels) AR(1) for residuals

Results: Part I Matern Order: Matern order of 0.5 is the classic autoregressive model Our Finding: For all times, an order of about 0.15 was the maximizer Simulations: Can distinguish from autocorrelation Different Matern orders lead to different interpretations about the extent of the correlations

Marginal over time Spline Fits Note time effects Location effects

24-Hour Fits: The Matern order matters

Nonparametric and Semiparametric Fits at 24 Hours

Comparison of Fits The semiparametric and nonparametric fits are roughly similar At 200 microns, quite far apart, the correlations in the functions are approximately 0.40 for both Surprising degree of correlation

Summary We have studied the problem of crypt-signaling in colon carcinogenesis experiments Technically, this is a problem of hierarchical functional data where the functions are not independent in the standard manner We developed (efficient, constructive) semiparametric and nonparametric methods, with asymptotic theory The correlations we see in the functions are surprisingly large.

Statistical Collaborators Yehua Li Naisyin Wang

Summary Insiration for this work: Water goanna, Kimberley Region, Australia