Human Growth: From data to functions

Challenges to measuring growth
We need repeated and regular access to subjects for up to 20 years. Height changes over the course of the day, so it must be measured at a fixed time. Height is measured in the supine position in infancy and as standing height afterwards; the switch involves an adjustment of about 1 cm. Measurement error is about 0.5 cm in later years, but rather larger in infancy.

Challenges to functional modeling
We want smooth curves that fit the data as well as is reasonable. We will want to look at velocity and acceleration, so the curves must remain smooth after being differentiated twice. In principle the curves should also be monotone, i.e., have a positive derivative.

The monotonicity problem
Daily measurements of a newborn’s tibia show that, over the short term, growth takes place in spurts. This baby’s tibia grows as fast as 2 mm/day! How can we fit a smooth monotone function?

Weighted sums of basis functions
We need a flexible method for constructing curves to fit the data. We begin with a set of basic functional building blocks φ_k(t), k = 1, …, K, called basis functions. Our fitting function x(t) is a weighted sum of these:
$$ x(t) = \sum_{k=1}^{K} c_k \, \phi_k(t) $$
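As a minimal sketch (Python; the helper name `make_curve` and the three-function basis are hypothetical, chosen only for illustration), a fitted curve is nothing more than a fixed list of basis functions plus a coefficient vector, and evaluating it means forming the weighted sum x(t) = Σ c_k φ_k(t):

```python
import math
from typing import Callable, Sequence

Basis = Sequence[Callable[[float], float]]


def make_curve(basis: Basis, coefs: Sequence[float]) -> Callable[[float], float]:
    """Return the function x(t) = sum_k coefs[k] * basis[k](t)."""
    if len(basis) != len(coefs):
        raise ValueError("need exactly one coefficient per basis function")

    def x(t: float) -> float:
        # weighted sum of the basis functions evaluated at t
        return sum(c * phi(t) for c, phi in zip(coefs, basis))

    return x


# Illustrative basis: a constant plus one sine/cosine pair with period 1.
basis = [
    lambda t: 1.0,
    lambda t: math.sin(2 * math.pi * t),
    lambda t: math.cos(2 * math.pi * t),
]
x = make_curve(basis, [2.0, 0.5, -1.0])
print(x(0.0))  # 1.0  (2.0 * 1 + 0.5 * 0 - 1.0 * 1)
```

Changing the curve only means changing the coefficients c_k while the basis stays fixed, which is what turns curve fitting into a linear problem in the c_k.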

What are the main choices for basis functions?
Fourier series: a constant term, then a sine/cosine pair at a fixed base frequency, followed by a series of sine/cosine pairs at integer multiples of the base frequency. Fourier series are best for periodic data.

Five Fourier basis functions
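Five Fourier basis functions are presumably the constant plus the first two sine/cosine pairs. A minimal Python sketch, assuming a known period T with base frequency ω = 2π/T; the function name `fourier_basis` and the ordering 1, sin, cos, sin, cos, … are our illustrative choices:

```python
import math
from typing import List


def fourier_basis(t: float, nbasis: int, period: float) -> List[float]:
    """Evaluate the first `nbasis` Fourier basis functions at t.

    Ordering used here: 1, sin(w t), cos(w t), sin(2 w t), cos(2 w t), ...
    with base frequency w = 2 * pi / period.
    """
    omega = 2.0 * math.pi / period
    values = [1.0]  # constant term
    m = 1
    while len(values) < nbasis:
        values.append(math.sin(m * omega * t))      # sine at m times the base frequency
        if len(values) < nbasis:
            values.append(math.cos(m * omega * t))  # matching cosine
        m += 1
    return values


# Five basis functions with period 1, evaluated at t = 0.
print(fourier_basis(0.0, 5, 1.0))  # [1.0, 0.0, 1.0, 0.0, 1.0]
```

Every function in this basis repeats with period T, so any weighted sum of them does too; that is why the basis suits periodic data and is a poor fit for a growth curve that keeps rising.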

B-splines
These basis functions are piecewise polynomials defined by a set of discrete values called knots. The order of the polynomials (degree + 1) controls their smoothness: with simple (non-repeated) knots, a spline of order m has m - 2 continuous derivatives. Each basis function is nonzero only over a number of contiguous inter-knot intervals equal to the order. Polynomials of degree less than the order are special cases of B-spline expansions, and are thus included within the system.
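A minimal sketch of how B-spline values can be computed, using the standard Cox-de Boor recursion in plain Python. The knot convention (each boundary knot repeated `order` times) and the name `bspline_basis` are assumptions made for illustration; this is not the code inside the Matlab toolbox used below.

```python
from typing import List, Sequence


def bspline_basis(t: float, knots: Sequence[float], order: int) -> List[float]:
    """Evaluate every B-spline basis function of the given order at t.

    `knots` is the full non-decreasing knot sequence with each boundary knot
    repeated `order` times; it defines len(knots) - order basis functions.
    """
    n = len(knots)
    # Order 1: indicator of each half-open knot interval [k_i, k_{i+1}).
    b = [1.0 if knots[i] <= t < knots[i + 1] else 0.0 for i in range(n - 1)]
    if t == knots[-1]:
        # Close the last non-empty interval so the right end point is covered.
        for i in range(n - 2, -1, -1):
            if knots[i] < knots[i + 1]:
                b[i] = 1.0
                break
    # Cox-de Boor recursion: build order m from order m - 1 (0/0 is taken as 0).
    for m in range(2, order + 1):
        nxt = []
        for i in range(n - m):
            left_den = knots[i + m - 1] - knots[i]
            right_den = knots[i + m] - knots[i + 1]
            left = (t - knots[i]) / left_den * b[i] if left_den > 0 else 0.0
            right = (knots[i + m] - t) / right_den * b[i + 1] if right_den > 0 else 0.0
            nxt.append(left + right)
        b = nxt
    return b


# Five order-2 ("hat") B-splines on [0, 4] with interior knots 1, 2, 3.
vals = bspline_basis(1.5, [0, 0, 1, 2, 3, 4, 4], order=2)
print(len(vals), sum(vals))  # 5 1.0
```

At any t the values are non-negative, sum to 1, and at most `order` of them are nonzero, which is the local-support property described above.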

When should I use B-splines?
B-splines are the basis of choice for most non-periodic data. They give complete control over flexibility, allowing more flexibility where it is needed and less where it is not. Computing with B-splines is extremely efficient, because each basis function is nonzero over only a few inter-knot intervals.

Five order 2 B-spline basis functions: A basis for polygonal lines

Eight order 4 B-spline basis functions: A basis for twice-differentiable functions

B-splines for growth data
We use order 6 B-splines because we want to differentiate the result at least twice. Order 4 splines look smooth, but their second derivatives are rough: continuous, but only piecewise linear, with kinks at the knots. We place a knot at each of the 31 ages. The total number of basis functions equals the order plus the number of interior knots; the knots at ages 1 and 18 are boundary knots, so this is 6 + 29 = 35 in this case.
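The counting rule is simple enough to write out as a Python sketch (the helper names are hypothetical). It also encodes the standard fact that, with simple knots, an order-m spline has m - 2 continuous derivatives:

```python
def n_bspline_basis(order: int, n_interior_knots: int) -> int:
    """Number of B-spline basis functions = order + number of interior knots."""
    return order + n_interior_knots


def continuous_derivatives(order: int) -> int:
    """Continuous derivatives of an order-`order` spline with simple (non-repeated) knots."""
    return order - 2


n_knots = 31               # one knot at each of the 31 measurement ages
n_interior = n_knots - 2   # the knots at ages 1 and 18 are boundary knots
print(n_bspline_basis(6, n_interior))                        # 35
print(continuous_derivatives(4), continuous_derivatives(6))  # 2 4
```

With order 4 the acceleration is only continuous; with order 6 it still has two continuous derivatives of its own, which is what the growth analysis needs.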

Isn’t using 35 basis functions to fit 31 observations a problem?
Yes. We will fit each observation exactly, which ignores the fact that the measurement error is typically about 0.5 cm. But we’ll fix this later, when we look at roughness penalties.

Okay, let’s see what happens
These two Matlab commands define the basis and fit the data:

    % order-6 B-spline basis on [1,18]: 35 functions, knots at the 31 ages
    hgtbasis = create_bspline_basis([1,18], 35, 6, age);
    % fit the matrix of heights hgtfmat, observed at the ages in age
    hgtfd = data2fd(hgtfmat, age, hgtbasis);

Why we need to smooth
Noise in the data has a huge impact on derivative estimates.

Please let me smooth the data!
This command sets up 12 B-spline basis functions defined by equally spaced knots. This gives us about the right amount of fitting power given the error level.

    hgtbasis = create_bspline_basis([1,18], 12, 6);
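Inverting the counting rule (number of basis functions = order + number of interior knots) shows what this basis looks like: with order 6, 12 basis functions leave 12 - 6 = 6 interior knots, which split [1, 18] into 7 equal intervals. A minimal Python sketch; `equally_spaced_breaks` is a hypothetical helper, not a toolbox function:

```python
from typing import List


def equally_spaced_breaks(lo: float, hi: float, nbasis: int, order: int) -> List[float]:
    """Boundary plus interior knots for an equally spaced B-spline basis.

    Uses nbasis = order + number of interior knots, so there are
    nbasis - order interior knots and nbasis - order + 1 equal intervals.
    """
    n_interior = nbasis - order
    if n_interior < 0:
        raise ValueError("nbasis must be at least the order")
    n_intervals = n_interior + 1
    return [lo + (hi - lo) * i / n_intervals for i in range(n_intervals + 1)]


breaks = equally_spaced_breaks(1.0, 18.0, nbasis=12, order=6)
print(len(breaks))                      # 8 (two boundary knots + 6 interior knots)
print(round(breaks[1] - breaks[0], 4))  # 2.4286 (= 17 / 7 years between knots)
```

With only 12 coefficients per curve the fit can no longer pass through all 31 observations, so least squares is forced to average out some of the measurement noise.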

These velocities are much better. They go negative on the right, though, which a growth curve’s velocity should never do.

Let’s see some accelerations
These acceleration curves are too unstable at the ends. We need something better.

A measure of roughness
What do we mean by “smooth”? A smooth function has limited curvature, and curvature depends on the second derivative. A straight line is completely smooth.

Total curvature
We can measure the roughness of a function x(t) by integrating its squared second derivative, written D²x(t):
$$ \mathrm{PEN}_2(x) = \int [D^2 x(t)]^2 \, dt $$
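To make PEN_2(x) = ∫ [D²x(t)]² dt concrete, here is a rough numerical sketch in Python. It approximates D²x by central second differences and the integral by the trapezoidal rule; both choices, and the name `total_curvature`, are ours for illustration, not how a spline package would compute the penalty.

```python
import math
from typing import Callable


def total_curvature(x: Callable[[float], float], a: float, b: float,
                    n: int = 2000, h: float = 1e-3) -> float:
    """Approximate the integral over [a, b] of (D^2 x(t))^2 dt.

    x must be defined slightly beyond [a, b], because the finite
    differences evaluate it at t - h and t + h.
    """
    def d2x(t: float) -> float:
        # central second difference approximation to D^2 x(t)
        return (x(t + h) - 2.0 * x(t) + x(t - h)) / (h * h)

    step = (b - a) / n
    vals = [d2x(a + i * step) ** 2 for i in range(n + 1)]
    # trapezoidal rule on the squared second derivative
    return step * (sum(vals) - 0.5 * (vals[0] + vals[-1]))


print(round(total_curvature(lambda t: 3.0 * t + 1.0, 0.0, 1.0), 6))  # 0.0 (a straight line)
print(round(total_curvature(lambda t: t * t, 0.0, 1.0), 3))          # 4.0 (D^2 x = 2 everywhere)
```

A straight line scores zero, and any bending raises the score, matching the intuition that a line is completely smooth.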

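Total curvature of acceleration
Since we want acceleration to be smooth, we measure roughness at the level of acceleration. Acceleration is D²x(t), so its second derivative is D⁴x(t):
$$ \mathrm{PEN}_4(x) = \int [D^4 x(t)]^2 \, dt $$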

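The penalized least squares criterion
We strike a compromise between fitting the data and keeping the fit smooth by minimizing
$$ \mathrm{PENSSE}_\lambda(x) = \sum_{j=1}^{n} [y_j - x(t_j)]^2 + \lambda \int [D^4 x(t)]^2 \, dt $$
where y_j is the height observed at age t_j and λ ≥ 0 is a smoothing parameter.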

How does this control roughness?
The smoothing parameter λ controls roughness. When λ = 0, only fitting the data matters. As λ increases, we place more and more emphasis on penalizing roughness. As λ → ∞, only roughness matters, and functions having zero roughness are used (with the acceleration penalty, these are the cubic polynomials, for which D⁴x(t) = 0).
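The effect of λ is easy to see in a discrete analogue of the penalized criterion. The Python sketch below is a Whittaker smoother, a swapped-in technique rather than the B-spline method of these slides: it estimates fitted values z_j directly at equally spaced points and replaces the integrated squared derivative by a sum of squared second differences, minimizing Σ (y_j - z_j)² + λ Σ (z_j - 2z_{j+1} + z_{j+2})². Because this penalty is on D²x rather than D⁴x, its zero-roughness limit is a straight line rather than a cubic. The data values are hypothetical.

```python
from typing import List


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a @ z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        z[r] = (m[r][n] - sum(m[r][c] * z[c] for c in range(r + 1, n))) / m[r][r]
    return z


def whittaker_smooth(y: List[float], lam: float) -> List[float]:
    """Minimize sum (y_j - z_j)^2 + lam * sum (z_j - 2 z_{j+1} + z_{j+2})^2."""
    n = len(y)
    # Rows of the second-difference operator D.
    d_rows = [[1.0 if c in (j, j + 2) else (-2.0 if c == j + 1 else 0.0)
               for c in range(n)] for j in range(n - 2)]
    # Normal equations: (I + lam * D^T D) z = y.
    a = [[(1.0 if r == c else 0.0) + lam * sum(row[r] * row[c] for row in d_rows)
          for c in range(n)] for r in range(n)]
    return solve(a, y)


y = [1.0, 2.5, 2.0, 4.5, 4.0, 6.0]      # hypothetical noisy, roughly linear data
print(whittaker_smooth(y, 0.0))         # lambda = 0: returns the data unchanged
stiff = whittaker_smooth(y, 1e6)        # huge lambda: close to the least-squares line
print([round(v, 2) for v in stiff])
```

Intermediate values of λ trade off between these two extremes, which is exactly the compromise the penalized criterion expresses.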

We can either smooth at the data-fitting step, or smooth a rough function afterwards. This Matlab command smooths the fit to the data obtained using knots at the ages, penalizing the roughness of the fourth derivative:

    lambda = 0.01;                        % smoothing parameter
    hgtfd = smooth_fd(hgtfd, lambda, 4);  % penalize D^4 x, i.e., roughness of acceleration

Accelerations using a roughness penalty
These accelerations are much less variable at the extremes.

The corresponding velocities

How did you choose λ?
We smooth just enough to obtain tolerable roughness in the estimated curves (accelerations, in this case), but not so much as to lose interesting variation. There are data-driven methods for choosing λ, such as generalized cross-validation, but they offer only a reasonable place to begin exploring. Smoothing inevitably involves judgment.

What about monotonicity?
The growth curves should be monotonic, so the velocities should be non-negative. It’s hard to prevent linear combinations of basis functions from breaking this rule, so we need an indirect approach to constructing a monotonic model.

A differential equation for monotonicity
Any strictly monotonic function x(t) whose first derivative never vanishes must satisfy a simple linear differential equation:
$$ D^2 x(t) = w(t) \, D x(t) $$
The reason is simple: because the first derivative Dx(t) is never 0, the function w(t) is simply D²x(t)/Dx(t).

The solution of the differential equation
Consequently, any such function x(t) must be expressible in the form
$$ x(t) = \beta_0 + \beta_1 \int_{t_0}^{t} \exp\left( \int_{t_0}^{u} w(v) \, dv \right) du $$
Because the exponential is always positive, x(t) is increasing whenever β_1 > 0, whatever w(t) is. This suggests that we transform the monotone smoothing problem into one of estimating the unconstrained function w(t) and the constants β_0 and β_1.
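A numerical sketch of this construction in Python, computing x(t) = β_0 + β_1 ∫_{t_0}^{t} exp(∫_{t_0}^{u} w(v) dv) du on a grid. Assumptions: a uniform grid, the cumulative trapezoidal rule for both integrals, and hypothetical names and parameter values. Whatever w does, the integrand is positive, so the result increases when β_1 > 0.

```python
import math
from typing import Callable, List


def cumtrapz(vals: List[float], step: float) -> List[float]:
    """Cumulative trapezoidal integral of equally spaced values, starting at 0."""
    out = [0.0]
    for i in range(1, len(vals)):
        out.append(out[-1] + 0.5 * step * (vals[i - 1] + vals[i]))
    return out


def monotone_curve(w: Callable[[float], float], beta0: float, beta1: float,
                   t0: float, t1: float, n: int = 1000) -> List[float]:
    """Values of x(t) on a uniform grid over [t0, t1] built from an arbitrary w(t)."""
    step = (t1 - t0) / n
    grid = [t0 + i * step for i in range(n + 1)]
    inner = cumtrapz([w(t) for t in grid], step)   # integral of w from t0 to u
    velocity = [math.exp(v) for v in inner]        # Dx(u) / beta1, always > 0
    outer = cumtrapz(velocity, step)
    return [beta0 + beta1 * v for v in outer]


# A sign-changing w (illustrative values only) still gives an increasing curve.
x = monotone_curve(lambda t: 3.0 * math.sin(2.0 * t), beta0=75.0, beta1=8.0, t0=1.0, t1=18.0)
print(all(b > a for a, b in zip(x, x[1:])))  # True
```

The design point is that w(t) can be fitted with any ordinary basis expansion, with no constraints, and monotonicity comes for free from the exponential.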

What we have learned
B-spline bases are a good choice for fitting non-periodic functions; Fourier series are right for periodic situations. We can control smoothness either by using a restricted number of basis functions or by imposing a roughness penalty; roughness penalty methods generally work better. Differential equations can play a useful role when fitting constrained functions to data.

More information
Ramsay & Silverman (1997, 2004), Chs. 3, 4, 13. Ramsay & Silverman (2002), Ch. 6. The long-term growth data are from the Berkeley growth study. The infant growth data were collected by Michael Hermanussen.

Where do we go from here?
We need to look more systematically at how to smooth data. This involves deciding what basis function system to use. Splines are so important that we have to look at them in more detail. Here’s a serious problem …

What’s wrong with the mean?
The cross-sectional mean is the heavy blue line. It has less amplitude variation than any single curve, and its pubertal growth spurt lasts longer than that of any single curve. The problem is that we are averaging over curves in quite different stages of growth.
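A toy Python sketch reproduces the effect with two hypothetical velocity curves that have the same shape but peak two years apart (Gaussian bumps, chosen purely for illustration, not fitted to any data): their pointwise mean has a lower peak and stays above a given level for longer than either curve.

```python
import math
from typing import Callable, List

Curve = Callable[[float], float]


def bump(center: float, height: float = 8.0, width: float = 1.0) -> Curve:
    """Hypothetical velocity profile for a growth spurt: a Gaussian bump."""
    return lambda t: height * math.exp(-0.5 * ((t - center) / width) ** 2)


def time_above(v: Curve, grid: List[float], level: float) -> float:
    """Approximate length of time (years) during which v(t) exceeds `level`."""
    step = grid[1] - grid[0]
    return step * sum(1 for t in grid if v(t) > level)


early, late = bump(11.5), bump(13.5)  # identical shape, spurts two years apart


def mean_velocity(t: float) -> float:
    """Cross-sectional (pointwise) mean of the two curves."""
    return 0.5 * (early(t) + late(t))


grid = [8.0 + 0.01 * i for i in range(1001)]  # ages 8 to 18
peak_single = max(early(t) for t in grid)
peak_mean = max(mean_velocity(t) for t in grid)
print(peak_mean < peak_single)                                              # True: lower peak
print(time_above(mean_velocity, grid, 4.0) > time_above(early, grid, 4.0))  # True: longer spurt
```

Neither curve has a low, drawn-out spurt; the mean has one only because it blends two spurts that happen at different ages.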

Phase and Amplitude Variation
Functional data like growth curves often show variation in the timing of events, such as the pubertal growth spurt. This is called phase variation. We have to find out how to separate phase variation from amplitude variation before we can do even simple things like compute mean curves.