Presentation is loading. Please wait.

Presentation is loading. Please wait.

An Introduction to Functional Data Analysis Jim Ramsay McGill University.

Similar presentations


Presentation on theme: "An Introduction to Functional Data Analysis Jim Ramsay McGill University."— Presentation transcript:

1 An Introduction to Functional Data Analysis Jim Ramsay McGill University

2 Overview We’ll use three case studies to see what is meant by functional data, and to consider some important issues in the analysis of functional data Human growth data Human growth data US nondurable goods manufacturing index US nondurable goods manufacturing index Thirty years of Montreal weather Thirty years of Montreal weather

3 Human Growth: From data to functions

4 We need repeated and regular access to subjects for up to 20 years. We need repeated and regular access to subjects for up to 20 years. Height changes over the day, and must be measured at a fixed time. Height changes over the day, and must be measured at a fixed time. Height is measured in supine position in infancy, followed by standing height. The change involves an adjustment of about 1 cm. Height is measured in supine position in infancy, followed by standing height. The change involves an adjustment of about 1 cm. Measurement error is about 0.5 cm in later years, but is rather larger in infancy. Measurement error is about 0.5 cm in later years, but is rather larger in infancy. Measurements are not taken at equally spaced points in time. Measurements are not taken at equally spaced points in time.

5 Challenges to functional modeling We want smooth curves that fit the data as well as is reasonable. We want smooth curves that fit the data as well as is reasonable. We will want to look at velocity and acceleration, so we want to differentiate twice and still be smooth. We will want to look at velocity and acceleration, so we want to differentiate twice and still be smooth. In principle the curves should be monotone; i. e., have a positive derivative. In principle the curves should be monotone; i. e., have a positive derivative.

6 The monotonicity problem The tibia of a newborn measured daily shows us that over the short term growth takes places in spurts. This baby’s tibia grows as fast as 2 mm/day! How can we fit a smooth monotone function?

7 Weighted sums of basis functions We need a flexible method for constructing curves to fit the data. We need a flexible method for constructing curves to fit the data. We begin with a set of basic functional building blocks φ k (t), called basis functions. We begin with a set of basic functional building blocks φ k (t), called basis functions. Our fitting function x(t) is a weighted sum of these: Our fitting function x(t) is a weighted sum of these:

8 B-splines for growth data Order 4 splines look smooth, but their second derivatives are rough. Order 4 splines look smooth, but their second derivatives are rough. We use order 6 B-splines because we want to differentiate the result at least twice. We use order 6 B-splines because we want to differentiate the result at least twice. We place a knot at each of the 31 ages. We place a knot at each of the 31 ages. The total number of basis functions = order + number of interior knots. 35 in this case. The total number of basis functions = order + number of interior knots. 35 in this case.

9 Isn’t using 35 basis functions to fit 31 observations a problem? Yes. We will fit each observation exactly. Yes. We will fit each observation exactly. This will ignore the fact that the measurement error is typically about 0.5 cm. This will ignore the fact that the measurement error is typically about 0.5 cm. But we’ll fix this up later, when we look at roughness penalties. But we’ll fix this up later, when we look at roughness penalties.

10 Okay, let’s see what happens These two Matlab commands define the basis and fit the data: hgtbasis = create_bspline_basis([1,18], 35, 6, age); create_bspline_basis([1,18], 35, 6, age); hgtfd = data2fd(hgtfmat, age, hgtbasis); data2fd(hgtfmat, age, hgtbasis);

11 Why we need to smooth Noise in the data has a huge impact on derivative estimates.

12 Please let me smooth the data! This command sets up 12 B-spline basis functions defined by equally spaced knots. This gives us about the right amount of fitting power given the error level. hgtbasis = create_bspline_basis([1,18], 12, 6); create_bspline_basis([1,18], 12, 6);

13 These are velocities are much better. These are velocities are much better. They go negative on the right, though. They go negative on the right, though.

14 Let’s see some accelerations These acceleration curves are too unstable at the ends. These acceleration curves are too unstable at the ends. We need something better. We need something better.

15 A measure of roughness What do we mean by “smooth”? What do we mean by “smooth”? A function that is smooth has limited curvature. A function that is smooth has limited curvature. Curvature depends on the second derivative. A straight line is completely smooth. Curvature depends on the second derivative. A straight line is completely smooth.

16 Total curvature We can measure the roughness of a function x(t) by integrating its squared second derivative. The second derivative notation is D 2 x(t).

17 Total curvature of acceleration Since we want acceleration to be smooth, we measure roughness at the level of acceleration:

18 The penalized least squares criterion We strike a compromise between fitting the data and keeping the fit smooth.

19 How does this control roughness? Smoothing parameter λ controls roughness. Smoothing parameter λ controls roughness. When λ = 0, only fitting the data matters. When λ = 0, only fitting the data matters. But as λ increases, we place more and more emphasis on penalizing roughness. But as λ increases, we place more and more emphasis on penalizing roughness. As λ  ∞, only roughness matters, and functions having zero roughness are used. As λ  ∞, only roughness matters, and functions having zero roughness are used.

20 We can either smooth at the data fitting step, or smooth a rough function. We can either smooth at the data fitting step, or smooth a rough function. This Matlab command smooths the fit to the data obtained using knots at ages. The roughness of the fourth derivative is controlled. This Matlab command smooths the fit to the data obtained using knots at ages. The roughness of the fourth derivative is controlled. lambda = 0.01; hgtfd = smooth_fd(hgtfd, lambda, 4);

21 Accelerations using a roughness penalty These accelerations are much less variable at the extremes.

22 The corresponding velocities look good, too

23 How did you choose λ? We smooth just enough to obtain tolerable roughness in the estimated curves (accelerations in this case), but not so much as to lose interesting variation. We smooth just enough to obtain tolerable roughness in the estimated curves (accelerations in this case), but not so much as to lose interesting variation. There are data-driven methods for choosing λ, but they offer only a reasonable place to begin exploring. There are data-driven methods for choosing λ, but they offer only a reasonable place to begin exploring. But this is inevitably involves judgment. But this is inevitably involves judgment.

24 What about monotonicity? The growth curves should be monotonic. The growth curves should be monotonic. The velocities should be non-negative. The velocities should be non-negative. It’s hard to prevent linear combinations of anything from breaking rules like monotonicity. It’s hard to prevent linear combinations of anything from breaking rules like monotonicity. We need an indirect approach to constructing a monotonic model. We need an indirect approach to constructing a monotonic model.

25 A differential equation for monotonicity Any strictly monotonic function x(t) must satisfy a simple linear differential equation: The reason is simple: Because of strict monotonicity, the first derivative Dx(t) will never be 0, and the first derivative Dx(t) will never be 0, and function w(t) is therefore simply D 2 x(t)/Dx(t). function w(t) is therefore simply D 2 x(t)/Dx(t).

26 The solution of the differential equation Consequently, any strictly monotonic function x(t) must be expressible in the form This suggests that we transform the monotone smoothing problem into one of estimating function w(t), and constants β 0 and β 1. smoothing problem into one of estimating function w(t), and constants β 0 and β 1.

27 What we have learned from the growth data We can control smoothness by either using a restricted number of basis functions, or by imposing a roughness penalty. We can control smoothness by either using a restricted number of basis functions, or by imposing a roughness penalty. Roughness penalty methods generally work better than simple basis expansions. Roughness penalty methods generally work better than simple basis expansions. Differential equations can play a useful role in defining constrained functions. Differential equations can play a useful role in defining constrained functions.

28 Phase-Plane Plotting the Nondurable Goods Index

29 Nondurable goods last less than two years: Food, clothing, cigarettes, alcohol, but not personal computers!! Nondurable goods last less than two years: Food, clothing, cigarettes, alcohol, but not personal computers!! The nondurable goods manufacturing index is an indicator of the economics of everyday life. The nondurable goods manufacturing index is an indicator of the economics of everyday life. The index has been published monthly by the US Federal Reserve Board since 1919. The index has been published monthly by the US Federal Reserve Board since 1919. It complements the durable goods manufacturing index. It complements the durable goods manufacturing index.

30 What we want to do Look at important events. Look at important events. Examine the overall trend in the index. Examine the overall trend in the index. Have a look at the annual or seasonal behavior of the index. Have a look at the annual or seasonal behavior of the index. Understand how the seasonal behavior changes over the years and with specific events. Understand how the seasonal behavior changes over the years and with specific events.

31 The log nondurable goods index

32 Events and Trends Short term: Short term: 1929 stock market crash 1929 stock market crash 1937 restriction of money supply 1937 restriction of money supply 1974 end of Vietnam war, OPEC oil crisis 1974 end of Vietnam war, OPEC oil crisis Medium term: Medium term: Depression Depression World War II World War II Unusually rapid growth 1960-1974 Unusually rapid growth 1960-1974 Unusually slow growth 1990 to present Unusually slow growth 1990 to present Long term increase of 1.5% per year Long term increase of 1.5% per year

33 The evolution of seasonal trend We focus on the years 1948 to 1999 We focus on the years 1948 to 1999 We estimate long- and medium-term trend by spline smoothing, but with knots too far apart to capture seasonal trend We estimate long- and medium-term trend by spline smoothing, but with knots too far apart to capture seasonal trend We subtract this smooth trend to leave only seasonal trend We subtract this smooth trend to leave only seasonal trend

34 Smoothing the data We want to represent the data y j by a smooth curve x(t). The curve should have at least two smooth derivatives. We use spline smoothing, penalizing the size of the 4 th derivative. A function Pspline in S-PLUS is available by ftp from ego.psych.mcgill.ca/pub/ramsay/FDAfuns

35 Three years of typical trend: 1964-1966

36 Seasonal Trend Typically three peaks per year Typically three peaks per year The largest is in the fall, peaking at the beginning of October The largest is in the fall, peaking at the beginning of October The low point is mid-December The low point is mid-December

37 Non-seasonal trend is in red

38 Seasonal trend = data – nonseasonal trend

39 Phase-Plane Plots Looking at seasonal trend itself does not reveal as much as looking at the interplay between: Looking at seasonal trend itself does not reveal as much as looking at the interplay between: Velocity or its first derivative, reflecting kinetic energy in the system. Velocity or its first derivative, reflecting kinetic energy in the system. Acceleration or its second derivative, reflecting potential energy. Acceleration or its second derivative, reflecting potential energy. The phase-plane diagram plots acceleration against velocity. The phase-plane diagram plots acceleration against velocity. For purely sinusoidal trend, the plot would be an ellipse. For purely sinusoidal trend, the plot would be an ellipse.

40 Position of a swinging pendulum

41 Phase-plane plot for pendulum

42 Phase-plane plot for 1964 There are three large loops separated by two small loops or cusps: 1. Spring cycle: mid- January into April 2. Summer cycle: May through August 3. Fall cycle: October through December

43 A look at the years 1929-1931.

44

45

46

47 1929 through 1931 The stock market crash shows up as a large negative surge in velocity. The stock market crash shows up as a large negative surge in velocity. Subsequent years nearly lose the fall production cycle, as people tighten their belts and spend less at Christmas. Subsequent years nearly lose the fall production cycle, as people tighten their belts and spend less at Christmas.

48 What happened in 1937-1938?

49

50

51

52 1937 and 1938 The Treasury Board, fearing that the economy was becoming overheated again, clamped down on the money supply. The effect was catastrophic, and nearly wiped out the fall cycle. This new crash was even more dramatic than that of 1929, but was forgotten because of the outbreak of World War II.

53 What about World War II?

54

55 During World War II, the seasonal cycle became very small, since the war, and the production that fed it, lasted all year long. Now look at three pivotal years, 1974 to 1976, when the Vietnam War ended and the OPEC oil crisis happened. Watch the shrinking of the fall cycle.

56

57

58

59 What about today?

60

61 These days Over the last ten years the size of all three cycles have become much smaller. Why? Is variation now smoothed out by information technology? Is variation now smoothed out by information technology? Are the aging baby boomers spending less? Are the aging baby boomers spending less? Are personal computers, video games, and other electronic goods really durable? Are personal computers, video games, and other electronic goods really durable? Has manufacturing now moved off shore? Has manufacturing now moved off shore?

62 Conclusions We can separate long- and medium-term trends from seasonal trends by smoothing. We can separate long- and medium-term trends from seasonal trends by smoothing. Phase-plane plots are great ways to inspect seasonality. Phase-plane plots are great ways to inspect seasonality. Derivatives were used in two ways: to penalize roughness, and to reflect the dynamics of manufacturing. Derivatives were used in two ways: to penalize roughness, and to reflect the dynamics of manufacturing.

63 Trends in Seasonality We see by inspection that seasonal trends change systematically over time, and can also change abruptly. We first estimate the principal components of seasonal variation, using a version of principal components analysis adapted to functional data, and sensitive only to effects periodic over one year.

64

65

66

67 The Components 1. Relative sizes of spring and summer cycles (53%) 2. Joint size of spring and summer cycles (25%) 3. Size of fall cycle (11%)

68 Plotting Component Scores We can compute scores at each year for these three principal components, sometimes called empirical orthogonal functions. Plotting the evolution of these scores over the 51 years shows some interesting structural changes in the economics of everyday life.

69

70

71 Wrap-up Phase-plane plots are good for inspecting seasonal quasi-harmonic trends Phase-plane plots are good for inspecting seasonal quasi-harmonic trends Principal components analysis reveals main components of variation in seasonal trend. Principal components analysis reveals main components of variation in seasonal trend. Plotting component scores shows how trend has evolved. Plotting component scores shows how trend has evolved.

72 This was joint with work with James B. Ramsey, Dept. of Economics, New York University, and is reported in: Ramsay, J. O. and Ramsey, J. B. (2001) Functional data analysis of the dynamics of the monthly index of non ‑ durable goods production. Journal of Econometrics, 107, 327 ‑ 344.

73 Phase and Amplitude Variation in Montreal Weather 34 years of daily temperatures, 1961-1994 inclusive 34 years of daily temperatures, 1961-1994 inclusive Values are averages of daily maximum and minimum Values are averages of daily maximum and minimum 12410 observations in tenths of a degree Celsius 12410 observations in tenths of a degree Celsius Available for Montreal and 34 other Canadian weather stations Available for Montreal and 34 other Canadian weather stations

74

75 We know that there are two kinds of variation in these data: 1. Amplitude variation: day-to-day and year-to-year variation in temperature at events such as the depth of winter. 2. Phase variation: the timing of these events -- the seasons arrive early in some years, and late in others.

76 Goals Separate phase variation from amplitude variation by registering the series to its strictly periodic image. Separate phase variation from amplitude variation by registering the series to its strictly periodic image. Estimate components of variation due to amplitude and phase variation. Estimate components of variation due to amplitude and phase variation.

77 Smoothing The registration process requires that we smooth the data two ways: 1. With an unconstrained smooth that removes the day-to-day variation, but leaves longer-term variation unchanged. 2. With a strictly periodic smooth that eliminates all but strictly periodic trend.

78 Unconstrained smooth Raw data are represented by a B-spline expansion using 500 basis functions of order 6. Raw data are represented by a B-spline expansion using 500 basis functions of order 6. Knot about every 25 days. Knot about every 25 days. The standard deviation of the raw data about this smooth, adjusted for degrees of freedom, is 1.52 degrees Celsius. The standard deviation of the raw data about this smooth, adjusted for degrees of freedom, is 1.52 degrees Celsius.

79

80 Periodic smooth The basis is Fourier, with 9 basis functions judged to be enough to capture most of the strictly periodic trend for a period of one year. The basis is Fourier, with 9 basis functions judged to be enough to capture most of the strictly periodic trend for a period of one year. The standard deviation of the raw about data about this smooth is 2.18 deg C. The standard deviation of the raw about data about this smooth is 2.18 deg C. Compare this to 2.07 deg C. for the unconstrained smooth. Compare this to 2.07 deg C. for the unconstrained smooth.

81

82 Plotting the unconstrained B-spline smooth minus the constrained Fourier smooth reveals some striking discrepancies. Plotting the unconstrained B-spline smooth minus the constrained Fourier smooth reveals some striking discrepancies. We focus on Christmas, 1989. The Ramsay’s spent the holidays in a chalet in the Townships, and awoke to –37 deg C. No skiing, car dead, marooned! We focus on Christmas, 1989. The Ramsay’s spent the holidays in a chalet in the Townships, and awoke to –37 deg C. No skiing, car dead, marooned! This temperature would still be cold in mid-January, but less unusual. This temperature would still be cold in mid-January, but less unusual.

83

84 Registration Let the unconstrained smooth be x(t) and the strictly periodic smooth be x 0 (t). Let the unconstrained smooth be x(t) and the strictly periodic smooth be x 0 (t). We need to estimate a nonlinear strictly increasing smooth transformation of time h(t), called a warping function, such that a fitting criterion is minimized. We need to estimate a nonlinear strictly increasing smooth transformation of time h(t), called a warping function, such that a fitting criterion is minimized.

85 Fitting criterion The fitting criterion was the smallest eigenvalue of the matrix This criterion measures the extent to which a plot of x[h(t)] against x 0 (t) is linear, and thus whether the two curves are in phase.

86 The warping function h(t) Every smooth strictly monotone function h(t) such that h(0) = 0 can be represented as We represent unconstrained function w(v) by a B-spline expansion. Constant C is determined by constraint h(T) = T.

87 The deformation u(t) = h(t) - t Plotting this allows us to see when the seasons come early (negative deformation) or late (positive deformation).

88

89 Mid-winter for 1989-1990 arrived about 25 days early. Mid-winter for 1989-1990 arrived about 25 days early. The next step is to register the temperature data by computing x*(t) = x[h(t)]. The registered curve x*(t) contains only amplitude variation. The next step is to register the temperature data by computing x*(t) = x[h(t)]. The registered curve x*(t) contains only amplitude variation. Registration was done by Matlab function registerfd, available by ftp from Registration was done by Matlab function registerfd, available by ftp fromego.psych.mcgill.ca/pub/ramsay/FDAfuns

90

91 Amplitude variation The standard deviation of the difference between the unconstrained smooth and the strictly periodic smooth is 2.15 C. The standard deviation of the difference between the unconstrained smooth and the strictly periodic smooth is 2.15 C. The standard deviation of the difference between the registered smooth and the period smooth is 1.73 C. The standard deviation of the difference between the registered smooth and the period smooth is 1.73 C. (2.15 2 – 1.73 2 )/2.15 2 =.35, the proportion of the variation due to phase. (2.15 2 – 1.73 2 )/2.15 2 =.35, the proportion of the variation due to phase.

92 The standard deviation of the raw data around the registered smooth is 2.13 C, compared with 2.07 C for the unregistered smooth. The standard deviation of the raw data around the registered smooth is 2.13 C, compared with 2.07 C for the unregistered smooth. About 10% of the total variation is due to phase. About 10% of the total variation is due to phase.

93 Conclusions Phase variation is an important part of weather behavior. Phase variation is an important part of weather behavior. Statisticians seldom think about phase variation, and classical time series methods ignore it completely. Statisticians seldom think about phase variation, and classical time series methods ignore it completely. Phase variation needs more attention, and registration is an essential tool. Phase variation needs more attention, and registration is an essential tool.

94 Unique Aspects of Functional Data Analysis The data are smooth, so we can use derivatives in various ways. The data are smooth, so we can use derivatives in various ways. Differential equations can play a big role. Differential equations can play a big role. Events in functional data occur over different time scales. Events in functional data occur over different time scales. Time itself may be an elastic medium, and vary over functional observations. Time itself may be an elastic medium, and vary over functional observations.

95 Finding out More Ramsay, J. O. and Silverman, B. W. (1997, 2004) Functional Data Analysis. Springer. Ramsay, J. O. and Silverman, B. W. (1997, 2004) Functional Data Analysis. Springer. Ramsay, J. O. and Silverman, B. W. Ramsay, J. O. and Silverman, B. W. (2002) Applied Functional Data Analysis. Springer (2002) Applied Functional Data Analysis. Springer Visit the FDA website: www.psych.mcgill.ca/misc/fda/ Visit the FDA website: www.psych.mcgill.ca/misc/fda/ www.psych.mcgill.ca/misc/fda/ Software in Matlab, R and S-PLUS available at ego.psych.mcgill.ca/pub/ramsay Software in Matlab, R and S-PLUS available at ego.psych.mcgill.ca/pub/ramsay


Download ppt "An Introduction to Functional Data Analysis Jim Ramsay McGill University."

Similar presentations


Ads by Google