Various topics Petter Mostad 2005.11.14. Overview Epidemiology Study types / data types Econometrics Time series data More about sampling –Estimation.


1 Various topics Petter Mostad 2005.11.14

2 Overview
Epidemiology
Study types / data types
Econometrics
Time series data
More about sampling
  –Estimation of required sample size

3 Epidemiology
Epidemiology is the study of diseases in a population
  –prevalence
  –incidence, mortality
  –survival
Goals
  –describe occurrence and distribution
  –search for causes
  –determine effects in experiments

4 Some study types
Observational studies
  –Cross-sectional studies
  –Cohort studies
  –Longitudinal studies
  –Case / control studies
Experimental studies
  –Randomized, controlled experiments
  –Interventions

5 Cross-sectional studies
Examine a sample of persons at a single timepoint
Time effects rely on the memory of respondents
Good for estimating prevalence
Difficult for rare diseases
Response rate bias

6 Cohort studies and longitudinal studies
A sample (cohort) is followed over some time period. If it is queried at specific timepoints: longitudinal study
Gives better information about causal effects, as the reporting of events is not based on memory
Requires that a substantial group develops the disease, and that substantial groups differ with respect to risk factors
Problem: Long time perspective

7 Case – control studies
Start with a set of sick individuals (cases), and add a set of controls for comparison
Cases and controls should be from the same population
Matching controls
Good method for rare diseases
Problem: Bias from selection

8 Measures of risk
Relative risk
Odds ratio
Incidence rate ratio
Attributable risk
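The four measures can be sketched from a 2x2 table of counts. This is a minimal illustration, not taken from the slides: the function names, the cell labels a, b, c, d and the example counts are all assumptions, and attributable risk is taken here to mean the risk difference between exposed and unexposed, which is one common definition.

```python
def risk_measures(a, b, c, d):
    """Risk measures from a 2x2 table of counts.

    a: exposed and diseased      b: exposed and healthy
    c: unexposed and diseased    d: unexposed and healthy
    """
    risk_exposed = a / (a + b)
    risk_unexposed = c / (c + d)
    return {
        "relative_risk": risk_exposed / risk_unexposed,
        "odds_ratio": (a * d) / (b * c),
        # Risk difference, one common definition of attributable risk.
        "attributable_risk": risk_exposed - risk_unexposed,
    }


def incidence_rate_ratio(cases_exposed, time_exposed,
                         cases_unexposed, time_unexposed):
    """Ratio of incidence rates, each rate being cases per unit of person-time."""
    return (cases_exposed / time_exposed) / (cases_unexposed / time_unexposed)


# Hypothetical counts, for illustration only.
measures = risk_measures(a=30, b=70, c=10, d=90)
rate_ratio = incidence_rate_ratio(30, 1000.0, 10, 2000.0)
```

With a rare disease the odds ratio and the relative risk are close, which is why the odds ratio is the usual summary in case / control studies, where risks cannot be estimated directly.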

9 Econometrics
"Econometrics is the field of economics that concerns itself with the application of mathematical statistics and the tools of statistical inference to the empirical measurement of relationships postulated by economic theory"
It is the unification of
  –economic statistics
  –quantitative economic theory
  –mathematical economics

10 About econometrics
Variations and extensions of the regression model
  –heteroscedasticity
  –autocorrelation models
  –panel data
  –logistic regression
  –non-linear regression models
  –multivariate regression
Matrix computation (linear algebra) is an almost indispensable tool
Time series data
Simultaneous equations models

11 Heteroscedasticity
Recall: When the variances of the independent errors in the model vary, the model is heteroscedastic.
Example: In a regression model of house size against income, the variance of house sizes might increase with income.
In case of heteroscedasticity, ordinary regression models are not optimal: the coefficient estimates are still unbiased, but they are not efficient, and the usual standard errors are unreliable.
Previously, we mentioned variable transformation as a possible solution.
Much more advanced solutions exist when the heteroscedasticity is known or can be estimated: generalized least squares, ...
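When the errors are independent and their variances are known up to a constant, generalized least squares reduces to weighted least squares, with each observation weighted by the inverse of its error variance. The sketch below covers only that special case for a single predictor; the function name and the example data are hypothetical.

```python
def weighted_least_squares(x, y, weights):
    """Fit y = intercept + slope * x, weighting each observation by weights[i].

    With weights equal to 1 / (error variance), this is the special case of
    generalized least squares for independent, heteroscedastic errors.
    """
    total = sum(weights)
    x_bar = sum(w * xi for w, xi in zip(weights, x)) / total
    y_bar = sum(w * yi for w, yi in zip(weights, y)) / total
    sxy = sum(w * (xi - x_bar) * (yi - y_bar)
              for w, xi, yi in zip(weights, x, y))
    sxx = sum(w * (xi - x_bar) ** 2 for w, xi in zip(weights, x))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Hypothetical data: the error variance is assumed proportional to x squared.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0 + 3.0 * xi for xi in x]          # noise-free, so the fit is exact
weights = [1.0 / xi ** 2 for xi in x]
intercept, slope = weighted_least_squares(x, y, weights)
```

Setting all weights to 1 gives back ordinary least squares, so the weights are the only place where the assumed variance structure enters.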

12 Autocorrelations
Recall: When, for example, the data are from a time series, the random errors for adjacent time steps might be correlated!
Improvements in the model might reduce the problem
Standard regression methods are not optimal
Modelling and estimating the autoregression gives improved results

13 Panel data
Data collected for the same sample, at repeated time points
Corresponds to longitudinal epidemiological studies
A combination of cross-sectional data and time series data
Increasingly popular study type

14 Analyzing panel data
Fixed effects: Standard regression, but using a constant term differing for each individual
  –We get a parameter for each person!
Random effects: A stochastic variable models variation connected to the individual
  –The individual variation is assumed drawn from a distribution with fixed variance
  –A generalization of least squares is needed for computations
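For the fixed effects model with one predictor, the common slope can be computed without fitting every individual constant at once: subtract each individual's own means from x and y, then run a pooled regression on the deviations (the "within" estimator). The sketch below assumes that approach; the data layout (a dict of (x, y) pairs per person) and all names are hypothetical.

```python
def fixed_effects_fit(panel):
    """Fit y = alpha_person + slope * x with a separate constant per person.

    panel maps a person identifier to a list of (x, y) observations.
    Returns (slope, {person: alpha_person}).
    """
    means = {}
    numerator = 0.0
    denominator = 0.0
    for person, observations in panel.items():
        x_bar = sum(x for x, _ in observations) / len(observations)
        y_bar = sum(y for _, y in observations) / len(observations)
        means[person] = (x_bar, y_bar)
        # Deviations from the person's own means remove the person constant.
        numerator += sum((x - x_bar) * (y - y_bar) for x, y in observations)
        denominator += sum((x - x_bar) ** 2 for x, _ in observations)
    slope = numerator / denominator
    alphas = {person: y_bar - slope * x_bar
              for person, (x_bar, y_bar) in means.items()}
    return slope, alphas


# Hypothetical noise-free panel: person A has constant 10, person B has 5.
panel = {
    "A": [(1.0, 12.0), (2.0, 14.0), (3.0, 16.0)],
    "B": [(1.0, 7.0), (2.0, 9.0), (4.0, 13.0)],
}
slope, alphas = fixed_effects_fit(panel)
```

The dictionary of constants makes the slide's remark concrete: the number of parameters grows with the number of persons in the panel.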

15 Analyzing panel data
Heteroscedasticity might be a problem here as well
Autocorrelations
Dynamic models: Lagged variables

16 Logistic regression
What if the dependent variable is an indicator variable?
The model then has two stages: first, we predict a value $z_i$ from the predictors as before; then the probability of indicator value 1 is given by the logistic function $p_i = \dfrac{e^{z_i}}{1 + e^{z_i}}$
Given data, we can estimate the coefficients in a similar way as before
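The two stages can be sketched in a few lines: a linear predictor z = b0 + b1 * x, passed through the logistic function to give a probability. The fitting routine below uses plain gradient ascent on the log-likelihood, which is an assumption made for brevity (statistical packages normally use Newton-type iterations); the names, step size and data are hypothetical.

```python
import math


def logistic(z):
    """Map a linear predictor z to a probability in (0, 1)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)          # avoids overflow for very negative z
    return e / (1.0 + e)


def fit_logistic(x, y, learning_rate=0.05, iterations=5000):
    """Fit P(y = 1) = logistic(b0 + b1 * x) by gradient ascent
    on the mean log-likelihood."""
    b0 = b1 = 0.0
    n = len(x)
    for _ in range(iterations):
        residuals = [yi - logistic(b0 + b1 * xi) for xi, yi in zip(x, y)]
        b0 += learning_rate * sum(residuals) / n
        b1 += learning_rate * sum(r * xi for r, xi in zip(residuals, x)) / n
    return b0, b1


# Hypothetical indicator data: the outcome 1 becomes more common as x grows.
x = [0, 1, 2, 3, 4, 5, 6, 7]
y = [0, 0, 1, 0, 1, 0, 1, 1]
b0, b1 = fit_logistic(x, y)
probability_at_7 = logistic(b0 + b1 * 7)
```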

17 Non-linear regression models
Ordinary regression is very useful, but it is limited by the linear form of the equations
Sometimes, variable transformations can bring the connection between variables to a linear form
Other times, this is not possible: The relationship describes the dependent variable as some function of independent variables and some random error.
The model may still be estimated by minimizing the errors. This is non-linear regression.

18 Multivariate regression
Instead of one dependent variable, one can have a vector of dependent variables
A theory of multivariate multiple regression can be developed (with the help of matrix algebra):
  –Many results are similar to those of ordinary multiple regression
  –Captures the dependencies between the dependent variables

19 Simultaneous equations models
Often, you want to describe interdependencies between variables, rather than explaining one variable in terms of others
Example:
  –Demand is a function of various variables, including price
  –The same is the case with supply
  –Setting demand = supply creates simultaneous equations
Identifiability?
Estimation: Least squares is not optimal; other methods exist

20 Time series models
Time series issues:
  –Identifying trends, cycles, etc.
  –Predicting future values
Autoregressive models:
  –Explicit models for time dependencies (Box-Jenkins, ARIMA models)
AR(1)
AR(2)
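In a common notation (an assumption here, since the notation of the slide is not preserved), an AR(1) model is x_t = phi_1 * x_{t-1} + e_t, and an AR(2) model adds a second lag, x_t = phi_1 * x_{t-1} + phi_2 * x_{t-2} + e_t, where e_t is independent noise and a constant term may be added. The sketch below simulates a zero-mean AR(1) series and recovers the coefficient by least squares on the lagged values; the function names, seed and parameter values are hypothetical.

```python
import random


def simulate_ar1(phi, n, seed=0):
    """Simulate n values of x_t = phi * x_{t-1} + e_t with standard normal e_t."""
    rng = random.Random(seed)
    series = [0.0]
    for _ in range(n - 1):
        series.append(phi * series[-1] + rng.gauss(0.0, 1.0))
    return series


def estimate_ar1(series):
    """Least squares estimate of phi: regress x_t on x_{t-1} without intercept."""
    numerator = sum(cur * prev for prev, cur in zip(series, series[1:]))
    denominator = sum(prev * prev for prev in series[:-1])
    return numerator / denominator


series = simulate_ar1(phi=0.7, n=5000)
phi_hat = estimate_ar1(series)   # should land close to 0.7 for a long series
```

Prediction then follows directly from the fitted model: the one-step forecast is phi_hat times the last observed value.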

21 The runs test (for random samples)
In a random sample, whether an observation is above or below the median is independent of whether the previous observation is above or below it.
A run is a (maximal) sequence of consecutive observations such that all are above the median, or all are below.
For n observations, the number of runs has a known null distribution under the assumption of no autocorrelation.
With too few runs, the null hypothesis of no autocorrelation can be rejected. (Table in Newbold.)
For large samples, a formula based on a normal approximation can be used.
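A minimal sketch of the large-sample version follows. It uses the standard Wald-Wolfowitz mean and variance for the number of runs, given n1 observations above and n2 below the median; dropping observations exactly equal to the median is an assumed convention, and the function name and example data are hypothetical.

```python
import math
import statistics


def runs_test(values):
    """Runs test around the median, normal approximation.

    Returns (runs, z, p_lower), where p_lower is the one-sided probability
    of seeing this few runs or fewer under the null of no autocorrelation.
    """
    median = statistics.median(values)
    # True = above the median, False = below; ties with the median are dropped.
    above = [v > median for v in values if v != median]
    n1 = sum(above)
    n2 = len(above) - n1
    n = n1 + n2
    runs = 1 + sum(1 for prev, cur in zip(above, above[1:]) if prev != cur)
    expected = 2 * n1 * n2 / n + 1
    variance = 2 * n1 * n2 * (2 * n1 * n2 - n) / (n ** 2 * (n - 1))
    z = (runs - expected) / math.sqrt(variance)
    p_lower = 0.5 * (1 + math.erf(z / math.sqrt(2)))
    return runs, z, p_lower


# Hypothetical series with a strong trend: only two runs, so few runs indeed.
trend_runs, trend_z, trend_p = runs_test(list(range(1, 21)))
```

For small samples the exact table is preferable to the normal approximation, as the slide indicates.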

22 Sampling in practice
Newbold mentions:
1. Information required?
2. Relevant population?
3. Sample selection?
4. Obtaining information?
5. Inferences from sample?
6. Conclusions?
Sampling / nonsampling errors

23 Types of sampling
Simple random sampling
Stratified sampling
Cluster sampling
Two-phase sampling (using pilot studies)
Each requires somewhat adjusted formulas for estimation

24 Correcting for finite population in estimations
Our estimates of, for example, population variances, population proportions, etc. assumed an "infinite" population
When the population size N is comparable to the sample size n, a correction factor is necessary. (Why?)
Examples:
  –Variance of population mean estimate:
  –Variance of population proportion estimate:
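A sketch of the correction under simple random sampling without replacement, using the standard textbook estimators: the estimated variance of the sample mean is (s^2 / n) * (N - n) / N, and the estimated variance of the sample proportion is p(1 - p) / (n - 1) * (N - n) / N. Whether the slide used exactly these forms is an assumption, and the function names are hypothetical. The factor (N - n) / N also suggests an answer to the "Why?": when the whole population is sampled (n = N), the estimate is exact and its variance must be zero.

```python
def mean_estimate_variance(sample_variance, n, N=None):
    """Estimated variance of the sample mean.

    sample_variance is s^2 (divisor n - 1); N is the population size,
    or None for an effectively infinite population.
    """
    base = sample_variance / n
    return base if N is None else base * (N - n) / N


def proportion_estimate_variance(p_hat, n, N=None):
    """Estimated variance of the sample proportion, same conventions."""
    base = p_hat * (1 - p_hat) / (n - 1)
    return base if N is None else base * (N - n) / N


# Hypothetical numbers: a sample of 50 from a population of 200.
uncorrected = mean_estimate_variance(4.0, 50)
corrected = mean_estimate_variance(4.0, 50, N=200)
census = mean_estimate_variance(4.0, 50, N=50)
```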

25 Estimation of required sample size
An important part of experimental planning
The answer will generally depend on the parameters you want to estimate in the first place, so only a rough estimate is possible
Even so, making a rough estimate can be very important
A pilot study may be very helpful

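26 Example: Estimating the mean of a normally distributed population
We want to estimate the mean $\mu$
We want the confidence interval to extend a distance $a$ on each side of the estimate
We guess at the population variance $\sigma^2$
A sample size estimate: $n = \left( z_{\alpha/2}\,\sigma / a \right)^2$
If we have a population of size $N$, and want a specified standard deviation $\sigma_{\bar{x}}$ of the estimate, we get $n = \dfrac{N \sigma^2}{(N-1)\,\sigma_{\bar{x}}^2 + \sigma^2}$, where $1.96\,\sigma_{\bar{x}} = a$ at 95% confidence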
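The calculation can be sketched as a small function. It assumes the standard results: for an effectively infinite population, n = (z * sigma / a)^2; for a finite population of size N, n = N * sigma^2 / ((N - 1) * sd^2 + sigma^2), where sd = a / z is the standard deviation the estimate is allowed to have. The function name, the default z = 1.96 (95% confidence) and the example numbers are hypothetical choices.

```python
import math


def sample_size_for_mean(sigma, a, z=1.96, N=None):
    """Sample size so that the confidence interval for the mean
    extends a distance a on each side of the estimate.

    sigma: guessed population standard deviation
    z:     normal quantile for the confidence level (1.96 for 95%)
    N:     population size, or None for an effectively infinite population
    """
    if N is None:
        return math.ceil((z * sigma / a) ** 2)
    target_sd = a / z   # allowed standard deviation of the mean estimate
    n = N * sigma ** 2 / ((N - 1) * target_sd ** 2 + sigma ** 2)
    return math.ceil(n)


# Hypothetical planning numbers: guessed sigma = 10, desired half-width a = 2.
n_infinite = sample_size_for_mean(sigma=10, a=2)
n_finite = sample_size_for_mean(sigma=10, a=2, N=500)
```

Since sigma is only a guess, the result is a planning figure; a pilot study is one way to obtain a better guess.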

