# Introduction to Research Design Statlab Workshop, Fall 2010 Jeremy Green Nancy Hite.


Outline of a paper
- Introduction
- Theory
- Data Description
- Analysis
- Conclusion

Identifying a Question
- Tradeoff between work in and results
  - Easy to do, trivial results
  - Result is interesting, but difficulty is high
- New tools open up new questions
  - New statistical or computational tools make formerly difficult questions approachable
- New theory opens up new questions

Introduction
- Topic
  - Most general level
- Question
  - What is the question you want to answer?
  - Be specific
  - Ask only what you can answer
- Review the Literature
  - “Stay the course”

Theory
- Categorize your theory
  - Descriptive vs. causal
- Write down your theory
  - In paragraph form
  - Using a statistical model

Hypothesis

Identify testable hypotheses.
- How well does the hypothesis test the theory?
- What is the counterfactual argument?
- What is the scope of the hypothesis test?
- Spurious factors, contamination, endogenous factors

Do you need statistics after all?
- Quantitative v. Qualitative research
- Mixed methods
- Trade-off between various data generating styles
  - Archival
  - Pre-existing data
  - Primary data collection
  - Surveys
  - Experiments
  - Downstream analysis
- Function (TIME, BUDGET, ACCESS, FEASIBILITY)

Methodological Concerns: CONSORT Checklist
- Intro: scientific background and explanation of rationale
- Eligibility criteria for participants, and the settings and locations where the data were collected
- Interventions
  - Objectives
  - Outcomes
  - Sample size
  - Randomization
  - Blinding (masking)
  - Statistical methods
- Results
  - Recruitment
  - Baseline data
  - Numbers
  - Estimation
  - Ancillary analysis
  - Adverse events: indicate whether there was opportunity for treatment to spill over to control
- Discussion
  - Interpretation
  - Generalizability
  - Overall evidence

Variables
- Dependent Variable (response, outcome, criterion)
- Independent Variables (explanatory or predictor variables)
  - Treatment Variable
  - Covariates / Confounding Variables
- Categorical and Continuous Variables
- Remember: the types of variables we choose determine the statistics we use

You need Data
- Think about analyses early!
- Collecting your own data
  - Retrospective, prospective, experimental & observational methods
- Can find most data you’ll need on-line!
  - Statlab Webpage (http://statlab.stat.yale.edu)
  - Advisors
  - Yale StatCat (http://ssrs.yale.edu/statcat/)
  - ICPSR (http://www.icpsr.umich.edu)
  - Reference Librarian (Julie Linden)

So, you want to make a survey
- Extensive on-line resources and software
- Question types determine analyses
  - Open vs. closed-ended questions, Likert scales, rank order data
  - Assumptions of normality
- Validity
  - Internal & External validity
- Pilot testing
- You need variance to analyze!
- Sample size
  - It depends: power, effect size, cost (UCLA power calculator)
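To make the "power, effect size, cost" trade-off concrete, here is a minimal sketch of a sample-size calculation. It is not the UCLA power calculator mentioned on the slide; it assumes a two-sided comparison of two group means, a standardized effect size (Cohen's d), and the usual normal approximation. The function name `n_per_group` and the default values for alpha and power are illustrative assumptions.

```python
import math
from statistics import NormalDist


def n_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate sample size per group for a two-sided, two-sample
    comparison of means, using the normal approximation:
    n = 2 * ((z_(1 - alpha/2) + z_power) / d) ** 2
    """
    standard_normal = NormalDist()
    z_alpha = standard_normal.inv_cdf(1 - alpha / 2)
    z_power = standard_normal.inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)


# Smaller effects need many more respondents, which is where cost comes in.
for d in (0.2, 0.5, 0.8):
    print(f"effect size {d}: about {n_per_group(d)} per group")
```

The normal approximation slightly understates the sample size an exact t-based calculation would give, so treat the result as a planning figure rather than a final answer.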

Once You’ve Found or Collected your data
- Download the data and documentation
  - StatTransfer (Statlab)
- Determine data file type
  - Probably a text file (.txt, .dat, .raw)
- Converting text & delimited files
- Choose a statistical software program
  - SPSS, Stata, SAS, Matlab, Excel, R, C++
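As one illustration of converting a delimited text file, the sketch below reads a tab-delimited file into a list of records with Python's standard `csv` module. Python is not among the programs listed on the slide; it is used here only as an example. The file name `example.dat`, the column names, and the helper `read_delimited` are hypothetical, and the delimiter has to be checked against the documentation that ships with the data.

```python
import csv
import tempfile
from pathlib import Path


def read_delimited(path, delimiter="\t"):
    """Read a delimited text file whose first row holds variable names.

    Returns a list of dictionaries, one per case. All values are strings;
    convert them to numbers according to your codebook.
    """
    with open(path, newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle, delimiter=delimiter))


# Hypothetical tab-delimited file, written to a temporary folder for the demo.
sample = "id\tage\tgroup\n1\t34\tcontrol\n2\t29\ttreatment\n"
with tempfile.TemporaryDirectory() as folder:
    path = Path(folder) / "example.dat"
    path.write_text(sample, encoding="utf-8")
    rows = read_delimited(path)

print(rows)
```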

Managing your data
- Back up all Master Data Files
  - CDR/CDRW, USB Key
- Codebook
  - All codes
  - Adding variables, cases, computing new variables
- Keep a roadmap
  - Keep a log of all analyses with what you have done
  - Save syntax files

Data Entry - Codebook
- Always create a codebook that contains:
  - Instructions for entering data
  - Instructions for making decisions when data are ambiguous
  - Instructions for handling missing observations
  - Numerical codes you will use for categorical data
  - General troubleshooting information
- Treat it as a working document
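A codebook can also be kept in machine-readable form so that the same numerical codes drive both data entry and analysis. The sketch below is one possible layout, not a prescribed one: the variable names, the labels, and the missing-value code `-9` are all hypothetical.

```python
# Hypothetical codebook: every variable, code, and missing-value marker
# below is an illustrative assumption.
CODEBOOK = {
    "gender": {
        "label": "Respondent gender",
        "codes": {1: "female", 2: "male"},
        "missing": -9,
    },
    "q1_agree": {
        "label": "Agreement with statement 1",
        "codes": {1: "strongly disagree", 2: "disagree", 3: "neutral",
                  4: "agree", 5: "strongly agree"},
        "missing": -9,
    },
}


def decode(variable, value):
    """Translate a numerical code into its category label.

    Returns None for the documented missing-value code and raises
    ValueError for any code the codebook does not define.
    """
    entry = CODEBOOK[variable]
    if value == entry["missing"]:
        return None
    if value not in entry["codes"]:
        raise ValueError(f"{value!r} is not a valid code for {variable}")
    return entry["codes"][value]


print(decode("gender", 1))
print(decode("q1_agree", -9))
```

Raising an error on undefined codes, rather than silently passing them through, surfaces data-entry mistakes as soon as the data are decoded.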

Cleaning your data
- To minimize errors while manually entering data, you can set ranges in Excel so that if a value outside the range is entered, the cell will change color.
- To do this, go to Format - Conditional Formatting and specify the ranges for which a different format should show up.
- You can also use the data validation options: go to Data - Validation.
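The same range check can be run outside Excel once the data are in a file. The sketch below flags every value that falls outside an allowed range, which is the scripted counterpart of the color change described above. The variable names and ranges in `VALID_RANGES` are hypothetical, and values are assumed to be numeric already.

```python
# Hypothetical allowed ranges (inclusive) for two variables.
VALID_RANGES = {"age": (18, 99), "likert_q1": (1, 5)}


def out_of_range(rows, valid_ranges):
    """Return (row_number, variable, value) for every value outside its range.

    Row numbers start at 1. Values that are None are skipped here on the
    assumption that missing data are handled separately, per the codebook.
    """
    problems = []
    for row_number, row in enumerate(rows, start=1):
        for variable, (low, high) in valid_ranges.items():
            value = row.get(variable)
            if value is None:
                continue
            if not low <= value <= high:
                problems.append((row_number, variable, value))
    return problems


rows = [
    {"age": 34, "likert_q1": 4},
    {"age": 340, "likert_q1": 2},   # likely a typing error for 34
    {"age": 51, "likert_q1": 6},    # outside the 1-5 scale
]
problems = out_of_range(rows, VALID_RANGES)
for row_number, variable, value in problems:
    print(f"row {row_number}: {variable} = {value} is out of range")
```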

Keeping Track of Data Sets
- Every time you make changes to your data, save it with the current date
- Keep a document with a list of the major changes with each version
- A good idea is to keep a folder with the original data sets and create different subfolders as you make changes to the data set. Sometimes it is also a good idea to keep a working directory for currently active files
- Always make backup copies
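Saving a dated copy is easy to automate, which makes it more likely to happen after every change. The sketch below is one way to do it, assuming ISO dates (YYYY-MM-DD) in the file name so that versions sort chronologically. The helper names `dated_name` and `save_version`, the file name `survey.csv`, and the `versions` folder are illustrative assumptions.

```python
import shutil
import tempfile
from datetime import date
from pathlib import Path


def dated_name(filename, on=None):
    """Insert a date before the extension: survey.csv -> survey_2010-10-01.csv."""
    on = on or date.today()
    path = Path(filename)
    return f"{path.stem}_{on.isoformat()}{path.suffix}"


def save_version(source, versions_folder, on=None):
    """Copy the data file into a versions folder under a dated name.

    The original file is left untouched; the path of the copy is returned.
    """
    versions_folder = Path(versions_folder)
    versions_folder.mkdir(parents=True, exist_ok=True)
    target = versions_folder / dated_name(Path(source).name, on)
    shutil.copy2(source, target)
    return target


# Demo in a temporary folder with a hypothetical data file.
with tempfile.TemporaryDirectory() as folder:
    data_file = Path(folder) / "survey.csv"
    data_file.write_text("id,age\n1,34\n", encoding="utf-8")
    copy_path = save_version(data_file, Path(folder) / "versions")
    print(copy_path.name)
```

A dated copy does not replace the change list described above; the file name records when a version was made, not what changed in it.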

Keeping Track of Syntaxes and Outputs
- Save all the syntax you write
- Save all the output you produce and try to annotate it as much as possible
- Save your syntax and output with the data file name, a brief description of the analyses, and the current date
- Save syntax and output in a separate folder from your data

So, how do I analyze my data?
- Correlation
  - Correlation allows you to quantify relationships between variables (r, r-squared)
  - Regression allows prediction of dependent variable based on one or more independent variables
- Group differences
  - t-test & ANOVA
  - Chi-square for categorical and frequency data
- Significance v. effect size
- More Complex Models
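For a sense of what correlation and regression compute, the sketch below works out Pearson's r, r-squared, and a one-predictor least-squares line by hand. In practice the statistical packages named earlier do this for you; the data here are made up for illustration and the function names are assumptions.

```python
import math
from statistics import fmean


def pearson_r(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    mean_x, mean_y = fmean(x), fmean(y)
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    return s_xy / math.sqrt(s_xx * s_yy)


def simple_regression(x, y):
    """Least-squares slope and intercept for predicting y from x."""
    mean_x, mean_y = fmean(x), fmean(y)
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    slope = s_xy / s_xx
    return slope, mean_y - slope * mean_x


# Made-up data: x is the independent variable, y the dependent variable.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

r = pearson_r(x, y)
slope, intercept = simple_regression(x, y)
print(f"r = {r:.3f}, r-squared = {r * r:.3f}")
print(f"predicted y = {intercept:.2f} + {slope:.2f} * x")
```

Here r-squared is the share of variance in y accounted for by x, which is an effect-size statement; whether r differs reliably from zero is a separate significance question that depends on sample size.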

Take Away Messages
1) Determine your question, methods and statistics before you start
2) Keep a codebook of everything
3) Keep a log of all commands issued
4) Save data at every step
5) Ask for help
6) Don’t get in over your head
