1 Data Analysis using Stata
workshop #4 / data@reed
Kristin Bott, kbott@reed.edu
K.Bott / Instructional Technology Services, Reed College / Portland, OR

2 Where we’re headed (<60 min)
Step zero: how to find and pilot Stata
Step one: basic data skills
Step two: external data
Step three: data analysis
Help and resources

3 Step zero : How to find + pilot

4 Accessing Stata
Where:
– Eliot 110 / PPW (social sciences), Psychology computer lab
– (mLab, faculty use)
– IRCs in the ETC
– Library, various
Note: “Keyed” software
– currently a limited number of seats for Reed users
– If you run into problems w/ access, let me know
Options for buying your own copy

5 Accessing Stata
On your computers: Applications > StataSE
Open Stata! … and let’s take a look.

6 Review / Variables / (variable) Properties / Command / Results

7 Review: What did I do?
Variables: What data am I working with?
Properties: Data details
Command: Current action is here!
Results: What happened?
Point + click (GUI) vs command-line

9 Stata datasets
Pre-loaded, useful for training / learning
Type:
. sysuse dir
. set more off
. sysuse census (1980 Census data by state)

10 Step one: basic data skills

11 Data is loaded: now what?
What sort of information do you want to know about your data?
How can we get to this information?

12 Some things you may want to know
– Range of data / outliers
– Missing values [How many? How coded?]
– Types of variables
– Variation of data

13 first-glance tools
. summarize (whole dataset)
. summarize marriage
  observations / mean / std dev / min / max
. describe (whole dataset)
. describe [var]
  var name / storage type / display format / value label / var label
. codebook (whole dataset)
. codebook [var]
  type / range / units / unique / missing / mean / std dev / percentiles

14 first-glance tools
. summarize (whole dataset)
. summarize [var]
  observations / mean / std dev / min / max
. describe (whole dataset)
. describe marriage
  var name / storage type / display format / value label / var label
. codebook (whole dataset)
. codebook [var]
  type / range / units / unique / missing / mean / std dev / percentiles

17 Variable storage types
describe or the Variables window shows “storage type”
Numbers
– byte, int(eger), long, float, double
– vary in precision + memory they use (bytes)
Letters
– String – str1, str2, … str244
Question: Why does this matter?
You can’t find the mean of words...
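
To make the storage-type point concrete, here is a minimal sketch. It assumes the built-in auto dataset rather than the census data loaded at this point in the workshop, because auto contains both a string variable (make) and numeric variables (mpg).

```stata
* Sketch only: uses the auto dataset shipped with Stata
sysuse auto, clear

* make is stored as a string, mpg as a number
describe make mpg

* summarize skips strings: make reports 0 observations and no mean
summarize make mpg

* compress moves each variable to the smallest storage type
* that still holds all of its values
compress
```

Running describe again after compress shows whether any storage types changed.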

18 first-glance tools
. summarize (whole dataset)
. summarize [var]
  observations / mean / std dev / min / max
. describe (whole dataset)
. describe [var]
  var name / storage type / display format / value label / var label
. codebook (whole dataset)
. codebook marriage
  type / range / units / unique / missing / mean / std dev / percentiles

20 first-glance tools
. summarize (whole dataset)
. summarize [var]
  observations / mean / std dev / min / max
. describe (whole dataset)
. describe [var]
  var name / storage type / display format / value label / var label
. codebook
. codebook [var]
  type / range / units / unique / missing / mean / std dev / percentiles
Why is it missing? (didn’t ask / didn’t answer / doesn’t apply … other?)
How is “missing” coded?
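
A short sketch of how one might count and handle missing values. It assumes the built-in auto dataset, chosen here because its rep78 variable has some missing values; the census data used above would work the same way.

```stata
* Sketch only: auto dataset, where rep78 has some missing values
sysuse auto, clear

* how many observations are missing rep78?
count if missing(rep78)

* one table of missing counts for every variable that has any
misstable summarize

* numeric missing (.) is treated as larger than any real number,
* so this condition also matches cars with missing rep78
count if rep78 > 3

* exclude missing values explicitly
count if rep78 > 3 & !missing(rep78)
```

Comparing the last two counts shows how many missing values slipped into the first condition.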

21 Stata datasets
Pre-loaded, useful for training / learning
Type:
. sysuse dir
. sysuse auto (1978 Automobile data)

22 some first-glance tools
. tabulate foreign
  variable / frequency / percent / cumulative %
. tabulate [var1] [var2]
  what does this do?
. tabulate [var2] [var1]
  what does this do?

23 some first-glance tools
. tabulate [var]
  variable / frequency / percent / cumulative %
. tabulate foreign rep78
  what does this do?
. tabulate [var2] [var1]
  what does this do?
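
The one-way and two-way forms of tabulate can be tried on the auto dataset as below. This is a sketch; the row and missing options are additions for illustration and were not on the slide.

```stata
sysuse auto, clear

* one-way table: frequency, percent, cumulative percent
tabulate foreign

* two-way table: foreign in rows, rep78 in columns
tabulate foreign rep78

* same table with row percentages and a column for missing rep78
tabulate foreign rep78, row missing

* swapping the variables swaps rows and columns
tabulate rep78 foreign
```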

24 kbott’s first-glance toolbox
For dataset or [var] or [var1] [var2]
. summarize
. codebook
. describe
. tabulate
. inspect
. browse
. list

25 Basics: [browse] subsets of data
. browse if foreign == 1 (equals)
. browse if foreign ~= 1 (not equal)
. browse if foreign != 1 (not equal)
. browse if mpg > 5 & mpg < 20 (& joins multiple conditions)
. browse mpg in 1/10 (range of observations)
Can also use list to see results in the Results window
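
The same if and in qualifiers work with list and count, which print to the Results window instead of opening the Data Browser. A minimal sketch on the auto dataset:

```stata
sysuse auto, clear

* foreign cars only
list make mpg foreign if foreign == 1

* two conditions joined with & (and)
list make mpg if mpg > 5 & mpg < 20

* first ten observations
list make mpg in 1/10

* if and in combined: domestic cars among the first ten observations
list make mpg if foreign != 1 in 1/10

* | means "or"
count if foreign == 0 | mpg >= 30
```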

26 Basics: alter data
. sort var (sorts from low to high)
. drop var (drop variable, keep rest)
. keep var (keep variable, drop rest)
. replace var (change the contents of an existing variable)
. generate var (generate new variable)
. egen var (extended generate)
. clear (clears data from memory, does not erase saved files)
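
The commands above could be combined as follows. This is a sketch on the auto dataset; the variable names heavy and mean_mpg and the 3000 lb cutoff are made up for illustration.

```stata
sysuse auto, clear

* generate a new variable, then replace its contents conditionally
generate heavy = 0
replace heavy = 1 if weight > 3000

* egen: group-level statistic (mean mpg within each level of foreign)
egen mean_mpg = mean(mpg), by(foreign)

* sort from low to high mpg and look at the first five rows
sort mpg
list make mpg heavy mean_mpg in 1/5

* keep only the listed variables, then drop one of them
keep make mpg heavy mean_mpg foreign
drop heavy
```

None of this touches the file on disk; the changes live in memory until you save.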

27 Example: population growth / West
. sysuse census
. browse
. gen pred = 1.05*pop if region=="West"
generates a new variable pred based on population and region data

28 Example: population growth / West
. sysuse census
. browse
. gen pred = 1.05*pop if region=="West"
what happened? what did we do wrong? what do we do now?

29 Example: population growth / West
. sysuse census
. gen pred = 1.05*pop if region=="West"
type mismatch, eh? how is region coded? how can we fix this?

30 Example: population growth / West
. sysuse census
. gen pred = 1.05*pop if region==4
. replace pred = pop if region!=4
. br region state pop pred
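
One way to find out how region is coded, and an alternative fix that compares against the label text, is sketched below. The variable name region_str is made up for this example; decode is a different route from the numeric comparison on the slide, not a replacement for it.

```stata
sysuse census, clear

* region is numeric with a value label attached
describe region

* the same table with labels, then with the underlying numbers
tabulate region
tabulate region, nolabel

* show every value label defined in the dataset
label list

* alternative: copy the labels into a real string variable
decode region, generate(region_str)
generate pred = 1.05*pop if region_str == "West"
replace pred = pop if region_str != "West"

list state region pop pred in 1/5
```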

31 Step two: external data

32 Accessing external data
Stata datafiles = *.dta
StatTransfer can help transform other formats
– downloads.reed.edu
– Key Client
Bringing data into Stata
– insheet – for spreadsheets saved as *.csv (import delimited in Stata 13 and later)
– import excel – for *.xls/*.xlsx files
– Also accessible through menus (may be easier)

33 Example: Final exam scores, Math 141
Question: Does spending longer on your exam affect your final grade?
Data via Albert Kim
File is in the folder you downloaded
File > Import > Excel > import first row as variable names
Check your working directory!
. pwd
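
The menu import has a command-line equivalent, sketched below. The file names exam_scores.xlsx and exam_scores.csv and the folder path are placeholders, since the workshop file's actual name is not given in these slides.

```stata
* where is Stata looking for files?
pwd

* move to the folder you downloaded (placeholder path)
* cd "path/to/downloaded/folder"

* read the first worksheet, using row 1 as variable names
* (exam_scores.xlsx is a hypothetical file name)
import excel using "exam_scores.xlsx", firstrow clear

* check what came in
describe

* for a .csv file in Stata 13 or later (also a hypothetical name):
* import delimited using "exam_scores.csv", clear
```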

34 Step three: data analysis + visualization

35 Visualize: Test score data
Grades
. hist grade, freq
. hist grade, frac
. hist grade
. hist grade, bin(5)
. hist grade, bin(20)
. hist grade, norm bin(10)
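
If you do not have the exam file at hand, the same histogram options can be tried on the built-in auto dataset. In this sketch mpg stands in for grade, and the output file name is made up.

```stata
sysuse auto, clear

* y-axis as counts rather than density
histogram mpg, frequency

* few wide bins vs. more narrow bins
histogram mpg, bin(5)
histogram mpg, bin(10) normal

* save the most recent graph to the working directory
graph export "mpg_hist.png", replace
```

The normal option overlays a normal density curve, which makes skew easy to spot.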

36 Visualize: Test time data
Distribution of time for exam
. histogram time
. histogram time if major == "Biology"
. histogram time if major == "Economics"

37 Visualize: test score + time data
. scatter grade time
. twoway lfit grade time
. scatter grade time || lfit grade time

38 Visualize: test score + time data
. scatter grade time
. twoway lfit grade time
. scatter grade time || lfit grade time
Do you believe that result? What else do you want to know?
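
Overlaying a fitted line on a scatterplot can be written two equivalent ways. The sketch below uses the auto dataset, with mpg and weight standing in for grade and time.

```stata
sysuse auto, clear

* parenthesis form: each plot in its own ( )
twoway (scatter mpg weight) (lfit mpg weight)

* || form: same graph, plots separated by two vertical bars
scatter mpg weight || lfit mpg weight

* one panel per group, to check whether the pattern holds within groups
twoway (scatter mpg weight) (lfit mpg weight), by(foreign)
```

A single | is Stata's "or" operator in expressions, so the plot separator needs both bars.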

39 Analyze: test score data
. correlate grade time
. regress grade time
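
A sketch of the same two commands on the auto dataset, plus a few follow-ups that were not on the slide (pwcorr and predict). The new variable names mpg_hat and resid are made up.

```stata
sysuse auto, clear

* correlation matrix
correlate mpg weight

* pairwise correlations with p-values and observation counts
pwcorr mpg weight, sig obs

* simple linear regression: mpg on weight
regress mpg weight

* after regress: fitted values and residuals
predict mpg_hat
predict resid, residuals
```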

40 Analyze: test score data
. correlate grade time
. regress grade time
. ttest grade time
What is the problem here?

42 Analyze: test score data (t-test)
By major, is there a significant difference in the amount of time taken for exams?
. ttest time, by(major)

44 Analyze: test score data (t-test)
Is there a significant difference between the amount of time that bio majors + econ majors take for their exams?
. ttest time if major=="Biology" | major=="Economics", by(major)
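
The by() option of ttest needs exactly two groups, which is why the command above first restricts the data to two majors. A sketch of the common ttest forms on the auto dataset (the value 20 in the last line is an arbitrary example):

```stata
sysuse auto, clear

* two-group test: foreign has exactly two values
ttest mpg, by(foreign)

* same test without assuming equal variances
ttest mpg, by(foreign) unequal

* rep78 has more than two values, so restrict to two of them first
ttest mpg if rep78 == 3 | rep78 == 4, by(rep78)

* one-sample test against a fixed value
ttest mpg == 20
```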

45 Analyze: test score data (ANOVA)
Within majors, does amount of time spent on an exam significantly affect the grade (outcome) of exam?
. bysort major: anova grade time
. bysort major: oneway grade time
Note: many ANOVA permutations; see Psych Stata page, UCLA resources, and/or K.Bott for more info
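
A minimal one-way ANOVA sketch on the auto dataset, with rep78 as the grouping factor. This is only one of the many ANOVA setups the note above refers to.

```stata
sysuse auto, clear

* one-way ANOVA of mpg across repair-record groups, with group means
oneway mpg rep78, tabulate

* the same model through the more general anova command
anova mpg rep78

* repeat within each level of foreign; bysort sorts first, then runs by group
bysort foreign: oneway mpg rep78
```

The by prefix alone stops with an error if the data are not already sorted by the grouping variable; bysort handles the sort for you.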

46 Analyze: test score data (regression)
. regress grade time
Within majors, does amount of time spent on an exam significantly affect the grade (outcome) of exam?
. bysort major: regress grade time
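
Group-by-group regression can be sketched on the auto dataset as below. The interaction model in the last line is an alternative that was not on the slide; it fits one model and tests whether the slope differs between groups.

```stata
sysuse auto, clear

* separate regression for each level of foreign
bysort foreign: regress mpg weight

* alternative: a single model with a group-specific intercept and slope
* (c. marks a continuous variable, i. a categorical one)
regress mpg c.weight##i.foreign
```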

47 Help + additional resources

48 Additional packages / tools
Need a tool that isn’t there? Find it!
. findit outreg
. ssc install outreg
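
A sketch of checking a user-written package before and after installing it. These commands need an internet connection, and write access to Stata's ado folder may depend on how the lab machines are set up.

```stata
* read the package description on SSC before installing
ssc describe outreg

* download and install
ssc install outreg

* confirm where the installed file lives, then read its help
which outreg
help outreg
```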

49 Stata lab notebook: Log files
Example: click log button on menu (GUI)
Watch results window
. log using [filename]
log on / log off
log suspend / log resume
. log close
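
A typical logging session from the command line might look like this sketch. The file name session_notes.log is made up; the text option writes a plain-text log instead of Stata's formatted .smcl default.

```stata
* close any log left open from earlier, without stopping on error
capture log close

* start a plain-text log, overwriting an older file of the same name
log using "session_notes.log", replace text

sysuse auto, clear
summarize mpg

* pause logging: this describe is not recorded
log off
describe

* resume logging
log on
regress mpg weight

* finish and close the file
log close
```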

50 Key to collaboration: Do files
Save you time on repetitive tasks
Minimize errors
Store your data analysis process
. doedit or via GUI

51 Do file: example
* clear data in memory
clear
* silence extra output
set more off
capture log close
log using "wkshp_log.log", replace
sysuse auto
generate gpm = 1/mpg
label var gpm "Gallons per mile"
sort foreign
twoway (scatter gpm weight), by(foreign, total)
regress gpm weight foreign
* press do button
* run button will only run one command at a time
* save do file

52 Help!
Stata Help Menu
– contents
– search
– command
At the command line
– help
– search
– findit
External (to Stata) resources

53 External (to Stata) Resources
Our Stata page: reed.edu/cis/help/software/stata-resources
Links to:
– UCLA Stata site
– Psychology Stata site
– these slides
K.Bott: kbott@reed.edu, x6642, ETC 225
blogs.reed.edu/ed-tech
No set office hours – get in touch to meet

54 Feedback, please: http://bit.ly/dataSkills2014_followup
Questions? kbott@reed.edu

