CCPR Computing Services More Efficient Programming July 13, 2006.

1 CCPR Computing Services More Efficient Programming July 13, 2006

2 Outline
- Thinking through a programming task
- Ways of efficiently documenting and organizing your project
  - Naming variables, programs, files
  - Commenting code
  - Including file header
  - Implementing directory structure
- Programming constructs
- Raw data -> finished product: are your results replicable?

3 Before you start coding…
- Think
- Clearly define the problem in writing
- Write down the solution/algorithm in English
  - Modularity
  - Create test (if reasonable)
- Translate one section to code
- Test the section thoroughly
- Translate/Test next section, etc.
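
As a minimal sketch of "translate one section, then test it", the do-file below defines one small program and checks it on a tiny made-up dataset before it is used on real data. The program name fixMissingAge and the assumption that -9 and 999 are missing-value codes for age are illustrative only.

```stata
* Hypothetical section: recode missing-value codes for age.
capture program drop fixMissingAge
program define fixMissingAge
    * Assumption for illustration: -9 and 999 both mean "missing".
    replace age = . if age == -9 | age == 999
end

* Test the section on a small made-up dataset.
clear
input age
25
-9
999
40
end

fixMissingAge

* Each assert stops the do-file with an error if the check fails.
assert missing(age) if inlist(_n, 2, 3)
assert age == 25 in 1
assert age == 40 in 4
```

Once the test passes, the same program can be called on the real data and the next section translated.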

4 Documentation - File Header
Each do-file/program/file you create should include:
- Your name
- Project name
- Project location
- Date
- Software Version
- Purpose of program
- Inputs, Outputs
- Special Notes
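
One possible layout for such a header in a Stata do-file is sketched below. Every value in it (author, project, paths, file names) is a hypothetical placeholder, not taken from the presentation.

```stata
*********************************************************************
* Author:        Jane Analyst (hypothetical)
* Project:       Union wage study (hypothetical)
* Location:      C:\projects\unionwage\pgms\clean01.do (hypothetical)
* Date:          July 13, 2006
* Stata version: 9
* Purpose:       Create the clean analysis file from the raw data
* Inputs:        data\raw\union.dta
* Outputs:       data\clean\union_clean.dta, logs\clean01.log
* Notes:         Run from the project root directory
*********************************************************************
* Record the software version in the code as well as in the comment.
version 9
```

Stating the version as a command as well as a comment means later Stata releases interpret the file the way the stated version did.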

5 Naming Files, Variables, and Functions
- Use language standard (if it exists)
- Be aware of language-specific rules
  - Max length, underscore, case, reserved words
- Differentiating log files:
  - Log files MergeHHsas.log, MergeHHsta.log
- Meaningful variable names:
  - LogWt vs. var1
  - AgeLt30 vs. x
- Procedure that cleans missing values of Age:
  - fixMissingAge
- Matrix multiplication X transpose times X:
  - matXX

6 Commenting Code
- Good code is self-commenting
  - Naming conventions, structure/formatting, header should explain 95%
- Comments should explain
  - Purpose of code, not every detail
  - Tricks used
  - Reasons for unusual coding
- Comments do not
  - fix sloppy code
  - translate syntax
- If it takes longer to read the comment than to read the code, don’t add a comment!

7 Commenting Code - Stata example
Compare formatting, comments, variable names and function names.
SAMPLE 1
    program def function1
    foreach v of varlist _all {
    local x = lower("`v'")
    if `"`v'"' != `"`x'"' {
    rename `v' `=lower("`v'")'
    }
    }
    end
SAMPLE 2
    *Convert names in dataset to lowercase.
    program def lowerVarNames
        foreach v of varlist _all {
            local LowName = lower("`v'")
            if `"`v'"' != `"`LowName'"' {
                rename `v' `=lower("`v'")'
            }
        }
    end

8 Directory Structure
- A project consists of many different types of files
- Use folders to separate files in a logical way
- Be consistent across projects if possible
- ATTIC folder for older versions
HOME
    PROJECT NAME
        DATA
        RESULTS
        LOG
        PROGRAMS
        ATTIC
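
The folder layout can be created from within Stata. The sketch below assumes a hypothetical project folder named myproject under the current working directory; `capture` keeps the do-file from stopping when a folder already exists.

```stata
* Hypothetical project root under the current working directory.
global projroot "`c(pwd)'/myproject"

* Create the root, then one subfolder per file type.
capture mkdir "$projroot"
foreach sub in data results log programs attic {
    capture mkdir "$projroot/`sub'"
}
```

Running the same lines at the top of every new project is one way to stay consistent across projects.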

9 Stata example: using directory structure
    ** Paths:
    global parentpath "C:\Documents and Settings\piersol\Summer06\prog\progtips"
    global pgmsloc "$parentpath\pgms"
    global logsloc "$parentpath\logs"
    global cleandataloc "$parentpath\data\clean"
    global rawdataloc "$parentpath\data\raw"
    capture log close
    log using "$logsloc\test200607", text replace
    *********************************************************************
    *INSERT FILE HEADER HERE...then it's included in log file.
    *********************************************************************
    macro list
    webuse union, clear
    save "$rawdataloc\union.dta", replace
    *keep idcode year age grade
    save "$cleandataloc\unionLJP.dta", replace
    log close

10 Programming Constructs
- Tools to simplify and clarify your coding
- Available in virtually all languages
- Constructs
  - Loops - for, foreach, do, while
  - If/elseif/else - if, then, else, case
  - continue
  - exit
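
Of the loop keywords listed, `while` is the one the later Stata examples do not use, so here is a minimal sketch of it. The local macro names i and total are illustrative.

```stata
* Add up the integers 1 through 5 with a while loop.
local i = 1
local total = 0
while `i' <= 5 {
    local total = `total' + `i'
    local ++i                  // increment the counter
}
display "Sum of 1 through 5 is `total'"
```

For a fixed range of numbers like this one, `forvalues i = 1/5` is usually the more natural choice; `while` earns its place when the stopping condition is not known in advance.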

11 Loop Example 1
Problem: Given 4 indicator variables (south, union, black, not_smsa) and 2 discrete variables (age, grade), generate 8 new indicator variables:
- south_age21 = south and age > 21, south_gr12 = south and grade > 12
- Similarly for union, black, not_smsa
Solution without loop: 8 lines of code similar to:
    generate newvar = (south==1 & age>21 & age<.)
    generate newvar = (south==1 & grade>12 & grade<.)
Solution with loop:
    foreach j in south union black not_smsa {
        gen `j'_age21 = (age>21 & age<. & `j'==1)
        gen `j'_gr12 = (grade>12 & grade<. & `j'==1)
    }

12 Loop Example 1, cont.
    *CHECK GENERATED VARIABLES AGAINST ORIGINAL VARIABLES
    foreach j in south union black not_smsa {
        qui count if `j'==1 & age>21 & age<.
        local origCount = r(N)
        qui count if `j'_age21==1
        if `origCount' ~= `r(N)' {
            display "Counts do not match for `j'_age21!"
        }
        else display "Counts match for `j'_age21."
        qui count if `j'==1 & grade>12 & grade<.
        local origCount = r(N)
        qui count if `j'_gr12==1
        if `origCount' ~= `r(N)' {
            display "Counts do not match for `j'_gr12!"
        }
        else display "Counts match for `j'_gr12."
    }

13 Loop Example 2
Problem: Given indicator variables white, black, other, and continuous variable educyrs, create interaction variables.
Solution using loop:
    local allraces "white black other"
    foreach race of varlist `allraces' {
        generate `race'_educ=`race'*educyrs
    }

14 Loop Example 3
Problem:
- Dataset contains variables over multiple years (1970-1990)
- Need to perform a number of commands separately for 1970, 1975, 1980, 1985.
Solution without loop:
    bysort year: command1 if year==70 | year==75 | year==80 | year==85
    bysort year: command2 if year==70 | year==75 | year==80 | year==85
Solution with loop:
    foreach year in 70 75 80 85 {
        di as result "***Regression for year = `year':"
        regress ln_wage grade tenure ttl_exp if year==`year'
        di as result "***Summarize for year = `year':"
        summarize ln_wage if year==`year'
    }

15 Loop Example 4 - pulling from 2 lists
From Stata FAQ website
Code:
    local agrp "cat dog cow pig"
    local bgrp "meow woof moo oinkoink"
    local n : word count `agrp'
    forvalues i = 1/`n' {
        local a : word `i' of `agrp'
        local b : word `i' of `bgrp'
        di "`a' says `b'"
    }
Resulting output:
    cat says meow
    dog says woof
    cow says moo
    pig says oinkoink

16 Constructs - If/then/else
Execute section of code if condition is true:
    if condition then
        {execute this code if condition true}
    end
Execute one of two sections of code:
    if condition then
        {execute this code if condition true}
    else
        {execute this code if condition false}
    end

17 If/Else Example
Problem: need to execute commands on an operating system, but only if the OS is Unix. The commands will fail if the OS is anything else.
Solution:
    if "`c(os)'"~="Unix" {
        di as err "Sorry; this section requires Unix OS."
    }
    else {
        ** continue with unix commands…
    }

18 Constructs - Elseif/case
Elseif - Execute one of many sections of code:
    if condition1 then
        {execute this code if condition1 true}
    elseif condition2 then
        {execute this code if condition2 true}
    else
        {execute this code if condition1 and condition2 are both false}
    end
Case - same idea, different name:
    case condition1 then
        {execute this code if condition1 true}
    case condition2 then
        {execute this code if condition2 true}
    etc.

19 Elseif Example
Problem: Continue example from if…else, but execute different section of code for Unix, Windows, and Mac.
Solution:
    if "`c(os)'"=="Unix" {
        di "This is a Unix environment."
    }
    else if "`c(os)'"=="Windows" {
        di "This is a Windows environment."
    }
    else if "`c(os)'"=="MacOSX" {
        di "This is a MacOS environment."
    }
    else {
        di as err "`c(os)' not recognized."
    }

20 Stata- If command vs. if qualifier
- The if command (help ifcmd) was designed to be used with a single expression
- Example:
  - Given variable x with 5 observations: 1, 1, 2, 1, 3
  - Compare the following three pieces of Stata code:
    if x==2 {
        replace x=99
    }

    if x==1 {
        replace x=99
    }

    replace x=99 if x==2

21 Stata- If command vs. if qualifier
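
The comparison can be checked directly. The sketch below builds the five-observation variable x and applies each of the three pieces of code to its own copy (a, b and c are illustrative names) so the results can be listed side by side. The if command evaluates its expression once, and a variable name in that expression refers to the first observation, so `if x==2` is false here and `if x==1` is true.

```stata
clear
input x
1
1
2
1
3
end
generate a = x
generate b = x
generate c = x

* if command: a[1] is 1, so the condition is false and nothing changes.
if a==2 {
    replace a = 99
}

* if command: b[1] is 1, so the condition is true and ALL observations change.
if b==1 {
    replace b = 99
}

* if qualifier: evaluated per observation, so only the third one changes.
replace c = 99 if c==2

list x a b c
```

Only the if qualifier does what the code appears to say; the if command is better reserved for conditions on macros, scalars or system values such as c(os).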

22 Constructs -- Continue
Example from Stata online help
Continue is used to exit current iteration of loop and continue with next iteration.
The following two loops produce the same result:
    forvalues x = 1/10 {
        if mod(`x',2)==1 {
            display "`x' is odd"
            continue
        }
        display "`x' is even"
    }

    forvalues x = 1/10 {
        if mod(`x',2)==1 {
            display "`x' is odd"
        }
        else {
            display "`x' is even"
        }
    }

23 Constructs - Exit
Stop execution of program
Examples:
- Do-file contains a number of data checks followed by analysis commands. If data checks reveal something unacceptable, you can exit out of do-file before running analysis.
- Program requires user input. If user enters “bad” information, need to quit program.
- Debugging. If particular error occurs then break.
- Check denominator prior to dividing. If equals zero, exit.
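
A minimal sketch of the first and last examples, using the auto dataset shipped with Stata as a stand-in for project data. The variables checked and the return code 459 (Stata's code for "something that should be true of your data is not") are illustrative choices.

```stata
sysuse auto, clear

* Data check: stop before the analysis if price has missing values.
quietly count if missing(price)
local nmiss = r(N)
if `nmiss' > 0 {
    display as error "price has `nmiss' missing values; stopping before analysis."
    exit 459
}

* Check the denominator prior to dividing.
quietly summarize weight
local denom = r(sum)
if `denom' == 0 {
    display as error "Total weight is zero; cannot compute the ratio."
    exit 459
}

* Analysis runs only if both checks passed.
quietly summarize price
display "Price per unit of weight: " r(sum)/`denom'
```

A nonzero code after `exit` also marks the run as failed for any do-file that called this one.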

24 Raw data to finished product
Raw data -> Analysis data -> Runs/results -> Finished product

25 Raw Data -> Analysis Data
- Always have two distinct data files: the raw data and analysis data
- A program should completely re-create analysis data from raw data
- NO interactive changes!! Final changes must go in a program!!
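
A sketch of a program that rebuilds the analysis file from the raw file in one pass. To keep it runnable anywhere, the auto dataset stands in for the raw data and tempfiles stand in for the project's raw and clean data folders; in a real project these would be fixed paths, and the particular cleaning steps are illustrative.

```stata
* Stand-in for the raw data file, which is never modified afterwards.
sysuse auto, clear
tempfile raw analysis
save "`raw'"

* Every change from raw to analysis data is made here, in code.
use "`raw'", clear
keep make price mpg rep78 foreign
drop if missing(rep78)              // documented decision: drop unknown repair records
generate logprice = ln(price)
label data "Analysis file built from raw data by program"
save "`analysis'"
```

Because nothing is changed interactively, rerunning the do-file on the raw file reproduces the analysis file exactly.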

26 Raw Data -> Analysis Data
Document all of the following:
- Outliers?
- Errors?
- Missing data?
- Changes to the data?
Remember to check:
- Consistency across variables
- Duplicates
- Individual records, not just summary stats
- “Smell tests”
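
Several of these checks can be written into the program so they run every time the data are rebuilt. The sketch below uses the auto dataset shipped with Stata as a stand-in; the specific variables and plausible ranges are assumptions for illustration.

```stata
sysuse auto, clear

* Duplicates: make should uniquely identify each record.
isid make
duplicates report make

* Missing data: count and record missing values for a key variable.
quietly count if missing(rep78)
local nmiss = r(N)
display "rep78 is missing in `nmiss' of " _N " records."

* Consistency and smell tests: assert stops the do-file if a check fails.
assert price > 0
assert inrange(rep78, 1, 5) if !missing(rep78)

* Individual records, not just summary stats.
list make price mpg rep78 in 1/5
```

Leaving the checks in the do-file documents them in the log and catches problems introduced by later changes to the raw data.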

27 Analysis Data -> Results
- All results should be produced by a program
- Program should use analysis data (not raw)
- Have a “translation” of raw variable names -> analysis variable names -> publication variable names
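
One way to keep that translation in code is to rename the raw variable once and attach the publication name as a variable label. This is only a sketch: the auto dataset is a stand-in, and the names repairRecord and "Repair record, 1978" are made up for illustration.

```stata
sysuse auto, clear

* raw name -> analysis name
rename rep78 repairRecord

* analysis name -> publication name (the label appears in tables and graphs)
label variable repairRecord "Repair record, 1978"

describe repairRecord
```

Keeping both steps in the program means the mapping is documented in the same place the results are produced.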

28 Analysis Data -> Results
Document:
- How were variances estimated? Why?
- What algorithms were used and why? Were results robust?
- What starting values were used? Was convergence sensitive?
- Did you perform diagnostics? Include in programs/documentation.

29 Log files
- Your log file should tell a story to the reader.
- As you print results to the log file, include words explaining the results
- Include not only what your code is doing, but your reasoning and thought process
- Don’t output everything to the log file: use quietly and noisily in a meaningful way.
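
A small sketch of using quietly and noisily to control what reaches the log, again with the auto dataset as a stand-in. The regression and the wording of the messages are illustrative.

```stata
sysuse auto, clear

* Explain in words what is being done, then show only the result of interest.
display "Regress price on mpg and weight; only the mpg coefficient is reported."
quietly regress price mpg weight
display "Coefficient on mpg: " %9.2f _b[mpg]

* Inside a quietly block, noisily lets selected lines through to the log.
quietly {
    summarize price
    local n = r(N)
    noisily display "Mean price is " %9.2f r(mean) " across `n' cars."
}
```

The log then contains the explanation and the two reported numbers, without the full regression and summarize tables.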

30 Project Clean-up
- Create a zip file that contains everything necessary for complete replication
- Use a readme.txt file to describe zip contents
- Delete/archive unused or old files
- Include any referenced files in zip
- When you have a final zip archive containing everything:
  - Open it in its own directory and run the script
  - Check that all the results match

31 Questions/Feedback
