Presentation is loading. Please wait.

Presentation is loading. Please wait.

Notes for SAS programming

Similar presentations


Presentation on theme: "Notes for SAS programming"— Presentation transcript:

1 Notes for SAS programming
Econ424 Fall 2008

2 Why SAS? Able to process large data set(s)
Easy to cope with multiple variables Able to track all the operations on the data set(s) Generate systematic output Summary statistics Graphs Regression results Most government agencies and private sectors use SAS

3 Where to find SAS? The MS Window version of SAS is only available on campus class website, “computer resources”, “On-campus computer lab” list computer lab location, hours and software You can access SAS remotely via glue.umd.edu, but you cannot use the interactive windows use a secured telnet (e.g. ssh) to remotely login glue.umd.edu with your directory ID and password type “tap sas” to tell the system that you want to access SAS use any text editor (say pico) to edit your sas program (say myprog.sas). You can also create the text-only .sas file in your PC and (securely) ftp it into glue. In glue, type “sas myprog.sas &” will send the sas program to run in the background. The computer will automatically generate myprog.log to tell you how each command runs in sas. If your program produces any output, the output will be automatically saved in myprog.lst. All in the same directory as your .sas file.

4 Roadmap Thinking in “SAS” Basic rules Read in data
Data cleaning commands Summary statistics Combine two or more datasets Hypothesis testing Regression

5 Thinking in “SAS” What is a program? How is programming done in SAS?
Algorithm, recipe, set of instructions How is programming done in SAS? SAS is like programming in any language: Step by step instructions Can create your own routines to process data Most instructions are set up in a logical manner SAS is NOT like other languages: Some syntax is peculiar to SAS Written specifically for statistics so it isn’t all-purpose Canned processes that you cannot edit nor can you see the code

6 Thinking in “SAS” Creating a program
What is your problem? How can you find a solution? What steps need to be taken to find an answer? Do I need to read in data? What variables do I need? Where is the data? What format is the data in? How do I need to clean the data? Are there outliers? Are there any unexpected values in the data? How do I need to transform the data? Are the variables in the form that I need? Example: Midterm 2005 on 1997 College Ranking data Midterm questions: Midterm data:

7 Basic rules (1) – organize files
.sas – program file .log – notes, errors, warnings .lst – output .sas7bdat – data file library – a cabinet to put data in Default: Work library temporary, erased after you close the session Permanent library libname mylib “m:\”; mylib.mydata = a sas data file named “mydata” in library “mylib” run and recall .sas

8 Basic rules (2) -- program
every command ends with ; format does not matter if x=1 then y=1; else y=2; is the same as if x= then y=1; else y=2; case insensitive comment * this is comment; /* this is comment */;

9 Basic rule (3) – variable
Type numeric (default, 8 digit, . stands for missing value) character ($, default 8 digit, blank stands for missing) Variable names <=32 characters if SAS 9.0 or above <=8 characters if SAS 8 or below case insensitive Must start with letter or “_” _name, my_name, zip5, u_and_me -name, my-name, 5zip, per%, u&me, my$sign

10 Basic rules (4) – data step
create a new data set called “newdata” in the temporary library Data step DATA newdata; set olddata; miles=26.22; kilometers=1.61*miles; run; use the data set called “olddata” in the temporary library Define new variables input data “olddata” obs1 obs2 obs n define “miles” define “kilometers” output data “newdata”

11 Basic rules (5) – proc step
PROC PRINT data=newdata; var miles kilometers; title “print out new data”; run; Action Data source data=distance can be ignored, if you refer to the latest data used in the same program run can be ignore, SAS automatically realize the end of proc when it encounters another data or proc step Signal the end of PROC step, could be ignored if this is followed by a Data or Proc step

12 Read in data (1) – table editor
Tools – table editor choose an existing library or define a new library rename variable save as

13 The following several slides won’t be covered in class
The following several slides won’t be covered in class. But you are welcome to use them by yourselves.

14 Read in data (2) -- datalines
data testdata1; infile datalines; input id height weight gender $ age; datalines; M 23 M 34 F 37 ; /* you only need one semicolon at the end of all data lines, but the semicolon must stand alone in one line */ proc contents data=testdata1; run; proc print data=testdata1; run; No ; until you finish all the data lines Note on missing values $ for character variables try comma delimiter instead of space delimiter infile datalines dsd;

15 Read in data (3) – more datalines
/* read in data in fixed columns */ data testdata1; infile datalines; input id 1 height 2-3 weight 4-6 gender $7 age 8-9; datalines; 168144M23 M34 F37 ;

16 Read in data (4) – more datalines
data testdata1; infile datalines; input id : 1. height : 2. weight : 3. gender : $1. age : 2.; datalines; M 23 M 34 F 37 ;

17 Read in data (4) – more datalines
*alternatively; data testdata1; infile datalines; informat id 1. height 2. weight 3. gender $1. age 2.; input id height weight gender age; datalines; M 23 M 34 F 37 ;

18 Read in data (5) – .csv data testdata1;
infile datalines dlm=‘,’ dsd missover; /* what if you do not have dsd and/or missover */ input id height weight gender $ age ; datalines; 1, 68, 144, M, 23 2, 78, , M, 34 3, 62, 99, F, 37 ; /* what if you forget to type 23 */ run;

19 Read in data (6) – .csv file
/* save the midterm data in a comma delimited file (.csv) before running the following codes */ libname mylib "M:\"; filename midterm "M:\midterm-fall rank.csv"; data mylib.midterm; infile midterm firstobs=3 dlm=',' dsd missover lrecl=900; input year: 4. name: $50. score reputation accept_rate graduate_rate; run; proc contents data=mylib.midterm; proc print data=mylib.midterm; Note the options in the infile command!

20 Read in data (7) – from excel
filename myexcel “M:\midterm-fall rank.xls”; proc import datafile=myexcel out=readin1 DBMS=excel replace; run; data readin1; set readin1; acceptper=accept_rate*100; gradper=graduate_rate*100; proc contents data=readin1; run; proc print data=readin1; run; No ; until the end of the whole sentence

21 Read in data (7) – from excel
Be careful ! SAS will read the first line as variable names, our original data has two non-data lines – the first as labels, the second as variable names. You need to delete the first line to make sure correct data input into SAS. What happens if you don’t delete the first line? What happens if you delete the second line? SAS assigns numeric and character type automatically. Sometime it does make mistake.

22 Data cleaning (1) – if then
Format: IF condition THEN action; ELSE IF condition THEN action; ELSE action; Note: the if-then-else can be nested as many as you want if you need multiple actions instead of one action, use “DO; action1; action2; END; ”

23 Data cleaning (1) – if then
= or EQ means equals ~= or NE means not equal > or GT means greater than < or LT means less than >= or GE means greater than or equal <= or LE means less than or equal in means subset if gender in (‘M’, ‘F’) then ..; Multiple conditions: AND (&), OR(|)

24 Data cleaning (1) – if then
*reading in program of readin1 is on page 17; data readin1; set readin1; IF accept_rate<0.20 THEN acceptgrp=0; ELSE acceptgrp=1; run; proc contents data=readin1; run; proc print data=readin1; run; Note: (1) the code is less efficient if you replace ELSE ..; with IF acceptrate>=0.20 THEN ..; (2) missing value is always counted as the smallest negative, so accept_rate=. will satisfy the condition accept_rate<=0.20. If you want to ignore the missing obs set the condition as 0<accept_rate<=0.20.

25 Data cleaning (1) – if then
* Multiple actions in each branch; data readin1; set readin1; IF accept_rate<0.20 AND gradudate_rate>0.80 THEN DO; acceptgrp=1; gradgroup=‘high'; END; ELSE DO; acceptgrp=0; gradgroup=‘other'; run; proc print data=readin1; run; the do-end pair acts as brackets

26 Data cleaning (1) – if then
*Use if commands to choose a subsample; data readin2; /* note here we generate a new data set */ set readin1; IF accept_rate=. Then delete; If graduate_rate>=0.6; run; proc print data=readin2; run;

27 Data cleaning (1) – exercise
still use readin1. define newscore=[0.4*reputation+0.3*(-accept_rate)+0.3*graduate_rate]*100 define newgrp = 1 if newscore <70 (low) 2 if 70<=newscore<90 (mid-low) 3 if 90<= newscore<120 (mid-high) 4 if newscore>=120 (high).

28 Data cleaning (1) – exercise answer
data readin1; set readin1; newscore=100*(0.4*reputation+ 0.3*(-accept_rate)+0.3*graduate_rate); if newscore<70 then newgrp=1; else if newscore<90 then newgrp=2; else if newscore<120 then newgrp=3; else newgrp=4; run; proc contents data=readin1; run; proc print data=readin1; run; Question: What if one observation has newscore=.?

29 Pages 24-27 are optional material for data cleaning
Pages are optional material for data cleaning. They are not required, but you may find them useful in the future. We skip them in the regular class.

30 Data cleaning (2) – convert variable type
Numeric to character: age1=put(age, $2.); age and age1 have the same contents but different formats Character to numeric: age2=input(age1, 1.); now age and age2 are both numeric, but age2 is chopped at the first digit Take a sub string of a character age3=substr(age1,2,1); now age3 is a sub string of age1, starting from the second digit of age1 (the meaning of “2”) and having one digit in total (the meaning of “1”).

31 Data cleaning (2) - example
* we want to convert studid to ; data testdata2; infile datalines; input studid : 9. studname : $1.; datalines; A B C ; proc print; run; set testdata2; if studid LT 1E+7 then studid1= '00’||compress(put(studid, $9.)); else if 1E+7 LE studid LT 1E+8 then studid1='0'||compress(put(studid, $9.)); else studid1= put(studid, $9.); studid2=substr(studid1,1,3)||'-'||substr(studid1,4,3)||'-'||substr(studid1,7,3);

32 Data cleaning (2) - exercise
You have the following data, variables in sequence are SSN, score1, score2, score3, score4, score5: Calculate the average and standard deviation of the five scores for each individual. Use if-then command to find out who has the highest average score, and report his SSN without dashes.

33 Data summary roadmap Proc contents – variable definitions
Proc print – raw data Proc format – make your print look nicer Proc sort – sort the data Proc means – basic summary statistics Proc univariate – detailed summary stat Proc freq – frequency count Proc chart – histogram proc plot – scatter plot

34 Defining “group” as a numeric variable will save space
proc format (1) *Continue the data cleaning exercise on page 23; data readin1; set readin1; newscore=100*(0.4*reputation+ 0.3*(-accept_rate)+0.3*graduate_rate)); if newscore<70 then newgrp=1; else if newscore<90 then newgrp=2; else if newscore<120 then newgrp=3; else newgrp=4; run; Defining “group” as a numeric variable will save space

35 proc format (2) no ; here until the end of the value command
value newgroup 1=‘low’ 2=‘mid-low’ 3=‘mid-high’ 4=‘high’; run; proc print data=readin1; format newgrp newgroup. newscore 2.2; title ‘report new group in words’; label newscore=‘new score computed by our own rule’ newgrp=‘new group 1-4’; var newscore newgrp; no ; here until the end of the value command no ; here until the end of the label command

36 proc sort proc sort data=readin1; by reputation accept_rate; run;
proc sort data=readin1 out=readin3; by reputation descending accept_rate; * note that missing value is always counted as the smallest;

37 proc means and proc univariate
By default, proc means report mean, stdev, min, max Could choose what to report: proc means data=readin1 n mean median; proc means data=readin1; class acceptgrp; var score newscore reputation; run; proc sort data=readin1; by acceptgrp; run; proc univariate data=readin1; by acceptgrp; By default, proc univariate report median, and many other statistics

38 Notes on proc means and proc univariate
*if you do not use class or by command, the statistics are based on the full sample. If you use class or by var x, the statistics are based on the subsample defined by each value of var x. *You can use class or by in proc means, but only by in proc univariate; *whenever you use “by var x”, the data set should be sorted by var x beforehand;

39 More example on proc means and proc univariate
data readin1; set readin1; if graduate_rate<0.60 then gradgrp=“low”; else gradgrp=“high”; run; proc means data=readin1; class gradgrp; var reputation accept_rate graduate_rate; proc sort data=readin1; by gradgrp; run; proc univariate data=readin1; by gradgrp; var reputation; By default, proc means report mean, stdev, min, max Could choose what to report: proc means data=readin1 n mean median; By default, proc univariate report median, and many other statistics

40 proc freq data readin1; set readin1;
if graduate_rate<0.6 then gradgrp=“low”; else if 0.5<=graduate_rate<0.8 then gradgrp=‘middle’; else gradgrp=‘high’; run; proc freq data=readin1; tables acceptgrp gradgrp acceptgrp*gradgrp;

41 proc chart -- histogram
proc chart data=readin1; title ‘histogram for acceptgrp’; vbar acceptgrp; run; title ‘frequency by two variables’; vbar acceptgrp / group=gradgrp;

42 proc chart – continuous variable
proc chart data=readin1; title “histogram for continuous variable’; vbar accept_rate; run; title ‘histogram with specific midpoints’; vbar accept_rate / midpoints=0 to 1 by 0.1;

43 proc plot – scatter plot
proc plot data=readin1; title ‘scatter plot of accept rate and grad rate’; plot accept_rate*graduate_rate; run;

44 fancy proc means proc means data=readin1; class acceptgrp gradgrp;
var score reputation; output out = summary1 mean = avgscore avgreputn; run; proc print data=summary1;

45 The following page may be useful in practice, but I am not going to cover it in class.

46 some summary stat. in proc print
filename myexcel “M:\midterm-fall rank.xls”;; proc import datafile=myexcel out=readin1 DBMS=excel replace; run; data readin1; set readin1; if accept_rate<0.2 then acceptgrp=‘low’; else if 0.2<=accept_rate<0.5 then acceptgrp=‘middle’; else acceptgrp=‘high’; run; proc sort data=readin1; by acceptgrp; run; proc print data=readin1 n; where graduate_rate>=0.8; by acceptgrp; sum score; var name score accept_rate graduate_rate;

47 How to handle multiple data sets?
Add more observations to an existing data and the new observations follow the same data structure as the old one  append Add more variables to an existing data and the new variables refer to the same subjects as in the old data  merge Sometimes we may need to change data structure to fit in append or merge ….

48 merge and append readin1: name score reputation acceptgrp gradgrp
Harvard low high summary1: acceptgrp gradgrp avgscore avgreputn low high merged: name score reputation acceptgrp gradgrp avgscore avgreputn Harvard low high appended: name score reputation acceptgrp gradgrp avgscore avgreputn Harvard low high low high

49 merge two datasets proc sort data=readin1; by acceptgrp gradgrp; run;
proc sort data=summary1; data merged; merge readin1 (in=one) summary1 (in=two); if one=1 & two=1; What if this line is “if one=1 OR two=1;”?

50 Keep track of matched and unmatched records
data allrecords; merge readin1 (in=one) summary1 (in=two); by acceptgrp gradgrp; myone=one; mytwo=two; if one=1 or two=1; run; proc freq data=allrecords; tables myone*mytwo; SAS will drop variables “one” and “two” automatically at the end of the DATA step. If you want to keep them, you can copy them into new variables “myone” and “mytwo”

51 be careful about merge! always put the merged data into a new data set
must sort by the key variables before merge ok for one-to-one, multi-to-one, one-to-multi, but no good for multi-to-multi be careful of what records you want to keep, and what records you want to delete what if variable x appears in both datasets, but x is not in the “by” statement? after the merge x takes the value defined in the last dataset of the “merge” statement

52 append data appended; set readin1 summary1; run;
proc print data=appended; proc print data=merged;

53 Application related to Project 4
Suppose we have read in 2001 Washingtonian ratings from Project 4 data, and we also have demographic data by city. rest_rating: Restname Ratings Price Loc1 Phone Loc2 Phone2 Addie’s Rockville Andalucia Rockville Haandi Bethesda Falls Church city_demo: Loc State Pop Income Bethesda MD Falls Church VA Rockville MD How to merge the two data sets?

54 Main data issues One restaurant may have multiple locations and each location matches to a different city.  need to reshape the data so that we only have one location per obs but still maintain all the information in the raw data. The two data sets are linked by city name  we need to sort them by city name before merging.

55 Step 1 to reshape restaurant data
reshape data set 1 – focus on loc1 rest_rating: Restname Ratings Price Loc1 Phone Loc2 Phone2 Addie’s Rockville Andalucia Rockville Haandi Bethesda Falls Church restloc1 restname loc phone Addie’s Rockville Andalucia Rockville Haandi Bethesda data restloc1; set rest_rating; loc=loc1; phone=phone1; drop loc1 loc2 loc3 loc4 phone1 phone2 phone3 phone4; run;

56 Step 2 to reshape restaurant data
reshape data set 1 – focus on loc2 rest_rating: Restname Ratings Price Loc1 Phone Loc2 Phone2 Addie’s Rockville Andalucia Rockville Haandi Bethesda Falls Church restloc2 restname loc phone Haandi Falls Church data restloc2; set rest_rating; loc=loc2; phone=phone2; drop loc1 loc2 loc3 loc4 phone1 phone2 phone3 phone4; If loc=‘’ then delete; run;

57 Step 3 to reshape restaurant data
Similarly create restloc3 and restloc4

58 Step 4 to reshape restaurant data
append restloc1, restloc2, restloc3, restloc4 into restloc restloc1 restname loc phone Addie’s Rockville Andalucia Rockville Haandi Bethesda restloc restname loc phone Addie’s Rockville Andalucia Rockville Haandi Bethesda Haandi Falls Church restloc2 restname loc phone Haandi Falls Church data restloc; set restloc1 restloc2 restloc3 restloc4; run;

59 Step 5: merge the datasets
3. merge restloc with citydemo by loc restloc restname loc phone Addie’s Rockville Andalucia Rockville Haandi Bethesda Haandi Falls Church proc sort data=restloc; by loc; run; proc sort data=city_demo; data merged; merge restloc (in=one) city_demo (in=two); by loc; if one=1 | two=1; inone=one; intwo=two; run; city_demo: Loc State Pop Income Bethesda MD Falls Church VA Rockville MD

60 Questions How many restaurants have two locations? how many have three? How many restaurants locate in Rockville? How many restaurants do not have any match with city demographics? How many restaurants in cities where per capita income is at least 30,000? How many are asian and locate in areas with per capita income at least 30,000?

61 mean comparison: two groups
* compare the mean of two groups; * specific question: do Asian and non-Asian restaurants charge systematically different prices?; * define a dummy variable asianstyle=1 if the food style is Asian, Chinese, Japanese, Thai, Vietnamese, Lebanese, or Korean, 0 otherwise; proc glm data=merged; class asianstyle; model price=asianstyle; /* alternatively, model price=asianstyle/solution; */ means asianstyle; run; The F-value tests H0: regression has no power explaining variation in income H1: regression has some power. In our context, this is equivalent to H0: income is no different across asianstyle or non-asianstyle H1: income is different. This amounts to a two-tail test of equal mean. If you use “/solution” at the end of the model command, the output would report regression results. The t-stat of the coefficient for asianstyle or the p-value next to the t-stat could apply to a two-tail test or a one-tail test of equal mean.

62 mean comparison: two groups
Alternatively, you can use proc reg data=merged; model price=asianstyle; Run; The t-stat of the coefficient for asianstyle or the p-value next to the t-stat could apply to a two-tail test or a one-tail test of equal mean.

63 mean comparison: three or more groups
* compare the mean of four groups – ratings =1,2,3,4; * specific question: do restaurants of different ratings charge systematically different prices?; proc glm data=merged; class ratings; model price=ratings; means ratings/waller; means ratings/lsd cldiff; run;

64 notes on mean comparison
The logic of mean comparison is the same as in Excel Be careful about one-tail and two-tail tests. The standard SAS output of coefficient t-stat and p-value are based on a two-tail test of H0: coeff=0, but could be used for a one-tail test with appropriate adjustment on p-value. The F-value reported as the overall regression statistics only applies to the right tail of F distribution, because F is bounded to be positive. The F-value can test H0: regression explains no variation in your dependent variable, against H1: regression explains some variation. This is equivalent to a two-tail test of equal mean. 3. Comparison across more than two groups H0: all groups have the same mean  F-test of whole regression OR H0: group x and group y has the same mean  waller or lsd statistics

65 regression in SAS specific question: how do restaurant price relate to ratings, city population and per capita income? libname n “N:\share\”; proc reg data=n.merged; model price=ratings population per_capita_income; run; Equivalent to: Price=alpha+beta1*ratings+beta2*pop+beta3*per_cap_inc

66 regression in SAS * specific question: how do restaurant prices relate to ratings, city population, per capita income, and state specifics? * note that state itself is a character variable, and we would like to separate MD, VA and DC. This amounts to define three dummy variables denoting each state. This is easy to achieve in SAS; proc glm data=merged; class state; model price=ratings population per_capita_income state / solution; run;

67 Regression: exercise 1 The coefficient of ratings in the first regression tells us how much price will change by a unit change in ratings. However, one may think the change from 1 star to 2 star is different from the change from 2 star to 3 star, or from 3 star to 4 star. To capture these differences, we need more than one coefficents for ratings. proc glm data=merged; class ratings; model price=ratings population per_capita_income / solution; run;

68 Regression: exercise 2 What if we also want to control for states?
SAS does not allow more than one variables in the “class ” command, so we need to define one set of dummy variables by ourselves; Below we define 4 dummy variables for each value of ratings. Since the sum of the four variables is always 1, we need to drop one of them in the regression to avoid their colinearity with the intercept. One way to do it is: data merged; set merged; if ratings=1 then rating_1star=1; else rating_1star=0; if ratings=2 then rating_2star=1; else rating_2star=0; if ratings=3 then rating_3star=1; else rating_3star=0; if ratings=4 then rating_4star=1; else rating_4star=0; proc glm data=merged; class state; model price=rating_2star rating_3star rating_4star population per_capita_income state / solution; run; The other way is: class state ratings; model price=ratimgs population per_capita_income state / solution;

69 Regression: exercise 3 What if we focus on the average price change by one unit increase of ratings (like in regression 1 on page 59), but we would like to see whether this price change vary by states? We need to define ratings*(dummy for MD), ratings*(dummy for VA),ratings*(dummy for DC) data merged; set merged; if state=“MD”then state_MD=1; else state_MD=0; if state=“VA” then state_VA=1; else state_VA=0; if state=“DC” then state_DC=1; else state_DC=0; ratings_MD=ratings*state_MD; ratings_VA=ratings*state_VA; ratings_DC=ratings*state_DC; run; proc reg data=merged; model price=ratings_MD ratings_VA ratings_DC population per_capita_income; * Control for states; proc glm data=merged; class state; model price=ratings_MD ratings_VA ratings_DC population per_capita_income state/solution;

70 A comprehensive example
A review of readin data summary statistics mean comparison regression reg-cityreg-simple.sas in N:\share\

71 A Case Study of Los Angeles Restaurants
Nov , 1997 CBS 2 News “Behind the Kitchen Door” January 16, 1998, LA county inspectors start issuing hygiene grade cards A grade if score of 90 to 100 B grade if score of 80 to 89 C grade if score of 70 to 79 score below 70 actual score shown Grade cards are prominently displayed in restaurant windows Score not shown on grade cards

72

73 Take the idea to data Research Question:
Does better information lead to better hygiene quality? better Information better quality regulation by county by city hygiene scores

74 Data complications (blue font indicates our final choices)
Unit of analysis: individual restaurant? city? zipcode? census tract? Unit of time: each inspection? per month? per quarter? per year? Define information: county regulation? city regulation? the date of passing the regulation? days since passing the regulation? % of days under regulation? Define quality: average hygiene score? the number of A restaurants? % of A restaurants?

75 How to test the idea? Regression: quality = α+β*information+error
+something else? Something else could be: year trend, seasonality, city specific effects, ….

76 real test reg-cityreg-simple.sas in N:\share\

77 Questions How many observations in the sample?
log of the first data step, or output from proc contents How many variables in the sample? How many are numerical, how many are characters? Output from proc contents How many percentage of restaurants have A quality in a typical city-month? Output from proc means, on per_A

78 Questions What is the difference between cityreg and ctyreg? We know county regulation came earlier than city regulation, is that reflected in our data? Yes, cityreg<=ctyreg in every observation We can check this in proc means for cityreg and ctyreg, or add a proc print to eyeball each obs What is the difference between cityreg and citymper? What is the mean of cityreg? What is the mean of citymper? Are they consistent with their definitions? The unit of cityreg is # of days, so it should be a non-negative integer The unit of citymper is % of days, so it should be a real number between 0 and 1 To check this, we can add a proc means for cityreg and citymper

79 Questions Economic theories suggest quality be higher after the regulation if regulation gives consumers better information. Is that true? The summary statistics reported in proc means (class citym_g or ctym_g) show the average percentage of A restaurants in different regulation environments. Rigorous mean comparison tests are done in proc glm with waller or lsd options.

80 Questions Summary statistics often reflect many economic factors, not only the one in our mind. That is why we need regressions. Does more regulation lead to higher quality? is the coefficient of city regulation positive and significantly different from zero? (proc reg) is the coefficient of county regulation positive and significantly different from zero? (proc reg) Do we omit other sensible explanations for quality changes? What are they? (proc glm, year, quarter, city)

81 Count duplicates (not required)
data dups nodups ; set clasdata ; by name class ; /* If the combination of NAME and CLASS is in the data set once, output NODUPS, else output DUPS. */ if first.class and last.class then output nodups ; else output dups ; run;

82 Some project 5 ideas from previous students
Does super-bowl advertising boost stock price? Does financial aid promote GDP growth? What are the most important determinants of house price? Advice on smart investment: which stock(s) should John Doe invest in given his preference for return and risk? Does a chain store charge different prices in different locations?

83 Idea I: Does super-bowl advertising boost stock price?
Theory advertising in super-bowl is extremely expensive and therefore increase the cost of the advertiser  negative effect on stock price advertising at the superbowl signals good finanical status of the advertiser  positive effect on stock price Data: list of superbowl advertisers from major newspapers list of non-advertisers who are major competitors with the advertisers in the same industry New York Stock Exchange Timing: right before and after super-bowl (event study)

84 Idea II: Does financial aid promote GDP growth?
Theory: positive effect because aid means help negative effect because aids tend to go to the most under-developed countries Data focus on Sub-Sahara Africa World Development Indicators (WDI 2002) database from the World Bank : GDP per capita, aid, FDI, adjusted net savings, health expenditures, adjult illiteracy rate

85 Idea III: determinants of house price?
Theory: house attributes location time of sale Data zillows.org

86 Idea IV: Smart Investment
Theory: CAPM John Doe’s preference: risk and return Sample: treasury bills S&P 500 index six individual stocks monthly data in the past 60 months

87 Idea V: Does a chain store charge different prices in different locations?
Theory: Yes: A chain store will charge different prices because different locations have different rent costs and different consumer willingness to pay No: same pricing facilitates advertising from the chain headquarter Data: Hand collected price for 10 CVS stores Choose a basket of goods that represent CVS sales Land price and demographic data from Census

88 course evaluation University wide: www.CourseEvalUM.umd.edu
TTclass in particular:


Download ppt "Notes for SAS programming"

Similar presentations


Ads by Google