Quantify the Example Data First, code and quantify the data (assign column locations & variable names) Use the sample data to create a data set from the.


1 Quantify the Example Data First, code and quantify the data (assign column locations & variable names) Use the sample data to create a data set from the first 10 counties Include: ID, County, Number of reporting Units (v1), Number of employees (v2), Payroll (v3) Save to your flash drive as ‘countydata’

2 The SAS ® System Statistical Analysis Programming

3 Introduction to SAS ® Arguably the most popular computer software for conducting statistical data analysis Does both data management & statistical analysis Useful for managing even the most complex data sets Operates on its own language

4 Introduction to SAS ® Open the SAS ® Window

5 Introduction to SAS ® You essentially have 4 windows within SAS: The Explorer Sidebar Window The Log Window The Editor Window The Output Window You can resize and reconfigure these windows, and minimize & maximize as you would in any windows-based program

6 Introduction to SAS ® The Editor Window is for constructing & running programs “Programming” in SAS involves writing out step-by-step instructions in the correct order in a format the SAS System can understand The program you write must be syntactically correct; when it is not, SAS will give you error messages in the Log Window

7 SAS ® Programming Three major components to most SAS programs: Input Manipulation Output

8 SAS ® Programming Input Most of the time data are placed into a data file and inputted into the program The program tells the system which variables are located in which columns

9 SAS ® Programming: Input Input data & column locations

10 SAS ® Programming Manipulation Data are then manipulated to accomplish the tasks for which the program was written: transforming or combining variables or conducting statistical or other analyses

11 SAS ® Programming: Manipulation Manipulate the Data

12 SAS ® Programming Output Program Output The results of the program are then outputted into the Output Window You must save these results Log

13 SAS ® Programming: Output

14 SAS ® Programming: Log

15 SAS ® Programming Basic Input Statement = “DATA Step” Begins with an “options” statement that formats what the output page will look like Names the temporary data set “data1,” “data2,” etc. or a text name (no spaces; up to 32 characters, 8 in SAS version 6 and earlier) Tells SAS where to find your actual data set File location Gives the “Input” – or, column locations for your variables

16 SAS ® Programming: Input Options Temporary Data Set Data Location Input Column Locations

17 SAS ® Programming Basic Input Statement After your input statement, you add statements to transform or manipulate the data Add statements to perform analysis procedures Ends with a RUN statement
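The overall shape described on the last few slides can be sketched as follows. This is a minimal outline only: the file path and the column locations are hypothetical and must be replaced with the ones for your own data file.

```sas
* Options that format the output page;
options nocenter nonumber nodate linesize=88 pagesize=72;

* Name the temporary data set;
data data1;
  * Hypothetical file location, replace with the path to your own file;
  infile 'E:\countydata.txt';
  * Hypothetical column locations for each variable;
  input id 1-2 county $ 4-15 v1 17-20 v2 22-27 v3 29-36;
  * Data manipulation: create a new variable;
  newvar = v1 + v2;
run;

* Analysis procedure;
proc print data=data1;
  var county v1 v2 v3 newvar;
run;
```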

18 SAS ® Programming: Input Data Manipulations & Transformations Analysis Procedure

19 SAS ® Programming: Syntax SAS Statements Commands or instructions that can be interpreted by the SAS system These commands appear as blue text in the Enhanced Editor window DATA, PROC, PUT, INPUT, RUN, etc.

20 SAS ® Programming: Syntax Every SAS statement must end in a semicolon; This is how the system knows the statement is complete One of the most common errors is omitting semicolons Comments begin with an asterisk * and, like any other statement, end with a semicolon

21 SAS ® Programming: Syntax In the Enhanced Editor: Plain text is black Numerical values are teal SAS Statements are blue Errors are red Basic arithmetic operators can be used (+, -, *, /)

22 SAS ® Programming: Logical Operators
Symbol      Abbreviation  Operation
=           eq            equal to
^= or ~=    ne            not equal to
>           gt            greater than
<           lt            less than
>= or =>    ge            greater than or equal to
<= or =<    le            less than or equal to
&           and
|           or

23 Building a SAS ® Program 1. Open the SAS Program and Click inside the Editor Window 2. Add your “options” statements: options nocenter nonumber nodate linesize=88 pagesize=72; 3. Add the “data” statement, then the name of your first temporary data file (data1)

24 Building a SAS ® Program

25 4. Add the “infile” statement, then the file location where your data is stored 5. Add the “input” statement, then each variable name followed by its column location A dollar sign $ after a variable name signifies that the variable is character (text) data Recommend that you input data in 80 column lines; #2 in the input statement signifies the start of the second line of a record
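As an illustration of column locations, the $ marker, and the #2 line pointer, here is a small sketch that reads in-stream data with DATALINES instead of an infile so that it can be run on its own. The column positions and all data values are made up for the example.

```sas
data data1;
  * County is character data, so it is followed by a dollar sign;
  * #2 tells SAS that v2 and v3 are on the second line of each record;
  input id 1-2 county $ 4-12 v1 14-17
        #2 v2 1-6 v3 8-15;
  datalines;
 1 Alachua     85
  1200    25000
 2 Baker       12
   150     2900
;
run;

proc print data=data1;
run;
```

Each county takes two lines of raw data here, which is the situation the #2 pointer is meant for.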

26 Building a SAS ® Program

27 6. Add statements for data management or statistical analysis. SAS Statements vary based on the task to be accomplished Data management: create new variables, change values, etc. Statistical procedures: frequencies, correlations, crosstabulations, regression, etc.

28 Building a SAS ® Program

29 Hands-On Exercise 1: Build a Basic SAS Program Using SAS, write a basic program for the county data set you created For your analysis, run a “print” command: Proc Print; var county v1 v2 v3;

30 Exercise 1

31 SAS ® Procedures PROC Commands SAS procedures that perform different operations use “PROC” commands A lot of different PROC commands, we’ll touch on a few of the most used Some for data management Some for statistical analysis

32 SAS ® Procedures PROC PRINT Prints the data you have in your temporary SAS data set Will print the variables you designate (either those from your initial INPUT statement, or variables you create) Helps you better understand your data set; helps you spot errors

33 SAS ® Procedures Proc Print; var v1 v2 v3; This statement tells SAS to print the data / information for v1, v2, and v3 If you run “PROC PRINT” without any variables designated, it will print ALL of your variables

34 SAS ® Procedures PROC PRINT You should run a proc print when you transform variables or create new variables to ensure that the transformations were done correctly Example Create a new variable by adding two others: newvar = v1+v2; Proc print; var v1 v2 newvar; Check the output to ensure that the operation is correct
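Put together as a complete program, that check might look like the sketch below. The three rows of data are invented for illustration only.

```sas
data data1;
  input id county $ v1 v2 v3;
  * New variable created by adding two others;
  newvar = v1 + v2;
  datalines;
1 Alachua 85 1200 25000
2 Baker 12 150 2900
3 Bay 60 800 16000
;
run;

* Print the inputs next to the result so the sum can be checked by eye;
proc print data=data1;
  var v1 v2 newvar;
run;
```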

35 Variable Manipulations SAS will permit you to perform many different types of variable manipulations Add Variables newvar1 = v1+v2+v3; Subtract Variables newvar2 = v3 - v2;

36 Variable Manipulations Multiply Variables newvar3 = v2 * v3; Divide Variables newvar4 = v2/v1; More complex transformations can be done following basic rules for arithmetic operations newvar5 = (v1+v2/v3)*4;

37 Variable Manipulations You can also use your new variables in other transformations newvar6 = newvar4*newvar5; Create categorical variables You can reformat your data into new variables If you have a survey question with responses showing ‘year of birth’ you can convert it to ‘age’
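The year-of-birth conversion could be sketched like this. The data set name, variable names, reference year (2010), and values are all assumptions chosen for the example.

```sas
data survey;
  input id birthyear;
  * Hypothetical survey year used as the reference point;
  age = 2010 - birthyear;
  datalines;
1 1950
2 1975
3 1988
;
run;

proc print data=survey;
  var birthyear age;
run;
```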

38 Variable Manipulations

39 For example, if you have a series of data for a variable: Variable name: “vexample” Values: 1 2 3 4 5 6 7 8 9 10 We want to create a categorical variable with the categories and corresponding values of: Low = 1 Medium = 2 High = 3

40 Variable Manipulations Give your new variable a name like “newvexample” or “vexamplecat” Your new categorical variable would be created with this if/then syntax:
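One possible version of that if/then syntax is sketched below. The slides do not say where the category boundaries fall, so the cutoffs used here (1 to 3 low, 4 to 7 medium, 8 to 10 high) are an assumption.

```sas
data data1;
  * The double trailing @@ reads several observations from one line;
  input vexample @@;
  * Assumed cutoffs: 1-3 low, 4-7 medium, 8-10 high;
  if vexample ge 1 and vexample le 3 then vexamplecat = 1;
  else if vexample ge 4 and vexample le 7 then vexamplecat = 2;
  else if vexample ge 8 and vexample le 10 then vexamplecat = 3;
  datalines;
1 2 3 4 5 6 7 8 9 10
;
run;

proc print data=data1;
  var vexample vexamplecat;
run;
```

Spelling out both ends of each range means a value outside 1 to 10, or a missing value, is left uncategorized rather than being placed in a group by accident.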

41 Variable Manipulations If your data is not as simple as 1 2 3 4 5 6 and so on, you can use the “PROC SORT” command to help you sort your data set

42 Variable Manipulations Run a PROC SORT for v2, and then run a PROC PRINT to show the variable rearranged in ascending order
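A sketch of that sort-then-print sequence follows, with made-up values for v2.

```sas
data data1;
  input id county $ v1 v2 v3;
  datalines;
1 Alachua 85 1200 25000
2 Baker 12 150 2900
3 Bay 60 800 16000
4 Bradford 15 90 1700
;
run;

* Sort the data set by v2, ascending order is the default;
proc sort data=data1;
  by v2;
run;

proc print data=data1;
  var county v2;
run;
```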

43 Variable Manipulations

44 Now, create a new variable “newv2” with the following categories: Low = 1 (values less than 100) Medium = 2 (values 100 to 500) High = 3 (values more than 500) Run a PROC PRINT and PROC FREQ to check your transformations
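One way to write this transformation is sketched below, using invented data. The explicit check for missing values is an addition not shown on the slides; without it, SAS would treat a missing v2 as less than 100 and code it as low.

```sas
data data1;
  input id county $ v1 v2 v3;
  if v2 = . then newv2 = .;
  else if v2 lt 100 then newv2 = 1;   * Low: less than 100;
  else if v2 le 500 then newv2 = 2;   * Medium: 100 to 500;
  else newv2 = 3;                     * High: more than 500;
  datalines;
1 Alachua 85 1200 25000
2 Baker 12 150 2900
3 Bay 60 800 16000
4 Bradford 15 90 1700
;
run;

proc print data=data1;
  var county v2 newv2;
run;

proc freq data=data1;
  tables newv2;
run;
```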

45 Variable Manipulations


47 IF/THEN Statements In the previous exercise, you saw how if / then statements can be used to create new variables If / then statements are very powerful and can be used in a number of ways to help you manage your data

48 IF/THEN Statements Segmenting Data Sets – the IF statement Simple IF Statements The SAS “IF” command can be used to segment or partition your data set For example, suppose you only want to examine certain cases in your data set – only females, only people over age 55, only Florida counties with populations greater than 500,000, etc.

49 IF/THEN Statements You can segment in this way, using the IF statement: If we only want to examine the number of reporting units in our sample for counties with a “low” number of employees: If newv2 is low looks like this in SAS language: IF newv2=1;
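In context, the subsetting IF sits inside the data step after newv2 has been created. The sketch below uses made-up data and the low, medium, and high cutoffs from the earlier exercise.

```sas
data data1;
  input id county $ v1 v2 v3;
  if v2 = . then newv2 = .;
  else if v2 lt 100 then newv2 = 1;
  else if v2 le 500 then newv2 = 2;
  else newv2 = 3;
  * Subsetting IF: only observations with a low number of employees are kept;
  if newv2 = 1;
  datalines;
1 Alachua 85 1200 25000
2 Baker 12 150 2900
3 Bay 60 800 16000
4 Bradford 15 90 1700
;
run;

* Number of reporting units for the remaining counties;
proc print data=data1;
  var county v1;
run;
```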

50 IF/THEN Statements

51 Combining IF statements to segment data sets with the DATA command It is very useful to combine the IF command to segment data with the DATA command we learned earlier Recall that your initial data step started with the command: data data1; This created the initial temporary SAS data set

52 IF/THEN Statements The temporary data set “data1” contained all of the cases that you entered into your data set If you now want to examine only a subset of those cases, you can do that in a second data set: data data2; set data1; This creates a second temporary data set called “data2” (remember SAS allows a large number of data sets)

53 IF/THEN Statements We can now use an IF statement to segment the data in our set “data2” Let’s create a second data set that includes only counties with a “medium” number of employees Run a PROC PRINT to check the output
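A sketch of this two-step approach, with invented data: data1 keeps every county, and data2 holds only the medium group.

```sas
data data1;
  input id county $ v1 v2 v3;
  if v2 = . then newv2 = .;
  else if v2 lt 100 then newv2 = 1;
  else if v2 le 500 then newv2 = 2;
  else newv2 = 3;
  datalines;
1 Alachua 85 1200 25000
2 Baker 12 150 2900
3 Bay 60 800 16000
4 Bradford 15 90 1700
;
run;

* Second data set built from the first, keeping only medium counties;
data data2;
  set data1;
  if newv2 = 2;
run;

proc print data=data2;
  var county v2 newv2;
run;
```

Because the IF is applied in data2, the full set of cases in data1 stays available for other analyses.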

54 IF/THEN Statements

55 The PROC PRINT shows us that the temporary data set we’re now dealing with has only the 5 counties with a “medium” number of employees

56 IF/THEN Statements Hands-On Exercise Use the commands we’ve just learned to: 1. Create a new variable for high, medium, and low payroll amounts (newv3) 2. Use the DATA and IF statements to create a new data set that contains only those counties with the highest payroll for gasoline service stations – run a PROC PRINT to check your results

57 IF/THEN Statements


60 The IF and THEN commands are most often used together with the operators we talked about before

61 SAS ® Programming: Logical Operators
Symbol      Abbreviation  Operation
=           eq            equal to
^= or ~=    ne            not equal to
>           gt            greater than
<           lt            less than
>= or =>    ge            greater than or equal to
<= or =<    le            less than or equal to
&           and
|           or

62 IF/THEN Statements More Complex IF statements Multiple IF statements can be connected using “and” or “or” statements to make more complex statements: if v1 eq 2 or v2 gt 5 and v3 ne 2 then newvar = 1; SAS evaluates “and” before “or”, so add parentheses if you intend a different grouping

63 IF/THEN Statements Using IF and THEN statements: The general form of this command (for creating new variables, separating data sets, etc.) is: IF variable condition exists (character indicator abbreviation: eq, ne, lt, le, gt, ge) THEN new variable condition (numeric symbol) IF v2 eq 5 then newv2 = 1; Again, you can combine conditions for more complex statements
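A combined condition can be sketched as below. The data values are invented, and the ELSE branch that sets newvar to 0 is an addition for illustration. The parentheses only make explicit the grouping SAS would apply anyway, since it evaluates "and" before "or".

```sas
data data1;
  input v1 v2 v3;
  * Parentheses make the grouping explicit;
  if v1 eq 2 or (v2 gt 5 and v3 ne 2) then newvar = 1;
  else newvar = 0;
  datalines;
2 1 2
1 6 3
1 6 2
;
run;

proc print data=data1;
run;
```

The first row qualifies because v1 equals 2, the second because v2 is greater than 5 and v3 is not 2, and the third qualifies on neither side of the "or".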

64 IF/THEN Statements

65 Add Variables & Cases Two other important data management functions that SAS can perform are adding additional cases or observations and adding new variables

66 Add Variables & Cases Adding Cases The term for adding cases or observations is “concatenation” This allows you to add new cases to the bottom of your existing data set You simply create a second data set and add it to your initial data set
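Concatenation can be sketched with a SET statement that names both data sets. The data set names follow the slides, and the values are made up.

```sas
* Initial data set;
data data1;
  input id county $ v1;
  datalines;
1 Alachua 85
2 Baker 12
;
run;

* Additional cases;
data data2;
  input id county $ v1;
  datalines;
3 Bay 60
4 Bradford 15
;
run;

* Listing both data sets on SET stacks data2 below data1;
data data3;
  set data1 data2;
run;

proc print data=data3;
run;
```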

67 Add Variables & Cases Initial Data Set Additional Cases Merged Set

68 Add Variables & Cases Hands-On Exercise You have already created one data set of 10 counties 1. Create a new data set containing information for the next 4 counties (Collier, Columbia, De Soto, and Dixie) 2. Add these cases to your existing data set 3. Do a PROC PRINT for data3 to verify

69 Exercise

70 Add Variables & Cases Adding Variables Adding variables to your existing data is simple as well Again, you will need to create a second data set that will essentially add a column or columns to your initial data set The second data set will contain the new variable you are adding and one variable that matches exactly a variable in your initial data set – usually the sequential ID number (similar to Access)

71 Add Variables & Cases To make sure that the data sets are properly combined, you must SORT the initial and second data set by the matching variable The syntax looks like this:
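A minimal sketch of sorting and then merging, assuming the matching variable is called id and using invented values:

```sas
* Initial data set;
data data1;
  input id county $ v1;
  datalines;
2 Baker 12
1 Alachua 85
3 Bay 60
;
run;

* Second data set holding the new variable plus the matching id;
data data2;
  input id v2;
  datalines;
3 800
1 1200
2 150
;
run;

* Both data sets must be sorted by the matching variable before merging;
proc sort data=data1; by id; run;
proc sort data=data2; by id; run;

data data3;
  merge data1 data2;
  by id;
run;

proc print data=data3;
run;
```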

72 Add Variables & Cases Initial Data Set Added Variables Merge

73 SAS ® Statistical Procedures Descriptive Procedure for Continuous Data PROC UNIVARIATE; Proc Univariate will provide basic descriptive information for continuous variables The syntax looks like this:
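A sketch of a PROC UNIVARIATE call, run on a small made-up data set:

```sas
data data1;
  input id county $ v1 v2 v3;
  datalines;
1 Alachua 85 1200 25000
2 Baker 12 150 2900
3 Bay 60 800 16000
4 Bradford 15 90 1700
;
run;

* Descriptive statistics for each continuous variable listed;
proc univariate data=data1;
  var v1 v2 v3;
run;
```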

74 SAS ® Statistical Procedures

75 Descriptive Procedure for Categorical Data PROC FREQ; Proc Freq will provide basic descriptive information for categorical or ordinal variables The syntax looks like this:
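A sketch of a one-way PROC FREQ, assuming a categorical variable named newv2 coded 1, 2, or 3 (the codes below are invented):

```sas
data data1;
  input id newv2;
  datalines;
1 3
2 2
3 3
4 1
5 2
;
run;

* Frequency and percentage for each category of newv2;
proc freq data=data1;
  tables newv2;
run;
```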

76 SAS ® Statistical Procedures

77 Analytical Procedures for Continuous Data PROC CORR; Proc Corr provides an analysis of the association between two continuous variables Computes a correlation coefficient that demonstrates the level of association, as well as a p-value showing the significance of that association The syntax looks like this:
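A sketch of a PROC CORR call on two continuous variables, with made-up values:

```sas
data data1;
  input id county $ v1 v2 v3;
  datalines;
1 Alachua 85 1200 25000
2 Baker 12 150 2900
3 Bay 60 800 16000
4 Bradford 15 90 1700
5 Brevard 140 2100 43000
;
run;

* Correlation between number of employees and payroll;
proc corr data=data1;
  var v2 v3;
run;
```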

78 SAS ® Statistical Procedures Correlation coefficient p-value

79 SAS ® Statistical Procedures Analytical Procedures for Categorical Data PROC FREQ; Proc Freq can also be used to calculate the level of association between two categorical or nominal variables A chi-square (χ²) test can be added to assess the significance level of that association The syntax looks like this: DV IV
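A sketch of a crosstabulation with a chi-square test follows. Treating newv3 as the dependent variable and newv2 as the independent variable is an assumption, as are the category codes in the data.

```sas
data data1;
  input id newv2 newv3;
  datalines;
1 1 1
2 1 1
3 2 2
4 2 3
5 3 3
6 3 3
;
run;

* Dependent variable first (rows), independent variable second (columns);
proc freq data=data1;
  tables newv3*newv2 / chisq;
run;
```

With so few cases SAS will warn that the chi-square test may not be valid; the example is only meant to show the syntax.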

80 SAS ® Statistical Procedures Crosstab Table Chi-square analysis

81 SAS ® Statistical Procedures PROC FREQ can also be used with the DEVIATION option to show how far each cell frequency in a crosstab deviates from its expected frequency Many SAS procedures like this have additional analyses that can be added in this way

82 SAS ® Statistical Procedures Multivariate Analysis: PROC REG; computes the association between a continuous dependent variable and one or more independent variables PROC LOGISTIC; computes the association between a categorical dependent variable and one or more independent variables

83 SAS ® Statistical Procedures Regression analysis: PROC REG; Uses the “model” command Construct your model with your dependent variable first, then your independent variables The syntax looks like this:
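A sketch of a regression with payroll (v3) as the dependent variable and v1 and v2 as independent variables. The choice of variables is an assumption and the data values are invented.

```sas
data data1;
  input id county $ v1 v2 v3;
  datalines;
1 Alachua 85 1200 25000
2 Baker 12 150 2900
3 Bay 60 800 16000
4 Bradford 15 90 1700
5 Brevard 140 2100 43000
6 Broward 310 5200 110000
;
run;

* Dependent variable on the left of the equals sign, independent variables on the right;
proc reg data=data1;
  model v3 = v1 v2;
run;
quit;
```

PROC REG stays active after RUN so that further statements can be submitted; QUIT ends the procedure.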

84 SAS ® Statistical Procedures

85 These are only a few examples of the analyses you can do with SAS SAS can also do: Time series analysis Factor analysis ANOVA T-tests …and more!

