Presentation is loading. Please wait.

Presentation is loading. Please wait.

SAS for Data Management and Analysis

Similar presentations


Presentation on theme: "SAS for Data Management and Analysis"— Presentation transcript:

1 SAS for Data Management and Analysis
Introduction, Reading SAS Data Sets

2 This new SAS data set must contain the following:
An existing data source contains information on Orion Star sales employees from Australia and the United States. A new SAS data set needs to be created that contains a subset of this existing data source. This new SAS data set must contain the following: only the employees from Australia who are Sales Representatives the employee's first name, last name, salary, job title, and hired date labels and formats in the descriptor portion

3 Reading SAS Data Sets Reading SAS Data Sets libname ; data ; set ; ...
run;

4 The LIBNAME Statement (Review)
A library reference name (libref) is needed if a permanent data set is being read or created. The LIBNAME statement assigns a libref to a SAS data library. LIBNAME libref 'SAS-data-library'; DATA output-SAS-data-set; SET input-SAS-data-set; <additional SAS statements> RUN;

5 SAS for Data Management and Analysis
Reading a SAS data set

6 The Data Step LIBNAME libref 'SAS-data-library';
DATA output-SAS-data-set; SET input-SAS-data-set; WHERE where-expression; KEEP variable-list; LABEL variable = 'label' variable = 'label' variable = 'label'; FORMAT variable(s) format; RUN;

7 LIBNAME libref 'SAS-data-library'; DATA output-SAS-data-set;
SET input-SAS-data-set; WHERE where-expression; KEEP variable-list; LABEL variable = 'label' variable = 'label' variable = 'label'; FORMAT variable(s) format; RUN; Part 1 Part 2 Part 3

8 SAS for Data Management and Analysis
The Data Statement

9 The DATA Statement The DATA statement begins a DATA step and provides the name of the SAS data set being created. The DATA statement can create temporary or permanent data sets. LIBNAME libref 'SAS-data-library'; DATA output-SAS-data-set; SET input-SAS-data-set; <additional SAS statements> RUN;

10 SAS for Data Management and Analysis
The Set Statement

11 The SET Statement The SET statement reads observations from a SAS data set for further processing in the DATA step. By default, the SET statement reads all observations and all variables from the input data set. The SET statement can read temporary or permanent data sets. LIBNAME libref 'SAS-data-library'; DATA output-SAS-data-set; SET input-SAS-data-set; <additional SAS statements> RUN;

12 Make an exact copy. libname orion "/courses/d0f434e5ba27fe300/s5066/prg1"; data work.subset1; set orion.sales; run; proc print data=work.subset1;

13 SAS for Data Management and Analysis
Subsetting Observations WHERE Statement

14 LIBNAME libref 'SAS-data-library'; DATA output-SAS-data-set;
SET input-SAS-data-set; WHERE where-expression; KEEP variable-list; LABEL variable = 'label' variable = 'label' variable = 'label'; FORMAT variable(s) format; RUN; Part 1 Part 2 Part 3

15 The WHERE Statement WHERE where-expression ; The where-expression is a sequence of operands and operators that form a set of instructions that define a condition for selecting observations. Operands include constants and variables. Operators are symbols that request a comparison, arithmetic calculation, or logical operation.

16 Operands A constant operand is a fixed value.
Character values must be enclosed in quotation marks and are case sensitive. Numeric values do not use quotation marks. A variable operand must be on the input data set. where Gender = 'M'; where Salary > 50000; variable constant variable constant

17 Comparison Operators Comparison operators compare a variable with a value or with another variable. Symbol Mnemonic Definition = EQ equal to ^= ¬= ~= NE not equal to > GT greater than < LT less than >= GE greater than or equal <= LE less than or equal IN equal to one of a list

18 Equality title "Equality comparison operator"; data sub1;
libname orion "/courses/u_fsu.edu1/i_648881/c_5934/Prg1" access=readonly; title "Equality comparison operator"; data sub1; set orion.sales; where Gender = 'M';/*eq*/ run; proc print data=sub1 noobs; var last_name first_name gender; title;

19 Non-equality title "Non-equality operator "; data sub1;
set orion.sales; where Salary ne .;/* ^= */ run; proc print data=sub1; var last_name first_name salary; title;

20 Greater Than title "Greater than comparison operator"; data sub1;
set orion.sales; where Salary > 84260; run; proc print data=sub1; var last_name first_name salary; title;

21 Greater than or equal to
title "Greater than or equal to operator"; data sub1; set orion.sales; where Salary >= 84260; run; proc print data=sub1; var last_name first_name salary; title;

22 Less than title "less than operator"; data sub1; set orion.sales;
where salary < 25110;/* lt */ run; proc print data=sub1; var last_name first_name salary; title;

23 Less than or equal to title "less than or equal to operator";
data sub1; set orion.sales; where salary le 25110;/* <= */ run; proc print data=sub1; var last_name first_name salary; title;

24 SAS for Data Management and Analysis
The in operator

25 In operator title "The in operator with character variable";
data sub1; set orion.sales; where Country in ('AU','US'); run; proc print data=sub1; var last_name first_name country; title;

26 In operator title "The in operator with numeric variable"; data sub1;
set orion.sales; where employee_id in (120102,120142,121025,121145); run; proc print data=sub1; var last_name first_name employee_id ; title;

27 SAS for Data Management and Analysis
Arithmetic Operators

28 Arithmetic Operators Arithmetic operators indicate that an arithmetic calculation is performed. Symbol Definition ** exponentiation * multiplication / division + addition - subtraction

29 Arithmetic in where statement
title "Arithmetic to find all employees with monthly salary < 2200"; data sub1; set orion.sales; where Salary / 12 < 2200; run; proc print data=sub1; var last_name first_name salary; title;

30 Arithmetic in where statement
title "Arithmetic to find all employees with monthly salary plus 10% greater than 7500"; data sub1; set orion.sales; where Salary / 12 * 1.10 > 7500; run; proc print data=sub1; var last_name first_name salary; title;

31 Use Parens title "Arithmetic to find all employees with monthly salary plus 10% greater than 7500"; data sub1; set orion.sales; where (Salary / 12 ) * 1.10 >= 7500;/* PEMDAS */ run; proc print data=sub1; var last_name first_name salary; title;

32 SAS for Data Management and Analysis
Logical Operators

33 Logical Operators Symbol Mnemonic Definition & AND logical and | OR
logical or ^ ¬ ~ NOT logical not

34 Logical And title "Logical AND operator"; data sub1; set orion.sales;
/* parens add readability*/ where (Gender ne 'M') and (Salary >=50000); run; proc print data=sub1; var last_name first_name gender salary; title

35 Logical Or title "Logical OR operator"; data sub1; set orion.sales;
/* parens add readability*/ where (Gender ne 'M') or (Salary >= 50000); run; proc print data=sub1; var last_name first_name gender salary; title;

36 Logical Not title "Logical NOT operator"; data sub1; set orion.sales;
/* parens add readability*/ where Country not in ('AU', 'US'); run; /*note we get no observations*/ proc print data=sub1; var last_name first_name country; title;

37 More complex example title "Logical operators, a more complex example"; data sub1; set orion.sales; /* parens add readability*/ where ( (country ne 'US') and (salary ge 50000) ) or ( (country ='US') and (salary gt 75000) ); run; proc print data=sub1; var last_name first_name country salary; title;

38 SAS for Data Management and Analysis
Special where operators

39 Special WHERE Operators
Symbol Mnemonic Definition BETWEEN-AND an inclusive range IS NULL missing value IS MISSING ? CONTAINS a character string LIKE a character pattern

40 Between Operator libname dlmsas "/courses/d0f434e5ba27fe300/s5066/prg1" access=readonly; title "A small dataset to demonstrate special operators"; proc print data=dlmSAS.small_sales; run; title "List of salaries between $40,000 and $50,000"; data sub1; set dlmSas.small_sales; where salary between and 50000; proc print data=sub1;

41 Not Between data sub2; set dlmSas.small_sales;
title "List of salaries excluding those between $40,000 and $50,000"; data sub2; set dlmSas.small_sales; where salary not between and 50000; run; proc print data=sub2; title;

42 Is null operator title "List of names with missing salary"; data sub1;
set dlmSas.small_sales; where salary is null; run; proc print data=sub1; title;

43 Contains Operator title "List of names that contain the string 'an' ";
data sub1; set dlmSas.small_sales; where name contains 'an'; run; proc print data=sub1; title;

44 Like operator There are two special characters available for specifying a pattern: A percent sign (%) replaces any number of characters. An underscore (_) replaces one character. title "List all names that begin with 'M'"; data sub1; set dlmsas.small_sales; where name like 'M%'; run; proc print data=sub1; There are two special characters available for specifying a pattern: A percent sign (%) replaces any number of characters. An underscore (_) replaces one character. Consecutive underscores can be specified. A percent sign and an underscore can be specified in the same pattern. The operator is case sensitive.

45 Like operator title "Names that contain 'i' followed by any character followed by 'a' "; data sub2; set dlmsas.small_sales; where name like '%i_a%'; run; proc print data=sub2;

46 SAS for Data Management and Analysis
Subsetting variables Drop and keep statements

47 The DROP and KEEP Statements
The DROP statement specifies the names of the variables to omit from the output data set(s). The KEEP statement specifies the names of the variable to write to the output data set(s). The variable-list specifies the variables to drop or keep, respectively, in the output data set. DROP variable-list ; KEEP variable-list ;

48 Drop statement title "Contents before the drop statement";
libname mysas "/courses/u_fsu.edu1/i_648881/c_5934/Prg1" access=readonly; title "Contents before the drop statement"; proc contents data=mysas.sales; run; data subset; set mysas.sales; drop Employee_ID Gender Country Birth_Date; title "Contents after the drop statement"; proc contents data=subset; title;

49 Keep statement title "Contents before the keep statement";
proc contents data=mysas.sales; run; data subset; set mysas.sales; keep First_Name Last_Name Salary Job_Title Hire_Date; title "Contents after the keep statement"; proc contents data=subset; title;

50 LIBNAME libref 'SAS-data-library'; DATA output-SAS-data-set;
SET input-SAS-data-set; WHERE where-expression; KEEP variable-list; LABEL variable = 'label' variable = 'label' variable = 'label'; FORMAT variable(s) format; RUN; Part 1 Part 2 Part 3

51 SAS for Data Management and Analysis
Adding Permanent Attributes

52 Labels and formats can be stored in the descriptor portion.

53 The LABEL Statement LABEL variable = 'label' variable = 'label' variable = 'label'; A label can be up to 256 characters. Any number of variables can be associated with labels in a single LABEL statement. Using a LABEL statement in a DATA step permanently associates labels with variables by storing the label in the descriptor portion of the SAS data set.

54 The FORMAT Statement assigns formats to variable values.
A format is an instruction that SAS uses to write data values. Using a FORMAT statement in a DATA step permanently associates formats with variables by storing the format in the descriptor portion of the SAS data set. FORMAT variable(s) format ;

55 SAS for Data Management and Analysis
SAS formats

56 SAS Formats SAS formats have the following form:
<$>format<w>.<d> $ indicates a character format format names the SAS format or user-defined format w specifies the total format width including decimal places and special characters . required delimiter d specifies the number of decimal places in numeric formats

57 Documentation SAS formats and informats:

58 SAS Formats Selected SAS formats: Format Definition $w.
Writes standard character data w.d Writes standard numeric data COMMAw.d Writes numeric values with a comma that separates every three digits and a period that separates the decimal fraction COMMAXw.d Writes numeric values with a period that separates every three digits and a comma that separates the decimal fraction DOLLARw.d Writes numeric values with a leading dollar sign, a comma that separates every three digits, and a period that separates the decimal fraction EUROXw.d Writes numeric values with a leading euro symbol (€), a period that separates every three digits, and a comma that separates the decimal fraction Selected SAS formats: Do not read the definitions for each format. Quickly give an explanation because examples follow on the next slide.

59 Selected SAS Formats Format Stored Value Displayed Value $4.
Programming Prog 12. 27134 12.2 COMMA12.2 27,134.29 COMMAX12.2 27.134,29 DOLLAR12.2 $27,134.29 EUROX12.2 €27.134,29

60 SAS Formats If you do not specify a format width that is large enough to accommodate a numeric value, the displayed value is automatically adjusted to fit into the width. Format Stored Value Displayed Value DOLLAR12.2 $27,134.29 DOLLAR9.2 $ DOLLAR8.2 DOLLAR5.2 27134 DOLLAR4.2 27E3

61 SAS Date Formats Format Stored Value Displayed Value MMDDYY6. 010160
010160 MMDDYY8. 01/01/60 MMDDYY10. 01/01/1960 DDMMYY6. 365 311260 DDMMYY8. 31/12/60 DDMMYY10. 31/12/1960

62 SAS Date Formats Format Stored Value Displayed Value DATE7. -1 31DEC59
WORDDATE. January 1, 1960 WEEKDATE. Friday, January 1, 1960 MONYY7. JAN1960 YEAR4. 1960

63 English_UnitedStates
SAS Date Formats Format Locale Example NLDATEw. English_UnitedStates January 01, 1960 German_Germany 01. Januar 1960 NLDATEMNw. January Januar NLDATEWw. Fri, Jan 01, 60 Fr, 01. Jan 60 NLDATEWNw. Friday Freitag

64 SAS for Data Management and Analysis
Variable Labels

65 Add Variable Labels libname orion "/courses/d0f434e5ba27fe300/s5066/prg1"; title "Contents of original dataset"; proc contents data=orion.sales; data subset1; set orion.sales; label Job_Title='Sales Title' Hire_Date='Date Hired'; run; title "Contents of new dataset with labels added"; proc contents data=work.subset1; title "Sales dataset with labels added"; proc print data=work.subset1 label;/*label option*/ title;

66 Add Labels and Formats data work.subset1; set orion.sales;
label Job_Title='Sales Title' Hire_Date='Date Hired'; format Salary comma8. Hire_Date ddmmyy10.; run; title "Contents with labels and formats added, comma format for salary"; proc contents data=work.subset1; proc print data=work.subset1 label; var employee_id salary hire_date;

67 SAS for Data Management and Analysis
A note on computer logic A problematic or statement

68 Problematic Version data temp; input X;
if X =3 or 4 then Match = 'Yes'; else Match = 'No'; datalines; 3 7 . ; title "Listing of Results"; proc print data=temp noobs; run; title;

69 Corrected Version data temp; input X;
if X in (3,4) then Match = 'Yes'; else Match = 'No'; datalines; 3 7 . ; title "Listing of Results"; proc print data=temp noobs; run; title;

70 Arithmetic with logic statements
data tmp; input x y; equal=(x=y); datalines; 7 7 7 6 5 5 3 4 ; proc print data=tmp; run;


Download ppt "SAS for Data Management and Analysis"

Similar presentations


Ads by Google