SAS for Data Management and Analysis

Slides:



Advertisements
Similar presentations
Statistical Methods Lynne Stokes Department of Statistical Science Lecture 7: Introduction to SAS Programming Language.
Advertisements

Objectives After completing this lesson, you should be able to do the following: Describe various types of conversion functions that are available in.
Copyright © 2007, Oracle. All rights reserved Using Single-Row Functions to Customize Output Modified: October 21, 2014.
Introduction to SAS Programming Christina L. Ughrin Statistical Software Consulting Some notes pulled from SAS Programming I: Essentials Training.
SAS Programming: Working With Variables. Data Step Manipulations New variables should be created during a Data step Existing variables should be manipulated.
The Information Delivery Process Data In Information Out ManageOrganizeExploit.
 2008 Pearson Education, Inc. All rights reserved JavaScript: Introduction to Scripting.
2 Copyright © 2004, Oracle. All rights reserved. Restricting and Sorting Data.
JavaScript, Fourth Edition
JavaScript, Third Edition
Using Proc Datasets for Efficiency Originally presented as a Coder’s NESUG2000 by Ken Friedman Reviewed by Karol Katz.
Understanding SAS Data Step Processing Alan C. Elliott stattutorials.com.
Introduction to SAS Essentials Mastering SAS for Data Analytics Alan Elliott and Wayne Woodward SAS Essentials - Elliott & Woodward1.
Into to SAS ®. 2 List the components of a SAS program. Open an existing SAS program and run it. Objectives.
Creating SAS® Data Sets
Ceng 356-Lab2. Objectives After completing this lesson, you should be able to do the following: Limit the rows that are retrieved by a query Sort the.
Welcome to SAS…Session..!. What is SAS..! A Complete programming language with report formatting with statistical and mathematical capabilities.
FORMAT FESTIVAL AN INTRODUCTION TO SAS® FORMATS AND INFORMATS By David Maddox.
1 Chapter Two Using Data. 2 Objectives Learn about variable types and how to declare variables Learn how to display variable values Learn about the integral.
11 Chapter 2: Working with Data in a Project 2.1 Introduction to Tabular Data 2.2 Accessing Local Data 2.3 Importing Text Files 2.4 Editing Tables in the.
Chapter 2: Working with Data in a Project
Copyright © 2010 Pearson Education, Inc. Publishing as Prentice Hall1 Exploring Microsoft Office Access Committed to Shaping the Next Generation.
Introduction to SAS. What is SAS? SAS originally stood for “Statistical Analysis System”. SAS is a computer software system that provides all the tools.
©Spring 2012 Imelda Go, John Grego, Jennifer Lasecki and the University of South Carolina Chapter 17 supplement: Review of Formatting Data STAT 541.
BMTRY 789 Lecture 3: Categorical Data and Dates Readings – Chapter 3 & 4 Lab Problems 3.1, 3.2, 3.19, 4.1, 4.3, 4.5 Homework – HW 2 Book Problems Due 6/24!
Sales person receive RM200/week plus 9% of their gross sales for that week. Write an algorithms to calculate the sales person’s earning from the input.
Quantify the Example Data First, code and quantify the data (assign column locations & variable names) Use the sample data to create a data set from the.
SAS Efficiency Techniques and Methods By Kelley Weston Sr. Statistical Programmer Quintiles.
Input, Output, and Processing
4 Copyright © 2006, Oracle. All rights reserved. Restricting and Sorting Data.
TRANSFORMING SAS DATA SETS Creating new SAS data sets with the SET statement – how the DATA step works Creating and transforming variables – assignment.
EPIB 698C Lecture 2 Notes Instructor: Raul Cruz 2/14/11 1.
I OWA S TATE U NIVERSITY Department of Animal Science Getting Your Data Into SAS (Chapter 2 in the Little SAS Book) Animal Science 500 Lecture No. 3 September.
Lesson 2 Topic - Reading in data Chapter 2 (Little SAS Book)
Copyright © 2012 Pearson Education, Inc. Publishing as Pearson Addison-Wesley C H A P T E R 2 Input, Processing, and Output.
1 EPIB 698E Lecture 1 Notes Instructor: Raul Cruz 7/9/13.
Lesson 6 - Topics Reading SAS datasets Subsetting SAS datasets Merging SAS datasets.
Chapter 5 Reading and Manipulating SAS ® Data Sets and Creating Detailed Reports Xiaogang Su Department of Statistics University of Central Florida.
Copyright © 2004, Oracle. All rights reserved. Lecture 4: 1-Retrieving Data Using the SQL SELECT Statement 2-Restricting and Sorting Data Lecture 4: 1-Retrieving.
2 Copyright © 2004, Oracle. All rights reserved. Restricting and Sorting Data.
Chapter 17: Formatting Data 1 STAT 541 ©Spring 2012 Imelda Go, John Grego, Jennifer Lasecki and the University of South Carolina.
Summer SAS Workshop Lecture 3. Summer SAS Workshop Website
Controlling Input and Output
Part:2.  Keywords are words with special meaning in JavaScript  Keyword var ◦ Used to declare the names of variables ◦ A variable is a location in the.
Computing with SAS Software A SAS program consists of SAS statements. 1. The DATA step consists of SAS statements that define your data and create a SAS.
Chapter 6 Concatenating SAS Data Sets and Creating Summary Reports Xiaogang Su Department of Statistics University of Central Florida.
2 Copyright © 2009, Oracle. All rights reserved. Restricting and Sorting Data.
Chapter 4: Creating List Reports
Lesson 2 Topic - Reading in data Programs 1 and 2 in course notes –Chapter 2 (Little SAS Book)
1 Checking Data with the PRINT and FREQ Procedures.
Based on Learning SAS by Example: A Programmer’s Guide Chapters 1 & 2
SAS Certification Prep Guide Chapter 7 Creating and Applying User-Defined Formats.
1 Cleaning Invalid Data. Clean data by using assignment statements in the DATA step. Clean data by using IF-THEN / ELSE statements in the DATA step. 2.
SAS ® 101 Based on Learning SAS by Example: A Programmer’s Guide Chapters 5 & 6 By Ravi Mandal.
SAS ® 101 Based on Learning SAS by Example: A Programmer’s Guide Chapters 3 & 4 By Tasha Chapman, Oregon Health Authority.
Restricting and Sorting Data
Writing Basic SQL SELECT Statements
BASIC ELEMENTS OF A COMPUTER PROGRAM
Instructor: Raul Cruz-Cano 7/9/2012
Basic Queries Specifying Columns
Instructor: Raul Cruz-Cano
Tamara Arenovich Tony Panzarella
Chapter 3 The DATA DIVISION.
WEB PROGRAMMING JavaScript.
Subsetting Rows with the WHERE clause
Restricting and Sorting Data
DATA TYPES AND OPERATIONS
Restricting and Sorting Data
Presentation transcript:

SAS for Data Management and Analysis Introduction, Reading SAS Data Sets

This new SAS data set must contain the following: An existing data source contains information on Orion Star sales employees from Australia and the United States. A new SAS data set needs to be created that contains a subset of this existing data source. This new SAS data set must contain the following: only the employees from Australia who are Sales Representatives the employee's first name, last name, salary, job title, and hired date labels and formats in the descriptor portion

Reading SAS Data Sets Reading SAS Data Sets libname ; data ; set ; ... run;

The LIBNAME Statement (Review) A library reference name (libref) is needed if a permanent data set is being read or created. The LIBNAME statement assigns a libref to a SAS data library. LIBNAME libref 'SAS-data-library'; DATA output-SAS-data-set; SET input-SAS-data-set; <additional SAS statements> RUN;

SAS for Data Management and Analysis Reading a SAS data set

The Data Step LIBNAME libref 'SAS-data-library'; DATA output-SAS-data-set; SET input-SAS-data-set; WHERE where-expression; KEEP variable-list; LABEL variable = 'label' variable = 'label' variable = 'label'; FORMAT variable(s) format; RUN;

LIBNAME libref 'SAS-data-library'; DATA output-SAS-data-set; SET input-SAS-data-set; WHERE where-expression; KEEP variable-list; LABEL variable = 'label' variable = 'label' variable = 'label'; FORMAT variable(s) format; RUN; Part 1 Part 2 Part 3

SAS for Data Management and Analysis The Data Statement

The DATA Statement The DATA statement begins a DATA step and provides the name of the SAS data set being created. The DATA statement can create temporary or permanent data sets. LIBNAME libref 'SAS-data-library'; DATA output-SAS-data-set; SET input-SAS-data-set; <additional SAS statements> RUN;

SAS for Data Management and Analysis The Set Statement

The SET Statement The SET statement reads observations from a SAS data set for further processing in the DATA step. By default, the SET statement reads all observations and all variables from the input data set. The SET statement can read temporary or permanent data sets. LIBNAME libref 'SAS-data-library'; DATA output-SAS-data-set; SET input-SAS-data-set; <additional SAS statements> RUN;

Make an exact copy. libname orion "/courses/d0f434e5ba27fe300/s5066/prg1"; data work.subset1; set orion.sales; run; proc print data=work.subset1;

SAS for Data Management and Analysis Subsetting Observations WHERE Statement

LIBNAME libref 'SAS-data-library'; DATA output-SAS-data-set; SET input-SAS-data-set; WHERE where-expression; KEEP variable-list; LABEL variable = 'label' variable = 'label' variable = 'label'; FORMAT variable(s) format; RUN; Part 1 Part 2 Part 3

The WHERE Statement WHERE where-expression ; The where-expression is a sequence of operands and operators that form a set of instructions that define a condition for selecting observations. Operands include constants and variables. Operators are symbols that request a comparison, arithmetic calculation, or logical operation.

Operands A constant operand is a fixed value. Character values must be enclosed in quotation marks and are case sensitive. Numeric values do not use quotation marks. A variable operand must be on the input data set. where Gender = 'M'; where Salary > 50000; variable constant variable constant

Comparison Operators Comparison operators compare a variable with a value or with another variable. Symbol Mnemonic Definition = EQ equal to ^= ¬= ~= NE not equal to > GT greater than < LT less than >= GE greater than or equal <= LE less than or equal IN equal to one of a list

Equality title "Equality comparison operator"; data sub1; libname orion "/courses/u_fsu.edu1/i_648881/c_5934/Prg1" access=readonly; title "Equality comparison operator"; data sub1; set orion.sales; where Gender = 'M';/*eq*/ run; proc print data=sub1 noobs; var last_name first_name gender; title;

Non-equality title "Non-equality operator "; data sub1; set orion.sales; where Salary ne .;/* ^= */ run; proc print data=sub1; var last_name first_name salary; title;

Greater Than title "Greater than comparison operator"; data sub1; set orion.sales; where Salary > 84260; run; proc print data=sub1; var last_name first_name salary; title;

Greater than or equal to title "Greater than or equal to operator"; data sub1; set orion.sales; where Salary >= 84260; run; proc print data=sub1; var last_name first_name salary; title;

Less than title "less than operator"; data sub1; set orion.sales; where salary < 25110;/* lt */ run; proc print data=sub1; var last_name first_name salary; title;

Less than or equal to title "less than or equal to operator"; data sub1; set orion.sales; where salary le 25110;/* <= */ run; proc print data=sub1; var last_name first_name salary; title;

SAS for Data Management and Analysis The in operator

In operator title "The in operator with character variable"; data sub1; set orion.sales; where Country in ('AU','US'); run; proc print data=sub1; var last_name first_name country; title;

In operator title "The in operator with numeric variable"; data sub1; set orion.sales; where employee_id in (120102,120142,121025,121145); run; proc print data=sub1; var last_name first_name employee_id ; title;

SAS for Data Management and Analysis Arithmetic Operators

Arithmetic Operators Arithmetic operators indicate that an arithmetic calculation is performed. Symbol Definition ** exponentiation * multiplication / division + addition - subtraction

Arithmetic in where statement title "Arithmetic to find all employees with monthly salary < 2200"; data sub1; set orion.sales; where Salary / 12 < 2200; run; proc print data=sub1; var last_name first_name salary; title;

Arithmetic in where statement title "Arithmetic to find all employees with monthly salary plus 10% greater than 7500"; data sub1; set orion.sales; where Salary / 12 * 1.10 > 7500; run; proc print data=sub1; var last_name first_name salary; title;

Use Parens title "Arithmetic to find all employees with monthly salary plus 10% greater than 7500"; data sub1; set orion.sales; where (Salary / 12 ) * 1.10 >= 7500;/* PEMDAS */ run; proc print data=sub1; var last_name first_name salary; title;

SAS for Data Management and Analysis Logical Operators

Logical Operators Symbol Mnemonic Definition & AND logical and | OR logical or ^ ¬ ~ NOT logical not

Logical And title "Logical AND operator"; data sub1; set orion.sales; /* parens add readability*/ where (Gender ne 'M') and (Salary >=50000); run; proc print data=sub1; var last_name first_name gender salary; title

Logical Or title "Logical OR operator"; data sub1; set orion.sales; /* parens add readability*/ where (Gender ne 'M') or (Salary >= 50000); run; proc print data=sub1; var last_name first_name gender salary; title;

Logical Not title "Logical NOT operator"; data sub1; set orion.sales; /* parens add readability*/ where Country not in ('AU', 'US'); run; /*note we get no observations*/ proc print data=sub1; var last_name first_name country; title;

More complex example title "Logical operators, a more complex example"; data sub1; set orion.sales; /* parens add readability*/ where ( (country ne 'US') and (salary ge 50000) ) or ( (country ='US') and (salary gt 75000) ); run; proc print data=sub1; var last_name first_name country salary; title;

SAS for Data Management and Analysis Special where operators

Special WHERE Operators Symbol Mnemonic Definition BETWEEN-AND an inclusive range IS NULL missing value IS MISSING ? CONTAINS a character string LIKE a character pattern

Between Operator libname dlmsas "/courses/d0f434e5ba27fe300/s5066/prg1" access=readonly; title "A small dataset to demonstrate special operators"; proc print data=dlmSAS.small_sales; run; title "List of salaries between $40,000 and $50,000"; data sub1; set dlmSas.small_sales; where salary between 40000 and 50000; proc print data=sub1;

Not Between data sub2; set dlmSas.small_sales; title "List of salaries excluding those between $40,000 and $50,000"; data sub2; set dlmSas.small_sales; where salary not between 40000 and 50000; run; proc print data=sub2; title;

Is null operator title "List of names with missing salary"; data sub1; set dlmSas.small_sales; where salary is null; run; proc print data=sub1; title;

Contains Operator title "List of names that contain the string 'an' "; data sub1; set dlmSas.small_sales; where name contains 'an'; run; proc print data=sub1; title;

Like operator There are two special characters available for specifying a pattern: A percent sign (%) replaces any number of characters. An underscore (_) replaces one character. title "List all names that begin with 'M'"; data sub1; set dlmsas.small_sales; where name like 'M%'; run; proc print data=sub1; There are two special characters available for specifying a pattern: A percent sign (%) replaces any number of characters. An underscore (_) replaces one character. Consecutive underscores can be specified. A percent sign and an underscore can be specified in the same pattern. The operator is case sensitive.

Like operator title "Names that contain 'i' followed by any character followed by 'a' "; data sub2; set dlmsas.small_sales; where name like '%i_a%'; run; proc print data=sub2;

SAS for Data Management and Analysis Subsetting variables Drop and keep statements

The DROP and KEEP Statements The DROP statement specifies the names of the variables to omit from the output data set(s). The KEEP statement specifies the names of the variable to write to the output data set(s). The variable-list specifies the variables to drop or keep, respectively, in the output data set. DROP variable-list ; KEEP variable-list ;

Drop statement title "Contents before the drop statement"; libname mysas "/courses/u_fsu.edu1/i_648881/c_5934/Prg1" access=readonly; title "Contents before the drop statement"; proc contents data=mysas.sales; run; data subset; set mysas.sales; drop Employee_ID Gender Country Birth_Date; title "Contents after the drop statement"; proc contents data=subset; title;

Keep statement title "Contents before the keep statement"; proc contents data=mysas.sales; run; data subset; set mysas.sales; keep First_Name Last_Name Salary Job_Title Hire_Date; title "Contents after the keep statement"; proc contents data=subset; title;

LIBNAME libref 'SAS-data-library'; DATA output-SAS-data-set; SET input-SAS-data-set; WHERE where-expression; KEEP variable-list; LABEL variable = 'label' variable = 'label' variable = 'label'; FORMAT variable(s) format; RUN; Part 1 Part 2 Part 3

SAS for Data Management and Analysis Adding Permanent Attributes

Labels and formats can be stored in the descriptor portion.

The LABEL Statement LABEL variable = 'label' variable = 'label' variable = 'label'; A label can be up to 256 characters. Any number of variables can be associated with labels in a single LABEL statement. Using a LABEL statement in a DATA step permanently associates labels with variables by storing the label in the descriptor portion of the SAS data set.

The FORMAT Statement assigns formats to variable values. A format is an instruction that SAS uses to write data values. Using a FORMAT statement in a DATA step permanently associates formats with variables by storing the format in the descriptor portion of the SAS data set. FORMAT variable(s) format ;

SAS for Data Management and Analysis SAS formats

SAS Formats SAS formats have the following form: <$>format<w>.<d> $ indicates a character format format names the SAS format or user-defined format w specifies the total format width including decimal places and special characters . required delimiter d specifies the number of decimal places in numeric formats

Documentation SAS formats and informats: http://support.sas.com/documentation/cdl/en/leforinforref/64790/HTML/default/viewer.htm#titlepage.htm

SAS Formats Selected SAS formats: Format Definition $w. Writes standard character data w.d Writes standard numeric data COMMAw.d Writes numeric values with a comma that separates every three digits and a period that separates the decimal fraction COMMAXw.d Writes numeric values with a period that separates every three digits and a comma that separates the decimal fraction DOLLARw.d Writes numeric values with a leading dollar sign, a comma that separates every three digits, and a period that separates the decimal fraction EUROXw.d Writes numeric values with a leading euro symbol (€), a period that separates every three digits, and a comma that separates the decimal fraction Selected SAS formats: Do not read the definitions for each format. Quickly give an explanation because examples follow on the next slide.

Selected SAS Formats Format Stored Value Displayed Value $4. Programming Prog 12. 27134.2864 27134 12.2 27134.29 COMMA12.2 27,134.29 COMMAX12.2 27.134,29 DOLLAR12.2 $27,134.29 EUROX12.2 €27.134,29

SAS Formats If you do not specify a format width that is large enough to accommodate a numeric value, the displayed value is automatically adjusted to fit into the width. Format Stored Value Displayed Value DOLLAR12.2 27134.2864 $27,134.29 DOLLAR9.2 $27134.29 DOLLAR8.2 27134.29 DOLLAR5.2 27134 DOLLAR4.2 27E3

SAS Date Formats Format Stored Value Displayed Value MMDDYY6. 010160 010160 MMDDYY8. 01/01/60 MMDDYY10. 01/01/1960 DDMMYY6. 365 311260 DDMMYY8. 31/12/60 DDMMYY10. 31/12/1960

SAS Date Formats Format Stored Value Displayed Value DATE7. -1 31DEC59 WORDDATE. January 1, 1960 WEEKDATE. Friday, January 1, 1960 MONYY7. JAN1960 YEAR4. 1960

English_UnitedStates SAS Date Formats Format Locale Example NLDATEw. English_UnitedStates January 01, 1960 German_Germany 01. Januar 1960 NLDATEMNw. January Januar NLDATEWw. Fri, Jan 01, 60 Fr, 01. Jan 60 NLDATEWNw. Friday Freitag

SAS for Data Management and Analysis Variable Labels

Add Variable Labels libname orion "/courses/d0f434e5ba27fe300/s5066/prg1"; title "Contents of original dataset"; proc contents data=orion.sales; data subset1; set orion.sales; label Job_Title='Sales Title' Hire_Date='Date Hired'; run; title "Contents of new dataset with labels added"; proc contents data=work.subset1; title "Sales dataset with labels added"; proc print data=work.subset1 label;/*label option*/ title;

Add Labels and Formats data work.subset1; set orion.sales; label Job_Title='Sales Title' Hire_Date='Date Hired'; format Salary comma8. Hire_Date ddmmyy10.; run; title "Contents with labels and formats added, comma format for salary"; proc contents data=work.subset1; proc print data=work.subset1 label; var employee_id salary hire_date;

SAS for Data Management and Analysis A note on computer logic A problematic or statement

Problematic Version data temp; input X; if X =3 or 4 then Match = 'Yes'; else Match = 'No'; datalines; 3 7 . ; title "Listing of Results"; proc print data=temp noobs; run; title;

Corrected Version data temp; input X; if X in (3,4) then Match = 'Yes'; else Match = 'No'; datalines; 3 7 . ; title "Listing of Results"; proc print data=temp noobs; run; title;

Arithmetic with logic statements data tmp; input x y; equal=(x=y); datalines; 7 7 7 6 5 5 3 4 ; proc print data=tmp; run;