Introduction to SAS Essentials: Mastering SAS for Data Analytics. Alan Elliott and Wayne Woodward. SAS Essentials - Elliott & Woodward


Intro to SAS Chapter 3 Part 2

3.9 GOING DEEPER: UNDERSTANDING HOW THE DATA STEP READS AND STORES DATA
- To understand how SAS works, it can be helpful to "look under the hood" and see what is happening in all those bits and bytes as SAS reads and processes data. Knowing how SAS handles data is a little bit like knowing how the motor in your car works. You can usually drive around okay without knowing anything about pistons, but sometimes it is good to know what that knocking sound means.

How SAS Thinks
- UNDERSTANDING HOW THE DATA STEP READS AND STORES DATA
- Data Step Processing
- The DATA Step vs The PROC Step
- More about reading data files
- Review of how to read data into SAS

Consider the following program
- Using the following SAS program:
    DATA NEW;
    INPUT ID $ AGE TEMPC;
    TEMPF=TEMPC*(9/5)+32;
    DATALINES;
    ;
    run;
    proc print;run;
- How does SAS read in this data and create a SAS data set?
- This code calculates a value and creates a variable named TEMPF. We’ll learn more about calculations later…

Overview of SAS Data Step
- Compile Phase (Look at Syntax)
- Execution Phase (Read data, Calculate)
- Output Phase (Create Data Set)

Concepts…
- COMPILE - SAS reads the syntax of the SAS program to see if there are any errors in the code. If there are no errors found, SAS “compiles” this code; that is, it transforms the SAS code into a code used internally by SAS. (You don’t need to know this internal code.)
- EXECUTION - If the code syntax checks out, SAS begins performing the tasks specified by the code. For example, the first line of code is DATA NEW, so during the execution phase, SAS creates a “blank” data set (no data in it) named NEW that it will use to put the data into as it is read.
- OUTPUT - SAS reads in each data line. It interprets this line of data into the values for each variable and stores them into the data set one line at a time until all data have been output into the specified data set.

Compile Phase
    DATA NEW;
    INPUT ID $ AGE TEMPC;
    TEMPF=TEMPC*(9/5)+32;
    DATALINES;
    ;
    run;
    proc print;run;
- SAS checks the syntax of the program.
- Identifies type and length of each variable. Does any variable need conversion?
- If everything is okay, proceed to the next step.
- If errors are discovered, SAS attempts to interpret what you mean. If SAS can’t correct the error, it prints an error message to the log.

Create Input Buffer
- SAS creates an input buffer.
- The INPUT BUFFER contains data as it is read in (here, from the lines after DATALINES;).

Execution Phase
- The PROGRAM DATA VECTOR (PDV) is created and contains information about the variables.
- It holds two automatic variables, _N_ and _ERROR_, and a position for each of the four variables in the DATA step.
- SAS sets _N_ = 1, _ERROR_ = 0 (no initial error), and the remaining variables to missing.
    _N_  _ERROR_  ID  AGE  TEMPC  TEMPF
    1    0            .    .      .

Buffer to PDV
- SAS reads the 1st record from the input buffer into the PDV (_N_, _ERROR_, ID, AGE, TEMPC, TEMPF). TEMPF is initially missing.
- If there is an executable statement, SAS processes the code TEMPF=TEMPC*(9/5)+32; and stores the calculated value in TEMPF.

Output Phase
- The values in the PDV are written to the output data set (NEW) as the first observation.
- From PDV: _N_, _ERROR_, ID, AGE, TEMPC, TEMPF
- Write data to data set: ID, AGE, TEMPC, TEMPF
- This is the first record in the output data set named “NEW.” Note that _N_ and _ERROR_ are dropped.
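One way to watch these phases is to write the PDV to the SAS log at several points in the DATA step. The sketch below is not part of the original slides: the two data lines are made-up values used only for illustration, and the PUTLOG statements are added purely for tracing.

```sas
/* Sketch: trace the PDV while the example DATA step runs.      */
/* The two data lines are hypothetical values for illustration. */
DATA NEW;
   PUTLOG 'Top of step:  ' _ALL_;   /* _N_ is set; variables not yet read are missing */
   INPUT ID $ AGE TEMPC;
   PUTLOG 'After INPUT:  ' _ALL_;   /* ID, AGE, TEMPC filled from the input buffer */
   TEMPF = TEMPC*(9/5) + 32;
   PUTLOG 'After assign: ' _ALL_;   /* TEMPF now holds the calculated value */
DATALINES;
A01 34 37.0
A02 41 38.5
;
RUN;
```

The log should show a third "Top of step" line with _N_=3: SAS starts another iteration, finds no more data at the INPUT statement, and ends the DATA step without writing a third observation.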

Exceptions to Missing in PDV
- Initial values are usually set to missing in the PDV, but some data values are not:
- variables in a RETAIN statement
- variables created in a SUM statement
- data elements in a _TEMPORARY_ array
- variables created with options in the FILE or INFILE statements
- These exceptions are covered later.
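The first two exceptions can be seen in a small sketch. The data set name RUNNING, the variable names COUNT and MAXAGE, and the three AGE values are illustrative assumptions, not taken from the slides.

```sas
/* Sketch: variables that are NOT reset to missing each iteration. */
/* Names and data values are hypothetical.                         */
DATA RUNNING;
   INPUT AGE;
   COUNT + 1;                    /* sum statement: COUNT starts at 0 and is retained */
   RETAIN MAXAGE;                /* MAXAGE keeps its value across iterations         */
   MAXAGE = MAX(MAXAGE, AGE);    /* MAX ignores the initial missing value            */
DATALINES;
30
34
55
;
RUN;

PROC PRINT DATA=RUNNING;
RUN;
```

With these three records, COUNT should print as 1, 2, 3 and MAXAGE as 30, 34, 55. Without the RETAIN statement, MAXAGE would be reset to missing at the top of every iteration and would simply equal AGE.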

Know the Difference
- INPUT BUFFER
- PROGRAM DATA VECTOR

Next data record read
- Once SAS finishes reading the first data record, it continues the same process and reads the second record, sending results to the output data set (named NEW in this case).
- …and so on for all records.

Descriptor Information
- SAS creates and maintains descriptor information about each SAS data set:
- data set attributes
- variable attributes
- the name of the data set
- member type, the date and time that the data set was created, and the number, names and data types (character or numeric) of the variables.

Data Set Description
    proc datasets ;
    contents data=new;
    run;
Contents output… (abbreviated)
    #  Name  Member Type  File Size  Last Modified
    1  NEW   DATA         5120       20Nov13:08:59:32
Alternate program
    proc contents data= new; run;

Description output continued…
    Data Set Name        WORK.NEW                     Observations           2
    Member Type          DATA                         Variables              4
    Engine               V9                           Indexes                0
    Created              Wed, Nov 20, :59:32 AM       Observation Length     32
    Last Modified        Wed, Nov 20, :59:32 AM       Deleted Observations   0
    Protection                                        Compressed             NO
    Data Set Type                                     Sorted                 NO
    Label
    Data Representation  WINDOWS_64
    Encoding             wlatin1 Western (Windows)

Description output continued…
Alphabetic List of Variables and Attributes
    #  Variable  Type  Len
    2  AGE       Num   8
    1  ID        Char  8
    3  TEMPC     Num   8
    4  TEMPF     Num   8
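The variable list above is alphabetical, which is why AGE (#2) appears before ID (#1). As a small sketch, assuming the data set NEW from the earlier example still exists in the WORK library, the VARNUM option of PROC CONTENTS lists the variables in creation order instead:

```sas
/* Sketch: list variables in the order they were created in the PDV. */
PROC CONTENTS DATA=NEW VARNUM;
RUN;
```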

Example -- How Errors are Found During Compilation
- REVIEW THIS PROGRAM
    DATA NEW;
    INPUT ID $ AGE TEMPC;
    TEMPF=TEMPC*(9/5)+32;
    DATALINES;
    ;
    run;
    proc print;run;

Original Program
    DATA NEW;
    INPUT ID $ AGE TEMPC;
    TEMPF=TEMPC*(9/5)+32;
    DATALINES;
    ;
    run;
    proc print;run;
Program output
    Obs  ID  AGE  TEMPC  TEMPF

Example of Error
    DATA NEW;
    INPUT ID $ AGE TEMPC;
    TEMPF=TEMPC*(9/5)+32
    DATALINES;
    ;
    run;
    proc print;run;
    proc datasets ;
    contents data=new;
    run;
Missing semicolon (after +32)

Error found during compilation
    76 DATA NEW;
    77 INPUT ID $ AGE TEMPC;
    78 TEMPF=TEMPC*(9/5) DATALINES;
    ERROR : Syntax error, expecting one of the following: !, !!, &, *, **, +, -, /,, =, >, > =, AND, EQ, GE, GT, IN, LE, LT, MAX, MIN, NE, NG, NL, NOTIN, OR, ^=, |, ||, ~=.
    ERROR : Statement is not valid or it is used out of proper order ;
    83 run;
    ERROR: No DATALINES or INFILE statement.

Summary - Compilation Phase
- During compilation, SAS:
- checks syntax
- identifies type and length of each new variable (is a data type conversion needed?)
- creates the input buffer if there is an INPUT statement for an external file
- creates the Program Data Vector (PDV)
- creates descriptor information for data sets and variable attributes
- Other options not discussed here: DROP; KEEP; RENAME; RETAIN; WHERE; LABEL; LENGTH; FORMAT; ARRAY; BY; ATTRIB; END=, IN=, FIRST, LAST, POINT=

Summary – Execution Phase
1. The DATA step iterates once for each observation being created.
2. Each time the DATA statement executes, _N_ is incremented by 1.
3. Newly created variables are set to missing in the PDV.
4. SAS reads a data record from a raw data file into the input buffer (there are other possibilities not discussed here).
5. SAS executes any other programming statements for the current record.
6. At the end of the data statements (RUN;) SAS writes an observation to the SAS data set (OUTPUT PHASE).
7. SAS returns to the top of the DATA step (Step 3 above).
8. The DATA step terminates when there is no more data.

Quiz - Find Syntax Errors
    DATA MYDATA;
    INPUT ID $ SBP DBP GENDER $ AGE WT;
    DATALINES;
    M F M F F
    ;
    PROC PRINT;
    RUN;
Where is the syntax error?

Find Syntax Errors
    DATA MYDATA;
    INFILE 'C:\SASDATA\EXAMPLE.DAT';
    INPUT ID $ 1-3 GP $ 5 AGE 6-9 TIME TIME TIME ;
    DATALINES;
    PROC MEANS;
    RUN;
Where is the syntax error?

Find Syntax Errors
    DATA MYDATA;
    INFILE 'C:\SASDATA\EXAMPLE.CSV'; DLM=', ' FIRSTOBS=2 OBS=26;
    INPUT GROUP $ AGE TIME2 TIME3 Time4 SOCIO;
    PROC MEANS;
    RUN ;
Where is the syntax error?

Character Variable LENGTH in SAS
- By default, character variables have a length of 8.
    DATA NAMES;
    INPUT FIRST $ LAST $ AGE;
    DATALINES;
    GEORGE WASHINGTON 30
    JAMES ADAMS 34
    BERNIE RUMPELSTILTSKIN 55
    ;
    proc print;
    run;

Results
    Obs  FIRST   LAST      AGE
    1    GEORGE  WASHINGT  30
    2    JAMES   ADAMS     34
    3    BERNIE  RUMPELST  55
NOTE THE PROBLEM: last names longer than 8 characters are truncated.

Use LENGTH Statement
    data names;
    LENGTH LAST $15.;
    input FIRST $ LAST $ AGE;
    Etc…
Results
    Obs  LAST             FIRST   AGE
    1    WASHINGTON       GEORGE  30
    2    ADAMS            JAMES   34
    3    RUMPELSTILTSKIN  BERNIE  55
Problem corrected…
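The slide abbreviates the rest of the program with "Etc…". A complete version might look like the sketch below, which reuses the data lines from the earlier NAMES example; the RUN statement after the data and DATA=NAMES on PROC PRINT are additions for clarity.

```sas
/* Sketch: declare LAST as 15 characters before it is read. */
DATA NAMES;
   LENGTH LAST $15;            /* must come before INPUT to take effect */
   INPUT FIRST $ LAST $ AGE;
DATALINES;
GEORGE WASHINGTON 30
JAMES ADAMS 34
BERNIE RUMPELSTILTSKIN 55
;
RUN;

PROC PRINT DATA=NAMES;
RUN;
```

Because the LENGTH statement appears before INPUT, LAST is the first variable SAS adds to the PDV, which is why it prints before FIRST in the results.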

Missing Data in Freeform
    data names;
    LENGTH LAST $15.;
    input FIRST $ LAST $ AGE;
    DATALINES;
    GEORGE WASHINGTON 30
    JAMES ADAMS
    BERNIE RUMPELSTILTSKIN 55
    ;
    proc print;
    run;
Note: No AGE for JAMES ADAMS

Results are
    Obs  LAST        FIRST   AGE
    1    WASHINGTON  GEORGE  30
    2    ADAMS       JAMES
Did not read all of the data!

Indicate missing data
    data names;
    LENGTH LAST $15.;
    input FIRST $ LAST $ AGE;
    DATALINES;
    GEORGE WASHINGTON 30
    JAMES ADAMS .
    BERNIE RUMPELSTILTSKIN 55
    ;
    proc print;
    run;
Note: The missing value is denoted by a dot (.), separated from ADAMS by a space.

Results
    Obs  LAST             FIRST   AGE
    1    WASHINGTON       GEORGE  30
    2    ADAMS            JAMES   .
    3    RUMPELSTILTSKIN  BERNIE  55
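When the data lines cannot be edited to add a dot, another option (not covered in these slides) is the MISSOVER option on an INFILE statement. The sketch below assumes the same NAMES data with no AGE for JAMES ADAMS. MISSOVER tells SAS to set the remaining variables to missing rather than move to the next line, so it only helps when the missing values are at the end of a record.

```sas
/* Sketch: MISSOVER keeps SAS from reading the next line to find AGE. */
DATA NAMES;
   LENGTH LAST $15;
   INFILE DATALINES MISSOVER;  /* applies the option to in-stream data */
   INPUT FIRST $ LAST $ AGE;
DATALINES;
GEORGE WASHINGTON 30
JAMES ADAMS
BERNIE RUMPELSTILTSKIN 55
;
RUN;

PROC PRINT DATA=NAMES;
RUN;
```

This should produce three observations, with AGE missing (.) for JAMES ADAMS.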

Missing Data in Column
    DATA MYDATA;
    INPUT ID $ 1 SBP 2-4 DBP 5-7 GENDER $ 8 AGE 9-10 WT 11-13;
    DATALINES;
    1120 M F F F20110
    ;
    RUN;
    PROC PRINT;
    RUN;
1. Read in DCOLUMN.SAS
2. Delete the 80 in the first record, the 150 in the 4th record, and M in record 3 (preserve columns)
3. Change PROC MEANS to PROC PRINT
4. Run the program and observe output

Resulting Output
    Obs  ID  SBP  DBP  GENDER  AGE  WT
- Note that the blanks in the data set are read as missing values. Numeric missing values are indicated by a dot (.) and text missing values are indicated with a blank.
- This works if data are read using column or formatted input.
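A minimal sketch of the same idea, using the column layout from the INPUT statement above. The three data lines are hypothetical values, not the contents of DCOLUMN.SAS; each record is exactly 13 columns wide, with blanks left in place where a value is absent.

```sas
/* Sketch: blanks in fixed columns are read as missing values. */
/* Data lines are hypothetical; column positions matter.       */
DATA MYDATA;
   INPUT ID $ 1 SBP 2-4 DBP 5-7 GENDER $ 8 AGE 9-10 WT 11-13;
DATALINES;
1120 80M15115
2130   F25180
3140100 30200
;
RUN;

PROC PRINT DATA=MYDATA;
RUN;
```

In the second record, columns 5-7 are blank, so DBP should print as a dot. In the third record, column 8 is blank, so GENDER should print as a blank.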

Location of created Data sets
- In the left window in SAS, click on the Explore tab. Notice the Contents of “SAS Environment”.
- Click on the Libraries icon.
- You will see several “Libraries” including Work.
- Click on Work.

Work Library
- The WORK library contains all of the SAS data sets we’ve created so far. (You may have a slightly different list.)
- Double-click on the one named Employees.

SAS Viewtable
- SAS Viewtable displays the contents of a data set.
- Note that the name is work.employees.
- Click on “x” to close the viewer.

3.10 SUMMARY
- This chapter defined the difference between temporary and permanent data sets and illustrated several methods for importing data sets into SAS using either the SAS Wizard or PROC IMPORT. Finally, the way SAS "thinks" as it is inputting data is explained.

These slides are based on the book: Introduction to SAS Essentials: Mastering SAS for Data Analytics, 2nd Edition, by Alan C. Elliott and Wayne A. Woodward. Paperback: 512 pages. Publisher: Wiley; 2 edition (August 3, 2015). Language: English. These slides are provided for you to use to teach SAS using this book. Feel free to modify them for your own needs. Please send comments about errors in the slides (or suggestions for improvements). Thanks.