SAS® 101
Based on Learning SAS by Example: A Programmer’s Guide, Chapters 5 & 6
By Ravi Mandal



Topics covered…
- Formats
- Informats
- Reading external data
- PROC Import
- PROC Format
- Using formats and labels in DATA vs. PROC steps
- PROC Datasets

SAS Format

What are formats?
- Formats define how data values appear when displayed.
- Formats do not change the internal (stored) value of the data.
- They can be used to improve appearance.
- They can also be used to group data.

What are formats? (continued)
- You can use SAS-supplied formats or create your own with PROC Format.
- Formats can be applied in both DATA and PROC steps.
- Formats applied in DATA steps (or PROC Datasets) are permanent.
- Formats applied in PROC steps apply only within that procedure.

Examples of formats (values lost in extraction are shown as …)

Pre-formatted value   Format        Formatted value
…                     comma10.      2,125,…
…                     dollar24.2    $52,…
…                     mmddyy8.      12/26/…
…                     weekdate.     Wednesday, December 26, 2007
M                     $Gender.      Male
12                    AgeGroup.     Under 18
C                     $PassFail.    Passing Grade
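A minimal sketch of the SAS-supplied formats in the table, applied with the PUT function (which returns a value as formatted text). The input values are hypothetical, since the slide's originals were not preserved, and the dataset name work.fmt_demo is made up for illustration. It also shows that the stored numbers are unchanged by the formats.

```sas
/* Hypothetical values: the slide's original inputs were not preserved */
data work.fmt_demo;
  length c_comma c_dollar c_mdy c_week $ 40;
  amount = 2125000;        /* stored as a plain number        */
  price  = 52000.5;        /* stored as a plain number        */
  dt     = '26DEC2007'd;   /* stored as days since 01JAN1960  */

  /* PUT returns the formatted text; STRIP drops alignment blanks */
  c_comma  = strip(put(amount, comma10.));
  c_dollar = strip(put(price, dollar24.2));
  c_mdy    = strip(put(dt, mmddyy8.));
  c_week   = strip(put(dt, weekdate.));

  /* The raw numbers are untouched by any of the formats above */
  put amount= price= dt= / c_comma= c_dollar= c_mdy= c_week=;
run;
```

Here dt stays 17526 while c_week holds the text Wednesday, December 26, 2007: the format changes only how the value is shown.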


SAS Documentation

Format names: $formatw.d
- $: indicates a character format; its absence indicates a numeric format
- format: names the format
- w: format width (total number of columns)
- d: optional number of decimal places (columns after the decimal point)

Format names: dollar14.2
- Numeric format (input values are numeric)
- The format is named “dollar”
- The output value will be at most 14 columns wide
- 2 columns are for the decimal part of the value
- This leaves 12 columns for all other characters, including the decimal point, dollar sign, commas, minus sign, etc.
- Maximum value represented: $99,999,999.99
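A small sketch confirming the width arithmetic above: the largest value the slide lists fills all 14 columns of dollar14.2 exactly. The dataset name work.width_demo is hypothetical.

```sas
/* dollar14.2: 14 columns in total, 2 of them after the decimal point */
data work.width_demo;
  length shown $ 14;
  biggest = 99999999.99;               /* largest value listed on the slide */
  shown   = put(biggest, dollar14.2);  /* fills all 14 columns exactly      */
  put shown=;
run;
```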

The importance of informats: reading external data

What are informats?
- Informats are instructions that tell SAS how to read a data value.
- They can be as simple as w.d:
  - 3.1 tells SAS to read '123' as 12.3.
  - $3. tells SAS to read '123' as '123' and store it as character data.
- They are excellent for reading dates, dollar amounts, and percentages:
  - MMDDYY8. tells SAS to read '12/26/07' and store it as 17526, a SAS date (the number of days since January 1, 1960) that can be used for calculations, etc.
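A minimal sketch of the three informats from the slide, using the INPUT function to read text the way an INPUT statement would. The dataset and variable names are hypothetical.

```sas
data work.informat_demo;
  length b $ 3;
  a = input('123', 3.1);            /* w.d informat: '123' is read as 12.3     */
  b = input('123', $3.);            /* character informat: kept as text '123'  */
  d = input('12/26/07', mmddyy8.);  /* date informat: stored as a SAS date     */
  week_later = d + 7;               /* dates are numbers, so arithmetic works  */
  format week_later date9.;
  put a= b= d= week_later=;
run;
```

Because d is a plain number (17526), adding 7 gives another valid date, 02JAN2008.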

Reading data from a text file
Fixed-column data with four variables: Subj, DOB, Gender, Balance

Reading data from a text file: column input
- subj: name of the variable
- $: indicates a character variable
- 1-3: indicates the starting and ending columns

With column input, date of birth would be stored as a character variable, so you wouldn't be able to perform calculations on it or change its format.

Reading data from a text file: formatted input
- @1: indicates the starting column
- subj: name of the variable
- $3.: indicates the informat (how to read the input data values)

With a date informat, date of birth would be stored as a numeric SAS date, so you can now perform calculations on it or change its format.
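A minimal sketch contrasting the two INPUT styles above. It assumes hypothetical fixed-column data (Subj in columns 1-3, DOB in 5-14, Gender in 16, Balance in 18-24) supplied with DATALINES instead of the slide's text file; the dataset names are also made up.

```sas
/* Column input: DOB ends up as a 10-character string */
data work.demo_col;
  input subj $ 1-3 dob $ 5-14 gender $ 16 balance 18-24;
  datalines;
001 10/21/1950 M 1345.50
002 01/05/1962 F  212.75
;
run;

/* Formatted input: the MMDDYY10. informat turns DOB into a SAS date */
data work.demo_fmt;
  input @1 subj $3. @5 dob mmddyy10. @16 gender $1. @18 balance 7.;
  format dob mmddyy10. balance dollar10.2;
  days_to_2000 = '01JAN2000'd - dob;   /* date arithmetic now works */
  datalines;
001 10/21/1950 M 1345.50
002 01/05/1962 F  212.75
;
run;
```

The data lines must start in column 1, because both INPUT statements rely on exact column positions.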

Reading external data
- There are numerous ways to read raw data into SAS.
- My favorite: PROC Import (with a twist).

PROC Import
- PROC Import reads raw data into a SAS dataset.
- Easy to use, but…
  - Clunky and hard to customize
  - By default, uses only the first 20 lines of the input file to decide which informat to use
  - Can often result in truncated variables and values that are not formatted correctly

PROC Import
- OUT=: name of the output SAS dataset
- DATAFILE=: where to find the data (same role as the file named in an INFILE statement)
- DBMS=: type of incoming raw data (in this case, comma-separated)
- REPLACE: option that allows an existing SAS dataset to be overwritten (useful if you run the same procedure more than once)
- GETNAMES=YES: uses the first record of the input file to generate variable names
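A minimal sketch of the PROC Import call described above. To keep it self-contained, it first writes a small hypothetical CSV to a temporary file (fileref rawcsv, an illustrative name) instead of reading the slide's own file; in practice DATAFILE= would usually be a quoted path. GUESSINGROWS= is included to show how to make PROC Import scan more than the default 20 rows.

```sas
/* Write a small hypothetical CSV so the sketch runs on its own */
filename rawcsv temp;

data _null_;
  file rawcsv;
  put 'ID,DOB,Gender,Balance';
  put '001,10/21/1950,M,1345.50';
  put '002,01/05/1962,F,212.75';
run;

proc import out=work.demo_imp   /* output SAS dataset                  */
            datafile=rawcsv     /* fileref here; a quoted path also works */
            dbms=csv            /* comma-separated input               */
            replace;            /* overwrite work.demo_imp if it exists */
  getnames=yes;                 /* first record supplies variable names */
  guessingrows=100;             /* scan more rows than the default 20   */
run;
```

Because the ID values look like numbers, PROC Import is likely to read ID as numeric and drop its leading zeros, which is one reason to edit the generated code.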

PROC Import (with a twist)
- Run PROC Import.
- Copy the SAS log to the Program Editor. (PROC Import writes a DATA step with INFILE and INPUT statements to the log.)
- Delete any non-SAS code (such as line numbers and notes copied from the log).
- Modify informats, formats, and lengths as needed.
- Run the new code.


In this example, the edits were: ID changed to character, and the length of Gender changed to 1.
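A sketch of what the edited DATA step might look like. It assumes the same hypothetical CSV as a PROC Import run (recreated here so the block stands alone) and follows the general shape of the INFILE/INFORMAT/INPUT code PROC Import writes to the log; the exact generated code varies by SAS release, so treat this as an approximation.

```sas
filename rawcsv temp;

data _null_;
  file rawcsv;
  put 'ID,DOB,Gender,Balance';
  put '001,10/21/1950,M,1345.50';
  put '002,01/05/1962,F,212.75';
run;

/* Edited version of the kind of DATA step PROC Import writes to the log */
data work.demo;
  infile rawcsv delimiter=',' missover dsd lrecl=32767 firstobs=2;
  informat id      $3.        /* character now, so '001' keeps its zeros */
           dob     mmddyy10.
           gender  $1.        /* length 1 instead of a guessed length     */
           balance best32.;
  format   dob     mmddyy10.
           balance dollar10.2;
  input id $ dob gender $ balance;
run;
```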


How to create your own formats PROC Format

- PROC Format allows you to create your own formats.
- You can create formats for numeric or character data.

PROC Format
- User-created format names cannot end with a number (trailing numbers are used to specify the width, as in w.d).
- Formats created with the VALUE statement convert the appearance of data values to a specified character string.
- Formats created with the PICTURE statement create a template for printing numbers. For example, a string of digits can be printed as a phone number beginning (503) …

PROC Format: value $gender
- The VALUE statement begins a new format. You can create more than one format per PROC Format.
- $gender is the name of the new format.
- The format name begins with a $ to indicate that the format is to be applied to character data.
- Each entry maps an input value (left of the equals sign) to an output value (right of the equals sign).

Unformatted output PROC Format

Output with $Gender format applied to gender variable PROC Format

PROC Format: value $gender
- Data values that do not match any of the specified input values appear in their unformatted form: a data value of 'U' would appear as 'U' in the output.
- Input values are case sensitive: a data value of 'm' would not match 'M' = 'Male'.
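A minimal sketch of the $gender format and the two behaviors just described (unmatched values pass through; matching is case sensitive). The labels 'Male' and 'Female' follow the slides; the dataset work.gender_demo and its sample codes are hypothetical.

```sas
proc format;
  value $gender 'M' = 'Male'      /* input value = output value */
                'F' = 'Female';
run;

data work.gender_demo;
  length code $ 1 shown $ 6;
  input code $;
  shown = put(code, $gender.);    /* unmatched codes come back as-is */
  datalines;
M
F
U
m
;
run;
```

Here 'U' (not in the list) and 'm' (wrong case) both come back unchanged.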

PROC Format: value YNscale
- The VALUE statement begins a new format.
- YNscale is the name of the new format.
- The format name does not begin with a $, which indicates that the format is to be applied to numeric data.

PROC Format: value $groupdata
- You can use formats to group data.
- Groups must be mutually exclusive (unless you use multilabel formats).
- You can group either character or numeric data.

PROC Format: value $grades
- You can use lists or ranges as input values.
- You can create a formatted value for missing data:
  - a blank for character data: ' ' = 'Missing'
  - a period for numeric data: . = 'Missing'
- Use the OTHER keyword to capture any input values you did not specify.
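A minimal sketch of lists, ranges, missing values, and OTHER in one PROC Format step. The grade groupings and the numeric format name score are hypothetical examples, not taken from the slides.

```sas
proc format;
  value $grades 'A', 'B', 'C' = 'Passing'   /* a list of input values     */
                'D' - 'F'     = 'Failing'   /* a range of input values    */
                ' '           = 'Missing'   /* blank = missing character  */
                other         = 'Invalid';  /* anything not listed above  */

  value score .     = 'Missing'             /* period = missing numeric   */
              other = 'Recorded';
run;
```

With these formats, 'E' prints as Failing (it is inside the 'D' - 'F' range) and 'Z' prints as Invalid.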

PROC Format: value age
- Use LOW or HIGH to capture the outer bounds of the input values.
- Caution! Make sure you have clean data. What if the input dataset used 255 as its value for missing age? It would be formatted as a real age in the highest range.
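A minimal sketch of LOW and HIGH, including the 255 trap. The age bands and the recoding step are hypothetical illustrations; the comment on LOW reflects that, for numeric formats, LOW does not include missing values.

```sas
proc format;
  value age low -< 18 = 'Under 18'      /* LOW: smallest non-missing value  */
            18  -< 65 = '18 to 64'
            65 - high = '65 and over';  /* HIGH: largest value, 255 too!    */
run;

/* Hypothetical clean-up: recode a 255 "missing" code before formatting */
data work.ages;
  input age @@;
  if age = 255 then age = .;
  age_group = put(age, age.);
  datalines;
12 40 70 255
;
run;
```

Without the recode, put(255, age.) returns '65 and over'.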

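PROC Format: value wages
- Watch out for the cracks! If the ranges leave gaps between them (for example, one range ends at a whole number and the next starts at the following whole number), a value that falls in the gap matches no range and appears unformatted.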

PROC Format: value wages
- Solution: use the < symbol.
  - It means "up to, but excluding," the listed value.
  - It can be used on either side of the dash: "600<-high" means "greater than 600, through the upper limit."
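A minimal sketch of a "cracked" format and the fix with <. The wage boundaries and the format name wagecrack are hypothetical.

```sas
proc format;
  /* Cracked: 300.50 matches neither range and prints unformatted */
  value wagecrack low - 300  = 'Up to $300'
                  301 - 600  = '$301 to $600'
                  601 - high = 'Over $600';

  /* Fixed with <: every value lands in exactly one range */
  value wages low -   300 = 'Up to $300'
              300 <-  600 = 'Over $300, up to $600'
              600 <- high = 'Over $600';
run;
```

With wagecrack, 300.50 is printed as 300.5; with wages, it is labeled 'Over $300, up to $600'.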

Using formats

Use a FORMAT statement to apply formats in PROC steps.

Output with the $Gender format applied to the gender variable.

You can apply more than one format in a single FORMAT statement.

Output with formats applied to every variable.

Formats applied in a PROC step apply only to that PROC step.

Second PROC Print step, with no formats applied.
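A minimal sketch of temporary formats in a PROC step. The dataset work.test and its values are hypothetical, and $gender is redefined here so the block runs on its own.

```sas
proc format;
  value $gender 'M' = 'Male' 'F' = 'Female';
run;

data work.test;
  input subj $ gender $ balance;
  datalines;
001 M 1345.5
002 F 212.75
;
run;

/* Formats in this FORMAT statement apply only to this PROC step */
proc print data=work.test;
  format gender $gender. balance dollar10.2;  /* several formats, one statement */
run;

/* A second PROC Print shows the stored, unformatted values again */
proc print data=work.test;
run;
```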

Formats can also be applied in a DATA step. Unlike in a PROC step, a FORMAT statement in a DATA step permanently associates the format with the variable.

PROC Contents of work.test: formats become part of the attributes of the dataset.

Even if formats have been applied in a DATA step, they can be temporarily superseded in a PROC step (or permanently overwritten with another DATA step).

PROC Print with the worddate. format applied to the Date variable.
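A minimal sketch of a permanent format set in a DATA step, checked with PROC Contents and then temporarily overridden with worddate. in PROC Print. The dataset work.test and its values are hypothetical.

```sas
data work.test;
  input subj $ date :mmddyy10. gender $;
  format date mmddyy10.;        /* stored with the dataset (permanent) */
  datalines;
001 12/26/2007 M
002 01/05/2008 F
;
run;

/* PROC Contents lists MMDDYY10. among the variable attributes */
proc contents data=work.test;
run;

/* worddate. overrides the stored format for this step only */
proc print data=work.test;
  format date worddate.;
run;
```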

Formats can be used to group data in analytical and reporting procedures (such as PROC Means, PROC Freq, etc.).

Analyses will be performed on the formatted values.
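A minimal sketch of grouping with a format in PROC Freq: six hypothetical ages collapse into two formatted groups. The format name agegrp and the dataset names are illustrative.

```sas
proc format;
  value agegrp low -< 18 = 'Under 18'
               18 - high = '18 and over';
run;

data work.people;
  input age @@;
  datalines;
12 15 30 45 17 62
;
run;

proc freq data=work.people;
  tables age / out=work.age_counts;   /* one row per formatted group */
  format age agegrp.;
run;
```

PROC Freq reports two rows (3 people in each group) instead of six distinct ages.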

Using labels

- Like formats, labels can be applied to variables in either the DATA or the PROC step.
- Labels applied in DATA steps (or PROC Datasets) are permanent.
- Labels applied in PROC steps apply only within the procedure.
- Labels are created with the LABEL statement.
- Some procedures require additional options to use labels (instead of variable names) in their output.

- PROC Print requires the LABEL option to display labels (instead of variable names) in the column headers.
- The LABEL statement can be used in either a DATA or a PROC step.

Example of a LABEL statement
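A minimal sketch of permanent labels (DATA step) and a temporary label (PROC step), with the LABEL option so PROC Print uses them in the headers. The dataset and label text are hypothetical.

```sas
data work.test;
  input subj $ dob :mmddyy10.;
  format dob mmddyy10.;
  label subj = 'Subject ID'          /* permanent: stored with the dataset */
        dob  = 'Date of Birth';
  datalines;
001 10/21/1950
002 01/05/1962
;
run;

/* LABEL option: show labels instead of variable names in the headers */
proc print data=work.test label;
  label dob = 'Birth Date';          /* temporary: this step only */
run;
```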

PROC Datasets

- PROC Datasets allows you to change the permanent attributes of a dataset without running a DATA step:
  - labels
  - formats
  - variable names (rename)
  - and more…
- Less processing time: you don't need to recreate the dataset.
- Remember, every DATA step creates a new dataset!

PROC Datasets
- library=: specifies the library where the datasets reside
- modify: specifies the dataset you want to modify
  - You can make more than one modification per dataset.
- You can modify more than one dataset per PROC Datasets step; put a RUN statement between MODIFY statements.
- End the procedure with a QUIT statement.
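A minimal sketch of changing a label, a format, and a variable name in place with PROC Datasets. The dataset work.test and the new names are hypothetical; the block creates the dataset first so it runs on its own.

```sas
data work.test;
  input subj $ dob :mmddyy10.;
  datalines;
001 10/21/1950
002 01/05/1962
;
run;

/* Change attributes in place: no DATA step, so the data are not rewritten */
proc datasets library=work nolist;
  modify test;
    label  dob = 'Date of Birth';
    format dob date9.;
    rename subj = subject_id;
  run;           /* ends this group of MODIFY changes */
quit;            /* ends PROC Datasets                */
```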

For next time: read chapters 7 & 10 (skip sections 10.6 and 10.13).