SAS Macros are the Cure for Quality Control Pains Gary McQuown Data and Analytic Solutions.



Rants and Raves of a SAS Programmer

Purpose I. Quality Control II. SAS Macros for Quality Control III. Sources of SAS Macros and QC Code

I. Quality Control An ongoing effort to validate, improve and facilitate data-related processes to ensure that the data meets the business needs.

Quality Control “Quality control means you can have what you need, how you need it, when you need it.” W. Edwards Deming

Why Practice QC? It Saves Time It Saves Money It Makes Money Ignorance is not Bliss

How Data Goes Bad “Bad Genes”.. Poor design and collection “Adoption” … Someone Else’s Design “Child Abuse”... Poorly Nurtured “Terrible Teens”... Growing Pains

The QC Process 1. Define Requirements 2. Identify Data Issues 3. Analyze Options 4. Improve Data Quality Document every step and repeat

Define Requirements What do you need? Requires an understanding of the business process, the data, the operating system and the users. Documentation, business specs and “experts”.

Devil’s Advocate What is correct for one task / group may be incorrect for another. What is correct now may be incorrect later. What is correct now... may not be repeatable.

Identify Data Issues Accuracy Completeness Consistency Timeliness Uniqueness Validity

G = Good F = Fair B = Bad

Analyze Options What do you need? What do you have? What changes need to be made? Will you break anything along the way?

Improve Data Quality Selective processing. Clean existing values. Correct existing values. Delete “bad” data. Add additional data. Document original and new values.

Documentation
Design process... business specs
“As You Go”... in the code, log, input and output files (Freqs & Means)
Modifications... “as per xxx”
Exceptions (errors and issues)
User’s manual
Elizabeth Axelrod... Big ‘D’
“Just Shoot Them”

General Suggestions “Drive Out Fear” Early Intervention Obtain “Buy In” from all parties Keep it “Simple”... use macros Be consistent … use macros Monitor results Document everything, every time

II. SAS Macros Macros allow you to use, re-use and share “object-oriented” code. QC is highly repetitive: the same or similar process is performed on each data set, each variable and each process.
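To illustrate the point about repetition, here is a minimal sketch of a macro that runs the same frequency check on each variable in a list. The macro name %qc_freq and its parameters are hypothetical and are not taken from the presentation; SASHELP.CLASS is used only because it ships with SAS.

```sas
/* Hypothetical helper: run the same QC frequency on every listed variable. */
%macro qc_freq(data=, vars=);
  %local i var;
  %let i = 1;
  %let var = %scan(&vars, &i, %str( ));
  %do %while (%length(&var) > 0);
    title "QC frequency: &var in &data";
    proc freq data=&data;
      tables &var / missing;   /* MISSING counts missing values as a level */
    run;
    %let i = %eval(&i + 1);
    %let var = %scan(&vars, &i, %str( ));
  %end;
  title;                        /* clear the title when done */
%mend qc_freq;

/* Example call on a sample data set shipped with SAS */
%qc_freq(data=sashelp.class, vars=sex age);
```

Once the check lives in a macro, applying it to another data set is a one-line call, which is what keeps the process simple and consistent.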

Reality People are: Ignorant Forgetful Busy Lazy Don’t Care

Why Macros Minimal Effort Parameters Available (FREE)

FREQOUT: produces frequencies for multiple variables
%FREQOUT(data = /* input data set name */,
         out = freqout /* output data set name */,
         vars = /* list of variables */,
         by = /* list of by variables */,
         fmtassign = /* var fmt var fmt */,
         debugging = NO /* YES or NO */)
Author: Ian Whitlock
Location: and sconsig.com

EAP_RPT
%EAP_RPT(DSN=, LIBIN=, LIBOUT=, _VARS=, _FMTS=);
DSN = name of input SAS data set
LIBIN = SAS library of input data set
LIBOUT = SAS library of output data set
_VARS = list of character variables to review, paired with _FMTS
_FMTS = list of formats to apply, paired with _VARS
Example:
%EAP_RPT(_VARS = AGE INCOME EDUCATION, _FMTS = AGE INC EDU, LIBIN = PROJ_IN, LIBOUT = PROJ_OUT, DSN = STUDY_1);

DATA CLEANING TIP00128a - Cleansing Macro, Data Scrubbing routine (see tip for more) %cleanse(schlib=work, schema=, strlen=50, var=, target=target, replace=replace, case=nocase); Author: Charles Patridge Version: 2.1 (sug. by Ian Whitlock) Location:

REMOVE OUTLIERS
%outlier(data = _SAS_dataset_name_,
         out = _SAS_output_dataset_name_,
         var = _variable_to_screen_,
         pass = _number_of_passes_,
         except = _exception_report_data_set_,
         mult = _multiplier_of_standard_deviations_)
The %OUTLIER macro screens for outliers based on the statistics of a numeric variable in a SAS data set. It removes any records that lie more than a given number of standard deviations from the mean, and it runs that screen a given number of times. For example, a "3-Pass-2" outlier screen removes any values outside 3 standard deviations from the mean and runs that screen twice. The given numbers can be any integer.
Author: Unknown
Location:
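The multi-pass screen described above can be sketched as follows. This is not the %OUTLIER macro itself, whose source is not shown in the slides; the macro name %outlier_sketch, the temporary data sets _stats and _dropped, and the handling of missing values are all assumptions made for illustration.

```sas
/* Hypothetical sketch of a multi-pass standard-deviation outlier screen. */
%macro outlier_sketch(data=, out=, var=, pass=1, mult=3, except=);
  %local p;

  /* Start from a copy of the input and an empty exception data set */
  data &out;
    set &data;
  run;
  data &except;
    if 0 then set &data;   /* copy the structure only */
    stop;
  run;

  %do p = 1 %to &pass;
    /* Recompute mean and standard deviation on the surviving rows */
    proc means data=&out noprint;
      var &var;
      output out=_stats(keep=_mean _std) mean=_mean std=_std;
    run;

    /* Rows farther than &mult standard deviations go to _dropped */
    data &out(drop=_mean _std) _dropped(drop=_mean _std);
      if _n_ = 1 then set _stats;
      set &out;
      if not missing(&var) and not missing(_std)
         and abs(&var - _mean) > &mult * _std then output _dropped;
      else output &out;
    run;

    /* Keep a record of everything that was removed */
    proc append base=&except data=_dropped;
    run;
  %end;

  proc datasets library=work nolist;
    delete _stats _dropped;
  quit;
%mend outlier_sketch;

/* A "3-Pass-2" screen on a sample data set shipped with SAS */
%outlier_sketch(data=sashelp.class, out=work.class_clean, var=weight,
                pass=2, mult=3, except=work.class_outliers);
```

Because the mean and standard deviation are recomputed on each pass, a second pass can flag values that the first pass left in. Writing the removed rows to an exception data set follows the earlier advice to document original values.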

CONT_COMPARE Compares two data sets, lists all variables and reports potential issues: 1) Fields in both 2) Type 3) Length %cont_compare (dsn1, dsn2)

KEEPDBLS: Documents Duplicates TIP KeepDbls %MACRO KeepDbls (SourceDs =_LAST_, TargetDs =, Overwrit =N, IdList =, Where =); Moves duplicate observations to another file. Author: Jim Groeneveld Location:
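The general idea of moving duplicates to a separate file can be sketched with a sort and BY-group processing. This is not the KeepDbls source; the data set names and the sample rows are hypothetical, and the sketch assumes that every observation sharing a key value should be set aside for review.

```sas
/* Hypothetical sample data: id 2 appears twice */
data work.visits;
  input id visit_date :date9.;
  format visit_date date9.;
  datalines;
1 01JAN2020
2 05JAN2020
2 09JAN2020
3 12JAN2020
;
run;

proc sort data=work.visits out=work.visits_sorted;
  by id;
run;

/* Keys that occur once go to SINGLES; every row of a repeated key goes to DUPS */
data work.singles work.dups;
  set work.visits_sorted;
  by id;
  if first.id and last.id then output work.singles;
  else output work.dups;
run;
```

If only the extra copies should be set aside, PROC SORT with the NODUPKEY and DUPOUT= options keeps the first observation per key and writes the rest to a second data set.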

CK_MISSING: evaluates variables with regard to missing and non-missing status.
Default = _numeric_ missing. _character_ $missing.
Parms:
DSN = libname and name of data set. Default is the last read/created.
PATH = path to directory where QC info is stored.
VAR = list of variables to be evaluated.
FMT = format statement.
%ck_missing(dsn=mylib.recentfile,
            var=UPB FICO1 FICO2 FICO3 CHANNEL,
            fmt=UPB upb. FICO1 FICO2 FICO3 fico. CHANNEL $chnl.);
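A common way to get missing versus non-missing counts, and possibly what the default formats above do, is to collapse every value into two groups with a format and then run PROC FREQ. The sketch below is an assumption about the approach, not the CK_MISSING source; the format names missfmt and $missfmt and the sample data are hypothetical.

```sas
/* Two-level formats: missing versus present */
proc format;
  value missfmt   . = 'Missing'   other = 'Present';  /* special missings (.A-.Z, ._) fall under Present */
  value $missfmt ' ' = 'Missing'  other = 'Present';
run;

/* Hypothetical sample data with some missing values */
data work.loans;
  input upb fico channel $;
  datalines;
100000 720 R
. 680 W
250000 . .
;
run;

/* MISSING makes PROC FREQ count the missing level in the table */
proc freq data=work.loans;
  tables upb fico channel / missing;
  format upb fico missfmt. channel $missfmt.;
run;
```

Supplying a different format for a variable, as the FMT= parameter allows, would group the non-missing values into meaningful ranges rather than a single "Present" level.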

LOG FILTER: Examines and Reports on SAS Log Log Filter checks your log for errors, warnings, and other "interesting" messages. It then displays what it finds in its summary window. Double-click on a row and it'll reposition the log window to display the message in context (if it's an external log file, it'll open it in a viewer window and position it for you). Author: Ratcliffe Location:

MK_FORMATS: creates a format from a SAS data set.
Parms:
DSN = SAS data set
START = unique key value, e.g. SSN
LABEL = value to be associated with START, e.g. full name with SSN
FMTNAME = name of format (sans ".")
TYPE = C or N for character or numeric
LIBRARY = libname of format library (default = WORK)
OTHER = value to supply for missing (default = OTHER)
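Formats are usually built from a data set through the CNTLIN= option of PROC FORMAT, and the parameters above map naturally onto the columns of a control data set. The sketch below shows that technique under assumed names (work.lookup, ssn, fullname, the format $ssnname); it is not the MK_FORMATS source.

```sas
/* Hypothetical lookup table: one row per unique key */
data work.lookup;
  length ssn $11 fullname $30;
  infile datalines dsd;
  input ssn $ fullname $;
  datalines;
111-11-1111,Pat Example
222-22-2222,Lee Sample
;
run;

/* Build the control data set that PROC FORMAT expects */
data work.cntlin;
  length fmtname $32 type $1 hlo $1;
  retain fmtname 'ssnname' type 'C';          /* TYPE C = character format */
  set work.lookup(rename=(ssn=start fullname=label)) end=last;
  hlo = ' ';
  output;
  if last then do;                            /* catch-all row for unmatched keys */
    hlo = 'O';
    label = 'OTHER';
    output;
  end;
run;

proc format library=work cntlin=work.cntlin;
run;

/* Usage: look up the label for each key, including one that is not in the table */
data work.check;
  length ssn $11 name $30;
  ssn = '111-11-1111'; name = put(ssn, $ssnname.); output;
  ssn = '999-99-9999'; name = put(ssn, $ssnname.); output;
run;
```

The START values must be unique, otherwise PROC FORMAT reports overlapping ranges, which is itself a useful uniqueness check on the key.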

III. Sources of SAS Macros and QC Code (examples) (proceeding)

More Sources SAS-L Books By Users: Ron Cody’s Data Cleaning Numerous books on Macros.... “By Example”

Questions ? Gary McQuown