Checking Data with the PRINT and FREQ Procedures



- Check data by using the PRINT procedure with the WHERE statement.
- Check data by using the FREQ procedure with the TABLES statement.

Example
Constraints on the non-sales employee data:
- Employee_ID must be unique and not missing.
- Gender must have a value of F or M.
- Salary must be in the numeric range of – .
- Job_Title must not be missing.
- Country must have a value of AU or US.
- Birth_Date value must occur before Hire_Date value.
- Hire_Date must have a value of 01/01/1974 or later.

SAS Procedures for Validating Data
- PROC PRINT step with VAR and WHERE statements: detects invalid character and numeric values by subsetting observations based on conditions.
- PROC FREQ step with TABLES statement: detects invalid character and numeric values by looking at distinct values.
- PROC MEANS step with VAR statement: detects invalid numeric values by using summary statistics.
- PROC UNIVARIATE step with VAR statement: detects invalid numeric values by looking at extreme values.
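The deck demonstrates PROC PRINT and PROC FREQ, so the following is only a minimal sketch of how the other two procedures in the list might be applied to Salary. The data set work.salary_demo and its rows are hypothetical and are not the course's orion.nonsales data.

```sas
/* Hypothetical sample rows for illustration only */
data work.salary_demo;
   input Employee_ID Salary;
   datalines;
120101 163040
120104 46230
120105 27110
120106 .
120108 2650
120109 2760000
;
run;

/* Summary statistics: N and NMISS reveal missing values,
   MIN and MAX reveal values outside the allowed range */
proc means data=work.salary_demo n nmiss min max;
   var Salary;
run;

/* Extreme values: show only the table of lowest and highest
   observations, two from each end */
ods select ExtremeObs;
proc univariate data=work.salary_demo nextrobs=2;
   var Salary;
run;
```

PROC MEANS gives a quick range check, while the extreme observations table from PROC UNIVARIATE also reports the observation numbers, which helps locate the suspect rows.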

The PRINT Procedure
The PRINT procedure produces detail reports based on SAS data sets. The VAR statement selects variables to include in the report and their order in the report. The WHERE statement is used to obtain a subset of observations.

General form of the PRINT procedure:

    PROC PRINT DATA=SAS-data-set;
       VAR variable(s);
       WHERE where-expression;
    RUN;

The WHERE Statement
For validating data, the WHERE statement is used to retrieve the observations that do not meet the data requirements.

General form of the WHERE statement:

    WHERE where-expression;

The where-expression is a sequence of operands and operators that form a set of instructions that define a condition for selecting observations.
- Operands include constants and variables.
- Operators are symbols that request a comparison, arithmetic calculation, or logical operation.

The WHERE Statement

    proc print data=orion.nonsales;
       var Employee_ID Last Job_Title;
       where Job_Title = ' ';
    run;

The WHERE Statement
A WHERE statement might need to reference a SAS date value. For example, the PRINT procedure needs to retrieve observations that have values of Hire_Date less than January 1, 1974. Use a SAS date constant to convert a calendar date to a SAS date value.

What is the numeric SAS date value for January 1, 1974?

SAS Date Constant
To write a SAS date constant, enclose a date in quotation marks in the form ddMMMyyyy and immediately follow the final quotation mark with the letter d.
- dd is a one- or two-digit value for the day.
- MMM is a three-letter abbreviation for the month.
- yyyy is a four-digit value for the year.
- d is required to convert the quoted string to a SAS date.

Example: The date constant for January 1, 1974, is '01JAN1974'd.
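To see the number behind a date constant, a short DATA step can write it to the log. This is a minimal sketch; the variable name hire_cutoff is made up for illustration.

```sas
data _null_;
   hire_cutoff = '01JAN1974'd;   /* hypothetical variable name */
   put hire_cutoff=;             /* unformatted: days since 01JAN1960 */
   put hire_cutoff= date9.;      /* same value displayed with the DATE9. format */
run;
```

The log should show hire_cutoff=5114 followed by hire_cutoff=01JAN1974, because SAS stores a date as the number of days since January 1, 1960.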

SAS Date Constant

    libname orion "&path/prg1";

    proc print data=orion.nonsales;
       var Employee_ID Birth_Date Hire_Date;
       where Hire_Date < '01JAN1974'd;
    run;

    proc contents data=orion._all_ nods;
    run;

Data Requirements

Data requirement                                      where-expression to obtain invalid data
Employee_ID must be unique and not missing.           Employee_ID = .
Gender must have a value of F or M.                   Gender not in ('F','M')
Salary must be in the range of – .                    Salary not between and
Job_Title must not be missing.                        Job_Title = ' '
Country must have a value of AU or US.                Country not in ('AU','US')
Birth_Date must occur before Hire_Date.               Birth_Date > Hire_Date
Hire_Date must have a value of 01/01/1974 or later.   Hire_Date < '01JAN1974'd

Note: The expression Employee_ID = . does not account for uniqueness.

Data Requirements

    %clearall

    proc print data=orion.nonsales;
       var Employee_ID Gender Salary Job_Title Country Birth_Date Hire_Date;
       where Employee_ID = . or
             Gender not in ('F','M') or
             Salary not between and or
             Job_Title = ' ' or
             Country not in ('AU','US') or
             Birth_Date > Hire_Date or
             Hire_Date < '01JAN1974'd;
    run;
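A self-contained way to try a combined WHERE expression is to run it against a few made-up rows. In the sketch below, the data set work.nonsales_demo, its rows, and the salary bounds 20000 and 200000 are all assumptions chosen for illustration; they are not the course's data or its actual salary range. Job_Title and Birth_Date are left out to keep the sample small.

```sas
/* Hypothetical sample rows; NOT the course's orion.nonsales data */
data work.nonsales_demo;
   input Employee_ID Gender $ Salary Country $ Hire_Date :date9.;
   format Hire_Date date9.;
   datalines;
120101 M 163040 AU 01JUL2003
120104 F 46230 AU 01JAN1981
. M 26600 US 01JAN1974
120108 G 27660 au 01AUG2006
120109 F 2650 US 01JAN1970
;
run;

/* List every observation that violates at least one rule.
   The salary bounds are placeholder values, not the course's range. */
proc print data=work.nonsales_demo;
   var Employee_ID Gender Salary Country Hire_Date;
   where Employee_ID = . or
         Gender not in ('F','M') or
         Salary not between 20000 and 200000 or
         Country not in ('AU','US') or
         Hire_Date < '01JAN1974'd;
run;
```

With these rows, the report should list the last three observations: one with a missing Employee_ID, one with an invalid Gender and a lowercase Country (character comparisons are case sensitive), and one with an out-of-range Salary and an early Hire_Date.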

The FREQ Procedure
The FREQ procedure produces one-way to n-way frequency tables.

General form of the FREQ procedure:

    PROC FREQ DATA=SAS-data-set;
       TABLES variable(s);
    RUN;

The TABLES statement specifies the frequency tables to produce. The NLEVELS option displays a table that provides the number of distinct values for each variable named in the TABLES statement.

The FREQ Procedure
The following PROC FREQ step shows whether there are any invalid values for Gender and Country. Without the TABLES statement, PROC FREQ produces a frequency table for each variable in the data set.

    %clearall

    proc freq data=orion.nonsales;
       tables Gender Country;
    run;

The FREQ Procedure
This PROC FREQ step will show if there are any duplicates for Employee_ID.

    proc freq data=orion.nonsales;
       tables Employee_ID;
    run;

The FREQ Procedure
[PROC FREQ output for Employee_ID: a one-way frequency table with the columns Employee_ID, Frequency, Percent, Cumulative Frequency, and Cumulative Percent, followed by the line "Frequency Missing = 1". The table rows are not preserved.]
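Scanning a long one-way table for frequencies above 1 can be tedious. One possible alternative, which is not part of the original slides, is to write the counts to a data set with the OUT= option of the TABLES statement and print only the repeated values. The data set names work.id_demo and work.id_counts and the sample rows are hypothetical.

```sas
/* Hypothetical sample rows containing one duplicated ID */
data work.id_demo;
   input Employee_ID;
   datalines;
120101
120104
120108
120108
.
;
run;

/* Write one row per distinct Employee_ID, with its frequency in COUNT */
proc freq data=work.id_demo noprint;
   tables Employee_ID / out=work.id_counts;
run;

/* Keep only the IDs that occur more than once */
proc print data=work.id_counts;
   where count > 1;
run;
```

With these rows, only Employee_ID 120108 has a COUNT greater than 1, so it should be the only row printed.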

The NLEVELS Option
If the expected number of distinct values is known, the NLEVELS option can help determine if there are any duplicates. The NLEVELS option displays a table that provides the number of distinct values for each variable named in the TABLES statement. The Number of Variable Levels table appears before the individual frequency tables.

    proc freq data=orion.nonsales nlevels;
       tables Gender Country Employee_ID;
    run;
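When only the level counts matter, the individual frequency tables can be suppressed by adding the NOPRINT option to the TABLES statement. The following is a minimal sketch on made-up rows; the data set work.levels_demo is hypothetical.

```sas
/* Hypothetical sample rows: one duplicated ID, one invalid Gender,
   and one lowercase Country */
data work.levels_demo;
   input Employee_ID Gender $ Country $;
   datalines;
120101 M AU
120104 F AU
120108 G au
120108 F US
120109 M US
;
run;

/* NLEVELS prints the Number of Variable Levels table;
   NOPRINT on the TABLES statement suppresses the frequency tables */
proc freq data=work.levels_demo nlevels;
   tables Employee_ID Gender Country / noprint;
run;
```

In this sample, Employee_ID has fewer levels than there are observations, which signals a duplicate, while Gender and Country each have more levels than the two values allowed, which signals invalid codes.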