Chapter 17: Read Raw Data in Fixed Format Using Formatted Input

Objectives
- Distinguish between standard and nonstandard numeric data
- Read standard fixed-field data
- Read nonstandard fixed-field data

Review of Column Input
General syntax of column input:
    INPUT var $ start_col - end_col ... ;
- var is the variable name.
- $ indicates a character variable.
- start_col and end_col specify the starting and ending column numbers for reading the variable.
Ex.
    INPUT L_Name $ 1-15 F_name $ age Choles 30-35;

Important Features and Uses of Column Input
- It can read character values that contain embedded blanks.
- A value that is blank in the defined columns is read as missing (blank for character, '.' for numeric).
- Columns can be re-read. Ex.
    INPUT supplier $ 5-20 ItemNum amount 22-30;
- Fields can be read in any order, backward or forward. Ex.
    INPUT F_name $ L_Name $ 1-15 age 16-18;
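As a minimal sketch of these features, the following DATA step reads two in-stream records with column input. The data set name, variable names, data values, and column positions are hypothetical and chosen only for illustration; they are not taken from the course data.

```sas
/* Hypothetical layout: name in columns 1-15, age in columns 17-18 */
data work.colinput_demo;
   input full_name $ 1-15   /* embedded blanks are kept              */
         age         17-18  /* blank columns are read as missing (.) */
         initial   $ 1-1;   /* column 1 is read a second time        */
   datalines;
Mary Ann Smith  34
Li Wei
;
run;

proc print data=work.colinput_demo;
run;
```

In this sketch the second record has nothing in columns 17-18, so age should be missing for that observation, while full_name keeps its embedded blank in both records.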

Raw Data That Cannot Be Read by Column Input
- Data values that are not standard numeric data, e.g., values containing a $ sign, a comma, or a % sign.
- Data values that are not organized in fixed columns for each variable.
- Numeric data values with decimal places where the decimal point is not recorded in the data.
- Date, time, and datetime values that are recorded in a commonly used form, such as 11/14/2010, rather than as numeric values. Such a date requires a special informat to be read. If it is read using column input, it must be read as character data.
- Data values that are not recorded in fixed format, for example, values separated by delimiters such as a blank, /, ;, or a tab.

A Review of Standard vs. Nonstandard Numeric Data
Standard numeric data can contain only:
- Numbers
- Decimal places
- Numbers in scientific or E-notation (e.g., 4.2E3)
- Plus or minus signs
Nonstandard numeric data include:
- Values that contain special characters, such as %, $, and commas (,)
- Date and time values
- Data in fraction, integer binary, real binary, or hexadecimal form, etc.

Determine whether each of the following numeric data values is standard or nonstandard:
    Data value     Type
                   Standard
    $              Nonstandard
    3,456.12       Nonstandard
    20DEC2010      Nonstandard (date)
    12/20/2010     Nonstandard (date)

A Review of Fixed Format vs. Free Format
Fixed format means that a variable occupies the same range of columns from observation to observation. Free format means that the data values are not in a fixed range of columns.
Ex:
    Fixed format    Free format
    HIGH F          HIGH F LOW F MEDIAN M

A Review of Basic Statements for Reading External Raw Data in a DATA Step
General form of the complete DATA step without a FILENAME statement:
    DATA SAS_data_set_name;
        INFILE 'input-raw-data-file';
        INPUT variable $ start - end ...;
    RUN;
General form of the complete DATA step with a FILENAME statement:
    FILENAME fileref 'input-raw-data-file';
    DATA SAS_data_set_name;
        INFILE fileref;
        INPUT variable $ start - end ...;
    RUN;

Example: Review of Reading External Raw Data Using Column Input
Read the external data file salesdata.dat with a FILENAME statement:
    FILENAME sal_dat 'C:\math707\RawData\RawData_dat\salesdata.dat';
    DATA saleslib.sales_sasdata;
        INFILE sal_dat;
        INPUT last_name $ 1-7 sale_date $ 9-11 residential commercial 23-31;
Read the external data file salesdata.dat without a FILENAME statement:
    DATA saleslib.Sales_sasdata;
        INFILE 'C:\math707\RawData\RawData_dat\salesdata.dat';
        INPUT last_name $ 1-7 sale_date $ 9-11 residential commercial 23-31;

Reading External Raw Data Using Formatted Input
General syntax:
    INPUT pointer-control variable informat. ;
- Pointer-control: controls the column position of the input pointer.
- Variable: the name of the variable to be created.
- Informat: the informat used to read the variable.
NOTE: Two pointer controls set the column position:
- @n moves the pointer to column n. This is the absolute column of the data record.
- +n moves the pointer forward n columns from the current position.
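The two pointer controls can be sketched with a small in-stream example. The data set, variable names, values, and column positions below are hypothetical; the point is only to show an absolute move with @n followed by a relative move with +n.

```sas
/* Hypothetical layout: name in 1-7, day in 9-10, amount in 12-18 */
data work.pointer_demo;
   input @1 last_name $7.   /* start at column 1, read 7 columns      */
         @9 visit_day 2.    /* jump to column 9, read 2 columns       */
         +1 amount    7.;   /* pointer is at 11; skip to 12, read 7   */
   datalines;
SMITH   10 1250.50
DAVIS   15  980.00
;
run;

proc print data=work.pointer_demo;
run;
```

After visit_day is read, the pointer rests on column 11, so +1 places it on column 12 before amount is read from columns 12-18.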

Informat in the INPUT Statement
An informat is the instruction that SAS uses in the INPUT statement to READ data values; because it is used for input, it is called an INformat. The SAS formats discussed in Chapter 4 for DISPLAYING data values generally have counterpart informats that can be used in the INPUT statement for fixed-format formatted input.

Recall: SAS Format
A format is an instruction that SAS uses to write data values. SAS formats have the following form:
    <$>format<w>.<d>
- $ indicates a character format.
- format is the format name.
- w is the total width (including decimal places and special characters).
- . is the required delimiter.
- d is the number of decimal places.

SAS Informats

SAS Informats for Date and Time
Recall that a SAS date is stored as the number of days between 01JAN1960 and the specified date. If date values are recorded in a form such as 10/16/2001 or 16OCT2001, an informat is needed to read them properly from the external data set:
    Date in external data set    Informat
    10/16/2001                   MMDDYYw.
    16OCT2001                    DATEw.
In both cases, the data value read is the SAS date value, that is, the number of days from 01JAN1960 to 16OCT2001.
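A short sketch can make the date informats concrete. It reads the two date forms mentioned above from one hypothetical in-stream record; the data set name, variable names, and the widths MMDDYY10. and DATE9. are assumptions chosen to match the lengths of the two example values.

```sas
data work.date_demo;
   input @1  date_a mmddyy10.   /* reads 10/16/2001 from columns 1-10  */
         @12 date_b date9.;     /* reads 16OCT2001 from columns 12-20  */
   same_day = (date_a = date_b); /* 1 if both hold the same SAS date   */
   datalines;
10/16/2001 16OCT2001
;
run;

proc print data=work.date_demo;   /* unformatted: shows the stored numbers */
run;

proc print data=work.date_demo;   /* formatted: shows readable dates       */
   format date_a date_b date9.;
run;
```

Both variables should hold the same stored value, 15264, which is the number of days from 01JAN1960 to 16OCT2001; only the display format changes how it is printed.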

Some Commonly Used Informats
    PERCENTw.d    DATE9.       NENGOw.
    $BINARYw.     DATETIMEw.   PDw.d
    $VARYINGw.    HEXw.        PERCENTw.
    $w.           JULIANw.     TIMEw.
    COMMAw.d      MMDDYYw.     w.d

The Informat COMMAw.d
The COMMAw.d informat reads nonstandard numeric data and removes embedded blanks, commas, dashes, dollar signs, percent signs, and right parentheses. A left parenthesis at the beginning of the field is converted to a minus sign.
    Actual data value    Informat
    12,345.67            COMMAw.d
    $12,345.67           COMMAw.d
    (12,345.67)          COMMAw.d
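The behavior of COMMAw.d can be checked with a small sketch. The data set name, variable name, and the width 11 are assumptions (11 is the length of the longest value below); the three values are the ones listed on the slide.

```sas
data work.comma_demo;
   input @1 amount comma11.2;   /* strips $ , ) and turns ( into a minus sign */
   datalines;
12,345.67
$12,345.67
(12,345.67)
;
run;

proc print data=work.comma_demo;
   format amount comma12.2;     /* display with commas; stored value is numeric */
run;
```

The first two records should be read as 12345.67 and the third as -12345.67. Because each value already contains a decimal point, the d part of the informat does not rescale it.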

Exercise
Practice using informats in the INPUT statement. The data set aug99n.dat is posted on the class website. Three observations of the data set are shown below:
    AUG1999 R %
    AUG1999 C %
    AUG1999 T %
Write a SAS program to read these data, paying special attention to the use of informats to read nonstandard numeric values. Print the data set using proper display formats for the nonstandard variables.
    Field Name     Start Column    End Column    Maximum Width    Data Type
    ID             1               3             3                numeric
    Date           5               13            9                character
    Item           15                            1                character
    Quantity       17              19            3                numeric
    Price          21              24            4                numeric
    Percentsale    26              29            7                numeric

Answer: There are many ways to accomplish the same goal. Here is an example:
    data orders;
        infile 'C:\math707\RawData\RawData_dat\aug99n.dat';
        input ID date date9. item $ 15 quantity totalcost percentsale percent4. ;
    proc print;
        format date MMDDYY8. percentsale percent6. ;
    run;

Salesdata.dat
    SMITH 10JAN
    DAVIS 15JAN
    JOHNSON 20JAN
    SMITH 01FEB
    DAVIS 12FEB
    JOHNSON 22FEB
    SMITH 10MAR
    DAVIS 18MAR
    JOHNSON 26MAR

Read Salesdata.dat Using Formatted Input
    Data work.sale;
        INFILE 'C:\math707\RawData\RawData_dat\Salesdata.dat';
        input last_name month residential commercial 9.2;
    run;
    Proc print;
    Run;
NOTE:
- INFILE identifies the location of the raw data file.
- $w. is the informat for character data.
- @n moves the pointer to column n.
- +n moves the pointer forward n columns from the current position.
- The pointer starts at column 1. After a variable is read, the pointer moves to the next column, which becomes the current position.
Ex: After last_name is read with 7 columns, the pointer moves to column 8 as the current position. Residential starts at column 19 and is read with 9 columns, so it occupies columns 19 to 27, and the pointer then moves to column 28. Hence, +1 moves the pointer one column forward, from column 28 to column 29, and commercial is then read from the next 9 columns.

Example: Read the Following Data Using Formatted Input
The following are the scores on the quizzes, Test 1, Test 2, and the final for a class.
    Name Q1 Q2 Q3 Q4 Q5 T1 T2 Final
    CSA
    DB
    QC
    DC
    E
    F
    GC
    HD
    IM
    WB
Write a SAS program to read the data, with the data included in the SAS program.

/* Program statements */
    DATA scores;
        /* Column input */
        INPUT Name $ 1-5 Q1 6-7 Q2 Q3 Q4 Q5 TEST1 TEST2 Final 33-36;
        /* Formatted input (an alternative; use only one of the two INPUT statements) */
        INPUT NAME $5. Q1 Q2 Q3 Q4 Q5 TEST1 TEST2 FINAL 4.;
        DATALINES;
    CSA
    DB
    QC
    DC
    E
    F
    GC
    HD
    IM
    WB
    ;
    RUN;

Different Formatted Inputs to Read the Same Data
    /* Column input */
    INPUT Name $ 1-5 Q1 6-7 Q2 Q3 Q4 Q5 TEST1 TEST2 Final 33-36;
    /* Formatted input */
    INPUT NAME $5. Q1 Q2 Q3 Q4 Q5 TEST1 TEST2 FINAL 4.;
    INPUT NAME $5. Q1 Q2 Q3 Q4 Q5 TEST1 TEST2 FINAL 4.;
    INPUT NAME $5. (Q1-Q5 TEST1 TEST2) (2.) FINAL 4.;
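The grouped form, in which a variable list in parentheses shares an informat list, can be sketched as follows. The layout and scores are hypothetical (three quizzes instead of five, with one blank column between fields), and the +1 inside the informat list is an assumption about how the fields are spaced.

```sas
/* Hypothetical layout: name 1-5, q1 6-7, q2 9-10, q3 12-13, final 15-18 */
data work.scores_demo;
   input name $5.
         (q1-q3) (2. +1)   /* the informat list is reused for each variable */
         @15 final 4.;     /* absolute pointer makes the start column explicit */
   datalines;
CSA  10  9  8   95
DB    9 10  7  100
;
run;

proc print data=work.scores_demo;
run;
```

Using @15 before final keeps the example independent of where the pointer rests after the grouped list is exhausted.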

Exercise
Open the program c5_colInp and change the column input INPUT statement to use formatted input.

Fixed Record Length vs. Variable Record Length
- In reading an external data set, the record length is the size of each record. Usually a record consists of the variables of one observation. NOTE: One record can also contain multiple observations. This will be discussed later.
- The size of each record is usually FIXED, that is, every record has the same size. However, this may not be the case in practice, and the record size may differ from record to record.
- When the record lengths differ, formatted input may not read the data values correctly, because formatted input looks for the number of columns specified for each variable.
- When a record is too short, the pointer may continue to the next record in order to read the specified number of columns for a variable (usually the last variable in the INPUT statement). The data values are then read incorrectly, and SAS writes a note to the log saying that it went to a new line when the INPUT statement reached past the end of a line.

Formatted Input When Reading Records with Variable Record Lengths: Using the PAD Option in the INFILE Statement
One way to fix the problem is to add blank spaces to the records that are shorter than the record length, so that the record length becomes FIXED. The other way is to tell SAS to PAD the records that are too short with blanks. Suppose the record length of Salesdata.dat is not fixed.
Example:
    Data work.sale;
        INFILE 'C:\math707\RawData\RawData_dat\Salesdata.dat' pad;
        input last_name sale_date residential commercial 9.2;
    run;
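The effect of PAD can be sketched without the course file by first writing a small temporary file whose records have different lengths and then reading it twice. The fileref, data set names, and values are hypothetical, and the sketch assumes that the TEMP device of the FILENAME statement is available and that PUT writes variable-length records, which is the usual default.

```sas
filename shortrec temp;            /* hypothetical temporary raw data file */

data _null_;                       /* write three records; the second is short */
   file shortrec;
   put 'SMITH   1250.50';
   put 'DAVIS   980';
   put 'JOHNSON 2100.75';
run;

data work.no_pad;                  /* without PAD: short record causes trouble */
   infile shortrec;
   input @1 last_name $7. @9 amount 7.;
run;

data work.with_pad;                /* with PAD: short records are blank-filled */
   infile shortrec pad;
   input @1 last_name $7. @9 amount 7.;
run;

proc print data=work.no_pad;   title 'Without PAD'; run;
proc print data=work.with_pad; title 'With PAD';    run;
title;
```

With PAD, all three observations should be read, with amount equal to 980 for DAVIS. Without PAD, the log for work.no_pad should contain the note about going to a new line, and the data set should end up with fewer than three observations because the JOHNSON record is consumed while SAS looks for the rest of the DAVIS amount.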

Formatted PUT to Create an External Data Set
Just as formatted input reads an external raw data set, a formatted PUT statement can create one.
    FILENAME fileref 'file-location';
    DATA SAS_data_set_name;
        FILE fileref;
        PUT var format. ... ;
    RUN;

Example: Create an External Data Set Using Formatted PUT
To create an external data file from the sales data that contains only the MARCH records:
    Data work.sale;
        INFILE 'C:\math707\RawData\RawData_dat\Salesdata.dat';
        input last_name sale_date residential commercial 9.2;
    Run;
    Data marchsale;
        set work.sale;
        FILE 'C:\math707\RawData\RawData_dat\Sales_March.dat';
        IF MONTH(sale_date) = 3;
        PUT last_name sale_date residential commercial 10.2;
    run;
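A self-contained sketch of the same idea is given below. It does not use the course file: the records, the four-digit years, the column positions, and the output formats are all hypothetical, and the output goes to a temporary fileref rather than to Sales_March.dat.

```sas
filename outfile temp;             /* hypothetical output location */

data _null_;
   /* Hypothetical layout: name 1-7, date 9-17, residential 19-27 */
   input @1 last_name $7. @9 sale_date date9. @19 residential comma9.2;
   file outfile;
   if month(sale_date) = 3;        /* keep only the March records */
   put @1  last_name   $7.
       @9  sale_date   mmddyy10.
       @20 residential dollar12.2;
   datalines;
SMITH   10JAN2011  1,405.25
DAVIS   18MAR2011  2,310.00
JOHNSON 26MAR2011    875.50
;
run;

data _null_;                       /* echo the new file to the log to inspect it */
   infile outfile;
   input;
   put _infile_;
run;
```

Because sale_date is read with a date informat, it is a numeric SAS date, which is what the MONTH function requires. Only the DAVIS and JOHNSON records should be written.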

Exercise
The following are finance data. The variables are SSN, Name, Salary, Nyear, and Birthday.
    Rudelich 55,
    Vincent 65,
    Benito 78,
    Sirignano $5,
    Harbinger 73,
    Phillipon $49,
    Gunter 57,
Write a SAS program to read these data using formatted input. Practice using the PAD option in the INFILE statement, and make sure you see and understand the difference between reading with PAD and without PAD.

An Answer
    /* Program b: read variable-length records. This program has errors. Check the errors carefully. */
    data financeb;
        infile 'C:\math707\RawData\RawData_dat\finance3_recordlength.dat';
        input SSN $ 1-11 Name $ salary comma Nyear birthdate 5.;
    proc print;
        format birthdate date9.;
        title 'Errors in reading variable-length records';
    run;
    /* Program c: use the PAD option in the INFILE statement */
    data financec;
        infile 'C:\math707\RawData\RawData_dat\finance3_recordlength.dat' pad;
        input SSN $ 1-11 Name $ salary comma Nyear birthdate 5.;
    proc print;
        format birthdate date9.;
        title 'Use PAD option to read variable-length records';
    run;