EPIB 698C Lecture 2 Notes Instructor: Raul Cruz 2/14/11



Creating Data in SAS: an overview
- Creating datasets by hand entry (Viewtable window; CARDS or DATALINES statement)
- Reading datasets from external files (non-SAS data; INFILE statement)
- Using the Import/Export facility (a point-and-click approach)

Entering Data with Viewtable window
- To open the Viewtable window, select "Table Editor" from the Tools menu. An empty Viewtable window will appear.
- The letters at the tops of the columns are default variable names.
- Right-click a letter and open the Column Attributes window. There you can change the variable's name, type, etc.

Entering Data with Viewtable window
- Enter your data once you have defined your columns.
- To save your table, select "Save As" from the File menu, then select a library and specify the name of your table (SAS dataset).
- To open an existing table, go to Tools > Table Editor to open the Viewtable window. Then go to the File menu and click Open > choose a library > select the table name.
- To switch from browse mode (the default) to edit mode, select "Edit Mode" from the Edit menu.
- With Viewtable you can easily create a data table by defining the columns and then adding rows. However, you cannot add columns once you have finished defining them. This is a considerable disadvantage.

Reading Data Inline, CARDS Statement
You enter the actual data points inside the Program Editor.
Example: CARDS statement
    data instructor;
      input name $ gender $ age;
      cards;
    Jane F 30
    Mary F 29
    Mike M 28
    ;
    run;
The $ sign after gender means that gender is a character variable (the same holds for name).

Examples
Reading multiple observations from each line of data by adding the double trailing at sign (@@) at the end of the INPUT statement:
    data aaa;
      input x y @@;
      datalines;
    [data lines not preserved in this transcript]
    ;
    run;
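Because the slide's data lines were lost, the following is a minimal sketch of the same technique with made-up values. The numbers are hypothetical and are not the instructor's original data.

```sas
/* Hypothetical data: three (x, y) pairs on the first line, two on the second */
data aaa;
  input x y @@;   /* @@ holds the current line across iterations of the DATA step */
  datalines;
1 2 3 4 5 6
7 8 9 10
;
run;

proc print data=aaa;
run;
```

With these five pairs, the data set should contain five observations rather than two, because each iteration of the DATA step keeps reading from the held line until it is exhausted.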

Reading Dataset from External Files: INFILE Statement
- Identifies the external file that contains the data, and has options that control how the records in the file are read into SAS.
- Must be used before the INPUT statement, because it locates the data file to be read.
Syntax:
    data data_set_name;
      infile directory_and_file_name;
      input variable_list;
    run;

Reading raw data separated by spaces: list input
- List input (also called free-format input) reads data values separated by at least one space. By default, SAS assumes data values are separated by one or more blanks.
- It reads all the data in a record; unwanted values cannot be skipped.
- Any missing data must be indicated with a period.
- Character data must be simple, with no embedded spaces.

Reading raw data separated by spaces: list input
Example:
    data demographics;
      infile "C:\test.txt";
      input Gender $ Age Height Weight;
    run;

Specify missing values with list input
We use a period to represent missing values.
[The sample data on this slide did not survive extraction; only the Gender values M M F F M M F F M M remain.]
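As an illustration of the period convention, here is a small sketch with hypothetical values (not the slide's original data). A period stands in for each missing numeric value, so the remaining values stay aligned with their variables.

```sas
/* Hypothetical records: Age is missing in row 2, Height in row 3 */
data demographics;
  input Gender $ Age Height Weight;
  datalines;
M 50 68 155
F .  60 101
M 65 .  220
;
run;

proc print data=demographics;
run;
```

If the periods were simply omitted, list input would shift the later values into the wrong variables and go to the next line to fill the rest.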

Reading raw data separated by commas
- Comma-separated values files (CSV files) use commas as data delimiters.
- They may or may not enclose character values in quotes.
Example: test.csv
    "M",50,68,155
    "F",23,60,101
    "M",65,72,220
    "F",35,65,133
    "M",15,71,166

Reading raw data separated by commas
    data demographics;
      infile 'c:\test.csv' dsd;
      input Gender $ Age Height Weight;
    run;
DSD means delimiter-sensitive data. It has several functions:
(1) It changes the default delimiter from a blank to a comma.
(2) If there are two delimiters in a row, it assumes there is a missing value in between.
(3) If character values are placed in quotes, the quotes are stripped from the value.

INFILE statement, useful options
DSD
- It recognizes two consecutive delimiters as a missing value. Example: given 20,30,,50 with a comma delimiter but without DSD, SAS reads the values 20 30 50; with the dsd option, SAS reads them as 20 30 . 50.
- It allows you to include the delimiter within quoted strings. Example: in a comma-separated file, your data might include values like "George Bush, Jr." With the dsd option, SAS recognizes that the comma in "George Bush, Jr." is part of the name, and not a separator indicating a new variable.
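Both DSD behaviors can be tried on in-stream data. The sketch below uses hypothetical names and values; the LENGTH statement is an addition so that the name is not truncated to the default 8 characters.

```sas
/* Hypothetical data: a quoted name containing a comma, and an empty second field */
data dsd_demo;
  length Name $ 20;            /* assumption: 20 characters is enough for the names */
  infile datalines dsd;
  input Name $ a b c;
  datalines;
"Smith, John",20,,50
"Lee",10,30,40
;
run;

proc print data=dsd_demo;
run;
```

In the first record, b should come out missing and Name should keep its embedded comma, with the surrounding quotes stripped.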

INFILE statement (cont.)
DLM=
- The dlm= option specifies the delimiter that separates the variables in your raw data file.
- dlm=',' indicates a comma is the delimiter (e.g., a comma-separated file, .csv file).
- dlm='09'x indicates that tabs separate your variables.
- dlm=':' indicates a colon is the delimiter.

INFILE statement (cont.)
We can use dsd and dlm= at the same time:
    infile 'file-description' dsd dlm=':';
This combination of options performs all the actions requested by the DSD option, but overrides the default delimiter (comma) with a delimiter of your choice.
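A minimal sketch of the combination, using hypothetical colon-delimited records read in-stream:

```sas
/* Hypothetical data: colon-delimited, with Height missing in the second record */
data colon_demo;
  infile datalines dsd dlm=':';
  input Gender $ Age Height Weight;
  datalines;
M:50:68:155
F:23::101
;
run;

proc print data=colon_demo;
run;
```

Because DSD is in effect, the two consecutive colons in the second record are read as a missing Height rather than being collapsed into one delimiter.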

INFILE statement (cont.)
More useful options:
- missover: if a record contains fewer values than the INPUT statement expects, all remaining variables are set to missing (instead of SAS moving to the next record to find them).
- obs=: specifies the last record to be read into the data set.
- firstobs=: specifies the first line of data to be read into the data set. Useful if there is a header row in the file.
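The slide's code for these options was not preserved, so the following is a sketch under stated assumptions: the file name c:\test_header.csv is hypothetical, and the file is assumed to have one header row followed by comma-separated data.

```sas
/* Assumed file layout: line 1 is a header, data start on line 2 */
data demographics;
  infile 'c:\test_header.csv' dsd missover firstobs=2 obs=11;
  input Gender $ Age Height Weight;
run;
```

Here firstobs=2 skips the header, obs=11 stops after record 11 (so at most ten data rows are read), and missover keeps a short record from borrowing values from the following line.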

FILENAME statement
The FILENAME statement identifies the file and associates it with a reference name; you then use this reference in your INFILE statement instead of the actual file name.
    filename mydata 'C:\test.csv';
    data demographics;
      infile mydata dsd;
      input Gender $ Age Height Weight;
    run;

INFILE with DATALINES
The file reference in the INFILE statement can be DATALINES, so that INFILE options such as dsd apply to in-stream data:
    data demographics;
      infile datalines dsd;
      input gender $ Age Height Weight;
      datalines;
    "M",50,68,155
    "F",23,60,101
    "M",65,72,220
    "F",35,65,133
    "M",15,71,166
    ;
    run;

Reading data from fixed columns
- Many raw data files store specific information in fixed columns.
- Advantages of fixed-column files: (1) you don't need to worry about missing values; (2) you can choose which variables to read and in what order to read them.

Bank data: columns
[Sample records were garbled in extraction: 00110/ 21/1955M / 18/2001F]
- Columns 1-3: subject ID
- Columns 4-13: date of birth
- Column 14: gender
- Columns 15-21: account balance

Column Input
    data financial;
      infile "C:\bank.txt";
      input Subj $ 1-3 DOB $ 4-13 Gender $ 14 Balance 15-21;
    run;
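Column input also lets you read only some fields, in any order. The sketch below assumes the same C:\bank.txt layout described in the slides; the data set name financial_subset is made up for illustration.

```sas
/* Read Balance first, then Subj; DOB and Gender are simply not read */
data financial_subset;
  infile "C:\bank.txt";
  input Balance 15-21 Subj $ 1-3;
run;
```

The variables appear in the data set in the order they are listed in the INPUT statement, not in the order of the columns in the file.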

Formatted Input
- Formatted input can read both character and standard numeric values, as well as nonstandard numeric values such as numbers with dollar signs and commas, and dates.
- Formatted input is the most common and powerful of all input methods.
SAS Formats and Informats:
- An informat is a specification for how raw data should be read.
- A format is a layout specification for how a variable should be printed or displayed.

Formatted Input
The bank data:
    data financial;
      infile "C:\bank.txt";
      input @1 Subj $3. @4 DOB mmddyy10. @14 Gender $1. @15 Balance 7.;
    run;

Formatted Input
- The @ signs in the INPUT statement are called column pointers; for example, @4 tells SAS to go to column 4.
- Following the variable names are SAS informats. Informats are built-in instructions that tell SAS how to read a data value.

Formatted Input
- Two of the most basic informats are w.d and $w.
- The w.d informat reads standard numeric values. The w tells SAS how many columns to read; the optional d tells SAS that there is an implied decimal point in the value.
- For example, if the data value is 123: with informat 3.0, SAS saves it as 123; with informat 3.1, SAS saves it as 12.3.
- If the data value already has a decimal point in it, SAS ignores the d option. For example, if the data value is 1.23: with informat 4.1, SAS saves it as 1.23.
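The three cases above can be checked with a single in-stream record. This is a small sketch; the variable names a, b, and c are made up for illustration.

```sas
/* Columns 1-3 hold 123; columns 5-8 hold 1.23 */
data wd_demo;
  input @1 a 3.     /* reads 123 as 123                         */
        @1 b 3.1    /* rereads the same columns: implied 12.3    */
        @5 c 4.1;   /* value already has a decimal: stays 1.23   */
  datalines;
123 1.23
;
run;

proc print data=wd_demo;
run;
```

The column pointer @1 is used twice so that the same three characters are read under two different informats.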

Formatted Input
- The $w. informat tells SAS to read w columns of character data.
- The MMDDYY10. informat tells SAS that the date you are reading is in mm/dd/yyyy form. SAS reads the date and converts the value into a SAS date.
- SAS stores dates as numeric values equal to the number of days from January 1, 1960. E.g., if you read 01/01/1960, SAS stores a value of 0. The date 01/02/1960 is stored as a value of 1.
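To see the stored values directly, one can print the same variable with and without a format. This is a minimal sketch; the data set and variable names are hypothetical.

```sas
data date_demo;
  input @1 d mmddyy10.;
  datalines;
01/01/1960
01/02/1960
;
run;

/* No format: shows the raw SAS date values 0 and 1 */
proc print data=date_demo;
run;

/* With a format: the same values displayed as dates */
proc print data=date_demo;
  format d mmddyy10.;
run;
```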

The FORMAT statement
- Formats are built-in SAS instructions that let you display data in easily readable ways. Every SAS format name ends either in a period or in a period followed by a number.
    title "Listing of FINANCIAL";
    proc print data=financial;
      format DOB mmddyy10. Balance dollar11.2;
    run;
- dollar11.2 tells SAS to put a $ sign in front of the number and to allow up to 11 columns for printing the balance values; the 2 tells SAS to include two digits after the decimal point.

Using a FORMAT/INFORMAT statement in a DATA step
- It is usually more useful to place your FORMAT statement within a SAS DATA step, which creates a permanent association between the formats and the variables in the data set.
- You can override any permanent format by placing a FORMAT statement in a particular procedure.
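The slide's code was not preserved, so here is a sketch of what this might look like for the bank data. The INPUT statement is an assumption based on the column layout given earlier (subject ID in columns 1-3, date of birth in 4-13, gender in 14, balance in 15-21).

```sas
/* FORMAT in the DATA step: stored permanently with the data set */
data financial;
  infile "C:\bank.txt";
  input @1 Subj $3. @4 DOB mmddyy10. @14 Gender $1. @15 Balance 7.;
  format DOB mmddyy10. Balance dollar11.2;
run;

/* FORMAT in a procedure: overrides the stored format for this output only */
proc print data=financial;
  format DOB date9.;
run;
```

In the PROC PRINT output, DOB should appear in date9. form while Balance still uses the dollar11.2 format stored with the data set.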

An INFORMAT statement with list input
Following the keyword INFORMAT, you list each variable and the informat you want to use to read it.
    data list_example;
      informat Subj $3. Name $20. DOB mmddyy10. Salary dollar8.;
      infile 'c:\list.csv' dsd;
      input Subj Name DOB Salary;
      format DOB date9.;
    run;

    data list_example;
      infile 'c:\list.csv' dsd;
      input Subj : $3. Name : $20. DOB : mmddyy10. Salary : dollar8.;
      format DOB date9.;
    run;
(1) A colon (called an informat modifier) precedes each informat. It tells SAS to use the informat supplied, but to stop reading the value for this variable when a delimiter is encountered.
(2) Without it, SAS may read past a delimiter to satisfy the width specified in the informat.

What are the differences?
- Informats are used at input, usually when reading external data.
- Formats are used during the output cycle, to write formatted values to output.

Selected date informats

Informat    Data form    Input data         INPUT statement
DATEw.      ddmmmyyyy    1Feb1961           input date date9.;
            ddmmmyy      1Feb61             input date date7.;
DDMMYYw.    ddmmyy       (not preserved)    input date ddmmyy8.;
            ddmmyyyy     01/02/1961         input date ddmmyy10.;
MMDDYYw.    mmddyy       (not preserved)    input date mmddyy8.;
            mmddyyyy     (not preserved)    input date mmddyy10.;

Selected date formats

Format      Data displayed    Input data    FORMAT statement          Result
DATEw.      ddmmmyyyy         366           format date date9.;       01JAN1961
            ddmmmyy           366           format date date7.;       01JAN61
MMDDYYw.    mmddyy            366           format date mmddyy8.;     01/01/61
            mmddyyyy          366           format date mmddyy10.;    01/01/1961

More examples of informats for numeric data

COMMAw.d: removes embedded commas and $, converts a left parenthesis to a minus sign
    Input data    INPUT statement           Result
    $1,000        input income comma6.0;    1000
    (1,234)       input income comma7.0;    -1234
PERCENTw.: converts percentages to numbers
    Input data    INPUT statement           Result
    (20%)         input value percent5.;    -0.2
w.d: reads standard numeric data
    Input data    INPUT statement           Result
    -12.3         input value 5.1;          -12.3

More examples of informats for character data

$CHARw.: reads character data and does not trim leading or trailing blanks
    Input data    INPUT statement            Result
    "my cat"      input animal $char10.;     "my cat"
    "  my cat"    input animal $char10.;     "  my cat"
$w.: reads character data and trims leading blanks
    Input data    INPUT statement            Result
    "my cat"      input animal $10.;         "my cat"
    "  my cat"    input animal $10.;         "my cat"
(Quotation marks are shown only to make the leading blanks visible.)

Import/Export Data
To export SAS datasets:
- Go to the File menu and select "Export Data".
- Choose the data file (from the library Work).
- Locate and select the file type using the Browse button.
- Save the data set and finish.
- Check the log to make sure the data set was created.
Notes:
- This method does not require a DATA step, but any modification may require one.
- It is convenient for Excel files.
- Importing a SAS data set follows similar steps.

Read data in Excel file
(1) Use the Import procedure: File > Import Data > choose Microsoft Excel > click "Next" > select the work sheet using Browse > select the "Table" you want to import > click "Next" > select a library and assign a file name > click "Next" if you want the PROC IMPORT SAS code generated, otherwise click "Finish". Then check the Log window to make sure the data set was created successfully.
(2) Save the Excel file as a CSV file, then read it in using the INFILE statement with the dsd option.
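The wizard in option (1) can generate PROC IMPORT code. The code it produces is not shown in the slides, so the following is only a rough sketch of what such a step might look like. The path, sheet name, and output data set name are hypothetical, and dbms=xlsx assumes a SAS installation licensed for SAS/ACCESS Interface to PC Files (older .xls workbooks would need a different DBMS= value).

```sas
/* Hypothetical workbook and sheet; adjust to your own file */
proc import datafile="C:\test.xlsx"
    out=work.demographics
    dbms=xlsx
    replace;
  sheet="Sheet1";
  getnames=yes;     /* use the first row as variable names */
run;
```

As with the point-and-click route, it is worth checking the log afterwards to confirm that the data set was created.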