Presentation is loading. Please wait.

Presentation is loading. Please wait.

EPIB 698C Lecture 2 Notes Instructor: Raul Cruz 2/14/11 1.

Similar presentations


Presentation on theme: "EPIB 698C Lecture 2 Notes Instructor: Raul Cruz 2/14/11 1."— Presentation transcript:

1 EPIB 698C Lecture 2 Notes Instructor: Raul Cruz raulcruz@umd.edu 2/14/11 1

2 Creating Data in SAS, an overview Creating datasets by hand entry (Viewtable window, CARDS, DATALINES statement) Reading dataset from external files (not SAS data, INFILE statement) Using Import/Export facility (a point-and-click approach) 2

3 Entering Data with Viewtable window To open viewtable window, select “Table Editor” from the Tools menu. An empty viewtable window will appear The letters at the tops of columns are default variable names. Right click on the letter and open the Column attributes window. You can replace variables name, type, etc 3

4 Entering Data with Viewtable window Entering you data once you have defined your columns. To save your table, select “Save as” from the File menu, then select a library and specify the name of your table (SAS dataset) To open an existing table, go to Tools  Table editor, the view table window will be opened. Then go the File menu, click Open  choose library  select Table name. To switch from browse mode (default) to edit mode, select “Edit Mode” from the Edit menu Using Viewtable you can easily create a data table by setting the columns, you can add rows to the table. However, you can not add columns once you finish defining your columns. This is a considerable disadvantage. 4

5 Reading Data Inline, CARDS Statement You enter the actual data points inside the PROGRAM EDITOR Example: CARDS statement data instructor; input name $ gender $ age; cards; Jane F 30 Mary F 29 Mike M 28 ; run; 5  $ sign after gender means that gender is a character variable

6 Examples Reading multiple observations in each line of data by adding the @@ symbol at the end of the input statement data aaa; input x y @@; datalines; 1 12.4 1 11.3 1 1.4 2 2.1 2 19.4 1 10.0 2 13.2 1 11.6 ; run; 6

7 Reading Dataset from External Files: INFILE Statement Identifies the external file that contains the data and has options that control how the records in file are read into SAS Must be used before the input statement because it locates the data file to be read Syntax data data_set_name; Infile directory_and_file_name; input variable_list; run; 7

8 Reading raw data separated by spaces—list input List input (also called free formatted input) can read data separated by at least one space. By default, SAS assumes data values are separated by one or more blanks. Will read all the data in a record, no skipping unwanted values Any missing data must by indicated with a period Character data must be simple, no embedded spaces 8

9 Reading raw data separated by spaces—list input Example: data demographics; infile "C:\test.txt“; input Gender $ Age Height Weight; run; 9

10 Specify missing values with list input We use a period to represent missing values M 50 168 155 M 50 168 155 F 23 160 101 F. 160 101 M 65 172 220 M 65 172 220 F 35 165 133 F 35 165 133 M 15 171 166 M 15 171 166 10

11 Reading raw data separated by commas Comma separated values file (csv file) use commas as data delimiters They may or may not enclose character values in quotes. Example: test.csv "M",50,68,155 "F",23,60,101 "M",65,72,220 "F",35,65,133 "M",15,71,166 11

12 Reading raw data separated by commas data demographics; infile 'c:\test.csv' dsd; input Gender $ Age Height Weight; run; Dsd: means delimiter sensitive data. It has several functions. (1)change the default delimiter from a blank to a comma. (2)If there are two delimiters in a row, it assumes there is a missing value in between (3)If character values are placed in quotes, the quotes are stripped from the value 12

13 INFILE statement, useful options DSD It recognizes two consecutive delimiters as a missing value. Example: 20,30,,50, SAS will treat this as 20 30 50. but with the the dsd option SAS will treat it as 20 30. 50. It allows you to include the delimiter within quoted strings. Example: a comma separated file and your data included values like "George Bush, Jr.” With the dsd option, SAS will recognize that the comma in "George Bush, Jr." is part of the name, and not a separator indicating a new variable. 13

14 INFILE statement (cont.) DLM= The dlm= option can be used to specify the delimiter that separates the variables in your raw data file. dlm=‘,’ indicates a comma is the delimiter (e.g., a comma separated file,.csv file). dlm='09'x indicates that tabs are used to separate your variables dlm=‘:’ indicates a colon is the delimiter 14

15 INFILE statement (cont.) We can use dsd and dlm at the same time: infile ‘file-description’ dsd dlm=‘:’ ; This combination of options performs all the actions requested by the DSD option, but overrides the default delimiter (comma) with a delimiter of your choice 15

16 INFILE statement (cont.) useful options continued… missover: if number of variables in file does not match number of variables in input file all remaining variables are set to missing. obs: specifies the last record to be read into the data set firstobs: specifies the first line of data to be read into data set. Useful if there is a header row in the dataset. CODE 16

17 Filename statement Filename statement identifies the file and associate it with a reference name then use this reference in your INFILE statement instead of the actual file name filename mydata 'C:\test.csv'; data demographics; infile mydata dsd; input Gender $ Age Height Weight; run; 17 CODE

18 The INFILE option can be the DATALINES statement data demographics; infile datalines dsd; input gender $ Age Height Weight; datalines; "M",50,68,155 "F",23,60,101 "M",65,72,220 "F",35,65,133 "M",15,71,166 ; run; 18 CODE

19 Reading data from fixed columns Many raw data files store specific information in fixed columns The advantage of fixed column files: (1) don’t need to worry about missing values (2) you can choose which variables to read and in what order to read them 19

20 Bank data 123456789012345678901  columns 00110/ 21/1955M 1145 00211/ 18/2001F 18722 Column 1-3 : subject ID Column 4-13: Date of birth Column 14-14: gender Column 15-21: Account balance 20

21 Column Input data financial; infile "C:\bank.txt"; input Subj $ 1-3 DOB $ 4-13 Gender $ 14 Balance 15-21; run; 21 CODE

22 Formatted Input Formatted input can read both character and standard numerical values as well as nonstandard numerical values, such as numbers with dollar signs and commas, and dates. Formatted input is the most common and powerful of all input methods SAS Formats and Informats: An informat is a specification for how raw data should be read. A format is a layout specification for how a variable should be printed or displayed. 22

23 Formatted Input The bank data: data financial; infile "C:\bank.txt"; input @1 Subj $3. @4 DOB mmddyy10. @14 Gender $1. @15 Balance 7.; run; 23

24 Formatted Input The @ sign the INPUT statement are called column pointers. @4 tells SAS to go to column 4. Following variable names are SAS informats. Informats are built-in instructions that tell SAS how to read a data value 24

25 Formatted Input Two of the most basic informats are w.d and $w. The w.d informat reads standard numeric values. The w tells SAS how many columns to read; the optional d tells SAS that there is an implied decimal point in the value. For examples: data value is 123, With informat 3.0, SAS will save it as 123; With informat 3.1, SAS will save it as 12.3; If the data value already has a decimal in it, then SAS ignores the d option. For examples: data value is 1.23, With informat 4.1, SAS will it as 1.23; 25

26 Formatted Input The $w. Informat tells SAS to read w columns of character data The MMDDYY10. informat tells SAS that the date you are reading is in the mm/dd/yyyy form. SAS reads the date and converts the value into a SAS date. SAS stores dates as numeric values equal to the number of days from January 1, 1960. Eg, if you read 01/01/1960, SAS stores a value of 0. The data 01/02/1960 is stored as a value of 1. 26 CODE

27 The format statement The format statements are built-in SAS command that allow you to display data in easily readable ways. All SAS formats command ends either in a period or in a period follows by a number. title "Listing of FINANCIAL"; proc print data=financial; format DOB mmddyy10. Balance dollar11.2; The dollar11.2 tells SAS to put a $ sign in front of the number, and allow up to 11 columns to print the balance values, the 2 tells SAS to include two decimal places after the decimal points. 27 CODE

28 Using a format/informat statement in a DATA step It is usually more useful to place your format statement with a SAS data step. There is a permanent association of the formats and variables in the data set. You can override any permanent format by placing a FORMAT statement in a particular procedure. 28 CODE

29 A informat statement with list input Following the key word informat, you list each variable and information you want to use to read each variable data list_example; informat Subj $3. Name $20. DOB mmddyy10. Salary dollar8.; infile 'c:\list.csv' dsd; input Subj Name DOB Salary; format DOB date9.; 29 CODE

30 data list_example; infile 'c:\list.csv ' dsd; input Subj : $3. Name : $20. DOB : mmddyy10. Salary : dollar8.; format DOB date9.; run; (1)there is a colon (called an informat modifier) preceding of each informat. It tells SAS to use informat supplied but to stop reading the values for this variable when a delimiter is met. (2) Without it, SAS may read past a delimiter to satisfy the width specified in the informat. 30

31 What are the Differences Informats used at input  Usually reading external data Formats used during output cycle  Write formatted value to output

32 Selected date informats InformatsData formInput dataInput statement Datew.ddmmmyyyy1Feb1961Input date Date9. ddmmmyy1Feb61Input date Date7. DDMMYYw.ddmmyy01-02-61Input date ddmmyy8. ddmmyyyy01/02/1961Input date ddmmyy10. MMDDYYw.mmddyy02-01-61Input date mmddyy8. mmddyy02-01-1961Input date mmddyy10.

33 Selected date formats formatsData displayed Input dataformat statement Results Datew.ddmmmyyyy366format date Date9. 01Jan1961 ddmmmyy366format date Date7. 01Jan61 MMDDYYw.mmddyy366format date mmddyy8. 01/01/61 mmddyyyy366format date mmddyy10. 01/01/1961

34 More examples of informat for numeric data InformatDefinitionInput data INPUT statement Results COMMAw.dRemoves embedded commas and $, converts left parentheses to minus sign $1,000Input income comma6.0; 1000 (1,234)Input income comma7.0; -1234 Percentw.Converts percent to numbers (20%)Input value Percent5.; -0.2 w.dReads standard numeric data -12.3Input value 5.1; -12.3

35 More examples of informat for character data InformatDefinitionInput dataINPUT statement Results $CHARw.Reading character data, do not trim leading or trailing blanks my catInput animal $char10.; my cat Input animal $char10.; my cat $w.Read character data, trim blanks my catInput animal $char10.; my cat Input animal $char10.; my cat CODE

36 Import/Export Data To Export SAS datasets  Go to the File menu and select “Export Data”  Choose the data file ( from the library Work)  Locate and select file type using the browse button  Save the data set and finish  Check the log to make sure the data set was created  This method does not require a data step, but any modification may require a data step  Convenient for Excel file Import a SAS data set follows similar step 36

37 37 Read data in Excel file (1) Use Import procedure File  Import data  Choose Microsoft Excel  Click “Next”  Select work sheet using browse  select “Table” you want to import  Click “Next”  Select Library and assign a file name  Click “Next” if you want Proc import SAS code generated, otherwise click “Finish”  Check the log window to make sure the data was created successfully. (2) Save Excel file as a csv file, then read in using the infile statement with dsd option.


Download ppt "EPIB 698C Lecture 2 Notes Instructor: Raul Cruz 2/14/11 1."

Similar presentations


Ads by Google