Chapter 3: Reading and Processing Data
3.1 Processing SAS Data Sets
3.2 Processing External Files


3 Objectives
- Use SAS file I/O functions to manipulate SAS data sets.
- Retrieve metadata.

4 Managing SAS Data Sets
The Orion Star programmers need macros to perform the following data management tasks:
1. Test the existence of a data set.
2. Determine the number of observations in a data set.
3. Determine the age of a data set.
4. Archive a data set.
5. Create a data set for every worksheet in an Excel workbook.
They decided to use the SAS file I/O functions and metadata to accomplish these tasks.

5 Using Functions to Manipulate Files
SAS supports several ways to manipulate and obtain information about SAS files and other files. Many of these techniques require a DATA step or PROC step in the SAS code. Some functions, generally used in the DATA step and SCL, permit direct access to files. When used with the macro facility, these functions enable the same direct access without introducing additional program steps.
The functions fall into two groups:
- SAS file I/O functions
- external file functions

6 SAS File I/O Functions
Functions to access a SAS data set: EXIST, OPEN, CLOSE
Functions to access data set descriptor information: DSNAME, VARNUM, ATTRC, ATTRN
Functions to access data library information: LIBREF, PATHNAME

7 Task 1: Determine Data Set Existence
Use the EXIST function to test for the existence of a data set before progressing further into a macro program.
    %macro printds(dset);
       %if %sysfunc(exist(&dset))=0 %then %do;
          %put ERROR: Data set &dset does not exist.;
          %put ERROR- Macro will terminate now.;
          %return;
       %end;
       proc print data=&dset (obs=10) noobs;
          title "First 10 Observations from &dset";
       run;
    %mend printds;
m203d01

8 Task 1: Determine Data Set Existence
Partial SAS Log
    %printds(orion.daily_sales)
    NOTE: There were 10 observations read from the data set ORION.DAILY_SALES.
    NOTE: PROCEDURE PRINT used:
          real time 0.01 seconds
          cpu time 0.00 seconds
    %printds(orion.daily)
    ERROR: Data set orion.daily does not exist.
           Macro will terminate now.
m203d01

9 Task 2: Obtain Attribute Information
The Orion Star programmers found that a data set often exists but is empty. They want to verify that a data set is not empty before performing further processing. The following steps provide data set attribute information:
1. Open the data set using the OPEN function.
2. Retrieve a numeric attribute using the ATTRN function.
3. Retrieve a character attribute using the ATTRC function.
4. Close the data set using the CLOSE function.

10 Step 1: Open the SAS Data Set
The OPEN function opens a SAS data set and returns a unique numeric data set identifier. The identifier, a nonzero positive number, is used in most other SAS file I/O functions. The OPEN function returns 0 if the data set cannot be opened.
General form of the OPEN function:
    OPEN(data-set-name)
Partial SAS Log
    %let dsid=%sysfunc(open(orion.daily_sales));
    %put dsid=&dsid;
    dsid=1

11 Step 2: Use the ATTRN Function
The ATTRN function returns the value of a numeric attribute of a data set.
General form of the ATTRN function:
    ATTRN(data-set-identifier, attribute-name)
Selected attribute-name values and descriptions:
    CRDTE     creation date (SAS datetime value)
    MODTE     last modified date (SAS datetime value)
    NVARS     number of variables
    ISINDEX   whether the data set is indexed (0 or 1)
    NLOBS     number of non-deleted observations

12 Step 3: Use the ATTRC Function
The ATTRC function returns the value of a character attribute of a data set.
General form of the ATTRC function:
    ATTRC(data-set-identifier, attribute-name)
Selected attribute-name values and descriptions:
    SORTEDBY  BY variables (if the data set is sorted)
    LABEL     data set label
    MEM       data set name
    LIB       current libref for the data set

13 Step 4: Close the SAS Data Set
The CLOSE function closes a SAS data set. It returns 0 if the operation was successful and a nonzero value if it was not. It is important to close all SAS data sets as soon as they are no longer needed by the application.
General form of the CLOSE function:
    CLOSE(data-set-identifier)
Partial SAS Log
    %let dsidc=%sysfunc(close(&dsid));
    %put dsidc=&dsidc;
    dsidc=0
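The four steps (OPEN, ATTRN, ATTRC, CLOSE) can be combined in one macro. The following is a minimal sketch, not part of the course programs: the macro name DSINFO is hypothetical, and the guard for a zero identifier with SYSMSG is an added assumption about how a failed OPEN might be reported.

```sas
%macro dsinfo(dsn);   /* DSINFO is a hypothetical name used for illustration */
   %local dsid nobs nvars label rc;

   /* Step 1: open the data set; OPEN returns 0 on failure */
   %let dsid=%sysfunc(open(&dsn));
   %if &dsid=0 %then %do;
      %put ERROR: Could not open &dsn..;
      /* SYSMSG returns the text of the most recent file I/O message */
      %put ERROR- %sysfunc(sysmsg());
      %return;
   %end;

   /* Step 2: numeric attributes */
   %let nobs=%sysfunc(attrn(&dsid,nlobs));
   %let nvars=%sysfunc(attrn(&dsid,nvars));

   /* Step 3: character attribute (quoted in case the label has special characters) */
   %let label=%qsysfunc(attrc(&dsid,label));

   /* Step 4: close the data set as soon as the attributes are retrieved */
   %let rc=%sysfunc(close(&dsid));

   %put NOTE: &dsn has &nobs observation(s) and &nvars variable(s).;
   %put NOTE- Label: &label;
%mend dsinfo;

%dsinfo(orion.daily_sales)
```

Checking the identifier before calling ATTRN or ATTRC avoids passing 0 to those functions when the data set is missing or locked.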

14 Obtaining Number of Observations
Use the NLOBS attribute to obtain the number of observations in a data set and assign this value to a macro variable.
    %macro numobs(dsn);
       %local dsid nobs dsidc;
       %let dsn=%upcase(&dsn);
       %let dsid=%sysfunc(open(&dsn));
       %let nobs=%sysfunc(attrn(&dsid,nlobs));
       %let dsidc=%sysfunc(close(&dsid));
       %if &nobs=0 %then %do;
          %put ERROR: &dsn contains 0 Observations.;
          %put ERROR- PROC PRINT will not execute.;
          %return;
       %end;
       proc print data=&dsn (obs=10) noobs;
          title "First 10 Observations";
          title2 "&dsn Contains &nobs Observations";
       run;
    %mend numobs;
m203d02

15 Obtaining Number of Observations
Partial SAS Log
    %numobs(orion.daily_sales)
    NOTE: There were 10 observations read from the data set ORION.DAILY_SALES.
    NOTE: PROCEDURE PRINT used (Total process time):
          real time 0.00 seconds
          cpu time 0.00 seconds
    %numobs(orion.no_rows)
    ERROR: ORION.NO_ROWS contains 0 Observations.
           PROC PRINT will not execute.
m203d02

16 Obtaining Number of Observations
PROC PRINT Output
    First 10 Observations
    ORION.DAILY_SALES Contains 58 Observations
    Product_ID     Product_Name                                   Total_Retail_Price
    220200200024   Pro Fit Gel Gt 2030 Women's Running Shoes      $178.50
    220200100092   Big Guy Men's Air Terra Sebec Shoes            $83.00
    240200100043   Bretagne Performance Tg Men's Golf Shoes L.    $282.40
    220100700024   Armadillo Road Dmx Women's Running Shoes       $99.70
    220200300157   Hardcore Men's Street Shoes Large              $220.20
    240200100051   Bretagne Stabilites 2000 Goretex Shoes         $420.90
    220200100035   Big Guy Men's Air Deschutz Viii Shoes          $125.20
    220200100090   Big Guy Men's Air Terra Reach Shoes            $177.20
    220200200018   Lulu Men's Street Shoes                        $132.80
    240200100052   Bretagne Stabilities Tg Men's Golf Shoes       $99.70
m203d02

18 3.01 Quiz
1. Open the program m203a01.
2. Add the syntax to create the macro variable SORTED that contains the SORTEDBY= attribute for the data set orion.staff.
What is the value of &SORTED?
    %let dsn=orion.staff;
    %let openrc=%sysfunc(open(&dsn));
    %let sorted= ;
    %let closerc=%sysfunc(close(&openrc));
    %put Data set &dsn is sorted by &sorted..;
m203a01

19 3.01 Quiz – Correct Answer
1. Open the program m203a01.
2. Add the syntax to create the macro variable SORTED that contains the SORTEDBY= attribute for the data set orion.staff.
What is the value of &SORTED? Employee_ID
    %let dsn=orion.staff;
    %let openrc=%sysfunc(open(&dsn));
    %let sorted=%sysfunc(attrc(&openrc,sortedby));
    %let closerc=%sysfunc(close(&openrc));
    %put Data set &dsn is sorted by &sorted..;
m203a01

20 Task 3: Determine the Age of a SAS Data Set
The Orion Star programmers need a way to determine when to refresh a data set. They decided to use the CRDTE attribute to calculate the age of a data set.
    %macro age(dsn);
       %local dsid crdate dsidc days;
       %let dsid=%sysfunc(open(&dsn));
       %let crdate=%sysfunc(attrn(&dsid,crdte));
       %let dsidc=%sysfunc(close(&dsid));
       %let days=%sysevalf("&sysdate9"d - %sysfunc(datepart(&crdate)));
       %if &days > 0 %then %do;
          %put WARNING: &dsn is &days day(s) old. It is being recreated.;
          data &dsn;
             infile 'orders03.dat';
             input Order_ID Order_Type Order_Date : date9.;
             format Order_Date date9.;
          run;
       %end;
       %else %put NOTE: &dsn is current.;
    %mend age;
m203d03
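The age calculation above relies on SYSDATE9, which holds the date on which the SAS session or job started rather than the current date. For a session that stays open across midnight, a variation using the TODAY function may be preferable. The following is a sketch under that assumption; the macro name AGEDAYS is hypothetical and only reports the age without recreating the data set.

```sas
%macro agedays(dsn);   /* AGEDAYS is a hypothetical name used for illustration */
   %local dsid crdate rc days;

   %let dsid=%sysfunc(open(&dsn));
   %if &dsid=0 %then %do;
      %put ERROR: Could not open &dsn..;
      %return;
   %end;

   /* CRDTE is a SAS datetime value; DATEPART converts it to a SAS date */
   %let crdate=%sysfunc(attrn(&dsid,crdte));
   %let rc=%sysfunc(close(&dsid));

   /* TODAY() is evaluated when the macro runs, not when the session started */
   %let days=%sysevalf(%sysfunc(today()) - %sysfunc(datepart(&crdate)));

   %put NOTE: &dsn was created &days day(s) ago.;
%mend agedays;

%agedays(orion.orders03)
```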

21 Task 3: Determine the Age of a SAS Data Set
Partial SAS Log
    %age(orion.orders03)
    WARNING: orion.orders03 is 1 day(s) old. It is being recreated.
    NOTE: The infile 'orders03.dat' is:
          Filename=C:\workshop\orders03.dat,
          RECFM=V,LRECL=256,File Size (bytes)=2496,
          Last Modified=31Jan2008:18:09:56,
          Create Time=16Jun2008:17:09:05
    NOTE: 104 records were read from the infile 'orders03.dat'.
          The minimum record length was 22.
          The maximum record length was 22.
    NOTE: The data set ORION.ORDERS03 has 104 observations and 3 variables.
    NOTE: DATA statement used (Total process time):
          real time 0.14 seconds
          cpu time 0.06 seconds
    %age(orion.orders03)
    NOTE: orion.orders03 is current.
m203d03

22 Task 4: Archive a SAS Data Set
Because many of Orion Star’s macro applications refresh SAS data sets, the programmers want to archive the current data set before it is refreshed. They decided to append today’s date to the end of the data set name, using the RENAME and TODAY functions.
General form of the RENAME function:
    RENAME(old-name, new-name)
General form of the TODAY function:
    TODAY( )

23 Task 4: Archive a SAS Data Set
    %let newname=daily_sales_%sysfunc(today(), date9.);
    %let rc=%sysfunc(rename(orion.daily_sales, &newname));
    proc contents data=orion._all_ nods;
    run;
Partial PROC CONTENTS Output
    #  Name                    Member Type  File Size  Last Modified
    1  COUNTRY                 DATA         17408      01Jul08:23:11:48
       COUNTRY                 INDEX        17408      01Jul08:23:11:48
    2  CUSTOMER                DATA         33792      30Jul08:22:28:42
    3  CUSTOMER_DIM            DATA         33792      14Dec07:09:05:44
    4  CUSTOMER_TYPE           DATA         17408      30Jul08:01:29:54
       CUSTOMER_TYPE           INDEX        9216       30Jul08:01:29:54
    5  DAILY_SALES_07OCT2008   DATA         9216       21Aug08:14:18:18
    6  ORDER_FACT              DATA         66560      10Jul08:19:45:26
    7  SALES                   DATA         25600      27Jul08:21:40:55
m203d04
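The archive step can be wrapped in a macro that checks the RENAME return code. The following is a sketch, not the course program: the macro name ARCHIVE and its two parameters are hypothetical, and the existence checks are an added assumption about what a defensive version might include. The new name must still fit within the 32-character limit for SAS data set names.

```sas
%macro archive(lib, ds);   /* ARCHIVE, LIB, and DS are hypothetical names */
   %local newname rc;

   /* Nothing to archive if the source data set is missing */
   %if %sysfunc(exist(&lib..&ds))=0 %then %do;
      %put ERROR: Data set &lib..&ds does not exist.;
      %return;
   %end;

   /* Append today's date, for example DAILY_SALES_07OCT2008 */
   %let newname=&ds._%sysfunc(today(), date9.);

   /* Do not attempt the rename if today's archive already exists */
   %if %sysfunc(exist(&lib..&newname)) %then %do;
      %put WARNING: &lib..&newname already exists. Nothing was renamed.;
      %return;
   %end;

   /* RENAME returns 0 when the operation succeeds */
   %let rc=%sysfunc(rename(&lib..&ds, &newname));
   %if &rc ne 0 %then %put ERROR: Rename failed: %sysfunc(sysmsg());
   %else %put NOTE: &lib..&ds was renamed to &lib..&newname..;
%mend archive;

%archive(orion, daily_sales)
```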

24 Task 5: Create Data Sets from Worksheets
The Orion Star programmers need a macro to import every worksheet in a given Excel workbook.
[Diagram: the %READXLS macro reads the worksheets Australia$ and United States$ from Sales.xls and creates the data sets Australia and UnitedStates.]

25 Task 5: Create Data Sets from Worksheets
The programmers will use SAS session metadata that is available via PROC SQL DICTIONARY tables or Sashelp views. The metadata includes information on the following:
- SAS files
- external files
- macro variables
- system options, titles, and footnotes

26 Task 5: Create Data Sets from Worksheets
The macro will incorporate these elements:
- SAS/ACCESS LIBNAME statement
- sashelp.vtable
- an iterative %DO loop
- indirect macro variable references

27 Reading Excel Files Using the LIBNAME Statement
The SAS/ACCESS LIBNAME statement extends the LIBNAME statement to support assigning a library reference name (libref) to Microsoft Excel workbooks. This enables you to reference worksheets directly in a DATA step or SAS procedure. Each worksheet in the Excel workbook is treated as though it were a SAS data set.
    libname xlsdata 's:\workshop\c3\sales.xls';
    proc contents data=xlsdata._all_;
    run;
m203d05

28 Reading Excel Files Using the LIBNAME Statement
Partial PROC CONTENTS Output
    The CONTENTS Procedure
    Directory
    Libref          XLSDATA
    Engine          EXCEL
    Physical Name   sales.xls
    User            Admin
    #  Name            Member Type  DBMS Member Type
    1  Australia$      DATA         TABLE
    2  UnitedStates$   DATA         TABLE

29 Reading Excel Files Using the LIBNAME Statement
All worksheets are referenced with a SAS two-level name, that is, libref.data-set-name. If the worksheet name contains special characters, such as the trailing $, you must use the SAS name literal construct "name"n (or 'name'n).
    data australia;
       set xlsdata.'Australia$'n;
    run;
m203d05

30 Using SAS Session Metadata
Use sashelp.vtable to create a series of macro variables that contain the member names.
    data _null_;
       set sashelp.vtable end=last;
       where libname="XLSDATA";
       call symputx(cats('sheet', _n_), memname);
       if last then call symputx('n', _n_);
    run;
Partial SAS Log
    %put _user_;
    GLOBAL SHEET1 Australia$
    GLOBAL SHEET2 UnitedStates$
    GLOBAL N 2
m203d05
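The same macro variables can also be built from the PROC SQL DICTIONARY tables mentioned earlier in this section. The following is a sketch of that alternative, not part of the course programs; it assumes the XLSDATA libref is already assigned, and the upper bound SHEET999 is an arbitrary choice that only needs to exceed the number of worksheets.

```sas
proc sql noprint;
   /* One macro variable per member: SHEET1, SHEET2, ...          */
   /* Only as many variables are created as there are rows found. */
   select memname
      into :sheet1-:sheet999
      from dictionary.tables
      where libname='XLSDATA';
quit;

/* SQLOBS holds the number of rows returned by the last SELECT */
%let n=&sqlobs;

%put _user_;
```

The DICTIONARY table is read only inside PROC SQL, whereas sashelp.vtable can be read by any step, which is why the DATA step version uses the view.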

32 3.02 Quiz
Open the program m203a02 and replace the question marks in the SYMPUTX routine so that it creates macro variables containing the names of all of the data sets in the ORION library.
    data _null_;
       set sashelp.vtable;
       where libname='ORION';
       call symputx(cats('dsn', _N_), ??????????);
    run;
    %put _user_;
m203a02

33 3.02 Quiz – Correct Answer
Open the program m203a02 and replace the question marks in the SYMPUTX routine so that it creates macro variables containing the names of all of the data sets in the ORION library.
    data _null_;
       set sashelp.vtable end=last;
       where libname='ORION';
       call symputx(cats('dsn', _N_), memname);
    run;
m203a02

34 Iterative %DO Loops (Review)
The iterative %DO statement executes a section of a macro repetitively, based on the value of an index variable.
General form of the iterative %DO statement:
    %DO index-variable=start %TO stop;
       text
    %END;
Example:
    %macro putloop;
       %do i=1 %to &n;
          %put Sheet&i is &&sheet&i;
       %end;
    %mend putloop;
m203d05

35 Indirect Macro Variable References (Review)
The indirect reference causes a second scan of the macro variable reference.
    Reference     After 1st scan   After 2nd scan
    &&sheet&i     &sheet1          Australia$
Partial Symbol Table
    Variable   Value
    I          1
    SHEET1     Australia$
    SHEET2     UnitedStates$

37 3.03 Quiz
How would you use indirect references to refer to the macro variables created in m203a02 so that you can use them in the following %DO loop?
    %do i=1 %to &n;
       %put The values of the macro variables are __________ ;
    %end;

38 3.03 Quiz – Correct Answer
How would you use indirect references to refer to the macro variables created in m203a02 so that you can use them in the following %DO loop?
    %do i=1 %to &n;
       %put The names of the macro variables are &&dsn&i ;
    %end;

39 Processing a Data Library
Use a %DO loop to generate a DATA step and a PROC PRINT step for every worksheet in an Excel workbook.
    %macro readxls(workbook);
       libname xlsdata "&workbook";
       /* Store each worksheet name in SHEET1-SHEETn and the count in N */
       data _null_;
          set sashelp.vtable end=last;
          where libname="XLSDATA";
          call symputx(cats('sheet', _n_), memname, 'L');
          if last then call symputx('n', _n_, 'L');
       run;
       %do i=1 %to &n;
          /* Drop the trailing $ from the worksheet name */
          %let len=%eval(%length(&&sheet&i)-1);
          %let dsn=%substr(&&sheet&i,1,&len);
          data work.&dsn;
             set xlsdata."&&sheet&i"n;
          run;
          proc print data=work.&dsn;
          run;
       %end;
       libname xlsdata clear;
    %mend readxls;
    %readxls(sales.xls)
m203d06

40 Processing a Data Library
In the READXLS macro, the %LENGTH function returns the number of characters in &&SHEET&I.
    %let len=%eval(%length(&&sheet&i)-1);
m203d06

41 Processing a Data Library
In the READXLS macro, the %EVAL function subtracts 1 from that length to create the macro variable LEN, which is the length of the worksheet name without the trailing $.
    %let len=%eval(%length(&&sheet&i)-1);
m203d06

42 Processing a Data Library
In the READXLS macro, the %SUBSTR function extracts the first &LEN characters of the worksheet name, beginning at position 1, and the result is stored in the macro variable DSN.
    %let dsn=%substr(&&sheet&i,1,&len);
m203d06

43 Processing a Data Library
Partial SAS Log
    %readxls(sales.xls)
    NOTE: There were 63 observations read from the data set XLSDATA.'Australia$'n.
    NOTE: The data set WORK.AUSTRALIA has 63 observations and 9 variables.
    NOTE: DATA statement used (Total process time):
          real time 0.01 seconds
          cpu time 0.01 seconds
    NOTE: There were 63 observations read from the data set WORK.AUSTRALIA.
    NOTE: PROCEDURE PRINT used (Total process time):
          real time 0.00 seconds
          cpu time 0.00 seconds
    NOTE: There were 102 observations read from the data set XLSDATA.'UnitedStates$'n.
    NOTE: The data set WORK.UNITEDSTATES has 102 observations and 9 variables.
    NOTE: DATA statement used (Total process time):
          real time 0.01 seconds
          cpu time 0.01 seconds
    NOTE: There were 102 observations read from the data set WORK.UNITEDSTATES.
    NOTE: PROCEDURE PRINT used (Total process time):
          real time 0.01 seconds
          cpu time 0.01 seconds
m203d06

44 Exercise
This exercise reinforces the concepts discussed previously.

45 Chapter 3: Reading and Processing Data
3.1 Processing SAS Data Sets
3.2 Processing External Files

46 Objectives
- Use external file functions to examine files that are not SAS files.

47 Processing External Files
The Orion Star programmers want to reduce redundant code when reading multiple external files into SAS data sets. The applications should be able to process the files in a given directory and subdirectory in order to accomplish the following tasks:
1. Process all DAT files.
2. Import all CSV files.
3. Read every worksheet in all of the Excel workbooks.
They decided to use the external file functions to accomplish these three tasks.

48 External File Functions
Functions to access a directory: DOPEN, DNUM, DREAD, DCLOSE
Functions to access an external file: FILEEXIST and FEXIST, FILENAME, FOPEN, FCLOSE
Functions to read from or write to an external file: FREAD, FGET, FPUT and FWRITE

49 Processing External Files
The DOPEN, DNUM, and DREAD functions enable access to all the external files found in a given directory. %SYSFUNC is required to execute these functions within the macro facility. Use these steps for processing files from a directory:
1. Use the FILENAME function to assign a fileref to the directory.
2. Use the DOPEN function to open the directory.
3. Use the DNUM function to identify how many members are in the directory.
4. Use the DREAD function to extract each member name.
5. Process the external files.
6. Use the DCLOSE function to close the directory.

50 Steps 1 and 2: Access a Directory
For applications to extract information about a directory and its contents, it is necessary to first open the directory using the DOPEN function. If it is successful, the function returns a directory identifier.
    %macro direxist(dir);
       %local fileref rc did didc;
       %let rc=%sysfunc(filename(fileref,&dir));
       %let did=%sysfunc(dopen(&fileref));
       %if &did=0 %then %do;
          %put ERROR: Directory does not exist;
          %return;
       %end;
       %put NOTE: Directory ID is &did ;
       %let didc=%sysfunc(dclose(&did));
       %let rc=%sysfunc(filename(fileref));
    %mend direxist;
    %direxist(s:\workshop)
m203d07

51 Steps 1 and 2: Access a Directory
Partial SAS Log
    %direxist(s:\workshop)
    NOTE: Directory ID is 1
    %direxist(s:\bad directory)
    ERROR: Directory does not exist
m203d07

52 Steps 3 and 4: Identify Members in a Directory
To extract a list of member names, use the DNUM and DREAD functions.
    %macro dirlist(dir);
       %local fileref rc did dnum dmem memname didc;
       %let rc=%sysfunc(filename(fileref,&dir));
       %let did=%sysfunc(dopen(&fileref));
       %if &did=0 %then %do;
          %put ERROR: Directory does not exist;
          %return;
       %end;
       %let dnum=%sysfunc(dnum(&did));
       %do dmem=1 %to &dnum;
          %let memname=%sysfunc(dread(&did,&dmem));
          %put &memname;
       %end;
       %let didc=%sysfunc(dclose(&did));
       %let rc=%sysfunc(filename(fileref));
    %mend dirlist;
    %dirlist(s:\workshop)
m203d08
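The directory listing can be narrowed to members with one extension, which is the building block for the tasks that follow. The following is a sketch rather than one of the course programs: the macro name LISTEXT and its EXT parameter are hypothetical, the macro quoting functions are an added precaution for unusual file names, and the fileref is also cleared on the error path, which the original macro does not do.

```sas
%macro listext(dir, ext);   /* LISTEXT and EXT are hypothetical names */
   %local fileref rc did dnum dmem memname;

   /* Steps 1 and 2: assign a fileref to the directory and open it */
   %let rc=%sysfunc(filename(fileref,&dir));
   %let did=%sysfunc(dopen(&fileref));
   %if &did=0 %then %do;
      %put ERROR: Directory does not exist;
      %let rc=%sysfunc(filename(fileref));
      %return;
   %end;

   /* Steps 3 and 4: loop over the members and read each name */
   %let dnum=%sysfunc(dnum(&did));
   %do dmem=1 %to &dnum;
      %let memname=%qsysfunc(dread(&did,&dmem));
      /* The last period-delimited word is treated as the extension; */
      /* the comparison ignores case.                                */
      %if %qupcase(%qscan(&memname,-1,.))=%qupcase(&ext) %then %put &memname;
   %end;

   /* Step 6: close the directory and clear the fileref */
   %let rc=%sysfunc(dclose(&did));
   %let rc=%sysfunc(filename(fileref));
%mend listext;

%listext(s:\workshop,dat)
```

A member with no period in its name is compared by its whole name, so a subdirectory that happens to be called dat would also be listed.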

54 3.04 Quiz
Open the program m203d08, submit it, and investigate the log.
1. Are the extensions of the raw data files in uppercase or lowercase?
2. Are the extensions of the Excel workbooks in uppercase or lowercase?

55 3.04 Quiz – Correct Answer
Open the program m203d08, submit it, and investigate the log.
1. Are the extensions of the raw data files in uppercase or lowercase?
   The extension is dat, in lowercase.
2. Are the extensions of the Excel workbooks in uppercase or lowercase?
   The extension is xls, in lowercase.

56 Steps 3 and 4: Identify Members in a Directory
Partial SAS Log
    %dirlist(s:\workshop)
    age.sas
    attrc.sas
    attrn.sas
    between.sas
    C2
    C3
    C4
    C5
    charlist.sas
    club_members.sas7bdat
    country.sas7bdat
    country_lookup.sas7bdat
    customer.sas7bdat
    customer_dim.sas7bdat
    customer_type.sas7bdat
    daily_sales.sas7bdat
    daily_sales.xls
    delsql.sas
    delvars.sas
m203d08

57 Task 1: Reading All DAT Files in a Directory
This demonstration illustrates reading each raw data file in a directory into a SAS data set.
m203d09

58 Task 2: Read All Excel Workbooks
The Orion Star programmers want a single macro to import all Excel files found in a given directory.
[Diagram: the %READXLS macro reads the workbooks order_type.xls, customertype.xls, daily_sales.xls, OrderFact.xls, and Sales.xls and creates the data sets order_type, customertype, sales, orderfact, and daily_sales.]

59 Task 2: Read All Excel Workbooks
Currently the READXLS macro accepts a single workbook name as a parameter. The programmers want to enhance the macro to read all workbooks in a directory.
Partial SAS Code
    %macro readxls(dir);
       %local fileref rc did dnum dmem memname len dsn didc;
       %let rc=%sysfunc(filename(fileref,&dir));
       %let did=%sysfunc(dopen(&fileref));
       %if &did=0 %then %do;
          %put ERROR: Directory does not exist;
          %return;
       %end;
       %let dnum=%sysfunc(dnum(&did));
       %do dmem=1 %to &dnum;
          %let memname=%sysfunc(dread(&did,&dmem));
          %if %upcase(%scan(&memname,-1,.))=XLS %then %do;
m203d10

60 Task 2: Reading All Excel Files in a Directory
This demonstration illustrates reading all of the worksheets in all workbooks in a directory into SAS data sets.
m203d10

61 Task 3: Read All Excel Files in Subdirectories
To implement subdirectory recursion, use the %SCAN function to extract the second word of the member name, where the period is the delimiter.
- If the second word is XLS, then read the Excel spreadsheet.
- If the second word resolves to null, there is no extension, so the first word identifies a subdirectory. Therefore, call the macro again.
Partial SAS Code
    %else %if %scan(&memname,2,.)= %then %readxls(&dir\&memname);
m203d11

62 Task 3: Reading All Excel Files in Subdirectories
This demonstration illustrates reading all of the worksheets in all workbooks in a directory and its subdirectories into SAS data sets.
m203d11

63 Exercise
This exercise reinforces the concepts discussed previously.

