Chapter 2: Referencing Files and Setting Options
SAS Libraries
Every SAS file is stored in a SAS library. A SAS data set is one type of SAS file. In some operating environments, a library is a physical collection of files. In others, such as Windows and UNIX, a library is a logical name for a group of files stored in one physical location. A library can be temporary or permanent. A SAS library must be defined before a SAS program can reach the directory to read or write a SAS data set. The SAS program only needs to know the library reference name (libref).
[Diagram: a library name (libref) maps to the path of the physical location on the hard drive.]
Referencing a SAS File in a SAS Library
A SAS file name has two levels: libref.filename
libref is the SAS library name that is connected to a physical directory in a storage location on your computer. filename is a file stored in the directory that the libref refers to.
Two Types of SAS Library
(A) Temporary SAS library, for hosting temporary SAS data sets: the libref is always WORK, which is already available in the Libraries folder in the Explorer panel of the SAS working environment.
Example: WORK.admit is a temporary SAS data set. WORK is the libref and admit is the data set name.
NOTE: All SAS data sets stored in the WORK library are deleted when you end the SAS session.
NOTE: You can omit WORK and refer to the data set simply as admit if it is stored in the temporary WORK library. For example, in the DATA step, DATA admit2; is the same as DATA work.admit2;
(B) Permanent SAS library: data sets in a permanent SAS library are not deleted when the SAS session ends, because the files are stored physically on the hard drive at the location you define. The libref is defined by the user. For example, Mylib.admit refers to the SAS data set admit stored in the library named Mylib. Mylib is a user-defined SAS library; admit is a file stored in the corresponding physical location on the hard drive.
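The one-level and two-level WORK names described under (A) can be checked with a small sketch like the one below. The variables id and age and the two data lines are made up for illustration; they are not part of the course data.

```sas
/* One-level name: SAS stores the data set in the temporary WORK library */
data admit2;
   input id $ age;   /* hypothetical variables, for illustration only */
   datalines;
A01 27
A02 34
;
run;

/* The two-level name refers to the same temporary data set */
proc print data=work.admit2;
run;
```

Both names point to the same data set, which is deleted when the SAS session ends.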
How to Assign a SAS Library
If you want to use the WORK library to store your file, there is no need to define it: SAS creates it when you start a session. If you want to create your own library, there are two ways:
(1) By the pull-down menu, as described in the SAS Window Environment document.
(2) By using a LIBNAME statement to define a SAS library:
LIBNAME libref 'path to the physical folder on the hard drive';
NOTE: libref is a logical name for the entire folder on the hard drive. The folder can hold many data sets. Each data set in the folder is referred to as libref.datasetname
Example: you store the data sets admit, budget, and tuition in the folder UNIVERSITY on the C drive. You define a SAS library COST linked to these files by:
LIBNAME cost 'C:\university';
The data sets are then named in your SAS program as cost.admit, cost.budget, and cost.tuition.
NOTE: The names can be in upper or lower case.
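As a minimal sketch of the COST example, assuming the folder C:\university exists and holds the admit and budget data sets, the libref can be assigned and used like this. The name budget_copy is hypothetical.

```sas
/* Assign the libref COST to the folder (assumed to exist) */
libname cost 'C:\university';

/* Read a permanent data set through its two-level name */
proc print data=cost.admit;
run;

/* Copy a permanent data set into the temporary WORK library */
data budget_copy;   /* hypothetical name, stored as work.budget_copy */
   set cost.budget;
run;
```

Note that the path is enclosed in plain straight quotes; curly quotes pasted from slides or word processors will cause a syntax error.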
Rules for a Valid SAS Library Name
A libref:
(1) is limited to 8 characters
(2) must start with a letter or underscore
(3) can contain only letters, numbers, or underscores
Example: s575, _s575, and s575_ are valid librefs. S-575 (contains a hyphen) and sta575_online (longer than 8 characters) are not valid.
How Long a Libref Remains in Effect
The LIBNAME statement is a global statement. A global statement remains in effect until you modify it, cancel it, or end your SAS session. Although we say the library is permanent, this means that your data sets in the SAS library (in physical storage) are permanent, not the libref. You still need to assign a libref to each permanent library in every SAS session in order to access its data sets.
NOTE: If you use the pull-down menu to create your permanent library and check 'Enable at Startup', the libref will be available when you start SAS, without a LIBNAME statement.
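A short sketch of modifying and cancelling a libref within one session, reusing the COST example and assuming the folder C:\university exists:

```sas
libname cost 'C:\university';   /* assign the libref */
libname cost list;              /* write the libref's attributes to the log */
libname cost clear;             /* cancel the libref; the files stay on disk */
libname cost 'C:\university';   /* assign it again, as you would in a new session */
```

Clearing the libref only removes the logical name. The data sets in the folder are untouched and become reachable again as soon as a libref is reassigned.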
Referencing Files in Other Formats
You can use the LIBNAME statement to reference not only SAS files but also files created by other software products, such as database management systems. SAS uses an engine designed to connect to each specific software product.
[Diagram: files from non-SAS software pass through an engine to a SAS data library.]
LIBNAME libref engine 'path to the physical location';
Some available engines are BMDP, SPSS, and OSIRIS, which allow read-only access to BMDP, SPSS, and OSIRIS files. See the Help documentation for more details, if needed.
Where to Find the Library and the Contents of the Library and of Each Data Set
Once the library is created, it appears in the folder called 'Libraries' in the left panel (Explorer panel) of the SAS working interface. To see the contents of a SAS data set, double-click the data set to open it in the VIEWTABLE window. Close the VIEWTABLE window afterwards. You can also use SAS statements to view the contents of a SAS library and the detailed data descriptor information of any SAS data set.
Exercise
Write a SAS program to read the following SAS data set, located on the class website: Pilots.sas7bdat
This data set consists of pilots employed at an airline. The variables are:
Variable    Type   Length   Description
ID          char   4        ID number
LastName    char   10       last name
FirstName   char   9        first name
City        char   12       city
State       char   2        state
Gender      char   1        gender
JobCode     char   3        job code
Salary      num    8        current salary
Birth       num    8        birth date
Hired       num    8        date hired
HomePhone   char   12       home phone number
In this program, do the following tasks:
(1) Create a SAS library, mylib, that connects to the folder in which the Pilots data set is stored.
(2) Read the SAS data set Pilots.
(3) Create a new SAS data set called Pilotsnew, and store it in another SAS library called mylib1 that connects to the folder DataEx inside the Math707 folder.
(4) Print the data.
Save the SAS program as C2_readSASData on your C drive in a new folder, SASEx, inside Math707.
Answer to Exercise
libname mylib 'c:\math707\sasdata';
libname mylib1 'c:\math707\dataex';
data mylib1.pilotsnew;
   set mylib.pilots;
run;
proc print data=mylib1.pilotsnew;
run;
View the Contents of an Entire Library and/or the Data Descriptor of a Data Set
In practice, a SAS library often consists of many data sets shared by different users. Therefore, it is good practice to find out what the library contains. SAS has two procedures that display the contents of a library as well as of each SAS data set:
PROC CONTENTS ; RUN;
PROC DATASETS ; CONTENTS ; QUIT;
View the Contents of the Entire Library without Data Descriptors
/* To display all SAS data sets in the mylib library */
proc contents data=mylib._all_ nods;
run;
/* Or use the following procedure */
proc datasets;
   contents data=mylib._all_ nods;
quit;
NOTE: _ALL_ is a SAS keyword that refers to all files in the mylib library.
NOTE: NODS is a keyword meaning no data descriptor details.
NOTE: Text inside /* */ is a comment.
View the Detailed Data Descriptor Information of a Data Set
/* View the data descriptor information for the SAS data set admit */
proc contents data=mylib.admit;
run;
/* One can also use the following procedure */
proc datasets;
   contents data=mylib.admit;
quit;
NOTE: The variables are listed in alphabetical order by default.
View the Detailed Data Descriptor Information of a Data Set in Table Column Order
You can list the variables in the order in which they were created in the SAS data set by using the VARNUM option:
proc contents data=mylib.admit varnum;
run;
Or:
proc datasets;
   contents data=mylib.admit varnum;
quit;
Exercise
Open the SAS program C2_readSASData, and use PROC CONTENTS as well as PROC DATASETS to:
(1) View only the list of SAS data sets in the mylib library.
(2) View the detailed data descriptor for the SAS data set pilots in mylib.
(3) View the detailed data descriptor for the SAS data set pilots with the variables in table column order.
(4) Save the SAS program as C2_Contents in your SASEx folder.
Answer to Exercise
/* Use PROC CONTENTS to display all SAS data sets in mylib */
libname mylib 'c:\math707\sasdata';
proc contents data=mylib._all_ nods;
run;
/* Use PROC DATASETS to display all SAS data sets in mylib */
proc datasets;
   contents data=mylib._all_ nods;
quit;
/* Display details of the SAS data set pilots with variables in alphabetical order */
proc contents data=mylib.pilots;
run;
proc datasets;
   contents data=mylib.pilots;
quit;
/* Display details of the SAS data set pilots with variables in table column order */
proc contents data=mylib.pilots varnum;
run;
proc datasets;
   contents data=mylib.pilots varnum;
quit;
Setting SAS System Options
SAS system options can be set from the pull-down menu (Tools, Options, System) or with a SAS statement.
NOTE: You can set system options for SAS listing output that control the line size, the page size, the page number, whether the date and time are displayed, and many others. These options do not affect the HTML output format.
Setting System Options
The general syntax: OPTIONS options;
Some useful options are:
DATE | NODATE: print the date and time or not. The default is DATE.
NUMBER | NONUMBER: print page numbers or not. The default is NUMBER, and page numbers accumulate until they are reset.
PAGENO=n: by default, page numbers accumulate. Use PAGENO=n to reset the starting page number. For example, PAGENO=3 restarts the numbering at page 3 and continues from that point on.
PAGESIZE=n | MAX: the number of lines per page of output.
LINESIZE=n | MAX: the width of an output line. If an observation needs more than one line, it continues on the next line.
NOTE: The OPTIONS statement is a global statement. It can appear anywhere in your program and changes the settings from that point on.
NOTE: It is good practice to place OPTIONS statements outside DATA and PROC steps.
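A minimal sketch of these options in use, assuming the mylib library and the admit data set from the earlier exercises. The specific values (80 columns, 60 lines) are arbitrary choices for illustration.

```sas
libname mylib 'c:\math707\sasdata';   /* folder used in the earlier exercises */

/* Suppress the date and page number, and fix the listing page shape */
options nodate nonumber linesize=80 pagesize=60;

proc print data=mylib.admit;
run;

/* Turn the date and page number back on and restart numbering at page 1 */
options date number pageno=1;

proc print data=mylib.admit;
run;
```

Each OPTIONS statement sits between steps, so it is clear which procedure output it affects.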
Exercise
Open the C2_Contents program and practice the following SAS system options using the OPTIONS statement.
Delete all PROC DATASETS procedures.
Add an OPTIONS statement at the end of this program with the following options:
Change the date option to NODATE
Set PAGENO to start at 1 for the output
Set PAGESIZE to 50
Set LINESIZE to 80
Use PROC CONTENTS to see the descriptor of the admit data set in mylib.
Use PROC PRINT to print the admit data set. Check the results to see the effects of these options.
Add another OPTIONS statement that changes the options back to DATE, PAGESIZE=MAX, and LINESIZE=MAX, then use PROC PRINT to print the PILOTS data set in mylib. Check the results to see the effect of the options.
Save the program as C2_SYSOptions in your SASEx folder.
Answer to Exercise
libname mylib 'c:\math707\sasdata';
proc contents data=mylib._all_ nods;
run;
proc contents data=mylib.admit;
run;
options nodate pageno=1 pagesize=50 linesize=80;
proc print data=mylib.admit;
run;
options date pagesize=max linesize=max;
proc print data=mylib.pilots;
run;
Handling Two-Digit Years Using the System OPTIONS Statement
Many data sets use two-digit years, such as 94 for 1994. There is no confusion in reading 94 as 1994 now, but year 10 can be 1910 or 2010. This is the Year 2000 compliance problem. SAS uses
OPTIONS YEARCUTOFF=year;
to control this issue. The option specifies the first year of the 100-year span used to interpret two-digit years. The default is YEARCUTOFF=1920, which interprets two-digit years within the 100-year span from 1920 to 2019. OPTIONS YEARCUTOFF=1940; interprets two-digit years within the span from 1940 to 2039.
How Does YEARCUTOFF Work?
OPTIONS YEARCUTOFF=1940; interprets two-digit years within the 100-year span from 1940 to 2039.
Date in the data set   Interpreted as
8/26/15                8/26/2015
12/25/65               12/25/1965
5/7/90                 5/7/1990
8/30/48                8/30/1948
OPTIONS YEARCUTOFF=1960;
Date in the data set   Interpreted as
8/26/15
12/25/65
5/7/90
8/30/48
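The effect of the cutoff can be checked with a small sketch like the following, which reads the same two-digit date under two different settings. It uses a DATA _NULL_ step and the MMDDYY informat and format, which are choices made for this illustration rather than something taken from the slides.

```sas
/* Span 1940-2039: the two-digit year 48 falls at 1948 */
options yearcutoff=1940;
data _null_;
   d = input('08/30/48', mmddyy8.);
   put 'YEARCUTOFF=1940: ' d mmddyy10.;   /* writes 08/30/1948 to the log */
run;

/* Span 1960-2059: the same two-digit year now falls at 2048 */
options yearcutoff=1960;
data _null_;
   d = input('08/30/48', mmddyy8.);
   put 'YEARCUTOFF=1960: ' d mmddyy10.;   /* writes 08/30/2048 to the log */
run;
```

The option only matters at the moment a two-digit year is read. Dates already stored as SAS date values, or entered with four-digit years, are not affected.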
Specifying the Observations of a SAS Data Set to Be Processed Using the OPTIONS Statement
In many applications the number of observations (cases) is very large, and it is important to make sure that a SAS program is correct before processing the entire data set. To test whether the program processes the data correctly, you can specify that only a small part of the data be processed. This can be done with the OPTIONS statement:
OPTIONS FIRSTOBS=n1 OBS=n2;
FIRSTOBS=n1 starts reading the data at the n1th observation. OBS=n2 stops reading the data at the n2th observation.
Example: OPTIONS FIRSTOBS=5 OBS=15; reads from the 5th observation through the 15th observation.
The defaults are FIRSTOBS=1 and OBS=MAX. To go back to reading the entire data set, use OPTIONS FIRSTOBS=1 OBS=MAX;
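A minimal sketch of testing on a subset and then restoring the defaults, assuming the mylib library and the admit data set from the earlier exercises:

```sas
libname mylib 'c:\math707\sasdata';   /* folder used in the earlier exercises */

/* Process only observations 5 through 15 while testing */
options firstobs=5 obs=15;

proc print data=mylib.admit;
run;

/* Restore the defaults so that later steps read every observation */
options firstobs=1 obs=max;
```

Because these are global options, forgetting the final reset silently truncates every data set read later in the session.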
Global Statements vs. Local Statements
SAS defines some statements as global statements, such as the LIBNAME statement and the OPTIONS statement. They take effect as soon as they are defined and stay in effect until they are overridden by a later statement in the same SAS session. Most SAS statements are local, meaning they take effect only in the step where they appear. If the same setting is defined both globally and locally, the local setting overrides the global one for that step, and the global setting applies again afterwards.
Exercise
Write a program to:
(1) Read and print the SAS data set Admit using the following options: PAGENO=1, FIRSTOBS=5, and OBS=15.
(2) Add another OPTIONS statement to the program with the options FIRSTOBS=3 and OBS=8, and print the data set Admit again. Observe the output and make sure you understand why you get it.
(3) Reset the options with PAGENO=1, FIRSTOBS=1, and OBS=MAX, then print the Admit data.
(4) Save the program as C2_sysoptions2 in the SASEx folder.
Answer
libname mylib 'c:\math707\sasdata';
options pageno=1 firstobs=5 obs=15;
data admitn;
   set mylib.admit;
run;
proc print data=admitn;
run;
proc print data=mylib.admit;
run;
options firstobs=3 obs=8;
proc print data=admitn;
run;
proc print data=mylib.admit;
run;
options firstobs=1 obs=max pageno=1;
proc print data=admitn;
run;
proc print data=mylib.admit;
run;
FIRSTOBS= and OBS= as Local Options in a PROC PRINT Procedure
PROC PRINT is the most common procedure for printing data. The general syntax is:
PROC PRINT ; RUN;
The following example uses local options in PROC PRINT to specify observations:
PROC PRINT data=mylib.admit (FIRSTOBS=5 OBS=15); RUN;
This prints the 5th through the 15th observations.
More on Local Options vs. Global Options in PROC PRINT
options firstobs=10 obs=18;
/* Uses the global options, since there are no local options */
proc print data=mylib.admit;
   title 'prints cases 10 to 18';
run;
/* Uses the local option FIRSTOBS=15 and the global option OBS=18 */
proc print data=mylib.admit (firstobs=15);
   title 'prints cases 15 to 18';
run;
/* Uses the local options FIRSTOBS=12 and OBS=16, since local options override global options for the specific procedure */
proc print data=mylib.admit (firstobs=12 obs=16);
   title 'prints cases 12 to 16';
run;
/* Uses the local options FIRSTOBS=5 and OBS=20, since local options override global options for the specific procedure */
proc print data=mylib.admit (firstobs=5 obs=20);
   title 'prints cases 5 to 20';
run;
More System Options
See the SAS Help documentation and the textbook for additional options.
Exercise
Write a SAS program to do the following:
(1) Create the library mylib to connect to the SASData folder as usual.
(2) Use the options PAGENO=1, FIRSTOBS=5, and OBS=15.
(3) Print the data set admit in mylib.
(4) Print the data set admit using the local options (FIRSTOBS=3 OBS=12) in the PROC PRINT statement.
(5) Add a system OPTIONS statement with FIRSTOBS=1 and OBS=15.
(6) Print the data set admit using the local options (FIRSTOBS=10 OBS=20) in the PROC PRINT statement.
(7) Add a system OPTIONS statement with FIRSTOBS=1 and OBS=MAX.
(8) Print the data set admit using the local options (FIRSTOBS=3 OBS=12) in the PROC PRINT statement.
Save the program as c2_glob_loc_options in the SASEx folder.
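One possible sketch of a solution, not the official answer, assuming the same c:\math707\sasdata folder as in the earlier answers. The comments state which observations each step requests; what actually prints also depends on how many observations admit contains.

```sas
/* (1) Library for the SASData folder */
libname mylib 'c:\math707\sasdata';

/* (2) Global options */
options pageno=1 firstobs=5 obs=15;

/* (3) No local options: the global settings apply (observations 5 to 15) */
proc print data=mylib.admit;
run;

/* (4) Local options override the global ones (observations 3 to 12) */
proc print data=mylib.admit (firstobs=3 obs=12);
run;

/* (5) New global options */
options firstobs=1 obs=15;

/* (6) Local options still win over the new global ones (observations 10 to 20) */
proc print data=mylib.admit (firstobs=10 obs=20);
run;

/* (7) Back to the defaults */
options firstobs=1 obs=max;

/* (8) Local options (observations 3 to 12) */
proc print data=mylib.admit (firstobs=3 obs=12);
run;
```

Steps (4) and (8) request the same observations even though the global settings differ, which is the point of the exercise: a local data set option overrides the global option for that step only.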