Statistics in Science
Introducing SAS® software
Acknowledgements to David Williams
Caroline Brophy

Need to know
– SAS environment
– SAS files (datasets, catalogs etc.) & libraries
– SAS programs
How to:
– Get data in
– Manipulate data
– Get results out

SAS software environment

SAS Windows (SAS 9)

Some (!) SAS windows
– Editor: where code is written or imported, and submitted
– Log: what happened, including what went wrong
– Output: results of program procedures that produce output
– Explorer: shows libraries (SAS & Windows) and their files; also where you can view data and graphs
– Results: shows how the output is made up of tables, graphs, datasets etc.
– Notepad: a useful place to keep bits of code

SAS software programs

SAS Programs

    /* DATA step creates SAS data set */
    data one;
      input x y;
      datalines;
    ;
    run;

    /* PROC steps process data in data set */
    proc print data = one (obs = 5);
    run;

    proc means data = one;
    run;

Step Boundaries
SAS steps begin with
– a DATA statement, or
– a PROC statement.
SAS detects the end of a step when it encounters
– a RUN statement (for most steps)
– a QUIT statement (for some procedures)
– the beginning of another step (DATA statement or PROC statement).
Recommendation: use RUN; at the end of each step.

Step Boundaries

    data seedwt;
      input oz $ rad wt;
      datalines;
    Low High Low
    run;

    proc print data = two;
    proc means data = seedwt;
      class oz;
      var rad wt;
    run;

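The three kinds of step boundary can be seen side by side in a short program. This is only a sketch: it uses the sashelp.class sample data set that ships with SAS, and the data set name work.teens is invented for the illustration.

```sas
/* DATA step, ended explicitly with RUN */
data work.teens;
    set sashelp.class;
    where age >= 13;
run;

/* PROC step, ended explicitly with RUN */
proc means data = work.teens;
    var height weight;
run;

/* PROC SQL executes each statement as it is submitted
   and stays active until QUIT */
proc sql;
    select sex, count(*) as n
    from work.teens
    group by sex;
quit;
```

If the RUN after PROC MEANS were left out, the following PROC SQL statement would still end the step, but the log is easier to follow when every step is closed explicitly.
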
Submitting a SAS Program
When you execute a SAS program, the output generated by SAS is divided into two major parts:
– SAS log: contains information about the processing of the SAS program, including any warning and error messages.
– SAS output: contains reports generated by SAS procedures and DATA steps.

Recommended steps!
1) Submit all (or selected) code:
   – press F4, or
   – click on the runner in the toolbar
2) Read the log
3) Look in the output window if you expect the code to produce output
4) Problems:
   – Bad syntax
   – Missing ; at end of line
   – Missing quote ' at end of title (nasty!)

Improved output - HTML
Tools > Options > Preferences > Results
Do this & resubmit code. Check HTML output in Results Window.

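HTML output can also be requested from code with ODS statements rather than through the menus. The slides do not cover this route, so treat the following as a sketch: the file name report.html is hypothetical, and the example runs on the sashelp.class sample data set.

```sas
/* Open an HTML destination; the file name is hypothetical */
ods html file = "report.html";

proc means data = sashelp.class;
    var height weight;
run;

/* Close the destination so the file is completely written */
ods html close;
```

Where the file lands depends on the current working folder of the SAS session, so a full path may be more convenient in practice.
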
SAS data sets

SAS data sets
SAS procedures (PROC …) process data from SAS data sets
Need to know (briefly!)
– What a SAS data set looks like
– How to get our data into a SAS data set

SAS data sets
– live in libraries
– have a descriptor part (with useful info)
– have a data part, which is a rectangular table of character and/or numeric data values (rows called observations)
– have names with syntax libname.datasetname; libname defaults to work if omitted

work library
SAS data sets with a single-part name like oz, wp or mybestdata99
1) are stored in the work library
2) can be referenced e.g. as mybestdata99 or work.mybestdata99
3) are deleted at end of SAS session!

Don't lose your data!
Keep the SAS program that read the data from its original source... More later!

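Besides keeping the program, a data set can be copied out of work into a permanent library by using a two-part name. A minimal sketch, assuming a hypothetical folder C:\mydata, a hypothetical libref mylib, and that work.seedwt already exists in the session:

```sas
/* Hypothetical folder: replace with a directory that exists on your machine */
libname mylib "C:\mydata";

/* Two-part name: the copy is stored in the folder behind mylib */
data mylib.seedwt;
    set work.seedwt;   /* assumed to exist already */
run;

/* Check the descriptor of the permanent copy */
proc contents data = mylib.seedwt;
run;
```

The copy in mylib survives the end of the session; the one in work does not.
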
Viewing descriptor & data

    /* view descriptor part */
    proc contents data = wp;
    run;

    /* view data part */
    proc print data = work.wp;
    run;

Alternatively, use SAS Explorer:
– Open (for data)
– Properties (for descriptor)
Properties is not as clear as CONTENTS.

SAS variables
There are two types of variables:
– character: contain any value: letters, numbers, special characters, and blanks. Character values are stored with a length of 1 to 32,767 bytes (default is 8). One byte equals one character.
– numeric: stored as floating point numbers in 8 bytes of storage by default. Eight bytes of floating point storage provide space for 16 or 17 significant digits. You are not restricted to 8 digits. Don't change the 8 byte length!

SAS variables
OUTPUT

    The CONTENTS Procedure
    Alphabetic List of Variables and Attributes

    #    Variable    Type    Len
    1    oz          Char      8
    2    rad         Num       8
    3    wt          Num       8

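When a character value needs more than the default 8 bytes, a LENGTH statement placed before INPUT sets the length. The sketch below uses an invented data set name (work.oz_long) and invented data values purely to show the effect.

```sas
data work.oz_long;
    length oz $ 12;        /* declare before INPUT so longer text is kept */
    input oz $ rad wt;
    datalines;
VeryHighDose 10 1.5
Low 20 2.5
;
run;

/* Len for oz should now be listed as 12 instead of 8 */
proc contents data = work.oz_long;
run;
```

Without the LENGTH statement, list input would store oz with length 8 and the first value would be cut off after 8 characters.
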
SAS names – for data sets & variables
– can be 32 characters long.
– can be uppercase, lowercase, or mixed-case, but are not case sensitive!
– must start with a letter or underscore. Subsequent characters can be letters, underscores, or numeric digits - no special characters or spaces.

Missing Data Values

    LastName    FirstName    JobTitle    Salary
    TORRES      JAN          Pilot
    LANGKAMM    SARAH        Mechanic
    SMITH       MICHAEL      Mechanic    .
    WAGSCHAL    NADJA        Pilot
    TOERMOEN    JOCHEN

– A value must exist for every variable for each observation.
– Missing values are valid values.
– A numeric missing value is displayed as a period.
– A character missing value is displayed as a blank.

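A numeric missing value can be typed as a period in the data lines, and PROC MEANS can count how many there are. A small sketch with invented names and values (work.staff, id, salary):

```sas
data work.staff;
    input id $ salary;
    datalines;
A 100
B .
C 300
;
run;

/* N counts non-missing values, NMISS counts missing ones */
proc means data = work.staff n nmiss mean;
    var salary;
run;
```

For these three rows the summary should report N = 2, NMISS = 1 and a mean of 200, since the missing value is left out of the calculation.
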
SAS syntax
– Not case sensitive
– Each 'line' usually begins with a keyword and ends with ;
Common errors:
– Forgetting ;
– Misspelt or wrong keyword
– Missing final quote in title

    title 'Woodpecker Habitat;     /* quote mark missing */
    title 'Woodpecker Habitat';

Comments
1. Type /* to begin a comment.
2. Type your comment text.
3. Type */ to end the comment.
To comment out selected text, remember: Ctrl+/
Alternative: * comment ;

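Both comment styles can appear in the same program. A short sketch using the sashelp.class sample data set:

```sas
/* Block comment: can span
   several lines */
proc print data = sashelp.class (obs = 5);
    * statement comment: starts with an asterisk and ends at the semicolon;
    var name age;   /* block comment at the end of a line */
run;
```

The statement style ends at the first semicolon, so the comment text itself must not contain one.
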
Creating a SAS data set

Getting data in!
Consider 2 methods
1) Data in program (briefly!)
2) Data in Excel workbook

Getting data in!
Data in program file:

    data oz;
      input oz $ rad wt;
      datalines;
    Low High Low
    ;
    run;

Note:
1. oz is a text variable so requires $
2. No missing values
3. Values of oz
   – don't contain spaces
   – are at most 8 characters long

Getting data in! from Excel
Use the IMPORT wizard, saving the program it generates to reduce future clicking!

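The program saved by the wizard is a PROC IMPORT step. The sketch below shows the general shape only: the file path, sheet name and output data set name are hypothetical, and the DBMS value the wizard writes depends on the SAS version and on which SAS/ACCESS products are installed (DBMS = XLSX is assumed here).

```sas
proc import
    datafile = "C:\mydata\seedwt.xlsx"   /* hypothetical path */
    out      = work.seedwt               /* hypothetical data set name */
    dbms     = xlsx                      /* assumed; the wizard may write another value */
    replace;
    sheet    = "Sheet1";                 /* hypothetical sheet name */
    getnames = yes;                      /* first row holds the variable names */
run;
```

Resubmitting the saved step reads the workbook again without going back through the wizard screens.
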
Creating new variables
Adding a new variable to an existing SAS data set (say work.old)
1. Use set
2. Give definition of new variable

    data new;
      /* read data from work.old */
      set old;
      y2 = y**2;              /* y squared */
      ly = log(y);            /* natural log */
      ly_base10 = log10(y);   /* log to base 10 */
      t1 = (treat = 1);       /* 1 if treat equals 1, otherwise 0 */
    run;

Data set: work.new

    Obs    treat    y    ysquared    logy    logy_base10    t1
    1      A
    2      A
    3      B
    4      B
    5      B

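A comparison in parentheses evaluates to 1 when true and 0 when false, which is how the indicator variable is built. If treat is stored as text, as the A and B values in the listing suggest, the comparison value has to be quoted. A self-contained sketch with invented y values and an invented indicator name tA:

```sas
data work.old;
    input treat $ y;
    datalines;
A 1
A 10
B 100
;
run;

data work.new;
    set work.old;
    y2        = y**2;
    ly        = log(y);           /* natural log */
    ly_base10 = log10(y);
    tA        = (treat = 'A');    /* 1 when treat is A, otherwise 0 */
run;

proc print data = work.new;
run;
```

Comparing a character variable with an unquoted number, as in (treat = 1), would not give a usable indicator for data like these.
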
Data Screening

Data Screening
Checking input data for gross errors
– Use PRINT procedure to scan for obvious anomalies
– Use MEANS procedure & examine summary table
  – MAXIMUM, MINIMUM: reasonable?
  – MEAN: near middle of range?
  – MISSING VALUES: input or calculation error, e.g. log(0)?
  – CV (= 100*std.dev/mean): 50% implies skewness for any positive variable

SAS syntax: MEANS syntax
What else should go here?

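The screening quantities listed earlier can be requested by name on the PROC MEANS statement. A sketch on the sashelp.class sample data set; with the course data, the data set, CLASS and VAR names would be replaced accordingly.

```sas
/* N, NMISS, MIN, MAX, MEAN, STD and CV are PROC MEANS statistic keywords */
proc means data = sashelp.class n nmiss min max mean std cv;
    class sex;            /* one summary row per group */
    var height weight;    /* variables to screen */
run;
```

Leaving out the keyword list gives the default statistics, which do not include NMISS or CV.
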
Dealing with data errors
– Check original records
– Change mistakes in recording where the correct value is beyond question
– Regenerate observations where possible, e.g. reweigh sample, redo chemical analysis
– With a large body of data in an unbalanced design, err on the side of omitting questionable data
– Do not proceed until data has been properly cleaned; if necessary perform a number of screening runs

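One way to omit a questionable value while leaving the source data untouched is to set it to missing in a copy and then screen again. The sketch below runs on the sashelp.class sample data set; the cut-off of 70 and the data set name work.class_clean are invented for the illustration and are not a recommended rule.

```sas
/* List candidate problem rows first; the cut-off is purely illustrative */
proc print data = sashelp.class;
    where height > 70;
run;

/* Work on a copy and omit the questionable values by setting them to missing */
data work.class_clean;
    set sashelp.class;
    if height > 70 then height = .;
run;

/* Screening run on the cleaned copy */
proc means data = work.class_clean n nmiss min max mean cv;
    var height;
run;
```

Keeping this step in the program records exactly which values were dropped and why.
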