Presentation is loading. Please wait.

Presentation is loading. Please wait.

Chapter 4: Sorting, Printing, Summarizing

Similar presentations


Presentation on theme: "Chapter 4: Sorting, Printing, Summarizing"— Presentation transcript:

1 Chapter 4: Sorting, Printing, Summarizing
PROC statements have required statements and optional statements The DATA statement is optional; it specifies which SAS data set to use. The default is to use the most recently created data set. PROC .. DATA=..; We skim PROC MEANS and PROC FREQ; we may return to PROC TABULATE and PROC REPORT at the end of class. © Fall 2011 John Grego and the University of South Carolina

2 BY Variable BY varname.. is a common option
It tells SAS to do analyses separately for each value of the specified variable In PROC SORT, the BY statement is required You must sort the data before using the BY option in other procedures Run Artists.sas

3 TITLE and FOOTNOTE TITLE (and FOOTNOTE) print text at the top (and bottom, respectively) of the output pages Use double quotes when a single quote appears in the TITLE statement TITLE ‘Some Statistical Output’; TITLE2 “This is the Stat Dept’s Output”; FOOTNOTE ‘See page 3 of SAS manual’; TITLE should become a habit

4 Null TITLE The null statement TITLE; cancels all previous titles (same with footnotes) I prefer annotated source code for documentation, but TITLE is good for documenting the action of individual PROC paragraphs It’s a good practice to follow every TITLE “titlename” command with TITLE;

5 LABEL Statement The LABEL option allows you to improve output variable names. In the DATA step, a label is a part of the data set. In the PROC step, it is only in effect for that procedure

6 Labels in PROC PRINT It must be accompanied by a separate statement in the PROC step itself PROC PRINT DATA=dataset LABEL; LABEL var1=“var1 label”;

7 Subsetting with WHERE We’ve seen how to extract subsets of data in the data step using IF statements You can have procedures analyze only a subset of the dating using WHERE..logical condition; A new SAS data set is not created WHERE works with any SAS procedure Note BETWEEN/AND operator. WHERE is useful, though it took a while to become part of my regular toolbox. It’s more efficient than a subsetting IF, though IF is more versatile

8 Sorting Data PROC SORT creates a sorted data set that puts observations in order according to the values of one or more variables Also think of it as organizing the data in groups, rather than sorting data. It’s a necessary “evil” before processing data group by group Run first part of meandemo.sas

9 Saving sorted data The sorted data set is stored with a name specified in the OUT option If OUT is not specified, the original data set is overwritten with the sorted data set The required statement BY..; tells SAS which variable(s) to use in sorting the data

10 Sorting on multiple variables
With one BY variable, the data are sorted by the values of that variable With more than one BY variable, observations are sorted by the first variable, then the second variable within the first, etc

11 Sorting options By default, this sorts in ascending order.
To sort in descending order, enter the keyword DESCENDING before the appropriate BY variable. Missing values are considered “low” You can eliminate duplicate observations in the BY variables with the NODUPKEY option. This can be useful for procedures that have difficulty with “ties” Run EdistoArima.sas, Lidar.sas

12 Sorting Character Data
Dissimilar character data (numeric, case) can be sorted in ASCII, EBCDIC and linguistic order using SORTSEQ= Numeric characters can be ordered as numbers (1,2,10,20) rather than characters (1,10,2,20) using (NUMERIC_COLLATION=ON) in linguistic sorting. ASS-KEY, EB-SEE-DIK. Linguistic order removes case—I didn’t need (STRENGTH=PRIMARY). Run char_order.sas

13 Printing Data with PROC PRINT
PROC PRINT prints data sets to the output window. We’ve already seen PROC PRINT many times, but it has some useful options NOOBS suppresses printing of observation numbers LABEL prints labels (if defined) instead of variable names These appear in the PROC PRINT statement Covered already in artists.sas

14 PROC PRINT options Various optional statements can follow the initial line BY varname prints data separately for each value of a specified variable—data need to be sorted by that variable first ID var1 var2 puts var1, var2, etc at left of page instead of observation numbers

15 More PROC PRINT options
(OBS=n) appears as an argument in the DATA option for the main PROC PRINT statement. It prints the first n variables in a data set. Superceded somewhat by the ability to inspect a data sheet, but still good for detecting anomalies in variable names VAR var1 var2 var3.. tells SAS which variables to print and in what order (by default, all variables are printed) (OBS=n) is really useful, though the syntax is hard to remember. Plot17lidar.txt is comma-delimited

16 FORMAT Statement You can specify the format in which your data is printed with a FORMAT statement in PROC PRINT We’ve seen several formats for dates. Monetary values can be written with dollar signs: DOLLAR8. DOLLAR8.2 Run pact.sas We’ve seen some of this already with date.sas

17 FORMAT statement options
Scientific notation: E8. Other formats are given on pp Used in the DATA step. FORMAT sets the format permanently Used in a PROC step, FORMAT only works for that procedure

18 PUT statement Use of the PUT statement is similar to PROC PRINT, but it is used for writing data to a raw data file With PROC FORMAT, you can create your own formats—good for printing coded data in a way that’s easy to read. Saw PUT already in date.sas. Show part of Assoc.sas (PROC FREQ code). Class Exercise 8.

19 PROC FORMAT PROC FORMAT is particularly useful for output from procedures that rely heavily on nominal or ordinal data (e.g., PROC FREQ). SPSS-X is the gold standard for labeling formats. Specify the name of your created format after the keyword VALUE Run freqgroup.sas to see how to “convert” a numerical range to factor levels

20 Simple Custom Data sets with FILE and PUT
FILE is the reverse of INFILE: it prints to an external file PUT (opposite of INPUT) specifies exactly what is printed to that file PUT _PAGE_; skips to the next page after each report FILE ‘filename’ PRINT; PUT .. ..; Covered in Class Exercise 8. I’ve done this already in date.sas. You also have the option of exporting to Excel instead. The PRINT option allows you to include carriage returns and page breaks.

21 PUT options Spacing options for PUT are the same as those for INPUT
@n—moves to position n +n—skips n spaces / -- skips to next line #n—skips to nth line @--holds current line

22 Printing data to the Log window
PUT can put variable values and text in specified locations. The format is specified the same way as with the FORMAT statement If no FILE statement is given, the report is printed to the Log window

23 Summary Statistics with PROC MEANS
PROC MEANS can give a variety of summary statistics for each variable. By default, SAS computes n, mean, stddev, min, max You can specify others: median, nmiss, range, sum, etc (see list on page 120) Show meandemo.sas (includes sort)

24 PROC MEANS options Optional statements
BY calculates summary statistics for each level of the BY variable—data must be sorted first CLASS is similar to BY, but the data does not need to be sorted VAR tells SAS specifically for which variables to calculate summary statistics (the default is to use all numeric variables The CLASS statement is under-used.

25 Outputting Summary Statistics
You can write the summary statistics to another SAS data set using an OUTPUT statement This can be a first step in merging the summary data set with the original data set OUTPUT OUT=OUTA MEAN=..; OUTPUT OUT=OUTA MEAN(varlist)= new_var1 new_var2…;

26 Counting Data with PROC FREQ
PROC FREQ provides one-way, two-way (or more) frequency tables for data with counts and percentages Generally used with categorical variables PROC FREQ DATA=..; TABLES var1; TABLES var1*var2; TABLES var1*var2*var3;

27 PROC FREQ options Options specified after slash in TABLES statement (see page 124 for options) Use MISSING to include missing values as a category in your table TABLES var1*var2 / MISSING; Finish assoc.sas. Run freqgroup.sas (convert continuous data to factors)

28 PROC TABULATE PROC TABULATE is similar to PROC FREQ, but produces cleaner-looking output CLASS specifies classification variables TABLES specifies desired type of table PROC TABULATE; CLASS ..; TABLE ..; Like Pivot Table in Excel. Run simple.sas (ignore error message). Don Edwards borrowed from text, but tweaked margins and format. Work is needed to clean up row and column headers.

29 PROC REPORT PROC REPORT has even more functionality than PROC TABULATE
Exploring either of these procedures would make a useful class project


Download ppt "Chapter 4: Sorting, Printing, Summarizing"

Similar presentations


Ads by Google