Producing Descriptive Statistics

Slides:



Advertisements
Similar presentations
I OWA S TATE U NIVERSITY Department of Animal Science Using Basic Graphical and Statistical Procedures (Chapter in the 8 Little SAS Book) Animal Science.
Advertisements

Statistical Methods Lynne Stokes Department of Statistical Science Lecture 7: Introduction to SAS Programming Language.
Learning Objectives Copyright © 2002 South-Western/Thomson Learning Data Processing and Fundamental Data Analysis CHAPTER fourteen.
Learning Objectives 1 Copyright © 2002 South-Western/Thomson Learning Data Processing and Fundamental Data Analysis CHAPTER fourteen.
Introduction to SPSS Allen Risley Academic Technology Services, CSUSM
Quick Data Summaries in SAS Start by bringing in data –Use permanent data set for these examples Proc Tabulate –Produces summaries very quickly and easily.
15b. Accessing Data: Frequencies in SAS ®. 1 Prerequisites Recommended modules to complete before viewing this module  1. Introduction to the NLTS2 Training.
Quantifying Data.
Tutorial 5: Working with Excel Tables, PivotTables, and PivotCharts
Chapter 8 Producing Summary Reports. Section 8.1 Introduction to Summary Reports.
SAS PROC REPORT PROC TABULATE
Chapter 9 Producing Descriptive Statistics PROC MEANS; Summarize descriptive statistics for continuous numeric variables. PROC FREQ; Summarize frequency.
Introduction to SAS Essentials Mastering SAS for Data Analytics Alan Elliott and Wayne Woodward SAS ESSENTIALS -- Elliott & Woodward1.
USING SAS PROCEDURES SAS System Options OPTIONS Statement
XP 1 Excel Tables Purpose of tables – Process data in a group – Used to facilitate calculations – Used to enhance readability of output Types of tables.
Using SPSS for Windows Part II Jie Chen Ph.D. Phone: /6/20151.
18b. PROC SURVEY Procedures in SAS ®. 1 Prerequisites Recommended modules to complete before viewing this module  1. Introduction to the NLTS2 Training.
Lecture 3 Describing Data Using Numerical Measures.
What is SPSS  SPSS is a program software used for statistical analysis.  Statistical Package for Social Sciences.
1 Filling in the blanks with PROC FREQ Bill Klein Ryerson University.
T T03-01 Calculate Descriptive Statistics Purpose Allows the analyst to analyze quantitative data by summarizing it in sorted format, scattergram.
XP. Objectives Sort data and filter data Summarize an Excel table Insert subtotals into a range of data Outline buttons to show or hide details Create.
June 21, Objectives  Enable the Data Analysis Add-In  Quickly calculate descriptive statistics using the Data Analysis Add-In  Create a histogram.
Priya Ramaswami Janssen R&D US. Advantages of PROC REPORT -Very powerful -Perform lists, subsets, statistics, computations, formatting within one procedure.
Chapter 5 Reading and Manipulating SAS ® Data Sets and Creating Detailed Reports Xiaogang Su Department of Statistics University of Central Florida.
Lecture 3 Topic - Descriptive Procedures Programs 3-4 LSB 4:1-4.4; 4:9:4:11; 8:1-8:5; 5:1-5.2.
1 EPIB 698C Lecture 4 Raul Cruz-Cano Summer 2012.
Statistics 1: Introduction to Probability and Statistics Section 3-2.
Lesson 8 - Topics Creating SAS datasets from procedures Using ODS and data steps to make reports Using PROC RANK Programs in course notes LSB 4:11;5:3.
DTC Quantitative Methods Summary of some SPSS commands Weeks 1 & 2, January 2012.
Copyright © 2010 Pearson Education, Inc. Publishing as Prentice Hall2(2)-1 Chapter 2: Displaying and Summarizing Data Part 2: Descriptive Statistics.
FORMAT statements can be used to change the look of your output –if FORMAT is in the DATA step, then the formats are permanent and stored with the dataset.
Descriptive Statistics Research Writing Aiden Yeh, PhD.
SAS ® 101 Based on Learning SAS by Example: A Programmer’s Guide Chapters 16 & 17 By Tasha Chapman, Oregon Health Authority.
If sig is less than 0.05 (A) then the test is significant at 95% confidence (B) then the test is significant at 90% confidence (C) then the test is significant.
EXCEL CHAPTER 6 ANALYZING DATA STATISTICALLY. Analyzing Data Statistically Data Characteristics Histograms Cumulative Distributions Classwork: 6.1, 6.6,
SAS ® 101 Based on Learning SAS by Example: A Programmer’s Guide Chapters 3 & 4 By Tasha Chapman, Oregon Health Authority.
The rise of statistics Statistics is the science of collecting, organizing and interpreting data. The goal of statistics is to gain understanding from.
Session 1 Retrieving Data From a Single Table
Introduction to Spreadsheets
Applied Business Forecasting and Regression Analysis
Tutorial 5: Working with Excel Tables, PivotTables, and PivotCharts
Chapter 3 Describing Data Using Numerical Measures
Introduction to SPSS.
Lesson 3 Overview Descriptive Procedures Controlling SAS Output
Chapter 5: Aggregate Functions and Grouping of Data
Microsoft Office Illustrated
Lecture 2 Topics - Descriptive Procedures
Performing What-if Analysis
Chapter 3 Describing Data Using Numerical Measures
Instructor: Raul Cruz-Cano
Lesson 8 - Topics Creating SAS datasets from procedures
Tamara Arenovich Tony Panzarella
Chapter 4: Sorting, Printing, Summarizing
Chapter 2 Presenting Data in Tables and Charts
Quick Data Summaries in SAS
Contingency Tables and Association
Relations in Categorical Data
Lecture 2 Topics - Descriptive Procedures
Organizing and Visualizing Data
Statistics 1: Introduction to Probability and Statistics
Descriptive Analysis and Presentation of Bivariate Data
Introduction to SAS Essentials Mastering SAS for Data Analytics
Quantitative Data Who? Cans of cola. What? Weight (g) of contents.
A Brief Introduction to Stata(2)
Introduction to SAS Essentials Mastering SAS for Data Analytics
Excel for Educators Class Notes
Lecture 2 Topics - Descriptive Procedures
Presentation transcript:

Producing Descriptive Statistics Statistical Analysis System

Producing Descriptive Statistics Introduction Computing Statistics Using PROC MEANS Creating a Summarized Data Set Using PROC SUMMARY Producing Frequency Tables Using PROC FREQ

Introduction PROC REPORT is the ability to summarize large amounts of data by producing descriptive statistics. However, there are SAS procedures that are designed specifically to produce various types of descriptive statistics and to display them in meaningful reports If the data values that you want to describe are continuous numeric values , then you can use the MEANS procedure or the SUMMARY procedure to calculate statistics such as the mean, sum, minimum, and maximum. If the data values that you want to describe are discrete then you can use the FREQ procedure to show the distribution of these values.

Computing Statistics Using PROC MEANS(Page 1) The MEANS procedure provides descriptive statistics such as the mean, minimum, and maximum provide useful information about numeric data. Procedure Syntax PROC MEANS <DATA=SAS-data-set> <statistic-keyword(s)> <option(s)>; RUN; Where SAS-data-set is the name of the data set to be used statistic-keyword(s) specify the statistics to compute option(s) control the content, analysis, and appearance Example proc means data=perm.survey; run;

Computing Statistics Using PROC MEANS(Page 2) Selecting Statistics Consider that you want to see the median and range of Perm. Survey numeric values, add the MEDIAN and RANGE keywords as options. Example proc means data=perm.survey median range; run; The following keywords can be used with PROC MEANS to compute statistics: Keyword Description MAX Maximum value MEAN Average MODE Value that occurs most frequently MIN Minimum value VAR Variance

Computing Statistics Using PROC MEANS(Page 3) Limiting Decimal Places To limit decimal places, use the MAXDEC= option in the PROC MEANS statement, and set it equal to the length that you prefer. Syntax PROC MEANS <DATA=SAS-data-set> <statistic-keyword(s)> MAXDEC=n; where n specifies the maximum number of decimal places. Example proc means data=clinic.diabetes min max maxdec=0; run;

Computing Statistics Using PROC MEANS(Page 4) Specifying Variables in PROC MEANS To specify the variables that PROC MEANS analyzes, add a VAR statement and list the variable names. General form, VAR statement: VAR variable(s); Example proc means data=clinic.diabetes min max maxdec=0; var age height weight; run; In addition to listing variables separately, you can use a numbered range of variables var item1-item5;

Computing Statistics Using PROC MEANS(Page 5) Group Processing Using the CLASS Statement To produce separate analyses of grouped observations, add a CLASS statement to the MEANS procedure. General form, CLASS statement: CLASS variable(s); where variable(s) specifies category variables for group processing. CLASS variables can be either character or numeric, but they should contain a limited number of discrete values that represent meaningful groupings. Example proc means data=clinic.heart maxdec=1; var arterial heart cardiac urinary; class survive sex; run;

Computing Statistics Using PROC MEANS(Page 6) Group Processing Using the BY Statement Like the CLASS statement, the BY statement specifies variables to use for categorizing observations. General form, BY statement: BY variable(s); Difference between BY and CLASS Unlike CLASS processing, BY processing requires that your data already be sorted or indexed in the order of the BY variables. BY group results have a layout that is different from the layout of CLASS group results. Note that the BY statement in the program below creates four small tables; a CLASS statement would produce a single large table.

Computing Statistics Using PROC MEANS(Page 7) Example for BY statement. proc sort data=clinic.heart out=work.heartsort; by survive sex; run; proc means data=work.heartsort maxdec=1; var arterial heart cardiac urinary;

Computing Statistics Using PROC MEANS(Page 8) Creating a Summarized Data Set Using PROC MEANS You might want to create an output SAS data set that contains just the summarized variable. General form, OUTPUT statement: OUTPUT OUT=SAS-data-set statistic=variable(s); where OUT= specifies the name of the output data set statistic= specifies the summary statistic written out

Computing Statistics Using PROC MEANS(Page 9) Specifying the STATISTIC= Option You can specify which statistics to produce in the output data set. To do so, you must specify the statistic and then list all of the variables. The variables must be listed in the same order as in the VAR statement. You can specify more than one statistic in the OUTPUT statement. proc means data=clinic.diabetes; var age height weight; class sex; output out=work.sum_gender mean=AvgAge AvgHeight AvgWeight min=MinAge MinHeight MinWeight; run; To see the contents of the output data set, submit the following PROC PRINT step. PROC MEANS in SAS LIST PROC MEANS OUTPUT TO SAS DATASET

Computing Statistics Using PROC MEANS(Page 10) Creating only the output data set You can use the NOPRINT option in the PROC MEANS statement to prevent the default report from being created. For example, the following program creates only the output data set: Example proc means data=clinic.diabetes noprint; var age height weight; class sex; output out=work.sum_gender mean=AvgAge AvgHeight AvgWeight; run;

Creating a Summarized Data Set Using PROC SUMMARY You can also create a summarized output data set by using PROC SUMMARY. The difference between the two procedures is that PROC MEANS produces a report by default. By contrast, to produce a report in PROC SUMMARY, you must include a PRINT option in the PROC SUMMARY statement. Example proc summary data=clinic.diabetes print; var age height weight; class sex; output out=work.sum_gender mean=AvgAge AvgHeight AvgWeight; run;

Producing Frequency Tables Using PROC FREQ (Page 1) The FREQ procedure is a descriptive procedure as well as a statistical procedure. It produces one-way and n-way frequency tables,. You can use the FREQ procedure to create cross-tabulation tables that summarize data for two or more categorical variables by showing the number of observations for each combination of variable values. General form, basic FREQ procedure: PROC FREQ <DATA=SAS-data-set>; RUN; By default, PROC FREQ creates a one-way table with the frequency, percent, cumulative frequency, and cumulative percent of every value of all variables in a data set.

Producing Frequency Tables Using PROC FREQ (Page 2) Example For example, the following FREQ procedure creates a frequency table for each variable in the data set Parts. Widgets. All the unique values are shown for ItemName, LotSize, and Region. proc freq data=parts.widgets; run;

Producing Frequency Tables Using PROC FREQ (Page 3) Specifying Variables in PROC FREQ By default, the FREQ procedure creates frequency tables for every variable in your data set. To specify the variables to be processed by the FREQ procedure, include a TABLES statement. Syntax TABLES variable(s); where variable(s) lists the variables to include.

Producing Frequency Tables Using PROC FREQ (Page 4) Example Consider the SAS data set Finance.Loans. The variables Rate and Months are best described as categorical values, so they are the best choices for frequency tables. proc freq data=finance.loans; tables rate months; run;

Producing Frequency Tables Using PROC FREQ (Page 5) Creating Two-Way Tables It is often helpful to crosstabulate frequencies with the values of other variables. For example, census data is typically crosstabulated with a variable that represents geographical regions. Syntax TABLES variable-1 *variable-2 <* ... variable-n>; where (for two-way tables) variable-1 specifies table rows and variable-2 specifies table columns. When crosstabulations are specified, PROC FREQ produces tables with cells that contain column cell frequency cell percentage of total frequency cell percentage of row frequency cell percentage of frequency.

Producing Frequency Tables Using PROC FREQ (Page 6) For example, the following program creates the two-way table shown below. proc format; value wtfmt low-139='< 140' 140-180='140-180‘ 181-high='> 180'; value htfmt low-64='< 5''5"' 65-70='5''5-10"' 71-high='> 5''10"'; run; proc freq data=clinic.diabetes; tables weight*height; format weight wtfmt. height htfmt.; Tables LEGEND BOX

Producing Frequency Tables Using PROC FREQ (Page 7) Creating N-Way Tables For a frequency analysis of more than two variables, use PROC FREQ to create n-way crosstabulations. A series of two-way tables is produced, with a table for each level of the other variables. For example, suppose you want to add the variable Sex to your crosstabulation of Weight and Height in the data set Clinic.Diabetes. Add Sex to the TABLES statement, joined to the other variables with an asterisk (*). Example tables sex*weight*height; The order of the variables is important. In n-way tables, the last two variables of the TABLES statement become the two-way rows and columns.

Producing Frequency Tables Using PROC FREQ (Page 8) Example proc format; value wtfmt low-139='< 140‘ 140-180='140-180‘ 181-high='> 180'; value htfmt low-64='< 5''5"' 65-70='5''5-10"‘ 71-high='> 5''10"'; run; proc freq data=clinic.diabetes; tables sex*weight*height; format weight wtfmt. height htfmt.;

Producing Frequency Tables Using PROC FREQ (Page 9) Changing the Table Format CROSSLIST option to your TABLES statement displays cross-tabulation tables proc format; value wtfmt low-139='< 140‘ 140-180='140-180‘ 181-high='> 180'; value htfmt low-64='< 5''5"' 65-70='5''5-10"‘ 71-high='> 5''10"'; run; proc freq data=clinic.diabetes; tables sex*weight*height/crosslist; format weight wtfmt. height htfmt.;

Producing Frequency Tables Using PROC FREQ (Page 10) Creating Tables When three or more variables are specified, the multiple levels of n-way tables can produce considerable output. Such bulky, often complex crosstabulations are often easier to read as a continuous list in List Format Syntax TABLES variable-1 *variable-2 <* … variable-n> / LIST; Example proc format; value wtfmt low-139='< 140' ….; run; proc freq data=clinic.diabetes; tables sex*weight*height / list; format weight wtfmt. height htfmt.;

Producing Frequency Tables Using PROC FREQ (Page 11) Suppressing Table Information NOFREQ suppresses cell frequencies NOPERCENT suppresses cell percentages NOROW supresses row percentages NOCOL suppresses column percentages. Example proc freq data=clinic.diabetes; tables sex*weight / nofreq norow nocol; format weight wtfmt.; run;