Public Health 5415 Biostatistical Methods II, Spring 2005. Greg Grandits, 612-626-9033. Class times: Monday 10:10am-12:05pm, Wednesday 10:10am-11:00am.


1 Public Health 5415 Biostatistical Methods II
Spring 2005
Greg Grandits grand001@.umn.edu 612-626-9033
Class Times
Monday 10:10am-12:05pm
Wednesday 10:10am-11:00am

2 Course objectives:
Write and run simple SAS programs to perform common analyses.
Analyze health science data using basic statistical and inferential techniques.
Understand statistical methods as commonly presented in public health literature.

3 Topics Covered
Linear regression
Logistic regression
Life-table analyses
Cox regression
Relative risk, odds ratio, hazard ratio estimation
SAS programming to do the above analyses

4 SAS Usage
SAS is the world's largest privately held software company
40,000 customer sites worldwide
3.5 million users worldwide
90% of Fortune 500 companies use SAS
Many analyses in published medical research use SAS
SAS invests heavily in R&D

5 Why SAS?
It is widely used
–Industry, government, and academia
It is very powerful
–programming language
–sophisticated analyses (better than Excel)

6 JAMA January 12, 2005
Meat Consumption and Risk of Colorectal Cancer, Chao
Colon and rectal cancer incidence rate ratios (RRs) and 95% CIs by meat intake were estimated using Cox proportional hazards regression modeling. P values for linear trend were estimated by modeling meat intake (g/wk) using the median value within quintiles; these results were similar when modeled as continuous variables. All P values were 2-sided and considered significant at P<.05. All analyses were conducted using SAS version 9.0 (SAS Institute Inc, Cary, NC).
Consumption of Veg/Fruits and Risk of Breast Cancer
All analyses were performed using SAS version 8 (SAS Institute Inc, Cary, NC). All tests were 2-sided with an α of .05.

7 JAMA January 12, 2005
Fasting Serum Glucose Level and Cancer Risk in Korean Men and Women
Age-adjusted death and cancer incidence rates were calculated for each category of fasting serum glucose level and directly standardized to the age distribution of the 1995 Korean national population. All analyses were stratified by sex. All analyses were conducted using SAS statistical software, version 8.0 (SAS Institute Inc, Cary, NC).

8 Details
http://www.biostat.umn.edu/~greg-g/ph5415.html
–Homework, readings, programs, data files
–Class slides
Lab/Office hours
–4 hours per week (TA or instructor)

9 Details
Textbooks:
Applied Statistics and the SAS Programming Language, RP Cody and JK Smith (Read Chapter 1 for next week)
Introductory Biostatistics, CT Le
The Little SAS Book, LD Delwiche and SJ Slaughter (Chapter 1 available on website)

10 Grading
Homework - 30% (half credit for late homework; it can be turned in no later than 2 weeks after the due date)
Two tests - 30% each
Short project - 10%
No final exam

11 Using SAS
SAS is available several ways:
In the Mayo A-269 (TRC) lab
Other PCs with SAS
From biostatistics UNIX computer via telnet
Purchase from the University
–152 Shepherd Labs (ADCS)
–612-625-1300
–$150 per year

12 What is SAS?
SAS is a programming language that reads, processes, and performs statistical analyses of data.
A SAS program is made up of programming statements, which SAS interprets to carry out these functions.

13 Raw Data
Data Step
–Read in Data
–Process Data (Create new variables)
–Output Data (Create SAS Dataset)
PROCs
–Analyze Data Using Statistical Procedures

14 Structure of Data
Made up of rows and columns
Rows in SAS are called observations
Columns in SAS are called variables
An observation is all the information for one entity (patient, patient visit, clinical center, county)
SAS processes data one observation at a time
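To make "one observation at a time" concrete, here is a minimal sketch. The dataset name visits and the variables patient and sbp are made up for illustration and are not from the course materials. The automatic variable _N_ counts how many times the DATA step has started over, and PUT writes a line to the log on each pass.

```sas
* Sketch only. The names visits, patient and sbp are hypothetical;
DATA visits;
   INPUT patient $ sbp;
   * The statements in the step run once for every data line;
   * _N_ is an automatic counter of passes through the DATA step;
   PUT 'Processing observation ' _N_ ' for patient ' patient;
DATALINES;
A01 120
A02 135
A03 128
;
RUN;
```

With three data lines, the PUT statement should run three times, so the log should contain one message per observation.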

15 Example of Data
12 observations and 5 variables

Gender  Age  Marital status  Number of credits  State of residence
F       23   S               15                 MN
F       21   S               15                 WI
F       22   S               09                 MN
F       35   M               02                 MN
F       22   M               13                 MN
F       25   S               13                 WI
M       20   S               13                 MN
M       26   M               15                 WI
M       27   S               05                 MN
M       23   S               14                 IA
M       21   S               14                 MN
M       29   M               15                 MN

16
* This is a short example program to demonstrate what a
  SAS program looks like. This is a comment statement because
  it begins with a * and ends with a semi-colon ;

DATA demo;
   INPUT gender $ age marstat $ credits state $ ;

   if credits > 12 then fulltime = 'Y'; else fulltime = 'N';
   if state = 'MN' then resid = 'Y'; else resid = 'N';
DATALINES;
F 23 S 15 MN
F 21 S 15 WI
F 22 S 09 MN
F 35 M 02 MN
F 22 M 13 MN
F 25 S 13 WI
M 20 S 13 MN
M 26 M 15 WI
M 27 S 05 MN
M 23 S 14 IA
M 21 S 14 MN
M 29 M 15 MN
;
RUN;

TITLE 'Running the Example Program';
PROC PRINT DATA=DEMO ;
   VAR gender age marstat credits fulltime state ;
RUN;

17 Rules for SAS Statements and Variables
SAS statements end with a semicolon (;)
SAS statements can be entered in lower or uppercase
Multiple SAS statements can appear on one line
A SAS statement can use multiple lines
Variable names can be from 1-32 characters and begin with A-Z or an underscore (_)
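The rules above can be seen together in one small sketch. The dataset rules_demo and the variables _id and height_cm are hypothetical names chosen for illustration. The DATA step puts several statements on one line in lowercase, while the PROC PRINT step spreads single statements over several lines in uppercase.

```sas
* Sketch only. The names rules_demo, _id and height_cm are hypothetical;
* Three statements on one line, typed in lowercase;
data Rules_Demo; input _id height_cm; datalines;
1 170
2 182
;
run;

* Each statement may span several lines and ends at its semicolon;
* Case does not matter, so rules_demo is the same dataset as Rules_Demo;
PROC PRINT
   DATA = rules_demo ;
   VAR _id
       height_cm ;
RUN;
```

Writing one statement per line, as in the course example program, is usually easier to read. The compact form here is only meant to show what SAS accepts.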

18
1 DATA demo;                                            Create a SAS dataset called demo
2 INPUT gender $ age marstat $ credits state $ ;        What are the variables
3 if credits > 12 then fulltime = 'Y'; else fulltime = 'N';
4 if state = 'MN' then resid = 'Y'; else resid = 'N';
Statements 3 and 4 create 2 new variables

19
5 DATALINES;        Tells SAS the data is coming
F 23 S 15 MN
F 21 S 15 WI
F 22 S 09 MN
F 35 M 02 MN
F 22 M 13 MN
F 25 S 13 WI
M 20 S 13 MN
M 26 M 15 WI
M 27 S 05 MN
M 23 S 14 IA
M 21 S 14 MN
M 29 M 15 MN
;                   Tells SAS the data is ending
6 RUN;              Tells SAS to run the statements

20 Types of Data
Numeric (e.g. age, blood pressure)
Character (e.g. patient name, ID, diagnosis)
Each type is treated differently by SAS
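A minimal sketch of how the two types are declared and checked, assuming a made-up dataset types_demo with variables id, age and sbp (none of these names come from the course files). In an INPUT statement, a $ after a variable name marks it as character; variables without a $ are read as numeric. PROC CONTENTS then reports the type of each variable.

```sas
* Sketch only. The names types_demo, id, age, sbp and age_next are hypothetical;
DATA types_demo;
   * id is character because of the $ and the other two are numeric;
   INPUT id $ age sbp;
   * Arithmetic is meant for numeric variables;
   age_next = age + 1;
DATALINES;
P001 23 118
P002 35 131
;
RUN;

* Lists each variable in the dataset along with its type;
PROC CONTENTS DATA=types_demo;
RUN;
```

Note that an ID is stored as character here even though it could contain digits, since no arithmetic would be done on it.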

21
TITLE 'Running the Example Program';
PROC PRINT DATA=demo ;
   VAR gender age marstat credits fulltime state ;
RUN;

* You can run additional procedures;
PROC MEANS DATA=demo ;
   VAR age credits ;
RUN;

PROC FREQ DATA=demo ;
   TABLES gender ;
RUN;

22 Files Generated When SAS Program is Submitted
Log file – a text file listing the program statements processed, with notes, warnings, and errors (in UNIX the file will be named fname.log)
Always look at the log file! It tells how SAS understood your program
Output file – a text file giving the output generated from the PROCs (in UNIX the file will be named fname.lst)

23 Messages in SAS Log
Notes – messages that may or may not be important
Warnings – messages that are usually important
Errors – fatal messages; the program will abort (notes and warnings will not abort your program)
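As an illustration of a note that matters, the sketch below misspells credits as credit on purpose. The dataset name demo2 is hypothetical. SAS does not treat this as an error: it creates a new variable credit with missing values, reports in the log that the variable is uninitialized, and keeps running.

```sas
* Sketch only. The dataset name demo2 is hypothetical;
DATA demo2;
   INPUT gender $ credits;
   * credit is a deliberate misspelling of credits;
   * SAS only writes a NOTE that the variable credit is uninitialized;
   IF credit > 12 THEN fulltime = 'Y'; ELSE fulltime = 'N';
DATALINES;
F 15
M 09
;
RUN;
```

Because a missing value is never greater than 12, fulltime comes out as 'N' for every observation, including the student with 15 credits. The program finishes without an error, so the only sign of the mistake is the note in the log.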

24 LOG FILE
NOTE: Copyright (c) 1999-2001 by SAS Institute Inc., Cary, NC, USA.
NOTE: SAS (r) Proprietary Software Release 8.2 (TS2M0)
      Licensed to UNIVERSITY OF MINNESOTA, Site 0009012001.
NOTE: This session is executing on the WIN_NT platform.
NOTE: SAS initialization used:
      real time           7.51 seconds
      cpu time            0.89 seconds

1    * This is a short example program to demonstrate what a
2      SAS program looks like. This is a comment statement because
3      it begins with a * and ends with a semi-colon ;
4
5    DATA demo;
6       INFILE DATALINES;
7       INPUT gender $ age marstat $ credits state $ ;
8
9       if credits > 12 then fulltime = 'Y'; else fulltime = 'N';
10      if state = 'MN' then resid = 'Y'; else resid = 'N';
11   DATALINES;

NOTE: The data set WORK.DEMO has 12 observations and 7 variables.
NOTE: DATA statement used:
      real time           0.38 seconds
      cpu time            0.06 seconds

25
25   RUN;
26   TITLE 'Running the Example Program';
27   PROC PRINT DATA=demo ;
28      VAR gender age marstat credits fulltime state ;
29   RUN;

NOTE: There were 12 observations read from the data set WORK.DEMO.
NOTE: PROCEDURE PRINT used:
      real time           0.19 seconds
      cpu time            0.02 seconds

30   PROC MEANS DATA=demo N SUM MEAN;
31      VAR age credits ;
32   RUN;

NOTE: There were 12 observations read from the data set WORK.DEMO.
NOTE: PROCEDURE MEANS used:
      real time           0.25 seconds
      cpu time            0.03 seconds

33   PROC FREQ DATA=demo; TABLES gender;
34   RUN;

NOTE: There were 12 observations read from the data set WORK.DEMO.
NOTE: PROCEDURE FREQ used:
      real time           0.15 seconds
      cpu time            0.03 seconds

26 LST FILE
Running the Example Program

Obs    gender    age    marstat    credits    fulltime    state
  1      F        23       S          15         Y         MN
  2      F        21       S          15         Y         WI
  3      F        22       S           9         N         MN
  4      F        35       M           2         N         MN
  5      F        22       M          13         Y         MN
  6      F        25       S          13         Y         WI
  7      M        20       S          13         Y         MN
  8      M        26       M          15         Y         WI
  9      M        27       S           5         N         MN
 10      M        23       S          14         Y         IA
 11      M        21       S          14         Y         MN
 12      M        29       M          15         Y         MN

The MEANS Procedure

Variable      N             Sum            Mean
----------------------------------------------
age          12     294.0000000      24.5000000
credits      12     143.0000000      11.9166667
----------------------------------------------

The FREQ Procedure

                                   Cumulative    Cumulative
gender    Frequency     Percent     Frequency      Percent
-----------------------------------------------------------
F                6       50.00             6        50.00
M                6       50.00            12        100.0

