# Statistics for clinicians l Biostatistics course by Kevin E. Kip, Ph.D., FAHA Professor and Executive Director, Research Center University of South Florida,

Statistics for clinicians l Biostatistics course by Kevin E. Kip, Ph.D., FAHA. Professor and Executive Director, Research Center, University of South Florida, College of Nursing; Professor, College of Public Health, Department of Epidemiology and Biostatistics; Associate Member, Byrd Alzheimer’s Institute, Morsani College of Medicine, Tampa, FL, USA

SECTION 1.1 Module Overview and Introduction: introduction to biostatistics, descriptive statistics, SPSS, and PowerPoint.

SECTION 1.4 Introduction to SPSS

Introduction to SPSS
- Database structure
- Data view and variable view
- Variable names, labels, and formats
- Interactive menus
- SPSS syntax generated from interactive analyses

SECTION 1.5 Summarizing Data in Charts

Summarizing Data – Charts

1. One categorical, >1 proportion/percentage
   (i) Bar chart
   (ii) Stacked bar chart
   (iii) Stacked bar chart (100%)
2. One categorical, >1 continuous variable
   (i) Box plot
   (ii) High-low
   (iii) Line
   (iv) Kernel-density plots
3. Two continuous variables
   (i) X-Y scatter
   (ii) Histogram (can be used for 1 variable)

1. One categorical, >1 proportion/percentage (i) Bar chart
- Rectangular bars with lengths proportional to the values that they represent.
- Bars can be plotted vertically or horizontally.

1. One categorical, >1 proportion/percentage (ii) Stacked bar chart
- Can be counts or percentages.
- Do not sum to a specified value.

[Figure: stacked bar chart of % obese by age group]

1. One categorical, >1 proportion/percentage (iii) Stacked bar chart (100%)

Bar Charts and Stacked Bar Charts
- It is important to choose between row and column percentages.
- Example: race and blood pressure classification.
- Usually, the row variable is the “predictor” and the column variable is the “outcome”.
- SPSS: Analyze > Descriptive Statistics > Crosstabs

Bar Charts and Stacked Bar Charts: Column Percentage

SPSS syntax:

    CROSSTABS
      /TABLES=SCR_RACECAT3 BY SCR_BP_CLASS4
      /FORMAT=AVALUE TABLES
      /CELLS=COUNT COLUMN
      /COUNT ROUND CELL
      /BARCHART.

Race * BP classification Crosstabulation

| Race | Statistic | Normal | Prehypertensive | Hypertensive Stage 1 | Hypertensive Stage 2 | Total |
|---|---|---|---|---|---|---|
| White | Count | 247 | 397 | 294 | 95 | 1033 |
| White | % within BP classification | 65.2% | 58.3% | 49.8% | 38.0% | 54.4% |
| Black | Count | 117 | 262 | 275 | 149 | 803 |
| Black | % within BP classification | 30.9% | 38.5% | 46.6% | 59.6% | 42.3% |
| Other | Count | 15 | 22 | 21 | 6 | 64 |
| Other | % within BP classification | 4.0% | 3.2% | 3.6% | 2.4% | 3.4% |
| Total | Count | 379 | 681 | 590 | 250 | 1900 |
| Total | % within BP classification | 100.0% | 100.0% | 100.0% | 100.0% | 100.0% |

Difficult to identify trends

Bar Charts and Stacked Bar Charts: Row Percentage

SPSS syntax:

    CROSSTABS
      /TABLES=SCR_RACECAT3 BY SCR_BP_CLASS4
      /FORMAT=AVALUE TABLES
      /CELLS=COUNT ROW
      /COUNT ROUND CELL
      /BARCHART.

Use row percentages in the stacked bar chart (PowerPoint).
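The difference between row and column percentages can be checked by hand. The following is a minimal Python sketch (not SPSS output) that recomputes both from the counts in the Race * BP classification crosstabulation; the function names are illustrative assumptions, not part of the course material.

```python
# Counts from the Race * BP classification crosstabulation in the slides.
# Column order: Normal, Prehypertensive, Hypertensive Stage 1, Hypertensive Stage 2.
counts = {
    "White": [247, 397, 294, 95],
    "Black": [117, 262, 275, 149],
    "Other": [15, 22, 21, 6],
}


def row_percentages(table):
    """Percent of each row total: BP classification distribution within each race."""
    return {
        row: [round(100 * n / sum(cells), 1) for n in cells]
        for row, cells in table.items()
    }


def column_percentages(table):
    """Percent of each column total: race distribution within each BP class."""
    col_totals = [sum(col) for col in zip(*table.values())]
    return {
        row: [round(100 * n / total, 1) for n, total in zip(cells, col_totals)]
        for row, cells in table.items()
    }


row_pct = row_percentages(counts)
col_pct = column_percentages(counts)
for race in counts:
    print(race, "row %:", row_pct[race], "column %:", col_pct[race])
```

Row percentages add up to 100% within each race, which is why they suit a 100% stacked bar chart with one bar per race.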

PowerPoint Chart: Column > 100% Stacked Column

PowerPoint Chart (Practice): Column > 100% Stacked Column

Display Quality of Life (QOL) from Poor to Excellent by Gender.

[Tables: column percentages for QOL; row percentages for QOL]

2. One categorical, >1 continuous variable (i) Box plot
- Also known as a box-and-whisker diagram.
- Displays 5 summary statistics: minimum, lower quartile (Q1), median (Q2), upper quartile (Q3), and maximum.
- No assumptions about the underlying statistical distribution (non-parametric).

SPSS: Graphs > Chart Builder > Boxplot

Example: HDL cholesterol (continuous) distribution by gender (categorical)
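The five summary statistics behind a box plot can be computed directly. This is a small Python sketch on made-up HDL values (the data and the function name are assumptions for illustration). Quartile conventions differ between packages, so SPSS may report slightly different Q1 and Q3 values for the same data.

```python
import statistics


def five_number_summary(values):
    """Return min, Q1, median, Q3, and max for a list of numbers."""
    # method="inclusive" treats the data as the whole population of interest;
    # other quartile conventions can give slightly different Q1 and Q3.
    q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {"min": min(values), "q1": q1, "median": q2, "q3": q3, "max": max(values)}


# Hypothetical HDL cholesterol values (mg/dl), not taken from the course data set.
hdl = [38, 41, 44, 47, 50, 52, 55, 59, 63, 70, 86]
summary = five_number_summary(hdl)
print(summary)
```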

2. One categorical, >1 continuous variable (i) Box plot

Question: Are HDL cholesterol levels positively or negatively skewed? Run the SPSS Frequencies procedure.

2. One categorical, >1 continuous variable (i) Box plot

Question: Are triglycerides positively or negatively skewed? Run the SPSS Frequencies procedure.

2. One categorical, >1 continuous variable (i) Box plot (Practice)

Draw a box plot of the distribution of HDL cholesterol by ethnicity:
- Hispanic: Min=30, Q1=40, Q2=46, Q3=56, Max=86
- Non-Hispanic: Min=21, Q1=46, Q2=56, Q3=66, Max=131
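Before drawing the boxes, it can help to work out the box and whisker lengths from the practice values. The sketch below does that in Python and also computes Bowley's quartile skewness, (Q3 + Q1 − 2·Q2) / (Q3 − Q1). That measure is not mentioned in the slides; it is added here only as one simple way to read skew from a five-number summary. The whiskers are assumed to run to the minimum and maximum, as in the slide's definition.

```python
def box_plot_shape(minimum, q1, median, q3, maximum):
    """Summarize the geometry of a box plot drawn from a five-number summary."""
    iqr = q3 - q1  # height of the box
    return {
        "iqr": iqr,
        "lower_whisker": q1 - minimum,   # whisker drawn down to the minimum
        "upper_whisker": maximum - q3,   # whisker drawn up to the maximum
        # Bowley's quartile skewness: > 0 suggests positive (right) skew in the box.
        "bowley_skew": (q3 + q1 - 2 * median) / iqr,
    }


# Practice values from the slide (HDL cholesterol, mg/dl).
hispanic = box_plot_shape(30, 40, 46, 56, 86)
non_hispanic = box_plot_shape(21, 46, 56, 66, 131)
print("Hispanic:", hispanic)
print("Non-Hispanic:", non_hispanic)
```

In both groups the upper whisker is longer than the lower one, which points to a positively skewed distribution even where the box itself is symmetric.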

2. One categorical, >1 continuous variable (ii) High-low
- PowerPoint's open-high-low-close chart (normally used for financial data) can be “tricked” into showing distributions of continuous variables.
- The upper and lower ends (high-low) can represent any percentiles, such as the 5th and 95th percentiles.

[Figure: high-low chart of total cholesterol (mg/dl) by self-reported race (White, Black) and by admixture-defined group (EU>85%, EU>40%, EU>25%, EU<40%, EU<25%). Group sizes shown: N = 753, 464, 753, 68, 201, 195. P=0.003; P trend=0.009.]

The filled rectangles depict the interquartile range (25th and 75th percentiles). The lower and upper limits of the vertical lines depict the 5th and 95th percentiles, respectively.

[Figure: high-low chart of total cholesterol (mg/dl) for four groups (N=594, N=546, N=80, N=111), including U.S. Black, Ghana Urban, and Ghana Rural.]

- U.S. Black vs. Ghana Urban: P=0.0001
- U.S. Black vs. Ghana Rural: P<0.0001
- Ghana Urban vs. Ghana Rural: P<0.0001

The filled rectangles depict the interquartile range (25th and 75th percentiles). The lower and upper limits of the vertical lines depict the 5th and 95th percentiles, respectively.

Total Cholesterol (mg/dl): Practice in PowerPoint (first draw by hand)

The filled rectangles depict the interquartile range (25th and 75th percentiles). The lower and upper limits of the vertical lines depict the 5th and 95th percentiles, respectively.

| Group | 5% | 25% | 75% | 95% |
|---|---|---|---|---|
| Male | 137 | 175 | 224 | 271 |
| Female | 153 | 190 | 245 | 295 |

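Total Cholesterol (mg/dl): Practice in PowerPoint

The filled rectangles depict the interquartile range (25th and 75th percentiles). The lower and upper limits of the vertical lines depict the 5th and 95th percentiles, respectively.

| Group | 5% | 25% | 75% | 95% |
|---|---|---|---|---|
| Male | 137 | 175 | 224 | 271 |
| Female | 153 | 190 | 245 | 295 |

“Trick” PowerPoint by entering the percentiles in open-high-low-close order:

| Open | High | Low | Close |
|---|---|---|---|
| 25% | 95% | 5% | 75% |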
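The reordering for the open-high-low-close “trick” is mechanical and can be sketched in a few lines of Python. The dictionary keys (`p5`, `p25`, and so on) and the function name are hypothetical labels chosen for this illustration; the numbers are the practice values from the slide.

```python
# Percentiles of total cholesterol (mg/dl) from the practice slide.
percentiles = {
    "Male": {"p5": 137, "p25": 175, "p75": 224, "p95": 271},
    "Female": {"p5": 153, "p25": 190, "p75": 245, "p95": 295},
}


def to_ohlc(p):
    """Reorder percentiles into Open-High-Low-Close order: 25th, 95th, 5th, 75th."""
    return [p["p25"], p["p95"], p["p5"], p["p75"]]


ohlc_rows = {group: to_ohlc(p) for group, p in percentiles.items()}

print("Group Open High Low Close")
for group, row in ohlc_rows.items():
    print(group, *row)
```

The Open and Close columns draw the filled rectangle (interquartile range), while High and Low draw the vertical line (5th to 95th percentile).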

2. One categorical, >1 continuous variable (iii) Line chart
- Typically represents a trend in data over intervals of time (i.e., a time series).
- Often used to show repeated health outcome measurements over time.

[Figure: line chart of prevalence of use (%) of Crohn’s disease medications]

In this example, the “categorical” variable is individual subject nested within each treatment arm of the trial

2. One categorical, >1 continuous variable (iv) Kernel density plots
- Like a histogram, but constructs a “smooth” probability density function.
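To make the “smooth” density concrete, here is a minimal Python sketch of a Gaussian kernel density estimate. The sample values and the bandwidth of 5 are arbitrary assumptions for illustration; statistical packages usually pick the bandwidth automatically.

```python
import math


def gaussian_kde(sample, bandwidth):
    """Return a function that evaluates a Gaussian kernel density estimate."""
    n = len(sample)
    norm = 1.0 / (n * bandwidth * math.sqrt(2 * math.pi))

    def density(x):
        # Sum one Gaussian "bump" centred on every observation.
        return norm * sum(
            math.exp(-0.5 * ((x - xi) / bandwidth) ** 2) for xi in sample
        )

    return density


# Hypothetical HDL cholesterol values (mg/dl), not taken from the course data set.
sample = [38, 41, 44, 47, 50, 52, 55, 59, 63, 70, 86]
density = gaussian_kde(sample, bandwidth=5.0)

# Check that the curve behaves like a probability density: area close to 1.
grid_step = 0.5
grid = [i * grid_step for i in range(0, 301)]  # 0 to 150 mg/dl
area = sum(density(x) for x in grid) * grid_step
print(round(area, 3))
```

A larger bandwidth gives a smoother curve and a smaller one follows the individual observations more closely.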

3. Two continuous variables (i) X-Y scatter
- Shows the relationship between two sets of continuous data.
- Also called a scatter chart, scattergram, scatter diagram, or scatter graph.

[Figure: scatter plot of body density and body mass index]

3. Two continuous variables (ii) Histogram(s)
- Probability distribution of one or more continuous variables, displayed over discrete intervals (bins).
- The bins contain frequency counts, or can be normalized to display relative frequencies (i.e., the proportion of cases that fall into each bin, with total area = 1.0).

[Figure: histogram; vertical axis: # subjects]
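The step from frequency counts to relative frequencies can be sketched as follows in Python. The BMI values, bin start, and bin width are made-up assumptions. The sketch separates two normalizations: relative frequencies sum to 1.0, while bar heights must also be divided by the bin width for the total area to equal 1.0.

```python
def histogram(values, start, width, n_bins):
    """Count values in equal-width bins [start + k*width, start + (k+1)*width)."""
    counts = [0] * n_bins
    for v in values:
        index = int((v - start) // width)
        if 0 <= index < n_bins:  # values outside the bins are ignored
            counts[index] += 1
    total = sum(counts)
    relative = [c / total for c in counts]   # proportions, sum to 1.0
    density = [r / width for r in relative]  # heights, total area = 1.0
    return counts, relative, density


# Hypothetical body mass index values (kg/m^2).
bmi = [19.5, 22.1, 23.4, 24.8, 25.0, 26.3, 27.7, 28.9, 30.2, 31.5, 33.8, 36.4]
bin_width = 5
counts, relative, density = histogram(bmi, start=15, width=bin_width, n_bins=5)
print(counts)
print([round(r, 3) for r in relative])
```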

SECTION 1.6 SPSS Data Manipulation

SPSS Data Manipulation and Syntax Editor
1. Recode continuous variable into arbitrarily-defined or pre-defined categories
2. Visual binning of continuous variable
3. Transform a skewed variable
4. Using the SPSS Data Editor

SPSS Data Manipulation and Syntax Editor
1. Recode continuous variable into arbitrarily-defined or pre-defined categories

Example: Define age into 3 categories (arbitrary): 45-54, 55-64, 65 and older

SPSS: Transform > Recode into Different Variables
- Input variable is age
- Output variable: Name: age_cat; Label: Age in 3 categories
- Click on Old and New Values
- Range (specify explicitly): 45-54 = value 1; 55-64 = value 2; 65 and older = value 3
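The recoding rule above can be written out as a small Python function to check the category boundaries. This is a sketch of the logic only, not SPSS syntax; it assumes age is recorded in whole years, and the function name is hypothetical.

```python
def recode_age(age):
    """Map age in whole years to the 3 categories used in the example."""
    if age is None:
        return None  # missing stays missing
    if 45 <= age <= 54:
        return 1
    if 55 <= age <= 64:
        return 2
    if age >= 65:
        return 3
    return None  # ages below 45 fall outside the defined ranges


ages = [45, 54, 55, 64, 65, 90, 30]
age_cat = [recode_age(a) for a in ages]
print(age_cat)
```

Checking the boundary values (54, 55, 64, 65) is a quick way to catch gaps or overlaps between ranges.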

SPSS Data Manipulation and Syntax Editor
2. Visual binning of continuous variable

Example: Body mass index
- Put in an output name for the binned variable
- Make cutpoints: equal percentiles based on scanned cases
- Put in labels for frequency display in the bar chart

SPSS code: Visual Binning.
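The idea of equal-percentile cutpoints can be sketched in Python as below. The BMI values are made up, four bins (quartiles) are an arbitrary choice, and the convention that a value equal to a cutpoint goes into the lower bin is an assumption of this sketch. Percentile definitions vary, so SPSS may choose slightly different cutpoints for the same data.

```python
import bisect
import statistics
from collections import Counter

# Hypothetical body mass index values (kg/m^2).
bmi = [19.5, 22.1, 23.4, 24.8, 25.0, 26.3, 27.7, 28.9, 30.2, 31.5, 33.8, 36.4]

# Three cutpoints split the data into four bins of roughly equal size.
cutpoints = statistics.quantiles(bmi, n=4)


def bin_number(value, cuts):
    """Return the 1-based bin; a value equal to a cutpoint goes to the lower bin."""
    return bisect.bisect_left(cuts, value) + 1


binned = [bin_number(v, cutpoints) for v in bmi]
bin_counts = Counter(binned)
print(cutpoints)
print(sorted(bin_counts.items()))
```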

SPSS Data Manipulation and Syntax Editor
3. Transform a skewed variable

Descriptive statistics for triglycerides in natural scale:
- Mean, median, SD, min, max, skewness, kurtosis
- Chart = histogram with normal curve superimposed

Triglycerides are skewed. Use a transformation to create a new variable and reduce the skew in triglycerides.

SPSS: Transform > Compute Variable
- Target Variable: LOG_TRIG
- Numeric Expression: lg10(LAB_TRIG_VAP)

SPSS syntax:

    COMPUTE log_trig=lg10(LAB_TRIG_VAP).
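The effect of the log transformation on skewness can be illustrated with a short Python sketch. The triglyceride values are invented for the example, and the skewness function uses the simple moment-based formula; statistical packages may report a bias-adjusted version, so the numbers will not match SPSS output exactly.

```python
import math
import statistics


def skewness(values):
    """Moment-based skewness: third central moment / (second central moment)^1.5."""
    mean = statistics.fmean(values)
    n = len(values)
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


# Hypothetical triglyceride values (mg/dl) with a long right tail.
trig = [60, 75, 82, 90, 98, 105, 118, 130, 150, 185, 240, 410]

# Same transformation as the COMPUTE statement: base-10 logarithm.
log_trig = [math.log10(v) for v in trig]

skew_raw = skewness(trig)
skew_log = skewness(log_trig)
print(round(skew_raw, 2), round(skew_log, 2))
```

On these values the log scale pulls in the long right tail, so the skewness drops but stays positive.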

SPSS Data Manipulation and Syntax Editor
4. Using the SPSS Data Editor

SPSS: File > New > Syntax. Save the file with a new name.

1. Select males only (scr_sex=1): Data > Select Cases > If scr_sex=1
2. Run descriptives for age
3. Copy the code and repeat for females (scr_sex=2)

Syntax generated by step 1:

    USE ALL.
    COMPUTE filter_$=(SCR_SEX=1).
    VARIABLE LABELS filter_$ 'SCR_SEX=1 (FILTER)'.
    VALUE LABELS filter_$ 0 'Not Selected' 1 'Selected'.
    FORMATS filter_$ (f1.0).
    FILTER BY filter_$.
    EXECUTE.

SPSS Data Manipulation and Syntax Editor
4. Using the SPSS Data Editor

Males (SCR_SEX=1):

    USE ALL.
    COMPUTE filter_$=(SCR_SEX=1).
    VARIABLE LABELS filter_$ 'SCR_SEX=1 (FILTER)'.
    VALUE LABELS filter_$ 0 'Not Selected' 1 'Selected'.
    FORMATS filter_$ (f1.0).
    FILTER BY filter_$.
    EXECUTE.
    DESCRIPTIVES VARIABLES=SCR_AGE
      /STATISTICS=MEAN STDDEV MIN MAX.

Females (SCR_SEX=2):

    USE ALL.
    COMPUTE filter_$=(SCR_SEX=2).
    VARIABLE LABELS filter_$ 'SCR_SEX=2 (FILTER)'.
    VALUE LABELS filter_$ 0 'Not Selected' 1 'Selected'.
    FORMATS filter_$ (f1.0).
    FILTER BY filter_$.
    EXECUTE.
    DESCRIPTIVES VARIABLES=SCR_AGE
      /STATISTICS=MEAN STDDEV MIN MAX.
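For readers who want to check the logic of the filter-then-describe steps outside SPSS, the following Python sketch applies the same selection (sex code 1 = male, 2 = female, as in the slides) and computes the same four statistics. The records and the function name are invented for illustration.

```python
import statistics

# Hypothetical records; scr_sex is coded 1 = male, 2 = female as in the slides.
records = [
    {"scr_sex": 1, "scr_age": 52},
    {"scr_sex": 2, "scr_age": 58},
    {"scr_sex": 1, "scr_age": 61},
    {"scr_sex": 2, "scr_age": 66},
    {"scr_sex": 1, "scr_age": 47},
    {"scr_sex": 2, "scr_age": 49},
    {"scr_sex": 1, "scr_age": 70},
]


def describe_age(rows, sex_code):
    """Select one sex, then report mean, SD, min, and max of age."""
    ages = [r["scr_age"] for r in rows if r["scr_sex"] == sex_code]
    return {
        "n": len(ages),
        "mean": statistics.mean(ages),
        "sd": statistics.stdev(ages),  # sample standard deviation (n - 1)
        "min": min(ages),
        "max": max(ages),
    }


males = describe_age(records, 1)
females = describe_age(records, 2)
print("Males:", males)
print("Females:", females)
```

Passing the sex code as a parameter avoids copying the whole block for each group, which is the step the slides do by hand in the syntax editor.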
