XIAO WU: DATA ANALYSIS & BASIC STATISTICS

PURPOSE OF THIS WORKSHOP
- Statistics as a useful tool to analyze results
- Basic terminology and most commonly used tests
- Exposure to more advanced statistical tools

WHY DO WE NEED STATISTICS?

- Summary
- Classification
- Interpretation
- Pattern searching
- Abnormality identification
- Prediction
- Interpolation
- Extrapolation

SUMMARY

SUMMARY
- Mean, median, mode
- Variance, standard deviation
- Max, min values and range
- Quartiles

EXAMPLE
Firm A mean: $5,800
Firm B mean: $5,000

EXAMPLE
                  Firm A    Firm B
Mean              $5,800    $5,000
Median            $4,000    $5,000
SD                $7,270    $203
1st quartile      $500      $4,825
3rd quartile      $4,000    $5,175
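The Firm A / Firm B comparison shows why a mean alone can mislead: one very large salary pulls the mean above the median and inflates the SD. A minimal sketch of computing the summary measures with Python's standard `statistics` module follows; the salary lists are hypothetical illustrations, not the workshop's data, and `statistics.quantiles` uses its default ("exclusive") method, which other tools may not match exactly.

```python
import statistics

def summarize(values):
    """Return the summary measures listed on the SUMMARY slide."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartile cut points
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "sd": statistics.stdev(values),  # sample standard deviation
        "min": min(values),
        "max": max(values),
        "range": max(values) - min(values),
        "q1": q1,
        "q3": q3,
    }

# Hypothetical salaries: one large value skews the first list.
skewed = [500, 500, 4000, 4000, 4000, 4000, 25000]
even = [4800, 4900, 5000, 5000, 5100, 5200]

for name, data in [("skewed", skewed), ("even", even)]:
    print(name, summarize(data))
```

For the skewed list the mean (6,000) sits well above the median (4,000), the same pattern as Firm A.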

EXAMPLE
[Figure: salary histograms, count (#) vs. salary ($)]

CLASSIFICATION
- Identifying variables
  - Independent vs. dependent
  - Numeric vs. categorical
- Types of variables
  - Categorical: nominal, ordinal
  - Numeric: continuous, discrete

PATTERN SEARCHING
- Distribution of data
- Some commonly used distributions: uniform, binomial, Poisson, …
- Central limit theorem

UNIFORM
- Every outcome has an equal chance
- Examples: flipping a coin, rolling a die
- What if you need to flip multiple times?
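A quick way to see "every outcome has an equal chance" is to simulate a fair die and check that each face turns up about 1/6 of the time. This is a minimal sketch using Python's `random` module with a fixed seed (an arbitrary choice, so the run is reproducible); the number of rolls is also arbitrary.

```python
import random
from collections import Counter

def roll_die(n_rolls, seed=0):
    """Simulate n_rolls of a fair six-sided die (faces 1..6 equally likely)."""
    rng = random.Random(seed)  # seeded generator so the run is reproducible
    return [rng.randint(1, 6) for _ in range(n_rolls)]

def relative_frequencies(rolls):
    """Fraction of rolls that landed on each face."""
    counts = Counter(rolls)
    return {face: counts[face] / len(rolls) for face in range(1, 7)}

freqs = relative_frequencies(roll_die(60_000))
for face, share in freqs.items():
    print(face, round(share, 3))  # each share should be close to 1/6
```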

BINOMIAL
- Two outcomes, with probabilities p and 1 - p
- Multiple independent trials: n
- Examples: flipping a coin 100 times; germination of multiple seeds
[Figure: binomial distribution with n = 15, p = 0.2]
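The binomial probability of exactly k successes in n independent trials is C(n, k) p^k (1 - p)^(n - k). The sketch below codes that formula directly with `math.comb`; the choice of n = 15, p = 0.2 for the full distribution is only an illustration.

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p): probability of exactly k successes
    in n independent trials, each succeeding with probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# Exactly 50 heads in 100 fair coin flips
print(round(binomial_pmf(50, 100, 0.5), 4))  # 0.0796

# Whole distribution for n = 15 trials with success probability 0.2
dist = [binomial_pmf(k, 15, 0.2) for k in range(16)]
print([round(prob, 3) for prob in dist])
```

Even the single most likely outcome (50 heads) happens only about 8% of the time, which is why the question "what if you flip multiple times?" leads to a whole distribution rather than a single number.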

POISSON
- Counts of rare, independent events
- Parameter: the average rate λ (expected number of events per interval)
- Example: radioactive decay
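For a Poisson count with average rate λ, P(X = k) = λ^k e^(-λ) / k!. A minimal sketch of that formula follows; the rate of 3 decays per second is a hypothetical value chosen for illustration.

```python
from math import exp, factorial

def poisson_pmf(k, lam):
    """P(X = k) for X ~ Poisson(lam), where lam is the average number of
    events per interval (e.g. decays per second)."""
    return lam**k * exp(-lam) / factorial(k)

# Hypothetical source averaging 3 decays per second
probs = [poisson_pmf(k, 3.0) for k in range(10)]
print([round(prob, 3) for prob in probs])
```

Unlike the binomial, there is no fixed number of trials: only the average rate is needed.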

THE MOST IMPORTANT DISTRIBUTION

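NORMAL DISTRIBUTION
- Central limit theorem: for independent observations from almost any distribution with finite variance, the distribution of the sample mean approaches a normal distribution as the sample size grows
- Large sample size → approximately normal sample means
- Parameters: mean (μ) and standard deviation (σ)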
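The central limit theorem can be checked by simulation: a single die roll is uniform, not normal, yet the means of many 30-roll samples cluster around 3.5 with standard deviation σ/√n, and about 95% of them fall within 1.96 of those standard deviations, as a normal distribution predicts. This is a sketch with arbitrary choices (n = 30, 5,000 samples, fixed seed).

```python
import math
import random
import statistics

def sample_means(n, n_samples=5_000, seed=1):
    """Draw n_samples samples of n fair-die rolls each and return their means."""
    rng = random.Random(seed)
    return [statistics.fmean(rng.randint(1, 6) for _ in range(n))
            for _ in range(n_samples)]

mu, sigma = 3.5, math.sqrt(35 / 12)  # mean and SD of a single fair-die roll
means = sample_means(30)
se = sigma / math.sqrt(30)           # CLT: SD of the sample mean is sigma / sqrt(n)
share_within = sum(abs(m - mu) <= 1.96 * se for m in means) / len(means)

print(round(statistics.fmean(means), 3), round(statistics.stdev(means), 3), round(se, 3))
print(round(share_within, 3))  # close to 0.95, as the normal approximation predicts
```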

PATTERN SEARCHING
- Hypothesis testing
  - Difference between two populations: z-test or t-test?
  - What does a p-value mean?
  - Family-wise error: Bonferroni correction
- More than two possibilities: chi-square test, Fisher's exact test
- More than two groups: ANOVA

EXAMPLE 1
SAT score is related to gender
- Null hypothesis?
- Alternative hypothesis (3 possibilities)?
- One or two tails?
- Z or t test?
- p = 0.07: conclusion?
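The questions in Example 1 map onto the pieces of a two-sample test: the three possible alternative hypotheses (μ1 ≠ μ2, μ1 > μ2, μ1 < μ2) correspond to a two-tailed test or one of two one-tailed tests. Below is a minimal sketch of a large-sample two-sample z-test using `statistics.NormalDist`; the function name and the SAT summary numbers are hypothetical. For small samples with unknown standard deviations a t-test is the usual choice (for example SciPy's `scipy.stats.ttest_ind`), which needs the t distribution and is not shown here.

```python
from math import sqrt
from statistics import NormalDist

def two_sample_z_test(mean1, sd1, n1, mean2, sd2, n2, tail="two-sided"):
    """Large-sample z-test for a difference in population means.
    H0: mu1 == mu2. Returns the z statistic and the p-value."""
    se = sqrt(sd1**2 / n1 + sd2**2 / n2)  # standard error of the difference
    z = (mean1 - mean2) / se
    std_normal = NormalDist()             # mean 0, SD 1
    if tail == "two-sided":               # H1: mu1 != mu2
        p = 2 * (1 - std_normal.cdf(abs(z)))
    elif tail == "greater":               # H1: mu1 > mu2
        p = 1 - std_normal.cdf(z)
    elif tail == "less":                  # H1: mu1 < mu2
        p = std_normal.cdf(z)
    else:
        raise ValueError("tail must be 'two-sided', 'greater' or 'less'")
    return z, p

# Hypothetical SAT summaries for two groups of 500 students each
z, p = two_sample_z_test(1050, 200, 500, 1040, 200, 500)
alpha = 0.05
print(round(z, 3), round(p, 3), "reject H0" if p < alpha else "fail to reject H0")
```

The p-value is the probability, assuming H0 is true, of a difference at least as extreme as the one observed. With the slide's p = 0.07 and a 0.05 significance level, the result is not significant: H0 is not rejected, which is not the same as showing that H0 is true.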

EXAMPLE 2
Predictors of stroke:
- Age
- Hypertension
- Gender
- …

EXAMPLE 3
Genome-wide association studies: scanning markers across the DNA of many people to find genetic variations associated with certain diseases
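Genome-wide scans run a separate test for each marker, so the family-wise error from the PATTERN SEARCHING slide becomes critical: at α = 0.05, about 5% of truly null markers would look "significant" by chance. The Bonferroni correction divides α by the number of tests m. A minimal sketch follows; the p-values are hypothetical. With 1,000,000 markers the threshold becomes 0.05 / 1,000,000 = 5 × 10^-8, the genome-wide significance level commonly quoted for such studies.

```python
def bonferroni(p_values, alpha=0.05):
    """Bonferroni correction: with m tests, call a result significant only if
    p <= alpha / m, which keeps the family-wise error rate at or below alpha."""
    m = len(p_values)
    threshold = alpha / m
    return threshold, [p <= threshold for p in p_values]

# Hypothetical p-values from four separate tests
threshold, significant = bonferroni([0.001, 0.02, 0.04, 0.3])
print(threshold, significant)  # 0.0125 [True, False, False, False]
```

The correction is simple but conservative; with many correlated markers it can miss real associations.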

PATTERN SEARCHING
- Hypothesis testing
  - One variable: z-test or t-test?
  - What does a p-value mean?
  - Family-wise error: Bonferroni correction
- Comparing two categorical variables: chi-square test, Fisher's exact test
- More than two groups: ANOVA
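One-way ANOVA compares several group means at once by asking whether the spread between group means is large relative to the spread within groups. The F statistic is (SS_between / (k - 1)) / (SS_within / (N - k)) for k groups and N observations. The sketch below computes F and its degrees of freedom; the groups are hypothetical, and turning F into a p-value needs the F distribution (for example SciPy's `scipy.stats.f_oneway`), which is not reimplemented here.

```python
def one_way_anova_f(groups):
    """F statistic for one-way ANOVA on a list of groups (lists of numbers)."""
    all_values = [x for g in groups for x in g]
    k, n = len(groups), len(all_values)
    grand_mean = sum(all_values) / n
    group_means = [sum(g) / len(g) for g in groups]
    # Variability of group means around the grand mean, weighted by group size
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # Variability of observations around their own group mean
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
    df_between, df_within = k - 1, n - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within

# Hypothetical scores for three groups
print(one_way_anova_f([[1, 2, 3], [4, 5, 6], [7, 8, 9]]))  # (27.0, 2, 6)
```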

CHI SQUARE
Punnett square
A cross between two pea plants yields 880 plants: 639 green and 241 yellow.
Hypothesis: the green allele is dominant and both parents are heterozygous.

CHI SQUARE
Punnett square for a Gg × Gg cross:
         G             g
  G      GG (green)    Gg (green)
  g      Gg (green)    gg (yellow)
Expected: 75% green, 25% yellow

CHI SQUARE
                              Green     Yellow
Observed (o)                  639       241
Expected (e)                  660       220
Deviation (d = o - e)         -21       21
Deviation squared (d^2)       441       441
d^2/e                         0.668     2.005
Sum of d^2/e: 2.673
Degrees of freedom: number of categories - 1 = 1
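The table's arithmetic can be reproduced in a few lines. This sketch computes the Pearson chi-square goodness-of-fit statistic for the pea-plant counts against the expected 3:1 ratio. For the p-value it uses the fact that, with 1 degree of freedom, a chi-square variable is the square of a standard normal Z, so P(χ² > x) = erfc(√(x/2)); that shortcut only holds for df = 1, and the helper names are illustrative.

```python
from math import erfc, sqrt

def chi_square_gof(observed, expected_props):
    """Pearson chi-square goodness-of-fit statistic and degrees of freedom."""
    total = sum(observed)
    expected = [total * prop for prop in expected_props]
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return stat, len(observed) - 1

def chi_square_p_df1(stat):
    """Upper-tail p-value for df = 1 (chi-square with 1 df is Z squared)."""
    return erfc(sqrt(stat / 2))

stat, df = chi_square_gof([639, 241], [0.75, 0.25])  # green, yellow
p = chi_square_p_df1(stat)
print(round(stat, 3), df, round(p, 3))  # statistic about 2.673, df 1, p about 0.10
```

Because p ≈ 0.10 is above 0.05 (equivalently, 2.673 is below the df = 1 critical value of 3.841), the data are consistent with the hypothesis of a dominant green allele and heterozygous parents.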

CHI SQUARE

PREDICTION
- Regression
  - Linear regression
  - Multiple linear regression
- Accuracy vs. simplicity
- Validation: leave-k-out
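Simple linear regression chooses the intercept and slope that minimize the sum of squared prediction errors; the closed-form least-squares solution is slope = Sxy / Sxx and intercept = ȳ - slope · x̄. A minimal sketch with hypothetical data follows. Multiple linear regression generalizes this to several predictors and is usually solved with a linear-algebra routine (for example NumPy's `numpy.linalg.lstsq`) rather than by hand.

```python
def fit_simple_linear_regression(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Hypothetical data lying exactly on y = 1 + 2x
print(fit_simple_linear_regression([1, 2, 3, 4], [3, 5, 7, 9]))  # (1.0, 2.0)
```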

EXAMPLE
Use brain structural measurements to predict a subject's performance on a picture vocabulary test.
- 144 structural measurements in total
- 521 subjects
First step: eliminate unnecessary variables
- Variables that are all zeros
- Highly correlated pairs
- Variables that do not correlate well with the performance score
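The three screening rules can be sketched as a simple filter, applied in the slide's order. This is not the presenter's actual pipeline: the function names, the correlation cutoffs (0.9 for "highly correlated" pairs, 0.1 for "does not correlate well") and the toy data are all hypothetical. When two variables are highly correlated, this sketch keeps whichever appears first.

```python
from math import sqrt

def pearson_r(a, b):
    """Pearson correlation coefficient; returns 0.0 if either variable is constant."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return 0.0  # correlation undefined for a constant variable; treat as uncorrelated
    return cov / sqrt(va * vb)

def filter_features(features, target, max_pair_r=0.9, min_target_r=0.1):
    """features: dict mapping name -> list of values (one per subject)."""
    # 1. Drop variables that are all zeros
    kept = {name: col for name, col in features.items() if any(v != 0 for v in col)}
    # 2. From each highly correlated pair, keep only the variable seen first
    selected = []
    for name in kept:
        if all(abs(pearson_r(kept[name], kept[other])) < max_pair_r for other in selected):
            selected.append(name)
    # 3. Drop variables that barely correlate with the outcome
    return [name for name in selected if abs(pearson_r(kept[name], target)) >= min_target_r]

# Toy data: five subjects, five candidate measurements
target = [1, 2, 3, 4, 5]
features = {
    "zeros": [0, 0, 0, 0, 0],
    "a": [2, 4, 6, 8, 10],
    "a_copy": [1, 2, 3, 4, 6],
    "noise": [1, -1, 1, -1, 1],
    "b": [5, 3, 4, 1, 2],
}
print(filter_features(features, target))  # ['a', 'b']
```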

EXAMPLE
- Run regression
- Validation: leave 1 out and leave 10 out
- Principal component analysis
- …
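Leave-k-out validation estimates how well a model predicts subjects it was not fitted on: hold some observations out, fit on the rest, and measure the error on the held-out ones. The sketch below holds out disjoint blocks of k consecutive observations in turn (a k-fold-style variant; exhaustive leave-p-out would try every subset of size k) and scores a simple linear regression by mean squared error. It assumes the observations are in random order, and the function names and data are hypothetical.

```python
def fit_line(x, y):
    """Least-squares intercept and slope for y = intercept + slope * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return my - slope * mx, slope

def leave_k_out_mse(x, y, k=1):
    """Hold out k consecutive observations at a time, fit on the rest, and
    return the mean squared prediction error on the held-out points."""
    n = len(x)
    sq_errors = []
    for start in range(0, n, k):
        test_idx = set(range(start, min(start + k, n)))
        train_x = [x[i] for i in range(n) if i not in test_idx]
        train_y = [y[i] for i in range(n) if i not in test_idx]
        intercept, slope = fit_line(train_x, train_y)
        for i in test_idx:
            sq_errors.append((y[i] - (intercept + slope * x[i])) ** 2)
    return sum(sq_errors) / len(sq_errors)

# Hypothetical data: a line plus alternating noise of +/-1
xs = list(range(20))
ys = [2 * v + (1 if v % 2 else -1) for v in xs]
print(round(leave_k_out_mse(xs, ys, k=1), 3), round(leave_k_out_mse(xs, ys, k=10), 3))
```

Leave-one-out uses nearly all the data for each fit; holding out larger blocks is cheaper and gives a more pessimistic check when data are scarce.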

PREDICTION
More complicated models:
- Bayesian approach: use prior knowledge to update the prediction
- Diffusion weights: use local structure to predict neighboring values
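The smallest concrete instance of "use prior knowledge to update the prediction" is the conjugate beta-binomial model. A Beta(a, b) prior belief about a success probability, combined with s observed successes and f failures, gives a Beta(a + s, b + f) posterior. This is offered as one illustration of the Bayesian idea, not as the model used in the workshop, and the prior and counts are hypothetical.

```python
def update_beta(prior_a, prior_b, successes, failures):
    """Conjugate Bayesian update: Beta(a, b) prior + binomial data -> Beta posterior."""
    return prior_a + successes, prior_b + failures

def beta_mean(a, b):
    """Mean of a Beta(a, b) distribution, used here as the point prediction."""
    return a / (a + b)

# Hypothetical prior centered on 0.5, then 8 successes and 2 failures observed
post_a, post_b = update_beta(2, 2, successes=8, failures=2)
print(post_a, post_b, round(beta_mean(post_a, post_b), 3))  # 10 4 0.714
```

The posterior mean (about 0.71) lies between the prior guess (0.5) and the raw observed rate (0.8); more data would pull it further toward the observed rate.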

STATISTICAL TOOLS
- Excel
- MATLAB
- R
- Minitab
- …

QUESTIONS?

MY OWN RESEARCH
- Cost-effectiveness analysis
- Mathematical modeling in medicine
- Simulate iterations rather than actual patients
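"Simulating iterations rather than actual patients" can be illustrated with a toy microsimulation for cost-effectiveness analysis. Each simulated patient either responds to treatment or not, accrues costs and quality-adjusted life years (QALYs), and the two treatment arms are compared with the incremental cost-effectiveness ratio, ICER = (C_new - C_old) / (E_new - E_old). This is a generic sketch, not the presenter's model. Every function name, probability, cost and QALY value below is hypothetical.

```python
import random

def simulate_arm(n_patients, p_success, treatment_cost, failure_cost,
                 qaly_success, qaly_failure, seed=0):
    """Simulate individual patients in one treatment arm and return the
    average cost and average QALYs per patient."""
    rng = random.Random(seed)
    total_cost = total_qaly = 0.0
    for _ in range(n_patients):
        success = rng.random() < p_success
        total_cost += treatment_cost + (0.0 if success else failure_cost)
        total_qaly += qaly_success if success else qaly_failure
    return total_cost / n_patients, total_qaly / n_patients

def icer(cost_new, effect_new, cost_old, effect_old):
    """Incremental cost-effectiveness ratio: extra cost per extra QALY gained."""
    return (cost_new - cost_old) / (effect_new - effect_old)

# All parameter values below are hypothetical.
new_cost, new_qaly = simulate_arm(20_000, 0.8, 12_000, 5_000, 10.0, 5.0, seed=1)
old_cost, old_qaly = simulate_arm(20_000, 0.6, 4_000, 5_000, 10.0, 5.0, seed=2)
print(round(icer(new_cost, new_qaly, old_cost, old_qaly)))  # near 7,000 per QALY
```

With these inputs the expected ICER is 7,000 per QALY gained; the simulated value varies slightly with the random seed, which is why real analyses run many iterations.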

RECENT RESULTS

RESULTS

GROUP EXERCISE