Statistical Modeling and Data Analysis

Given a data set, the first question a statistician asks is, “What is the statistical model for this data?” We then characterize and analyze the parameters of the model with an objective in mind.
Example: SBP of cancer patients vs. normal patients
Cancer: 145, 165, 134, 120, 112, 156, 145, 133, 135, 120
Normal: 138, 120, 112, 110, 128, 134, 128, 109, 138, 140
Objective: Do cancer patients have higher SBP than normal patients?
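As a rough first look at the objective above, the two group means can be compared with a two-sample t statistic. The sketch below uses the Welch form, which is an assumption made here for illustration; the slides do not say which test is intended. The function name `welch_t` is hypothetical.

```python
import math
import statistics

# SBP readings copied from the example above.
cancer = [145, 165, 134, 120, 112, 156, 145, 133, 135, 120]
normal = [138, 120, 112, 110, 128, 134, 128, 109, 138, 140]


def welch_t(x, y):
    """Welch two-sample t statistic for mean(x) - mean(y)."""
    # Standard error of the difference, without assuming equal variances.
    se = math.sqrt(statistics.variance(x) / len(x)
                   + statistics.variance(y) / len(y))
    return (statistics.mean(x) - statistics.mean(y)) / se


mean_diff = statistics.mean(cancer) - statistics.mean(normal)
t_stat = welch_t(cancer, normal)
print(f"mean difference = {mean_diff:.1f}, Welch t = {t_stat:.2f}")
```

A positive statistic points in the direction of the hypothesis; whether it is large enough depends on the probability model assumed for the two populations.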

[Figure: probability distributions of systolic blood pressure for the population of normal patients and the population of cancer patients.]
Does the data support this hypothesis?

Image processing:

The same can be said about a weather map.

Data Analysis
Generally speaking, we perform one or more of the following tasks in data analysis (statistical inference):
Estimate the model
Hypothesis testing
Predictive analysis
Given the sample data, the objective is to make inferences about the population described by the probability model. All inferences are based on the assumed probability model.

[Figure: observed sample.]

[Figure: systolic blood pressure, normal vs. cancer.]

Computation of the posterior
There are two popular techniques for computing the posterior distribution:
1. Metropolis-Hastings algorithm
2. Gibbs sampler
These techniques can be used effectively for complex probability models and reasonable priors.
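To make the first technique concrete, here is a minimal random-walk Metropolis-Hastings sketch. The model is an assumption chosen only for illustration and does not come from the slides: the cancer SBP readings are treated as Normal(mu, SIGMA^2) with SIGMA known, and mu has a Normal prior. The constants `SIGMA`, `PRIOR_MEAN`, `PRIOR_SD` and the function names are hypothetical. This conjugate model has a closed-form posterior mean, which the sketch uses as a sanity check.

```python
import math
import random

# SBP readings of the cancer group from the first example.
data = [145, 165, 134, 120, 112, 156, 145, 133, 135, 120]

# Assumed model (illustrative only): data ~ Normal(mu, SIGMA^2), SIGMA known,
# with prior mu ~ Normal(PRIOR_MEAN, PRIOR_SD^2).
SIGMA, PRIOR_MEAN, PRIOR_SD = 15.0, 120.0, 20.0


def log_posterior(mu):
    """Log of prior times likelihood, up to an additive constant."""
    log_prior = -0.5 * ((mu - PRIOR_MEAN) / PRIOR_SD) ** 2
    log_lik = sum(-0.5 * ((x - mu) / SIGMA) ** 2 for x in data)
    return log_prior + log_lik


def metropolis_hastings(n_iter=20000, burn_in=2000, step_sd=5.0, seed=0):
    """Random-walk Metropolis-Hastings draws from the posterior of mu."""
    rng = random.Random(seed)
    mu = PRIOR_MEAN
    current = log_posterior(mu)
    draws = []
    for i in range(n_iter):
        proposal = mu + rng.gauss(0.0, step_sd)
        candidate = log_posterior(proposal)
        # Symmetric proposal: accept with probability min(1, posterior ratio).
        if math.log(1.0 - rng.random()) < candidate - current:
            mu, current = proposal, candidate
        if i >= burn_in:
            draws.append(mu)
    return draws


draws = metropolis_hastings()
mcmc_mean = sum(draws) / len(draws)

# Closed-form posterior mean for this conjugate model, used as a check.
precision = len(data) / SIGMA**2 + 1.0 / PRIOR_SD**2
exact_mean = (sum(data) / SIGMA**2 + PRIOR_MEAN / PRIOR_SD**2) / precision
print(f"MCMC mean = {mcmc_mean:.2f}, exact mean = {exact_mean:.2f}")
```

In realistic applications the posterior has no closed form, which is when samplers of this kind become useful; the step size and burn-in used here are arbitrary choices.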

Frequentist vs. Bayesian
Frequentist: All data information is contained in the likelihood function. The estimates are viewed in terms of how they behave on average. Estimates are generally obtained by maximizing the likelihood function. Techniques include Newton-Raphson, the EM algorithm, etc.
Bayesian: All data information is contained in the likelihood function and the prior. Estimates are viewed in terms of where they are located in the posterior. Estimates are obtained from the posterior. Techniques include the Gibbs sampler, Metropolis-Hastings, etc.

Analysis of Variance (ANOVA)
ANOVA is one of the most popular statistical tools for analyzing data.
[Diagram: a response variable Y linked to Factor 1, Factor 2, and Factor 3.]
Does Y (the response) depend on any of the factors?

Example 1: You are doing research on mpg (miles per gallon) for a brand of automobiles.
Question: What affects mpg?
[Diagram: mpg as the response, with wind speed, air temperature, and air moisture as factors.]
Do wind speed, air temperature, and air moisture affect mpg?

Example 2: Research question: Does blood pressure (BP) depend on weight and gender?
[Diagram: BP as the response, with weight and gender as factors.]

[Figure: scatter plot of BP against weight, with separate point clouds for female and male.]
There is variation in BP. Some is due to weight, and some is due to gender.
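The idea of splitting variation into parts can be sketched with a one-factor layout. The slides give no weight or gender data, so the sketch below reuses the SBP readings from the first example, with cancer status as the single factor; that substitution is an assumption made only to have real numbers. The function name `one_way_anova` is hypothetical.

```python
# SBP readings from the first example; the single factor is cancer status.
groups = {
    "cancer": [145, 165, 134, 120, 112, 156, 145, 133, 135, 120],
    "normal": [138, 120, 112, 110, 128, 134, 128, 109, 138, 140],
}


def one_way_anova(groups):
    """Return (SS_between, SS_within, F) for a one-factor layout."""
    all_values = [y for ys in groups.values() for y in ys]
    grand_mean = sum(all_values) / len(all_values)
    ss_between = 0.0
    ss_within = 0.0
    for ys in groups.values():
        group_mean = sum(ys) / len(ys)
        # Variation explained by the factor (group means vs. grand mean).
        ss_between += len(ys) * (group_mean - grand_mean) ** 2
        # Variation left over inside each group.
        ss_within += sum((y - group_mean) ** 2 for y in ys)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return ss_between, ss_within, f_stat


ss_between, ss_within, f_stat = one_way_anova(groups)
print(f"SS between = {ss_between:.1f}, SS within = {ss_within:.1f}, "
      f"F = {f_stat:.2f}")
```

The two sums of squares add up to the total variation around the grand mean. With two factors such as weight and gender, the same idea extends to one sum of squares per factor plus a residual.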

Multiple Hypotheses: Consider 1000 independent tests, each at a Type I error of $\alpha = 0.05$. Then, on average, 5% of the true null hypotheses would be falsely rejected. In other words, if 50 of the hypotheses were rejected, there is no guarantee that they were not all falsely rejected.
FWER: $m$ = number of hypotheses, $\pi = P(\text{one or more falsely rejected hypotheses}) = 1 - (1 - \alpha)^m$ for independent tests. To hold the FWER at $\pi$, test each hypothesis at $\alpha \approx \pi/m$ (Bonferroni correction).
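The familywise error rate for independent tests, and the effect of the Bonferroni correction on it, can be checked numerically. This is a small sketch; the function names are hypothetical.

```python
def fwer_independent(alpha, m):
    """P(at least one false rejection) for m independent true nulls."""
    return 1.0 - (1.0 - alpha) ** m


def bonferroni_level(pi, m):
    """Per-test level that keeps the FWER at or below pi."""
    return pi / m


m = 1000
uncorrected = fwer_independent(0.05, m)
corrected = fwer_independent(bonferroni_level(0.05, m), m)
print(f"FWER at alpha = 0.05 per test: {uncorrected:.6f}")
print(f"FWER at alpha = 0.05/{m} per test: {corrected:.6f}")
```

With 1000 tests at the 5% level, at least one false rejection is practically certain, while the corrected level keeps the familywise rate just under 5%.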

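If $m$ is large, $\alpha$ would be very small. Thus the power of detecting any true positive would be very small.
Sequential Bonferroni Corrections: Let $p_{(1)} \le p_{(2)} \le \dots \le p_{(m)}$ be the ordered p-values of independent tests with corresponding null hypotheses $H_{(1)}, H_{(2)}, \dots, H_{(m)}$.
Holm’s Method (Holm, 1979; Scand. J. Statist.): If $p_{(1)} > \pi/m$, accept all nulls. If $p_{(1)} \le \pi/m$, reject $H_{(1)}$; then, if $p_{(2)} > \pi/(m-1)$, accept the rest of the nulls. Continue until the first $j$ such that $p_{(j)} > \pi/(m-j+1)$. In that case reject $H_{(1)}, \dots, H_{(j-1)}$ and accept the rest of the nulls.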
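Holm's step-down procedure can be sketched in a few lines. The function name `holm` and the example p-values are made up for illustration; `level` plays the role of the familywise level.

```python
def holm(p_values, level=0.05):
    """Return booleans, True where Holm's step-down method rejects the null."""
    m = len(p_values)
    # Indices of the hypotheses, ordered from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for step, idx in enumerate(order):  # step = j - 1
        # Compare the j-th smallest p-value with level / (m - j + 1).
        if p_values[idx] > level / (m - step):
            break  # accept this null and all remaining ones
        reject[idx] = True
    return reject


p_values = [0.01, 0.04, 0.03, 0.005]  # made-up p-values
print(holm(p_values))
```

Here the two smallest p-values pass their thresholds (0.05/4 and 0.05/3), the third fails against 0.05/2, and the procedure stops there.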

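Simes Method (Biometrika, 1986): This method works in the opposite direction, starting from the largest p-value. If the largest p-value is below its critical value, reject all nulls. If not, but the next largest p-value is below its critical value, reject all the remaining nulls. Continue until the first p-value that falls below its critical value. In that case reject that null and all nulls with smaller p-values.
Note: Both Holm’s and Simes’s methods are designed to refine FWER control.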

False Discovery Rate (FDR): Benjamini and Hochberg (1995), JRSS B
When the number of hypotheses $m$ is very large (say, in the thousands), and no individual hypothesis is important in itself, the FWER criterion is not very useful, since it yields few discoveries. For example, in a microarray data analysis, the objective is to detect potential genes for future exploration. Here, no individual gene is important in itself. In such cases, tests with a controlled FWER would yield few discoveries.

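FDR = Expected proportion of false rejections.

                     Accept Null   Reject Null   Total
True Null                 U             V
True Alternatives         T             S
Total                   m - R           R          m

$\mathrm{FDR} = E[V/R] = E[V/R \mid R > 0]\,P(R > 0)$, where $V/R$ is taken to be 0 when $R = 0$.
Note that $\mathrm{FWER} = P(V > 0)$.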
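The cells of the table can be counted directly when the truth is known, as it is in a simulation. The sketch below is illustrative; the function names and the toy inputs are hypothetical.

```python
def confusion_counts(is_true_null, rejected):
    """Return (U, V, T, S) as laid out in the table above."""
    u = v = t = s = 0
    for null, rej in zip(is_true_null, rejected):
        if null and not rej:
            u += 1  # true null, accepted
        elif null and rej:
            v += 1  # true null, falsely rejected
        elif not null and not rej:
            t += 1  # true alternative, missed
        else:
            s += 1  # true alternative, correctly rejected
    return u, v, t, s


def false_discovery_proportion(v, r):
    """V / R, taken to be 0 when nothing is rejected."""
    return v / r if r > 0 else 0.0


# Toy example with made-up truth labels and decisions.
is_true_null = [True, True, False, False, True]
rejected = [False, True, True, False, False]
u, v, t, s = confusion_counts(is_true_null, rejected)
fdp = false_discovery_proportion(v, v + s)
print(u, v, t, s, fdp)
```

The FDR is the expectation of this proportion over repeated experiments, so a single data set yields only one realization of it.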

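Benjamini and Hochberg proved that the following procedure produces $\mathrm{FDR} \le q$ for a chosen level $q$: Let $k$ be the largest integer $i$ such that $p_{(i)} \le \frac{i}{m} q$, where $p_{(1)} \le \dots \le p_{(m)}$ are the ordered p-values; then reject all $H_{(i)}$, $i = 1, \dots, k$.
The result was proved under the assumption of independent test statistics. It was later extended to positively correlated test statistics by Benjamini and Yekutieli (2001, Ann. Stat.).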
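The Benjamini-Hochberg step-up rule can be sketched as follows. The function name and the p-values are made up for illustration, and `q` denotes the target FDR level.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Return booleans, True where the step-up rule rejects the null."""
    m = len(p_values)
    # Indices of the hypotheses, ordered from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    k = 0
    for rank, idx in enumerate(order, start=1):
        # Keep the largest rank i with p_(i) <= (i / m) * q.
        if p_values[idx] <= rank * q / m:
            k = rank
    reject = [False] * m
    for idx in order[:k]:
        reject[idx] = True
    return reject


p_values = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205]
print(benjamini_hochberg(p_values))

# Step-up behaviour: only the largest p-value meets its threshold,
# yet every hypothesis is rejected.
print(benjamini_hochberg([0.04, 0.03, 0.02, 0.049]))
```

The second call shows what distinguishes a step-up rule from a step-down rule such as Holm's: the search runs over all ranks, so one p-value meeting its threshold carries every smaller p-value with it.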

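Bayesian Interpretation (Storey, 2003, Ann. Stat.)
Define the positive FDR as $\mathrm{pFDR} = E[V/R \mid R > 0]$. Suppose the pairs $(T_i, H_i)$, $i = 1, \dots, m$, of test statistics and hypothesis indicators ($H_i = 0$ when the $i$-th null is true) are independently and identically distributed. Then, for a rejection region $\Gamma$, $\mathrm{pFDR}(\Gamma) = P(H = 0 \mid T \in \Gamma)$.
Note: pFDR is a posterior version of the Type I error.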

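Directional Hypothesis Problem (Three-decision problem): Suppose the null hypothesis is rejected, but it is also important to find the direction of the alternative. So the problem is to find the subsets of hypotheses whose alternatives lie in each of the two directions.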

Example: Gene selection
When genes are altered under an adverse condition, such as cancer, the affected genes show under- or over-expression in a microarray. The objective is to find the genes with under-expression and the genes with over-expression.

Directional Error (Type III error): The Type III error is defined as P(selection of the false direction if the null is rejected). The traditional method does not control the directional error. See, for example, Sarkar and Zhou (2008, JSPI); Finner (1999, AS); Shaffer (2002, Psychological Methods); Lehmann (1952, AMS; 1957, AMS).
The main point of these works is that if the objective is to find the true direction of the alternative after rejecting the null, then the Type III error must be controlled instead of the Type I error.
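To make the three-decision setting concrete, the sketch below applies a plain two-sided z-test to each hypothesis and reads the direction off the sign of the statistic. This is the kind of traditional rule the slide says does not control the directional error; it is not the method of any of the papers cited above. The function names, z statistics, and true signs are all hypothetical.

```python
from statistics import NormalDist


def three_decision(z, alpha=0.05):
    """Return -1 (under), 0 (no rejection), or +1 (over) for a z statistic."""
    cutoff = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    if z > cutoff:
        return 1
    if z < -cutoff:
        return -1
    return 0


def directional_errors(decisions, true_signs):
    """Count rejections of false nulls that declare the wrong direction."""
    return sum(
        1
        for d, s in zip(decisions, true_signs)
        if d != 0 and s != 0 and d != s
    )


# Made-up z statistics and true directions (0 marks a true null).
z_stats = [2.5, -3.0, 0.4, -2.2]
true_signs = [1, -1, 0, 1]
decisions = [three_decision(z) for z in z_stats]
print(decisions, directional_errors(decisions, true_signs))
```

In the toy data the last hypothesis is rejected in the negative direction although its true direction is positive, which is a Type III error rather than a Type I error.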

Bayesian Decision Theoretic Framework

Bayes Rule