Published by Violet Ryder. Modified over 9 years ago.
1
www.chrisbilder.com 1 of 31
Turning data into knowledge to solve real world problems
Christopher R. Bilder, Ph.D.
Department of Statistics
University of Nebraska-Lincoln
www.chrisbilder.com
2
15 years ago…
The year is 1990
Music – U2
George Bush is president
TV – The Simpsons
Millard South
–Senior year
–Big hair
–In the middle of winning state titles in basketball for 3 out of 4 years (1988, 1989, 1991)
What am I going to major in at college?
–Calculus I
–No AP Statistics!
3
15 years ago…
UNO (1990 – 1994)
–Math undergraduate major – What can you do with a degree?
–Planned to be an actuary
–Hypothesis testing in a statistics course (junior year)
    Use for decision making!
    Scientifically prove a hypothesis or statement
Kansas State University for graduate school (1994 – 2000)
–Statistics graduate major in Department of Statistics
–Master of Science (MS) and Doctor of Philosophy (PhD)
Oklahoma State University faculty (2000 – 2003)
–Department of Statistics
UNL faculty (2003 – now)
–NEW Department of Statistics
4
Purpose
Tell you a little about the statistical science
Turning data into knowledge to solve real world problems
–3 actual examples
AP statistics exam
Website (www.chrisbilder.com/statistics) for more information
5
Grocery store prices
Undergraduate teaching example for a course like AP Statistics
How could you determine which grocery store, Super Wal-Mart or Baker’s, has lower average prices?
–Paired (dependent) two-sample hypothesis test for $\mu_{\text{Wal-Mart}} - \mu_{\text{Baker’s}}$
–Sample the same items at each store
6
Grocery store prices
Undergraduate teaching example for a course like AP Statistics
How could you determine which grocery store, Dillon’s or Food-4-Less in Manhattan, KS, has lower average prices?
–Paired (dependent) two-sample hypothesis test for $\mu_{\text{Dillon’s}} - \mu_{\text{Food-4-Less}}$
–Sample the same items at each store
    Only cereals from Fall 1998
7
Grocery store prices
Sample: [data table not preserved in this transcript]
8
Grocery store prices
Do you think there are mean differences?
[Plot of the sampled prices with the 25%, 50%, and 75% quantiles marked]
9
Grocery store prices
Paired two-sample hypothesis test
–$H_0: \mu_{\text{Dillon’s}} - \mu_{\text{Food-4-Less}} = 0$
    $H_a: \mu_{\text{Dillon’s}} - \mu_{\text{Food-4-Less}} \neq 0$
–t = 4.77, p-value = 0.0002, 95% C.I.: $0.1644 < \mu_{\text{Dillon’s}} - \mu_{\text{Food-4-Less}} < 0.4274$
–Reject equal mean prices
If price were the only consideration, which store should one shop at?
Assumptions
–Normal populations
–The sample was taken in 1998; what about now?
–Finite populations
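The paired test on this slide works on the item-by-item price differences. The sketch below shows that calculation using only the Python standard library. The prices are hypothetical, made up for illustration, and are not the 1998 cereal sample; the function name `paired_t_statistic` is likewise an assumption of this sketch.

```python
import math
import statistics

# Hypothetical shelf prices (dollars) for the same five cereals at each store.
# These values are invented for illustration; they are NOT the 1998 sample.
dillons = [3.19, 2.99, 3.49, 2.59, 3.99]
food4less = [2.99, 2.59, 3.19, 2.49, 3.49]


def paired_t_statistic(x, y):
    """Return (mean difference, standard error, t statistic, df) for paired data."""
    if len(x) != len(y):
        raise ValueError("paired samples must have the same length")
    diffs = [a - b for a, b in zip(x, y)]  # one difference per item
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    # statistics.stdev is the sample standard deviation (divides by n - 1)
    se = statistics.stdev(diffs) / math.sqrt(n)
    return mean_diff, se, mean_diff / se, n - 1


mean_diff, se, t_stat, df = paired_t_statistic(dillons, food4less)
print(f"mean difference = {mean_diff:.4f}, t = {t_stat:.2f}, df = {df}")
```

The t statistic is then compared with a t distribution on n − 1 degrees of freedom to obtain the p-value; that step needs a t-distribution routine, which the standard library does not provide, so it is left out here.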
10
Placekicking
The use of the statistical science in sports
Find a model to estimate the probability of success for placekicks (field goals, PATs) in the NFL
Video
–January 7, 1996
–Playoff game
–Indianapolis Colts 10, Kansas City Chiefs 7
–Lin Elliott of KC will attempt a 42-yard field goal to tie the game and send it into overtime
–Field goal video
11
Placekicking
What factors affect the probability of success for NFL placekicks?
–Distance
–Pressure – How do you quantitatively measure?
–Wind
–Grass vs. artificial turf
–Dome vs. outdoor stadium
Collect sample of >1,700 placekicks during the 1995 NFL season
Find the best logistic regression model of the form
$p = \dfrac{e^{\beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k}}{1 + e^{\beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k}}$
where
–p is the probability of success
–$x_i$ for i=1,…,k are independent variables
–$\beta_i$ measures the effect of $x_i$ on p for i=1,…,k
–$e \approx 2.718$; ln(e) = 1
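A logistic regression model turns a linear combination of the independent variables into a probability between 0 and 1. The sketch below evaluates that standard form. The function name and the coefficient values are hypothetical, chosen only to show the shape of the curve; they are not the estimates fitted to the 1995 placekick data.

```python
import math


def success_probability(intercept, coefs, xs):
    """Evaluate p = e^eta / (1 + e^eta), where eta = intercept + sum(coef_i * x_i)."""
    if len(coefs) != len(xs):
        raise ValueError("need exactly one coefficient per independent variable")
    eta = intercept + sum(b * x for b, x in zip(coefs, xs))
    # 1 / (1 + e^(-eta)) is algebraically the same as e^eta / (1 + e^eta)
    return 1.0 / (1.0 + math.exp(-eta))


# Hypothetical values for illustration only: an intercept and a distance effect.
b0, b_distance = 5.0, -0.1
for distance in (20, 50, 60):
    p = success_probability(b0, [b_distance], [distance])
    print(f"distance = {distance} yards -> estimated p = {p:.3f}")
```

A negative distance coefficient makes the estimated probability fall as the kick gets longer, which is the qualitative behavior one would expect for placekicks.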
12
Placekicking
The $\beta_i$’s are parameters which are estimated using “iteratively reweighted least squares”
Estimated model
–Change: lead change = 1, non-lead change = 0
–Distance: distance in yards
–PAT: point after touchdown = 1, field goal = 0
–Wind: windy (speed > 15 MPH) = 1, non-windy = 0
What is the estimated probability of success for Elliott’s field goal?
–Conditions:
–Estimated probability of success:
–90% confidence interval for probability of success: 0.6298 < p < 0.7402
13
Estimated probability of success for a field goal (PAT=0)
[Plot; the point at a distance of 42 yards, with estimated probability 0.685, is marked]
14
HCV prevalence
Hepatitis C (HCV)
–Viral infection that causes cirrhosis and cancer of the liver
Questions:
–How can people be tested in a cost effective and timely manner?
    Blood bank setting
–What is the probability a person has HCV? What proportion of people in a population is afflicted with HCV?
    Prevalence in a population
Individual testing
–Each blood sample is tested individually
–Problems:
    Costly
    Time
[Diagram: each individual sample tests + or -]
15
HCV prevalence
Group testing
–Pool the blood samples together to form n groups of size s
–If the GROUP sample is negative, then none of the s people has the disease
–If the GROUP sample is positive, then at least ONE of the s people has the disease
    May want to determine who in the group has the disease
–Strategy works well when prevalence of a disease is small
[Diagram: Group 1, Group 2, …, Group n, each testing + or -]
16
HCV prevalence
Notation
–p = probability an INDIVIDUAL is HCV positive (prevalence)
–$\theta$ = probability a GROUP is HCV positive
–s = group size
–n = number of groups
–T = random variable denoting the number of positive GROUPS
    T has a binomial distribution with “n trials” and “$\theta$ as the probability of success”
17
HCV prevalence
How can we estimate p?
–We observe information about the groups, not individuals!
–Estimate $\theta$, the probability a group is positive, with $\hat{\theta}$ = # positive groups / # of groups
–$\theta$ = P(group is positive)
    = P(at least one individual is positive)
    = 1 – P(no individuals are positive) using complement rule
    = 1 – P(all individuals are negative)
    = $1 - (1 - p)^s$ since p = P(individual is positive), there are s individuals per group, and individuals are assumed independent
–$p = 1 - (1 - \theta)^{1/s}$
–Then $\hat{p} = 1 - (1 - \hat{\theta})^{1/s}$
18
HCV prevalence
Estimation of HCV prevalence in Xuzhou City, China
–Data from Liu et al. (Transfusion, 1997)
–1,875 blood donors screened for HCV
    There were 42 positives
–In order to test the usefulness of group testing, blood samples were also pooled
    n = 375 groups
    s = 5 individuals per group
    t = 37 positive groups
–Estimates of p, probability individual is positive
    Using individual data: 42/1875 = 0.0224
    Using group data: $1 - (1 - 37/375)^{1/5} \approx 0.0206$
–Which is easier and more cost effective?
    1875 tests using individual testing
    375 tests using group testing
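The two prevalence estimates for the Xuzhou City data can be checked with a few lines of arithmetic. The sketch below applies the group testing relation p = 1 − (1 − group positive probability)^(1/s) to the counts on this slide; the function name `prevalence_from_groups` is an assumption of this sketch, not something from the original presentation.

```python
def prevalence_from_groups(positive_groups, n_groups, group_size):
    """Estimate individual prevalence p from pooled (group) test results.

    Uses p = 1 - (1 - theta)^(1/s), with theta estimated by the
    proportion of groups that tested positive.
    """
    theta_hat = positive_groups / n_groups
    return 1.0 - (1.0 - theta_hat) ** (1.0 / group_size)


# Counts reported on the slide (Liu et al., Transfusion, 1997)
p_individual = 42 / 1875                      # 1,875 individual tests
p_group = prevalence_from_groups(37, 375, 5)  # 375 group tests

print(f"individual testing: {p_individual:.4f}")
print(f"group testing:      {p_group:.4f}")
```

The group-based estimate comes out close to the individual-based one while using one fifth as many tests, which is the point of the comparison on the slide.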
19
HCV prevalence
New research – MS/PhD research
–What factors could affect p?
–Include independent variables to help model p
–Problem: Do not have the individual outcomes
–After a group tests positive, how can you find which individuals have the disease?
    Use the model to help decide whom to retest after a positive group
–Multiple diseases
    HCV
    HIV
    Other disease
    Simultaneously model
20
HCV prevalence
Multiple vector transfer designs
–Swallow (Phytopathology, 1985)
–Want to estimate the probability an insect vector transfers a pathogen (virus, bacteria, etc.) to a plant
    Brown planthopper
    Whitebacked planthopper
21
HCV prevalence
Multiple vector transfer designs (continued)
[Diagram: greenhouse with enclosed test plants; each planthopper either transmits the virus or does not transmit it, and each plant is labeled y=0 or y=1]
y = 0 if plant is negative, 1 if plant is positive
T = number of plants with disease
22
Why statistics?
Statistics is used in many diverse areas!
–Statistics is the “science of science”
–Florence Nightingale quote: “the most important science in the whole world: for upon it depends the practical application of every other science and of every art: the one science essential to all political and social administration, all education, all organization based on experience, for it only gives results of our experience.”
Take statistics courses in college!
–Of course, I want you to consider coming to UNL!
–Statistics is mainly a graduate discipline, so there is no undergraduate major at UNL
–Undergraduate minor in statistics can be useful for many majors
–Most statisticians have an undergraduate degree (Bachelor of Science) in math
23
Why statistics?
Where do statisticians work?
–Pharmaceutical and medical research – Pfizer, Merck, medical centers
–Marketing – Target, Hallmark
–Government research labs – INEEL, Los Alamos, Sandia, Argonne
–Agriculture – Pioneer Hi-Bred
–Consulting firms – Quintiles
–In Nebraska – ConAgra, Gallup, First National Bank, MDS Pharma, Experian, UNMC and Creighton medical center, various universities, Pfizer, Acton International, Nebraska state agencies, Union Pacific
Everyone that I have known has had a job offer before they graduated!
How many statisticians are there? – 20,000
24
Why statistics?
Salaries
–Non-academic starting (2003 American Statistical Association survey) [salary table not preserved in this transcript]
    Survey response rate was 23.5%; see salary surveys at the American Statistical Association’s website
Background needed
Strong in mathematics and using computers
–Majority of statisticians have Bachelor’s degrees in mathematics
    Good with calculus
    Applied math courses
    Take at least one statistics course
    Comfortable with using software packages
–To actually be a “statistician”, usually need to go to graduate school to get an MS or PhD in statistics
    Financial support
    Graduate Teaching Assistantship
25
Why statistics?
What courses to take next in college?
–AP statistics equivalent to a one semester introductory statistics course without calculus
    UNL: STAT 218 (Introduction to Statistics)
    UNO: MATH/STAT 3000 (Statistical Methods I); Business Administration 2130 (Principles of Business Statistics)
–Theory – 2 semester sequence using calculus I-III
    UNL: STAT 462 (Distribution Theory) and STAT 463 (Statistical Inference)
    UNO: MATH 4740 and 4750 (Intro. to Probability and Statistics I and II)
–Applications
    UNL: STAT 450 (Introduction to Regression Analysis) or STAT 412 (Introduction to Experimental Design)
    UNO: MATH/STAT 3010 (Statistical Methods II); Business Administration 3140 (Business Statistical Applications)
26
Why statistics?
Other recommended UNL classes (undergraduate)
–MATH 340 Numerical Analysis
–MATH 314 Applied Linear Algebra
–MATH 325 Elementary Analysis and MATH 425 Mathematical Analysis
    Helpful if go on for a PhD
–Computer science programming courses
Other recommended UNO classes (undergraduate)
–MATH 3300 Numerical Methods
–MATH 4050 Linear Algebra
–MATH 4760 Topics in Modeling
–MATH 4230 and 4240 Mathematical Analysis I and II
    Helpful if go on for a PhD
–Computer science programming courses
27
AP Statistics
Grading done in Lincoln!
–State fair grounds
–Grade the free response section of about 66,000 student exams (2004)
–250 AP statistics high school teachers and college professors
–June 13 to June 19, 2005
–8:30AM – 4:45PM EVERY DAY
28
AP Statistics
I graded in 2002
–About 900 problems graded!
–16 graders in a room split into two groups
–Each group has a leader
    Answers questions
    CHECKS some of your grading!
–Paid $1,450
    Stay in dorms
    Free meals and snacks
    Grading is not fun
–Evening activities
–Discussions on how to teach introductory statistics better
The grading rubric
–An outline of how to grade a problem that must be followed!
–These are put together before graders arrive by examining a sample set of tests
29
AP Statistics
Question #6 in 2002
–4 parts – (a), (b), (c), (d)
–Each part is graded as
    E = Essentially correct
    P = Partially correct
    I = Incomplete
–Graders are given a “conversion” table to show how to convert the scores into a numerical score
    4 = Complete response
    3 = Substantial response
    2 = Developing response
    1 = Minimal response
    0 = No credit
–1 point given to an E, 0.5 points given to a P, 0 points given to an I
    Round up if (a) or (c) has the correct interpretation
–Example given at end of PowerPoint file
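The conversion from part grades to a 0 to 4 score can be sketched as a small function. The point values and the round-up condition come from the slide. Rounding down when that condition is not met is an assumption of this sketch, since the slide does not say what happens otherwise, and the function name `question_score` is hypothetical.

```python
import math

# Points per part, as stated on the slide
POINTS = {"E": 1.0, "P": 0.5, "I": 0.0}


def question_score(part_grades, correct_interpretation):
    """Convert the grades for parts (a)-(d) into a whole-number score from 0 to 4.

    correct_interpretation: True if part (a) or (c) has the correct
    interpretation, in which case a half point is rounded up.
    ASSUMPTION: a half point is rounded down otherwise.
    """
    total = sum(POINTS[grade] for grade in part_grades)
    return math.ceil(total) if correct_interpretation else math.floor(total)


# Example: E, P, E, I totals 2.5 points
print(question_score(["E", "P", "E", "I"], correct_interpretation=True))
print(question_score(["E", "P", "E", "I"], correct_interpretation=False))
```

Whole-number totals are unaffected by the rounding rule; it only matters when an odd number of parts received a P.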
30
For more information…
E-mail me at chris@chrisbilder.com
Website: www.chrisbilder.com/statistics
–This PowerPoint presentation (including example question)
–Links to
    Introductory information about being a statistician
    Jobs (including internships)
    Salary information
    List of all Departments of Statistics
    Professional societies
    Websites for courses that others and I teach
    Newspaper and magazine articles about statistical applications
31
Turning data into knowledge to solve real world problems
Christopher R. Bilder, Ph.D.
Department of Statistics
University of Nebraska-Lincoln
www.chrisbilder.com
32
Statistics at UNL
[Map showing the Department of Statistics and 33rd St.]
33
AP Statistics
34
AP Statistics
35
AP Statistics
May actually be an E?
36
AP Statistics
37
AP Statistics
38
Estimated probability of success for a field goal (PAT=0)