# Turning data into knowledge to solve real world problems

## Presentation transcript

### Turning data into knowledge to solve real world problems

Christopher R. Bilder, Ph.D., Department of Statistics, University of Nebraska-Lincoln, www.chrisbilder.com

### 15 years ago…

- The year is 1990
- Music: U2
- George Bush is president
- TV: *The Simpsons*
- Millard South
  - Senior year
  - Big hair
  - In the middle of winning state basketball titles in 3 out of 4 years (1988, 1989, 1991)
- What am I going to major in at college?
  - Calculus I
  - No AP Statistics!

### 15 years ago… (continued)

- UNO (1990-1994)
  - Math undergraduate major: what can you do with the degree?
  - Planned to be an actuary
  - Hypothesis testing in a statistics course (junior year)
    - Use data for decision making!
    - Scientifically test a hypothesis or statement
- Kansas State University for graduate school (1994-2000)
  - Statistics graduate major in the Department of Statistics
  - Master of Science (MS) and Doctor of Philosophy (PhD)
- Oklahoma State University faculty (2000-2003)
  - Department of Statistics
- UNL faculty (2003-now)
  - NEW Department of Statistics

### Purpose

- Tell you a little about the science of statistics
- Turning data into knowledge to solve real world problems
  - 3 actual examples
- The AP Statistics exam
- Website (www.chrisbilder.com/statistics) for more information

### Grocery store prices

- Undergraduate teaching example for a course like AP Statistics
- How could you determine which grocery store, Super Wal-Mart or Baker's, has lower average prices?
  - Paired (dependent) two-sample hypothesis test for $\mu_{\text{Wal-Mart}} - \mu_{\text{Baker's}}$
  - Sample the same items at each store

### Grocery store prices

- Undergraduate teaching example for a course like AP Statistics
- How could you determine which grocery store, Dillon's or Food-4-Less in Manhattan, KS, has lower average prices?
  - Paired (dependent) two-sample hypothesis test for $\mu_{\text{Dillon's}} - \mu_{\text{Food-4-Less}}$
  - Sample the same items at each store
  - Only cereals, sampled in Fall 1998

### Grocery store prices

Sample:

### Grocery store prices

- Do you think there are mean differences?

Figure: sampled prices, with the 25%, 50%, and 75% quartiles marked.

### Grocery store prices

- Paired two-sample hypothesis test
  - $H_0: \mu_{\text{Dillon's}} - \mu_{\text{Food-4-Less}} = 0$ vs. $H_a: \mu_{\text{Dillon's}} - \mu_{\text{Food-4-Less}} \neq 0$
  - $t = 4.77$, p-value $= 0.0002$, 95% C.I.: $0.1644 < \mu_{\text{Dillon's}} - \mu_{\text{Food-4-Less}} < 0.4274$
  - Reject $H_0$: there is sufficient evidence that the mean prices are not equal
- If price were the only consideration, at which store should one shop?
- Assumptions
  - The population of price differences is normally distributed
  - The sample was taken in 1998; what about now?
  - Finite populations
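The test on this slide can be reproduced from raw paired prices. Below is a minimal, standard-library-only Python sketch of the paired t-test: it forms one difference per item, computes $t = \bar d / (s_d/\sqrt{n})$ with $n-1$ degrees of freedom, and obtains the p-value and critical value by numerically integrating the t density (Simpson's rule plus bisection) rather than calling a statistics package. The function names and the example prices are hypothetical; the slide's actual cereal data are not in the transcript.

```python
import math
import statistics


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)


def t_two_sided_p(t, df, steps=4000):
    """P(|T| >= |t|) via Simpson's rule on [0, |t|]; steps must be even."""
    b = abs(t)
    h = b / steps
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3  # P(0 <= T <= |t|)
    return max(0.0, 1 - 2 * area)


def t_critical(df, conf=0.95, upper=100.0):
    """Two-sided critical value by bisection (assumes it lies below `upper`)."""
    lo, hi = 0.0, upper
    for _ in range(50):
        mid = (lo + hi) / 2
        if t_two_sided_p(mid, df) > 1 - conf:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def paired_t_test(x, y, conf=0.95):
    """Paired t-test of H0: mu_x - mu_y = 0 against a two-sided alternative."""
    d = [a - b for a, b in zip(x, y)]  # one price difference per item
    n = len(d)
    mean_d = statistics.mean(d)
    se = statistics.stdev(d) / math.sqrt(n)
    t = mean_d / se
    df = n - 1
    crit = t_critical(df, conf)
    return {
        "mean_diff": mean_d,
        "t": t,
        "df": df,
        "p_value": t_two_sided_p(t, df),
        "ci": (mean_d - crit * se, mean_d + crit * se),
    }


# Hypothetical prices (dollars) for the same 5 items at two stores
store_a = [3.49, 2.99, 4.19, 3.79, 2.59]
store_b = [3.19, 2.89, 3.89, 3.39, 2.49]
print(paired_t_test(store_a, store_b))
```

As a consistency check on the slide's numbers: the interval's midpoint is $(0.1644 + 0.4274)/2 = 0.2959$, so the standard error is about $0.2959/4.77 \approx 0.062$, and the half-width $0.1315/0.062 \approx 2.12$ matches the 95% critical value for 16 degrees of freedom, which suggests 17 paired cereals.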

### Placekicking

- The use of statistics in sports
- Find a model to estimate the probability of success for placekicks (field goals and PATs) in the NFL
- Video of the field goal
  - January 7, 1996, playoff game
  - Indianapolis Colts 10, Kansas City Chiefs 7
  - Lin Elliott of KC will attempt a 42-yard field goal to tie the game and send it into overtime

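### Placekicking

- What factors affect the probability of success for NFL placekicks?
  - Distance
  - Pressure: how do you measure it quantitatively?
  - Wind
  - Grass vs. artificial turf
  - Dome vs. outdoor stadium
- Collect a sample of more than 1,700 placekicks from the 1995 NFL season
- Find the best logistic regression model of the form

$$\ln\left(\frac{p}{1-p}\right) = \beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k \quad\Longleftrightarrow\quad p = \frac{e^{\beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k}}{1 + e^{\beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k}}$$

where:

- $p$ is the probability of success
- $x_i$ for $i = 1, \ldots, k$ are independent variables
- $\beta_i$ measures the effect of $x_i$ on $p$ for $i = 1, \ldots, k$
- $e \approx 2.718$; $\ln(e) = 1$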
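To make the model form concrete, here is a small Python sketch of the logit and its inverse for a linear predictor $\beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k$. The coefficients are hypothetical placeholders, not the fitted placekicking model. The example also shows the usual reading of $\beta_i$: a one-unit increase in $x_i$ multiplies the odds $p/(1-p)$ by $e^{\beta_i}$.

```python
import math


def logit(p):
    """Log-odds: ln(p / (1 - p)), defined for 0 < p < 1."""
    return math.log(p / (1 - p))


def expit(eta):
    """Inverse logit: p = e^eta / (1 + e^eta)."""
    return 1.0 / (1.0 + math.exp(-eta))


def success_probability(betas, xs):
    """p for the linear predictor beta_0 + beta_1*x_1 + ... + beta_k*x_k.

    betas = [beta_0, beta_1, ..., beta_k]; xs = [x_1, ..., x_k].
    """
    if len(betas) != len(xs) + 1:
        raise ValueError("need one more beta than x values (the intercept)")
    eta = betas[0] + sum(b * x for b, x in zip(betas[1:], xs))
    return expit(eta)


# Hypothetical coefficients for a distance-only model; NOT estimates
# from the placekicking data.
hypothetical_betas = [5.0, -0.1]
p42 = success_probability(hypothetical_betas, [42])
p43 = success_probability(hypothetical_betas, [43])
odds_ratio = (p43 / (1 - p43)) / (p42 / (1 - p42))
print(round(p42, 3), round(odds_ratio, 4))  # odds ratio equals e^(-0.1)
```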

### Placekicking

- The $\beta_i$'s are parameters, estimated using "iteratively reweighted least squares"
- Estimated model, with variables:
  - Change: lead-change kick = 1, non-lead-change kick = 0
  - Distance: distance in yards
  - PAT: point after touchdown = 1, field goal = 0
  - Wind: windy (wind speed > 15 MPH) = 1, non-windy = 0
- What is the estimated probability of success for Elliott's field goal?
  - Conditions: 42-yard field goal (distance = 42, PAT = 0)
  - Estimated probability of success:
  - 90% confidence interval for the probability of success: $0.6298 < p < 0.7402$
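A minimal sketch of how iteratively reweighted least squares can estimate the $\beta_i$'s, written for a single predictor so that each weighted least-squares step is a 2x2 solve. The kick data are hypothetical (not the 1995 NFL sample), and a real analysis would use a statistical package's logistic regression routine. For the logit link, each IRLS step coincides with a Newton-Raphson step on the log-likelihood.

```python
import math


def expit(eta):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-eta))


def fit_logistic_irls(x, y, max_iter=25, tol=1e-10):
    """Fit logit(p) = b0 + b1*x by iteratively reweighted least squares.

    Each iteration solves (X'WX) delta = X'(y - p) with weights
    w = p(1 - p), then updates (b0, b1) by delta.
    """
    b0 = b1 = 0.0
    for _ in range(max_iter):
        s00 = s01 = s11 = g0 = g1 = 0.0
        for xi, yi in zip(x, y):
            p = expit(b0 + b1 * xi)
            w = p * (1.0 - p)
            s00 += w
            s01 += w * xi
            s11 += w * xi * xi
            g0 += yi - p
            g1 += (yi - p) * xi
        det = s00 * s11 - s01 * s01
        d0 = (s11 * g0 - s01 * g1) / det  # 2x2 inverse applied to the score
        d1 = (s00 * g1 - s01 * g0) / det
        b0, b1 = b0 + d0, b1 + d1
        if abs(d0) + abs(d1) < tol:
            break
    return b0, b1


# Hypothetical kicks (NOT the 1995 NFL data): distance in yards, 1 = success
distance = [20, 25, 30, 35, 40, 45, 50, 55, 20, 30, 40, 50]
success = [1, 1, 1, 1, 1, 0, 1, 0, 1, 0, 0, 0]
b0, b1 = fit_logistic_irls(distance, success)
print(b0, b1, expit(b0 + b1 * 42))
```

At convergence the score equations $\sum (y_i - \hat p_i) = 0$ and $\sum (y_i - \hat p_i) x_i = 0$ hold, which is a handy check on any fit. With perfectly separated data the estimates diverge, so the example deliberately includes distances where kicks were both made and missed.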

### Estimated probability of success for a field goal (PAT=0)

Figure: estimated probability of success for a field goal (PAT = 0) by distance; at 42 yards the estimated probability is 0.685.

### HCV prevalence

- Hepatitis C (HCV)
  - Viral infection that can cause cirrhosis and cancer of the liver
- Questions
  - How can people be tested in a cost-effective and timely manner? (blood bank setting)
  - What is the probability a person has HCV? What proportion of people in a population is infected with HCV? (prevalence in a population)
- Individual testing
  - Each blood sample is tested individually
  - Problems: cost and time

### HCV prevalence

- Group testing
  - Pool the blood samples together to form $n$ groups of size $s$
  - If the GROUP sample is negative, then none of the $s$ people has the disease
  - If the GROUP sample is positive, then at least ONE of the $s$ people has the disease; we may want to determine who in the group has it
  - The strategy works well when the prevalence of a disease is small

Figure: blood samples pooled into Group 1, Group 2, ..., Group $n$, each group testing + or -.
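Why does pooling pay off when prevalence is small? One way to quantify it, assuming Dorfman's two-stage scheme (test each pool, then retest every member of a positive pool individually) and an assay with no testing error, neither of which the slides specify: the expected number of tests per person is $1/s + 1 - (1-p)^s$. The Python sketch below evaluates that expression and searches for the best group size; the function names are hypothetical.

```python
def group_positive_prob(p, s):
    """theta = P(a group of s independent individuals tests positive)."""
    return 1 - (1 - p) ** s


def expected_tests_per_person(p, s):
    """Two-stage (Dorfman) pooling with a perfect assay: each group of s
    gets one test, and every member of a positive group is retested."""
    if s < 2:
        raise ValueError("pooling needs group size s >= 2")
    return 1 / s + group_positive_prob(p, s)


for prevalence in (0.01, 0.05, 0.20, 0.50):
    # Search group sizes 2..30 for the fewest expected tests per person
    best_s = min(range(2, 31), key=lambda s: expected_tests_per_person(prevalence, s))
    print(prevalence, best_s, round(expected_tests_per_person(prevalence, best_s), 3))
```

Values below 1 mean pooling needs fewer tests than individual testing; at high prevalence the expression exceeds 1 and pooling loses its advantage.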

### HCV prevalence

- Notation
  - $p$ = probability an INDIVIDUAL is HCV positive (prevalence)
  - $\theta$ = probability a GROUP is HCV positive
  - $s$ = group size
  - $n$ = number of groups
  - $T$ = random variable denoting the number of positive GROUPS; $T$ has a binomial distribution with $n$ trials and probability of success $\theta$

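### HCV prevalence

- How can we estimate $p$?
  - We observe information about the groups, not the individuals!
  - Estimate $\theta$ with $\hat{\theta} = \dfrac{\text{number of positive groups}}{\text{number of groups}} = \dfrac{t}{n}$, where $t$ is the observed value of $T$
  - Relate $\theta$ to $p$, assuming individuals are independent:

$$\begin{aligned}
\theta &= P(\text{group is positive}) = P(\text{at least one individual is positive}) \\
&= 1 - P(\text{no individuals are positive}) \quad \text{(complement rule)} \\
&= 1 - P(\text{all individuals are negative}) \\
&= 1 - (1 - p)^s \quad \text{(since } p = P(\text{individual is positive}) \text{ and there are } s \text{ individuals per group)}
\end{aligned}$$

- Solving for $p$: $p = 1 - (1 - \theta)^{1/s}$
- Then $\hat{p} = 1 - (1 - \hat{\theta})^{1/s}$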
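The complement-rule step can be checked by simulation. This Python sketch (function names are hypothetical) draws groups of $s$ independent individuals, each positive with probability $p$, and compares the observed fraction of positive groups with $1 - (1-p)^s$; it also inverts the relation to recover $p$ from $\theta$.

```python
import random


def theta_from_p(p, s):
    """theta = 1 - (1 - p)^s (complement rule, independent individuals)."""
    return 1 - (1 - p) ** s


def p_from_theta(theta, s):
    """Invert the relation: p = 1 - (1 - theta)^(1/s)."""
    return 1 - (1 - theta) ** (1 / s)


def simulate_theta(p, s, n_groups, seed=1):
    """Monte Carlo estimate of P(group positive) with independent members."""
    rng = random.Random(seed)
    positive_groups = 0
    for _ in range(n_groups):
        # A group is positive if at least one of its s members is positive
        if any(rng.random() < p for _ in range(s)):
            positive_groups += 1
    return positive_groups / n_groups


print(simulate_theta(0.05, 5, 100_000), theta_from_p(0.05, 5))
```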

### HCV prevalence

- Estimation of HCV prevalence in Xuzhou City, China
  - Data from Liu et al. (Transfusion, 1997)
  - 1,875 blood donors were screened for HCV; there were 42 positives
  - To test the usefulness of group testing, blood samples were also pooled: $n = 375$ groups, $s = 5$ individuals per group, $t = 37$ positive groups
- Estimates of $p$, the probability an individual is positive
  - Using individual data: $42/1875 = 0.0224$
  - Using group data: $\hat{p} = 1 - (1 - 37/375)^{1/5} \approx 0.0206$
- Which is easier and more cost-effective?
  - 1,875 tests using individual testing
  - 375 tests using group testing
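The two Xuzhou City estimates can be recomputed directly from the counts on the slide. A short Python sketch (function names are illustrative, not from the original analysis):

```python
def estimate_p_individual(positives, tested):
    """Prevalence estimate from individual testing: positives / tested."""
    return positives / tested


def estimate_p_group(t, n, s):
    """Group-testing estimate p_hat = 1 - (1 - t/n)^(1/s)."""
    theta_hat = t / n  # proportion of positive groups
    return 1 - (1 - theta_hat) ** (1 / s)


p_individual = estimate_p_individual(42, 1875)
p_group = estimate_p_group(t=37, n=375, s=5)
print(f"individual: {p_individual:.4f}, group: {p_group:.4f}")
```

The group-testing estimate, $1 - (1 - 37/375)^{1/5} \approx 0.0206$, is close to the individual-testing estimate of 0.0224 while using one fifth as many tests.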

### HCV prevalence

- New research (MS/PhD research)
  - What factors could affect $p$? Include independent variables to help model $p$
  - Problem: we do not have the individual outcomes
  - After a group tests positive, how can you find which individuals have the disease? Use the model to help decide whom to retest when a group is positive
  - Multiple diseases (HCV, HIV, other diseases): model them simultaneously

### HCV prevalence

- Multiple vector transfer designs
  - Swallow (Phytopathology, 1985)
  - Want to estimate the probability an insect vector transfers a pathogen (virus, bacteria, etc.) to a plant
  - Example vectors: brown planthopper, whitebacked planthopper

### HCV prevalence

- Multiple vector transfer designs (continued)
  - Figure: enclosed test plants in a greenhouse, each exposed to planthoppers; a planthopper either transmits or does not transmit the virus, and each plant's outcome is recorded as $y = 0$ or $y = 1$
  - $y = 0$ if the plant is negative, $y = 1$ if the plant is positive
  - $T$ = number of plants with the disease

### Why statistics?

- Statistics is used in many diverse areas!
  - Statistics is the "science of science"
  - Florence Nightingale on statistics: "the most important science in the whole world: for upon it depends the practical application of every other science and of every art: the one science essential to all political and social administration, all education, all organization based on experience, for it only gives results of our experience."
- Take statistics courses in college!
  - Of course, I want you to consider coming to UNL!
  - Statistics is mainly a graduate discipline, so there is no undergraduate major at UNL
  - An undergraduate minor in statistics can be useful for many majors
  - Most statisticians have an undergraduate degree (Bachelor of Science) in math

### Why statistics?

- Where do statisticians work?
  - Pharmaceutical and medical research: Pfizer, Merck, medical centers
  - Marketing: Target, Hallmark
  - Government research labs: INEEL, Los Alamos, Sandia, Argonne
  - Agriculture: Pioneer Hi-Bred
  - Consulting firms: Quintiles
  - In Nebraska: ConAgra, Gallup, First National Bank, MDS Pharma, Experian, UNMC and Creighton medical centers, various universities, Pfizer, Acton International, Nebraska state agencies, Union Pacific
- Everyone I have known has had a job offer before graduating!
- How many statisticians are there? Approximately 20,000

### Why statistics?

- Salaries
  - Non-academic starting salaries (2003 American Statistical Association survey)
  - The survey response rate was 23.5%; see the salary surveys on the American Statistical Association's website
- Background needed
  - Strong in mathematics and using computers
  - The majority of statisticians have Bachelor's degrees in mathematics
    - Good with calculus
    - Applied math courses
    - Take at least one statistics course
    - Comfortable using software packages
  - To actually be a "statistician", you usually need to go to graduate school for an MS or PhD in statistics
    - Financial support: Graduate Teaching Assistantship

### Why statistics?

- What courses to take next in college?
  - AP Statistics: equivalent to a one-semester introductory statistics course without calculus
    - UNL: STAT 218 (Introduction to Statistics)
    - UNO: MATH/STAT 3000 (Statistical Methods I); Business Administration 2130 (Principles of Business Statistics)
  - Theory: 2-semester sequence using Calculus I-III
    - UNL: STAT 462 (Distribution Theory) and STAT 463 (Statistical Inference)
    - UNO: MATH 4740 and 4750 (Intro. to Probability and Statistics I and II)
  - Applications
    - UNL: STAT 450 (Introduction to Regression Analysis) or STAT 412 (Introduction to Experimental Design)
    - UNO: MATH/STAT 3010 (Statistical Methods II); Business Administration 3140 (Business Statistical Applications)

### Why statistics?

- Other recommended UNL classes (undergraduate)
  - MATH 340 Numerical Analysis
  - MATH 314 Applied Linear Algebra
  - MATH 325 Elementary Analysis and MATH 425 Mathematical Analysis (helpful if you go on for a PhD)
  - Computer science programming courses
- Other recommended UNO classes (undergraduate)
  - MATH 3300 Numerical Methods
  - MATH 4050 Linear Algebra
  - MATH 4760 Topics in Modeling
  - MATH 4230 and 4240 Mathematical Analysis I and II (helpful if you go on for a PhD)
  - Computer science programming courses

### AP Statistics

- Grading is done in Lincoln!
  - State Fair grounds
  - Grade the free-response section of about 66,000 student exams (2004)
  - 250 AP Statistics high school teachers and college professors
  - June 13 to June 19, 2005
  - 8:30 AM to 4:45 PM EVERY DAY

### AP Statistics

- Question #6 in 2002
  - 4 parts: (a), (b), (c), (d)
  - Each part is graded as E = Essentially correct, P = Partially correct, or I = Incomplete
  - Graders are given a "conversion" table showing how to convert the part grades into a numerical score:
    - 4 = Complete response
    - 3 = Substantial response
    - 2 = Developing response
    - 1 = Minimal response
    - 0 = No credit
  - 1 point is given for an E, 0.5 points for a P, and 0 points for an I; round up if (a) or (c) has the correct interpretation
  - An example is given at the end of the PowerPoint file
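The part-to-score conversion can be written as a small function. This Python sketch is hypothetical: the slide only says to round up when (a) or (c) has the correct interpretation, so rounding a half-point total down otherwise is an assumption, as are the function and parameter names.

```python
import math

# Points per part grade, as listed on the slide
POINTS = {"E": 1.0, "P": 0.5, "I": 0.0}

LABELS = {4: "Complete response", 3: "Substantial response",
          2: "Developing response", 1: "Minimal response", 0: "No credit"}


def question6_score(part_grades, a_or_c_interpretation_correct):
    """Convert E/P/I grades for parts (a)-(d) into a 0-4 score.

    Assumption (not stated on the slide): a half-point total is rounded
    down unless (a) or (c) has the correct interpretation.
    """
    if len(part_grades) != 4:
        raise ValueError("expected grades for parts (a), (b), (c), (d)")
    total = sum(POINTS[g] for g in part_grades)
    if total != math.floor(total):
        total = math.ceil(total) if a_or_c_interpretation_correct else math.floor(total)
    return int(total)


score = question6_score(["E", "P", "I", "E"], a_or_c_interpretation_correct=True)
print(score, LABELS[score])
```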

### For more information…

- E-mail me at chris@chrisbilder.com
- Website: www.chrisbilder.com/statistics
  - This PowerPoint presentation (including the example question)
  - Links to:
    - Introductory information about being a statistician
    - Jobs (including internships)
    - Salary information
    - List of all Departments of Statistics
    - Professional societies
    - Websites for courses that I and others teach
    - Newspaper and magazine articles about statistical applications


### Statistics at UNL

Figure: map showing the location of the Department of Statistics (label: 33rd St.).

### AP Statistics

Example slides for the AP Statistics question (images); one is annotated: "May actually be an E?"

