Surfing the education wave with official statistics


Surfing the education wave with official statistics Sharleen Forbes Statistics New Zealand School of Government, Victoria University

To cover:
The role of a National Statistics Office in education - why surf at all?
Prioritising - what can we afford and where should we invest?
Current initiatives - community groups, schools, tertiary education
Playing with official statistics - examples for classroom use
Where to in the future? - providing more sets of real data, new ways of visualising data

Role of Statistics New Zealand Lead the state sector in production of official statistics (official statistics system responsibility) Employ large number of statisticians Not funded specifically for education (promote, partner or facilitate rather than provide) Need to provide easily understood statistics (Public Good requirement) Should target informal / second chance education (NSO Workshop: ICOTS 6, Singapore) Focus on official statistics

Differences between official and other statistics
OFFICIAL STATISTICS | “OTHER” STATISTICS / RESEARCH
Often based on complex sample designs | Often simple surveys or designed experiments
Broad coverage (many variables – often high-level measures) | In-depth studies
Large-scale (provide comparisons between groups) | Usually relatively small scale (experiments or surveys)
Usually repeated regularly (provide long time series) | Mainly cross-sectional (single point of time)
Internationally comparable (agreed standards, classifications, indexes) | Relevant to population studied (focused on research/policy question)
Simple analysis provided by collectors (univariate or bivariate) | Sophisticated analysis (multivariate)
Provide primary data source | Can involve secondary analysis (of other data sources)
High cost | Generally lower cost

Prioritising - what can we afford - where should we invest? Need to balance external demands with internal training needs Limited funds (need to ‘pick the wave’ - where can we make a difference?)

Current initiatives - community groups State sector (Official Statistics System) Certificate of Official Statistics (Level 4) School of Government and ANZSOG courses Workshops and seminars Journalists JTO compulsory statistics unit(s) Statistics prize Small businesses GoStats! Maori communities Pilot projects

Current initiatives - schools Resources to support the new curriculum Schools Corner on Statistics New Zealand website (http://www.stats.govt.nz/schools-corner) CensusAtSchool Joint funder (http://www.censusatschool.org.nz) Dataset provision Census Official Statistics Surveys Synthetic Unit Record Files (SURFs)

Current initiatives - tertiary education Network of Academics in Official Statistics To provide training and research Undergraduate student prizes ($1000) Official Statistics Research Fund Partnerships with researchers Vice-Chancellor’s agreement Confidentialised Unit Record Files (CURFs) Half-time Professor of Official Statistics School of Government, Victoria University

Playing with official statistics - Examples Census data Official Statistics Survey data Specially constructed data sets Confidentialised Unit Record Files (CURFS) Synthesised Unit Record Files (SURFS)

The statistical investigation (PPDAC) cycle (Creators: Wild and Pfannkuch, Auckland University, 1999)
Problem – statement of the research questions
Plan – procedures used to carry out the study
Data – data collection process
Analysis – summaries and analyses of the data to answer the questions posed
Conclusion – about what has been learned.

1. Census data example Problem (Question) Is Hamilton ‘greener’ than Wellington? Plan / Data Use 2006 Census data on ‘the way people travel to work’ to indicate how ‘green’ a city is. (www.stats.govt.nz/census/)

Analysis

Definitions & (Re)classifications How many and what classes of ‘green’ shall we have? Have defined ‘green-ness’ by mode of travel to work Let’s have only 3 classes of ‘green-ness’ Not green = Driving private or company vehicles Green = Passenger in private vehicle or using public transport Very green =Walking, biking or working at home Omit other categories
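The reclassification above can be sketched in a few lines of Python. This is only an illustration: the category labels and all counts below are made up for the example (they are not 2006 Census figures), and `green_shares` is a hypothetical helper name.

```python
# Map each "travel to work" category to one of the three green-ness classes.
# The category labels are illustrative assumptions, not exact census wording.
GREEN_CLASS = {
    "drove private vehicle": "not green",
    "drove company vehicle": "not green",
    "passenger in private vehicle": "green",
    "public transport": "green",
    "walked": "very green",
    "biked": "very green",
    "worked at home": "very green",
}


def green_shares(counts):
    """Return the percentage of classified commuters in each green-ness class."""
    totals = {"not green": 0, "green": 0, "very green": 0}
    for category, n in counts.items():
        green_class = GREEN_CLASS.get(category)
        if green_class is None:
            continue  # omit other categories, as on the slide
        totals[green_class] += n
    classified = sum(totals.values())
    return {cls: 100 * n / classified for cls, n in totals.items()}


# Hypothetical counts for one city (invented for illustration only).
example_counts = {
    "drove private vehicle": 500,
    "drove company vehicle": 100,
    "passenger in private vehicle": 50,
    "public transport": 150,
    "walked": 100,
    "biked": 50,
    "worked at home": 50,
    "other": 30,
}
shares = green_shares(example_counts)
```

Running the same function on the counts for two cities gives percentages that can be compared directly, which matters because the cities differ in size.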

More analysis

Conclusion (and classroom questions) Wellington is ‘greener’ than Hamilton Questions Is ‘mode of travel to work’ a good indicator of ‘green-ness’? What other variables might affect ‘mode of travel’? Should we use more than one indicator?

2. Official Statistics Survey data Problem (questions) a) Are fewer people unemployed now than in previous years? b) Are you less likely to be unemployed if you have a high level of education? Plan / Data Analyse time series data on national unemployment rates Statistics New Zealand’s Household Labour Force Survey (www.stats.govt.nz)

Analysis - Question a). Time series plots

Conclusions (and classroom questions) Unemployment has been lower since 2004 than in previous years Since 2004 unemployment has stayed at roughly the same level (about 4%) Seasonality is not marked Questions What was the cause of the peaks (1991-3 and 1999) in unemployment? What do the small peaks in 2004 - 2007 reflect? Should we answer a count question (number unemployed) with a rate (percent unemployed in the labour force)?
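The last classroom question (count versus rate) can be explored with a small sketch. It assumes the usual definition of the labour force as employed plus unemployed; the numbers are invented for illustration and are not Household Labour Force Survey figures.

```python
def unemployment_rate(unemployed, employed):
    """Percent of the labour force (employed + unemployed) that is unemployed."""
    labour_force = unemployed + employed
    return 100 * unemployed / labour_force


# Hypothetical figures (in thousands): the count of unemployed people rises
# between the two years, yet the rate falls because the labour force grew.
rate_year_a = unemployment_rate(unemployed=100, employed=1900)
rate_year_b = unemployment_rate(unemployed=110, employed=2640)
```

Here the number unemployed goes up from 100 to 110 while the rate drops from 5% to 4%, so the answer to "are fewer people unemployed?" can depend on which measure is used.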

Analysis - Question b). Time series plots

Conclusions (and classroom questions) Pattern over time is similar for all qualification groups. Unemployment rate always highest for workers with no educational qualifications. Questions Which group appears to be the most disadvantaged when unemployment is high? What appears to be different in recent (compared to past) years between the qualification groups?

Another sample survey example - a simple look at seasonality Problem (question) Is there an annual pattern in retail sales? Plan / data Check for seasonality in quarterly summary time series data for monthly retail trade sales (in dollars) Statistics New Zealand’s Retail Trade Survey (www.stats.govt.nz)

Analysis Time series plot

Conclusions (and classroom questions) Annual seasonality - peak every December / January Rising trend over time - plateau in last 3 quarters Questions What components of retail trade would contribute most to the December peaks? What does it mean when the seasonally adjusted and trend lines lie virtually on top of each other? Easter fell in the March rather than June quarter in 2008. Is there any evidence that this affected the pattern of retail sales?
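One simple way to check for an annual pattern, sketched below, is to divide each quarter's sales by that year's average and then average those ratios quarter by quarter. This is a rough classroom device, not the seasonal adjustment method Statistics New Zealand uses, and the sales figures are invented for illustration. Quarter 4 stands for the December quarter.

```python
from statistics import mean


def seasonal_indices(series):
    """series: list of (year, quarter, sales); every year must have all 4 quarters.

    Returns, for each quarter, the average ratio of that quarter's sales
    to the mean sales of its own year (1.0 means "typical for the year").
    """
    by_year = {}
    for year, quarter, sales in series:
        by_year.setdefault(year, {})[quarter] = sales
    ratios = {1: [], 2: [], 3: [], 4: []}
    for quarters in by_year.values():
        year_mean = mean(quarters.values())
        for quarter, sales in quarters.items():
            ratios[quarter].append(sales / year_mean)
    return {quarter: mean(values) for quarter, values in ratios.items()}


# Hypothetical quarterly retail sales with a rising trend and a Q4 peak.
example = [
    (2006, 1, 90.0), (2006, 2, 95.0), (2006, 3, 100.0), (2006, 4, 115.0),
    (2007, 1, 99.0), (2007, 2, 104.5), (2007, 3, 110.0), (2007, 4, 126.5),
]
indices = seasonal_indices(example)
peak_quarter = max(indices, key=indices.get)
```

Dividing by the yearly mean first keeps the rising trend from being mistaken for seasonality.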

3. Specially constructed data sets - Confidentialised datasets (e.g. 2004 Income Survey)

SURFING: Classroom Examples (SURF creator: Pauline Stuart, Statistics NZ) Using 2004 Income Survey SURF data. Data available on CD or downloaded from Schools Corner on the Statistics New Zealand website (www.stats.govt.nz/schoolscorner/). Dataset has 200 records and seven variables:
gender (male, female)
highest education qualification (none, school, vocational, degree)
marital status (married, never, previously, other)
ethnic group (European, Maori, Other)
age (15-45)
hours worked weekly (0-79)
weekly income ($0-$2000)

Example Background In this example we let the SURF dataset represent a company’s employees. Every employee creates the same administration costs regardless of how many hours are worked. The company is concerned that its staff administration costs are too high. Problem (questions) Do most employees work a ‘normal’ (40 hour) week? What variables are related to the number of hours worked?

Specific questions for secondary school classrooms 1. What proportion of employees work at least 40 hours per week? (Summary) 2. Are these proportions different for males and females? (Comparison) 3. Do males tend to work more hours per week than females? (Comparison) 4. What is the relationship between hours worked and income? (Relationship between two measurement variables)

Plan / Data (a). Take a random sample of 35 from the SURF
Analysis Table: Sample Summary Statistics
                      Total    By gender
                               Male     Female
(Records)             (35)     (17)     (18)
Mean                  40.0     45.5     34.7
Standard deviation    11.9     8.4      12.6
Minimum               6        38       6
Lower quartile        38       40       27
Median                40       40       38.5
Upper quartile        45       45       42
Maximum               65       65       50

Conclusions (and classroom questions) Only half of all employees work 40 hours or more. On average (mean) males work longer hours than females Hours females work vary (standard deviation, inter-quartile range) more than hours males work. Questions Are samples of size 17 and 18 large enough? (beware of categorical data) What does it indicate when the mean and the median are different?

Plan / Data (b). - Resample Compare between students’ samples (summary statistics) Combine students’ samples and create new summary statistics Sample (another 35 say) and compare (or combine) summary statistics
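The sampling and resampling steps can be sketched as follows. The `population` list is a randomly generated stand-in, not the real 2004 Income Survey SURF, and the helper names (`summarise`, `draw_sample`) are hypothetical. Quartiles come from Python's `statistics.quantiles` with its default method, which may differ slightly from the quartile rule used for the slide tables.

```python
import random
from statistics import mean, median, quantiles, stdev


def summarise(hours):
    """Summary statistics for a list of weekly hours (needs at least 2 values)."""
    lower_q, _, upper_q = quantiles(hours, n=4)
    return {
        "n": len(hours),
        "mean": mean(hours),
        "sd": stdev(hours),
        "min": min(hours),
        "lower quartile": lower_q,
        "median": median(hours),
        "upper quartile": upper_q,
        "max": max(hours),
    }


def draw_sample(records, size, rng):
    """Simple random sample without replacement."""
    return rng.sample(records, size)


rng = random.Random(1)  # fixed seed so a class can reproduce the same draws

# Stand-in population of 200 records (NOT the real SURF data).
population = [
    {"gender": rng.choice(["male", "female"]), "hours": rng.randint(0, 79)}
    for _ in range(200)
]

# Each "student" takes a sample of 35; compare the sample means, then combine.
samples = [draw_sample(population, 35, rng) for _ in range(5)]
sample_means = [mean(r["hours"] for r in s) for s in samples]
combined = [r for s in samples for r in s]

first = samples[0]
overall = summarise([r["hours"] for r in first])
by_gender = {
    g: summarise([r["hours"] for r in first if r["gender"] == g])
    for g in ("male", "female")
}
```

Comparing `sample_means` across students shows how much a mean based on 35 records moves from sample to sample.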

Plan / Data (c). - Use all the SURF data
Analysis Summary Statistics (Total SURF): Hours worked
Columns: Total SURF (200 records), Male (93), Female (107). Rows showing fewer than three values are incomplete in this transcript.
Mean: 33.7, 42.1, 26.4
Standard deviation: 16.2, 13.2, 14.9
Minimum: 2, 5
Lower quartile: 20, 39, 14
Median: 40, 25
Upper quartile: 45, 50
Maximum: 70, 60
How do sample statistics compare with total SURF? Would a graph be easier to interpret than the table?

Analysis Graphs of SURF data

Conclusions (and classroom questions) Use tables for reference, graphs to tell a story. Females bimodal?: at 5-25 hours (part-time) and 35-50 hours (full-time)? Males tri-modal?: small at 10-15 hours (part-time), large at 35-55 hours (full-time), small at 60-75 hours (maybe managers)? Proportions of males and females working 40 hours or more are different. About half of the males do but only about a quarter of the females do. Questions What is the ‘clumping’ at 40 hours? Given the size of the SURF do you think the above patterns will be similar if other SURFs are taken?

Analysis - Question 4. Relationship between hours worked and income?

Conclusion (and classroom questions) Income increases as more hours are worked. Questions What is the estimated income for someone who doesn’t work? What extra income (on average) is expected for working an extra hour per week? Is the (regression) line a good fit to the data?
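The three classroom questions map onto the intercept, the slope and R² of a least-squares line. A minimal sketch, using four invented (hours, income) pairs rather than SURF data, and a hypothetical helper `fit_line`:

```python
from statistics import mean


def fit_line(x, y):
    """Least-squares line y = intercept + slope * x, plus R-squared."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return intercept, slope, 1 - ss_res / ss_tot


# Hypothetical data: weekly hours worked and weekly income in dollars.
hours = [10, 20, 30, 40]
income = [150, 350, 450, 650]
intercept, slope, r_squared = fit_line(hours, income)
```

The intercept is the estimated income at zero hours, the slope is the average extra income per extra hour, and R² is the share of the variation in income that the line accounts for.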

Other factors related to hours worked? (Sex / Highest qualification / Ethnicity, etc.) Example from a first-year university course Creator: John Harraway, Otago University Plan / Data Recategorise highest qualification Secondary = None OR Secondary (105) =S Tertiary = Vocational OR Tertiary (95) =T Do a linear regression in SPSS (equivalent to t-test for difference in means)

Analysis SPSS regression output
Weekly Income = $(414 + 344 × Tertiary)
95% confidence interval for the increase in income with a tertiary qualification: $257 to $431
t = 7.8, p = 0.000
R² = 0.24 (only about a quarter of the variation in the points is explained by the best-fitting line)

Conclusion (and classroom question) Income is higher on average (by $344) for those with a tertiary qualification. Question Is ‘qualification’ a good explanatory variable for income earned?
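The remark that this regression is equivalent to comparing two means can be checked with a small sketch. It assumes the qualification variable is coded 0 for secondary (S) and 1 for tertiary (T). The six incomes are invented for illustration and `fit_dummy_regression` is a hypothetical helper.

```python
from statistics import mean


def fit_dummy_regression(group, y):
    """Least-squares fit of y on a 0/1 indicator; returns (intercept, slope)."""
    g_bar, y_bar = mean(group), mean(y)
    sxy = sum((g - g_bar) * (yi - y_bar) for g, yi in zip(group, y))
    sxx = sum((g - g_bar) ** 2 for g in group)
    slope = sxy / sxx
    return y_bar - slope * g_bar, slope


# Hypothetical weekly incomes: 0 = secondary (S), 1 = tertiary (T).
tertiary = [0, 0, 0, 1, 1, 1]
income = [300, 400, 500, 700, 800, 900]

intercept, slope = fit_dummy_regression(tertiary, income)
mean_s = mean(y for t, y in zip(tertiary, income) if t == 0)
mean_t = mean(y for t, y in zip(tertiary, income) if t == 1)
```

The intercept equals the mean income of the S group and the slope equals the difference between the T and S group means, which is how the $414 and $344 in the SPSS output would be read.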

Are there multiple factors related to income? Problem (Question) Are both ‘qualification’ and ‘hours worked’ related to income? Plan / Data Do a multiple regression (main effects model - no interaction terms) in SPSS using SURF data

Analysis Scatterplot: Income by hours worked and qualification (S = secondary, T = tertiary)

SPSS regression output (values extracted & rounded for all 3 models)
ß and Std error are the unstandardised coefficients; blank cells are not given in this transcript.
Model  Term         ß      Std error  95% Confidence Interval  t       Sig.   R2
1      (Constant)   .3     34         (-68, 68)                0.01    .992   .63
       Hours        17     1          (15, 19)                 18.5    .000
2      (Constant)   414    30         (356, 472)               14.0           .24
       Tertiary     344    44         (257, 431)               7.8
3      (Constant)   -19    32         (-83, 45)                -0.59   .553   .69
       Hours        15                (14, 16)                 16.4
       Tertiary     183               (125, 242)               6.1

Conclusions Weekly Income = $(-19 + 15 × Hours Worked + 183 × Tertiary) Both hours worked and highest qualification are related to weekly income earned. Mean increase in income per hour worked is reduced (from $17 to $15) if tertiary is also considered. Mean increase in income with a tertiary qualification is also reduced (from $344 to $183) when adjusted for number of hours worked. The 95% confidence interval for the intercept (income when no hours are worked) still contains zero.
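As a worked example, the fitted main-effects model can be turned into a small prediction function. The coefficients are the rounded values quoted on the slide; coding Tertiary as 1 and Secondary as 0 is an assumption, and `predicted_weekly_income` is a hypothetical name.

```python
def predicted_weekly_income(hours_worked, tertiary):
    """Predicted weekly income in dollars from the rounded model-3 coefficients.

    tertiary: True for a tertiary qualification, False for secondary
    (assumed 1/0 coding).
    """
    return -19 + 15 * hours_worked + 183 * (1 if tertiary else 0)


full_time_tertiary = predicted_weekly_income(40, True)
full_time_secondary = predicted_weekly_income(40, False)
no_hours_secondary = predicted_weekly_income(0, False)
```

For a 40-hour week the model predicts $764 with a tertiary qualification and $581 without. At zero hours it predicts -$19, which is not meaningful as an income but is consistent with an intercept whose confidence interval contains zero.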

Classroom questions Questions Is there any ‘interaction’ between hours worked and qualification? Which of the above models fits the data best? Are there any outliers? What does a scatterplot of the residuals (distances from the line) indicate?

More resampling Use SURF as sample from CURF population Bootstrapping Take repeated samples with replacement (of same size as original, n=200). Jack-knifing Take repeated samples dropping one value from original sample each time (n=199). Calculate mean and standard deviation of sample means Compare summary statistics with CURF (or full 2004 Income Survey).
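Both resampling schemes can be sketched with the standard library. The 200 values below are a randomly generated stand-in for the hours-worked column, not the real SURF, and the function names are hypothetical.

```python
import random
from statistics import mean, stdev


def bootstrap_means(sample, n_resamples, rng):
    """Means of repeated samples drawn WITH replacement, same size as the original."""
    n = len(sample)
    return [mean(rng.choices(sample, k=n)) for _ in range(n_resamples)]


def jackknife_means(sample):
    """Means of the n leave-one-out samples, each of size n - 1."""
    total, n = sum(sample), len(sample)
    return [(total - x) / (n - 1) for x in sample]


rng = random.Random(2008)  # fixed seed so results can be reproduced
# Stand-in for the hours-worked column (NOT the real SURF data).
hours = [rng.randint(0, 79) for _ in range(200)]

boot = bootstrap_means(hours, n_resamples=500, rng=rng)
jack = jackknife_means(hours)

boot_summary = (mean(boot), stdev(boot))  # centre and spread of bootstrap means
jack_summary = (mean(jack), stdev(jack))  # centre and spread of jackknife means
```

The standard deviation of the bootstrap means gives a rough standard error for the sample mean. The jackknife means vary much less, because each one differs from the full-sample mean by only a single dropped value.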

Where to from here? Continue and develop partnerships (academics, teachers, community groups) More CURFs and SURFs (official launch 1 September 2008 - 2001 Savings Survey SURF www.stats.govt.nz/schools-corner) Increased free access to data for post-graduate students Data visualisation (dynamic graphs) More across-discipline outputs

Animated population pyramids (Creator: Martin Ralphs, Statistics NZ)

Economic structure population pyramid (Office for National Statistics: UK)

Gapminder: www.gapminder.org Geography, history, demography, econometrics (Creator: Hans Rosling)

Questions and comments What are your ideas for the future? Contact sharleen.forbes@stats.govt.nz Thank you.