Model-based lifestyle behaviour estimates Dr Jennifer Mindell Clinical senior lecturer, UCL Contributors: Shelley Bradley.

Presentation transcript:


Why are these needed? Demand for detailed information at a range of smaller geographical levels –eg MSOAs, LAs, PCOs National surveys designed to provide reliable estimates: –national level –sometimes regional levels Sample size usually too small for direct estimates with adequate precision for smaller geographical areas

Why are these needed? Prevalence estimates of health behaviours based on survey data can only be computed for those areas covered by the sample For small areas covered by the survey: –sample size usually small –estimates have low precision – i.e. very wide CIs for the survey estimates E.g. for a percentage of 25% –sample size of 15: 95% CI of around 4%-46% –sample size of 50: 95% CI of 13%-37% Most MSOAs have no sample respondents
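The widths quoted above can be roughly reproduced with the textbook normal-approximation interval for a proportion. This is a minimal sketch that assumes a simple random sample; the slide does not say which method was used, and the function name `proportion_ci` is illustrative. For a prevalence of 25% it gives roughly 3% to 47% with 15 respondents and 13% to 37% with 50.

```python
import math

def proportion_ci(p, n, z=1.96):
    """Normal-approximation (Wald) confidence interval for a proportion.

    Assumes a simple random sample of size n; z=1.96 gives a 95% interval.
    """
    se = math.sqrt(p * (1 - p) / n)   # standard error of the proportion
    lower = max(0.0, p - z * se)      # clip to the valid range [0, 1]
    upper = min(1.0, p + z * se)
    return lower, upper

for n in (15, 50):
    lower, upper = proportion_ci(0.25, n)
    print(f"n={n}: {lower:.0%} to {upper:.0%}")
```

Real survey designs are clustered and weighted, so the true intervals would typically be somewhat wider than this simple formula suggests.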

Basic idea behind the model-based method Find a relationship between: –the estimate, as measured by the national survey (e.g. smoking in HSE) and –other information in the sampled MSOAs (e.g. Census and administrative data). This relationship can then be generalised to produce reliable estimates for all MSOAs

Steps in deriving model based estimates 1. Investigate and choose data sources to be used. (Two sets of information). 2. Build statistical model relating the survey variable to the covariate information for MSOAs (or LAs or PCOs) covered by the survey. –E.g. examine whether the tendency for a person to be a current smoker varies significantly between regions or between LAs with varying proportions of residents aged 16+ who were living as a couple, claiming Job Seekers Allowance etc.

Steps in deriving model based estimates 3. Use the model and covariate data (available for all MSOAs) to create ‘expected’ prevalence estimates for all MSOAs, given the characteristics of each area 4. If required, ensure the model-based estimates are constrained to higher level geographies

Healthy lifestyle behaviours The Information Centre commissioned NatCen to produce model-based estimates for the prevalence of healthy lifestyle behaviours using HSE data The estimates cover the time period and are for 6,781 MSOAs, 352 LAs, and 152 PCOs in England.

Examples Model-based estimates and 95% CIs produced using data from the HSfE covering the prevalence of lifestyle indicators among adults 16+: –smoking –binge drinking –obesity –consumption of 5+ portions per day of fruit and vegetables

Examples Model-based estimates with 95% CIs have been produced for MSOAs in England and Wales for: –total household weekly income, –net household weekly income, –net household weekly income before housing costs, –net household weekly income after housing costs

The survey dataset for model-based lifestyles estimates Core interview questions/measurements included each year 3 years of HSfE data (2003, 2004, 2005) combined to maximise sample size Only the general population samples in each year used

Current cigarette smoking Adult respondents (aged 16 +) to the HSfE: Defined to be current smokers if they reported that they were a “current cigarette smoker” Defined as not a current smoker if they reported that they: –had “never smoked cigarettes at all”, –“used to smoke cigarettes occasionally”, or –“used to smoke cigarettes regularly”.

Fruit and vegetable consumption (adults aged 16+) Generated from data collected in the HSfE about the quantities of different types of fruit and vegetables consumed on the previous day: –Includes fresh / frozen / tinned –Vegetables / salads / pulses –Fruit / juices –Some elements capped at a maximum of one portion per day, in line with guidelines. Measures summed to give total number of portions of fruit and vegetables consumed.

Binge drinking Generated from data collected about the quantities of all the different types of alcoholic drinks (beer, wine, spirits, sherry and alcopops) consumed on a respondent’s heaviest drinking day in the previous week; measures summed to give the number of units of alcohol consumed on the heaviest drinking day. Binge drinking defined separately for men and women: –Men: ≥ 8 units of alcohol on the heaviest drinking day in the previous seven days; –Women: ≥ 6 units of alcohol
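The definition above can be written as a small classification rule. This is a sketch only: the function name, the dictionary of drink types and the unit values in the example are illustrative assumptions, not the survey's actual derived variables.

```python
# Thresholds taken from the slide: units on the heaviest drinking day
# in the previous seven days.
BINGE_THRESHOLDS = {"male": 8.0, "female": 6.0}

def is_binge_drinker(sex, units_by_drink):
    """Sum units across drink types and compare with the sex-specific threshold."""
    total_units = sum(units_by_drink.values())
    return total_units >= BINGE_THRESHOLDS[sex]

# Hypothetical respondents (unit values invented for illustration).
print(is_binge_drinker("male", {"beer": 6.0, "spirits": 2.0}))      # 8.0 units
print(is_binge_drinker("female", {"wine": 4.5, "alcopops": 1.0}))   # 5.5 units
```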

Obesity Obesity generated from the height and weight of respondents, as measured by the HSfE interviewers. BMI is the weight in kilograms divided by the square of the height in metres. Defined as obese if BMI ≥ 30 kg/m²
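As a worked illustration of the BMI definition above, the calculation and the obesity cut-off can be sketched as follows. The function names and the example measurements are hypothetical.

```python
def bmi(weight_kg, height_m):
    """Body mass index: weight in kilograms divided by height in metres squared."""
    return weight_kg / height_m ** 2

def is_obese(weight_kg, height_m, threshold=30.0):
    """Obese if BMI is at or above the threshold (30 kg/m² on the slide)."""
    return bmi(weight_kg, height_m) >= threshold

# Hypothetical measurements, for illustration only.
print(round(bmi(95, 1.75), 1), is_obese(95, 1.75))
print(round(bmi(80, 1.80), 1), is_obese(80, 1.80))
```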

The covariate dataset The term ‘covariate’ describes area-level characteristics potentially related to the four healthy lifestyle indicators, e.g. –% of residents aged 16 years + residing as a couple –life expectancy –SHA –urban/rural indicator These covariates are generally average values or proportions relating to all individuals or households in the area Census provided the main source for demographic and social covariate data –because of its total geographical and population coverage

CASE STUDY: CURRENT SMOKING The process of creating the model-based estimates of healthy lifestyle behaviours for 352 LAs in England involved three main stages Stage 1 Fitting the relationship between current smoking and area-level characteristics Stage 2 Producing an initial estimate of expected prevalence Stage 3 Adjusting the LA estimate to the direct HSfE estimate for that SHA
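The three stages can be sketched in code under loudly stated assumptions. The coefficients, covariate names, area names and populations below are all invented for illustration, and the Stage 3 adjustment is shown as a simple ratio scaling so that the population-weighted mean of the LA estimates matches the SHA direct estimate. The slides do not specify the adjustment method actually used, so this is one plausible reading rather than the project's procedure.

```python
import math

def inverse_logit(x):
    """Convert a log-odds value to a probability."""
    return 1.0 / (1.0 + math.exp(-x))

# Stage 1 (assumed output): coefficients from a fitted logistic regression.
# These numbers are hypothetical, not taken from the project.
INTERCEPT = -1.2
COEFFS = {"pct_couples": -0.015, "pct_jsa": 0.10}

def expected_prevalence(covariates):
    """Stage 2: expected prevalence for an area given its covariates."""
    linear = INTERCEPT + sum(COEFFS[name] * covariates[name] for name in COEFFS)
    return inverse_logit(linear)

def adjust_to_sha(initial, populations, sha_direct_estimate):
    """Stage 3 (assumed method): ratio-scale LA estimates to the SHA direct estimate."""
    total_pop = sum(populations.values())
    weighted_mean = sum(initial[la] * populations[la] for la in initial) / total_pop
    factor = sha_direct_estimate / weighted_mean
    return {la: p * factor for la, p in initial.items()}

# Hypothetical LAs within one SHA.
covariates = {
    "LA_A": {"pct_couples": 60.0, "pct_jsa": 2.0},
    "LA_B": {"pct_couples": 45.0, "pct_jsa": 5.0},
}
populations = {"LA_A": 100_000, "LA_B": 150_000}

initial = {la: expected_prevalence(c) for la, c in covariates.items()}
adjusted = adjust_to_sha(initial, populations, sha_direct_estimate=0.24)
for la in initial:
    print(f"{la}: initial {initial[la]:.3f}, adjusted {adjusted[la]:.3f}")
```

Ratio scaling preserves the ordering of areas within the SHA while forcing the aggregate to agree with the survey's direct estimate at the higher geography.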

CASE STUDY: CURRENT SMOKING Stage 1 Fitting the relationship between current smoking and area-level characteristics Using the combined HSfE data, the area-level characteristics most strongly related to whether an individual was a current smoker were identified. This was done using a technique called ‘logistic regression’.

Odds Ratio Odds Ratio = (a/c) / (b/d) Odds of having disease given exposure = a/c Odds of having disease given no exposure = b/d Odds ratio < 1: Lower odds in exposed group Odds ratio = 1: Same odds Odds ratio > 1: Higher odds in exposed group
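A short sketch of the odds ratio calculation follows. It assumes the usual 2x2 table layout implied by the formula (a = exposed with disease, b = unexposed with disease, c = exposed without disease, d = unexposed without disease); the slide's table itself is not in the transcript, and the counts in the example are invented.

```python
def odds_ratio(a, b, c, d):
    """Odds ratio from a 2x2 table.

    Assumed layout: a = exposed with disease, b = unexposed with disease,
    c = exposed without disease, d = unexposed without disease.
    """
    odds_exposed = a / c      # odds of disease among the exposed
    odds_unexposed = b / d    # odds of disease among the unexposed
    return odds_exposed / odds_unexposed

# Hypothetical counts: 30 of 100 exposed and 20 of 100 unexposed have the disease.
print(round(odds_ratio(a=30, b=20, c=70, d=80), 2))
```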

CASE STUDY: CURRENT SMOKING Logistic regression estimates are sometimes displayed on the odds scale and sometimes on the log-odds scale. Using the log-odds scale: 1. A log-odds estimate of 0 means that the covariate has no effect on current smoking, after adjusting for the other variables in the model. 2. A log-odds estimate < 0 indicates a decrease (that is, an increase in the covariate is associated with a decrease in the odds of being a current smoker). 3. A log-odds estimate > 0 indicates an increase (that is, an increase in the covariate is associated with an increase in the odds of being a current smoker).
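The link between the two scales is that the odds ratio is the exponential of the log-odds estimate, so a log-odds of 0 corresponds to an odds ratio of 1. A minimal sketch of the interpretation rules above, with a hypothetical function name and invented coefficient values:

```python
import math

def describe_log_odds(beta):
    """Return the odds ratio for a log-odds estimate and the direction it implies."""
    ratio = math.exp(beta)   # odds ratio = exp(log-odds)
    if beta > 0:
        direction = "increase"
    elif beta < 0:
        direction = "decrease"
    else:
        direction = "no association"
    return ratio, direction

# Hypothetical coefficient values, for illustration only.
for beta in (-0.2, 0.0, 0.35):
    ratio, direction = describe_log_odds(beta)
    print(f"log-odds {beta:+.2f} -> odds ratio {ratio:.2f} ({direction})")
```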

Does the model make sense? A number of diagnostic checks used: –to assess the appropriateness of the models developed –to show that the models are well specified and the assumptions sound These processes ensure that: –the methodology and its application are valid, –the models developed are the best possible for the data available, and –the model-based estimates are credible.

Validating the model Provides confidence in the accuracy of the estimates and the associated CIs Need to validate: –the process of making the estimates –the estimates themselves Comparison of the model-based estimates with other sources to establish the credibility of the model-based estimates

Confidence intervals Confidence intervals produced to make the margin of error around the estimates clear. The interval reflects the range within which the true value is likely to lie. The CIs represent the uncertainty in the modelling process. At the 95% confidence level, assuming that the model is a good representation of reality, each CI would be expected to contain the true value 95 times out of 100.

The survey context Two key issues to consider when using the estimates are sampling error and non-sampling error. Sampling error arises as a result of drawing a sample rather than conducting a complete census.

Non-sampling errors Defined as errors arising during the course of survey activities Unlike sampling errors, there is no simple and direct method of estimating the size of non-sampling errors. Despite our best efforts to avoid them, non-sampling errors are inevitable, particularly in large-scale data collections.

Non-sampling errors Sources of non-sampling error include: 1. The respondent –may not want to reveal their true alcohol consumption, or may unintentionally provide incorrect information (measurement error) 2. The interviewer –may make mistakes when measuring the height and weight of respondents 3. Refusals to participate –Adults contacted for the survey may refuse to have their height and weight measured or refuse to participate in the survey.

Limitations of the estimates A standard direct estimate for a particular area, based solely on sample respondents located within the area, represents an estimate of the actual prevalence of health behaviours such as current smoking or obesity for the area in question. A model-based estimate for a particular area is the expected prevalence for that area based on its population characteristics (as measured by the census/administrative data) and does not represent an estimate of the actual prevalence.

Limitations of the estimates To interpret the estimates you should use statements such as: “Given the characteristics of the local population, we would expect approximately x% of adults within LA/PCO Y to smoke/be obese.” Model-based estimates cannot take account of any additional local factors that may impact on the true prevalence rate –e.g. local interventions –subtle differences in population demographics The estimates cannot be used to monitor performance

Limitations of the estimates Cannot usually compare between two sets of model-based estimates in two different time periods Users are warned not to interpret the difference between the point estimates as a measure of change. Typically: –The models have been fitted separately –Built on a different set of geographies –The covariates are not the same –Each estimate is given with a 95% confidence interval The prevalence for an area should be viewed in light of its CI, not just the point estimate. –To disregard the CIs ignores the uncertainty that surrounds estimates derived from survey data

BBC news online October 2006

Limitations of the estimates As with any ranking based on estimates, care must be taken in interpreting the ranking of the model based estimates. –The estimates are expected prevalences not measured actual prevalence –Assigning the areas to bands would still require the uncertainty in the ranking/banding to be represented

Examples of data use

Model-based estimates of smoking prevalence (%) Columns: Area, Lower, Upper, Estimate. Rows: LB Harrow, LB Redbridge, LB Brent, RBKC, LB Tower Hamlets, LB Lambeth, LB Barking & Dagenham, London, England

DON’T TURN THE PAGE! Exercise A Which LAs have higher and which have lower smoking prevalence than London?

An area can be described as statistically significantly different from the regional or national average if the CIs for those estimates do not overlap. Barking & Dagenham PCT has a significantly higher (model-based) current smoking rate than England (and than London) Redbridge has a significantly lower (model-based) current smoking rate than England and London Tower Hamlets PCT cannot be said to have a significantly higher estimate than England as a whole since the CIs overlap –NB Sex and ethnicity! CIs depend on the number of areas with at least one data point
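The non-overlap rule described above can be expressed as a simple check. This is a sketch with hypothetical function names, and the interval values in the example are invented because the table's figures are not in the transcript.

```python
def intervals_overlap(ci_a, ci_b):
    """True if two (lower, upper) confidence intervals share any values."""
    lower_a, upper_a = ci_a
    lower_b, upper_b = ci_b
    return lower_a <= upper_b and lower_b <= upper_a

def compare_to_benchmark(area_ci, benchmark_ci):
    """Classify an area against a regional or national benchmark by CI overlap."""
    if intervals_overlap(area_ci, benchmark_ci):
        return "not significantly different"
    if area_ci[0] > benchmark_ci[1]:
        return "significantly higher"
    return "significantly lower"

# Invented intervals (%), for illustration only.
benchmark = (23.0, 25.0)
print(compare_to_benchmark((27.5, 33.0), benchmark))
print(compare_to_benchmark((15.0, 20.5), benchmark))
print(compare_to_benchmark((24.0, 30.0), benchmark))
```

Non-overlap is a conservative rule: two estimates can differ significantly even when their intervals overlap slightly, so "not significantly different" here means only that the rule cannot establish a difference.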

Columns: LA, Lower, Upper, Estimate. Rows: Harrow, Redbridge, Brent, K & C, Tower Hamlets, Lambeth, Barking & Dagenham, London, England

Examples of data use Supporting indicators Model-based estimates of healthy lifestyle behaviours can be used in conjunction with other data sources to build up an area profile, e.g. the 2007 IMD, Council Tax data, Urban/Rural classifications, HES, ONS area classifications, and commercial geodemographic classifications such as ACORN. Both LA and PCO estimates should also be viewed in relation to the direct estimates derived from the HSfE data over the same time period. E.g. Health profiles, NWPHO alcohol profiles

Limitations of the estimates Care must be taken in interpreting the ranking of the model based estimates. E.g. the confidence interval around the highest ranked MSOA suggests that the estimate lies among the group of MSOAs with the highest income levels rather than being the MSOA with the highest average income.

Source: Neighbourhood Statistics Using model-based estimates: Maps The model-based MSOA-level estimates of average weekly household income can be displayed on maps to show broad trends.

Exercise B Would using model-based estimates be appropriate in the following situations: 1. In setting a baseline for a target for the Local area agreement? 2. Predicting need and planning for service provision? 3. Monitoring change over time? 4. Creating a profile of an area?

Synthetic estimates Use a statistical model to express the relationship between individual healthy lifestyle behaviour and area-level information Outputs from that model used to generate a model-based estimate for all areas Estimates represent the expected prevalence for an area based on its population characteristics So cannot be used to monitor local interventions

Fuller technical description of the methodology See the project reports on the NHS Information Centre website: collections/population-and- geography/neighbourhoodstatistics/neighbo urhood-statistics:-healthy-lifestyle- behaviours:-model-based-estimates