Presentation on theme: "High Resolution studies"— Presentation transcript:
1 High Resolution studies
Sampling and power analysis in the High Resolution studies
Pamela Minicozzi, Descriptive Studies and Health Planning Unit, Department of Preventive and Predictive Medicine, Fondazione IRCCS Istituto Nazionale dei Tumori, Milan
2 High Resolution studies collected detailed data from patients' clinical records, so that the influence of non-routinely collected factors (tumour molecular characteristics, diagnostic investigations, treatment, relapse) on survival and differences in standard care could be analysed.
3 Problem and solution
Problem: in each country, the population of incident cases for a particular cancer consists of N subjects. N is large (so rare cancers are not considered here). Since N is large, not all cases can be investigated.
Solution: use a representative sample to derive valid conclusions that are applicable to the entire original population.
4 Two questions
- What kind of probability sampling should we use?
- What sample size should we use?
6 Previous High Resolution studies
Samples were representative of:
- 1-year incidence
- a time interval (e.g. 6 months) within the study period, provided that incidence was complete
- an administratively defined area covered by cancer registration
7 Present High Resolution studies
We want to eliminate variations in types of sampling between countries and within a single country. This implies more sophisticated sampling.
Main types of probability sampling
8 Simple random sampling
- assign a unique number to each element of the study population
- determine the sample size
- randomly select the population elements using either a table of random numbers or a list of numbers generated randomly by a computer
Advantage: auxiliary information on subjects is not required
Disadvantage: if subgroups of the population are of particular interest, they may not be included in sufficient numbers in the sample
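The steps for simple random sampling can be sketched in a few lines of Python. This is only an illustration: the registry size `N`, the sample size `n` and the function name `simple_random_sample` are hypothetical and do not come from the High Resolution protocol.

```python
import random


def simple_random_sample(population_ids, n, seed=None):
    """Draw n distinct elements; every subset of size n is equally likely."""
    rng = random.Random(seed)  # a fixed seed makes the draw reproducible
    return rng.sample(list(population_ids), n)


# Hypothetical registry: incident cases numbered 1..N (step 1 of the slide).
N = 5000
case_ids = range(1, N + 1)

# Hypothetical sample size (step 2), then computer-generated selection (step 3).
sample = simple_random_sample(case_ids, n=500, seed=42)
print(len(sample), "cases selected, first five:", sorted(sample)[:5])
```

Recording the seed alongside the sampled list lets another centre reproduce exactly the same selection from the same numbered list.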
9 Stratified sampling
- identify the stratification variable(s) and determine the number of strata to be used (e.g. day and month of birth, year of diagnosis, cancer registry, etc.)
- divide the population into strata and determine the sample size of each stratum
- randomly select the population elements in each stratum
Advantage: a more representative sample is obtained
Disadvantage: requires information on the proportion of the total population belonging to each stratum
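A minimal sketch of stratified sampling, assuming proportional allocation (each stratum contributes in proportion to its share of the population). The slide does not say which allocation rule was used, so that choice, the function name `stratified_sample` and the case counts per year are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def stratified_sample(cases, stratum_of, n, seed=None):
    """Proportional stratified sample of roughly n cases.

    cases      -- list of case records
    stratum_of -- function mapping a case to its stratum label
    Rounding per stratum means the total can differ slightly from n.
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for case in cases:
        strata[stratum_of(case)].append(case)

    sample = []
    for label in sorted(strata):
        members = strata[label]
        # Stratum sample size proportional to the stratum's population share.
        k = round(n * len(members) / len(cases))
        k = min(max(k, 1), len(members))  # at least one case per stratum
        sample.extend(rng.sample(members, k))
    return sample


# Hypothetical population: (case id, year of diagnosis), unequal yearly counts.
counts = {1995: 800, 1996: 900, 1997: 1000, 1998: 1100, 1999: 1200}
cases = []
for year, count in counts.items():
    for _ in range(count):
        cases.append((len(cases) + 1, year))

sample = stratified_sample(cases, stratum_of=lambda c: c[1], n=500, seed=1)
per_year = {y: sum(1 for c in sample if c[1] == y) for y in counts}
print(per_year)
```

The per-stratum counts depend on knowing how many cases fall in each stratum, which is the auxiliary information the slide lists as the method's requirement.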
10 Systematic sampling
- determine the sample size (n); the sampling interval “i” is then N/n
- randomly select a number “r” from 1 to “i”
- select all the other subjects in the following positions: r, r + i, r + 2*i, etc., until the required sample size is reached
Advantage: eliminates the possibility of autocorrelation
Disadvantage: only the first element is selected on a probability basis (pseudo-random sampling)
11 How many subjects do we need?
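The systematic procedure can be sketched as follows. The interval is taken as the integer part of N/n, which is an assumption for the case where N is not an exact multiple of n; the list of cases and the function name `systematic_sample` are hypothetical.

```python
import random


def systematic_sample(ordered_cases, n, seed=None):
    """Select every i-th case after a random start r, with i = N // n."""
    N = len(ordered_cases)
    if not 0 < n <= N:
        raise ValueError("n must be between 1 and the population size")
    i = N // n                    # sampling interval
    rng = random.Random(seed)
    r = rng.randint(1, i)         # random start between 1 and i (inclusive)
    # Positions r, r + i, r + 2*i, ... (1-based), n positions in total.
    return [ordered_cases[r - 1 + k * i] for k in range(n)]


# Hypothetical ordered list of 5000 incident cases, sample of 500.
ordered_cases = list(range(1, 5001))
sample = systematic_sample(ordered_cases, n=500, seed=7)
print("interval:", sample[1] - sample[0], "first case:", sample[0])
```

Only `r` is random here; once it is drawn, every other selected position is fixed, which is the disadvantage noted on the slide.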
12 The main elements
To determine the minimum sample size required to get a significant result (or to detect a meaningful effect):
- Statistical power: the probability that the difference will be detected (e.g. 80%, 90%)
- Hypothesis test and significance level: the probability that a positive finding is due to chance alone (e.g. 1%, 5%)
- Previous pilot studies: they explored whether some variables can be measured with sufficient precision (or are available) and checked the study vision
13 Previous High Resolution studies
The number of patients was defined based on:
- observed differences in survival and risk of death
- incidence of the cancer under study
- difficulties in collecting clinical information
- available economic resources
Notwithstanding that, we were able to identify statistically significant relative excess risks of death for breast cancer, for which differences in survival are small:
- up to 1.60 among European countries
- up to 1.40 among Italian areas
This is applicable to other cancers for which survival differences are larger.
14 Example for breast cancer (diagnosis 95-99)
Plot of power as a function of the hazard ratio for a two-sided log-rank test at the 5% significance level, with 80% power as the target, over sample sizes ranging from 100 to 1000.
Assume 75% survival as reference (the overall survival in Europe, range: 65-90%).
(plot annotation: 45%)
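A curve of this kind can be approximated with Schoenfeld's formula for the log-rank test. This is a sketch under stated assumptions, not the calculation behind the slide, whose software and formula are not given: two groups of equal size, proportional hazards, and every patient followed to the end of the period with no earlier censoring. The function name `logrank_power` is hypothetical.

```python
from math import log, sqrt
from statistics import NormalDist


def logrank_power(n_total, hazard_ratio, ref_survival, alpha=0.05):
    """Approximate power of a two-sided log-rank test (Schoenfeld).

    Assumptions (not from the slides): two equal groups, proportional
    hazards, no censoring before the end of follow-up.
    """
    z = NormalDist()
    # Under proportional hazards the comparison group survives S0 ** HR.
    comp_survival = ref_survival ** hazard_ratio
    # Expected proportion of deaths, averaged over the two groups.
    p_event = 1 - (ref_survival + comp_survival) / 2
    events = n_total * p_event
    z_alpha = z.inv_cdf(1 - alpha / 2)
    # With equal allocation the variance factor is 1/4 per event.
    return z.cdf(sqrt(events / 4) * abs(log(hazard_ratio)) - z_alpha)


# Reference survival of 75%, sample sizes from 100 to 1000 as on the slide.
for n in (100, 250, 500, 1000):
    row = [round(logrank_power(n, hr, 0.75), 2) for hr in (1.2, 1.4, 1.6, 2.0)]
    print(n, row)
```

Each printed row gives the power at hazard ratios 1.2, 1.4, 1.6 and 2.0 for one sample size; reading across rows shows how large a sample is needed before a given hazard ratio reaches the 80% target.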
15 Example for colorectal cancer (diagnosis 95-99)
Plot of power as a function of the hazard ratio for a two-sided log-rank test at the 5% significance level, with 80% power as the target, over sample sizes ranging from 100 to 1000.
Assume 50% survival as reference (the overall survival in Europe, range: 30-70%).
(plot annotation: 32%)
16 Example for lung cancer (diagnosis 95-99)
Plot of power as a function of the hazard ratio for a two-sided log-rank test at the 5% significance level, with 80% power as the target, over sample sizes ranging from 100 to 1000.
Assume 10% survival as reference (the overall survival in Europe, range: 5-20%).
(plot annotation: 30%)
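The three examples differ only in the reference survival, and the effect of that choice can be seen by inverting the calculation: fix a hazard ratio and ask how many patients are needed. The sketch below again relies on Schoenfeld's approximation with equal groups and no early censoring, which are assumptions rather than the method used for the slides; the hazard ratio of 1.5 and the function names are purely illustrative.

```python
from math import ceil, log
from statistics import NormalDist


def required_events(hazard_ratio, alpha=0.05, power=0.80):
    """Deaths needed by a two-sided log-rank test (Schoenfeld, equal groups)."""
    z = NormalDist()
    z_sum = z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)
    return 4 * z_sum ** 2 / log(hazard_ratio) ** 2


def required_patients(hazard_ratio, ref_survival, alpha=0.05, power=0.80):
    """Convert required deaths into patients, assuming no early censoring."""
    comp_survival = ref_survival ** hazard_ratio   # proportional hazards
    p_event = 1 - (ref_survival + comp_survival) / 2
    return required_events(hazard_ratio, alpha, power) / p_event


# Reference survival values taken from the three example slides.
reference = {"breast": 0.75, "colorectal": 0.50, "lung": 0.10}
hr = 1.5  # hypothetical hazard ratio, chosen only for illustration
for cancer, s0 in reference.items():
    print(cancer, ceil(required_patients(hr, s0)))
```

The number of deaths required depends only on the hazard ratio, so cancers with lower survival reach it with fewer patients.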
17 Present High Resolution studies
We want to analyse both differences in survival and adherence to standard care.
Power analysis for both:
- logistic regression analysis (to analyse the odds of receiving one type of care, typically standard care)
- relative survival analysis (to analyse differences in relative survival and relative excess risks of death)
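For the adherence aim, a rough sample size can be obtained by treating the logistic model as a comparison of two proportions, for example two areas. This is a simplification assumed here for illustration: it uses the normal approximation, ignores other covariates, and the 60% baseline adherence and odds ratio of 1.5 are hypothetical values, not figures from the studies.

```python
from math import ceil, sqrt
from statistics import NormalDist


def proportion_from_odds_ratio(p_ref, odds_ratio):
    """Proportion in the comparison group implied by an odds ratio."""
    odds = odds_ratio * p_ref / (1 - p_ref)
    return odds / (1 + odds)


def patients_per_group(p_ref, odds_ratio, alpha=0.05, power=0.80):
    """Normal-approximation sample size for comparing two proportions."""
    z = NormalDist()
    p_cmp = proportion_from_odds_ratio(p_ref, odds_ratio)
    p_bar = (p_ref + p_cmp) / 2
    term_null = z.inv_cdf(1 - alpha / 2) * sqrt(2 * p_bar * (1 - p_bar))
    term_alt = z.inv_cdf(power) * sqrt(
        p_ref * (1 - p_ref) + p_cmp * (1 - p_cmp)
    )
    return ceil((term_null + term_alt) ** 2 / (p_ref - p_cmp) ** 2)


# Hypothetical: 60% receive standard care in the reference area.
for odds_ratio in (1.5, 2.0):
    print(odds_ratio, patients_per_group(0.60, odds_ratio))
```

A study with several aims would take the largest of the sample sizes obtained for each aim.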
18 Conclusions
Taking into account:
- existing sampling and power methodology
- experience from previous studies
- different coverage of Cancer Registries
- available economic resources
We want to:
- standardize the selection of data
- include a minimum number of cases that satisfies statistical considerations related to all aims of our studies
Prof. J.S. Long [1] (Regression Models for Categorical and Limited Dependent Variables, 1997) suggests that sample sizes of less than 100 cases should be avoided and that 500 observations should be adequate for almost any situation.
[1] Professor of Sociology and Statistics at Indiana University
19 Thank you for your attention. And… what about your experience?