# Lenka Mach, Statistics Canada Ioana Şchiopu-Kratina, Statistics Canada Philip T. Reiss, New York University Child Study Center Jean-Marc Fillion, Statistics.


Optimal Coordination of Samples in Business Surveys. Lenka Mach, Statistics Canada; Ioana Şchiopu-Kratina, Statistics Canada; Philip T. Reiss, New York University Child Study Center; Jean-Marc Fillion, Statistics Canada. ICES III, June 2007.

2 OUTLINE OF THE PRESENTATION:

1. Coordinated sampling
2. Optimal sample coordination
   - 2.1 Transportation Problem
   - 2.2 Reduced Transportation Problem
   - 2.3 Variability of the Overlap
3. Example 1: NWCR method for negative coordination of two surveys
4. Example 2: Reduced TP for positive coordination after re-stratification
5. Conclusion

3 1. COORDINATED SAMPLING Coordinated sampling is needed when several sample surveys of overlapping populations are conducted. It encompasses many different techniques for controlling the overlap of samples, i.e. the number of units they have in common. Objective: a higher overlap (positive coordination) or a lower overlap (negative coordination) than if the samples were selected independently. References: Ernst (1999), ICES II (2000), etc.

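4 1. COORDINATED SAMPLING

- First survey: $S$ = set of all possible samples $s$, with (marginal) probability distribution $p$ on $S$.
- Second survey: $S'$ = set of all possible samples $s'$, with (marginal) probability distribution $p'$ on $S'$.
- Integrated surveys: a joint probability distribution $p(s, s')$ on $S \times S'$ such that $\sum_{s' \in S'} p(s, s') = p(s)$ for every $s \in S$ and $\sum_{s \in S} p(s, s') = p'(s')$ for every $s' \in S'$.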

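5 1. COORDINATED SAMPLING The overlap $o(s, s')$ of $s$ and $s'$ is the number of units that $s$ and $s'$ have in common. Expected sample overlap:

$$E[o(s, s')] = \sum_{s \in S} \sum_{s' \in S'} o(s, s')\, p(s, s'). \tag{1}$$

The surveys are positively coordinated if $E[o(s, s')] > \sum_{s} \sum_{s'} o(s, s')\, p(s)\, p'(s')$, the expected overlap under independent selection (and negatively coordinated if $E[o(s, s')]$ is smaller).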
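
The independent-selection benchmark has a closed form for SRSWOR: each common unit is in $s$ with probability $n/N$ and in $s'$ with probability $n'/N'$, independently, so $E[o] = |C|\, n n' / (N N')$. The sketch below checks this by brute force on two small hypothetical frames and then evaluates it for the sizes used later in Example 1; the function names are illustrative only.

```python
from fractions import Fraction
from itertools import combinations

def expected_overlap_independent(frame1, n1, frame2, n2):
    """Brute-force E[o(s, s')] when s and s' are independent SRSWOR samples."""
    samples1 = list(combinations(sorted(frame1), n1))
    samples2 = list(combinations(sorted(frame2), n2))
    total = sum(len(set(s) & set(t)) for s in samples1 for t in samples2)
    # Every pair (s, s') has probability p(s) p'(s') = 1 / (|S| |S'|).
    return Fraction(total, len(samples1) * len(samples2))

def expected_overlap_formula(n_common, n1, big_n1, n2, big_n2):
    """Each common unit is in s w.p. n/N and in s' w.p. n'/N', independently."""
    return Fraction(n_common * n1 * n2, big_n1 * big_n2)

# Hypothetical toy frames P = {1,2,3,4} and P' = {2,3,4,5}, so |C| = 3.
print(expected_overlap_independent({1, 2, 3, 4}, 2, {2, 3, 4, 5}, 2))  # 3/4
print(expected_overlap_formula(3, 2, 4, 2, 4))                          # 3/4
# Example 1 sizes: |C| = 37, n = 20 of N = 40, n' = 20 of N' = 41.
print(float(expected_overlap_formula(37, 20, 40, 20, 41)))
```

Exact fractions are used so that the brute-force and closed-form values can be compared without rounding.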

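6 2. OPTIMAL SAMPLE COORDINATION 2.1 Transportation Problem We integrate the two surveys so that the expected overlap is maximized (minimized): find the max (min) of the objective function (1) over all unknowns

$$x_{ss'} = p(s, s') \ge 0, \qquad s \in S,\ s' \in S', \tag{2}$$

subject to the constraints

$$\sum_{s' \in S'} x_{ss'} = p(s) \ \text{ for all } s \in S, \qquad \sum_{s \in S} x_{ss'} = p'(s') \ \text{ for all } s' \in S'. \tag{3}$$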
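
For a very small instance the transportation problem can be solved without an LP solver. In the hypothetical example below both sample spaces have 6 equally likely samples, so every feasible joint density is a doubly stochastic matrix scaled by 1/6; by the Birkhoff-von Neumann theorem a linear objective such as (1) is optimized at a permutation, and enumerating all 6! permutations gives the exact max and min. This shortcut is an assumption specific to equal-size sample spaces with uniform marginals; the general TP needs a linear-programming method.

```python
from fractions import Fraction
from itertools import combinations, permutations

# Hypothetical toy frames: P = {1,2,3,4}, P' = {2,3,4,5}, SRSWOR with n = n' = 2.
rows = [set(s) for s in combinations([1, 2, 3, 4], 2)]   # samples s, p(s) = 1/6
cols = [set(t) for t in combinations([2, 3, 4, 5], 2)]   # samples s', p'(s') = 1/6
overlap = [[len(s & t) for t in cols] for s in rows]     # o(s, s')

def expected_overlap(perm):
    """E[o] for the joint density x[k][perm[k]] = 1/6, all other cells 0."""
    return Fraction(sum(overlap[k][perm[k]] for k in range(len(rows))), len(rows))

# Each permutation satisfies the constraints (3): every row and column sums to 1/6.
values = [expected_overlap(perm) for perm in permutations(range(len(cols)))]
best_max, best_min = max(values), min(values)
independent = Fraction(sum(map(sum, overlap)), len(rows) * len(cols))
print(best_max, best_min, independent)  # 3/2 0 3/4
```

Positive coordination raises the expected overlap from 3/4 (independent selection) to 3/2, while negative coordination brings it down to 0.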

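7 2. OPTIMAL SAMPLE COORDINATION 2.1 Transportation Problem Each cell shows the overlap $o(s_k, s'_l)$ and the unknown $X_{kl} = x_{s_k s'_l}$.

| | $s'_1$ | $s'_2$ | $s'_3$ | … | $s'_L$ | $p(s)$ |
|---|---|---|---|---|---|---|
| $s_1$ | $o(s_1,s'_1)$; $X_{11}$ | $o(s_1,s'_2)$; $X_{12}$ | $o(s_1,s'_3)$; $X_{13}$ | … | $o(s_1,s'_L)$; $X_{1L}$ | $p(s_1)$ |
| $s_2$ | $o(s_2,s'_1)$; $X_{21}$ | $o(s_2,s'_2)$; $X_{22}$ | $o(s_2,s'_3)$; $X_{23}$ | … | $o(s_2,s'_L)$; $X_{2L}$ | $p(s_2)$ |
| $s_3$ | $o(s_3,s'_1)$; $X_{31}$ | $o(s_3,s'_2)$; $X_{32}$ | $o(s_3,s'_3)$; $X_{33}$ | … | $o(s_3,s'_L)$; $X_{3L}$ | $p(s_3)$ |
| … | … | … | … | … | … | … |
| $s_K$ | $o(s_K,s'_1)$; $X_{K1}$ | $o(s_K,s'_2)$; $X_{K2}$ | $o(s_K,s'_3)$; $X_{K3}$ | … | $o(s_K,s'_L)$; $X_{KL}$ | $p(s_K)$ |
| $p'(s')$ | $p'(s'_1)$ | $p'(s'_2)$ | $p'(s'_3)$ | … | $p'(s'_L)$ | 1 |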

8 2. OPTIMAL SAMPLE COORDINATION 2.1 Transportation Problem The TP is too large: it has too many variables! Example: if the first survey selects an SRSWOR of $n = 20$ from $N = 40$, there are $\binom{40}{20}$ = 137,846,528,820 possible samples $s$ (rows). BUT for stratified SRSWOR designs, we can reduce the TP by grouping samples! Condition: the matrix of $o(s, s')$ within each group must be symmetric. We use a two-stage procedure.
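
The size of the full problem is easy to compute with `math.comb`. The sketch below assumes the second survey of Example 1 (SRSWOR of $n' = 20$ from $N' = 41$) for the columns and compares the number of unknowns with the reduced TP of that example.

```python
from math import comb

rows = comb(40, 20)      # samples s: SRSWOR of n = 20 from N = 40
cols = comb(41, 20)      # samples s': SRSWOR of n' = 20 from N' = 41 (Example 1, Survey 2)
reduced = 4 * 5          # super-rows c = 17..20 times super-columns c' = 16..20
print(f"full TP: {rows:,} x {cols:,} = {rows * cols:.2e} unknowns")
print(f"reduced TP: {reduced} unknowns")
```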

9 2. OPTIMAL SAMPLE COORDINATION 2.2 Reduced Transportation Problem Notation: $P$ = frame for Survey 1, $P'$ = frame for Survey 2, $C = P \cap P'$; $c = c(s)$ = number of units in $C \cap s$, and $c' = c(s')$ = number of units in $C \cap s'$.

Solution, Stage 1:

- Group samples $s$ into super-rows $c$.
- Group samples $s'$ into super-columns $c'$.
- Form a matrix of blocks $(c, c')$ and define the block optimum $o(c, c')$.
- Solve the reduced TP to obtain the joint probabilities $p(c, c')$.

Solution, Stage 2: distribute $p(c, c')$ evenly among the pairs $(s, s')$ of the block that have the optimum overlap, so that

- each row $s$ within the block gets the same probability, and
- each column $s'$ within the block gets the same probability.
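
Stage 1 can be illustrated by brute force on the same kind of hypothetical toy frames: samples are grouped by $c = |C \cap s|$ and $c' = |C \cap s'|$, and the smallest overlap inside each block (the block optimum for negative coordination) is found by enumeration. The check that it equals $\max\{0, c + c' - |C|\}$ anticipates the formula used in Example 1; the frames and variable names are assumptions for illustration.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import combinations

frame1, frame2 = [1, 2, 3, 4], [2, 3, 4, 5]      # hypothetical frames P and P'
common = set(frame1) & set(frame2)                # C = P ∩ P'
n1 = n2 = 2

# Stage 1: group samples s into super-rows c = |C ∩ s| and s' into super-columns c'.
super_rows, super_cols = defaultdict(list), defaultdict(list)
for s in combinations(frame1, n1):
    super_rows[len(common & set(s))].append(set(s))
for t in combinations(frame2, n2):
    super_cols[len(common & set(t))].append(set(t))

# Block optimum for negative coordination: smallest o(s, s') inside block (c, c').
block_min = {
    (c, c2): min(len(s & t) for s in super_rows[c] for t in super_cols[c2])
    for c in super_rows for c2 in super_cols
}
# Marginal probability of each super-row (all SRSWOR samples equally likely).
n_samples = sum(map(len, super_rows.values()))
p_row = {c: Fraction(len(group), n_samples) for c, group in super_rows.items()}
print(block_min)
print(p_row)
```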

10 2. OPTIMAL SAMPLE COORDINATION 2.2 Reduced Transportation Problem Matrix of $o(s, s')$ within a block.

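11 2. OPTIMAL SAMPLE COORDINATION 2.2 Reduced Transportation Problem Example 1: Survey 1: $N = 40$, SRSWOR $n = 20$; Survey 2: $N' = 41$, SRSWOR $n' = 20$; $C = 37$ common units, $D = 3$ deaths, $B = 4$ births. Super-rows: $c = 17, 18, 19, 20$ (4 super-rows); super-columns: $c' = 16, 17, 18, 19, 20$ (5 super-columns). The reduced TP has only $4 \times 5 = 20$ unknowns. Constraints: $\sum_{c'} p(c, c') = p(c)$ and $\sum_{c} p(c, c') = p'(c')$, where $p(c) = \binom{37}{c}\binom{3}{20-c} / \binom{40}{20}$ and $p'(c') = \binom{37}{c'}\binom{4}{20-c'} / \binom{41}{20}$ are hypergeometric probabilities.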
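
The marginal constraints of the reduced TP are hypergeometric, so they can be computed directly. This sketch uses the Example 1 sizes; the helper name `hypergeom` is hypothetical and the rounded values are compared with the margins printed in Table 1a.

```python
from math import comb

C, D, B = 37, 3, 4      # common units, deaths, births (Example 1)
n1, n2 = 20, 20         # SRSWOR sample sizes of Surveys 1 and 2

def hypergeom(k, k_pop, other_pop, n):
    """P(exactly k of the n SRSWOR units come from a subset of size k_pop)."""
    return comb(k_pop, k) * comb(other_pop, n - k) / comb(k_pop + other_pop, n)

super_rows = range(n1 - D, n1 + 1)        # c  = 17, ..., 20 (at most D units outside C)
super_cols = range(n2 - B, n2 + 1)        # c' = 16, ..., 20 (at most B units outside C)
p_row = {c: hypergeom(c, C, D, n1) for c in super_rows}
p_col = {c2: hypergeom(c2, C, B, n2) for c2 in super_cols}
print(len(p_row) * len(p_col), "unknowns in the reduced TP")   # 20
print({c: round(p, 4) for c, p in p_row.items()})
print({c2: round(p, 4) for c2, p in p_col.items()})
```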

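12 2. OPTIMAL SAMPLE COORDINATION 2.3 Variability of the Overlap Optimal coordination maximizes (minimizes) $E[o(s, s')]$. In practice, only one pair of samples $(s, s')$ is selected, so its overlap $o(s, s')$ should be close to $E[o(s, s')]$! The TP can be used in 2 steps:

- Step 1: as described on Slide 6.
- Step 2: use the optimal value $E^*$ of $E[o(s, s')]$ from Step 1 as an additional constraint, $\sum_s \sum_{s'} o(s, s')\, x_{ss'} = E^*$, and define a new objective function. For example, find the minimum of $V[o(s, s')] = \sum_s \sum_{s'} \big(o(s, s') - E^*\big)^2 x_{ss'}$ (4).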
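
Why Step 2 matters can be seen on a hypothetical toy design where both surveys draw an SRSWOR of 2 units from the same frame {1, 2, 3, 4}. The three joint densities below all satisfy the marginal constraints and all give $E[o] = 1$, yet their variances differ; this only illustrates objective (4), it does not solve the Step 2 problem.

```python
from fractions import Fraction
from itertools import combinations

samples = [frozenset(s) for s in combinations([1, 2, 3, 4], 2)]
p = Fraction(1, len(samples))                      # p(s) = p'(s') = 1/6

def moments(joint):
    """Return (E[o], V[o]) for a joint density {(s, s'): prob}, checking the marginals."""
    for s in samples:
        assert sum(q for (a, _), q in joint.items() if a == s) == p
        assert sum(q for (_, b), q in joint.items() if b == s) == p
    mean = sum(q * len(a & b) for (a, b), q in joint.items())
    var = sum(q * (len(a & b) - mean) ** 2 for (a, b), q in joint.items())
    return mean, var

full = frozenset({1, 2, 3, 4})
# (i) Half "same sample", half "complementary sample": overlaps are 2 or 0.
mix = {}
for s in samples:
    mix[(s, s)] = p / 2
    mix[(s, full - s)] = p / 2
# (ii) Independent selection: p(s, s') = p(s) p'(s').
indep = {(a, b): p * p for a in samples for b in samples}
# (iii) A one-to-one pairing in which every pair shares exactly one unit.
shift = {(1, 2): (1, 3), (1, 3): (1, 4), (1, 4): (1, 2),
         (2, 3): (2, 4), (2, 4): (3, 4), (3, 4): (2, 3)}
one_common = {(frozenset(a), frozenset(b)): p for a, b in shift.items()}

for name, joint in [("mix", mix), ("independent", indep), ("one common unit", one_common)]:
    print(name, *moments(joint))   # variances 1, 1/3 and 0 respectively
```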

13 3. Example 1 NWCR method for negative coordination of two surveys. Survey 1: $N = 40$, SRSWOR $n = 20$; Survey 2: $N' = 41$, SRSWOR $n' = 20$; $D = 3$, $C = 37$, $B = 4$. Objective: minimize $E[o(s, s')]$.

Stage 1: solve the reduced TP. Group samples $s$ into super-rows $c$ and samples $s'$ into super-columns $c'$. Order the super-rows by ascending $c$ and the super-columns by descending $c'$, and form a matrix of blocks. The block optimum $o(c, c') = \max\{0, c + c' - C\}$ is the smallest possible overlap $o(s, s')$ within block $(c, c')$: $C \cap s$ and $C \cap s'$ are subsets of $C$ of sizes $c$ and $c'$, so they share at least $c + c' - C$ units. Use the north-west corner rule (NWCR) algorithm to obtain a solution.

Stage 2: determine $p(s, s')$ for each pair $(s, s')$. Distribute $p(c, c')$ equally among all pairs $(s, s')$ within the block that have $o(s, s') = o(c, c')$.
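
Stage 1 can be sketched with a plain north-west corner rule applied to the hypergeometric margins, with super-rows in ascending and super-columns in descending order. Under these assumptions the allocation reproduces the nonzero cells of Table 1a to four decimals, and every allocated block has block optimum 0; the function names are hypothetical.

```python
from math import comb

C, D, B, n1, n2 = 37, 3, 4, 20, 20   # Example 1

def hypergeom(k, k_pop, other_pop, n):
    """P(exactly k of the n SRSWOR units come from a subset of size k_pop)."""
    return comb(k_pop, k) * comb(other_pop, n - k) / comb(k_pop + other_pop, n)

def north_west_corner(supply, demand, tol=1e-12):
    """North-west corner rule: allocate greedily, starting from the top-left cell."""
    supply, demand = list(supply), list(demand)
    alloc, i, j = {}, 0, 0
    while i < len(supply) and j < len(demand):
        amount = min(supply[i], demand[j])
        alloc[(i, j)] = amount
        supply[i] -= amount
        demand[j] -= amount
        if supply[i] <= tol:      # row exhausted: move down
            i += 1
        else:                     # column exhausted: move right
            j += 1
    return alloc

rows = list(range(n1 - D, n1 + 1))          # ascending c:  17, 18, 19, 20
cols = list(range(n2, n2 - B - 1, -1))      # descending c': 20, 19, 18, 17, 16
alloc = north_west_corner([hypergeom(c, C, D, n1) for c in rows],
                          [hypergeom(c2, C, B, n2) for c2 in cols])
table = {(rows[i], cols[j]): p for (i, j), p in alloc.items()}
expected = sum(max(0, c + c2 - C) * p for (c, c2), p in table.items())
for (c, c2), p in table.items():
    print(f"block ({c}, {c2}): o = {max(0, c + c2 - C)}, p = {p:.4f}")
print("E[o] =", expected)
```

Ordering the columns by descending $c'$ is what pushes the mass onto blocks with $c + c' \le C$, where the block optimum is 0.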

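14 3. Example 1 NWCR method for negative coordination of two surveys. Table 1a: Reduced TP, $p(c, c')$ assigned by NWCR. Each cell shows $o(c, c')$ / $p(c, c')$.

| $c \backslash c'$ | 20 | 19 | 18 | 17 | 16 | $p(c)$ |
|---|---|---|---|---|---|---|
| 17 | 0 / 0.0591 | 0 / 0.0563 | 0 / 0 | 0 / 0 | 0 / 0 | 0.1154 |
| 18 | 1 / 0 | 0 / 0.2064 | 0 / 0.1782 | 0 / 0 | 0 / 0 | 0.3846 |
| 19 | 2 / 0 | 1 / 0 | 0 / 0.2158 | 0 / 0.1689 | 0 / 0 | 0.3846 |
| 20 | 3 / 0 | 2 / 0 | 1 / 0 | 0 / 0.0675 | 0 / 0.0478 | 0.1154 |
| $p'(c')$ | 0.0591 | 0.2627 | 0.3940 | 0.2364 | 0.0478 | 1.0000 |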

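15 3. Example 1 NWCR method for negative coordination of two surveys. Stage 2: distribution of probabilities within blocks. Consider block $(c = 17, c' = 20)$ with $o(c, c') = 0$:

- there are $\binom{37}{17}\binom{3}{3}$ = 15,905,368,710 different samples (rows) $s$;
- there are $\binom{37}{20}\binom{4}{0}$ = 15,905,368,710 different samples (columns) $s'$.

The matrix of overlaps $o(s, s')$ is symmetric: for each sample $s$ there is exactly one sample $s'$ such that $o(s, s') = 0$ (the one whose 20 common units are the units of $C$ not in $s$), and for each $s'$ exactly one such $s$. Each sample $s$ (together with its unique partner $s'$) therefore gets probability $p(17, 20) / 15{,}905{,}368{,}710$.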
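
The one-partner property can be checked by enumeration on a scaled-down analogue of block (17, 20) (a hypothetical $|C| = 5$ with one death and block (2, 3)), and the real block sizes follow from `math.comb`. The per-pair probability uses the rounded Table 1a value 0.0591, so it is approximate.

```python
from itertools import combinations
from math import comb

def partner_counts(common, deaths, c, c2):
    """Block (c, c') with c + c' = |C|, all deaths in s and no births in s'.
    Return the row and column counts and the zero-overlap partner counts."""
    rows = [set(x) | deaths for x in combinations(sorted(common), c)]
    cols = [set(x) for x in combinations(sorted(common), c2)]
    per_row = {sum(1 for t in cols if not (s & t)) for s in rows}
    per_col = {sum(1 for s in rows if not (s & t)) for t in cols}
    return len(rows), len(cols), per_row, per_col

# Scaled-down analogue (hypothetical): |C| = 5, one death, block (2, 3).
print(partner_counts(set(range(5)), {"d1"}, 2, 3))   # (10, 10, {1}, {1})

# Block (17, 20) of Example 1.
n_rows = comb(37, 17) * comb(3, 3)
n_cols = comb(37, 20) * comb(4, 0)
print(f"{n_rows:,} rows, {n_cols:,} columns, each pair gets about {0.0591 / n_rows:.3e}")
```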

16 3. Example 1 NWCR method for negative coordination of two surveys. Theorem:

- (a) The joint density $X_{\mathrm{NWCR}}$ obtained by the NWCR method for negative coordination satisfies the constraints given in (3).
- (b) $X_{\mathrm{NWCR}}$ has the minimum expected overlap within the set of joint densities that satisfy (3).
- (c) $X_{\mathrm{NWCR}}$ has the minimum variance within this set of joint densities.

Proof in Mach, Reiss, Şchiopu-Kratina (2006).

17 3. Example 1 NWCR method for negative coordination of two surveys.

Simultaneous selection:

- i) Select one block using the joint probabilities $p(c, c')$ in Table 1a.
- ii) To draw samples $s$ and $s'$, randomly select units from each set: $C$ = common units, $D$ = deaths, $B$ = births.

Suppose block (19, 18) is selected in i). To select $s$, randomly select 19 units from the 37 in $C$ and 1 unit from the 3 in $D$. To select $s'$, take the remaining 37 − 19 = 18 units from $C$ and randomly select 2 units from the 4 in $B$.

Sequential selection ($s$ drawn first):

- i) Select one block from the super-row $c(s)$ using the conditional probabilities $p\{(c, c') \mid c(s)\}$ corresponding to the joint probabilities in Table 1a.
- ii) Randomly select units from the $C$ and $B$ sets to form $s'$.
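
Simultaneous selection can be sketched as follows, assuming hypothetical unit labels and using the rounded nonzero cells of Table 1a as block weights. Given the block, $s$ is drawn from $C$ and $D$, and the $c'$ common units of $s'$ are drawn only from the units of $C$ not in $s$, so every pair has overlap 0. Because each $s$ in a block has the same number of such partners, this draws uniformly among the optimal pairs of the block, as Stage 2 requires.

```python
import random

COMMON = [f"C{i}" for i in range(37)]   # hypothetical unit labels
DEATHS = [f"D{i}" for i in range(3)]
BIRTHS = [f"B{i}" for i in range(4)]
N1_SAMPLE, N2_SAMPLE = 20, 20

# Nonzero blocks of Table 1a, as printed (rounded to 4 decimals).
TABLE_1A = {(17, 20): 0.0591, (17, 19): 0.0563, (18, 19): 0.2064, (18, 18): 0.1782,
            (19, 18): 0.2158, (19, 17): 0.1689, (20, 17): 0.0675, (20, 16): 0.0478}

def draw_pair(rng):
    """Simultaneous selection: (i) draw a block, (ii) draw units from C, D and B."""
    blocks = list(TABLE_1A)
    c, c2 = rng.choices(blocks, weights=[TABLE_1A[b] for b in blocks])[0]
    s = set(rng.sample(COMMON, c)) | set(rng.sample(DEATHS, N1_SAMPLE - c))
    # s' takes its c' common units only from those NOT in s, so o(s, s') = 0.
    rest = [u for u in COMMON if u not in s]
    s2 = set(rng.sample(rest, c2)) | set(rng.sample(BIRTHS, N2_SAMPLE - c2))
    return (c, c2), s, s2

rng = random.Random(2007)
block, s, s2 = draw_pair(rng)
print(block, len(s), len(s2), len(s & s2))
```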

18 3. Example 1 NWCR method for negative coordination of two surveys. Figure: samples $s$ ($n = 20$, $c = 19$) and $s'$ ($n' = 20$, $c' = 18$) drawn from Deaths ($D = 3$), Common Units ($C = 37$) and Births ($B = 4$), with $o(s, s') = 0$.

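19 3. Example 1 NWCR method for negative coordination of two surveys. Table 1b: Empirical block probabilities for sequential SRSWOR (PRN). Each cell shows $o(c, c')$ / $p(c, c')$.

| $c \backslash c'$ | 20 | 19 | 18 | 17 | 16 | $p(c)$ |
|---|---|---|---|---|---|---|
| 17 | 0 / 0.0083 | 0 / 0.0336 | 0 / 0.0426 | 0 / 0.0239 | 0 / 0.0043 | 0.1127 |
| 18 | 1 / 0.0225 | 0 / 0.1022 | 0 / 0.1574 | 0 / 0.0890 | 0 / 0.0160 | 0.3871 |
| 19 | 2 / 0.0210 | 1 / 0.0987 | 0 / 0.1508 | 0 / 0.0993 | 0 / 0.0182 | 0.3880 |
| 20 | 3 / 0.0052 | 2 / 0.0251 | 1 / 0.0426 | 0 / 0.0319 | 0 / 0.0074 | 0.1122 |
| $p'(c')$ | 0.0570 | 0.2596 | 0.3934 | 0.2441 | 0.0459 | 1.0000 |

Table 1c: Expectations

| Method | $E[o(s, s')]$ | $V[o(s, s')]$ |
|---|---|---|
| NWCR | 0 | 0 |
| PRN | 0.2716 | 0.3212 |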
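
The expectations in Table 1c can be recomputed from the block tables under the assumption that every pair drawn from block $(c, c')$ has overlap $o(c, c') = \max\{0, c + c' - 37\}$. For NWCR this holds by construction; for PRN, computing the moments from Table 1b in this way agrees with the printed 0.2716 and 0.3212 to four decimals. The helper name is hypothetical.

```python
def block_moments(table, C=37):
    """E[o] and V[o] if every pair drawn from block (c, c') has overlap max(0, c + c' - C)."""
    mean = sum(max(0, c + c2 - C) * p for (c, c2), p in table.items())
    second = sum(max(0, c + c2 - C) ** 2 * p for (c, c2), p in table.items())
    return mean, second - mean ** 2

# Table 1b (PRN): rows c = 17..20, columns c' = 20, 19, 18, 17, 16.
prn_rows = {17: [0.0083, 0.0336, 0.0426, 0.0239, 0.0043],
            18: [0.0225, 0.1022, 0.1574, 0.0890, 0.0160],
            19: [0.0210, 0.0987, 0.1508, 0.0993, 0.0182],
            20: [0.0052, 0.0251, 0.0426, 0.0319, 0.0074]}
prn = {(c, c2): p for c, ps in prn_rows.items() for c2, p in zip(range(20, 15, -1), ps)}
# Table 1a (NWCR): nonzero blocks only.
nwcr = {(17, 20): 0.0591, (17, 19): 0.0563, (18, 19): 0.2064, (18, 18): 0.1782,
        (19, 18): 0.2158, (19, 17): 0.1689, (20, 17): 0.0675, (20, 16): 0.0478}

print("PRN :", [round(x, 4) for x in block_moments(prn)])   # [0.2716, 0.3212]
print("NWCR:", block_moments(nwcr))
```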

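20 4. Example 2 Reduced TP for positive coordination after re-stratification. The new stratum consists of the parts $C_1$, $C_2$, $C_3$ of the three old strata (2 + 3 + 10 = 15 units):

- Old stratum 1: $N_1 = 20$, $n_1 = 10$; $|C_1| = 2$
- Old stratum 2: $N_2 = 6$, $n_2 = 3$; $|C_2| = 3$
- Old stratum 3: $N_3 = 10$, $n_3 = 2$; $|C_3| = 10$
- New stratum: $N' = 15$, $n' = 5$

Objective: maximize $E[o(s, s')]$.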

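21 4. Example 2 Reduced TP for positive coordination after re-stratification. Super-rows $c = (c_1, c_2, c_3)$ with $c_1 \in \{0, 1, 2\}$, $c_2 \in \{0, 1, 2, 3\}$ and $c_3 = 2$: 3 × 4 × 1 = 12 super-rows. Super-columns $c' = (c'_1, c'_2, c'_3)$: (0, 0, 5), (0, 1, 4), (0, 2, 3), (0, 3, 2), (1, 0, 4), (1, 1, 3), (1, 2, 2), (1, 3, 1), (2, 0, 3), (2, 1, 2), (2, 2, 1), (2, 3, 0): 12 super-columns. The reduced TP has 12 × 12 = 144 unknowns. Constraints: the row margins are products of hypergeometric probabilities, $p(c) = \prod_{h} \binom{|C_h|}{c_h}\binom{N_h - |C_h|}{n_h - c_h} / \binom{N_h}{n_h}$; the column margins are multihypergeometric probabilities, $p'(c') = \prod_{h} \binom{|C_h|}{c'_h} / \binom{15}{5}$.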
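
The super-rows, super-columns and their margins can be enumerated directly from the stratum sizes. This sketch assumes the new stratum is exactly $C_1 \cup C_2 \cup C_3$; the function names are hypothetical, and the rounded margins can be compared with Table 2a.

```python
from itertools import product
from math import comb, prod

OLD = [(20, 10), (6, 3), (10, 2)]         # old strata (N_h, n_h)
C_SIZES = [2, 3, 10]                       # |C_h|: part of old stratum h in the new stratum
N_NEW, N_SAMPLE_NEW = sum(C_SIZES), 5      # N' = 15, n' = 5

def p_row(c):
    """Product over old strata of hypergeometric P(c_h of the n_h units lie in C_h)."""
    return prod(comb(k, ch) * comb(big_n - k, n - ch) / comb(big_n, n)
                for ch, k, (big_n, n) in zip(c, C_SIZES, OLD))

def p_col(c2):
    """Multivariate hypergeometric P(c'_h of the n' new-stratum units lie in C_h)."""
    return prod(comb(k, ch) for ch, k in zip(c2, C_SIZES)) / comb(N_NEW, N_SAMPLE_NEW)

row_ranges = [range(min(k, n) + 1) for k, (_, n) in zip(C_SIZES, OLD)]
super_rows = [c for c in product(*row_ranges) if p_row(c) > 0]
col_ranges = [range(min(k, N_SAMPLE_NEW) + 1) for k in C_SIZES]
super_cols = [c2 for c2 in product(*col_ranges) if sum(c2) == N_SAMPLE_NEW]

print(len(super_rows), "x", len(super_cols), "=", len(super_rows) * len(super_cols), "unknowns")
print(round(p_row((2, 3, 2)), 4), round(p_col((1, 2, 2)), 4))
```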

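22 4. Example 2 Reduced TP for positive coordination after re-stratification. Table 2a: Block overlap and probabilities $p(c, c')$ (TP solution). Each cell shows $o(c, c')$ / $p(c, c')$.

| $c \backslash c'$ | (1,2,2) | (2,1,2) | (0,3,2) | … | (0,0,5) | $p(c)$ |
|---|---|---|---|---|---|---|
| (2,3,2) | 5 / 0 | 5 / 0.0115 | 5 / 0 | … | 2 / 0 | 0.0118 |
| (2,2,2) | 5 / 0 | 5 / 0.0301 | 4 / 0 | … | 2 / 0 | 0.1066 |
| (1,3,2) | 5 / 0 | 4 / 0 | 5 / 0.0031 | … | 2 / 0 | 0.0263 |
| … | … | … | … | … | … | … |
| (0,0,2) | 2 / 0 | 2 / 0 | 2 / 0 | … | 2 | 0.0118 |
| $p'(c')$ | 0.0899 | 0.0450 | 0.0150 | … | 0.0839 | 1.0000 |

The block optimum is $o(c, c') = \min(c_1, c'_1) + \min(c_2, c'_2) + \min(c_3, c'_3)$, the largest possible overlap within the block, since $s \cap C_h$ and $s' \cap C_h$ can share at most $\min(c_h, c'_h)$ units. $E_{TP}[o(s, s')] = 3.6494$, $V_{TP}[o(s, s')] = 0.7292$.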

23 4. Example 2 Reduced TP for positive coordination after re-stratification. Sequential selection: suppose $c = (2,3,2)$, with $p(c) = 0.01184$.

Table 2b: Probabilities for $c = (2,3,2)$

| $c'$ | (2,1,2) | (2,3,0) | Σ |
|---|---|---|---|
| $p(c, c')$ | 0.01151 | 0.00033 | 0.01184 |
| $p\{c' \mid c = (2,3,2)\}$ | 0.97213 | 0.02787 | 1 |

$E_{TP}\{o \mid c = (2,3,2)\} = 5$, $V_{TP}\{o \mid c = (2,3,2)\} = 0$.

- i) Select the super-column $c'$ using $p\{c' \mid c = (2,3,2)\}$.
- ii) Suppose $c' = (2,1,2)$ is selected. Randomly de-select 2 of the units of $s \cap C_2$, keeping the units of $s$ in $C_1$ and $C_3$, to form $s'$.
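
Sequential selection given $s$ can be sketched as: draw $c'$ with probability $p(c, c')/p(c)$, then keep $c'_h$ randomly chosen units of $s \cap C_h$ in each part. The unit labels are hypothetical and the weights are the Table 2b values. Both possible blocks give $o(s, s') = 5$, matching $E_{TP}\{o \mid c\} = 5$ and $V_{TP}\{o \mid c\} = 0$.

```python
import random

# Hypothetical unit labels for C1, C2, C3 (|C_h| = 2, 3, 10 in Example 2).
C_PARTS = [[f"C1_{i}" for i in range(2)],
           [f"C2_{i}" for i in range(3)],
           [f"C3_{i}" for i in range(10)]]
# Table 2b: joint probabilities p(c, c') for c = (2,3,2), which sum to p(c) = 0.01184.
JOINT = {(2, 1, 2): 0.01151, (2, 3, 0): 0.00033}

def draw_second_sample(s_parts, rng):
    """Sequential selection: draw c' given c, then keep c'_h of the units of s in C_h."""
    blocks = list(JOINT)
    # random.choices scales the weights by their total, i.e. uses p(c, c') / p(c).
    c2 = rng.choices(blocks, weights=[JOINT[b] for b in blocks])[0]
    s2 = set()
    for part, k in zip(s_parts, c2):
        s2 |= set(rng.sample(part, k))   # de-select len(part) - k units of s ∩ C_h
    return c2, s2

rng = random.Random(1)
# First-survey sample with c = (2,3,2): both C1 units, all of C2 and two C3 units
# (its 8 other units lie outside the new stratum and play no role here).
s_parts = [C_PARTS[0], C_PARTS[1], rng.sample(C_PARTS[2], 2)]
s = set().union(*s_parts)
c2, s2 = draw_second_sample(s_parts, rng)
print(c2, len(s2), len(s & s2))
```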

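24 4. Example 2 Reduced TP for positive coordination after re-stratification. Is the matrix of overlaps $o(s, s')$ within a block symmetric? Consider block $\{c = (2,3,2),\ c' = (2,1,2)\}$ with $o(c, c') = 5$:

- there are $\binom{2}{2}\binom{18}{8} \times \binom{3}{3} \times \binom{10}{2}$ = 43,758 × 1 × 45 different samples (rows) $s$;
- there are $\binom{2}{2} \times \binom{3}{1} \times \binom{10}{2}$ = 1 × 3 × 45 different samples (columns) $s'$.

For each $s$, there are exactly 3 samples $s'$ such that $o(s, s') = 5$. For each $s'$, there are exactly 43,758 samples $s$ such that $o(s, s') = 5$. Each $s$ therefore gets probability $p(c, c') / (43{,}758 \times 45)$, shared equally among its 3 partners $s'$.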
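
The partner counts can be verified by enumerating only the $C_2 \cup C_3$ parts of the samples, since both units of $C_1$ are in every $s$ and $s'$ of this block and the 8 non-$C$ units of old stratum 1 only multiply the number of rows by $\binom{18}{8}$. The labels are hypothetical.

```python
from itertools import combinations
from math import comb

C2 = ["C2_0", "C2_1", "C2_2"]            # hypothetical labels, |C2| = 3
C3 = [f"C3_{i}" for i in range(10)]       # |C3| = 10
# Rows s restricted to C2 ∪ C3: all of C2 plus 2 of C3 (1 x 45 patterns).
rows = [set(C2) | set(x) for x in combinations(C3, 2)]
# Columns s' restricted to C2 ∪ C3: 1 of C2 plus 2 of C3 (3 x 45 patterns).
cols = [{a} | set(x) for a in C2 for x in combinations(C3, 2)]
# With both C1 units shared, o(s, s') = 5 iff the C2 ∪ C3 parts share 3 units.
per_row = {sum(len(r & t) == 3 for t in cols) for r in rows}
per_col = {sum(len(r & t) == 3 for r in rows) for t in cols}
# Each C2 ∪ C3 pattern of s is completed by comb(18, 8) choices of non-C units.
print(per_row, per_col, comb(18, 8))   # {3} {1} 43758
```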

25 4. Example 2 Reduced TP for positive coordination after re-stratification. Table 2c: Matrix of $o(s, s')$; block $\{c = (2,3,2),\ c' = (2,1,2)\}$ (43,758 rows).

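26 4. Example 2 Reduced TP for positive coordination after re-stratification. Table 2d: Empirical block probabilities for sequential SRSWOR (PRN). Each cell shows $o(c, c')$ / $p(c, c')$.

| $c \backslash c'$ | (1,2,2) | (2,1,2) | (0,3,2) | … | (0,0,5) | $p(c)$ |
|---|---|---|---|---|---|---|
| (2,3,2) | 5 / 0.0022 | 5 / 0.0015 | 5 / 0.0007 | … | 2 / 0.0002 | 0.0124 |
| (2,2,2) | 5 / 0.0160 | 5 / 0.0173 | 4 / 0.0006 | … | 2 / 0.0022 | 0.1067 |
| (1,3,2) | 5 / 0.0055 | 4 / 0.0001 | 5 / 0.0025 | … | 2 / 0.0007 | 0.0254 |
| … | … | … | … | … | … | … |
| (0,0,2) | 2 / 0.0001 | 2 / 0 | 2 / 0 | … | 2 / 0.0069 | 0.0116 |
| $p'(c')$ | 0.0897 | 0.0453 | 0.0153 | … | 0.0847 | 1.0000 |

Table 2e: Expectations

| Method | $E[o(s, s')]$ | $V[o(s, s')]$ | $E\{o \mid c = (2,3,2)\}$ | $V\{o \mid c = (2,3,2)\}$ |
|---|---|---|---|---|
| TP | 3.6494 | 0.7292 | 5 | 0 |
| PRN | 3.5602 | 0.6940 | 4.3282 | 0.5746 |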

5. CONCLUSION Optimal sample coordination is a TP. For stratified SRSWOR designs, the TP can be reduced by grouping samples; the groups must be formed so that the matrix of $o(s, s')$ within each group is symmetric. The solution and the selection are done in two stages. Different objective functions can be defined, depending on the goal of the sample coordination project.