
1 Evolution of the art of keeping Records and Development of Total Survey Design with application to some projects By: Pulakesh Maiti Indian Statistical Institute

2 Summary. While statistics have been collected and used in this subcontinent from antiquity, many changes in their collection and use took place during the British Period (1757 – 1947) in Indian history. Some of the changes were due to imperial needs, but many took place indirectly as a result of western education and a spirit of scientific curiosity and experimentation. Interest in rapid social, economic and technical development added a new dimension after India’s independence in 1947. We trace the evolution of data collection, discuss the acceptance of random sampling with its basic principles and the necessary activities in its domain, and finally describe how Total Survey Design was developed and deployed in some projects undertaken in India and abroad.

3 Two highlights from the pre-British period. In the Arthasastra by Kautilya (321 – 296 B.C.), whose title literally means a treatise on economics, one finds an account of data collection: “It is the duty of Gopa, village accountant, to attend to the accounts of five or ten villages, as ordered by the collector general … Also, having numbered the houses as tax paying or non-tax paying, he shall not only register the total number of inhabitants of all the four castes in each village, but also keep an account of the exact number of cultivators, cowherds, merchants, artisans, labourers, slaves and biped and quadruped animals, fixing at the same time the amount of gold, free labour, toll and fines that can be collected from it (each household)” (Shamsastry, 1929, p. 158).

4 “The basic unit for recording information pertaining to agriculturists and the produce was the village. Ascertainment of the extent of the soil in cultivation and weighing several portion of personal observation was made through the superintendent to the survey, the Bitkchi, the Patwaris who were being appointed at the village level”. The above pertains to the period of Akbar, the Great Moghul Emperor (1556 A.D. to 1605 A.D.).

5 Mechanisms of data collection also evolved in many other countries, especially in Europe. As early as 1662 A.D., Graunt published his work on social statistics, based on data collected in an arbitrary or haphazard manner. Such practice, however, was not well organized. Only after the industrial renaissance in Europe did the need for enquiries of greater depth and breadth increase, and collection of data in the form of complete enumeration on various social, economic, demographic and biological characteristics came into practice in the 19th century through organized bodies. The practice came into existence in India too, when the British Government started census operations around 1881.

6 The following is a summary picture of the current statistical system in India developed through the early and later British period and the period after independence.

7 Office of the Registrar General and Census Commissioner, Home Ministry, is responsible for conducting the decennial population census and for birth and death statistics, including the calculation of birth, death and other demographic rates; Department of Commercial Intelligence and Statistics, Ministry of Finance, looks after statistics on foreign trade and business; Reserve Bank of India, Ministry of Finance, looks after foreign trade, monetary flow, interest rates etc.; Directorate of Economics and Statistics, Ministry of Food and Agriculture, is responsible for compiling and publishing agricultural statistics such as crop production, crop forecasts, fisheries and livestock on an all-India basis; Labour Bureau, Ministry of Labour, prepares the consumer price index numbers;

8 Office of Economic Adviser (OEA), Ministry of Industry, compiles the wholesale price index on a weekly basis from price quotations collected by official as well as some non-official agencies in respect of the 435 selected items and commodities identified in the basket of the index; Central Bureau of Health Intelligence (CBHI), State Welfare Bureau, ICMR, Ministry of Health and Family Welfare, records different aspects of public health and family welfare. The system producing health statistics is totally decentralized and still relatively weak, even by Indian standards, on the incidence or prevalence of major diseases at the national level; the newly created Ministry of Environment and the CSO have been bringing out a handbook on environmental statistics.

9 The rapid growth of interest in “sampling methods” and in the conclusions drawn from them possibly started after Kiaer (1895), who introduced the concept of random sampling to replace the usual approach of complete enumeration and emphasized the value of a representative sample. A representative sample is likened to a photograph, which reproduces the details of the original in their true relative proportions. Bowley A.L. (1906) discussed the use of a random sample. The works of Bowley A.L. (1926) and Neyman J. (1934) may be said to have laid the foundation of modern sampling theory.

10 India witnessed the advent of large scale sample surveys under the guidance of the late Professor P.C. Mahalanobis. The National Sample Survey (NSS) was created in 1950 as a multifaceted fact finding body. The Department of Statistics (DOS) was set up in the Cabinet Secretariat in April 1961, and during the same period the CSO and NSS were placed under this full-fledged department. In February 1999, the Department of Statistics and the Department of Programme Implementation were merged and named the ‘Department of Statistics and Programme Implementation’ in the Ministry of Planning and Implementation. Finally, by October 1999, the Department of Statistics and Programme Implementation was declared the ‘Ministry of Statistics and Programme Implementation’.

11 Responsibility for collection or coordination of data fell on the NSSO and CSO. Since then the NSSO has continued to contribute to the national database, whereas the CSO mainly deals with the conceptualization and standardization of different concepts and definitions.

12 Classification of Available Data Sources. Data at present are obtained mainly through (i) the government organizational set-up; (ii) different line departments of the government; (iii) academic research institutes / universities. The first two may be defined as official data, whereas the third may be termed academic statistics.

13 Academic statistics are mainly generated from research projects undertaken by different research institutions / universities while investigating methodological issues, developing probability / non-probability samples, and understanding the nature and extent of errors and their effects on survey results.

14 It may be noted that survey theoreticians have paid much attention to measuring the extent of sampling error (a part of total error) and to controlling it through properly adopted sampling designs and the choice of appropriate estimators, but so far survey design has received little importance in the theory and practice of survey sampling.

15 Schematic Diagram 1

16 Schematic Diagram 1 (Contd.)

17 Each tool involves estimation and estimators that differ in mathematical complexity. One needs to examine at this stage also, if relatively simple descriptive estimators such as totals, means and ratios may be used or more complex relationship measures such as regression or correlation coefficient may be used in exploratory analysis, whose primary concern is to make the characteristics of the population being studied more understandable. It is also necessary to plan, if some of those tools may also be used in confirmatory analysis, when the objective is to test statistical models or assumptions indicated by exploratory work; It is also necessary to decide on the type of activities to be conducted in the face of non-sampling errors and on the statistical tests to be used for measuring total error.

18 Schematic diagram 2

19 Schematic diagram 2 (Contd.)

20 The normal practice adopted so far in survey sampling is to take decisions on the choice of a sampling frame; a sampling design; a questionnaire design; sample size; sample weights; without much consideration of survey design. In the next few slides, we discuss some issues relating to the above topics.

21 A Sampling Frame: In some situations one may have a number of frames. For example, in a study of the health status of workers, the workers could be reached through (i) a list of work places, (ii) visits to their homes, or (iii) telephone calls to their homes. For any specific illness, physicians’ offices may also be visited. Thus, candidates for a sampling frame could include lists of areas, telephone numbers, business establishments, physicians or hospitals. The frame chosen will affect the quality of survey results. When deciding to adopt a particular frame, one needs to consider the errors that would be introduced as a result of this choice.

22 A Sampling Design: The type of frame chosen will influence both the type of sample design that can be used and the efficiency of the potential design. When no frame is available, the sampling design adopted will differ from the one used when more than one frame, jointly covering the entire population, is available. Normally, a stratified multistage sampling design is adopted in practice. When intermediate reference units have to serve as sampling units, the sampling design will again be different.

23 A Questionnaire Design: After the concepts and definitions to be used have been fixed and the sampling design chosen, a detailed list and description of the survey variables, with their units of measurement, is prepared in consultation with subject specialists before the variables are presented in the most efficient way as a data gathering instrument. Sometimes the variables to be measured have to be translated into operational/workable definitions and expressed as a logical series of questions which the interviewer can ask and the interviewee can comprehend and answer. The questions should be designed so that they (i) enable the collection of actual information, (ii) facilitate the work of data collection, collation, processing and tabulation, (iii) ensure economy in data collection and (iv) permit comprehensive and meaningful analysis and purposeful utilization of the captured data. The refinement of the general data requirement of any survey into precise questions is a step-by-step process, just as the development of a complex design is. There should be spaces indicating “confidentiality”, the identity of the agency and the hierarchical identity of the respondent.

24 Sample size: Basic approaches for a single item of enquiry under SRS designs determine the sample size from (i) precision expressed as a margin of error, (ii) precision expressed as a coefficient of variation, (iii) cost considerations alone, or (iv) precision and cost together. These approaches are applied to such commonly used designs as unstratified sampling, stratified sampling, cluster sampling and multistage sampling. The statistical tool used for determination of sample size is the probability statement Pr(|p - P| <= d) = 1 - α for a Qualitative Characteristic and Pr(|ȳ - Ȳ| <= d) = 1 - α for a Quantitative Characteristic, where (p, ȳ) and (P, Ȳ) are sample statistics and population parameters respectively; d is the margin of error and 1 - α is the confidence coefficient attached to the statement that the sample statistic would be within +/- d of the population parameter.

25 Considering precision of estimate only. Situation I, Single Item: Qualitative Characteristic. Under SRS, using the above probability statement (with 95% confidence), nI = 4P(1-P)/d^2. As a rule of thumb, under SRSWR, nI may be taken as nI = 1/d^2, since P(1-P) is at most 1/4. Under SRSWOR, nF may be taken as nF = nI[N/(N+nI)].
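
As a worked illustration of the Situation I formulas, the following is a minimal Python sketch. The function name `sample_size_proportion` and its arguments are hypothetical and not part of the original slides; the factor 4 is the squared 95% normal quantile (1.96, rounded to 2). Results are rounded up to the next whole unit, which is an assumption of this sketch.

```python
import math


def sample_size_proportion(d, P=0.5, N=None):
    """Sample size for a proportion under SRS at roughly 95% confidence.

    d : margin of error (absolute, e.g. 0.05)
    P : anticipated population proportion (0.5 gives the rule of thumb 1/d^2)
    N : population size; None means SRSWR (no finite-population adjustment)
    """
    if not 0 < d < 1 or not 0 <= P <= 1:
        raise ValueError("d must lie in (0, 1) and P in [0, 1]")
    n_initial = 4.0 * P * (1.0 - P) / d ** 2        # nI = 4 P (1 - P) / d^2
    if N is None:
        n = n_initial
    else:
        n = n_initial * N / (N + n_initial)         # nF = nI [N / (N + nI)]
    # round() guards against floating-point noise before rounding up
    return math.ceil(round(n, 9))


if __name__ == "__main__":
    print(sample_size_proportion(0.05))             # SRSWR, rule of thumb
    print(sample_size_proportion(0.05, N=2000))     # SRSWOR, N = 2000
```

With d = 0.05 the rule of thumb gives 400 units, and for a population of N = 2000 the SRSWOR adjustment brings this down to 334.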

26 Situation II, Single Item

27 Situation III, Multiple Items

28 Situation IV: sample size for subdivision

29 Considering cost aspects only: There is no denying the fact that in most surveys the cost aspect is of primary concern. An overall budget is contemplated and various cost components are envisaged. This again depends on the set up and the survey problems at hand

30 Situation V:One stage sampling

31 Situation VI

32 Cluster Sampling

33 Two-Stage Sampling

34 Optimality Criteria

35 Computation of Sample Weights: Two qualities of each respondent are identified under the fixed population view. One is the structural identity, indicating which part of the stratum structure (stratum, primary sampling unit, secondary sampling unit) the person came from; the other is the sampling weight, reflecting the relative likelihood of being selected for, and of responding to, the survey. A sampling weight is calculated as the reciprocal of each respondent’s original probability of selection, provided there is no non-response. In the case of non-response, this weight has to be revised by multiplying it by the inverse of the response probability.
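
The weight computation described above can be sketched in Python as follows. The function names and the example numbers are hypothetical; the response probabilities are assumed to be known or already estimated, which the slide does not specify.

```python
def sampling_weight(selection_prob, response_prob=1.0):
    """Reciprocal of the selection probability, revised for non-response.

    selection_prob : original probability of selecting the respondent
    response_prob  : (estimated) probability of responding; 1.0 = no non-response
    """
    if not (0 < selection_prob <= 1 and 0 < response_prob <= 1):
        raise ValueError("probabilities must lie in (0, 1]")
    base_weight = 1.0 / selection_prob
    return base_weight * (1.0 / response_prob)


def estimate_total(values, selection_probs, response_probs=None):
    """Weighted estimate of a population total from responding units."""
    if response_probs is None:
        response_probs = [1.0] * len(values)
    return sum(
        y * sampling_weight(p, r)
        for y, p, r in zip(values, selection_probs, response_probs)
    )


if __name__ == "__main__":
    # A respondent selected with probability 0.1 represents 10 units;
    # with an 80% response probability the weight rises to 12.5.
    print(sampling_weight(0.1))
    print(sampling_weight(0.1, 0.8))
```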

36 Type of estimator(s) used for Estimating total of a Character y

37 Survey Design: Survey design is the design for allocating jobs to the investigators and supervisors who make up the survey personnel. It helps one estimate measurement variance and has to be determined at the planning stage. Survey design is essential for separating the sources of variation.

38 In standard practice, analysis is carried out under the assumptions that (1) there is no problem with the frame, (2) there is no problem of non-response and (3) there is no problem of measurement error. The only error is then taken to be sampling error, for which the s.e. of the estimator is calculated.

39 WHAT HAPPENS WHEN THE ABOVE ASSUMPTIONS ARE VIOLATED? (i.e. there are frame errors, true values are not reported and data sets are incomplete).

40 Survey Errors: Survey errors can be classified by type as sampling errors and non-sampling errors, and within each category errors are classified by nature as variable errors and biases. Variable errors and biases can arise from sampling and/or non-sampling operations. This double dichotomy gives rise to a four fold classification of errors.

41

42 Many potential sources of errors can be found in each of these classes, since every operation is a potential source of variable errors and biases. Different biases can be considered as a set of constants determined by the essential survey conditions, although their values remain largely unknown. Biases represent the difference between the expected sample value and the true value, whereas variable errors measure the difference between the estimate and its expected value. Variable errors would fluctuate if we were to select different samples with the same design. Most biases cannot be reduced by increasing the size of the sample, but only by improving the quality of operation. In contrast, the reduction in variable errors depends on the number of units of some kind.
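
The contrast between biases and variable errors can be illustrated with the usual decomposition of mean squared error into variance plus squared bias. The sketch below is only an illustration: it assumes the variable error is the sampling variance of a mean under SRS with replacement (unit variance divided by n), and the function name and numbers are hypothetical.

```python
def total_mse(unit_variance, n, bias):
    """Total error (MSE) of a sample mean: variable error + squared bias.

    unit_variance : variance of the study variable between units (assumed known)
    n             : sample size
    bias          : constant bias fixed by the essential survey conditions
    """
    if n <= 0:
        raise ValueError("n must be positive")
    variable_error = unit_variance / n   # shrinks as the number of units grows
    return variable_error + bias ** 2    # the bias term does not depend on n


if __name__ == "__main__":
    for n in (100, 1000, 10000):
        print(n, total_mse(unit_variance=100.0, n=n, bias=2.0))
```

However large n becomes in this illustration, the total error never falls below the squared bias of 4, which is why improving the quality of operations matters more than sample size once the variable error is small.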

43 Variable errors can be measured by noting internal replications of the units within the sample. Measurement requires the replication of units, whether sampling units or observations by proper survey design to separate sources of variation. Measurement of biases essentially depends on a different method external to the survey proper.

44 Non-sampling errors are often thought of as being due entirely to mistakes and deficiencies introduced during the planning, execution and processing stages of the survey operation. Non-sampling errors are defined as a residual category. Thus, one can have non-sampling errors arising from (1) deficiencies in the problem formulation, leading to wrongly conceived concepts and definitions and an inability to arrive at a workable definition; (2) imperfections in the frame, leading to an inappropriate sampling design and the wrong population being studied; poor construction and/or inadequacies of the frame; (3) imperfections in the questionnaire design; (4) inappropriate choice of reference period; imperfections in the tabulation plan;

45 (5) inability to collect information on all items and from all respondents; (6) inaccurate survey design; (7) mistakes in recording information; (8) variability in responses; (9) illogical / unrepresented data; (10) errors in interpretation.

46 Many such factors can cause a disagreement between survey results and the true population value. As one might expect, even the notion of a true population value sometimes appears to be controversial. In some situations, the notion of an absolute standard for comparison is a fundamental element of the conceptual framework; in other situations one may be satisfied with a purely operational view of reality, where measurements are simply defined as the product of a specified data collection procedure. An absolute standard of truth plays no role in the purely operational view.

47 Some illustrative examples of sources of non-sampling errors encountered in real life problems: Every activity outlined at the different stages of Schematic Diagram 1 may be subject to errors if proper measures are not taken at that stage. Errors may arise from the definition stage onward, until the completion of the study. For illustrative purposes, the next few slides mention a few examples of errors that would have been likely at each stage of some of the projects undertaken, had no measures been taken.

48 Workable definition: The project “Domestic Tourists in Orissa (1988-89)” needed a redefinition of a “tourist” with respect to the objective of the study. Among other things, enquiries were directed at the availability of existing infrastructure facilities in terms of accommodation, transport (road, rail, air), medicine and other aspects. Normally, a tourist is by definition a person who visits historical monuments, places of pilgrimage etc. Under the objective of the study, any person who, for any reason whatsoever, required accommodation for at least one night had to be considered a tourist, and hence became a member of the target population. The usual definition of a tourist was therefore unusable, and a tourist was defined according to the objective of the study. Otherwise, the target population considered would have suffered from under-coverage.

49 Frame Problems: That an imperfect frame may lead to coverage errors was observed both in the study of the health status of workers and in the study through the IPP-VIII Project (1998). In the former study, errors due to coverage problems were likely to occur as follows: an area frame would cover all workers, but would also include a large number of non-workers; a telephone frame would not cover workers without telephones, and would also include a large number of non-workers; a frame of business establishments would contain a large concentration of workers, but it might be extremely difficult to construct a complete list of its elements. If medical records were used, however, it became easy to identify persons who had the disease.

50 The Indian Population Project (IPP-VIII, 1998), undertaken at the Indian Statistical Institute, was meant for studying different facets of IPP-VIII. One important component was to assess the impact of the project on the beneficiaries. The lower income group formed the beneficiary group. While listing the beneficiaries in an area, many non-beneficiaries were included, causing over-coverage. It was also observed in the project “Cost Benefit Analysis of Rural Electrification (1975-76)” that investigators employed as piece rate workers tended to list additional households which were not within the village boundaries. This was later detected through maps and other available relevant materials.

51 In the absence of a frame in the project entitled “Identification of Other Backward Classes (1994-95)”, a frame for the urban population was generated through a sample drawn from a rural population. The method developed and deployed generated a frame with coverage error. The project entitled “Evaluation of Total Literacy Programme” (23.06.92-07.07.92) in the district of North 24 Parganas, undertaken at the Indian Statistical Institute, aimed at evaluating the Total Literacy Campaign (TLC) in terms of literacy rates and some other parameters. The learning centers, with their identity parameters, formed the frame under study.

52 When the target population appeared to be either too mobile or too hard to identify to be listed, even an imperfect frame was used, knowing consciously that it would lead to errors of over-coverage or under-coverage. These mistakes were unavoidable, but the necessary adjustment of the survey results was made. [Stanza-Bopape Project (1995-96); Calcutta Urban Poverty Survey (1977-78)].

53 For some kinds of population, traditional finite population sampling is not feasible, for reasons such as the following. The population size may not be known, and an exhaustive list of the target population may not be readily available. Instead, one can locate and identify a set of distinct reference units forming what we may call a reference population. These reference units are used as the list of sampling units; by sampling from these reference units and scanning the sampled units, a sample of population units is obtained following a specified linking rule; use of intermediate reference units as frame units creates the problem of multiplicity of the population units; concept of the Generalized Horvitz-Thompson Estimator. This situation creates unavoidable frame errors, since the frame is used knowing consciously that it will create a multiplicity problem, but the survey estimates can be adjusted [NSS Slum Survey, June 2012 to December 2012].
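
The slide names a generalized Horvitz-Thompson estimator without giving its form. One common way of adjusting for multiplicity, assumed here purely for illustration, is to divide each population unit's value by the number of reference units linked to it before applying the usual inverse-probability weighting of the sampled reference units. All names and data below are hypothetical, and this sketch is not claimed to be the estimator used in the NSS survey.

```python
def multiplicity_estimate_total(sampled_refs, inclusion_prob, links, y, multiplicity):
    """Multiplicity-adjusted Horvitz-Thompson estimate of a population total.

    sampled_refs   : reference units drawn in the sample
    inclusion_prob : dict, reference unit -> inclusion probability
    links          : dict, reference unit -> population units linked to it
    y              : dict, population unit -> study variable value
    multiplicity   : dict, population unit -> number of reference units linked to it
    """
    total = 0.0
    for ref in sampled_refs:
        # Each linked population unit contributes only its share y_k / m_k,
        # so a unit reachable through several reference units is not overcounted.
        share = sum(y[k] / multiplicity[k] for k in links[ref])
        total += share / inclusion_prob[ref]
    return total


if __name__ == "__main__":
    links = {"A": ["u1"], "B": ["u1"], "C": ["u2"]}   # u1 is reachable via A and B
    y = {"u1": 10.0, "u2": 6.0}                        # true total = 16
    m = {"u1": 2, "u2": 1}
    census = {"A": 1.0, "B": 1.0, "C": 1.0}
    print(multiplicity_estimate_total(["A", "B", "C"], census, links, y, m))
```

When every reference unit is taken with certainty, the adjusted estimate reproduces the true total of 16, whereas ignoring multiplicity would count u1 twice.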

54 Questionnaire Design: Errors are likely to occur from inappropriate ordering/spacing. An inappropriate order of placement not only generates biased information; partial non-response may also occur for the questions that follow sensitive and/or illogically placed questions. For instance, a question on which area hospital the respondent prefers should not be asked after questions that mention specific area hospitals by name. A logical sequence should be maintained; otherwise responses to the preference question would be biased in favour of those hospitals whose names have been mentioned [CMDA Survey (1975-76)].

55 Inability to understand the question: In a personal interview, a question on “satisfaction with respect to health care” might fail to make clear to the respondent which aspect of health care is being addressed: accessibility, cost or quality [CMDA (1975-76); IPP VIII (1988)]. Respondents may also fail to understand questions that use technical jargon or unfamiliar words, and may feel compelled to answer rather than admit ignorance. Presenting more than one question at a time: a single question might ask more than one thing, for example “Do you plan to quit your job and find another next year?”; such questions should be avoided.

56 Inappropriate choice of words: certain words or phrases in a question can influence the answer. For example, “would you agree that” could influence some respondents to answer in the affirmative [Identification of Other Backward Classes (1994-95)]. Other characteristics of questions may also taint the respondent’s answer. Use of inflammatory words, links to the status quo (for example “most guardians think that”) and suggestion of hypothetical circumstances should be avoided [(1994-95)]. Length of schedule/questionnaire: If the questionnaire is too long, the respondent may lose interest and end participation prematurely. Even if he continues to the end, the quality of the data may be diluted. It is true that one also needs additional information for consistency checks, but there should be a balance between the two.

57 Choice of the reference period: There should be varying reference periods instead of a fixed reference period for all information [NSS]. Sequence of the questions: Starting with complicated or sensitive questions that are difficult to answer may cause the respondent to feel inadequate or uncomfortable. Starting with simple and innocuous but interesting questions, on the other hand, tends to put the respondent at ease and create harmonious feelings towards the survey topic.

58 Interviewers’ inabilities: interviewers’ deficiencies (poor interviewing techniques, misunderstanding of concepts, misinterpretation of responses, wrong arithmetic etc.) and characteristics (gender, employment status, ability to create rapport etc.) may create problems in data collection. Field condition: difficulty in implementing a random sample may arise from a peculiar field condition [REC (1975-76)]. Respondents’ inability: respondents’ failure to interpret the question, inability to provide answers, deliberate or inadvertent supply of wrong information, and preferences for some members may create problems [1976 fertility survey in Indonesia, Dasgupta and Mitra (1958)]. Choice of the respondent: an inappropriate respondent rule does not help choose a respondent [Tuigan and Cavdar (1975)]. Social stigma: social stigma attached to matters such as female participation in the workforce [Shah (1981)] or consumption of alcohol. Purposeful reporting: purposeful misreporting of certain information; for example, women may not like to disclose their ages.

59 Sampling Design: for stratified random sampling, the stratification variable chosen must be correlated with the study variable; the administrative zones should not always form the strata. As illustrative examples, the following stratification variables are cited: Time of start of functioning of the projects under evaluation; Age of the respondents [ISI Project (1997-98)]; Degree of affluence [Community life in Selected Communities in South Africa (1995)]; Population size [The socio-economic demographic and cultural pattern of the female labour force participation (1995-96)]; Different types of races [Community attitudes and preferences pertaining to country and cremation related issues in the East Rand in South Africa (1996-97)]; Degree of concentration [Domestic Tourists Survey (1988-89)]; Intensity of electrification [Rural Electric Corporation (1976-1977)]; Administrative Zones [ISI Project (23.06.92-07.07.92); ISI Project (1994-95); ISI Project (1997-1998a); ISI Project (1997-1998)].

60 The above deficiencies in survey data lead to Incomplete Data: (1) Non-response arises from deficiencies at all stages of the integrated system, contrary to the general belief that it occurs only in the interaction between a respondent and an investigator. An extended definition of non-response, particularly item non-response, includes cases in which missing data arise (2) from the processing of information provided by units rather than from refusal of units, for example when editing procedures eliminate responses judged to be impossible or inconsistent with other findings; (3) from non-contact due to inaccurate or inadequate information in the frame for reaching a sampling unit; (4) from temporary non-availability of a respondent at home; (5) from non-coverage due to frame problems; (6) from a poorly designed schedule that burdens the respondent; (7) from lack of solicitation to make respondents participate in the survey process; (8) from difficulties in contacting units during natural calamities such as floods or earthquakes and/or political disturbances [see Schematic Diagram 2].

61 Causes of item non-response: (1) Non-response rates are higher for sensitive items such as income [Donald (1960)]; (2) The mode of interview is responsible for producing item non-response [1975-76]; (3) Higher item non-response rates arise on questions requiring substantial thought or effort on the part of the respondent [Frances and Bush (1975), Craig and McCann (1978)]; (4) Item non-response is independent of questionnaire length [Craig and McCann (1978)]; (5) Age and occupation have a significant effect on item non-response [Messmer and Seymour (1982)]; (6) Questions appearing after a branching question have notably higher item non-response [Messmer and Seymour (1982)]; (7) Interviewers who were more impersonal have lower item non-response rates than interviewers with a more personable interviewing style [Rogers (1976)]; (8) Non-response on some items will be higher for some subgroups (the elderly, females, the less educated); (9) Interviewers who thought it inappropriate to ask a sensitive question will have higher item non-response on that question [Bailer et al. (1977)].

62 The previous discussion generates awareness of the existence of non-sampling errors. With non-sampling errors of different types experienced globally, the design that controls the total error of survey estimates, considering all sources of error, has come to be known as the Total Survey Design. The practice of total survey design should operate in a comprehensive and integrated fashion. Surveys need to be carefully planned, with due consideration given to all known sources of error. Resources available for conducting the survey should be directed towards minimizing total error and not any single error component. At the analysis stage, the total error should be estimated. Finally, in anticipation of future surveys of a similar type, estimates of the components of total error should be made so that they can be used in the planning phase of those surveys.

63 During the design phase of the survey, the practice of total survey design involves assessing the level of error associated with alternative procedures for (i) sampling design, (ii) questionnaire design and (iii) survey design, and choosing the combination of sampling design, measurement procedure and analysis method that will minimize the total error of the estimate within the available resources. The success of total survey design methods at the planning stage depends on good information on the costs and errors of alternative procedures, and on the availability of total error and total cost models that can be used for choosing an optimum design, optimal in the sense of minimizing total error. While the survey is being carried out, the practice of total survey design involves the use of quality control procedures that monitor the progress of data collection.
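
The choice of an optimum design within available resources can be sketched as a small search over candidate designs. The sketch assumes that a cost, a variance and a bias have been anticipated for each candidate (for instance from pilot studies) and that total error is measured as variance plus squared bias; the class, the function and all the numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class DesignOption:
    name: str        # label for a sampling/questionnaire/survey design combination
    cost: float      # anticipated total cost
    variance: float  # anticipated variable error of the estimate
    bias: float      # anticipated net bias of the estimate

    @property
    def total_error(self):
        # Total error taken as mean squared error: variance + bias^2
        return self.variance + self.bias ** 2


def choose_design(options, budget):
    """Return the affordable option with the smallest total error, or None."""
    feasible = [o for o in options if o.cost <= budget]
    if not feasible:
        return None
    return min(feasible, key=lambda o: o.total_error)


if __name__ == "__main__":
    candidates = [
        DesignOption("large sample, untrained interviewers", 100.0, 4.0, 3.0),
        DesignOption("smaller sample, trained interviewers", 150.0, 6.0, 1.0),
        DesignOption("large sample, trained, with call-backs", 300.0, 1.0, 0.5),
    ]
    best = choose_design(candidates, budget=200.0)
    print(best.name, best.total_error)
```

In this illustration the smaller sample with trained interviewers is chosen under a budget of 200: its variance is higher than that of the cheapest option, but its total error is lower because the bias is smaller.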

64 The goal of quality control procedures is to detect errors when they occur or soon after, so that the survey work can be repeated, if necessary. Total survey design at the analysis and reporting stage entails attempting to calculate and report the total error of the survey estimate by (1) use of suitably designed probability sampling procedures, which allow one to calculate accurate measures of sampling error; (2) introduction of experimental design into the survey process, which allows determination of the magnitude of the effect of a particular error source on the total error of the estimate.

65 However, attempts to make use of total survey design may be hampered by several problems, such as (1) introduction of additional complexities into the survey; (2) collection of extra data or inclusion of experimental methods to permit estimation of the impact of certain sources of non-sampling error; (3) the need for effort and money that could otherwise be devoted to the estimates once the data are available; (4) use of quality control procedures that consume time and money which could otherwise be devoted to the primary activities of sampling and data collection.

66 Empirical evidence in support of the previous statements, obtained through a number of real life projects.

67 Development of the Total Survey Design: Since the success of total survey design depends on good information on costs and on the types of error, with some quantitative measures of each, a dress rehearsal was held through pilot studies to examine (1) whether the question wording might be confusing; (2) whether the forms might be difficult for an interviewer to administer; (3) whether the procedures appeared too complicated and extensive for interviewers to complete on schedule; (4) whether the selection of interviewers should be guided by gender; (5) whether the interviewers selected should be subject specialists; or (6) whether they should be experienced household interviewers.

79 The previous tables were critically examined to see (1) whether the proposed, appropriately designed field experiment would finally be suitable, considering the magnitude of the errors due to non-response and to measurement error [Ref. Tables 8, 9, 10, 11, 12, 13, 14, 15, 16]; (2) whether random sub-sampling from the non-respondents or a call-back procedure should be adopted and, in the case of call-backs, how many attempts should be made [Ref. Table 17]; (3) whether non-response varies by age [Ref. Table 18]; (4) whether non-response varies by household size [Ref. Table 19]; (5) whether non-response varies by type of dwelling unit [Ref. Table 20]; (6) whether non-response varies by region [Ref. Table 21].
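
For the sub-sampling option in item (2), a minimal sketch of the classical estimator of Hansen and Hurwitz (1946) is given below. It is the textbook form, not necessarily the procedure used in the projects, and the data are hypothetical: the initial respondents are combined with a random sub-sample of the non-respondents, with the sub-sample mean standing in for all non-respondents.

```python
def hansen_hurwitz_mean(respondent_values, n_nonrespondents, subsample_values):
    """Estimate the population mean from first-phase respondents and a
    random sub-sample of the non-respondents (all of whom are assumed to
    respond at the second phase)."""
    n1 = len(respondent_values)
    n2 = n_nonrespondents
    if n1 == 0 or not subsample_values:
        raise ValueError("need at least one respondent and one sub-sampled unit")
    ybar1 = sum(respondent_values) / n1
    ybar2 = sum(subsample_values) / len(subsample_values)
    # Weight each group mean by its share of the original sample.
    return (n1 * ybar1 + n2 * ybar2) / (n1 + n2)


# Hypothetical data: 4 respondents, 6 non-respondents, 3 of them followed up.
estimate = hansen_hurwitz_mean([10, 12, 14, 12], 6, [20, 22, 18])
print(estimate)
```

In this made-up example the respondents alone average 12, while the estimate that accounts for the followed-up non-respondents is noticeably higher, which is the kind of difference the pilot tables were examined for.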

80 On the basis of the information gathered through the pilot studies, a total survey design was developed and deployed in executing the projects at the final stage, and the following findings were incorporated in it. It was observed that three attempts were sufficient to complete any of the projects; this is also borne out by the literature [Ref. Table 22]. Since the non-response rates turned out to be low, no effort was made to estimate the response probabilities of the responding units. Had the non-response rates been comparatively higher, a weighting adjustment procedure could have been adopted: the initial design-based weights would be revised by multiplying them by the reciprocals of the estimated response probabilities, thereby compensating for the error due to non-response.

81 On the Choice of Weighting Adjustment Cells: It has been observed empirically through a number of real-life surveys [Bennet and Hill (1964), Cobb, King and Chen (1957), Dunn and Hawkes (1966), Lubin, Levitt and Zuckerman (1962), Lundberg and Larsen (1949), Newman (1962), Ognibene (1970), Pan (1951), Reuss (1943), Skelton (1963), Warwick and Lininger (1975), Kendall and Buckland (1960), Sudman (1976), Suchman (1962), Birnbaum and Sirken (1950), Deighton et al. (1980), Politz and Simmons (1949), Madow et al. (1983), Gower (1979), DeMaio (1980), Kalsbeek and Lessler (1978), Lessler (1974, 1980), Roy (1976-77, 1977-78, 1988-89), Maiti (1994-1995, 1995-1996), Lyberg and Rapaport (1979), Turner et al. (1970), Bergman et al. (1978), etc.] that non-respondents differ with respect to the following characteristics: 1. Income class; 2. Household size; 3. Labour force status; 4. Ownership status of dwelling units; 5. Age; 6. Socio-economic group; 7. Extent of coverage; 8. Different recall periods; 9. Varying multiplicity size, etc.

82 Any of the above variables may be used to define weighting adjustment cells for estimating response probabilities. If post-stratification is also used, either different cells or the same weighting adjustment cells may be used.
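
A weighting-cell adjustment of this kind can be sketched as follows. This is a minimal illustration, assuming that the response probability within a cell is estimated by the weighted response rate of that cell; the cell labels, weights and response indicators are hypothetical, and the function name `adjust_weights` is not taken from the projects.

```python
from collections import defaultdict


def adjust_weights(units):
    """units: list of dicts with keys 'cell', 'weight' and 'responded'.
    Returns the responding units with design weights divided by the
    estimated response probability of their adjustment cell."""
    total = defaultdict(float)  # sum of design weights, all sampled units
    resp = defaultdict(float)   # sum of design weights, respondents only
    for u in units:
        total[u["cell"]] += u["weight"]
        if u["responded"]:
            resp[u["cell"]] += u["weight"]

    adjusted = []
    for u in units:
        if u["responded"]:
            # Estimated response probability = weighted response rate of the cell.
            rate = resp[u["cell"]] / total[u["cell"]]
            adjusted.append({**u, "weight": u["weight"] / rate})
    return adjusted


# Hypothetical sample: cells defined by household size.
sample = (
    [{"cell": "small", "weight": 10.0, "responded": r} for r in (True, True, True, False)]
    + [{"cell": "large", "weight": 20.0, "responded": r} for r in (True, False)]
)
adjusted = adjust_weights(sample)
print(sum(u["weight"] for u in adjusted))
```

The adjusted weights of the respondents add up to the total design weight of the full sample, so within each cell the respondents are made to represent the non-respondents as well.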

83 (1) A non-linear cost model has been developed as an alternative to the existing linear model [Ref. Maiti, P. (2008)]; (2) A survey design model for measuring the measurement variance has been developed [Ref. Maiti, P. (2009)].

84 References: (1) Babbie, Earl R. (1973): Survey Research Methods, Belmont, CA, Wadsworth. (2) Backstrom, Charles H. and Gerald Hursh-Cesar (1981): Survey Research, 2nd edition, New York, Wiley. (3) Bailar, Barbara A. (1979): Rotation Sampling Biases and their effects on estimates of changes, 43rd session of the International Statistical Institute, Manila. (4) Bergman, L.R., Hanve, R. and Rapp, J. (1978): Why do some people refuse to participate in interview surveys? Statistisk Tidskrift. (5) Bowley, A.L. (1906): Address to the Economic and Statistics Section of the British Association for the Advancement of Science, York, 1906, J. Roy. Statist. Soc. 69, 540-558. (6) -------- (1926): Measurement of the Precision attained in Sampling, Bulletin of the International Statistical Institute, 22, 6-62. (7) Bennet, C.M. and Hill, R.E. (1964): A comparison of selected personality characteristics of respondents and non-respondents to a mailed questionnaire, Journal of Educational Research, 58, No. 4, 178-180. (8) Birnbaum, Z.W. and Monroe G. Sirken (1950): Bias due to non-availability in Sampling Surveys, JASA, 45, 98-111. (9) Bailar, Barbara A. (1979): Rotation Sampling Biases and their effects on estimates of changes, 43rd session of the International Statistical Institute, Manila. (10) Bergman, L.R., Hanve, R. and Rapp, J. (1978): Why do some people refuse to participate in interview surveys? Statistisk Tidskrift. (11) Brooks, Camilla and Barbara Bailar (1978): An Error Profile: Employment as Measured by the Current Population Survey, Statistical Policy Working Paper 3, Office of Federal Statistical Policy and Standards, U.S. Department of Commerce. (12) Bandyopadhyay, S., Chaudhury, A., Ghosh, J.K. and Maiti, P. (1999): A Draft Proposal for an Enterprise Survey Scheme as a substitute for Economic Census, Indian Statistical Institute, Calcutta. (13) Cochran, W.G. (1977): Sampling Techniques, 3rd edition, Wiley Eastern Limited, New Delhi.

85 (14) Cole, D. (1956): Field Work in Sample Surveys of Household Income and Expenditure, Applied Statistics, 5, 49-61. (15) Cobb, J.M., King, S. and Chen, E. (1957): Differences between respondents and non-respondents in a morbidity survey involving clinical examination, Journal of Chronic Diseases, 6. (16) Chevry, Gabriel (1949): Control of a General Census by means of an area sampling method, JASA, 44, 373-379. (17) Chapman, David D. and Rogers, Charles E. (1978): Census of Agriculture: Area Sample design and methodology, Proceedings of the American Statistical Association Section on Survey Research Methods, 141-147. (18) Craig, C. Samuel and John M. McCann (1978): Item non-response in Mail Surveys: Extent and Correlates, Journal of Marketing Research, 15, 285-289. (19) Deming, W. (1960): Sample Design in Business Research, New York, Wiley. (20) -------- (1944): On Errors in Surveys, American Sociological Review, 9, 359-369. (21) -------- (1950): Some Theory of Sampling, John Wiley and Sons, New York. (22) -------- (1953): On a Probability mechanism to attain an Economic Balance between the resultant error of response and the bias of non-response, JASA, 48, 743-772. (23) Dalenius, Tore (1974): The Ends and Means of Total Survey Design, Stockholm, The University of Stockholm. (24) -------- (1957): Sampling in Sweden: Contributions to the Methods and Theories of Sample Survey Practice, Stockholm, Almqvist and Wiksell. (25) -------- (1962): Recent Advances in Sample Survey Theory and Methods, AMS, 33, 325-349. (26) -------- (1977a): Bibliography of non-sampling errors in Surveys, I (A-G), International Statistical Review, 45, 71-89. (27) -------- (1977b): Bibliography of non-sampling errors in Surveys, II (A-Q), International Statistical Review, 45, 181-197. (28) -------- (1977c): Bibliography of non-sampling errors in Surveys, III (R-Z), International Statistical Review, 45, 313-317.

86 (29) Dasgupta, A. and Mitra, S.N. (1958): A Technical Note on Age Grouping, The National Sample Survey No. 12, New Delhi. (30) Dunn, J.P. and Hawkes, R. (1966): Comparison of non-respondents and respondents in a Periodic Health Examination Program to a mailed questionnaire, American Journal of Public Health, 56, 230-236. (31) DeMaio, T.J. (1980): Refusals: who, where and why? Public Opinion Quarterly, 44. (32) Deighton, Richard E., James R. Poland, Joel R. Stubbs and Robert D. Tortora (1978): Glossary of Non-sampling Error Terms: An illustration of a semantic problem in statistics, Statistical Policy Working Paper 4, Washington DC, U.S. Department of Commerce. (33) Donald, Marjorie N. (1960): Implication of non-response for the interpretation of mail questionnaire data, Public Opinion Quarterly, 24, 99-114. (34) Ericson, W.A. (1967): Optimal Sample Design with non-response, JASA, 62, 63-78. (35) Emrich, Lawrence (1983): Randomised Response Technique, in William G. Madow and Ingram Olkin eds., Incomplete Data in Sample Surveys, Volume 2, Theory and Bibliographies, New York, Academic, 73-80. (36) Fellegi, Ivan P. (1963): The Evaluation of the Accuracy of Survey Results: Some Canadian Experiences, International Statistical Review, 41, 1-14. (37) -------- (1964): Response Variance and its Estimation, JASA, 59, 1016-1041. (38) Fellegi, Ivan and Sunter, A.B. (1974): Balance between Different Sources of Survey Errors: Some Canadian Experiences, Sankhya, 36 (Series C), 119-142. (39) Ferber, Robert (1966): Item non-response in a consumer survey, Public Opinion Quarterly, 12, 669-676. (40) Ford, Barry L. (1976): Missing Data Procedures: A Comparative Study, American Statistical Association, Proceedings of the Social Statistics Section 1976, Pt. 1, 326-329. (41) Francis, Joe D. and Lawrence Busch (1975): What we know about "I don't knows", Public Opinion Quarterly, 34, 207-218.

87 (42) Ghosh, J.K. and Maiti, Pulakesh (2003): The Indian Statistical System at cross roads: an appraisal of Past, Present and Future, presented at the IMS meet during 2-3 January 2004. (43) Ghosh, A. (1953): Accuracy of Family Budget Data with reference to period of recall, Calcutta Statistical Association Bulletin, 5, 16-23. (44) Gower, A.R. (1979): Characteristics of non-respondents in the Labour Force Survey, Statistics Canada. (45) Groves, Robert M. and Kahn, Robert Louis (1979): Surveys by Telephone: A national comparison with personal interviews, New York, Academic. (46) Gray, P. and Gee, F.E.N. (1972): A Quality check on the 1966 ten percent sample census of England and Wales, Office of Population Censuses and Surveys, London. (47) Ghosh, J.K., Maiti, P., Mukhopadhyay, A.C. and Pal, M.P. (1977): Stochastic Modeling and Forecasting of Discovery, Reserve and Production of Hydrocarbon, with an application, Sankhya, Series B, 59, pt. 3, 288-312. (48) Godambe, V.P. (1976): A historical perspective of the recent development in the theory of sampling from actual populations, Dr. Panse memorial lecture organized by Ind. Soc. Agri. Stat., New Delhi, 29th March, 1976. (49) Hansen, M.H., Madow, William G. and Tepping, B.J. (1983): An Evaluation of Model dependent and Probability Sampling inference in Sample Surveys, JASA, 78, 776-807. (50) Hansen, M.H. and Hurwitz, William N. (1946): The Problem of non-response in Sample Surveys, JASA, 41, 516-529. (51) -------- and Nisselson, H., Steinberg, J. (1955): The redesign of the current population survey, JASA, 50, 701-719. (52) -------- and Jabine, Thomas B. (1963): The use of imperfect lists for Probability Sampling at the U.S. Bureau of the Census, Bulletin of the International Statistical Institute, 40(1), 497-517. (53) -------- and Pritzker, Leon (1964): The Estimation and interpretation of Gross differences and the simple response variance, in C.R. Rao with D.B. Lahiri, K.R. (54) Messmer, Donald J. and Daniel T. Seymour (1982): The effects of branching on item non-response, Public Opinion Quarterly, 46, 270-277.

88 (55) Nair, P. Pant and S.S. Shrikhande eds., Contributions to Statistics Presented to Professor P.C. Mahalanobis on the occasion of his 70th birthday, Oxford, England, Pergamon; Calcutta, Statistical Publishing Society, 111-136. (56) -------- and Bershad, Max A. (1961): Measurement errors in censuses and surveys, Bulletin of the International Statistical Institute, 38, 359-374. (57) --------, Marks, Eli S. and Mauldin, W. Parker (1951): Response Errors in Surveys, JASA, 46, 147-190. (58) -------- (1976): Some Important Events in the Historical Development of Sample Surveys, in Donald Bruce Owen ed., On the History of Statistics and Probability, Statistics Textbooks and Monographs, Volume 17, New York, Dekker, 73-102. (59) Hacking, I. (1965): Logic of Statistical Inference, Cambridge University Press. (60) Hirschberg, David, Frederick J. Scheuren and Robert Yuskavage (1977): The impact on Personal and Family income of adjusting the Current Population Survey for undercoverage, Proceedings of the Social Statistics Section, American Statistical Association, 70-80. (61) Hubback, J.A. (1927): Sampling for rice yields in Bihar and Orissa, Imp. Agr. Res. Inst. Bulletin, Pusa (reprinted in Sankhya (1946), 7, 282-294). (62) Haldane, J.B.S. (1957): The Syadvada System of Predication, Sankhya, 18, 195-200. (63) Hacking, I. (1965): Logic of Statistical Inference, Cambridge University Press. (64) Hoinville, Gerald and Roger Jowell (1978): Survey Research Practice, London, Heinemann. (65) Jessen, Raymond J. (1978): Statistical Survey Techniques, New York, Wiley. (66) Jeganathan, P. (1997a): Structural Reading and Evolution of Indus Script Viewed as a Complex System, Part I: Meteorological Reading, Prague Bulletin of Mathematical Linguistics, 67, pp. 75-137. (67) Jeganathan, P. (1997b): Also appeared with agreement of editors in RASK, International Tidsskrift for Sprog og Kommunikation, 8, December 1998, pp. 47-78. (68) Kendall, Maurice George and William R. Buckland (1960): A Dictionary of Statistical Terms, 2nd edition, London, Oliver and Boyd.

89 (69) Kiaer, A. (1895): Observations et expériences concernant des dénombrements représentatifs, Bull. Int. Statist. Inst. 9, 176-183. (70) Kruskal, William and Frederick Mosteller (1980): Representative Sampling, IV: the History of the concept in Statistics, 1895-1939, International Statistical Review, 48, 169-195. (71) Kish, L. (1965): Survey Sampling, Wiley and Sons, New York. (72) -------- and Hess, I. (1958): On non-coverage of sample dwellings, JASA, 54, 509-524. (73) Kalton, Graham and Daniel Kasprzyk (1982): Imputing for missing survey responses, American Statistical Association 1982, Proceedings of the Section on Survey Research Methods, 22-31. (74) Koop, J.C. (1974): Notes for a unified theory of estimation for sample surveys taking into account response errors, Metrika, 21, 19-39. (75) Kalton, Graham (1983): Compensating for missing survey data, Research Report Series, Ann Arbor, MI, Institute for Social Research, University of Michigan. (76) Kendall, Maurice George and William R. Buckland (1960): A Dictionary of Statistical Terms, 2nd edition, London, Oliver and Boyd. (77) Kiaer, A.N. (1895): Observations et expériences concernant des dénombrements représentatifs, Bull. Inst. Int. Stat. I. Div. I pp 1976. (78) Lessler, J.T. and Kalsbeek, W.D. (1992): Non-sampling Error in Surveys, John Wiley and Sons Inc. (79) Lessler, J.T. (1974): A double sampling scheme model for eliminating measurement process bias and estimating measurement errors in surveys, Institute of Statistics Mimeo Series No. 949, University of North Carolina, New …… (80) Lessler, J.T. (1980): Errors associated with the frames, Proceedings of the American Statistical Association Section on Survey Research Methods, 125-130.

90 (81) Lubin, B., Levitt, E. and Zuckerman, M. (1962): Some personality differences between respondents and non-respondents in a survey questionnaire, Journal of Consulting Psychology, 26, 192. (82) Lundberg, G.A. and Larsen, O.A. (1949): Characteristics of hard-to-reach individuals in field surveys, Public Opinion Quarterly, 13, 487-494. (83) Lyberg, L. and Rapaport, E. (1979): Unpublished non-response problems at the National Central Bureau of Statistics, Sweden. (84) Little, Roderick J.A. (1982): Models for non-response in Sample Surveys, JASA, 77, 237-250. (85) -------- (1983): Super Population models for non-response, Part IV, in William G. Madow and Ingram Olkin eds., Incomplete Data in Sample Surveys, Volume 2, Theory and Bibliographies, New York, Academic, 337-413. (86) Lessler, J.T. (1974): A double sampling scheme model for eliminating measurement process bias and estimating measurement errors in surveys, Institute of Statistics Mimeo Series No. 949, University of North Carolina, Chapel Hill. (87) -------- (1980): Errors associated with the frames, Proceedings of the American Statistical Association Section on Survey Research Methods, 125-130. (88) Madow, W.G., Nisselson, Harold and Olkin, Ingram (1983): Incomplete Data in Sample Surveys, Volume 1, Report and Case Studies, New York, Academic. (89) McNeil, John M. (1981): Factors affecting the 1980 census content and the effort to develop a post-census disability survey, presented at the annual meeting of the American Public Health Association. (90) Mahalanobis, P.C. and Lahiri, D.B. (1961): Analysis of errors in censuses and surveys, Bulletin of the International Statistical Institute, 38(2), 359-374. (91) -------- and Sen, S.B. (1954): On some aspects of the Indian National Sample Survey, Bulletin of the International Statistical Institute, 34, pt. 2. (92) Mahalanobis, P.C. (1944): On Large Scale Sample Surveys, Philosophical Transactions of the Royal Society, 231 (B), 329-451. (93) -------- (1946): Recent Experiments in Statistical Sampling in the Indian Statistical Institute.

91 (94) -------- (1941): A Sample Survey of the Acreage under jute in Bengal, 4, 511-30. (95) -------- (1954): The Foundations of Statistics, Dialectica, 8, 95-111 (reprinted in Sankhya 18, 183-194). (96) Maiti, P. (1983): Unpublished Ph.D. thesis entitled "Some Contributions to the Sampling Theory using auxiliary information", submitted to the Indian Statistical Institute, Calcutta. (97) -------- (2008): Existence of the BLUE for finite population mean under multiple imputation, Statistics in Transition new series, Volume 9, Number 2, p. 223-258. (98) -------- (2009): Estimation of non-sampling variance components under the linear model, Statistics in Transition new series, p. 193-233. (99) Maiti, P. et al. (1999): Strengthening Local Government in Madhya Pradesh, Indian Statistical Institute, Kolkata. (100) Maiti, P. (2009): Intra- and inter-block variation between fourteen blocks of the rural sector of the district of Howrah, Report on decentralized planning, Indian Statistical Institute, Kolkata. (101) Maiti, P. (2003): Development of Statistical Information System for Decentralised Planning, Occasional paper no. 10 under Development Research Support Scheme, Department of Economics and Rural Development, Vidyasagar University, East Midnapore, West Bengal. (102) --------, Pal, M. and Sinha, B.K. (1992): Estimating unknown Dimensions of a Binary matrix with application to the estimation of the size of a mobile population, Statistics and Probability, 220-233. (103) Moser, Claus Adolf and Graham Kalton (1972): Survey Methods in Social Investigation, 2nd edition, New York, Basic Books. (104) Murthy, M.N. (1962): On Mahalanobis' contributions to the development of sample survey theory and method, in C.R. Rao et al. (eds), Contributions to Statistics, Pergamon Press.

92 (105) -------- (1967): Sampling Theory and Methods, Statistical Publishing Society, Calcutta. (106) Neyman, Jerzy (1934): On the two different aspects of the representative method: the method of stratified sampling and the method of purposive selection, J. Roy. Statist. Soc. 97, 558-625. (107) Neter, J. and Waksberg, J. (1965): Response errors in collection of expenditures data by household interview: An Experimental Study, Technical Report No. 11, U.S. Bureau of the Census. (108) Newman, S. (1962): Difference between early and late respondents in a mailed survey, Journal of Advertising Research, 2, 37-39. (109) Ognibene, P. (1970): Traits affecting questionnaire response, Journal of Advertising Research, 10, 18-20. (110) Pan, J.S. (1951): Social Characteristics of respondents and non-respondents in a questionnaire study of later maturity, Journal of Applied Psychology, 35, 780-781. (111) Politz, A.N. and Simmons, W.R. (1949): An attempt to get the Not-at-Homes into the sample without call-backs, JASA, 44, 9-31. (112) Platek, R. (1977): Some factors affecting non-response, Survey Methodology, 3. (113) Plan, V.T. (1978): A Critical appraisal of household surveys in Malaysia, Multipurpose household survey in developing Countries, Development Centre, OECD, Paris. (114) Reuss, C.F. (1943): Differences between persons responding and not responding to mail questionnaires, American Sociological Review, 8, 433-438. (115) Rao, V.R. and Sastry, N.S. (1975): Evolution of a total survey design: The Indian Experience, invited paper presented to the International Association of Survey Statisticians, Warsaw. (116) Rogers, Theresa F. (1960): Interviews by Telephone and in person: Quality of responses and Field Performance, Public Opinion Quarterly, 40, 51-56.

93 Reports of Research Projects: (117) (1975-76): Cost Benefit Analysis of Rural Electrification, Project Leader Professor J. Roy, Computer Science Unit, ISI, Calcutta. (118) (1977-78): Calcutta Urban Poverty Survey, Project Leader Professor J. Roy, Computer Science Unit, ISI, Calcutta. (119) (1975-76): CMDA Health Survey, Computer Science Unit, ISI, Kolkata. (120) (1988-89): A Survey on Domestic Tourists in Orissa, Project Co-ordinator P. Maiti, ISI, Calcutta. (121) (1994-95): An Enquiry into the Quality of Life in five communities in selected districts of Rural West Bengal, Project Co-ordinator P. Maiti, ISI, Calcutta. (122) (1995): Community attitudes and preferences pertaining to cemetery and cremation related issues in the East Rand in the Republic of South Africa, CENSTAT, HSRC, Pretoria, South Africa, Principal Statistician P. Maiti. (123) (1995): Survey of Family and Community Life in the selected communities of the Cape Peninsula of the Republic of South Africa, CENSTAT, HSRC, Principal Statistician P. Maiti. (124) (1995): The socio-economic, demographic and cultural pattern of the female labour force participation in the North West and the Cape, CENSTAT, HSRC, South Africa, Principal Statistician P. Maiti. (125) (1996): Stanza-Bopape Project, CENSTAT, HSRC, South Africa, Principal Statistician P. Maiti. (126) (1998): Mid-Term Review of IPP-VIII in Calcutta Metropolitan Area, ISI, Calcutta, Survey Statistician P. Maiti. (127) (1998): ISI-PWI Project on Strengthening Local Government in Madhya Pradesh, India, ISI, Kolkata, P. Maiti. (128) (1999): ISI-HLL Collaborative Research Project on Business Research, ISI, Kolkata, P. Maiti. (129) (2001, August): National Statistics Commission, Government of India. (130) (2009): Statistical Information System for decentralized planning with an application to the District of Howrah, P. Maiti, preserved in the Prasanta Chandra Mahalanobis Memorial Archive and Museum, Kolkata.

94 (131) Roshwalb, Alan (1982): Respondent Selection Procedures within Households, American Statistical Association 1982 Proceedings of the Section on Survey Research Methods, 93-98. (132) Rubin, Donald B. (1983): Conceptual issues in the presence of non-response, in William G. Madow and Ingram Olkin eds., Incomplete Data in Sample Surveys, Volume 2, Theory and Bibliographies, New York, Academic, 123-142. (133) -------- (1977): Formalizing Subjective notions about the effect of non-respondents in Sample Surveys, JASA, 72, 538-543. (134) -------- (1978): Multiple imputations in Sample Surveys: A Phenomenological Bayesian Approach to non-response, American Statistical Association 1978 Proceedings of the Section on Survey Research Methods, 20-28. (135) -------- (1987): Multiple Imputation for non-response in Surveys, New York, Wiley. (136) Rizvi, M. Haseeb (1983): Hot-Deck Procedures: Imputation, in William G. Madow and Ingram Olkin eds., Incomplete Data in Sample Surveys, Volume 3, Proceedings of the Symposium, New York, Academic, 351-352. (137) Shamasastry, R. (1929): Translation of Kautilya’s Arthasastra, 3rd edition, Mysore, Wesleyan Mission Press. (138) Sinha, Bikas (2006): Sample size determination in survey sampling, lecture notes prepared for the participants under a UNDP programme. (139) Stephan, Frederick F. (1948): History of the uses of Modern Sampling Procedures, JASA, 43, 12-39. (140) Smith, T.M.F. (1976): The Foundations of Survey Sampling: A Review, JRSS, 139A, 183-195. (141) Särndal, C.E., Swensson, B. and Wretman, J. (1992): Model Assisted Survey Sampling, Springer-Verlag, New York. (142) Scott, Christopher (1973): Experiments on recall error in African Budget Surveys,

95 paper presented to the International Association of Survey Statisticians, Vienna. (143) Shah, Nasra M. (1981): Data from Tables used in the paper presented at the Weekly Seminar of the East-West Population Institute, October 28, Honolulu. (144) Skelton, V.C. (1963): Patterns behind income refusals, Journal of Marketing, 27. (145) Sudman, Seymour (1976): Applied Sampling, New York, Academic. (146) Suchman, Edward A. (1962): An analysis of bias in Survey Research, Public Opinion Quarterly, 26, 102-111. (147) Scheaffer, Richard L., Mendenhall, William and Ott, Lyman (1979): Elementary Survey Sampling, 2nd edition, North Scituate, MA, Duxbury Press. (148) Szameitat, Klaus and Schaffer, Karl August (1963): Imperfect Frames in Statistics and the consequences for their use in sampling, Bulletin of the International Statistical Institute, 40, 517-544. (149) Singh, Bahadur and Sedransk, Joseph (1978): A two phase sampling design for estimating the finite population mean when there is non-response, in N. Krishnan Namboodiri ed., Survey Sampling and Measurement, New York, Academic, 143-155. (150) Thomsen, Ib and Siring, E. (unpublished): On the causes and effects of non-response: Norwegian Experiences, Central Bureau of Statistics, Norway. (151) Tuygan, Kuthan and Cavador, Tevfik (1975): Comparison of self and proxy responses related to children of ever married women, in Laboratories for Population Statistics Scientific Report Series No. 17, 22-28. (152) Turner, Anthony G., Woltman, Henry F., Fay, Robert and Carlson, Beverly (1977): Sample Survey Design in developing Countries: three illustrations of methodology, Bulletin of the International Statistical Institute. (153) U.S. Bureau of the Census (1974): Standards for the discussion and presentation of errors in data, Technical Report No. 32. (154) -------- (1976): An overview of population and housing census evaluation programmes conducted at the U.S. Bureau of the Census, Census Advisory Committee of the American Statistical Association.

96 (155) Verma, Vijay (1980): Sampling for national fertility surveys, World Fertility Survey Conference, London. (156) Warwick, Donald P. and Charles A. Lininger (1975): The Sample Survey: Theory and Practice, New York, McGraw-Hill. (157) Woltman, Henry and Bushery, John (1975): A panel bias study in the National Crime Survey, Proceedings of the Social Statistics Section, American Statistical Association. (158) Williams, W.H. and Mallows, C.L. (1970): Systematic biases in panel surveys, JASA, 65, 1338-1349. (159) Warner, Stanley L. (1965): Randomised Response: A Survey Technique for eliminating Evasive answer bias, JASA, 60, 63-69. (160) Zarkovich, S.S. (1966): Quality of Statistical Data, Rome, FAO of the United Nations.

97 THANK YOU!!

98 PART II: Bayesian Mode of Analysis. 2. When one is interested not in how the data were collected but in how, given the data, the analysis can be made: the Bayesian mode of analysis. The projects in which stochastic models were developed and used to explain the process and to estimate the parameters involved, using the Bayesian mode of analysis, all fall under this category. Transforming the real-life problems into statistical ones required a series of discussions with technical experts at different levels of the organizations. In fact, the project formulations were not routine work; instead, definitions and other related concepts were defined, developed and redefined within the framework of the problems at hand.

99 2.1. Indian Statistical Institute - Hindustan Lever Limited (ISI-HLL) collaborative project on Business Research (October 1999 - August 2000). Objective: A corporate-sector company producing consumer products would like to know, in the face of uncertainty, the buying behaviour during a given period in terms of (a) the proportion of buyers purchasing a given brand and no other brand; (b) the proportion of buyers purchasing a given brand or a 'combination of brands' at least once, out of those who purchased the same brand at least once prior to the period under consideration; (c) the average purchase frequency of a particular brand or 'combination of brands'; and (d) the market penetration of a brand or a particular 'combination of brands'. The purpose of gathering this information is to make predictions, if possible, about the demand for its products in the market at some future date.

100 Several models of buying behaviour are available in the existing literature on marketing science. Among these, the Ehrenberg Bayesian model appeared to have given good results in western countries. One question of importance to HLL was whether the Ehrenberg model could be applied to consumer panel data in the Indian context. At the request of the General Manager, Business Research and Corporate Planning, Hindustan Lever Limited (HLL), this project was undertaken at the Indian Statistical Institute, Kolkata, to examine the validity of the Ehrenberg model.

101 The appropriateness of the distributional assumptions of the model was examined, namely the assumption of a negative binomial distribution for the number of purchases and the assumption of Beta priors on p_j, the choice probability of the j-th brand in a product group.

102 Some technical novelties of the work were as follows. The data used for the analysis consisted of, among other items, the household identification number, brand code, month number and quantity of the product purchased. The number of purchases was the basic input of the model. Thus, for the model verification exercise, the data on the quantity (in grammes) of each brand purchased by the households had to be converted into a number of units. This required knowledge of the standard pack size of each brand. In the absence of precise information of this kind, and also to make the analysis more flexible, we used alternative sizes, viz. 125 grammes, 250 grammes, 500 grammes and 1000 grammes, with proper rounding off, to obtain the number of units. By this procedure we generated four sets of data on the number of purchases of each brand made by each household in each month. The data on the number of purchases based on the 1000 gramme size agreed with the experience of the practitioners at HLL.
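
The conversion from quantity to number of units can be sketched as below. The slide does not say which rounding rule was used, so round-half-up is an assumption here, and the record layout and sample records are hypothetical.

```python
PACK_SIZES_G = (125, 250, 500, 1000)  # alternative unit sizes named in the text


def units_purchased(quantity_g: int, pack_size_g: int) -> int:
    """Convert a quantity in grammes to a whole number of units.
    Assumption: round half up, done in integer arithmetic to avoid
    floating-point surprises."""
    return (2 * quantity_g + pack_size_g) // (2 * pack_size_g)


def purchase_counts(records, pack_size_g):
    """records: iterable of (household_id, brand_code, month, quantity_g)."""
    return [
        (hh, brand, month, units_purchased(qty, pack_size_g))
        for hh, brand, month, qty in records
    ]


# Hypothetical panel records.
records = [
    (101, "A", 1, 750),
    (101, "B", 1, 250),
    (102, "A", 2, 1200),
]

# One data set of purchase counts per assumed pack size: four in all.
datasets = {size: purchase_counts(records, size) for size in PACK_SIZES_G}
print(datasets[500])
```

With a large assumed pack size, small quantities round down to zero units, so the choice of size changes the number of recorded purchases and hence the fit of any model for purchase counts.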

103 “Empirical Bayes” as well as “Hierarchical Bayes” estimates of the model parameters were obtained. Since the data failed to support the assumption of a negative binomial distribution for the number of purchases, we made some further studies, for each income group separately, treating the data as if they came from a mixed distribution with some mass p at n = 0 (n being the number of purchases) and a binomial (3, 1-p). It was interesting to observe a good fit for the distribution of the number of brands: it appeared to follow a truncated geometric distribution (truncated at zero). The assumption of a Beta prior on p_j, the choice probability for the j-th brand, also appeared not to be tenable for the Indian data.
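
As an illustration of the last-but-one finding, a geometric distribution truncated at zero can be fitted by maximum likelihood in one line, since with P(K = k) = q(1 - q)^(k - 1) for k = 1, 2, ... the estimate of q is the reciprocal of the sample mean. The sketch below uses hypothetical counts of brands per household and is not the fitting procedure reported in the project.

```python
def fit_zero_truncated_geometric(counts):
    """Maximum likelihood estimate of q for P(K=k) = q * (1-q)**(k-1), k >= 1."""
    if not counts or any(k < 1 for k in counts):
        raise ValueError("all counts must be at least 1")
    return len(counts) / sum(counts)  # q_hat = 1 / sample mean


def pmf(k, q):
    return q * (1 - q) ** (k - 1)


def expected_frequencies(counts, q):
    """Expected number of households buying k brands, for k up to the observed maximum."""
    n = len(counts)
    return {k: n * pmf(k, q) for k in range(1, max(counts) + 1)}


# Hypothetical data: number of distinct brands bought by each of 16 households.
brands_per_household = [1] * 8 + [2] * 4 + [3] * 2 + [4] + [6]
q_hat = fit_zero_truncated_geometric(brands_per_household)
expected = expected_frequencies(brands_per_household, q_hat)
print(q_hat, expected)
```

Comparing the expected frequencies with the observed ones, for instance through a chi-square statistic, is one way to judge how good the fit is.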

104 We tried to find out why the model did not fit the Indian data. One reason was that, unlike in a European family, there may be many decision makers in an extended family in a country like India. The model was therefore revised by introducing a new stochastic variable, namely the number of decision makers at the household level, to examine whether the revised model was supported by the Indian data. After classifying the households by family size and income group, buying behaviour across these classes was also examined to detect any pattern of similarity or otherwise. However, buying behaviour did not appear to be affected by such classifications.

105 2.2. ISI-ONGC Collaborative Research Project (1985-1992) Objective: At the request of the Oil and Natural Gas Commission (ONGC) of India, the Indian Statistical Institute evaluated the economic and physical consequences of various strategies for action in different basins in India. Both economic and stochastic models were used for estimating the reserve and the per unit cost of hydrocarbon. (Ref: Estimation of Discovery and Production Costs of Hydrocarbon with some application to Indian Data. Indian Statistical Institute, Calcutta.) The project formulation was not routine work; instead, definitions and other related concepts were defined, developed and redefined within the framework of the present problem. For example, the 'reserve in place' was distinguished from the 'recoverable reserve'. Transformation of the real-life problem into a statistical one required a series of discussions with the technical experts working at different levels of the organisation.

106 The following were among the technical novelties. The available primary cost data were in the form of well-wise costs. The well-wise cost figures were measured at current prices, and therefore the cost data for different years were not comparable. To make them comparable, it was necessary to deflate them using a suitable index number of well-drilling cost. No such index number was available, so a new index was constructed and applied to the given data. The question of how to aggregate and which economic models to choose had to be resolved. To test the constancy of the success ratio in hydrocarbon exploration, the data were examined through a number of statistical devices, some based on graphical representations and others on standard statistical techniques.
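The deflation step amounts to rescaling each year's cost by the ratio of the base-year index to that year's index. A minimal sketch is given below. The cost figures and index values are invented purely for illustration and are not the project's data, and the construction of the index itself is not shown.

```python
def deflate_costs(costs_current, index_by_year, base_year):
    """Express current-price costs at the prices of the base year.

    constant-price cost = current-price cost * index[base_year] / index[year]
    """
    base = index_by_year[base_year]
    return {
        year: cost * base / index_by_year[year]
        for year, cost in costs_current.items()
    }


# Hypothetical well-wise costs at current prices and a hypothetical
# well-drilling cost index (1985 = 100).
well_costs = {1985: 120.0, 1986: 150.0, 1987: 198.0}
drilling_index = {1985: 100.0, 1986: 125.0, 1987: 165.0}

constant_price_costs = deflate_costs(well_costs, drilling_index, base_year=1985)
```

In this made-up example the three wells cost the same in real terms once the rise in drilling prices is removed, which is exactly the kind of comparison the unadjusted figures do not permit.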

107 A fully Bayesian hierarchical method, which provides better estimates of the errors in estimation and prediction, was sought. However, because of analytical complications, an empirical Bayesian view was taken in predicting the (n+1)th discovery, given the data on past discoveries. Two types of simulation estimates were provided: one based on the assumption of an approximate Gamma distribution of the field sizes, whereas the second used a Gamma population and employed the classical method of 'importance sampling' for adjustment. Both involved novel methods of simulation developed by us.
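As a reminder of what importance sampling does in general, the sketch below estimates the mean of a target Gamma distribution from draws taken from a different Gamma proposal, reweighting each draw by the ratio of the two densities. It is a textbook illustration under assumed parameter values, not the simulation scheme developed in the project, which the slide does not describe.

```python
import math
import random


def gamma_logpdf(x, shape, scale):
    """Log density of a Gamma(shape, scale) distribution at x > 0."""
    return ((shape - 1) * math.log(x) - x / scale
            - math.lgamma(shape) - shape * math.log(scale))


def importance_sampling_mean(n, target, proposal, seed=0):
    """Self-normalised importance sampling estimate of the target mean.

    target and proposal are (shape, scale) pairs; draws come from the
    proposal and are weighted by target density / proposal density.
    """
    rng = random.Random(seed)
    t_shape, t_scale = target
    p_shape, p_scale = proposal
    weighted_sum = 0.0
    weight_total = 0.0
    for _ in range(n):
        x = rng.gammavariate(p_shape, p_scale)  # second argument is the scale
        log_w = gamma_logpdf(x, t_shape, t_scale) - gamma_logpdf(x, p_shape, p_scale)
        w = math.exp(log_w)
        weighted_sum += w * x
        weight_total += w
    return weighted_sum / weight_total


# Assumed values: target Gamma(2, 1.5) has mean 3; the proposal
# Gamma(2, 2) has a heavier tail, so the weights stay bounded.
estimate = importance_sampling_mean(20000, target=(2.0, 1.5), proposal=(2.0, 2.0), seed=1)
```

Choosing a proposal with a heavier tail than the target keeps the weights bounded, which is the usual precaution against unstable estimates.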

108 Where the use of sample survey technique is itself an error (complete enumeration): One may not always need an estimate of an aggregate population characteristic; instead, information on the required items may be needed for every population unit. In such a situation a sample survey is not appropriate, and its use may itself be an error. The following project dealt with such a situation. Objective: Planning for development involves four different types of activities: formulation, implementation, monitoring during implementation and evaluation on completion. To carry out each of these activities, relevant, reliable and timely information is needed at every stage.

109 The 73rd and 74th amendments of the constitution made by the Government of India have squarely laid the responsibility for local planning on local bodies. It stands to reason that the information needed for such planning should also be collected and managed by local bodies, and hence the Government of India requested the Asian Development Bank (ADB) for technical assistance to strengthen the local government of Madhya Pradesh as part of the reform agenda of the state. At the request of Price Water House India, one of the prime contractors to the Bank for this purpose, an inter-firm agreement between PWI and ISI was made to develop the Statistical Information System (SIS) as a part of the project assignment. The SIS was envisaged as a statistical database for rational decision making. It was expected to address the information needed for planning at Panchayat, Janpad, district and higher levels. It was developed in accordance with the 73rd/74th amendment of the constitution of India, 1992.

110 The work involved in developing the SIS consisted of (a) identification of the required data items, (b) design of formats for data collation, collection and compilation, (c) specification of output formats amenable to a computerised database and (d) organisation of workshops in collaboration with the state government to finalise the methodology of data collection and the data formats. The SIS developed for decentralised planning has the following major components: computer hardware, forming the container of the statistical information; computer software, to process the information; and statistical data, the actual content of the system.

111 Highlights of some of the technical features: Extensive surveys were made of the areas of activity listed in the 11th schedule of the 73rd amendment and in the 12th schedule of the 74th amendment, of the lists of items prepared by the expert committee on small area statistics, and of the report by the Hanumanth Rao Committee. A survey was made to identify the availability of the required information, with an analysis of the existing data gap. Twelve (12) rural and thirty-one (31) urban schedules, with indications of the respective sources of data, were developed to serve as the SIS input manual. The appropriate method of data collection was suggested. Output formats were designed for (i) general information on the variables considered for decentralised planning, (ii) query-based information, (iii) report-based information and (iv) summarised information.

112 Schematic diagram of the work involved in the development of the Statistical Information System (SIS)

