Automating the Production of Descriptive Tables at Statistics Canada
mog.ado, a user-written program with quality controls
Questions and comments may be sent to the author at firstname.lastname@example.org
Statistics Canada / Statistique Canada

Contents
- Environment in which mog was developed: Statistics Canada
- Purpose of mog
- Examples
- Options: present and future
Statistics Canada
- Statistics Canada produces statistics that help Canadians better understand their country: its population, resources, economy, society and culture
- Objective statistical information is vital to an open and democratic society. It provides a solid foundation for informed decisions by elected representatives, businesses, unions and non-profit organizations, as well as individual Canadians
- As Canada's central statistical agency, Statistics Canada is legislated to serve this function for the whole of Canada and each of the provinces
- In addition to conducting a Census every five years, there are about 350 active surveys on virtually all aspects of Canadian life
- Data uses include: GDP, CPI, unemployment rate; health, social and education statistics
- We at Statistics Canada are committed to protecting the confidentiality of all information entrusted to us and to ensuring that the information we deliver is timely and relevant to Canadians
- Visit us at www.statcan.gc.ca for more information
Source: http://www.statcan.gc.ca/about-apercu/overview-apercu-eng.htm
Collection and Dissemination
- Collecting data (census, administrative data and surveys)
  - Questionnaire development, testing, collection, and data processing
- Check data
  - Verification (errors in processing, coding mistakes)
  - Certification (compare estimates to other data sources)
- Preparations for dissemination (e.g. for an analysis made on the data)
  - Reliability of the estimates is acceptable
  - Suppression (confidentiality of respondents is being protected)
  - Significance testing between estimates
Purpose of mog
- mog is designed to automate the quality control steps of dissemination: reliability, suppression, and significance testing
- It also displays estimates by up to two other classification variables in tabular form
- Result: a table giving estimates (mean or total) of one variable over one or two other categorical variables
- Useful for simple, descriptive statistics
Example I
Make a table showing the mean of retired by age and education categories (similar to table education age, c(m retired)), but with quality control checks:

    mog retired education age, nodetail survey dec(0)

Means of retired by education and age
Estimation technique for standard errors: linearized

    Table               45 to 65   66 to 75   Over 75
    doctorate/maste~    20         87^        88^
    diploma/certifi~    16         86^        97^
    some university~    18         83^        92^
    high school dip~    18         88^        79^
    some secondary/~    26         76*^       76*^

Notes
* significantly different from the reference group of the variable educ5, category number 1, p < .05
^ significantly different from the reference group of the variable age3, category number 1, p < .05
The data in the table is not real.
Example II
Same as Example I, with additional options:

    mog retired education age, nodetail ///
        survey dec(0) ref2(2) pubs pubdichot underscores varwidth(40)

Means of retired by education and age
Estimation technique for standard errors: linearized

    Table                                        45_to_65   66_to_75   Over_75
    doctorate/masters/bachelor's_degree          20^E       87X        88X
    diploma/certificate_from_community_colle~    16^        86         97^X
    some_university/community_college            18^E       83X        92X
    high_school_diploma                          18^        88X        79
    some_secondary/elementary/no_schooling       26^        76*        76*

Notes
* significantly different from the reference group of the variable educ5, category number 1, p < .05
^ significantly different from the reference group of the variable age3, category number 2, p < .05
The data in the table is not real.
Example I: the Long Way
At Statistics Canada, creating the table in our example so that it meets key confidentiality and quality requirements (there are others) would require running the following commands:
- One table command to create a table of estimates
- One mean command and one estimates table command to examine the individual significance of the 15 estimates
- 22 test or lincom commands, each requiring visual inspection of the results
- One tabulate command and a visual inspection of 15 cell counts
In total, that is 26 lines of code and 52 numbers to inspect visually, as opposed to 1 line of code to run mog and the 15 estimates it produces to inspect, all in one place.
- The work multiplies with each additional table
- All of the above must be redone if the sample changes
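The totals on this slide can be reproduced with a little arithmetic. The sketch below is in Python only for illustration. It assumes, and the slide does not say so explicitly, that the 22 tests compare each cell with the reference category of each classification variable in the 5-by-3 table of Example I.

```python
# Table dimensions from Example I: 5 education rows by 3 age columns.
n_rows, n_cols = 5, 3
cells = n_rows * n_cols  # 15 estimates

# Assumption: every non-reference cell is tested against the reference
# category of each classification variable (category 1 in Example I).
tests_vs_education_ref = (n_rows - 1) * n_cols  # 4 comparisons in each age column
tests_vs_age_ref = (n_cols - 1) * n_rows        # 2 comparisons in each education row
n_tests = tests_vs_education_ref + tests_vs_age_ref

# Commands: table, mean, estimates table, the test/lincom calls, tabulate.
n_commands = 1 + 1 + 1 + n_tests + 1

# Numbers to inspect: one significance result per estimate,
# one result per test, and one count per cell.
n_inspected = cells + n_tests + cells

print(n_tests, n_commands, n_inspected)  # prints: 22 26 52
```

Under that assumption the counts match the slide: 12 + 10 = 22 tests, 26 commands, and 52 numbers to inspect.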
Copying Process
- Select the table rows from the mog output
- Right click and select:
  - copy table, if copying to a spreadsheet or word processor (in a Word table, select enough rows and columns in the table into which you are copying)
- Other options include:
  - copy text, if copying to a word processor where you will use a fixed-width font
  - copy html, if copying to a location where you want a table to be generated automatically
- mog's underscores option is useful when value labels contain spaces: it ensures that the correct number of columns is created
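To see why underscores help, consider a destination that splits pasted text into columns on whitespace. This is a simplified stand-in for the paste step, not a description of how any particular spreadsheet or word processor behaves. The Python sketch below uses the column headers from Examples I and II. The helper name split_columns is hypothetical.

```python
# Column headers as they appear in Example I and Example II.
header_with_spaces = "45 to 65 66 to 75 Over 75"
header_with_underscores = "45_to_65 66_to_75 Over_75"


def split_columns(line):
    """Split a pasted line into columns on whitespace (simplifying assumption)."""
    return line.split()


# Labels with spaces are broken into one column per word.
print(len(split_columns(header_with_spaces)))       # prints: 8

# Labels joined by underscores stay together, one column per label.
print(len(split_columns(header_with_underscores)))  # prints: 3
```

With spaces in the labels, three headers become eight columns. With underscores, each label stays in a single column.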
Other Options
Display options:
- Number of decimal places displayed; number rounding
- Control of column width (although columns will automatically widen if large numbers or many decimal places are to be displayed)
- Reshow the table by typing mog with no arguments
- Reshow the table with different reference groups (or other display options) without re-estimating the variances (a time saver when bootstrapping)
- Can show quality control symbols that indicate:
  - the individual statistical significance of results at two user-defined thresholds (e.g. F = do not publish if cv > 1/3, E = publish with warning if 1/3 >= cv >= 1/6); and
  - whether the estimate is based on enough observations (e.g. X if too few)
- The cut-offs and symbols can be changed to suit the user's needs
  - Statistics Canada surveys have User Guides that indicate these values
Analysis:
- The significance level used for tests between classification levels can be changed (.05, .01, …)
- mog is byable
- Will use svyset information in variance estimation via the survey option (not through the svy prefix)
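The quality control symbols described above can be sketched as a small decision rule. This is an illustration in Python, not mog's actual code. The function name quality_symbol and the min_obs default of 10 are hypothetical. It also assumes that too few observations takes precedence over the cv check, and that a zero estimate has an unbounded cv. The cv cut-offs of 1/3 and 1/6 are the example values from the slide. Real values come from each survey's User Guide.

```python
def quality_symbol(estimate, std_error, n_obs,
                   min_obs=10, warn_cv=1 / 6, suppress_cv=1 / 3):
    """Return a quality symbol for one table cell.

    "X": too few observations (min_obs is a hypothetical cut-off)
    "F": do not publish, cv above suppress_cv
    "E": publish with warning, cv between warn_cv and suppress_cv
    "":  no quality flag
    """
    # Assumption: the observation-count check comes first.
    if n_obs < min_obs:
        return "X"
    # Coefficient of variation: standard error relative to the estimate.
    # Assumption: a zero estimate is treated as having unbounded cv.
    cv = float("inf") if estimate == 0 else abs(std_error / estimate)
    if cv > suppress_cv:
        return "F"
    if cv >= warn_cv:
        return "E"
    return ""


print(repr(quality_symbol(20, 1, n_obs=200)))   # cv = 0.05
print(repr(quality_symbol(20, 5, n_obs=200)))   # cv = 0.25
print(repr(quality_symbol(20, 10, n_obs=200)))  # cv = 0.50
print(repr(quality_symbol(20, 1, n_obs=3)))     # too few observations
```

Returning a single symbol keeps the sketch short. Example II shows symbols appended alongside the significance markers (for instance 97^X), so a fuller version would concatenate them.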
Future Options
- Save the table as a csv file
- Show standard errors/t-ratios under the estimates
- Harmonize syntax with Stata: use an over() option to specify classification variables
- Use estimates based on different populations by one classification variable
- Use with the proportion command
- Find an alternative to the underscores option
Requests for the Program
- Contact me directly at email@example.com and I will send you the program
- Please send me any comments you may have on bugs, wording, inconsistencies, etc.
- After receiving enough feedback, I will update the program and make it available online at one of the Stata program archive sites