
Anna Bombak, Chuck Humphrey, Lindsay Johnston, Angie Mandeville and Leah Vanderjagt Winter Institute on Statistical Literacy for Librarians, February 18-20,

1 Anna Bombak, Chuck Humphrey, Lindsay Johnston, Angie Mandeville and Leah Vanderjagt Winter Institute on Statistical Literacy for Librarians, February 18-20, 2009 The Winter Institute on Statistical Literacy for Librarians Demystifying statistics for the practitioner

2 Outline
• Introductions
• A framework for understanding statistics
• Statistics shaped by geography
• Official statistics: national
• Official statistics: international
• Non-official statistics
• Applying what you have learned

3 Introductions: your backgrounds
Please introduce yourself
• Your name
• Your institutional affiliation
• Your librarian responsibilities
• Is there anything in particular that you hope will be covered in this workshop?

4 Introductions: your backgrounds Almost twice as many of you are from academic libraries as from non-academic libraries. In the past, the split has been almost equal. The largest group, with 12, is from universities other than the U of A. The second largest groups, with 5 each, are from government libraries and the U of A.

5 Introductions: your backgrounds Geographically, 16 of you are from Alberta and 10 are from other provinces. We have representation from Ontario, Manitoba, Saskatchewan, B.C. and Alberta. Thirteen are from the Edmonton region.

6 Statistics are ubiquitous “Statistics are generated today about nearly every activity on the planet. Never before have we had so much statistical information about the world in which we live. Why is this type of information so abundant? For one thing, statistics have become a form of currency in today’s information society. Through computing technology, society has become very proficient in calculating statistics from the vast quantities of data that are collected. As a result, our lives involve daily transactions revolving around some use of statistical information.” Data Basics, page 1.1

7 Statistics: what are we talking about? Statistics and data are related but different

8 How statistics and data differ
Statistics
• numeric summaries known as facts/figures
• derived from data, i.e., processed from data
• presentation-ready format
Data
• numeric files created and organized for computer analysis
• requires computer processing
• not in a display format
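
The distinction can be sketched in a few lines of Python. The records below are invented for illustration and do not come from any survey; the function name `percent_distribution` is also just a label chosen here. The point is only that the data are a file of individual records, while the statistic is the presentation-ready summary processed from them.

```python
from collections import Counter

# Hypothetical microdata: one record per respondent (invented values).
records = [
    {"id": 1, "sex": "F", "smoker": "yes"},
    {"id": 2, "sex": "M", "smoker": "no"},
    {"id": 3, "sex": "F", "smoker": "no"},
    {"id": 4, "sex": "M", "smoker": "yes"},
    {"id": 5, "sex": "F", "smoker": "no"},
]

def percent_distribution(rows, variable):
    """Summarize one variable as percentages of all rows."""
    counts = Counter(row[variable] for row in rows)
    total = sum(counts.values())
    return {category: round(100 * n / total, 1) for category, n in counts.items()}

# The statistic: a numeric summary derived from the data above.
smoking_stats = percent_distribution(records, "smoker")
print(smoking_stats)
```

The list of records is what would sit in a data file; the small dictionary of percentages is what would be published in a table.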

9 A statistic can’t be real without data A ‘real’ statistic requires a data source. If the publisher of a statistic can’t tell you the data source behind it, you should question whether the statistic is ‘real.’ After all, people do make up statistics. Classic example: a statistic in a 1986 Newsweek article claimed that a 40-year-old woman had a better chance of being killed by a terrorist than of getting married (2.6 percent). Twenty years later, Newsweek admitted that this “comparison wasn’t in the study.”

10 A statistic can’t be real without data A statistic may have been derived from poor quality data and, consequently, may be of questionable value. Nevertheless, it is a ‘real’ statistic. For example, a debate erupted over a Lancet article on the number of civilian deaths in Iraq in the first 18 months after the invasion. The goal is to have quality statistics that are derived from quality data.

11 Statistics Canada’s criteria
Statistics Canada uses the following criteria to define quality statistics, or “fit for use” quality:
• Relevance: addresses issues of importance to users
• Accuracy: the degree to which it describes what it was designed to measure
• Timeliness: the delay between when the information was collected and when it is made available
• Accessibility: the ease with which the information can be obtained by users
• Interpretability: access to metadata that facilitates interpretation and use
• Coherence: the fit with other statistical information through the use of standard concepts, classifications and target populations

12 Statistics are about definitions

13 Statistics are about definitions! You may think of statistics as being just numbers, but these numbers represent summaries of measurements or observations that have a conceptual meaning. Deriving statistics from data depends on the definition of the concept that is being summarized.

14 Statistics are about definitions! Consider the following example from the Canadian Census on the data behind statistics about visible minorities. This table displays the size of the visible minority population in Canada from the 2006 Census. Visible Minority Groups (15), Generation Status (4), Age Groups (9) and Sex (3) for the Population 15 Years and Over of Canada, Provinces, Territories, Census Metropolitan Areas and Census Agglomerations, 2006 Census - 20% Sample Data

15 Statistics are about definitions! How is visible minority status identified in the Census? Are aboriginals among the visible minority in Canada? What is the definition of visible minority?

16

17

18 Statistics involve classifications The definitions that shape statistics specify the metric of the data they summarize (for example, Canadian dollars) or, if a statistic represents counts or frequencies, the categories used to classify things. In this latter case, classification systems are used to identify categories of membership in a concept’s definition. Some classification systems are based on standards while others are based on convention or practice. For an example of a standard, see the North American Industry Classification System (NAICS).
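
A minimal sketch of how a classification turns records into counts. The sector titles are a small subset of the two-digit NAICS sectors; the establishment codes are invented for illustration, and only their first two digits are used. The helper name `classify` is an assumption made for this example.

```python
from collections import Counter

# A handful of two-digit NAICS sector codes (the full standard has more).
NAICS_SECTORS = {
    "11": "Agriculture, forestry, fishing and hunting",
    "23": "Construction",
    "61": "Educational services",
    "62": "Health care and social assistance",
}

# Hypothetical establishment records: invented codes, for illustration only.
# Only the first two digits (the sector) are used below.
establishment_codes = ["610001", "230002", "620003", "610004", "990005"]

def classify(code):
    """Assign a record to a sector category using its two-digit prefix."""
    return NAICS_SECTORS.get(code[:2], "Unclassified")

# The statistic is a frequency count per category of the classification.
sector_counts = Counter(classify(code) for code in establishment_codes)
for sector, n in sector_counts.most_common():
    print(f"{sector}: {n}")
```

Changing the classification (for example, grouping by a longer prefix) would change the categories and therefore the published counts, even though the underlying records stay the same.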

19 Statistics are presentation ready Tables and charts (or graphs) are typically used to display many statistics at once. You will find statistics sprinkled in text as part of a narrative describing some phenomenon; but tables and charts are the primary methods of organizing and presenting statistics.

20 A quick review
To this point, we have established that:
• Statistics are ‘real’ only if they are derived from data;
• Statistics are dependent on definitions of the concepts they summarize;
• Statistics that represent counts of things in the data employ classification systems, which are based either on standards or convention; and
• Statistics are typically organized for display using tables or charts.

21 Characteristics of statistics To discover some additional characteristics of statistics, we will examine a table published by Statistics Canada about the average undergraduate tuition fees for full-time students by field of study. While this table does not display all of the information that I want to find in a published statistical table, it is fairly comprehensive. Refer to the handout entitled, “Tips for Reading a Statistical Table,” to find a full list of the information that I do want to find in a statistical table.

22

23

24 What about data? While we are not focusing our attention on data in this workshop, it is helpful to understand some basics about the origins of data, especially since statistics are derived from data. As we will see later, having a good understanding of data can greatly help in the search for statistics. There are three generic methods by which data are produced. One will find statistics generated from the data arising out of all of these methods.

25 Methods producing data
Observational Methods
• Focus is on developing observational instruments to collect data
• Correlation
• Replicate the analysis (same data or similar)
• Statistics summarize observations
Experimental Methods
• Focus is on manipulating causal agents to measure change in a response agent
• Causation
• Replicate the experiment
• Statistics summarize experiment results
Computational Methods
• Focus is on modeling phenomena through mathematical equations
• Prediction
• Replicate the simulation
• Statistics summarize simulation results

26 Methods producing data A particular discipline or field of study will tend to be dominated by one of these three methods, although outputs may also exist from the other two methods. Consequently, the knowledge disseminated within a field is often fairly homogeneous in the way statistical information is used and reported. We will see later how knowing the method from which data are derived and the life cycle in which statistics are produced can help in the search for statistics.

27 Life cycle of survey statistics
1. Program objective
2. Survey unit organized
3. Questionnaire & sample
4. Data collection
5. Data production & release
6. Analysis
7. Findings released
8. Popularizing findings
9. Needs & gaps evaluation
Access to Information

28 Life cycle of survey statistics
1. Program objective
2. Survey unit organized
3. Questionnaire & sample
4. Data collection
5. Data production & release
6. Analysis
7. Official findings released
8. Popularizing findings
9. Needs & gaps evaluation
Preserving Information

29 Life cycle applied to health statistics
Health Information Roadmap Initiative
1. Program objectives
• increased emphasis on health promotion and disease prevention;
• decentralization of accountability and decision-making;
• shift from hospital to community-based services;
• integration of agencies, programs and services; and
• increased efficiency and effectiveness in service delivery.

30 Life cycle applied to health statistics
Health Information Roadmap Initiative
2. Survey unit organized
3. Questionnaire & sample
4. Data collection
5. Data production & release
6. Analysis
7. Official findings released

31 Reconstructing statistics One way to see the relationship between statistics and the data from which they were derived is to reconstruct statistics that someone else has produced, using data that are publicly accessible.

32 Reconstructing statistics
Health Information Roadmap Initiative
1. Program objective
2. Survey unit organized
3. Questionnaire & sample
4. Data collection
5. Data production & release
6. Analysis
7. Official findings released
8. Popularizing findings
9. Needs & gaps evaluation

33 Reconstructing statistics
The statistics that we will reconstruct are reported in “Health Facts from the 1994 National Population Health Survey,” Canadian Social Trends, Spring 1996, pp. 24-27. The steps we will follow are:
• identify the characteristics of the respondents in the article;
• identify the data source;
• locate these characteristics in the data documentation;
• find the original questions used to collect the data;
• retrieve the data; and
• run an analysis to reproduce the statistics.

34 The findings to be replicated Page 26

35 Summary of variables identified
• Findings apply to Canadian adults: likely need age of respondents
• Men and women: look for the sex of respondents
• Type of drinkers: look for frequency of drinking or a variable categorizing types of drinkers
• Age: look for actual age or age in categories
• Smokers: look for smoking status

36 Identify the data source Survey title is identified: National Population Health Survey, 1994-95 Public-use microdata file is announced Page 25 of the article

37 Locate the variables
Examine the data documentation for the National Population Health Survey, 1994-95
• PDF version is online
Use the table of contents and follow the link to “Data Dictionary for Health”
Identify the variables from their content
• NOTE: check how missing data were handled
Trace the variables back to the questionnaire
Did the sampling method require weighting cases?
• NOTE: in addition to the other variables, is a weight variable needed to adjust for the sampling method?
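
Why the two notes matter can be shown with a small sketch. The records, variable names (`drinker_type`, `weight`) and values below are invented for illustration and are not taken from the NPHS; the sketch assumes that missing answers are excluded from the base and that each weight is the number of people a respondent represents.

```python
# Hypothetical respondent records: invented values, not NPHS data.
# A "drinker_type" of None stands for a missing answer; "weight" is the
# number of people in the population that each respondent represents.
respondents = [
    {"sex": "M", "drinker_type": "regular", "weight": 1200.0},
    {"sex": "M", "drinker_type": "occasional", "weight": 800.0},
    {"sex": "F", "drinker_type": "regular", "weight": 500.0},
    {"sex": "F", "drinker_type": None, "weight": 700.0},
    {"sex": "F", "drinker_type": "occasional", "weight": 1500.0},
]

def percent_in_category(rows, variable, category, weighted=True):
    """Percentage of valid (non-missing) cases that fall in `category`."""
    valid = [r for r in rows if r[variable] is not None]

    def size(r):
        # Each case counts as its weight, or as 1 when unweighted.
        return r["weight"] if weighted else 1.0

    total = sum(size(r) for r in valid)
    in_category = sum(size(r) for r in valid if r[variable] == category)
    return round(100 * in_category / total, 1)

unweighted = percent_in_category(respondents, "drinker_type", "regular", weighted=False)
weighted = percent_in_category(respondents, "drinker_type", "regular", weighted=True)
print(unweighted, weighted)
```

With these made-up numbers the unweighted and weighted percentages differ, which is the kind of gap that makes a published figure hard to reproduce if the weight variable or the treatment of missing data is overlooked.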

38 Retrieve and analyze the data For universities subscribed to the Statistics Canada Data Liberation Initiative (DLI), the public-use microdata from the NPHS can be downloaded without additional cost. See the Statistics Canada Online Catalogue for further cost details. Make use of local data services to retrieve data from the NPHS.

39 Lessons from the NPHS example
This example demonstrates the distinction between producing statistics and interpreting statistics that have been published by others. This is an important distinction because:
• Choices are made in creating statistics.
• Interpreting statistics requires an ability to understand the choices that were made.
Searching for statistics that others have published can be facilitated by understanding these points.

40 Search strategies for statistics Over the next two days, we will talk about two general search strategies for finding statistics. The government publications strategy is to identify an agency that might produce and publish such a statistic. This approach relies on knowledge of governmental structure and the content for which agencies are responsible. The data strategy is to identify a data source from which the statistics could have been produced. This approach relies on knowledge of data sources collected by agencies.

41 The data strategy What data source or sources would be used to produce such a statistic? What unit of observation would be needed to produce such statistics? What would the structure of the table look like given time, geography and attributes of the unit of observation? Who would collect such data? Would the source be an official agency? Use the literature trail and its indexes to see if a data source can be found (official and non-official publications)
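
One way to work through the table-structure question is to sketch an empty table shell before searching. The dimensions below (years, provinces, sex categories) are hypothetical choices for an imagined statistic, and `table_shell` is a name made up for this example.

```python
from itertools import product

# Hypothetical dimensions for the statistic being sought.
years = [2001, 2006]                 # time
provinces = ["Alberta", "Manitoba"]  # geography
sexes = ["Female", "Male"]           # attribute of the unit of observation

def table_shell(*dimensions):
    """Return one empty cell (None) per combination of dimension values."""
    return {combo: None for combo in product(*dimensions)}

shell = table_shell(years, provinces, sexes)
for cell in shell:
    print(cell)
```

Each key is one cell that a data source would have to be able to fill, so a source lacking any of the dimensions (for example, one with no provincial identifier) can be ruled out early.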

