STATISTICAL LITERACY, NUMERACY AND THE FUTURE Peter Holmes, Senior Consultant, RSS Centre for Statistical Education, Nottingham Trent University, Nottingham, England, 2003 "I think the whole thing started in England. Brits do start some things. We started with a word. We had a word that you didn't have. In 1959, there was a government report in England that talked about the numeracy problem. … it was talking about the education of 16-year-olds saying that they needed to be literate. There was a literacy strand, but they also needed to be numerate. So there was a numeracy strand. So from 1959, we have had a very good English word called numeracy."
"…There's now Statistical Numeracy, Statistical Literacy, or Statistical Reasoning or Statistical Thinking…. But they're all in the same ballpark. The word numeracy when it was first introduced was in the context of the ability to use numbers in practice. … particularly in the context of statistics that you might have to read and interpret. In fact in that first use of [numeracy] in 1959, it was in terms of reading tables." STATISTICAL LITERACY, NUMERACY AND THE FUTURE Peter Holmes, 2003
A more recent take on Statistical Literacy… Statistical Literacy studies the use of statistics as evidence in arguments (Schield, Milo 1998,1999) "A key element of statistical literacy is assembly: how the statistics are defined, selected and presented" Schield, Milo (2004). "Information Literacy, Statistical Literacy and Data Literacy". IASSIST Quarterly 28 (2-3): 6-11.
Literacy matters. There is no argument about that fundamental statement. But numeracy counts. Research in numeracy trails research in literacy by 50 years. It will never catch up if elected leaders and politically appointed officials continue to exclude numeracy. That means numeracy needs to count more. Lynda E. Colgan: Kingston Whig-Standard, January 18, 2006, p. 5
What Librarians Need to Know:
Know about and how to use major statistical sources (print and electronic, national and international)
Know about value-added commercial products that may hide statistical details from us
Be critical consumers of statistics
Be familiar with and able to make informed decisions about the use of charts, graphs, mapping, etc. used in the presentation of statistics
Summarized from: Ann S. Gray, Data and Statistical Literacy for Librarians, IASSIST Quarterly, Summer/Fall 2004, Special Issue: Developing Statistical Literacy, Issue 2/3
Joel Best, Damned Lies and Statistics (published 2001); More Damned Lies and Statistics: How Numbers Confuse Public Issues (published 2004)
Statistics The word statistics Origins in the 1600s Political arithmetic used to calculate population size & life expectancy A growing population was thought to reflect a healthy state – so early number crunchers became known as statists. Hence, development of the term statistics…
Statistics crop up in a variety of circumstances in Libraries… Copyright: Unshelved.com (c) Overdue Media LLC and used with permission
Statistics Create Social Problems Contrary to Laine's email sign-off, "Smoking is a major cause of statistics", statistics are in fact a major cause of social problems. Statistics identify and define social issues (a.k.a. problems) and provide ammunition to those who would promote these issues. Belief in the numbers, especially those reported by experts, typically solidifies popular conviction that a problem exists.
Statistics Create Social Problems: Number Laundering [Diagram. Stages: Issue or situation; Measurement; Promotion; Opposition; Awareness. Labels: official statistics, polls, etc.; activists, media, officials, experts, etc.; general public awareness and/or involvement; defence of policies, interests, etc.]
Best describes three types of people when it comes to statistics: Cynical, Naïve, and Critical. Cynical – Suspicious of statistics; as consumers of statistics, not willing to give them much stock. They will often discount or ignore statistics that don't align with their views. Worse, as producers of statistics, cynics will collect and report statistics in such a way as to support their point of view. Derived from Best, 2001, pp. 162-167
Naïve – Slightly more sophisticated than the Awestruck; they think they understand something about statistics (but often don't), are basically accepting of any numbers they encounter, and accept that the numbers mean what they appear to mean. As consumers of numbers, they are bad enough, but as producers of numbers they can be as dangerous as cynics, if not worse. Derived from Best, 2001, pp. 162-167
Critical Thinkers – Not negative or hostile; thoughtful in approaching statistics. Recognize that statistics summarize complex information into relatively simple numbers and that as a consequence some of the complexity is lost. Statistics are a product of choices and more specifically a compromise among choices. Given this, approaching statistics with a critical eye is only being prudent and responsible. Critical thinkers ask questions about statistics. Derived from Best, 2001, p 162-167
Some Common Problems Geographic comparisons – there is a good chance statistics gathered from different places are based on different definitions and different measurements. For example, comparing US and Canadian statistics on race is complicated by different perspectives on this issue (i.e. definitions and measurements can vary widely).
SIZE MATTERS… Comparing groups (derived from Best, 2001, p. 113) "Cult X is the fastest growing religion in Canada." On closer examination, the cult grew from 20 to 200 members (a tenfold increase, i.e. 900%). To match this growth rate, the Catholic Church in Canada would have to grow from 13 million to 130 million, far more than the population of Canada.
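The arithmetic behind this comparison can be checked with a short sketch. The figures (20, 200, 13 million) come from the slide; the function names are illustrative only.

```python
def growth_factor(old: float, new: float) -> float:
    """How many times larger the new value is than the old one."""
    return new / old


def percent_increase(old: float, new: float) -> float:
    """Increase from old to new, expressed as a percentage of old."""
    return 100 * (new - old) / old


# Figures from the slide: the cult grew from 20 to 200 members.
cult_factor = growth_factor(20, 200)
cult_pct = percent_increase(20, 200)

# What the same growth factor would mean for a group of 13 million.
church_now = 13_000_000
church_needed = church_now * cult_factor

print(f"Cult growth: x{cult_factor:g}, i.e. +{cult_pct:g}%")
print(f"Church would need {church_needed:,.0f} members to match")
```

The same growth rate describes 180 new members in one case and 117 million in the other, which is why a "fastest growing" claim says little without the base size.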
Numbers vs Percentages
"Most poor people are white"
Take, for example, a population of 700 families:
Group | Families | Poor (number) | Poor (percentage)
White | 600 | 60 | 10%
Visible minorities | 100 | 20 | 20%
In absolute numbers, more white families are poor, but proportionally, more visible minority families are poor.
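A minimal sketch of the same example, using the slide's 700 families. The dictionary layout and function names are assumptions made for illustration. It shows how the count, the within-group rate, and the share of all poor families answer three different questions.

```python
# Figures from the slide: 700 families in two groups.
groups = {
    "white": {"families": 600, "poor": 60},
    "visible minority": {"families": 100, "poor": 20},
}


def poverty_rate(group: dict) -> float:
    """Percentage of a group's own families that are poor."""
    return 100 * group["poor"] / group["families"]


def share_of_poor(name: str) -> float:
    """Percentage of all poor families that belong to the named group."""
    total_poor = sum(g["poor"] for g in groups.values())
    return 100 * groups[name]["poor"] / total_poor


for name, g in groups.items():
    print(
        f"{name}: {g['poor']} poor families, "
        f"rate {poverty_rate(g):g}%, "
        f"{share_of_poor(name):g}% of all poor families"
    )
```

"Most poor people are white" is true of the shares (75% of poor families), while the rates show visible minority families are twice as likely to be poor.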
Mutant Statistics Not all statistics start out bad. Even good numbers can be stretched, twisted, distorted, or mangled… generating mutant statistics. There are three main ways mutant statistics are created: Generalizations, Transformations, & Confusion. Best, 2001, pp. 62 - 95
Generalizations… An Economist, a Physicist, and a Statistician were driving through Scotland, and they saw a brown cow… The Economist says, "Fascinating that the cows in Scotland are brown." The Physicist says, "I'm afraid you're overgeneralizing from the evidence. All we know is that some cows in Scotland are brown." The Statistician shakes his head at both of them. "Wrong again. Completely unwarranted by the evidence. All we can infer, logically, is that there exists at least one cow in this country, at least one side of which is brown." Robert Ludlum, The Ambler Warning, 2005, pp. 465-466.
Generalizations Measuring ALL the cases of a given social phenomenon is normally not feasible. We collect samples and generalize, but problems can arise: Definitions Measurements Sampling Best, 2001, pp. 62 - 95
Definitions – In 1996,... news media reported on what was considered to be a rash of arson fires against black churches in the southern U.S. Amid those images were fears of raging racism. Statistics were suspect because of poor definitions of what was an appropriate church fire to include in the counts. Analysis of six years of federal, state and local data found that the number of arson cases was up, but that these increases applied to both black and white churches in roughly equal proportions. …There was NO dramatic increase in the number of insurance claims made against church fires. http://www.emergency.com/arsnstat.htm & Best, 2001, pp. 62 - 95
Measurements – Hate crime statistics are gathered across many jurisdictions. Categories: Race; Religion; Sexual Orientation; Ethnicity/National Origin; Disability; Multiple-Bias Incidents. But, ultimately, any crime could be a hate crime. It comes down to a question of motive – and how do you objectively and consistently measure motive? Best, 2001, pp. 62 - 95
Sampling – Bad sampling can give rise to mutant statistics. If you're in the wrong place, or at the right place at the wrong time, your sample won't be representative. A report on racial profiling by Kingston Police was criticized for this. Best, 2001, pp. 62 - 95 Calculation of the Police Stop Rate: (Number of Stops / Population Estimate) * 1,000
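The stop-rate formula can be sketched as follows. All numbers here are hypothetical and are not taken from the Kingston report; they only show how sensitive the rate is to the population estimate in the denominator.

```python
def stop_rate_per_1000(stops: int, population_estimate: int) -> float:
    """Police stop rate: (number of stops / population estimate) * 1,000."""
    return 1000 * stops / population_estimate


# Hypothetical figures, for illustration only.
stops = 150

# The same number of stops gives very different rates depending on
# which population estimate is used as the denominator.
rate_if_10000 = stop_rate_per_1000(stops, 10_000)
rate_if_5000 = stop_rate_per_1000(stops, 5_000)

print(f"Estimate of 10,000 people: {rate_if_10000:g} stops per 1,000")
print(f"Estimate of  5,000 people: {rate_if_5000:g} stops per 1,000")
```

Halving the population estimate doubles the rate, so the quality of the estimate (how, when and where it was collected) matters as much as the count of stops.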
BUT… How, when and where was this mini-census conducted?
Transformations This form of mutant statistics results from transforming the meaning of a number. Take the estimate that 6% of the 52,000 Roman Catholic priests in the US are "at some point in their adult lives sexually preoccupied with young people". Source: a former priest turned psychologist who treated disturbed clergy and derived this estimate from his observations. This was transformed into "6% of priests are pedophiles". Best, 2001, pp. 62 - 95
Transformations: 1. People forgot that it was an estimate and treated it as fact. 2. The original sample was drawn from priests who sought psychological help (hence a biased sample) and generalized to all priests. 3. People turned Sexual preoccupation into actual behaviour. 4. Young people were morphed into children – bringing the word pedophile into the mix.
Confusion Garbling complex statistics. Wendy Watkins of Carleton University provided an example: two polling companies, Decima and Compass, surveyed Canadians regarding Harper's policy on the Middle East. Decima – 30% approval of policy. Statistic based on a single question: "What do you think about Harper's Middle East policy?" Compass – 60% approval of policy. Statistic based on an amalgam of responses to several questions – Israel's right to defend itself… Syria flouting UN sanctions… Iran flouting UN sanctions… etc. The Compass survey was sponsored by a right-leaning think tank.
This kind of statistics is about as valid as the one that argues that the average Canadian has one testicle
How can you recognise good, reliable, well reported statistics?
A critical view Look at: Who collected the data (source) Why were they collected How were they collected What was counted When the data were collected How were the data processed after collection (added up, averaged, grouped etc.) How are the data being presented. Always read the footnotes!
Who? - Formal Organizations Statistics Canada (National statistical agency) United Nations Statistics Division (national statistics) OECD (intergovernmental organization) Provincial and Municipal governments –Ontario –City of Toronto Societies and Associations: –Cancer Society; Amnesty International etc.
Sources Companies: –Sears Canada; Ford etc. Consumer advocacy groups: –International Coffee Organization –Dairy Farmers of Canada Publications (print and electronic) –Annual reports from companies and societies –Journal articles, print and electronic –Newspapers, print and electronic, such as Toronto Star, Globe and Mail –Commercial databases such as Datastream
Sources – Media etc. Media –Magazines range from National Enquirer to Chatelaine, Maclean's to the Economist –Newsfeeds - Reuters to more dubious ones Informal Organizations –Wikipedia – variable content –User groups – again a range from professional ones to casual ones –Blogs, Chatrooms
Good or Quality statistics If the figures are from a reputable source then usually considered good But still consider the Why? Especially for companies, opinion polls, consumer organizations, advocacy organizations such as Greenpeace, United Way etc. Can get question bias Can get sample bias
Government planning at all levels Political reasons (good, bad or neutral) Academic research Commercial reasons (company finances, resellers of data, media, etc.) Baseline data (environment, health) Advocacy organizations (Greenpeace, Amnesty International, Cancer Society)
Census and Statistics Canada surveys: can be considered a gold standard Academic research Companies, product associations Media
How - Newspapers, Magazines Maclean's University issue –Now in its 16th year, the annual Maclean's rankings assess Canadian universities on a diverse range of factors –From its inception, Maclean's has consulted with academic experts about the design, composition and methodology of the rankings. –Universities boycotting it now Globe and Mail University survey –students register themselves, therefore self-selection –More than 32,700 students answered over 100 questions –Our assessment has spread to 49 schools -- up from 37 Toronto Life surveys –Talk to 100 pedestrians about a topic
What is being counted? Need to be aware of definitions so you can get comparable data over time and place. If it is a number, what does that number represent: –a person, a household, a family? –Total, single or multiple responses? –income or earnings? –a weight, kilograms or pounds? –a currency, Can$ or U.S.$? –Is it a percentage? –Is it in millions or does the table have a '000 sign?
What is the unit of measurement? Is it a rate e.g. Unemployment rate? Is it indexed e.g. Consumer price index? –What is the base date –Has the basket of goods changed Is it seasonally adjusted? Are classifications comparable: –NAICS 2000 vs. SIC 1980, definition of pet food may have changed –Concordances exist
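Why the base date of an index matters can be shown with a small sketch. The index values below are hypothetical and are not real Consumer Price Index figures; the function names are illustrative.

```python
def rebase(index_by_year: dict, new_base_year: int) -> dict:
    """Re-express an index so that new_base_year equals 100."""
    base = index_by_year[new_base_year]
    return {year: 100 * value / base for year, value in index_by_year.items()}


def percent_change(series: dict, start: int, end: int) -> float:
    """Percentage change in the index between two years."""
    return 100 * (series[end] - series[start]) / series[start]


# Hypothetical index with 1992 = 100 (not real CPI data).
hypothetical_index = {1992: 100.0, 2002: 119.0, 2006: 129.9}

# The same series re-expressed with 2002 = 100.
rebased = rebase(hypothetical_index, 2002)

for year in hypothetical_index:
    print(f"{year}: {hypothetical_index[year]:.1f} (1992=100), "
          f"{rebased[year]:.1f} (2002=100)")
```

The index levels differ between the two versions, so values from tables with different base dates cannot be compared directly. The percentage change between any two years is the same in both.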
Household internet use at home by internet activity What is being measured?
Internet use by individuals by type of activity
What is the unit of measurement - Geography Make sure that if data are from different tables or sources that they are for the same geographic area –North America vs. U.S.A. –Maritimes vs. Atlantic Canada –City of Toronto 1998 and before vs. City of Toronto after amalgamation. In the late 1990s many municipalities amalgamated –Prior to 1949 Newfoundland was not part of Canada –Nunavut included in the Northwest Territories prior to 1999
Date of the Data! Data are often several years old before publication There should always be a date that tells you what time period the data are for and the unit of time – monthly, quarterly, annual etc. Census data – the income information is always for the previous year so the 2006 census will give income for 2005
Presentation of the data Often crucial for the awareness of the value of statistics Can be in the form of : Text Tables Graphs and charts Maps
Discussion Points What are the responsibilities of reference desk staff in evaluating statistics and educating users? –Do we review the stats with the user when we direct the user to them or is caveat emptor? –Should we direct users to a website or a handout that talks about how to recognize good statistics
Discussion points What are the chances of people actually reading the necessary information? Does our responsibility vary with the type of library we work in? –School –Public –Post secondary
Lies, Damn Lies and Statistics! (attributed to Disraeli, 1804-1881) Scepticism about statistics has been around for a long time – need to be a critical thinker! What should we look at to get some idea of the validity and reliability of the statistics we or our users have found?
Sources (Who) (adapted from Rice, 2006)
Formal organizations: National govt.; Local govt.; Universities; Companies; Non-govt. organizations; Societies
Publications: Books; Journal articles; Reports; Newspapers; Commercial websites; Opinion polls
Media: T.V.; Magazines; Radio; Newsfeeds; Open repositories
Informal organizations: Special interest; E-mail; User groups; Chatroom; Web pages (Wikipedia); Blogs
Individuals: Statisticians; Experts; Teachers; Colleagues; Librarians; Family
How were the data collected? Census and Statistics Canada surveys –Usually a lengthy user guide that gives you details of the methodology http://www.statcan.ca –Structured questionnaire with carefully phrased questions e.g. Census form –Selected sample – who were selected and why, which populations were over or under sampled e.g. some native communities opt out of the census –How and when it was carried out – personal interview, telephone survey, web survey. What the follow-up was to get responses from missed respondents.
How were the data collected? Academic research –Usually can get methodology from researcher –May be mentioned in book or article –May be web-link to method and data Companies, product associations –May be somewhere on the website e.g. http://www.ico.org –May not give much detail Media often only give source and no details e.g. Statistics Canada
Reading tables 101 Laine Ruus University of Toronto Data Library Service 2007/02/02 OLA Super Conference 2007
Take a table, one that Statistics Canada publishes like this: Source: STC cat no. 71-001-XIE200612 We can now make part of the table look like…
…this (note, it's a different date, and therefore different numbers from the previous slide): Full vs part-time employment by gender, Canada, 2005 Source: Labour force historical review: table cd1t15an. [computer file] 2006 ed. And compute some percentages to make it look like…
…this: Full vs part-time employment by gender, Canada, 2005 More males work full-time than part-time: True/False More females work full-time than part-time: True/False Three times as many women as men work part-time: True/False Women are three times more likely to work part-time than men: True/False Source: Labour force historical review: table cd1t15an. [computer file] 2006 ed.
Full vs part-time employment by gender, Canada, 2005 Of those who work full-time, 2/3 are men: True/False Of those who work part-time, 2/3 are women: True/False Almost twice as many women work part-time as full-time: True/False 100% Source: Labour force historical review: table cd1t15an. [computer file] 2006 ed.
…but the table behind the numbers is… Source: Labour force historical review: table cd1t15an. [computer file] 2006 ed.
Do you agree with this Toronto Star reporter? Source: Toronto Star, Dec. 9, 2006
Now for a slightly more complex table: Source: Labour force historical review: table cd1t15an. [computer file] 2006 ed. Less than 15% of males who work full-time are over 55: True/False Of males who work part-time, the largest number are youth: True/False Fewer women 25-54 work part-time than full-time: True/False
Same table – but where's the 100% now? Source: Labour force historical review: table cd1t15an. [computer file] 2006 ed. Twice as many young women as young men work part-time: True/False Twice as many women as men over 65 work part-time: True/False Women over 65 are twice as likely to work part-time as men: True/False Most of the men who work part-time are under 24 or over 65: True/False
And here's what the table values/counts are: Source: Labour force historical review: table cd1t15an. [computer file] 2006 ed.
Lesson 1: You can compare the sizes of percentages and rates only within the row or column in which they were computed (i.e. the one that adds up to 100%). Between rows or columns, you can only compare relative proportions or likelihoods, or counts.
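A small sketch of Lesson 1. The counts are hypothetical, not the Labour force historical review values, and the function names are illustrative. It computes percentages both ways so the location of the 100% is explicit.

```python
# Hypothetical counts in thousands (NOT the Statistics Canada figures).
counts = {
    "men": {"full-time": 7000, "part-time": 1000},
    "women": {"full-time": 5000, "part-time": 2000},
}


def row_percentages(table: dict) -> dict:
    """Percentages that add up to 100 within each row (each sex)."""
    result = {}
    for row, cells in table.items():
        total = sum(cells.values())
        result[row] = {col: 100 * n / total for col, n in cells.items()}
    return result


def column_percentages(table: dict) -> dict:
    """Percentages that add up to 100 within each column (each work type)."""
    columns = list(next(iter(table.values())))
    totals = {c: sum(table[r][c] for r in table) for c in columns}
    return {r: {c: 100 * table[r][c] / totals[c] for c in columns} for r in table}


row_pct = row_percentages(counts)
col_pct = column_percentages(counts)

# "Of those who work part-time, what share are women?" -> column percentage.
print(f"Women as share of part-timers: {col_pct['women']['part-time']:.1f}%")

# "How likely is a woman (or a man) to work part-time?" -> row percentage.
print(f"Women working part-time: {row_pct['women']['part-time']:.1f}%")
print(f"Men working part-time:   {row_pct['men']['part-time']:.1f}%")

# "How many times as many women as men work part-time?" -> counts.
count_ratio = counts["women"]["part-time"] / counts["men"]["part-time"]
likelihood_ratio = row_pct["women"]["part-time"] / row_pct["men"]["part-time"]
print(f"Count ratio: {count_ratio:.2f}, likelihood ratio: {likelihood_ratio:.2f}")
```

With these made-up counts, "twice as many women as men work part-time" and "women are about 2.3 times as likely to work part-time" are both true, and they are different statements drawn from different parts of the table.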
Source: Census of Canada, 2001: legal marital status, age groups, and sex for population (Topic based tabulations; 97f0004xcb2001001) Source: Census of Canada, 2001: legal marital status, common-law status, age groups, sex and household living arrangements for population 15 years and over (Topic based tabulations; 97f0004xcb2001040) Why are these two numbers so different? Which one is correct?
And this is what the original Statistics Canada publication called the same table: Same table, different titles. Which one would you use? Source: Women in Canada. STC cat no. 89-503, pl. 116
Employment rate and participation rate are not the same thing:
participation rate = (labour force / total population 15 and over) * 100
employment rate = (employed labour force / total population 15 and over) * 100
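The difference between the two rates can be sketched with hypothetical figures. The numbers below are invented for illustration and are not Labour Force Survey values.

```python
def participation_rate(labour_force: float, population_15_plus: float) -> float:
    """Labour force (employed + unemployed) as a % of population 15 and over."""
    return 100 * labour_force / population_15_plus


def employment_rate(employed: float, population_15_plus: float) -> float:
    """Employed labour force as a % of population 15 and over."""
    return 100 * employed / population_15_plus


# Hypothetical figures in thousands (not real survey values).
population_15_plus = 20_000
labour_force = 13_000   # employed plus unemployed
employed = 12_000

p_rate = participation_rate(labour_force, population_15_plus)
e_rate = employment_rate(employed, population_15_plus)

print(f"Participation rate: {p_rate:g}%")
print(f"Employment rate:    {e_rate:g}%")
```

Both rates share the same denominator, and the employed are a subset of the labour force, so the employment rate can never exceed the participation rate. A table labelled with one rate but holding the other will therefore be consistently off.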
Source: Labour force historical review 1999 ed.: table tab01an.ivt. This is the original table from the Labour force historical review CD-ROM. participation rate = (labour force / total population 15 and over) * 100
Lesson 3: whenever possible, go back to the original data collector.