Published by Kazimierz Owczarek. Modified over 6 years ago.
1
Week 1 Lecture Notes PSYC2021: Winter 2019
2
What do we need to bring to an intro statistics course?
Growth Mindset: The belief that abilities can be developed (Dr. Carol Dweck) (Duration: 3:06 minutes) The Power of believing you can improve, a TED Talk by Dr. Carol Dweck: (Duration: 10:20 minutes)
3
On the Psychology of Statistics
Why do we need to use statistics? Why don’t we just use our common sense?
4
On the Psychology of Statistics
Using “common sense” to evaluate evidence means trusting gut instincts, relying on verbal arguments, and using the raw power of human reason to come up with the right answer. However, our gut instincts are not designed to solve scientific problems. If we want to trust information (e.g., data), we need to keep our personal biases under control. That is, we need to avoid judging the strength of an argument by the plausibility of its conclusion. Instead, we need to evaluate the strength of the evidence based on how well it supports the conclusion. Statistics helps us be objective.
5
What is Statistics? Statistics is the science of data.
Statistical science involves collecting, classifying, analyzing, presenting, interpreting, and communicating information (often messy information that requires cleaning and organizing). Statistics in Psychology: Psychology is a statistical science. The things we study in psychology are usually people. If you want to be good at designing psychological studies, you need to understand at least the basics of statistics.
6
Why Study Statistics? Data are everywhere!
Think of statistics as a tool to help you learn about your data. Employees in various careers need to: Read, interpret and communicate reports that contain statistics. Carry out small- and/or large-scale quantitative data collection and analyses. YouTube Channel on “This is Statistics” – Watch Testimonial on the importance of learning statistics for enhancing career opportunities and understanding everyday life: Improving Human Welfare in 2013 International Year of Statistics:
7
Why Study Statistics? Statistics helps us:
- make sense of the world in everyday life by seeing past the underlying variation (differences) to find patterns and relationships (e.g., in psychology, health, economics, education, environment).
- become informed citizens by giving us the tools to understand, question, and interpret data.
- understand articles published in research journals and reports from government agencies and private industries.
- answer research questions and determine what general conclusions are justified based on the specific results.
For example:
- What contributes to students’ performance in school?
- Is there a relationship between number of hours of sleep and grade point average for university students?
- What contributes to the quality of people’s health care?
- How do we assess the safety and effectiveness of new drugs submitted to Health Canada for approval?
- Is there a relationship between the amount of violence in video games played by children and the amount of aggressive behaviour they display?
8
Dr. Hannah Fry Videos Can Maths Predict the Future? - Hannah Fry at Ada Lovelace Day 2014 (Duration: 11:17 minutes) Is life really that complex? (Duration: 10:08 minutes)
9
Data “Data” is plural; “datum” is singular. Data are gathered information.
Data are any collection of numbers, characters, images, or other items. Data are observations gathered on the characteristics of interest in a statistical analysis. Data are values along with their context. Nowadays, data often come as spreadsheets (e.g., Excel files). Data are presented in a table (rows and columns) like the example below (Data on Students’ Attitudes Toward Statistics).
10
Why Do We Need Data? We can make sense of the world by making sense of data. Data help us see underlying truths and patterns. Looking at the data in the right way helps us learn how characteristics are related. For example: using the data on students’ attitudes toward statistics from the previous slide, we can ask whether there is a relationship between students’ affect toward statistics and their interest in learning statistics.
11
Ways of Generating or Obtaining Data
To generate data, researchers use a variety of methods, including:
- Surveys using questionnaires (e.g., the General Social Survey (GSS))
- Experiments
- Direct observation of behaviours in natural settings (e.g., schools, classrooms, playgrounds)
- Data recorded for other purposes, such as police records, census materials, and hospital files
- Published sources (e.g., archived collections and databases on the internet), e.g., Ontario Ministries:
Published sources are often called secondary databases (when other researchers want to access these data sets). They are available online, sometimes with restricted access. Researchers in academia can request access to restricted data through their affiliated institution.
12
Data: Canadian Context
Government of Canada: Government of Ontario: Ontario Ministries: City of Toronto, Open Data:
13
OECD Data Sets The Organisation for Economic Co-operation and Development (OECD) “The mission of the Organisation for Economic Co-operation and Development (OECD) is to promote policies that will improve the economic and social well-being of people around the world” (retrieved from OECD, 2016) Canadian Context: About OECD (4:21 minutes):
14
More Available (on Internet) Data Sets
World Statistics World Bank Data United Nations Statistics Information UNdata:
15
The General Social Survey (GSS)
One important database for researchers (in particular social scientists) contains results since 1972 of the American General Social Survey (GSS), and since 1985 of the Canadian GSS. We will explore both versions of GSS.
16
The General Social Survey (GSS) – The American Version
Every other year, the National Opinion Research Center (NORC) at the University of Chicago conducts the General Social Survey (GSS): This survey of about 2000 adults provides data about the opinions and behaviours of the American public. About GSS: (1:58 minutes) Researchers use these data to investigate how Americans answer a wide variety of questions, for example: Would you be willing to pay higher prices in order to protect the environment? Do you think a preschool child is likely to suffer if his or her mother works? GSS (Open to Public) Data Available (Archive):
17
Example: The General Social Survey (GSS) – American Version
A GSS question asked a random sample of adult Americans: “About how many good friends do you have?” Go to the Web site: Click on: GSS – with “No weight” as the default (SDA 4.0)
18
Example: The General Social Survey (GSS) – American Version
Web site:
19
Example: The General Social Survey (GSS) – American Version
The responses (answers to the GSS question regarding number of good friends) are stored as the variable NUMFREND. Type NUMFREND as the Row variable name.
20
Example: The General Social Survey (GSS) – American Version Click on “Run the Table”.
21
Example: The General Social Survey (GSS) – American Version
A new tab will open in your internet browser. In this tab, we get a table that shows the values (1, 2, …, 96+) for “number of good friends” and the “Distribution: Counts/Frequencies, and Percentages”, where $\text{Percentage} = \frac{\text{Count}}{\text{Total Count}} \times 100$. There were a total of 840 valid responses (total valid counts).
22
Example: The General Social Survey (GSS) – American Version What were the most common responses? The most common responses were 2 and 3 friends (about 16% made each of these responses). Note: These counts and percentages are based on valid cases (in this example, 840 people responded).
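The counts, percentages and most common responses in such a frequency table can be sketched in a few lines of Python. The responses below are hypothetical (they are not GSS data), and `None` is simply an assumed way of marking a missing answer so that only valid cases enter the total.

```python
from collections import Counter

# Hypothetical answers to "About how many good friends do you have?"
# None marks a missing (invalid) response; these are NOT real GSS data.
responses = [2, 3, 3, 1, 2, 5, None, 0, 2, 3, 10, None, 4, 2, 3, 1]

# Keep only the valid cases, as the frequency table does.
valid = [r for r in responses if r is not None]
total_valid = len(valid)

counts = Counter(valid)

# Percentage = (count / total valid count) x 100
percentages = {value: 100 * count / total_valid for value, count in counts.items()}

print(f"Valid cases: {total_valid}")
for value in sorted(counts):
    print(f"{value:>3} friends: count = {counts[value]:>2}, percent = {percentages[value]:5.1f}%")

# Most common response(s): every value tied for the highest count.
top_count = max(counts.values())
most_common = sorted(v for v, c in counts.items() if c == top_count)
print("Most common response(s):", most_common)
```

Because percentages are computed on valid cases only, they always add up to 100 across the rows of the table.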
23
The General Social Survey (GSS) – Canadian Version
The Canadian General Social Survey was established in 1985. It gathers data on social trends to monitor changes in the living conditions and well-being of Canadians and to provide information on specific social policy issues. It comprises six topics (caregiving, families, time use, social identity, volunteering, and victimization), each repeated approximately every five years. Each survey collects comprehensive socio-demographic information such as age, sex, education, religion, ethnicity, and income. The information on this slide is retrieved from Statistics Canada’s publication: The General Social Survey: An Overview
24
Example: The General Social Survey (GSS) – Canadian Version
A question on the GSS on social identity asked a random sample of 27,534 adult Canadians (15 years of age and over) living in the ten Canadian provinces: “How many close friends do you have (that is, people who are not your relatives, but who you feel at ease with, can talk to about what is on your mind, or call on for help)?” To access the GSS on social identity (open access), we will explore: CHASS (Survey Documentation and Analysis, Computing in the Humanities and Social Sciences at U of T):
25
The General Social Survey (GSS) – Canadian Version
From Canadian microdata, click on “G”, and then click on “General Social Survey (GSS)” Website:
26
The 2013 GSS on Social Identity – Canadian Version
Scroll down the page to find: General social survey on social identity Click on Data (click on Documentation to read about the context of data).
27
Example: The 2013 GSS on Social Identity – Canadian Version
Click on Codebooks > SDA codebooks
28
Example: The 2013 GSS on Social Identity – Canadian Version
Click on Sequential Variable List
29
Example: The 2013 GSS on Social Identity – Canadian Version
Click on “Social contact with friends”
30
Example: The 2013 GSS on Social Identity – Canadian Version
Click on the item: scf_100c Number of close friends. This gives us only the total case numbers, not the survey question (we will find the question another way).
31
Example: The 2013 GSS on Social Identity – Canadian Version
Click on Codebooks > More documentation
32
Example: The 2013 GSS on Social Identity – Canadian Version
Click on “Questionnaire (Format pdf)”
33
Example: The 2013 GSS on Social Identity – Canadian Version
Search in the document for: “close friends”
34
Example: The 2013 GSS on Social Identity – Canadian Version
Go back to the main SDA screen:
1. From the variable selection list, click on “Social contact with friends”.
2. Select/click on “scf_100c - Number of close friends”. You will see “scf_100c” appear in the “Selected” box in the Variable Selection (see next slide for appearance).
3. Click on Copy to “Row”.
4. Under “SDA Frequencies/Crosstabulation Program”, change Weight to “No Weight”.
5. Under “Chart Option”, select “Bar Chart”.
6. Under “Table Option”, select “Question Text” (although we saw earlier that this information was not available to us this way; we had to search for it in the documentation file).
7. In “Title”, type: Number of close friends reported by 27,534 Canadians
8. Click on “Run the Table”.
35
Example: The 2013 GSS on Social Identity – Canadian Version
36
Example: The 2013 GSS on Social Identity – Canadian Version
37
Example: The 2013 GSS on Social Identity – Canadian Version
The most common response was 5 close friends: of the 27,112 valid cases, 3,870 (14.3%) reported having 5 close friends. Do you expect to obtain the same responses from a different selection of Canadians in the same year (2013)? Do you expect to obtain the same responses from the same number of selected Canadians in 2018? Do you expect to obtain the same responses from a different selection of Canadians in 2018?
39
Elements of Statistics: Subject, Sample, Population
An individual case (subject, participant, unit) is an object about which we collect data. The cases that a study observes are called the subjects or participants of that study. These cases are a sample selected from some larger population that we would like to understand. Examples of cases: students, people (such as in the GSS study), families, schools, cities, companies, etc. The entire group of individuals that we want information about is called the population. A population is the set of individual cases (the total set of subjects/participants) that we are interested in studying (collecting information about). A sample is a subset of the individual cases (units) of a population on which the study collects data. Why sample? We would like to know about an entire population of individuals, but examining all of them is usually impractical, if not impossible.
40
Elements of Statistics: Representative Sample
A representative sample exhibits characteristics typical of those possessed by the population. It is a kind of snapshot of the larger world. The most common way to satisfy the representative sample requirement is to select a random sample, as it ensures that every subset of a fixed size in the population has the same chance of being selected. A sample needs to be representative of the population. That means it needs to: match the population, and avoid bias (over- or underestimating some characteristic of the population).
41
Randomization Randomization is an important sampling method.
The quality of inference based on summary statistics obtained from data depends on how well the sample represents the population. Simple Random Sampling (SRS): A method of sampling for which every possible sample of size n has an equal chance of selection (note that we use n as our notation for sample size, a universal notation). The word “simple” distinguishes it from other forms of sampling (discussed later in this lecture). Why use SRS? Everyone in the target population has an equal chance (fairness) of being selected. Example: Choose 2 students from 4 equally qualified applicants. How to select an SRS? We need a list (sampling frame); e.g., the list of all employees at a particular company can be obtained from the Human Resources department of that company. Use random digits (from tables or computers) to select the sample (e.g., randomly select 100 employees from that company).
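As a minimal sketch of simple random sampling, the Python snippet below lists every possible sample of size 2 from 4 applicants and then draws one of them at random. The applicant labels, the seed, and the frame of 5,000 employee IDs are illustrative assumptions, not part of the lecture.

```python
import itertools
import random

# Hypothetical sampling frame: the four equally qualified applicants.
applicants = ["Applicant A", "Applicant B", "Applicant C", "Applicant D"]
n = 2  # sample size

# Under simple random sampling, every possible sample of size n is equally likely.
possible_samples = list(itertools.combinations(applicants, n))
print(f"{len(possible_samples)} possible samples, each with probability 1/{len(possible_samples)}")

# Draw one simple random sample (without replacement).
rng = random.Random(2019)  # seeded only so that the draw can be repeated
chosen = rng.sample(applicants, n)
print("Selected:", chosen)

# The same idea with a larger frame: 100 employees from a hypothetical list of 5,000 IDs.
employee_ids = list(range(1, 5001))
employee_sample = rng.sample(employee_ids, 100)
print("First five selected employee IDs:", employee_sample[:5])
```

Here the computer's random number generator plays the role of the random digit table: the sampling frame is the list, and `random.sample` picks n distinct entries from it.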
42
Randomize: Why Does Randomization Work?
Randomization protects us from the influences of all features of our population by making sure that, on average, the sample looks like the rest of the population. It protects us from factors that we know are in the data, and it can protect us from factors that we do not know are in the data. We cannot predict which individuals are going to end up in the sample. With a large sample, the sample will have approximately the right proportions of different genders, different age groups (e.g., young, old), different living areas (e.g., urban, rural), and of course many more layers (groupings) or things that we didn’t think of.
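A small simulation can make this concrete. The sketch below assumes a hypothetical population of 100,000 people in which 30% live in rural areas (invented numbers), then compares the rural share in a small and a large random sample.

```python
import random

rng = random.Random(1)  # seeded only for repeatability

# Hypothetical population of 100,000 people: 30% live in rural areas.
population = ["rural"] * 30_000 + ["urban"] * 70_000

def rural_share(group):
    """Proportion of the group living in a rural area."""
    return sum(1 for person in group if person == "rural") / len(group)

small_sample = rng.sample(population, 20)
large_sample = rng.sample(population, 2_000)

print(f"Population: {rural_share(population):.3f}")
print(f"n = 20:     {rural_share(small_sample):.3f}")
print(f"n = 2000:   {rural_share(large_sample):.3f}")
```

The large sample's rural share should land close to the population's 30%, even though the code never looks at where anyone lives when choosing the sample. The same would hold for characteristics we did not think to record.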
43
Example of Population and Sample
The Programme for International Student Assessment (PISA) is a triennial international survey which aims to evaluate education systems worldwide by testing the skills and knowledge of 15-year-old students. Around 510,000 students (sample) in 65 economies took part in the PISA 2012 assessment of reading, mathematics and science representing about 28 million 15-year-olds globally (population). (duration: 12:14)
44
Elements of Statistics: Variables
Variation is central to statistics: it is the foundation of sound reasoning about the data. Statistical methods help explain the variation in the data; we model the variation in the data. A variable is a characteristic of an individual case in the population. Any characteristic we can measure for each subject is a variable. A variable can take different values for different cases.
45
Example of Variables Consider a database of undergraduate university students. Individual cases (subjects): Students of the university. Variables: Program of Study, Year of Study, Gender, University GPA, High School GPA, etc. For example: University GPA varies from student to student; different students have different university GPAs. What could explain the variation (differences) in students’ university GPA? Hours of study, High School GPA, etc.
46
Parameters and Statistics
Parameter: A numerical summary of the population. True values in the population. Actual numerical values in the population. E.g., the actual number of people who voted for a political party in the entire country (entire population). Statistic: A numerical summary of the sample data. Numbers calculated from a data set. E.g., in a survey of a subset of a population (NOT the entire population), the number of respondents who report voting for a candidate in an election campaign, which serves as an estimate for the population.
47
Sampling Variability and Potential Bias
The result of any study depends on which subjects are sampled. The values of sample statistics differ from sample to sample. Sampling Error: How much the statistic differs from the parameter it predicts because of the way results naturally exhibit variation from sample to sample. Sampling error of a statistic is the error that occurs when we use a statistic based on a sample to predict the value of a population parameter.
48
Example of Sampling Error
Suppose the Gallup, Harris, Zogby, and Pew polling organizations each randomly select 1000 adult Canadians in order to estimate the percentage of Canadians who give the prime minister’s performance in office a favorable rating. Based on the sample:
- Gallup reports an approval rating of 63%
- Harris reports 68%
- Zogby reports 65%
- Pew reports 64%
Suppose the actual percentage of the population of adult Canadians who give the prime minister a favorable rating is 66%. Then we have for each organization:
- Gallup reported 63%, a sampling error of 63% - 66% = -3%
- Harris reported 68%, a sampling error of 68% - 66% = +2%
- Zogby reported 65%, a sampling error of 65% - 66% = -1%
- Pew reported 64%, a sampling error of 64% - 66% = -2%
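The same arithmetic can be written as a short Python sketch: the sampling error is simply the statistic minus the parameter. The variable names are illustrative; the numbers are those of the example above.

```python
TRUE_APPROVAL = 66  # population parameter in percent (known only in this thought experiment)

poll_results = {"Gallup": 63, "Harris": 68, "Zogby": 65, "Pew": 64}

# Sampling error = statistic - parameter, in percentage points.
sampling_errors = {name: result - TRUE_APPROVAL for name, result in poll_results.items()}

for name, error in sampling_errors.items():
    print(f"{name}: {poll_results[name]}% - {TRUE_APPROVAL}% = {error:+d} percentage points")
```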
49
Note Regarding Sampling Error
Random sampling protects against bias, in the sense that the sampling error tends to fluctuate about 0 (sometimes positive, e.g., in the Harris poll; sometimes negative, e.g., in the Gallup poll). In practice, the values of population parameters are unknown, so the sampling error is also unknown.
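One way to see the sampling error fluctuating about 0 is to simulate many polls. In the sketch below, the true proportion (0.66), the sample size, and the number of repeated polls are assumptions made for illustration. Each respondent is modelled as an independent draw, which approximates random sampling from a very large population.

```python
import random

rng = random.Random(42)  # seeded only for repeatability

TRUE_PROPORTION = 0.66   # hypothetical population parameter
SAMPLE_SIZE = 1_000
NUM_SAMPLES = 200

def sample_proportion():
    """Simulate one poll and return the proportion giving a favorable rating."""
    favorable = sum(1 for _ in range(SAMPLE_SIZE) if rng.random() < TRUE_PROPORTION)
    return favorable / SAMPLE_SIZE

estimates = [sample_proportion() for _ in range(NUM_SAMPLES)]
errors = [estimate - TRUE_PROPORTION for estimate in estimates]

print(f"Smallest estimate: {min(estimates):.3f}")
print(f"Largest estimate:  {max(estimates):.3f}")
print(f"Average sampling error: {sum(errors) / len(errors):+.4f}")
```

Individual polls miss the parameter in both directions, but the average error across many polls should be close to 0, which is what being protected against bias means here.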
50
Statistical Analysis: Descriptive or Inferential
Statistical analysis: A way of analyzing data. Description and inference are two types of statistical analysis. Descriptive statistics summarize the information in a collection of data (e.g., with graphs and tables). The main purpose is to reduce the data to a simpler and more understandable form without losing information (e.g., in a census). Inferential statistics provide predictions about a population, based on data from a sample of that population. The main purpose is to make a prediction about the larger population (e.g., characteristics of the entire population) using the sample data (e.g., PISA results in OECD countries).
51
Example The 2013 Canadian GSS on Social Identity asked a random sample of 27,534 adult Canadians aged 15 and over living in the ten Canadian provinces: “How many close friends do you have (that is, people who are not your relatives, but who you feel at ease with, can talk to about what is on your mind, or call on for help)?” The result showed that 14.3% of the 27,112 (valid cases) sampled subjects reported having 5 close friends. Identify the following: Population: Sample: Statistic: Parameter:
52
Example The 2013 Canadian GSS on Social Identity asked a random sample of 27,534 adult Canadians aged 15 and over living in the ten Canadian provinces: “How many close friends do you have (that is, people who are not your relatives, but who you feel at ease with, can talk to about what is on your mind, or call on for help)?” The result showed that 14.3% of the 27,112 (valid cases) sampled subjects reported having 5 close friends. Identify the following:
Population: All Canadians aged 15 and over living in the ten provinces (from a national population of over 35 million).
Sample (total cases): 27,534 subjects.
Statistic: 14.3% of the 27,112 (valid cases) sampled subjects reported having 5 close friends. Because 14.3% describes a characteristic of the sample, it is a descriptive statistic.
Parameter: The true percentage of all adult Canadians (the entire population) having 5 close friends.
An inferential statistical method can predict how close the sample value of 14.3% is likely to be to the unknown percentage of the population having 5 close friends. For example: an inferential method indicates that the population percentage with 5 close friends falls between 13.9% and 14.7%. This means that the sample value of 14.3% has a margin of error of 0.4 percentage points (the margin of error is a bound on the likely size of the sampling error). Even though the sample size was tiny compared to the population size, we can use this statistic to make an inference about the true percentage of the Canadian population that had 5 close friends.
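The slide does not say which inferential method produced the interval from 13.9% to 14.7%. One common choice that gives these figures is the 95% normal-approximation interval for a proportion, sketched below. The confidence level and the method are assumptions, and n is taken as the 27,112 valid cases reported on the frequency-table slide.

```python
import math

p_hat = 0.143   # sample proportion reporting 5 close friends
n = 27_112      # valid cases, as reported on the frequency-table slide
z = 1.96        # assumption: 95% confidence, normal approximation

# Standard error of a sample proportion, then the margin of error.
standard_error = math.sqrt(p_hat * (1 - p_hat) / n)
margin_of_error = z * standard_error

lower = p_hat - margin_of_error
upper = p_hat + margin_of_error

print(f"Margin of error: {100 * margin_of_error:.1f} percentage points")
print(f"Interval: {100 * lower:.1f}% to {100 * upper:.1f}%")
```

Note that the margin of error depends on the sample size n, not on the size of the population, which is why a sample that is tiny relative to the population can still give a precise estimate.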
53
Simpson’s Paradox Simpson’s paradox is named after the statistician Edward Simpson, who described it in 1951. Simpson’s paradox describes the problem of combining percentages over different groups. One famous example of Simpson’s paradox arose during an investigation of admission rates for men and women at the University of California at Berkeley’s graduate school. The contingency table below illustrates this example: About 44% of males were admitted but only about 35% of females were admitted. It looked like a clear case of sex discrimination due to the difference of 9 percentage points (44% - 35% = 9%).
54
Simpson’s Paradox However, when the aggregated data were broken down by department (e.g., Engineering, Law, Medicine), that is, when the data were disaggregated, it turned out that within each department, females were admitted at nearly the same or, in some cases, much higher rates than males. The contingency table below illustrates this (the names of the departments are removed for privacy reasons): Most departments had higher admission rates for females than for males. Females applied in large numbers to departments with very low admission rates (e.g., Law, Medicine). Males tended to apply to Engineering and Science, departments with admission rates above 50%.
55
Simpson’s Paradox When we look at the aggregated data, the female applicants had a much lower overall admission rate. But when we look at the disaggregated data, taking into account the individual behaviour of each department, it turns out that the departments were, if anything, slightly biased in favor of female applicants. The apparent sex bias in total admissions arose because women tended to self-select into harder departments. From a purely legal perspective, that puts the university in the clear.
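The reversal can be reproduced with a small, invented data set. The numbers below are hypothetical and are not the Berkeley figures: women are admitted at a higher rate than men in each of two departments, yet at a lower rate overall, because most women apply to the department that admits few applicants.

```python
# Hypothetical admissions data (NOT the Berkeley figures): (applicants, admitted)
admissions = {
    "Department A": {"men": (800, 480), "women": (100, 65)},
    "Department B": {"men": (200, 40), "women": (900, 225)},
}

def rate(applicants, admitted):
    """Admission rate in percent."""
    return 100 * admitted / applicants

# Disaggregated view: one rate per department and sex.
for department, groups in admissions.items():
    for sex, (applicants, admitted) in groups.items():
        print(f"{department}, {sex}: {rate(applicants, admitted):.0f}%")

# Aggregated view: pool applicants and admissions across departments.
overall = {}
for sex in ("men", "women"):
    total_applicants = sum(groups[sex][0] for groups in admissions.values())
    total_admitted = sum(groups[sex][1] for groups in admissions.values())
    overall[sex] = rate(total_applicants, total_admitted)
    print(f"Overall, {sex}: {overall[sex]:.0f}%")
```

In this sketch, women lead by 5 percentage points in each department (65% vs 60%, and 25% vs 20%) but trail overall (29% vs 52%). Department is the lurking variable: it is related both to sex (who applies where) and to the admission rate.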
56
Simpson’s Paradox What is the moral of story?
The issue often is whether you are unknowingly averaging or combining over an important unidentified variable buried in your data (a lurking variable). Think about and try to identify variables that have the potential to alter your understanding of a relationship. Note that identifying such variables may not always be easy.
57
Summary We introduced the need for learning statistics in psychology in order to understand data and the ways in which data can be collected. We explored the General Social Survey (GSS), both the American and Canadian versions, with the free web-based Survey Documentation and Analysis (SDA). We introduced elements of statistics (e.g., Subject, Population, Sample, Representative Sample). We defined the terms statistic and parameter. We described the notion of statistical analysis (Descriptive Statistics and Inferential Statistics). We illustrated an example of Simpson’s Paradox.
58
Nice to meet you! Please bring your laptop to the lectures for exploring data using RStudio.