Statistics for Linguistics Students Michaelmas 2004 Week 1 Bettina Braun.

Why calculate statistics? To describe and summarise the data, e.g. examination results (out of 100) … Average mark? Spread of scores? Lowest and highest marks? Comparison with other results (e.g. last year’s)?

Population vs. Sample Population: total universe of all possible observations. Populations can be finite or infinite, real or theoretical –The IQ of all adult men in Britain –The outcome of an infinite number of flips of a coin Descriptive statistics of a population are called parameters

Population vs. Sample (cont’d) Sample: subset of observations drawn from a given population –The IQ scores of 100 adult men in Britain –The outcome of 50 flips of a coin Descriptive statistics of a sample are called statistics Note: In experimental research it is important to draw a representative, random sample that is not biased

Histograms: Frequency distribution of each event Data: Tutorial1.sav
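
A frequency distribution of this kind can be sketched in a few lines of Python. The marks below are invented for illustration and are not the contents of Tutorial1.sav; the bin width of 10 is also an assumption.

```python
from collections import Counter

# Hypothetical examination marks (out of 100), for illustration only.
marks = [55, 62, 62, 68, 70, 70, 70, 74, 81, 90]

# Group marks into bins of width 10 and count how many fall in each bin.
bin_width = 10
frequencies = Counter((mark // bin_width) * bin_width for mark in marks)

# Print a simple text histogram, one row per bin.
for lower in sorted(frequencies):
    upper = lower + bin_width - 1
    print(f"{lower:3d}-{upper:3d}: {'#' * frequencies[lower]}")
```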

Central tendency: mode and median Mode: Most frequent mark (Note: there may be multiple modes) Median: score from the middle of the list when ordered from lowest to highest. Cuts data into halves (doesn’t take account of values of all scores but only of the scores in middle position).
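
As a minimal sketch, both measures can be computed with Python's standard `statistics` module. The marks are hypothetical.

```python
import statistics

# Hypothetical examination marks, for illustration only.
marks = [55, 62, 62, 68, 70, 70, 70, 74, 81, 90]

# multimode returns every most-frequent value, so multiple modes are reported.
modes = statistics.multimode(marks)
two_modes = statistics.multimode([1, 1, 2, 2, 3])

# median sorts the data and takes the middle score
# (the average of the two middle scores when the count is even).
median = statistics.median(marks)

print("modes:", modes)
print("two modes:", two_modes)
print("median:", median)
```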

Central tendency: mean Mean: sum of scores divided by the number of scores Note on notation: Greek letters are often used for population parameters, Roman letters for statistics (properties of a sample)
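
In that notation the two means could be written as below. The symbols N (population size), n (sample size) and x_i (the i-th score) are conventional choices assumed here, not taken from the slide.

```latex
\mu = \frac{1}{N}\sum_{i=1}^{N} x_i \qquad \text{(population mean)}

\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i \qquad \text{(sample mean)}
```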

Comparing measures of “central tendency” Mode: –Quick if we have a frequency distribution –Possible with categorical data Median: –Good estimate if we have abnormally large or small values (e.g. max aircraft speed of 450 km/h, 480 km/h, 500 km/h, 530 km/h, 600 km/h, and 1100 km/h) –Only influenced by values in the middle of ordered data Mean: –Every score is taken into account –Some interesting properties –Most widely used
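
The aircraft-speed example can be checked directly. This sketch compares mean and median with and without the 1100 km/h value; the variable names are illustrative.

```python
import statistics

# Maximum aircraft speeds from the example, in km/h.
speeds_kmh = [450, 480, 500, 530, 600, 1100]

mean_speed = statistics.mean(speeds_kmh)
median_speed = statistics.median(speeds_kmh)

# The same measures with the abnormally large value left out.
speeds_without_extreme = speeds_kmh[:-1]
mean_without = statistics.mean(speeds_without_extreme)
median_without = statistics.median(speeds_without_extreme)

print(f"all six speeds:  mean = {mean_speed}, median = {median_speed}")
print(f"without 1100:    mean = {mean_without}, median = {median_without}")
```

The single extreme value moves the mean by about 100 km/h but the median by only 15 km/h.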

Types of variables Interval (scale): differences between consecutive numbers are equal (e.g. time, speed, distances). Precise measurements Ordinal: assignment of ranks that represent position along some ordered dimension (e.g. ranking people with respect to their speed, 1 = fastest, 4 = slowest). No equal intervals Categorical (nominal): numbers used purely as category labels (e.g. brown = 1, blue = 2, green = 3) Question: on which type of data can we calculate a meaningful “central tendency”?

Spread of distributions: why?

Spread of distributions: range and quartiles Small spread is often desirable as it indicates a high proportion of identical or very similar scores Large spread indicates large differences between individual scores Range: difference between highest and lowest score – rather crude measure Quartiles: cut the ordered data into quarters (second quartile = median)
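
A minimal sketch of both measures, using hypothetical marks. Note that there are several conventions for computing quartiles; `statistics.quantiles` uses its default "exclusive" method here, so other software may report slightly different quartile values for the same data.

```python
import statistics

# Hypothetical examination marks, for illustration only.
marks = [55, 62, 62, 68, 70, 70, 70, 74, 81, 90]

# Range: difference between the highest and the lowest score.
score_range = max(marks) - min(marks)

# Quartiles: three cut points that split the ordered data into quarters.
q1, q2, q3 = statistics.quantiles(marks, n=4)

print("range:", score_range)
print("lower quartile:", q1)
print("second quartile (median):", q2)
print("upper quartile:", q3)
```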

Median, quartiles, and outliers [Box plot figure. Labels from top to bottom: * extreme value (more than 3 box lengths above or below the box); o outlier (more than 1.5 box lengths above or below the box); largest value which is not an outlier; upper quartile; median; lower quartile; smallest value which is not an outlier. The box length is the interquartile range.] tutorial1.sav: simple bp, sep. var
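
The box-length rule for outliers and extreme values can be sketched as follows. The function name `classify_scores` and the data are hypothetical, and the quartiles come from Python's default quantile method, which may differ slightly from the one used by the software that drew the original plot.

```python
import statistics

def classify_scores(scores):
    """Label each score 'ok', 'outlier' or 'extreme' by its distance from the box."""
    q1, _median, q3 = statistics.quantiles(scores, n=4)
    box_length = q3 - q1  # the interquartile range
    labels = []
    for score in scores:
        # Distance below the lower quartile or above the upper quartile (0 inside the box).
        distance = max(q1 - score, score - q3, 0)
        if distance > 3 * box_length:
            labels.append("extreme")
        elif distance > 1.5 * box_length:
            labels.append("outlier")
        else:
            labels.append("ok")
    return labels

# Hypothetical scores with two unusually large values at the end.
scores = [10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 32, 45]
labels = classify_scores(scores)
for score, label in zip(scores, labels):
    print(score, label)
```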

Spread of the population: variance measures Variance: sum of squared deviations from the mean, divided by the number of scores Standard deviation: square root of the variance
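
A worked sketch of these definitions on a small invented data set. The population formulas divide by the number of scores; the sample variance shown at the end divides by one less than the number of scores, which is the usual convention when estimating from a sample and is an addition to what the slide states.

```python
import math
import statistics

# Hypothetical scores, chosen so the arithmetic is easy to follow.
scores = [2, 4, 4, 4, 5, 5, 7, 9]

mean = sum(scores) / len(scores)
squared_deviations = [(x - mean) ** 2 for x in scores]

# Population variance: sum of squared deviations divided by the number of scores.
population_variance = sum(squared_deviations) / len(scores)
# Standard deviation: square root of the variance.
population_sd = math.sqrt(population_variance)

# Sample variance: same sum, divided by (number of scores - 1).
sample_variance = sum(squared_deviations) / (len(scores) - 1)

print("mean:", mean)
print("population variance:", population_variance)
print("population sd:", population_sd)
print("sample variance:", sample_variance)
```

The standard library functions `statistics.pvariance`, `statistics.pstdev` and `statistics.variance` give the same three results.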

Normal distribution (Gaussian distribution) Example: IQ scores, mean = 100, sd = 16. In a normal distribution, mean = median = mode

Skewed distributions and measures of central tendency

Bimodal distributions

z-scores z-score: deviation of a given score from the mean, expressed in units of standard deviation
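
As a small sketch, using the IQ example from the normal distribution slide (mean = 100, sd = 16). The function name `z_score` and the two IQ scores are illustrative.

```python
def z_score(score, mean, sd):
    """Return how many standard deviations a score lies above (+) or below (-) the mean."""
    return (score - mean) / sd

# IQ scores with mean 100 and standard deviation 16.
z_high = z_score(132, mean=100, sd=16)  # two standard deviations above the mean
z_low = z_score(84, mean=100, sd=16)    # one standard deviation below the mean

print(z_high, z_low)
```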

How likely is a given event? Example: time to utter a particular sentence: mean = 3.45 s and sd = 0.84 s Questions: –What proportion of the population of utterance times will fall below 3 s? –What proportion would lie between 3 s and 4 s? –What is the time value below which we will find 1% of the data?
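
One way to answer these three questions is sketched below. It assumes that utterance times are normally distributed and that 3.45 s is the mean; both are assumptions suggested by the preceding slides rather than stated in the example.

```python
from statistics import NormalDist

# Assumed model: utterance times ~ Normal(mean = 3.45 s, sd = 0.84 s).
utterance_time = NormalDist(mu=3.45, sigma=0.84)

# Proportion of utterance times below 3 s.
p_below_3 = utterance_time.cdf(3.0)

# Proportion of utterance times between 3 s and 4 s.
p_between_3_and_4 = utterance_time.cdf(4.0) - utterance_time.cdf(3.0)

# Time value below which 1% of the data fall (the 1st percentile).
t_lowest_1_percent = utterance_time.inv_cdf(0.01)

print(f"P(t < 3 s)       = {p_below_3:.3f}")
print(f"P(3 s < t < 4 s) = {p_between_3_and_4:.3f}")
print(f"1st percentile   = {t_lowest_1_percent:.2f} s")
```

The same answers can be found by hand by converting 3 s and 4 s to z-scores and looking them up in a table of the standard normal distribution.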