Descriptive statistics. Petter Mostad, 2005.09.08.


Descriptive statistics Petter Mostad

Goal: Reduce data amount, keep “information”. Two uses: Data exploration: What you do for yourself when you first get the data. Data presentation: Illustrating for others some conclusion with numbers or graphs based on the data.

Data exploration: Understand description of variables. Find ranges, typical values, distributions of variables (Is the data OK? Meaningful? Outliers? Errors?). How do variables relate to each other? Is it meaningful? As expected? Can you form new hypotheses?

Data presentation: Remove superfluous information. Present essential information fairly. Present information efficiently. Make it possible to understand information quickly and simply.

Types of variables: Numerical variables (discrete, continuous). Categorical variables (nominal values, ordinal values).

Histograms: Subdivide continuous data into intervals, and display counts in intervals. Decision about width of intervals can influence result a lot. “Ogives”.
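To see how much the interval width matters, here is a minimal sketch that bins the same made-up ages with two different widths. The function name `histogram_counts` and the data are illustrative assumptions, not part of the slides.

```python
from collections import Counter

def histogram_counts(values, width, start=0.0):
    """Count values in intervals [start + k*width, start + (k+1)*width).

    Returns a dict mapping each non-empty interval's left edge to its count;
    empty intervals are simply left out.
    """
    counts = Counter(int((v - start) // width) for v in values)
    return {start + k * width: counts[k] for k in sorted(counts)}

ages = [21, 22, 23, 29, 31, 38, 39, 40, 41, 58]  # made-up data

narrow = histogram_counts(ages, width=5)   # six non-empty intervals
wide = histogram_counts(ages, width=20)    # only two intervals: 20-40 and 40-60

for left_edge, count in narrow.items():
    print(f"{left_edge:5.0f} | {'*' * count}")
```

With width 5 the cluster just above 20 is visible; with width 20 it disappears into a single bar.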

Bar charts: Can show variation between categories. Grouped bars can compare variations in different groups. Stacked bars can show proportions, or cumulative effects.

Example: Shows changing proportions of 8 types across 24 groups. Groups: coexpressed genes. Types: Types of organisms.

Cumulative distributions: Cumulates the proportions up to each level. Can never decrease; goes from 0 to 1 (or 100%).
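A minimal sketch of a cumulative distribution for a small made-up data set; `cumulative_proportions` is a hypothetical helper name.

```python
def cumulative_proportions(values):
    """Return (level, proportion of data <= level) for each distinct level."""
    n = len(values)
    seen = 0
    result = []
    for level in sorted(set(values)):
        seen += values.count(level)      # add the observations at this level
        result.append((level, seen / n))
    return result

grades = [1, 2, 2, 3, 3, 3, 4, 5]  # made-up data
cdf = cumulative_proportions(grades)
# cdf == [(1, 0.125), (2, 0.375), (3, 0.75), (4, 0.875), (5, 1.0)]
```

The proportions never decrease, and the last one is always 1.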

Stem-and-leaf diagrams: A way to show both the distribution of numbers graphically, and the digits involved. [Figure: stem-and-leaf plot of age in years. Stem width: 10. Each leaf: 2 case(s). & denotes fractional leaves.]
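A stem-and-leaf diagram is easy to build by hand or in a few lines of code. The sketch below is an assumption-laden simplification: it uses made-up integer ages, lets each leaf stand for one case (the plot on the slide used two cases per leaf), and omits empty stems.

```python
from collections import defaultdict

def stem_and_leaf(values, stem_width=10):
    """Return one text line per stem; each leaf is the last digit of one value."""
    stems = defaultdict(list)
    for v in sorted(values):
        stems[v // stem_width].append(v % stem_width)
    return [
        f"{stem} | {''.join(str(leaf) for leaf in leaves)}"
        for stem, leaves in sorted(stems.items())
    ]

ages = [18, 19, 21, 21, 25, 34, 37, 42, 71]  # made-up data
lines = stem_and_leaf(ages)
print("\n".join(lines))
# 1 | 89
# 2 | 115
# 3 | 47
# 4 | 2
# 7 | 1
```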

Pie charts: Illustrate percentages or parts well for comparison between the parts. 3D pies, or “exploded” pies, distort more than they clarify the information.

Pareto diagrams: Focus on the most important (frequent) categories. Show cumulative frequencies when including each category.
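The numbers behind a Pareto diagram can be sketched as follows: categories sorted by frequency, each with the cumulative share reached when it is included. The helper `pareto_table` and the defect data are made up for illustration.

```python
from collections import Counter

def pareto_table(items):
    """Return (category, count, cumulative proportion), most frequent first."""
    counts = Counter(items).most_common()
    total = sum(count for _, count in counts)
    running = 0
    rows = []
    for category, count in counts:
        running += count
        rows.append((category, count, running / total))
    return rows

defects = ["scratch"] * 5 + ["dent"] * 3 + ["stain"] * 2  # made-up data
rows = pareto_table(defects)
# rows == [("scratch", 5, 0.5), ("dent", 3, 0.8), ("stain", 2, 1.0)]
```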

Numerical summary statistics: (Arithmetic) mean, median, mode, skewness, outliers, max, min, range.

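Arithmetic versus geometric mean: Given observations $x_1, x_2, \ldots, x_n$. Arithmetic mean: $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$. Geometric mean (for positive observations): $\left(\prod_{i=1}^{n} x_i\right)^{1/n}$. They correspond to each other when the scale is changed by taking logarithms: the logarithm of the geometric mean is the arithmetic mean of the logarithms.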
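The logarithm connection can be checked numerically. This is a small sketch with made-up positive observations: the geometric mean is computed as the exponential of the arithmetic mean of the logs.

```python
import math
from statistics import mean

x = [1.0, 10.0, 100.0]  # made-up positive observations

arithmetic = mean(x)                                   # 37.0
geometric = math.exp(mean([math.log(v) for v in x]))   # approximately 10.0

# Same geometric mean computed directly as the n-th root of the product.
direct = math.prod(x) ** (1 / len(x))
```

On data spread over several orders of magnitude the two means differ a lot, which is one reason to look at such data on a log scale.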

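Measures of variability: (Sample) variance: $s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2$, where $\bar{x}$ is the arithmetic mean. (Sample) standard deviation: $s = \sqrt{s^2}$. Coefficient of variation: $s / \bar{x}$.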
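A minimal sketch of the three measures using Python's standard library, assuming the usual sample definitions with divisor n - 1 and the coefficient of variation taken as standard deviation divided by mean. The data are made up.

```python
from statistics import mean, stdev, variance

data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # made-up data, mean 5

var = variance(data)   # sample variance: sum of squared deviations / (n - 1) = 32 / 7
sd = stdev(data)       # sample standard deviation: square root of the variance
cv = sd / mean(data)   # coefficient of variation: spread relative to the mean
```

The coefficient of variation has no unit, so it can be compared between variables measured on different scales.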

Percentiles and quartiles: The x-th percentile is the number p such that x percent of the data is smaller than p. The first and third quartiles are the 25th and 75th percentiles, respectively. The inter-quartile range is the difference between the third and first quartiles.
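A minimal sketch of quartiles and the inter-quartile range using `statistics.quantiles` (Python 3.8 or later). Software packages differ in how they interpolate between data points, so other tools may give slightly different quartiles for the same data; the default method of the standard library is used here as one reasonable convention.

```python
from statistics import quantiles

data = list(range(1, 12))  # made-up data: 1, 2, ..., 11

# n=4 asks for the three cut points that split the data into four parts.
q1, q2, q3 = quantiles(data, n=4)
iqr = q3 - q1

print(q1, q2, q3, iqr)  # 3.0 6.0 9.0 6.0
```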

Boxplots: “Box and whisker plots”. Sometimes show min, 1st quartile, median, 3rd quartile, max. May instead show some outliers separately.
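The numbers a boxplot draws can be sketched as below. The slides do not say how outliers are chosen; this sketch assumes the common convention of flagging points more than 1.5 inter-quartile ranges outside the box. The helper name and the data are made up.

```python
from statistics import quantiles

def boxplot_summary(values, whisker=1.5):
    """Five-number summary plus points beyond whisker * IQR from the box.

    The 1.5 * IQR rule is an assumed convention, not taken from the slides.
    """
    data = sorted(values)
    q1, median, q3 = quantiles(data, n=4)
    iqr = q3 - q1
    low, high = q1 - whisker * iqr, q3 + whisker * iqr
    return {
        "min": data[0],
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": data[-1],
        "outliers": [v for v in data if v < low or v > high],
    }

summary = boxplot_summary([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 40])
# The value 40 lies far above q3 + 1.5 * IQR = 18 and is flagged as an outlier.
```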

Scatterplots: Probably the most useful graphical plot. Can show any kind of connection between variables, not only linear. Can be done for many pairs at a time (matrix plot), or for triplets (3D plot).

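Covariance: Given paired observations $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$. (Sample) covariance: $s_{xy} = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})$, where $\bar{x}$ and $\bar{y}$ are the arithmetic means. Positive when variables tend to change in the same direction, negative if opposite direction.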

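Correlation coefficient: $r_{xy} = \frac{s_{xy}}{s_x s_y}$, where $s_{xy}$ is the sample covariance and $s_x$, $s_y$ are the sample standard deviations. Always between -1 and 1. If exactly equal to 1, then points are on an increasing line. Can be a more illustrative measure than covariance.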
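A small sketch of why correlation can be more illustrative than covariance: changing the unit of one variable changes the covariance but not the correlation. The functions use the usual sample definitions with divisor n - 1; names and data are made up.

```python
import math

def sample_covariance(x, y):
    """Sample covariance with divisor n - 1."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)

def correlation(x, y):
    """Covariance divided by the product of the two standard deviations."""
    return sample_covariance(x, y) / math.sqrt(
        sample_covariance(x, x) * sample_covariance(y, y)
    )

x = [1.0, 2.0, 3.0, 4.0, 5.0]      # made-up data
y = [2.0, 4.0, 5.0, 4.0, 5.0]      # made-up data, e.g. in metres
y_cm = [100 * v for v in y]        # the same measurements in centimetres

cov_m, cov_cm = sample_covariance(x, y), sample_covariance(x, y_cm)  # 1.5 and 150.0
r_m, r_cm = correlation(x, y), correlation(x, y_cm)                  # both about 0.77
```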

Least squares line fitting We can illustrate a trend in the data by fitting a line

Fitting the line: The line is often fitted by minimizing the sum of the squares of the “errors” (the vertical distances to the line). We will hear much about regression methods later.
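For a single explanatory variable, the line that minimizes the sum of squared vertical distances has a closed-form solution. The sketch below uses that standard formula on made-up data; `fit_line` is a hypothetical helper name.

```python
def fit_line(x, y):
    """Return (slope, intercept) of the least squares line y = intercept + slope * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x   # the line passes through the point of means
    return slope, intercept

x = [1.0, 2.0, 3.0, 4.0, 5.0]  # made-up data
y = [2.0, 4.0, 5.0, 4.0, 5.0]  # made-up data

slope, intercept = fit_line(x, y)   # approximately 0.6 and 2.2

# The "errors" are the vertical distances from each point to the fitted line.
errors = [b - (intercept + slope * a) for a, b in zip(x, y)]
sum_of_squares = sum(e ** 2 for e in errors)
```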

Cross tables: When items can be classified using two different categorical variables, we can illustrate counts in a cross table. If percentages are computed, they must be either relative to the columns or the rows. In multiway tables, more than two classifying variables are used.
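A minimal sketch of a cross table built from made-up (row, column) classifications, with percentages computed relative to the rows. Column percentages would be computed the same way with column totals, and would generally give different numbers.

```python
from collections import Counter

def row_percentages(table):
    """Convert counts keyed by (row, column) into percentages of each row total."""
    row_totals = Counter()
    for (row, _column), count in table.items():
        row_totals[row] += count
    return {
        (row, column): 100 * count / row_totals[row]
        for (row, column), count in table.items()
    }

# Made-up observations: (group, outcome) for seven items.
observations = [
    ("treated", "improved"), ("treated", "improved"), ("treated", "unchanged"),
    ("control", "improved"), ("control", "unchanged"),
    ("control", "unchanged"), ("control", "unchanged"),
]

table = Counter(observations)       # counts per (row, column) cell
percent = row_percentages(table)    # e.g. 25% of the control row improved
```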

Early example: Napoleon's Russian campaign

DNA sequence logos: Used to show what is conserved, and what varies, at DNA binding sites for some protein. Relative height of letters shows which bases are conserved. Total height shows degree of conservation.
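The slides do not give a formula for the heights. The sketch below assumes one common convention: the total height at a position is 2 bits minus the Shannon entropy of the observed base frequencies, and each letter gets a share of that height proportional to its frequency. Small-sample corrections are ignored, and the function name and sequences are made up.

```python
import math
from collections import Counter

def logo_heights(column):
    """Return (total height in bits, height per base) for one alignment column.

    Assumed convention: total = 2 - entropy, letter height = frequency * total.
    """
    counts = Counter(column)
    n = len(column)
    freqs = {base: count / n for base, count in counts.items()}
    entropy = -sum(p * math.log2(p) for p in freqs.values())
    total = 2.0 - entropy
    return total, {base: p * total for base, p in freqs.items()}

conserved_total, _ = logo_heights("AAAAAAAA")   # fully conserved: 2 bits
mixed_total, _ = logo_heights("ACGTACGT")       # all bases equally common: 0 bits
partial_total, partial = logo_heights("AAAAAAGT")
```

A fully conserved position gets the maximal height, and a position where all four bases are equally frequent gets height zero.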

Chernoff faces: A way to visualize about 20 parameters in one figure. Background: We are good at remembering and comparing faces. Features in the face correspond to parameters you want to visualize.

Chernoff faces

Use your own creativity! When exploring data, try to make the kinds of plots that will answer your questions! When presenting data, think about simplicity, fairness, efficiency, and inventiveness.