Map Generalization and Data Classification Gary Christopherson

Map Generalization and Data Classification Gary Christopherson 4/4/2019

Review/Preview
Everything we talked about before the midterm. Everything we will be talking about before the final. Data classification: why classify data, classification rules, how to classify data. Map types. Map layout.

Data Classification
The process of sorting or arranging entities into groups or categories. On a map, the process of representing members of a group by the same symbol, usually defined in a legend.

What’s the point?
When there are too many data values on a map, it can lose its power to tell a story or make a point.

How Many Symbols? 133

How Many Symbols? 1

How Many Symbols? 3

How Many Symbols? 1107 symbols; 4 symbols

Too Many Colors?

Jenks and Coulson’s Classification Rules
Classes should:
1. Encompass the full range of the data.
2. Have neither overlapping values nor vacant classes.
3. Be great enough in number to avoid sacrificing the accuracy of the data, but not so numerous as to impute a greater degree of accuracy than is warranted by the nature of the collected data.
4. Divide the data into reasonably equal groups of observations.
5. Have a logical mathematical relationship if practical.

Map Abstraction Process

Nominal Scale Data
Nominal scale data merely establish identity. A phone number signifies only the unique identity of the phone jack or the cell phone. In a race, the numbers used to identify individual racers are on a nominal scale. These identity numbers do not indicate order or relative value in terms of the race outcome.

Nominal, or Categorical Data
Qualitative. Deals with qualitative data that has no inherent order and no measurable range. There are no absolute rules for this kind of classification, just general guidelines: features in different classes or categories should be more dissimilar than similar and should be symbolized differently; features in the same class or category should be more similar than dissimilar and should be symbolized similarly.

Nominal/Categorical Symbolization

Ordinal Scale Data
On an ordinal scale, numbers establish order only. Phone number 9618224 is not more of anything than 9618049, so phone numbers are not ordinal. In the race, the finishing places of each racer (1st place, 2nd place, 3rd place) are measured on an ordinal scale. The numbers mean something relative to each other, but we do not know how much time difference there is between each racer.

Ordinal Data
Quantitative data that is ordered but without a measurable range. Ordinal classes show relative values, not absolute values. In this example, 1 is less than 3, but we don’t know how much less it is. Using numbers to label ordinal data is often confusing. Text labels are an alternative, but be careful to use text that does not imply absolute values.

Ordinal Data

Interval Scale Data
On an interval scale, the difference (interval) between numbers is meaningful, but the numbering scale does not start at zero; there is no absolute zero. Subtraction makes sense but division does not. It makes sense to say that 20°C is 10 degrees warmer than 10°C, so Celsius temperature is an interval scale, but you cannot say that 20°C is twice as hot as 10°C. It makes no sense to say that the phone number 9680244 is 62195 more than 9618049, so phone numbers are not measurements on an interval scale. In the race, the time of day that each racer finished is measured on an interval scale. If the racers finished at 9:10, 9:20 and 9:25, then racer one finished 10 minutes before racer two, and the difference between racers one and two is twice the difference between racers two and three. However, a racer finishing at 9:10 did not finish twice as fast as a racer finishing at 18:20.
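
The subtraction-versus-division point can be checked numerically. The sketch below is an illustration only: the Kelvin conversion is not part of the slides and is brought in as an assumption, because Kelvin is a temperature scale with an absolute zero. The function name is hypothetical.

```python
def celsius_to_kelvin(celsius):
    """Convert Celsius to Kelvin, a temperature scale with an absolute zero."""
    return celsius + 273.15

# Subtraction is meaningful on an interval scale: 20 C is 10 degrees warmer than 10 C.
difference = 20 - 10

# Division is not: 20 / 10 = 2.0, but 20 C is not "twice as hot" as 10 C.
naive_ratio = 20 / 10

# On a scale with an absolute zero, the ratio is only about 1.035.
kelvin_ratio = celsius_to_kelvin(20) / celsius_to_kelvin(10)

print(difference, naive_ratio, round(kelvin_ratio, 3))
```

The Celsius ratio of 2.0 is an artifact of where the zero point happens to sit, which is why division is not meaningful for interval data.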

Interval Data
Quantitative data that has no absolute zero, so subtraction works but division does not. Interval classes show a range of values. In this example the classes show a range of low elevations for states. Notice the negative numbers: this is why these values are interval scale, not ratio scale.

Ratio Scale Data
On a ratio scale, measurement has an absolute zero and the difference between numbers is significant. Division makes sense. A 50 kg person weighs half as much as a 100 kg person, so weight in kg is on a ratio scale. Is weight in pounds on a ratio scale? Yes: a 150 lb person weighs half as much as a 300 lb person. Is temperature on a ratio scale? The zero point of weight is absolute, but the zero point of the Celsius scale is not. In the race, the first place finisher finished in a time of 2:30, the second in 2:40, and the 450th place finisher took 5 hours. The 450th finisher took twice as long as the first place finisher (5 / 2.5 = 2). Ratio scale data allow direct comparison.

Ratio Scale Data
Quantitative data that has an absolute zero, so both subtraction and division work. Ratio scale classes show a range of numeric values. In this example the classes show a range of population for states. Notice there are no negative numbers: this is why these values are ratio and not interval scale data.

Data Classification
Most maps use data that have been classified. The number of classes is usually between 5 and 10, more likely 5 than 10. Classification methods vary depending on the data and on the story you are telling. ArcGIS includes a number of different classification schemes.

Data Classification
Best carried out in the context of a histogram. The x-axis shows data values (here, the number of farms in a county). The y-axis shows frequency (here, the number of counties). Gray bars show the number of observations (here, the number of counties for each data value). Blue lines divide the data into classes of aggregated data.

Data Classification
There are nine standard classification schemes: natural breaks, optimization, nested means, mean and standard deviation, equal interval, quantile, arithmetic, geometric, and user defined. Creating classes based on these schemes requires summary statistics and calculations, some simple and some difficult. We will look at equal interval, quantile, standard deviation, and natural breaks.

Summary Statistics
Mean (average): the sum of all values divided by the number of values in the set.
Mode: the value that appears with the greatest frequency.
Median: the middle value of a set of values when they are ordered by rank. When there are two middle values (because the set has an even number of values), the mean of those two values is used.
Standard deviation: the spread of values around their mean, calculated as the square root of the average squared deviation from the mean (the sum of the squared deviations divided by the number of elements). Equivalently, the square root of the variance.
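
As a rough illustration of these four definitions, the sketch below uses Python's standard `statistics` module. The sample values are made up for the example. `pstdev` is the population form of the standard deviation (divide by the number of elements), which matches the definition on this slide.

```python
import statistics

# Hypothetical sample values, chosen only for illustration.
values = [2, 4, 4, 5, 7, 8]

mean = statistics.mean(values)       # sum of values / number of values
mode = statistics.mode(values)       # most frequent value
median = statistics.median(values)   # even count, so the mean of the two middle values
std_dev = statistics.pstdev(values)  # sqrt(sum((x - mean)^2) / N)

print(mean, mode, median, std_dev)
```

For these values the two middle numbers are 4 and 5, so the median is 4.5. The squared deviations sum to 24, so the standard deviation is the square root of 24 / 6, which is 2.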

Equal Interval
Constant interval between classes, based on values along the x-axis. The number of observations will be different from class to class. Good if you want to make direct comparisons between different choropleth maps.

Calculating Equal Interval
1. Subtract the minimum value from the maximum value: 1440 - 173 = 1267.
2. Divide the result by the number of classes you want: 1267 / 5 = 253.4. The result is the width of each class.
3. Start with the minimum and add this width to get the upper bound of the first class: 173 + 253 = 426 (with the width rounded to 253).
4. Continue adding the width to the previous upper bound until all classes have been created: 426 + 253 = 679, etc.
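
The same steps can be sketched in a few lines of Python. This is a minimal illustration, not how ArcGIS computes its classes. The function name is hypothetical, and the inputs are the slide's minimum of 173 and maximum of 1440 with five classes.

```python
def equal_interval_breaks(minimum, maximum, n_classes):
    """Return the upper bound of each class for an equal interval scheme."""
    value_range = maximum - minimum
    # Scale the full range for each break instead of adding the width
    # repeatedly, so the last break lands exactly on the maximum.
    return [minimum + value_range * i / n_classes for i in range(1, n_classes + 1)]

# 1440 - 173 = 1267, and 1267 / 5 = 253.4 is the class width.
breaks = equal_interval_breaks(173, 1440, 5)
print([round(b, 1) for b in breaks])  # [426.4, 679.8, 933.2, 1186.6, 1440.0]
```

The slide rounds the width to 253, which gives 426 and 679 rather than 426.4 and 679.8. Keeping the unrounded width guarantees that the top class ends at the maximum value.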

Quantile
Equal number of observations per class. Because the number of observations is the same from class to class, the interval between classes will be different. A good classification scheme to use if certain statistical tests require equal numbers of observations.

Calculating Quantiles
1. Divide the count of observations (features) by the number of classes you want: 92 / 5 = 18.4. This gives the number of features for each class.
2. Arrange your features from least to greatest value.
3. Divide them into classes so that the number of features in each class matches the result of the division. When the result is not a whole number, as here, class sizes can differ by one feature (18 or 19).
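
A minimal sketch of the quantile procedure follows. The function name is hypothetical. Giving the leftover features to the first classes is an assumption made for this example, since other tools may distribute the remainder differently or keep tied values together.

```python
def quantile_classes(values, n_classes):
    """Split sorted values into n_classes groups of (nearly) equal size."""
    ordered = sorted(values)
    base, extra = divmod(len(ordered), n_classes)
    classes, start = [], 0
    for i in range(n_classes):
        # The first `extra` classes each take one additional feature.
        size = base + (1 if i < extra else 0)
        classes.append(ordered[start:start + size])
        start += size
    return classes

# 92 stand-in features in 5 classes: 92 / 5 = 18.4, so sizes are 18 or 19.
sizes = [len(c) for c in quantile_classes(range(92), 5)]
print(sizes)  # [19, 19, 18, 18, 18]
```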

Jenks – Natural Breaks
Minimizes variance within a class by dividing classes in areas where there are large breaks in the data. Classes differ in width and in number of features. Often the best choice for conveying information accurately to map readers. Cannot be used to make direct comparisons between maps.

Calculating Jenks
Don’t worry about this one. It is a method of statistical data classification that partitions data into classes using an algorithm that calculates groupings of data values based on the data distribution. Jenks' optimization seeks to reduce variance within groups and maximize variance between groups.
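
For the curious, the objective can be illustrated with a brute-force search. This is not the Jenks algorithm as implemented in GIS software. It is a sketch that tries every possible way to cut the sorted values and keeps the cut with the smallest total within-class sum of squared deviations. For a fixed data set, that is the same as maximizing the variance between classes. It is practical only for very small data sets. The function names and sample values are hypothetical.

```python
from itertools import combinations

def within_class_ss(group):
    """Sum of squared deviations of a group from its own mean."""
    mean = sum(group) / len(group)
    return sum((x - mean) ** 2 for x in group)

def natural_breaks_bruteforce(values, n_classes):
    """Try every cut of the sorted values; keep the lowest total within-class SS."""
    ordered = sorted(values)
    best_total, best_groups = None, None
    # Each combination is a set of cut positions between neighbouring values.
    for cuts in combinations(range(1, len(ordered)), n_classes - 1):
        edges = (0, *cuts, len(ordered))
        groups = [ordered[a:b] for a, b in zip(edges, edges[1:])]
        total = sum(within_class_ss(g) for g in groups)
        if best_total is None or total < best_total:
            best_total, best_groups = total, groups
    return best_groups

# Made-up values with two obvious gaps in the data.
groups = natural_breaks_bruteforce([1, 2, 3, 10, 11, 12, 30, 31], 3)
print(groups)  # [[1, 2, 3], [10, 11, 12], [30, 31]]
```

The class boundaries fall in the large gaps between 3 and 10 and between 12 and 30, which is the "natural breaks" behavior described above.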

Mean and Standard Deviations
Classes are determined by the mean and deviations from the mean. Best if the data display a normal distribution. Usually symbolized using a diverging color scheme.

Classifying by Mean and Std. Dev.
1. Calculate the mean of your data.
2. Calculate the standard deviation of your data.
3. Arrange your first class so that it straddles the mean.
4. Add classes at intervals of one standard deviation both above and below the mean class.
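
These steps can be sketched as follows. The function name is hypothetical, and the central class is assumed to span half a standard deviation on each side of the mean, which is one common convention. The sample inputs (mean 630, standard deviation 250) are used only for illustration.

```python
def mean_std_breaks(mean, std_dev, n_below, n_above):
    """Class boundaries around a central class spanning mean +/- half a std. dev."""
    lower = mean - std_dev / 2  # bottom of the class that straddles the mean
    upper = mean + std_dev / 2  # top of the class that straddles the mean
    below = [lower - std_dev * k for k in range(n_below, 0, -1)]
    above = [upper + std_dev * k for k in range(1, n_above + 1)]
    return below + [lower, upper] + above

# One extra boundary below the mean class and two above it.
breaks = mean_std_breaks(630, 250, 1, 2)
print(breaks)  # [255.0, 505.0, 755.0, 1005.0, 1255.0]
```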

Same Data, Different Classification Schemes

Jenks and Coulson’s rules
Classes should:
1. Encompass the full range of the data.
2. Have neither overlapping values nor vacant classes.
3. Be great enough in number to avoid sacrificing the accuracy of the data, but not so numerous as to impute a greater degree of accuracy than is warranted by the nature of the collected data.
4. Divide the data into reasonably equal groups of observations.
5. Have a logical mathematical relationship if practical.

Practice
Put the following numbers into five classes using each scheme (quantile and equal interval): 7, 1, 18, 20, 6, 14, 19, 13, 2, 1, 25, 2, 23, 1, 15
Quantile: (1, 1, 1) (2, 2, 6) (7, 13, 14) (15, 18, 19) (20, 23, 25)
Equal interval: (1, 1, 1, 2, 2) (6, 7) (13, 14, 15) (18, 19, 20) (23, 25)
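
The practice answers can be checked with a short script. This is a sketch under two assumptions: each equal interval class includes its lower bound, and the maximum value is placed in the last class.

```python
data = [7, 1, 18, 20, 6, 14, 19, 13, 2, 1, 25, 2, 23, 1, 15]
ordered = sorted(data)
n_classes = 5

# Quantile: 15 values / 5 classes = 3 values per class.
size = len(ordered) // n_classes
quantile = [ordered[i:i + size] for i in range(0, len(ordered), size)]

# Equal interval: (25 - 1) / 5 = 4.8, so every class is 4.8 units wide.
low, high = ordered[0], ordered[-1]
width = (high - low) / n_classes
equal_interval = [[] for _ in range(n_classes)]
for value in ordered:
    # Clamp the index so the maximum value falls in the last class.
    index = min(int((value - low) / width), n_classes - 1)
    equal_interval[index].append(value)

print(quantile)
print(equal_interval)
```

Both printed results match the groupings given on the slide. The quantile classes all hold three values, while the equal interval classes hold between two and five.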

Equal Interval
1440 - 170 = 1270; 1270 / 5 = 254
170 + 254 = 424; 424 + 254 = 678; 678 + 254 = 932; 932 + 254 = 1186; 1186 + 254 = 1440
Class upper bounds: 424, 678, 932, 1186, 1440

Mean and Standard Deviation
Straddle the mean: 630 - 125 = 505; 630 + 125 = 755
505 - 250 = 255; 755 + 250 = 1005; 1005 + 250 = 1255
Class breaks: 255, 505, 755, 1005, 1255