Spatial Statistics: Topic 31 Descriptive Statistics Assoc. Prof. Dr. Abdul Hamid b. Hj. Mar Iman Director Centre for Real Estate Studies Faculty of Engineering.

Presentation transcript:

Spatial Statistics: Topic 3
Descriptive Statistics
Assoc. Prof. Dr. Abdul Hamid b. Hj. Mar Iman, Director, Centre for Real Estate Studies, Faculty of Engineering and Geoinformation Science, Universiti Teknologi Malaysia, Skudai, Johor
Spatial Statistics (SGG 2413)

Learning Objectives
Overall: To give students a basic understanding of descriptive statistics.
Specific: Students will be able to:
* understand the basic concept of descriptive statistics
* understand the concept of distribution
* calculate measures of central tendency and dispersion
* calculate measures of kurtosis and skewness

Contents
* What is descriptive statistics
* Central tendency, dispersion, kurtosis, skewness
* Distribution

Descriptive Statistics
Use sample information to explain/make abstraction of population "phenomena".
Common "phenomena":
* Association (e.g. $\sigma_{1,2.3} = 0.75$)
* Tendency (left-skew, right-skew)
* Trend, pattern, location, dispersion, range
* Causal relationship (e.g. if X then Y)
Emphasis on meaningful characterisation of data (e.g. central tendency, variability), graphics, and description.
Use non-parametric analysis (e.g. $\chi^2$, t-test, 2-way ANOVA).

E.g. of abstraction of phenomena

Inferential Statistics
Using sample statistics to infer some "phenomena" of population parameters.
Common "phenomena": cause-and-effect
* One-way relationship: $Y = f(X)$
* Feedback relationship: $Y_1 = f(Y_2, X, e_1)$; $Y_2 = f(Y_1, Z, e_2)$
* Recursive: $Y_1 = f(X, e_1)$; $Y_2 = f(Y_1, Z, e_2)$
Use parametric analysis (e.g. $\alpha$ and $\beta$) through regression analysis.
Emphasis on hypothesis testing.

Parametric statistics
Statistical analysis that attempts to explain the population parameter using a sample.
E.g. of statistical parameters: mean, variance, std. dev., $R^2$, t-value, F-ratio, etc.
It assumes that the distributions of the variables being assessed belong to known parameterised families of probability distributions.

Examples of parametric relationship
Dep=9t – Dep=7t – 192.6

Non-parametric statistics
First used by Wolfowitz (1942).
Statistical analysis that attempts to explain the population parameter using a sample without making assumptions about the frequency distribution of the assessed variable.
In other words, the variable being assessed is distribution-free.
E.g. of non-parametric statistics: histogram, stochastic kernel, non-parametric regression.

Descriptive & Inferential Statistics (DS & IS)
DS gathers information about a population characteristic (e.g. income) and describes it with a parameter of interest (e.g. mean).
IS uses the parameter to test a hypothesis pertaining to that characteristic, e.g.
$H_0$: mean income = RM 4,000
$H_1$: mean income < RM 4,000
The result of hypothesis testing is used to make inference about the characteristic of interest (e.g. whether Malaysians are upper middle income).

Sample Statistics: Central Tendency

Mean (sum of all values ÷ no. of values)
Advantages:
* Best known average
* Exactly calculable
* Makes use of all data
* Useful for statistical analysis
Disadvantages:
* Affected by extreme values
* Can be absurd for discrete data (e.g. family size = 4.5 persons)
* Cannot be obtained graphically

Median (middle value)
Advantages:
* Not influenced by extreme values
* Obtainable even if data distribution unknown (e.g. group/aggregate data)
* Unaffected by irregular class width
* Unaffected by open-ended class
Disadvantages:
* Needs interpolation for group/aggregate data (cumulative frequency curve)
* May not be characteristic of group when: (1) items are only few; (2) distribution irregular
* Very limited statistical use

Mode (most frequent value)
Advantages:
* Unaffected by extreme values
* Easy to obtain from histogram
* Determinable from only values near the modal class
Disadvantages:
* Cannot be determined exactly in group data
* Very limited statistical use

Central Tendency: Mean
For individual observations, $\bar{x} = \sum x_i / n$.
E.g. X = {3, 5, 7, 7, 8, 8, 8, 9, 9, 10, 10, 12}
$\sum x_i = 96$; $n = 12$
Thus, $\bar{x} = 96/12 = 8$
The above observations can be organised into a frequency table and the mean calculated on the basis of frequencies, $\bar{x} = \sum fx / \sum f$:
x     f    fx
3     1    3
5     1    5
7     2    14
8     3    24
9     2    18
10    2    20
12    1    12
$\sum fx = 96$; $\sum f = 12$
Thus, $\bar{x} = 96/12 = 8$
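The two routes to the mean on this slide can be checked with a few lines of Python. This is only an illustrative sketch; the variable names are not from the slides.

```python
from collections import Counter

# Observations from the slide.
observations = [3, 5, 7, 7, 8, 8, 8, 9, 9, 10, 10, 12]

# Mean from individual observations: sum of x divided by n.
mean_raw = sum(observations) / len(observations)

# Same data organised as a frequency table: value x -> frequency f.
freq_table = Counter(observations)
sum_fx = sum(x * f for x, f in freq_table.items())
sum_f = sum(freq_table.values())
mean_freq = sum_fx / sum_f

print(mean_raw, mean_freq)
```

Both routes give the same answer because the frequency table only regroups the same observations.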

Central Tendency: Mean and Mid-point
Let's say we have data like this: Price (RM '000/unit) of Shop Houses in Skudai, tabulated by Location (Town A, Town B) with Min and Max columns.
Can you calculate the mean?

Central Tendency: Mean and Mid-point (contd.)
Let's calculate, using M = ½(Min + Max):
Town A: ( )/2 = 339
Town B: ( )/2 = 375
Are these figures means?

Central Tendency: Mean and Mid-point (contd.)
Let's say we have price data as follows:
Town A: 228, 295, 310, 420, 450
Town B: 320, 295, 310, 400, 430
Calculate the means.
Town A:
Town B:
Are the results the same as previously?
Be careful about mean and "mid-point"!
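A short sketch of the comparison the slide asks for. The helper names `mean` and `midpoint` are illustrative, and the mid-point here is computed from the minimum and maximum of the listed prices, which is an assumption; the Min/Max table on the earlier slide may have used different figures.

```python
# Listed prices (RM '000/unit) from the slide.
town_a = [228, 295, 310, 420, 450]
town_b = [320, 295, 310, 400, 430]

def mean(values):
    """Arithmetic mean: uses every observation."""
    return sum(values) / len(values)

def midpoint(values):
    """Mid-point M = (Min + Max) / 2: uses only the two extremes."""
    return (min(values) + max(values)) / 2

for name, prices in (("Town A", town_a), ("Town B", town_b)):
    print(name, "mean =", mean(prices), "mid-point =", midpoint(prices))
```

The mid-point ignores everything between the extremes, so it generally differs from the mean.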

Central Tendency: Mean of Grouped Data
House rental or prices in the PMR are frequently tabulated as a range of values.
E.g. What is the mean rental across the areas?
Table columns: Rental (RM/month); Mid-point value (x); Number of Taman (f) = 5, 9, 6, 2, 1; fx
$\sum f = 23$, so the mean is $\bar{x} = \sum fx / \sum f = \sum fx / 23$.

Central Tendency: Median
Let's say house rentals in a particular town are tabulated:
Number of Taman (f): 3, 5, 9, 6, 2
Rental (RM/month): <135, <140, <145, <150, <155
Cumulative frequency: 3, 8, 17, 23, 25
Calculation of the "median" rental needs a graphical aid (the ogive):
1. Median = (n+1)/2 = (25+1)/2 = 13th Taman.
2. The 13th Taman lies between cumulative frequencies 8 and 17 (i.e. between 10 and 15 points on the vertical axis of the ogive).
3. This corresponds to RM 140 to 145/month on the horizontal axis.
4. There are (17 - 8) = 9 Taman in the range of RM 140 to 145/month.
5. The 13th Taman is the 5th out of the 9 Taman.
6. The rental interval width is 5.
7. Therefore, the median rental can be calculated as: 140 + (5/9 x 5) = RM 142.8

Central Tendency: Median (contd.)

Central Tendency: Quartiles (contd.)
Following the same process as in calculating the "median":
Upper quartile = ¾(n+1) = 19.5th Taman; UQ = 145 + (2.5/6 x 5) = RM 147.1/month
Lower quartile = (n+1)/4 = 26/4 = 6.5th Taman; LQ = 135 + (3.5/5 x 5) = RM 138.5/month
Inter-quartile range = UQ - LQ = 147.1 - 138.5 = RM 8.6/month
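The interpolation used for the median and the quartiles can be sketched as one small function. The class lower bounds below (130, 135, ..., 150) are an assumption inferred from the frequencies and the slide's results; the original rental classes are not legible in the transcript. The function name `grouped_quantile` is illustrative.

```python
def grouped_quantile(position, lower_bounds, freqs, width):
    """Interpolate the value at a given rank (position) in grouped data."""
    cumulative = 0
    for lower, f in zip(lower_bounds, freqs):
        if position <= cumulative + f:
            # Rank within the class, as a fraction of the class frequency,
            # scaled by the class width and added to the class lower bound.
            return lower + (position - cumulative) / f * width
        cumulative += f
    raise ValueError("position exceeds total frequency")

# Assumed class lower bounds (RM/month) and the slide's frequencies.
lower_bounds = [130, 135, 140, 145, 150]
freqs = [3, 5, 9, 6, 2]
n = sum(freqs)

median = grouped_quantile((n + 1) / 2, lower_bounds, freqs, 5)
lq = grouped_quantile((n + 1) / 4, lower_bounds, freqs, 5)
uq = grouped_quantile(3 * (n + 1) / 4, lower_bounds, freqs, 5)

print(round(median, 1), round(lq, 1), round(uq, 1), round(uq - lq, 1))
```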

Variability
Indicates dispersion, spread, variation, deviation.
For single population or sample data:
$\sigma^2 = \frac{\sum (x_i - \mu)^2}{n}$ and $s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}$
where $\sigma^2$ and $s^2$ = population and sample variance respectively, $x_i$ = individual observations, $\mu$ = population mean, $\bar{x}$ = sample mean, and n = total number of individual observations.
The square roots, $\sigma$ and $s$, are the population and sample standard deviations.

Variability (contd.)
Why is a "measure of dispersion" important?
Consider yields of two plant species:
* Plant A (ton) = {1.8, 1.9, 2.0, 2.1, 3.6}
* Plant B (ton) = {1.0, 1.5, 2.0, 3.0, 3.9}
Mean A = mean B = 2.28 ton
But, different variability! Var(A) = 0.557, Var(B) = 1.367
* Would you choose to grow plant A or B?
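A quick check of the plant example using Python's standard `statistics` module. The sketch assumes the sample variance (divisor n - 1), which is the convention that reproduces Var(A) = 0.557.

```python
import statistics

plant_a = [1.8, 1.9, 2.0, 2.1, 3.6]
plant_b = [1.0, 1.5, 2.0, 3.0, 3.9]

mean_a = statistics.mean(plant_a)
mean_b = statistics.mean(plant_b)

# statistics.variance is the sample variance (divides by n - 1).
var_a = statistics.variance(plant_a)
var_b = statistics.variance(plant_b)

print(round(mean_a, 2), round(mean_b, 2))
print(round(var_a, 3), round(var_b, 3))
```

The means coincide, but Plant B's yields are spread much more widely around that mean.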

Variability (contd.)
Coefficient of variation (CV): std. deviation as % of the mean:
$CV = \frac{s}{\bar{x}} \times 100$
A better measure compared to std. dev. in cases where samples have different means. E.g.
* Plant X (ton/ha) = {1.2, 1.4, 2.6, 2.7, 3.9}
* Plant Y (ton/ha) = {1.4, 1.5, 2.1, 3.2, 3.9}

Variability (cont.)
Table: Yield (ton/ha) by Farm No. for Species X and Species Y, with Mean (2.36 and 2.42) and Var (1.20 and 1.20) rows.
Calculate CV for both species, using the std. dev. (the square root of the variance):
$CV_x$ = (1.097/2.36) x 100 = 46.5%
$CV_y$ = (1.094/2.42) x 100 = 45.2%
Species X is a little more variable than species Y.
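A minimal sketch of the CV calculation, following the definition of CV as the standard deviation as a percentage of the mean and assuming the sample standard deviation (divisor n - 1). Note that dividing the variance (about 1.2) rather than the standard deviation by the mean gives roughly 51% and 49% instead; the ranking of the two species is the same either way.

```python
import statistics

plant_x = [1.2, 1.4, 2.6, 2.7, 3.9]
plant_y = [1.4, 1.5, 2.1, 3.2, 3.9]

def coefficient_of_variation(values):
    """Sample standard deviation expressed as a percentage of the mean."""
    return statistics.stdev(values) / statistics.mean(values) * 100

cv_x = coefficient_of_variation(plant_x)
cv_y = coefficient_of_variation(plant_y)

print(round(cv_x, 1), round(cv_y, 1))
```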

Variability (cont.)
Std. dev. of a frequency distribution
E.g. age distribution of second-home buyers (SHB):

Probability distribution
Logical probability: If there are 20 lecturers, the probability that A becomes a professor is p = 1/20 = 0.05.
Experiential probability: Out of 100 births, half were girls (p = 0.5); as the number increased to 1,000, two-thirds were girls (p = 0.67); but from a record of 10,000 new-born babies, three-quarters were girls (p = 0.75).
Subjective probability: The probability of a drug addict recovering from addiction is 50:50.
General rule: Pr (event X) = (No. of times event X occurs) / (Total number of occurrences)
The probability of a certain event X occurring has a specific form of distribution.

Probability Distribution
Classical example of tossing two dice (Dice 1, Dice 2).
What is the distribution of the sum of tosses?

Probability Distribution (contd.)
Discrete variable: values of x are discrete (discontinuous).
Sum of lengths of vertical bars: $\sum_{\text{all } x} p(X = x) = 1$
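The two-dice example can be enumerated directly. The sketch below assumes two fair six-sided dice and uses exact fractions so that the probabilities sum to exactly 1.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Count how many of the 36 equally likely outcomes give each sum.
counts = Counter(d1 + d2 for d1, d2 in product(range(1, 7), repeat=2))

# Probability mass function of the sum: p(X = x) = count / 36.
pmf = {total: Fraction(count, 36) for total, count in sorted(counts.items())}

for total, p in pmf.items():
    print(total, p)

print("sum of probabilities:", sum(pmf.values()))
```

The bars rise from a sum of 2 to a peak at 7 and fall symmetrically to 12.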

Probability Distribution (cont.)
Age distribution of second-home buyers in a probability histogram (table columns: Age, Freq, Prob, with a Total row).
Continuous variable: Pr (Area under curve) = 1
Mean = 39.5
Std. dev = 2.45

Probability Distribution (cont.)
Pr (Age ≤ 36) = 0.02
Pr (Age ≤ 37) = Pr (Age ≤ 36) + Pr (Age = 37) = 0.02 + 0.07 = 0.09
Pr (Age ≤ 38) = Pr (Age ≤ 37) + Pr (Age = 38) = 0.09 + 0.04 = 0.13
Pr (Age ≤ 39) = Pr (Age ≤ 38) + Pr (Age = 39) = 0.13 + 0.18 = 0.31
Pr (Age ≤ 40) = Pr (Age ≤ 39) + Pr (Age = 40) = 0.31 + 0.36 = 0.67
Pr (Age ≤ 41) = Pr (Age ≤ 40) + Pr (Age = 41) = 0.67 + 0.14 = 0.81
Pr (Age ≤ 42) = Pr (Age ≤ 41) + Pr (Age = 42) = 0.81 + 0.10 = 0.91
Pr (Age ≤ 43) = Pr (Age ≤ 42) + Pr (Age = 43) = 0.91 + 0.09 = 1.00
Cumulative probability corresponds to the left tail of a distribution.
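The running sums above can be reproduced with `itertools.accumulate`. The point probabilities used here are an assumption: they are backed out as successive differences of the cumulative values shown on the slide, not read from the original table.

```python
from itertools import accumulate

# Point probabilities Pr(Age = a), inferred from the cumulative values.
age_prob = {36: 0.02, 37: 0.07, 38: 0.04, 39: 0.18,
            40: 0.36, 41: 0.14, 42: 0.10, 43: 0.09}

# Cumulative probability Pr(Age <= a) is the running sum of the bars.
ages = list(age_prob)
cumulative = dict(zip(ages, accumulate(age_prob.values())))

for age in ages:
    print(f"Pr(Age <= {age}) = {cumulative[age]:.2f}")
```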

Probability Distribution (cont.)
As larger and larger samples are drawn, the probability distribution gets smoother (figure panels: larger sample, very large sample).
Tens of different types of probability distribution: Z, t, F, gamma, etc.
Most important: normal distribution

Normal Distribution (ND)
Salient features of ND:
* Bell-shaped, symmetrical
* Total area under curve = 1
* Area under curve between any two points = prob. of values in that range (shaded area)
* Prob. of any exact value = 0
* Has a function of: $f(x) = \frac{1}{\sigma\sqrt{2\pi}} e^{-\frac{(x - \mu)^2}{2\sigma^2}}$
where $\mu$ = mean of variable x; $\sigma$ = std. dev. of x; $\pi$ = ratio of circumference of a circle to its diameter = 3.14; e = base of natural log = 2.718
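As a rough numerical illustration of the features listed above, the standard normal density formula can be coded directly and its area approximated with the trapezoidal rule. The function names and the integration range of 8 standard deviations are illustrative choices, not part of the slides.

```python
import math

def normal_pdf(x, mu, sigma):
    """Standard formula for the normal density with mean mu and std. dev. sigma."""
    coeff = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    return coeff * math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2))

def area_under_curve(mu, sigma, lo, hi, steps=10_000):
    """Trapezoidal approximation of the area under the density on [lo, hi]."""
    h = (hi - lo) / steps
    total = 0.5 * (normal_pdf(lo, mu, sigma) + normal_pdf(hi, mu, sigma))
    for i in range(1, steps):
        total += normal_pdf(lo + i * h, mu, sigma)
    return total * h

mu, sigma = 0.0, 1.0
total_area = area_under_curve(mu, sigma, mu - 8 * sigma, mu + 8 * sigma)
shaded_area = area_under_curve(mu, sigma, -1.0, 1.0)

print(round(total_area, 6), round(shaded_area, 4))
```

The total area comes out at essentially 1, and the area between two points gives the probability of falling in that range.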

Normal Distribution (ND)
Figure: normal curves for Population 1 ($\mu_1$, $\sigma_1$) and Population 2 ($\mu_2$, $\sigma_2$).
* A larger population has a narrower base (smaller variance)
* $\mu$ determines location while $\sigma$ determines shape of ND

Normal Distribution (cont.)
* Has a mean $\mu$ and a variance $\sigma^2$, i.e. $X \sim N(\mu, \sigma^2)$
* Has the following distribution of observations:
"Home-buyers example…"
Mean age = 39.3
Std. dev = 2.42

Standard Normal Distribution (SND)
Since different populations have different $\mu$ and $\sigma$ (thus, locations and shapes of distribution), they have to be standardised.
Most common standardisation: standard normal distribution (SND), also called the Z-distribution.
Pr (X = x) is given by area under the curve.
Has no standard algebraic method of integration → Z ~ N(0, 1)
To transform f(x) into f(z):
$Z = \frac{x - \mu}{\sigma} \sim N(0, 1)$

Z-Distribution
Probability is distributed in such a way that:
* Approx. 68%: -1 < z < 1
* Approx. 95%: -1.96 < z < 1.96
* Approx. 99%: -2.58 < z < 2.58
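These coverage figures can be checked with `statistics.NormalDist` from the Python standard library; the helper name `coverage` is illustrative.

```python
from statistics import NormalDist

z = NormalDist(mu=0.0, sigma=1.0)  # standard normal distribution

def coverage(limit):
    """Probability that -limit < Z < limit."""
    return z.cdf(limit) - z.cdf(-limit)

for limit in (1.0, 1.96, 2.58):
    print(f"P(-{limit} < Z < {limit}) = {coverage(limit):.4f}")
```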

Z-distribution (cont.)
When X = μ, Z = 0
When X = μ + σ, Z = 1
When X = μ + 2σ, Z = 2
When X = μ + 3σ, Z = 3
and so on.
It can be proven that $P(X_1 < X < X_k) = P(Z_1 < Z < Z_k)$
SND shows the probability to the right of any particular value of Z.

Normal distribution…Questions
A study found that the mean age, A, of second-home buyers in Johor Bahru is 39.3 years old with a variance of 2.45. Assuming normality, how sure are you that the mean age is: (a) ≥ 40 years old; (b) 39 to 42 years old?
Answer
(a) P(A ≥ 40) = P[Z ≥ (40 - 39.3)/2.4] = P(Z ≥ 0.29) ≈ 0.39
(b) P(39 ≤ A ≤ 42) = P(A ≥ 39) - P(A ≥ 42) = P[Z ≥ (39 - 39.3)/2.4] - P[Z ≥ (42 - 39.3)/2.4] = P(Z ≥ -0.125) - P(Z ≥ 1.125) ≈ 0.55 - 0.13 = 0.42
Always remember: to convert to SND, subtract the mean and divide by the std. dev. Use the Z-table!
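The worked question can be cross-checked without a Z-table. This sketch takes the slide's mean of 39.3 and the divisor 2.4 used in its working as the standard deviation; that reading is an assumption, since the slide labels 2.45 as a variance.

```python
from statistics import NormalDist

# Assumed parameters: mean 39.3, std. dev. 2.4 (the divisor used on the slide).
age = NormalDist(mu=39.3, sigma=2.4)

# (a) P(A >= 40): area to the right of 40.
p_a = 1.0 - age.cdf(40)

# (b) P(39 <= A <= 42): area between 39 and 42.
p_b = age.cdf(42) - age.cdf(39)

# Standardised values used with a Z-table.
z_40 = (40 - 39.3) / 2.4
z_42 = (42 - 39.3) / 2.4

print(round(z_40, 2), round(z_42, 3))
print(round(p_a, 3), round(p_b, 3))
```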

"Student's t-Distribution"
Similar to Z-distribution (bell-shaped, symmetrical)
Has a function of $f(t) = \frac{\Gamma\left(\frac{v+1}{2}\right)}{\sqrt{v\pi}\,\Gamma\left(\frac{v}{2}\right)} \left(1 + \frac{t^2}{v}\right)^{-\frac{v+1}{2}}$
where $\Gamma$ = gamma function; v = n-1 = d.o.f; $\pi$ ≈ 3.1416
Flatter with thicker tails
Distributed with t ~ (0, σ) and -∞ < t < +∞
As n→∞, t ~ (0, σ) → N(0, 1)
Probability calculation requires information on d.o.f.

How Are t-dist. and Z-dist. Related?
Using the central limit theorem, $\bar{x} \sim N(\mu, \sigma^2/n)$ will become $z \sim N(0, 1)$ as n→∞
For a large sample, the t-dist. of a variable or a parameter is given by: $t = \frac{\bar{x} - \mu}{s/\sqrt{n}}$
The interval of critical values for variable x is: $\bar{x} \pm t_{\alpha/2,\,v} \, \frac{s}{\sqrt{n}}$

Skewness, $m_3$ & Kurtosis, $m_4$
Skewness, $m_3$, measures the degree of symmetry of a distribution.
Kurtosis, $m_4$, measures its degree of peakedness.
Both are useful when comparing sample distributions with different shapes.
Useful in data analysis.
$m_3 = \frac{\sum (X_i - \bar{X})^3}{n\sigma^3}$, $m_4 = \frac{\sum (X_i - \bar{X})^4}{n\sigma^4}$
$X_i$ = individual sample observation; $\bar{X}$ = sample mean; $\sigma$ = std. deviation; n = sample size
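A minimal sketch of moment-based skewness and kurtosis. It assumes the common standardised-moment definitions with the population standard deviation (divisor n); other textbooks use n - 1 or bias corrections, so treat the exact convention as an assumption. The function names are illustrative.

```python
import math

def central_moment(values, order):
    """Average of (x - mean) raised to the given power (divisor n)."""
    mean = sum(values) / len(values)
    return sum((x - mean) ** order for x in values) / len(values)

def skewness(values):
    """Third standardised moment: 0 for a symmetric distribution."""
    sigma = math.sqrt(central_moment(values, 2))
    return central_moment(values, 3) / sigma ** 3

def kurtosis(values):
    """Fourth standardised moment: about 3 for a normal distribution."""
    sigma = math.sqrt(central_moment(values, 2))
    return central_moment(values, 4) / sigma ** 4

symmetric = [1, 2, 3, 4, 5]
plant_a = [1.8, 1.9, 2.0, 2.1, 3.6]  # yields from the variability slide

print(skewness(symmetric), kurtosis(symmetric))
print(skewness(plant_a) > 0)  # one large value pulls the tail to the right
```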

Skewness
Figure panels: Perfectly normal (zero skew); Right (+ve) skew; Left (-ve) skew; Bimodal; Uniform; J-shaped.

Kurtosis
Mesokurtic (normal): kurtosis = 3 (zero excess kurtosis)
Leptokurtic (high peak): kurtosis > 3 (+ve excess kurtosis)
Platykurtic (low peak): kurtosis < 3 (-ve excess kurtosis)

Occurrence of ganoderma
Table columns: X-coord. (000); Y-coord. (000); Trees with ganoderma

Aluminium residues in the soil
E.g. Al H 2 ++ O -- → Al 2 O + H 2
Table columns: Al p.p.m.; Freq, with summary rows for sum, mean, skew and kurtosis.
skew = 0.77
kurtosis = 13.44

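Measures of spatial separation
Weighted mean centre (X coord.): $\bar{X}_w = \frac{\sum f x}{\sum f}$
Weighted mean centre (Y coord.): $\bar{Y}_w = \frac{\sum f y}{\sum f}$
Standard distance: $SD = \sqrt{\frac{\sum f (x - \bar{X}_w)^2}{\sum f} + \frac{\sum f (y - \bar{Y}_w)^2}{\sum f}}$
Distance between $(x_1, y_1)$ and $(x_2, y_2)$: $d = \sqrt{(x_1 - x_2)^2 + (y_1 - y_2)^2}$
E.g. $W_c M = ((\ )^2 + (\ )^2)^{0.5} = (\ )^{0.5} = 2.28$ (i.e. 2,280 m)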
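A small sketch of these spatial measures, assuming the usual weighted definitions (frequency-weighted mean of coordinates, and the root of the weighted mean squared distance from that centre). The coordinates and weights below are hypothetical illustration data, not the ganoderma data from the slides; only the x- and y-distances of 5.00 and 0.17 come from the slide table.

```python
import math

def weighted_mean_centre(points, weights):
    """Frequency-weighted mean of the x and y coordinates."""
    total = sum(weights)
    x_w = sum(w * x for (x, _), w in zip(points, weights)) / total
    y_w = sum(w * y for (_, y), w in zip(points, weights)) / total
    return x_w, y_w

def standard_distance(points, weights):
    """Root of the weighted mean squared distance from the weighted centre."""
    x_w, y_w = weighted_mean_centre(points, weights)
    total = sum(weights)
    squared = sum(w * ((x - x_w) ** 2 + (y - y_w) ** 2)
                  for (x, y), w in zip(points, weights))
    return math.sqrt(squared / total)

def distance(p1, p2):
    """Straight-line (Euclidean) distance between two points."""
    return math.hypot(p1[0] - p2[0], p1[1] - p2[1])

# Hypothetical example: two locations with 1 and 3 affected trees.
points = [(0.0, 0.0), (4.0, 0.0)]
weights = [1, 3]

centre = weighted_mean_centre(points, weights)
spread = standard_distance(points, weights)
example_distance = distance((0.0, 0.0), (5.00, 0.17))

print(centre, round(spread, 3), round(example_distance, 2))
```

The weighted centre sits closer to the location with more trees, and the standard distance plays the role of a standard deviation in two dimensions.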

Spatial distribution: occurrence of ganoderma
Table rows: Sum; $\sum f$; $X_w$; $Y_w$; squared deviations of $X_w$ (588.46) and of $Y_w$; Weighted mean centre
Standard distance = 1.84
Point to point distance (e.g.): x-dist. = 5.00; y-dist. = 0.17
Distance Wc-M = 2.27

Spatial distribution: point data
Ethnic distribution of residence

Ethnic distribution of residence
Table columns: x; f; fx; $(x - \bar{x})^2$
X = no. of observations per quadrat; f = frequency of quadrats
$\bar{x} = \sum (fx) / \sum f$; $\sigma^2 = \sum (x - \bar{x})^2 / (\sum (fx) - 1)$; $CV = \sigma^2 / \bar{x}$; $\sigma_{CV} = (2/(k-1))^{1/2}$, with $k = \sum (fx) - 1$
$H_0$: $\sigma^2 = \bar{x}$ (pattern is random)
$H_1$: $\sigma^2 > \bar{x}$ (pattern is clustered) or $\sigma^2 < \bar{x}$ (pattern is scattered)
Test statistics: $\sigma^2$ = 0.01; CV = 0.02; $\sigma_{CV}$ = 0.12; $t_c$ = -8.15
Reject $H_0$: residence pattern is scattered