2.1 Introduction
Descriptive statistics involves the arrangement, summary, and presentation of data, to enable meaningful interpretation and to support decision making.
Descriptive statistics methods make use of
- graphical techniques
- numerical descriptive measures.
The methods presented apply to both
- the entire population
- a sample drawn from the population.
2.2 Types of data and information
A variable - a characteristic of a population or sample that is of interest to us. Examples:
- Cereal choice
- Capital expenditure
- The waiting time for medical services
Data - the actual values of variables.
- Interval data are numerical observations.
- Nominal data are categorical observations.
- Ordinal data are ordered categorical observations.
Types of data - examples
Interval data: age, income, weight gain (for example +10, +5).
Nominal data: categories such as IBM, Dell, Compaq, Other.
With nominal data, all we can do is calculate the proportion of data that falls into each category.
[Table: percentage of observations for IBM, Dell, Compaq, Other, and Total]
Types of data - analysis
Knowing the type of data is necessary to properly select the technique to be used when analyzing data.
Type of analysis allowed for each type of data:
- Interval data - arithmetic calculations
- Nominal data - counting the number of observations in each category
- Ordinal data - computations based on an ordering process
Cross-Sectional/Time-Series Data
Cross-sectional data are collected at a certain point in time.
- Marketing survey (observe preferences by gender, age)
- Test scores in a statistics course
- Starting salaries of the graduates of an MBA program
Time-series data are collected over successive points in time.
- Weekly closing price of gold
- Amount of crude oil imported monthly
2.3 Graphical Techniques for Interval Data
Example 2.1: Providing information concerning the monthly bills of new subscribers in the first month after signing on with a telephone company.
- Collect data
- Prepare a frequency distribution
- Draw a histogram
Example 2.1: Providing information
- Collect data (there are 200 data points)
- Prepare a frequency distribution
How many classes to use?
[Table: number of observations vs. suggested number of classes]
Class width = Range / Number of classes = (Largest observation - Smallest observation) / Number of classes
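The class-width rule and the tally step can be sketched in Python. This is only an illustration under stated assumptions: the bill amounts are hypothetical, the function names are invented for this sketch, the width is rounded up to a whole number, and each class is assumed to include its lower limit but not its upper limit (the last class also takes the largest value).

```python
import math

def class_width(values, num_classes):
    """Range divided by the number of classes, rounded up to a whole number."""
    return math.ceil((max(values) - min(values)) / num_classes)

def frequency_distribution(values, lower, width, num_classes):
    """Count how many values fall into each class of equal width.

    Assumes every value is >= lower; values at or beyond the top limit
    are placed in the last class.
    """
    counts = [0] * num_classes
    for v in values:
        idx = min(int((v - lower) // width), num_classes - 1)
        counts[idx] += 1
    return counts

# Hypothetical bill amounts, not the data from Example 2.1.
bills = [3.5, 12.0, 14.9, 27.3, 29.0, 41.8, 58.2, 76.5, 88.1, 93.4, 101.7, 119.6]
width = class_width(bills, 8)
counts = frequency_distribution(bills, lower=0, width=width, num_classes=8)
print(width, counts)
```

Starting the first class at 0 rather than at the smallest observation is a convenience choice that gives round class limits.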
Example 2.1: Providing information
- Draw a histogram
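Once class frequencies are available, a rough histogram can be drawn even without a charting tool. The sketch below prints one row of asterisks per class; the class limits and counts are hypothetical and `text_histogram` is a name invented for this illustration.

```python
def text_histogram(upper_limits, counts):
    """Return one text line per class: upper limit, a bar of '*', and the count."""
    lines = []
    for upper, count in zip(upper_limits, counts):
        lines.append(f"{upper:>4} | {'*' * count} ({count})")
    return lines

# Hypothetical class upper limits and frequencies.
upper_limits = [15, 30, 45, 60, 75, 90, 105, 120]
counts = [3, 2, 1, 1, 0, 2, 2, 1]
lines = text_histogram(upper_limits, counts)
print("\n".join(lines))
```

For large counts one asterisk per observation becomes unwieldy; dividing each count by a fixed scale before drawing is a simple adjustment.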
Example 2.1: Providing information
What information can we extract from this histogram?
- About half of all the bills are small (71 + 37 = 108).
- A few bills are in the middle range (32).
- There is a relatively large number of large bills (60).
[Figure: histogram of Frequency against Bills, with class limits 15, 30, 45, 60, 75, 90, 105, 120]
Relative frequency
It is often preferable to show the relative frequency (proportion) of observations falling into each class, rather than the frequency itself.
Relative frequencies should be used when
- the population relative frequencies are studied
- comparing two or more histograms
- the samples studied have different numbers of observations.
Class relative frequency = Class frequency / Total number of observations
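The relative-frequency formula is a single division per class. A minimal sketch, with illustrative counts for eight classes totalling 200 observations (the counts and the function name are assumptions made for this example):

```python
def relative_frequencies(counts):
    """Class relative frequency = class frequency / total number of observations."""
    total = sum(counts)
    return [c / total for c in counts]

# Illustrative class frequencies (assumed values, 200 observations in total).
counts = [71, 37, 13, 9, 10, 18, 28, 14]
rel = relative_frequencies(counts)
print([round(r, 3) for r in rel])
```

Because relative frequencies always sum to 1, two histograms built from samples of different sizes can be compared on the same vertical scale.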
Class width
It is generally best to use equal class widths, but sometimes unequal class widths are called for.
Unequal class widths are used when the frequency associated with some classes is too low. Then several classes are combined to form a wider and "more populated" class.
It is possible to form an open-ended class at the higher end or lower end of the histogram.
Shapes of histograms
There are four typical shape characteristics.
Symmetry
Shapes of histograms
Skewness
- Negatively skewed
- Positively skewed
Modal classes
A modal class is the one with the largest number of observations.
[Figure: a unimodal histogram, with the modal class marked]
Modal classes
[Figure: a bimodal histogram, with two modal classes marked]
Bell shaped histograms
Many statistical techniques require that the population be bell shaped.
Drawing the histogram helps verify the shape of the population in question.
Interpreting histograms
Example 2.2: Selecting an investment
An investor is considering investing in one of two investments.
The returns on these investments were recorded.
From the two histograms, how can the investor interpret
- the expected returns
- the spread of the returns (the risk involved with each investment)?
Example 2.2 - Histograms
[Figure: histograms of Return on investment A and Return on investment B, with the center of each marked]
Interpretation: The center of the returns of Investment A is slightly lower than that for Investment B.
Example 2.2 - Histograms
[Figure: histograms of Return on investment A and Return on investment B; sample size = 50 for each]
Interpretation: The spread of returns for Investment A is less than that for Investment B.
Example 2.2 - Histograms
[Figure: histograms of Return on investment A and Return on investment B]
Interpretation: Both histograms are slightly positively skewed. There is a possibility of large returns.
Providing information
Example 2.2: Conclusion
It seems that investment A is better, because:
- Its expected return is only slightly below that of investment B.
- The risk from investing in A is smaller.
- The possibility of having a high rate of return exists for both investments.
Interpreting histograms
Example 2.3: Comparing students' performance
Students' performance in two statistics classes was compared.
The two classes differed in their teaching emphasis:
- Class A - mathematical analysis and development of theory.
- Class B - applications and computer-based analysis.
The final mark for each student in each course was recorded.
Draw histograms and interpret the results.
Interpreting histograms
The mathematical emphasis creates two groups, and a larger spread.
Stem and Leaf Display
This is a graphical technique most often used in a preliminary analysis.
Stem and leaf diagrams use the actual values of the original observations (whereas the histogram does not).
Stem and Leaf Display
Split each observation into two parts. There are several ways of doing that for the same observation:
- Stem 42, Leaf 19
- Stem 4, Leaf 2
A stem and leaf display for Example 2.1 will use this method next.
Stem and Leaf Display
A stem and leaf display for Example 2.1
[Figure: Stem | Leaf display]
The length of each line represents the frequency of the class defined by the stem.
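A stem and leaf display is easy to build by hand or in code. The sketch below assumes non-negative values, drops the decimal part, and uses the tens digit(s) as the stem and the units digit as the leaf; the observations are hypothetical and `stem_and_leaf` is a name invented for this illustration.

```python
def stem_and_leaf(values):
    """Map each stem (tens) to the sorted list of its leaves (units).

    Decimals are truncated; empty stems are kept so that line lengths
    can be compared like the bars of a histogram.
    """
    truncated = sorted(int(v) for v in values)
    first, last = truncated[0] // 10, truncated[-1] // 10
    display = {stem: [] for stem in range(first, last + 1)}
    for v in truncated:
        display[v // 10].append(v % 10)
    return display

# Hypothetical observations, not the data from Example 2.1.
observations = [42.19, 38.45, 29.67, 45.77, 2.41, 9.22, 42.6]
display = stem_and_leaf(observations)
for stem, leaves in display.items():
    print(f"{stem:>2} | {''.join(str(leaf) for leaf in leaves)}")
```

Each printed row is as long as the number of observations sharing that stem, which is why the display doubles as a sideways histogram.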
Ogives
Ogives are cumulative relative frequency distributions.
Example continued
Upper class limit    Cumulative relative frequency
15                   .355
30                   .540
45                   .605
60                   .650
75                   .700
90                   .790
105                  .930
120                  1.000
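The points of an ogive come from a running sum of class frequencies divided by the total. In the sketch below the class frequencies are assumed values, chosen so that the result reproduces the cumulative relative frequencies quoted for the example (.355 up to 1.000); the function name is invented for this illustration.

```python
from itertools import accumulate

def cumulative_relative_frequencies(counts):
    """Running total of class frequencies divided by the number of observations."""
    total = sum(counts)
    return [running / total for running in accumulate(counts)]

# Assumed class frequencies for classes ending at 15, 30, ..., 120.
counts = [71, 37, 13, 9, 10, 18, 28, 14]
ogive = cumulative_relative_frequencies(counts)
print([round(p, 3) for p in ogive])
```

Reading the ogive at a class limit gives the proportion of observations at or below that limit, for example 0.54 at the limit of 30.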
2.4 Graphical Techniques for Nominal Data
The only allowable calculation on nominal data is to count the frequency of each value of a variable.
When the raw data can be naturally categorized in a meaningful manner, we can display frequencies by
- Bar charts - emphasize the frequency of occurrence of the different categories.
- Pie charts - emphasize the proportion of occurrences of each category.
The Pie Chart
The pie chart is a circle, subdivided into a number of slices that represent the various categories.
The size of each slice is proportional to the percentage corresponding to the category it represents.
The Pie Chart
Example 2.4
The student placement office at a university wanted to determine the general areas of employment of last year's graduates.
Data were collected, and the count of occurrences was recorded for each area.
These counts were converted to proportions and the results were presented as a pie chart and a bar chart.
The Pie Chart
Accounting            28.9%
Marketing             25.3%
Finance               20.6%
General management    14.2%
Other                 11.1%
Slice angle for Accounting: (28.9/100)(360°) = 104°
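The slice-angle calculation shown for Accounting applies to every category: each percentage is converted to its share of 360 degrees. A small sketch using the percentages from the pie chart (the function name is invented for this illustration):

```python
def slice_angles(percentages):
    """Convert category percentages to pie-slice angles in degrees."""
    return {name: pct / 100 * 360 for name, pct in percentages.items()}

percentages = {
    "Accounting": 28.9,
    "Marketing": 25.3,
    "Finance": 20.6,
    "General management": 14.2,
    "Other": 11.1,
}
angles = slice_angles(percentages)
for name, angle in angles.items():
    print(f"{name}: {angle:.1f} degrees")
```

The published percentages are rounded and add up to 100.1, so the computed angles total slightly more than 360 degrees.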
The Bar Chart
Rectangles represent each category.
The height of the rectangle represents the frequency.
The base of the rectangle is arbitrary.
[Figure: bar chart with bar heights 73, 64, 52, 36, 28]
The Bar Chart
Use bar charts also when the order in which nominal data are presented is meaningful.
[Figure: total number of new products introduced in North America in the years 1989, ..., 1994; vertical axis from 5,000 to 20,000]
2.5 Describing the Relationship Between Two Variables
We are interested in the relationship between two interval variables.
Example 2.7
A real estate agent wants to study the relationship between house price and house size.
Twelve recently sold houses are sampled, and their size and price are recorded.
Use a graphical technique to describe the relationship between size and price.
[Data table: Size and Price columns; listed values include 315, 229, 335, 261, ...]
2.5 Describing the Relationship Between Two Variables
Solution
- The size (independent variable, X) affects the price (dependent variable, Y).
- We use Excel to create a scatter diagram.
[Figure: scatter diagram of price (Y) against size (X)]
The greater the house size, the greater the price.
Typical Patterns of Scatter Diagrams
- Positive linear relationship
- No relationship
- Negative linear relationship
- Negative nonlinear relationship
- Nonlinear (concave) relationship
- A weak linear relationship, where a nonlinear relationship seems to fit the data better
Graphing the Relationship Between Two Nominal Variables
We create a contingency table.
This table lists the frequency for each combination of values of the two variables.
We can create a bar chart that represents the frequency of occurrence of each combination of values.
Contingency table
Example 2.8
To conduct an efficient advertising campaign, the relationship between occupation and newspaper readership is studied. The following table was created (the data are in file Xm02-08a).
Contingency table
Solution
If there is no relationship between occupation and newspaper read, the bar charts describing the frequency of readership of newspapers should look similar across occupations.
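Building the contingency table and comparing readership across occupations can be sketched as follows. The survey responses are hypothetical (they are not the Xm02-08a data), and the function names are invented for this illustration. Row proportions are used so that occupations with different numbers of respondents can be compared directly.

```python
from collections import Counter

def contingency_table(pairs):
    """Count how often each (occupation, newspaper) combination occurs."""
    return Counter(pairs)

def row_proportions(table):
    """Share of each newspaper within each occupation."""
    row_totals = Counter()
    for (occupation, _paper), n in table.items():
        row_totals[occupation] += n
    return {
        (occupation, paper): n / row_totals[occupation]
        for (occupation, paper), n in table.items()
    }

# Hypothetical responses: (occupation, newspaper read).
responses = [
    ("Blue collar", "Star"), ("Blue collar", "Sun"), ("Blue collar", "Star"),
    ("Blue collar", "Post"), ("White collar", "Post"), ("White collar", "Globe and Mail"),
    ("White collar", "Post"), ("Professional", "Globe and Mail"),
    ("Professional", "Post"), ("Professional", "Globe and Mail"),
]
table = contingency_table(responses)
shares = row_proportions(table)
for (occupation, paper), share in sorted(shares.items()):
    print(f"{occupation:<13} {paper:<15} {share:.2f}")
```

If the shares for a given newspaper are roughly equal in every occupation, the two variables appear unrelated; large differences suggest a relationship.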
Bar charts for a contingency table
Blue-collar workers prefer the “Star” and the “Sun”.
White-collar workers and professionals mostly read the “Post” and the “Globe and Mail”.
2.6 Describing Time-Series Data
Data can be classified according to when they are collected.
- Cross-sectional data are all collected at the same time.
- Time-series data are collected at successive points in time.
Time-series data are often depicted on a line chart (a plot of the variable over time).
Line Chart
Example 2.9
The total amounts of income tax paid by individuals in 1987 through 1999 are listed below.
Draw a graph of these data and describe the information produced.
Line Chart
For the first five years, total tax was relatively flat.
From 1993 there was a rapid increase in tax revenues.
Line charts can be used to describe nominal data time series.