Trends in Data Chapter 1.3 – Visualizing Trends Mathematics of Data Management (Nelson) MDM 4U.

1 Trends in Data Chapter 1.3 – Visualizing Trends Mathematics of Data Management (Nelson) MDM 4U

2 Variables
In computer science and mathematics, a variable is a symbol denoting a quantity or symbolic representation. In mathematics, a variable often represents an unknown quantity; in computer science, it represents a place where a quantity can be stored. Variables are often contrasted with constants, which are known and unchanging. (Wikipedia, 2004)

3 The Two Types of Variables
Independent Variable
- a variable whose values are arbitrarily chosen
- placed on the horizontal axis
- time is always independent (why?)
Dependent Variable
- a variable whose values depend on the independent variable
- placed on the vertical axis

4 Scatter Plots
- a graphical method of showing the joint distribution of two variables, where each point on the graph represents a pair of values
- may or may not show a trend
- a trend indicates a correlation that may be strong or weak, positive or negative, linear or non-linear

5 What is a trend?
- a pattern of average behavior that occurs over time
- a general “direction” that something tends toward
- for example, there has been a trend towards increasing costs in Canada
- need two variables to exhibit a trend

6 An Example of a Trend
U.S. population from 1780 to 1960
- what is the trend?
- is the trend linear?

7 Line of Best Fit
- the line of best fit is the line that best represents the trend in the data and is used for making predictions
- it can be drawn by hand, but there are also methods for calculating it mathematically (the median-median and least squares methods are examples that we will study)
- it gives no indication of the strength of the trend (use the r or r² value for that)

8 An Example of the Line of Best Fit
- this is temperature data from New York over time, with a median-median line added
- what type of trend are we looking at?
- see p. 35 for the method for creating a median-median line

9 Creating a Median-Median Line
- Divide the points, ordered by x-value, into 3 symmetric groups
  - If there is 1 extra point, include it in the middle group
  - If there are 2 extra points, put one in each end group
- Calculate the median x- and y-coordinates for each group and plot the median point (x, y)
- If the median points are on a straight line, connect them
  - Otherwise, line up the two outer median points, move the line 1/3 of the way toward the middle median point, and draw the line of best fit
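
The steps on this slide can be sketched in Python. This is only a minimal sketch, not the textbook's or Fathom's implementation: the function names `split_into_thirds`, `median_point` and `median_median_line` are made up for illustration, the data is invented, and the code assumes at least 3 points whose two outer groups have different median x-values.

```python
from statistics import median


def split_into_thirds(points):
    """Sort the points by x and split them into left, middle and right groups."""
    pts = sorted(points)
    base, extra = divmod(len(pts), 3)
    # 1 extra point goes to the middle group; 2 extra points go one to each end.
    end_size = base + (1 if extra == 2 else 0)
    return pts[:end_size], pts[end_size:len(pts) - end_size], pts[len(pts) - end_size:]


def median_point(group):
    """Median x and median y of one group, taken separately."""
    return median(x for x, _ in group), median(y for _, y in group)


def median_median_line(points):
    """Return (slope, intercept) of the median-median line."""
    if len(points) < 3:
        raise ValueError("need at least 3 points")
    left, middle, right = split_into_thirds(points)
    (x1, y1), (x2, y2), (x3, y3) = (median_point(g) for g in (left, middle, right))
    # Line through the two outer median points.
    slope = (y3 - y1) / (x3 - x1)
    intercept = y1 - slope * x1
    # Slide the line 1/3 of the vertical gap toward the middle median point.
    gap = y2 - (slope * x2 + intercept)
    return slope, intercept + gap / 3


# Invented data: nine points lying exactly on y = 2x + 1.
data = [(x, 2 * x + 1) for x in range(1, 10)]
slope, intercept = median_median_line(data)
```

With 10 points the groups have sizes 3, 4, 3, and with 14 points they have sizes 5, 4, 5, which matches the rule for extra points above.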

10 Median-Median Line (10 points)

11 Median-Median Line (14 points)

12 Exercises try page 37 #2, 3, 6, 8

13 Trends in Data Using Technology Chapter 1.4 – Trends in Technology Mathematics of Data Management (Nelson) MDM 4U

14 Categories of Correlation
- correlations in scatter plots can be positive or negative, strong or weak
- try looking at the examples on this website to help you understand: http://www.seeingstatistics.com/seeing1999/gallery/CorrelationPicture.html

15 Regression
- a process of fitting a line or curve to a set of data
- if a line is used, it is linear regression
- if a curve is used, it may be quadratic regression, cubic regression, etc.
- why do we do this?
- what can we do with the resulting function?
http://www.seeingstatistics.com/seeing1999/gallery/CorrelationPicture.html
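
One way to see what regression produces is to fit a line by least squares and then use the resulting function to predict. The sketch below is an illustration only: the function name `least_squares_line` and the hours/scores data are invented, and it covers the linear case only (a quadratic or cubic fit needs more machinery).

```python
def least_squares_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-deviations and sum of squared x-deviations.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    # The least-squares line always passes through (mean_x, mean_y).
    return slope, mean_y - slope * mean_x


# Invented data for illustration only: hours studied vs. test score.
hours = [1, 2, 3, 4, 5]
scores = [52, 58, 61, 70, 74]
slope, intercept = least_squares_line(hours, scores)

# The fitted function can be used to predict a value that was not measured.
predicted_score = slope * 6 + intercept
```

Predicting at 6 hours is extrapolation beyond the data, so the result is only as trustworthy as the assumption that the trend continues.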

16 Correlation Coefficient
- the correlation coefficient r is an indicator of the strength and direction of a linear relationship
  - r = 0: no linear relationship
  - r = 1: perfect positive correlation
  - r = -1: perfect negative correlation
- r² is the coefficient of determination
  - if r² = 0.42, that means that 42% of the variation in y is explained by the linear relationship with x
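
As a rough illustration, r and r² can be computed directly from paired data. The function name `correlation` and the five data points are invented for this sketch; it assumes neither variable is constant, since that would make the denominator zero.

```python
from math import sqrt


def correlation(xs, ys):
    """Pearson correlation coefficient r for paired data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Invented data for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
r = correlation(xs, ys)   # about 0.77: a fairly strong positive correlation
r_squared = r ** 2        # about 0.6: roughly 60% of the variation in y
```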

17 Residuals
- a residual is the vertical distance between a point and the line of best fit
- if the model you are considering is a good fit, the residuals should be small and have no noticeable pattern
- why?
http://www.math.csusb.edu/faculty/stanton/m262/regress/regress.html
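
A short sketch of computing residuals for a line that has already been fitted. The function name `residuals` and the data are invented; the residual is taken here as a signed value (observed y minus predicted y), so points below the line get negative residuals.

```python
def residuals(xs, ys, slope, intercept):
    """Observed y minus predicted y for each point (signed vertical distance)."""
    return [y - (slope * x + intercept) for x, y in zip(xs, ys)]


# Invented data; y = 0.6x + 2.2 is the least-squares line for these five points.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
res = residuals(xs, ys, slope=0.6, intercept=2.2)

# A well-fitting line leaves small residuals with no obvious pattern;
# for a least-squares line they also sum to (approximately) zero.
total = sum(res)
largest = max(abs(e) for e in res)
```

If the residuals instead curved from positive to negative and back, that would suggest a curve would fit the data better than a line.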

18 Creating a Median-Median Line Using Technology
- Copy the following file to your M:\ drive
  - N: \ LIEFF \ MDM4U \ 1.3 Best Fit Lines \ armspan vs height.stu.ftm
- Right-click the file | Open With | Choose Program | Browse Program Files \ Fathom \ fathom.exe

19 Exercises Page 51 #1-6, 7 b,c,d, 8

20 References Wikipedia (2004). Online Encyclopedia. Retrieved September 1, 2004 from http://en.wikipedia.org/wiki/Main_Page

21 The Power of Data Chapter 1.5 – The Media Mathematics of Data Management (Nelson) MDM 4U
“There are 3 kinds of lies: lies, damn lies and statistics.”

22 ‘4 out of 5 dentists recommend Trident sugarless gum to their patients who chew gum’
In small groups, discuss how this statistical statement could be misleading.

23 Trident conclusions
- How many dentists did they ask? 5?
- 4 out of 5 is convincing but reasonable
  - 5 out of 5 is preposterous
  - 3 out of 5 is good but not great
- Recommend Trident over what?
  - Chewing sugared gum?
- Is Trident the “best” sugarless gum?
- What variables were considered?
- What did the 5th dentist recommend?

24 “More people stay with [Bell Mobility] than any other provider.”
In small groups, discuss:
1) What variables would be recorded in this study?
2) How could the data be used to arrive at this conclusion falsely?

25 1) What variables would be recorded in this study?
- Number of Bell Mobility subscribers
- Number of renewed contracts
- Contract renewed?
- Time of Renewal (during contract / upon completion of contract)
- Contract Length
- Contract Type (business or home)

26 2) How could the data be used to arrive at this conclusion falsely?
- Does not specify how many more customers stay with Bell
  - e.g. Percentage of customers renewing their plan: Bell: 30%, Rogers: 29%, Telus: 25%, Fido: 28%
- Did they only count totals?
- What does it mean to “stay with Bell”? Honour entire contract? Renew contract at the end of a term?
- Are early terminations factored in? If so, does Bell have a higher cost for early terminations?
- Competitors’ renewal rates may have decreased due to family plans
- Does the data include Private / Corporate plans?

27 How does the media use (misuse) data?
- To inform the public about world events in an objective manner
- It sometimes gives misleading or false impressions to sway the public or to increase ratings
- It is important to:
  - Study statistics to understand how information is represented or misrepresented
  - Correctly interpret tables/charts presented by the media

28 Exercises p. 60 #1-6 Final Project – Manipulating Data

