ENV 2006 2.1 Envisioning Information Lecture 2 Simple Graphs and Charts Ken Brodlie School of Computing University of Leeds
ENV 2006 2.2 Lecture Outline Preliminaries –Definitions –Datatypes Simple Data Presentation –Graphs and charts
ENV 2006 2.3 Fundamentals Basic Datatypes correspond to different levels of measurement Data can be: –Categorical – labels –Numerical – numbers Categorical –Nominal No sense of order Apples, oranges,… –Ordinal Ordered in sequence January, February,… Numerical –Continuous Real numbers Height of students in class –Discrete Typically whole numbers Marks in an exam
ENV 2006 2.4 Question Give an example for each class in which numbers are involved… Categorical - nominal Categorical - ordinal Numerical – continuous Numerical - discrete
ENV 2006 2.5 Exploratory Data Analysis Pioneering figure is John Tukey New approach to data analysis, heavily based on visualization, as an alternative to classical data analysis See Wikipedia Two stage process: –Exploratory: search for evidence using all tools available –Confirmatory: evaluate strength of evidence using classical data analysis
ENV 2006 2.7 Simple Data Presentation Simple data tables are often presented as line graphs, bar graphs, pie charts, dot graphs, histograms… Which should we use and when?
ENV 2006 2.8 Line Graph Fundamental technique of data presentation Used to compare two variables –X-axis is often the control variable –Y-axis is the response variable Good at: –Showing specific values –Trends –Trends in groups (using multiple line graphs) Students participating in sporting activities Mobile Phone use Note: graph labelling is fundamental Any critical comments here?
ENV 2006 2.9 Simple Representations – Bar Graph Bar graph –Presents categorical variables –Height of bar indicates value –Double bar graph allows comparison –Note spacing between bars –Can be horizontal (when would you use this?) Internet use at a school Number of police officers Note more space for labels
ENV 2006 2.10 Dot Graph Very simple but effective… Horizontal to give more space for labelling
ENV 2006 2.11 Pie Chart Pie chart summarises a set of categorical/nominal data But use with care… … too many segments are harder to compare than in a bar chart Should we have a long lecture? Favourite movie genres
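A pie chart encodes each category as an angle proportional to its share of the total. The following is a minimal sketch of that calculation; the function name `pie_angles` and the genre counts are hypothetical, chosen only for illustration and not taken from the lecture's figure.

```python
def pie_angles(counts):
    """Map each category to its segment angle in degrees (share of total * 360)."""
    total = sum(counts.values())
    return {label: 360.0 * value / total for label, value in counts.items()}

# Hypothetical survey of favourite movie genres (illustrative numbers only).
genres = {"Comedy": 4, "Action": 5, "Romance": 6, "Drama": 1, "SciFi": 4}

angles = pie_angles(genres)
for label, angle in angles.items():
    print(f"{label:<8} {angle:6.1f} degrees")
```

In a bar chart the same counts would be read as lengths along a common axis rather than as angles.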
ENV 2006 2.12 Histograms Histograms summarise discrete or continuous data that are measured on an interval scale No gaps if variable is continuous Distribution of salaries in a company
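A histogram is built by dividing the range of the data into intervals (bins) and counting the values in each. The sketch below assumes equal-width bins and a range chosen by hand; `histogram_counts` and the salary figures are hypothetical and not taken from the lecture.

```python
def histogram_counts(values, low, high, bins):
    """Count the values falling into each of `bins` equal-width intervals on [low, high]."""
    width = (high - low) / bins
    counts = [0] * bins
    for v in values:
        if v < low or v > high:
            continue  # ignore values outside the chosen range
        # The top edge is assigned to the last bin so that `high` itself is counted.
        index = min(int((v - low) / width), bins - 1)
        counts[index] += 1
    return counts

# Hypothetical salaries in thousands (illustrative numbers only).
salaries = [18, 22, 25, 27, 31, 33, 34, 38, 41, 45, 52, 58, 75]
counts = histogram_counts(salaries, low=10, high=80, bins=7)

# Text rendering: one row per bin, one '#' per value in that bin.
for i, c in enumerate(counts):
    print(f"{10 + 10 * i}-{20 + 10 * i}: {'#' * c}")
```

Because the variable is continuous, adjacent bins share an edge, which is why the bars of a histogram are drawn without gaps.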
ENV 2006 2.13 Scatter Plot Used to present measurements of two variables Effective if a relationship exists between the two variables Car ownership by household income Example taken from NIST Handbook – Evidence of strong positive correlation
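The strength of a linear relationship seen in a scatter plot is commonly summarised by the Pearson correlation coefficient. The sketch below computes it directly from its definition; the function name `pearson_r` and the income and car-ownership figures are hypothetical, not the NIST data.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: household income (thousands) and average cars owned.
income = [20, 30, 40, 50, 60, 70]
cars = [0.6, 0.9, 1.1, 1.4, 1.6, 2.0]

r = pearson_r(income, cars)
print(f"r = {r:.3f}")  # a value close to +1 indicates strong positive correlation
```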
ENV 2006 2.14 Scatter Plots in Excel The scatter plot is a fundamental tool in Excel Chart type XY (Scatter) and subtype Unconnected Points http://www2.ncsu.edu:8010/ncsu/chemistry/resource/excel/excel.html
ENV 2006 2.15 Regression Line Excel allows you to add a linear regression line (trend line) Remember: correlation does not imply causality… i.e. a relationship exists but one variable is not necessarily causing the other – there may be a third factor?
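A linear trend line is usually an ordinary least-squares fit of y = slope * x + intercept. The sketch below shows that calculation under this assumption; it is not a description of Excel's internals, and `fit_line` and the data points are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept) of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x  # the fitted line passes through the mean point
    return slope, intercept

# Hypothetical measurements (illustrative numbers only).
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

slope, intercept = fit_line(xs, ys)
print(f"y = {slope:.2f} x + {intercept:.2f}")
```

The fit describes how the two variables move together; it says nothing about which one, if either, causes the other.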
ENV 2006 2.17 Box Plots In some situations we have, not a single data value at a point, but a number of data values, or even a probability distribution When might this occur? Tukey proposed the idea of a boxplot to visualize the distribution of values For explanation and some history, see: http://mathworld.wolfram.com/Box-and-WhiskerPlot.html http://en.wikipedia.org/wiki/Box_plot M – median Q1, Q3 – quartiles Whiskers – 1.5 * interquartile range Dots – outliers http://www.upscale.utoronto.ca/GeneralInterest/Harrison/Visualisation/Visualisation.html Darwin's plant study
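The quantities drawn in a box plot can be computed as in the sketch below. It assumes one common convention: quartiles from Python's `statistics.quantiles` with `method="inclusive"`, whiskers extending to the most extreme data values within 1.5 * IQR of the box, and everything beyond them shown as outlier dots. Other quartile conventions give slightly different boxes. The function name `box_plot_stats` and the data are hypothetical, not Darwin's measurements.

```python
import statistics

def box_plot_stats(values):
    """Return median, quartiles, whisker ends and outliers for a box plot."""
    data = sorted(values)
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    inside = [v for v in data if lower_fence <= v <= upper_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_low": inside[0],    # smallest value inside the fences
        "whisker_high": inside[-1],  # largest value inside the fences
        "outliers": [v for v in data if v < lower_fence or v > upper_fence],
    }

# Hypothetical sample with one unusually large value.
stats = box_plot_stats([2, 4, 5, 6, 7, 8, 9, 10, 30])
print(stats)
```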
ENV 2006 2.18 Acknowledgement Thanks to Statistics Canada – an excellent web site for simple data presentation –http://www.statcan.ca/english/edu/power/toc/contents.htm
ENV 2006 2.19 Exercise for next week Understand a bit more about the merits of pie charts and bar graphs Create a dataset with roughly equal numbers in each class Which is best if the task is to discriminate?
ENV 2006 2.20 Exercise for next week Over the next week look for examples of basic graphs –In newspapers, magazines or other print media –On news web sites or other electronic media Analyse two examples –One should be an example where you think the use of graphics is good –One should be bad Be ready next week to present these results to the class…
ENV 2006 2.21 Envisioning Information : Practical Work Gnuplot R Excel