Teaching with Large Data Sets


2 Teaching with Large Data Sets
Please download the datasets from This session will demonstrate some of the techniques and then give teachers an opportunity to try them.

3 Data Sets: Key Points Each awarding body will supply one or more datasets; questions will be set which assume familiarity with the dataset. Students must be familiar with both the data and its context; they need to know the origin of the data and how it might have been collected. Technology must be used to explore the data set. Increased emphasis on interpretation over calculation and on being able to select the correct representation or model. Emphasise that students will be better prepared for the exam if they are familiar with the context, and that the best way to achieve this is through exploring the datasets with technology.

4 The Data Sets
Body | Format | Description of data | Lifetime
Edexcel | Spreadsheet with multiple sheets | Met Office weather for 5 different stations over 2 different time periods, along with 3 international weather stations | Until further notice
MEI | Single sheet spreadsheet | 2012 Olympics medals and demographic data by country | 3 data sets will be used in rotation. Other 2 sets available in June.
OCR | Spreadsheet with 4 sheets | Methods of travel by local authority | (not given)
AQA | (not given) | Selected data from the family food survey by region | (not given)
Teachers might find it useful to explore the datasets for other awarding organisations; these can form a bank of useful contexts for teaching statistics.

5 In the exam: Students should be expected to interpret output from a spreadsheet or statistical software. Students should be expected to interpret and explain terminology which has been introduced via the data set. Students should be asked to explain what effect missing data would have on a model that has been derived. Students should be asked to explain how they would collect data and to describe the drawbacks and advantages of particular sampling methods. The style of questions in statistics is changing more than in mechanics. The next few slides feature a question from the OCR Sample materials that demonstrates this.

6 OCR Large Data Set Explain a bit about the OCR LDS: data for mode of travel to work for different local authorities.

7 OCR SAM question This is an example question – there is a lot of text to read but it is much easier to read, and take in, if you are familiar with the context.

8 OCR SAM question This question is only worth 5 marks in total – there is a lot to read for these 5 marks.

9 The issue here: do we need to have worked with the LDS to answer the question? Is it easier if we have? Teachers will have this on a handout with the questions. Emphasise the need to have used technology to explore the dataset, which is why we are doing this.

10 The Edexcel Data Set Look at the Edexcel data files in Excel. Questions we might ask: Does it need to be cleansed? If so, how? Does it include the source of the data so that learners can understand how it was collected? Is there a glossary to help learners understand the data and associated terminology? Is it clear whether it is (essentially) a population, or a sample from a larger population? Little cleansing is needed, although we need to decide what to do with a trace of rainfall and with the day when no wind recordings were made. If we work out the average wind speed, Excel will ignore the n/a and divide the total by 183 data items, which is fine as that will give a realistic mean value. However, if we do the same for the rainfall, Excel will ignore the trace values, of which there are many, so the total is divided by too few days and we will get a value which is too high. The data is probably best regarded as a subset of a larger population of weather data.
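The effect of ignoring trace values can be checked with a small calculation. The sketch below uses Python with invented rainfall readings (not values from the Edexcel data set). Treating a trace as 0.0 mm is an assumption made for illustration; a class might agree on a different small value.

```python
from statistics import mean

# Hypothetical daily rainfall in mm; "tr" marks a trace.
# These values are invented for illustration, not taken from the data set.
rainfall = [0.0, "tr", 5.2, "tr", 12.4, "tr", 0.6, "tr"]


def numeric_only(values):
    """Keep only numeric cells, similar to how a spreadsheet average skips text."""
    return [v for v in values if isinstance(v, (int, float))]


def replace_trace(values, trace_value=0.0):
    """Replace each "tr" with a numeric value (0.0 is an assumption)."""
    return [trace_value if v == "tr" else v for v in values]


# Mean over the 4 numeric days only: the trace days are dropped from the count.
mean_ignoring_trace = mean(numeric_only(rainfall))

# Mean over all 8 days, counting each trace day as 0.0 mm.
mean_trace_as_zero = mean(replace_trace(rainfall))

print("Ignoring trace days:", round(mean_ignoring_trace, 2))
print("Trace counted as 0: ", round(mean_trace_as_zero, 2))
```

With these invented readings the first mean is twice the second, because the same total is divided by 4 days instead of 8.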

11 How might students use technology?
Sorting and searching; identifying outliers; cleansing; producing and using summary statistics to draw conclusions about the data; producing graphs such as histograms and box-plots that allow comparisons to be made; producing scattergraphs and modelling using trend lines or curves; selecting random samples and comparing them to the population to illustrate variation; checking to see if data fits a particular model or distribution; hypothesis testing. These are just a few ideas. The main aim is for students to become familiar with statistical ideas and the context of their data set through exploring it with technology.
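One of these ideas, selecting random samples and comparing them with the population, can be sketched in a few lines of Python. The "population" here is generated with `random.gauss` purely as a stand-in for a column of the data set; the values, the sample size of 20 and the seed are all assumptions for illustration.

```python
import random
from statistics import mean

random.seed(1)  # fixed seed so the same samples are drawn on every run

# Stand-in population: 184 invented daily maximum temperatures in degrees C.
population = [round(random.gauss(15, 4), 1) for _ in range(184)]
population_mean = mean(population)

# Draw 5 simple random samples of size 20 (without replacement) and
# record the mean of each one.
sample_means = [mean(random.sample(population, 20)) for _ in range(5)]

print("Population mean:", round(population_mean, 2))
for i, m in enumerate(sample_means, start=1):
    print(f"Sample {i} mean:", round(m, 2))
```

Each sample mean differs from the population mean and from the other sample means, which is the variation students are meant to notice.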

12 Spreadsheets and statistical software
What are the options? Advantages and Disadvantages? Excel GeoGebra This presentation uses only Excel and GeoGebra. Other technology is available. Gnumeric is a free spreadsheet which has better statistical functions than Excel. This session concentrates on using GeoGebra for Statistics. There is a lot more help for using GeoGebra in pure maths at

13 Sorting and Filtering: The Data Set
Sorting can help establish if there are rogue values or outliers. Filtering can help focus on a subset of the data. The next few slides (14-21) show various techniques that can be demonstrated. Spend about mins demonstrating and then let teachers work on the workbook. The slides show what can be demonstrated and this should be done live. The workbook also shows how to do this. Here show how to Sort and Filter on a few different fields using Excel, including a multiple level sort.
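For teachers who want to see the same operations outside a spreadsheet, here is a minimal Python sketch of a single sort, a multiple level sort and a filter. The records, field names and the deliberately mistyped value 171.0 are all invented for illustration.

```python
# Hypothetical records; the field names and values are invented.
records = [
    {"station": "Heathrow", "day": 1, "max_temp": 17.2},
    {"station": "Camborne", "day": 1, "max_temp": 15.1},
    {"station": "Heathrow", "day": 2, "max_temp": 171.0},  # rogue value (typo for 17.1?)
    {"station": "Camborne", "day": 2, "max_temp": 14.8},
    {"station": "Heathrow", "day": 3, "max_temp": 16.5},
    {"station": "Camborne", "day": 3, "max_temp": 16.0},
]

# Single-level sort, largest first: a rogue value rises to the top.
sorted_by_temp = sorted(records, key=lambda r: r["max_temp"], reverse=True)

# Multiple level sort: station A to Z, then temperature high to low within a station.
multi_level = sorted(records, key=lambda r: (r["station"], -r["max_temp"]))

# Filter: keep only the rows for one station.
camborne_only = [r for r in records if r["station"] == "Camborne"]

print("Largest reading:", sorted_by_temp[0])
for r in multi_level:
    print(r["station"], r["day"], r["max_temp"])
print("Camborne rows:", len(camborne_only))
```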

14 (Not) Drawing Graphs Using Excel
Drawing meaningful graphs using Excel is very difficult. Show how graphs can be drawn in Excel, or can't be: it's pretty hopeless. Might be OK for line graphs/time series. The add-in can be used, but you need to set up bins first and filter out n/a, as Excel can't cope with this. Excel 2016 has more built-in features such as box-plots.
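The two preparation steps mentioned here, filtering out n/a and setting up bins, can be sketched in Python as follows. The readings and the bin width of 5 are invented assumptions; each bin covers lower <= x < lower + width.

```python
from collections import Counter

# Hypothetical wind speed readings; "n/a" marks a missing recording.
readings = [4, 7, "n/a", 12, 9, 15, 7, 21, "n/a", 11]

# Step 1: filter out anything that is not a number.
numeric = [r for r in readings if isinstance(r, (int, float))]

# Step 2: assign each reading to a bin, labelled by the bin's lower bound.
bin_width = 5  # assumed width, chosen for illustration
counts = Counter((r // bin_width) * bin_width for r in numeric)

# Print a simple text frequency chart, one row per non-empty bin.
for lower in sorted(counts):
    print(f"{lower:>2} to <{lower + bin_width:<2}: {'#' * counts[lower]}")
```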

15 GeoGebra Spreadsheet View!
GeoGebra has its own spreadsheet view: you can paste from an Excel file into it. Copy and paste some maximum temperature data from the Excel file into a couple of columns, e.g. for Camborne 1987 and Heathrow 1987.

16 GeoGebra: One Variable Analysis
Select the first column and click on one variable analysis and then Analyse.

17 GeoGebra: Charts Show the different graphs that can be plotted and how they can be changed by the use of the slider.

18 GeoGebra: Multiple Variable Analysis
Select both columns and then Multiple Variable Analysis. 2 box plots can be drawn side by side. What do these show?
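A box plot is a picture of a five-number summary, so the comparison can also be made numerically. The Python sketch below uses invented temperatures for the two stations. Quartile conventions differ between tools, so the quartiles from `statistics.quantiles` (using its "inclusive" method here, an arbitrary choice) may not match GeoGebra or Excel exactly.

```python
from statistics import median, quantiles


def five_number_summary(values):
    """Return (min, lower quartile, median, upper quartile, max)."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return min(values), q1, median(values), q3, max(values)


# Hypothetical maximum temperatures in degrees C; invented for illustration.
camborne = [13.9, 14.2, 14.8, 15.1, 15.5, 16.0, 16.4]
heathrow = [12.1, 14.0, 15.6, 16.5, 17.2, 18.3, 20.4]

for name, data in [("Camborne", camborne), ("Heathrow", heathrow)]:
    low, q1, med, q3, high = five_number_summary(data)
    print(f"{name}: min={low}, Q1={q1}, median={med}, Q3={q3}, max={high}, "
          f"IQR={round(q3 - q1, 2)}")
```

In these invented figures the second station has both a higher median and a wider interquartile range, the kind of statement students should be able to read off two side-by-side box plots.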

19 GeoGebra: Using the statistics box
By clicking on the sigma x icon we can get the statistics box (demo this).

20 Making Comparisons Using the statistics box, I have summarised the weather in Camborne in October 1987 and in October 2015. What do you notice? Why was Oct 1987 different? What happened?

21 Correlation and Regression
Both Excel and GeoGebra can calculate correlation coefficients. GeoGebra has tools to plot regression ‘lines’. Select some bivariate data (can use the 2 sets of max temps at different locations). Click on the Two Variable Regression Analysis option and also reveal the statistics box. GeoGebra will fit different functions and work out y-values based on the x-values (you can also swap the x and y roles and get the x on y regression line instead of y on x). Students are expected to model with different functions, not just linear. Change the model and explore which curve is the best fit. The SSE (sum of squared errors) is the sum of the squared deviations of the data points from the curve, so a smaller SSE indicates a closer fit.
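To make the correlation coefficient, the y on x regression line and the SSE concrete, here is a minimal Python sketch that computes them by hand with the standard least-squares formulas. It is not how GeoGebra implements its tools, and the paired temperatures are invented for illustration.

```python
from math import sqrt


def fit_line(xs, ys):
    """Least-squares y on x line; returns (slope, intercept, correlation r)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / sqrt(sxx * syy)
    return slope, intercept, r


def sse(xs, ys, predict):
    """Sum of squared deviations of the data from a model's predictions."""
    return sum((y - predict(x)) ** 2 for x, y in zip(xs, ys))


# Hypothetical paired maximum temperatures at two locations (invented).
x_temps = [13.9, 14.2, 14.8, 15.1, 15.5, 16.0, 16.4]
y_temps = [14.6, 15.3, 15.2, 16.5, 16.1, 17.4, 17.2]

slope, intercept, r = fit_line(x_temps, y_temps)
mean_y = sum(y_temps) / len(y_temps)

# Compare two models: the fitted line, and a constant model (always predict the mean).
sse_line = sse(x_temps, y_temps, lambda x: slope * x + intercept)
sse_flat = sse(x_temps, y_temps, lambda x: mean_y)

print(f"y = {slope:.3f}x + {intercept:.3f}, r = {r:.3f}")
print(f"SSE (line) = {sse_line:.3f}, SSE (constant) = {sse_flat:.3f}")
```

Comparing the SSE of two candidate models on the same data is the same judgement students make when they change the model in GeoGebra and look for the best fit.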

22 Workbook Using the dataset of your choice, work through the activities in the workbook. You’ll need: a laptop or tablet; the dataset; a spreadsheet and GeoGebra; the workbook for your dataset. Allow 40 minutes for delegates to work through the activities; float round and answer questions. You may want to refer back to earlier slides.

23 Next Steps… Lead a discussion of what teachers might do from here. What do they need to consider in planning? Hardware? (What sort of devices? Can students use their own?) Software? Costs? (There should be none for software.) How to integrate use of tech with the LDS into their SOW? Most of the stats could be taught through the LDS.

24 Resources Awarding body SAMs Awarding body PD and supporting materials
Integral 2017 has resources to support the teaching of statistics through the LDS, including videos, and will have applets to demonstrate particular statistical concepts. Anything else?

