SADC Course in Statistics 28/03/2017

Objectives
At the end of this session students will be able to:
  Define statistics
  Enter simple datasets once the data entry form is set up
  Recognise the type of each variable in a dataset
  Know some ways to summarise data of each main type
  Explain how statistical investigations deal with variability
  Differentiate between descriptive and inferential statistics

Activities
This introduction
Entry of the data from the CAST survey
Discussion/presentation on statistical concepts
  Using the data entered
  And other case studies
The statistical glossary
  For when you need to remind yourself about terminology

What is statistics - 1?
From the RSS webpage:
1. Statistics changes numbers into information.
2. Statistics is the art and science of deciding what are the appropriate data to collect, deciding how to collect them efficiently, and then using them to give information, answer questions, draw inferences and make decisions.

What is statistics - 2?
3. Statistics is making decisions when there is uncertainty. We have to make decisions all the time, in everyday life and as part of our jobs. Statistics helps us make better decisions.
4. Statistics is NOT just collecting a lot of numbers. It is collecting numbers for a purpose.

What is statistics - 3?
From Wikipedia:
5. Statistics is a mathematical science pertaining to the collection, analysis, interpretation or explanation, and presentation of data.
6. Statistics are used for making informed decisions, and misused for other reasons, in all areas of business and government.

What is statistics - 4?
From the book “Statistics: A guide to the unknown”:
7. Statistics is the science of learning from data.
Question 1 in the practical sheet
From these 7 definitions, in the practical sheet either choose the one you think is most appropriate or make your own:
  a) A one-line definition
  b) A longer definition

Data checking and entry – Question 2
What can we learn from the data you collected?
Work in pairs or small groups
First check the data from the CAST survey
  Check each other's, not your own
  Is it legible?
  Can it be entered into the computer?
Is the response to the open-ended question clear?
  Can the text be simplified?
  If there are many points, ask the respondent to state which are the most important 2 or 3.
Brief notes (as a report) to be made in the exercise sheet
  to establish the data are ready for entry

Data entry into Excel
Just type the number. The label is automatic.

Data entry and checking – Question 3
The data are now entered
  This can be a class exercise on a single computer
Data is entered by someone else for each respondent (never by themselves)
Then it must be checked
  read it out
  check by reading back
Put the record number from the Excel form on your original sheet
  or add your names as another field in the Excel sheet
  Why might it be better to just have a number?

Data entry and checking
You should now have completed question 3 on the practical sheet
How long do you estimate it would take for 1000 records to be entered?

Once the data are entered
Remember: “Statistics is the science of learning from data.”
To learn as much as possible we must have confidence in the data
  so they must be entered and checked well
  This is what we have done in the groups
Now the data are ready for the analysis
Before that, look at some other data sets
  Look for the common points that apply to all the sets
  and look for differences

Types of data - 1
The analysis depends on the type of data
What are the types here?
For questions 1 to 6
  Your answer was one of 5 categories
  e.g. 1: Strongly agree, 2: Agree, … 5: Strongly disagree
These categories have an ordering
  from strongly agree to strongly disagree
This type of data is called
  categorical
  or factor
  or qualitative
With the ordering, they are sometimes called
  ordered categorical data

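As a rough illustration of why the ordering matters, the sketch below counts the answers to one question and reports the categories in their natural order, including any category nobody chose. The list `responses` is made up for the example, and the function name `ordered_counts` is just a label chosen here. Only the labels for codes 1, 2 and 5 appear on the slide, so codes 3 and 4 are left unlabelled.

```python
from collections import Counter

# Labels given on the slide; codes 3 and 4 are not spelled out there,
# so they are deliberately left without a label.
LABELS = {1: "Strongly agree", 2: "Agree", 5: "Strongly disagree"}

# Hypothetical answers to one question, entered as codes 1 to 5.
responses = [2, 1, 2, 3, 5, 2, 1, 4, 2, 3]


def ordered_counts(codes, lowest=1, highest=5):
    """Count each category and return the counts in category order, zeros included."""
    tally = Counter(codes)
    return [(code, tally.get(code, 0)) for code in range(lowest, highest + 1)]


for code, count in ordered_counts(responses):
    print(code, LABELS.get(code, ""), count)
```

Keeping the categories in order, rather than sorted by frequency, is what distinguishes a summary of ordered categorical data from a summary of nominal data.
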
Types of data - 2
The last question in the survey was answered with a sentence or two of written text
  This is also an example of qualitative data
  It is an open-ended response
These data can be reported
  and reporting the sentences can be very useful
  So it is good if they are entered as they stand
To summarise, perhaps the responses can be coded?

Coding open-ended questions – Question 4
This is question 4 in the practical sheet
Looking at the responses in your groups
  Could you code them?
  What different codes would you have?
  How would you enter the codes?
  Might you lose anything by coding?
For a quick analysis
  Could you enter the complete texts
  And analyse the other columns
  And then code later?
What might you lose by coding?

Coding and entering open-ended data
Discuss the suggestions for the codes.
If some points are made by many students then prepare a summary,
  how many as a frequency
  and as a percentage
With the small number of responses
  there is no need to enter them into the computer
  But discuss how it could be done
It is an example of a multiple response question
  because respondents may give no points
  or more than one point
If you ask for the most important observation
  then it becomes a single qualitative response

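One possible way to enter and summarise a multiple response question is sketched below. The codes (`easy_to_use` and so on) and the six respondents are invented for illustration and are not taken from the CAST survey. Each respondent has a list of codes, which may be empty or hold several codes.

```python
from collections import Counter

# Hypothetical coded answers: one list of codes per respondent.
# An empty list means the respondent made no codable point.
coded = [
    ["easy_to_use", "good_examples"],
    ["good_examples"],
    [],
    ["easy_to_use", "too_slow"],
    ["good_examples", "too_slow"],
    ["easy_to_use"],
]


def multiple_response_summary(answers):
    """Return {code: (frequency, percentage of respondents)}."""
    n = len(answers)
    # set() guards against the same code being entered twice for one respondent
    tally = Counter(code for codes in answers for code in set(codes))
    return {code: (freq, 100 * freq / n) for code, freq in tally.items()}


summary = multiple_response_summary(coded)
for code, (freq, pct) in sorted(summary.items()):
    print(f"{code}: {freq} of {len(coded)} respondents ({pct:.0f}%)")
```

Because one respondent can make several points, the percentages are out of the number of respondents and can add up to more than 100.
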
Other data sets
Zambia rainfall data
Tanzania agriculture survey
Look for the layout of the data
  is it the same as for the simple CAST survey?
Look for the types of data
  Which are the qualitative variables?
    are they ordered?
  Which are the quantitative variables?
    which of them are discrete?
    and which are continuous?
    have any been coded to become qualitative?

Discussion - Question 5
The layout of the data
  Was always the same!
  In a rectangle
Each row is a record
  There are as many records (rows of data) as there were respondents, or students, or units
Each column is a variable
  Variables can be qualitative
  or they can be quantitative
Discuss which type they are
  For each data set complete the tables in the practical sheet, question 5

Qualitative variables
They are categorical
They may be nominal (which implies there is no ordering)
  Give some examples from the Tanzania survey
They may be ordered, as in the CAST survey
  Give an ordered example from the Tanzania survey

Examples of analysis – Tanzania survey – Question 6
There are 3223 records,
  but just take the 18 you can see in the figure
Count the values for Q0123 – head of household
  There were 6 Females and 12 Males
  So 2/3 of the 18 households had a male head
  That’s about 67%
  but percentages are a bit misleading with so few numbers
Now you give a similar summary for Q021
  type of agricultural household
And also Q3464
  how often did the household have food problems

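The count, proportion and percentage for head of household can be checked with a few lines of code. This is only a sketch: the counts (6 Female, 12 Male) come from the slide, but the order of the 18 individual values is not given, so the list `head` simply puts all the Females first.

```python
from collections import Counter

# Q0123, head of household, for the 18 visible records.
# Counts are from the slide; the order of the values is arbitrary here.
head = ["Female"] * 6 + ["Male"] * 12

counts = Counter(head)
n = len(head)

for category, count in counts.items():
    proportion = count / n
    print(f"{category}: {count} of {n}, proportion {proportion:.2f}, {100 * proportion:.0f}%")
```

The same three lines of summary (count, proportion, percentage) apply to any qualitative variable, such as Q021 or Q3464.
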
Add a simple chart
A simple chart can also be sketched
  Here is one by Excel
  But a sketch can be “by hand”
Excel will be used for these tasks from Session 4

Examples of analysis – CAST survey – Question 7
Do a similar analysis of the CAST survey
To make it quick
  each group could initially process just one question
  then report the results to the class
Include a hand-drawn chart
  Sketch a simple bar chart
  and include the numbers on the chart
  as shown earlier

Quantitative variables – Question 8
They may be discrete (whole numbers)
  Give examples from the climatic data
  And the Tanzania survey
They may be (conceptually) continuous
  Give examples from the data sets
Also they may be coded into (ordered) categories
  Give an example from the Tanzania survey

Examples of analysis – Tanzania survey
An analysis of the 18 values in Q3462
  The number of times meat was eaten last week
  minimum = 0
  maximum = 5
  adding the values: total = 31,
  so the mean = 31/18, about 1.7 times per week
Note: the mean does not have to be an integer
  just because the individual values are whole numbers
Repeat this analysis
  for Q3463 – times fish eaten last week
  and HHsize

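The arithmetic above can be reproduced as follows. The slide gives only the summaries (18 values, minimum 0, maximum 5, total 31), not the individual values, so the list `meat` below is hypothetical and was chosen only so that it matches those summaries.

```python
# Hypothetical values for Q3462 (times meat eaten last week), chosen to match
# the summaries on the slide: 18 values, minimum 0, maximum 5, total 31.
meat = [0, 0, 0, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 3, 3, 3, 5]

n = len(meat)
minimum = min(meat)
maximum = max(meat)
total = sum(meat)
mean = total / n  # whole-number data, but the mean need not be a whole number

print(f"n = {n}, minimum = {minimum}, maximum = {maximum}, total = {total}")
print(f"mean = {total}/{n} = {mean:.1f} times per week")
```

The same steps can be repeated for Q3463 and HHsize once their values are read from the figure.
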
Data analysis
As the layout of the data is always the same
  Once you know how to analyse one data set
  You will have the principles to analyse them all
  And we have just done one analysis!
You have seen that
  The appropriate analysis depends on the type of data
So what are the principles
  of analysing (summarising) data
  of the different types?

The methods of analysis
How many?
  are questions for qualitative variables
  for example the CAST survey, the Tanzania survey
  You used summaries
  Like counts, or proportions or percentages
How large? How variable?
  are questions for quantitative variables
  for example the climatic data or the Tanzania survey
  We used summaries
  Like averages, extremes and measures of spread

A toolkit for analysis
Different types of graph are also used
Qualitative data
  “how many”
Quantitative data
  how large
  how variable

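The idea that the summary depends on the type of variable can be sketched as a small function. This is one possible arrangement, not a prescribed method: the name `summarise` is invented here, it assumes qualitative values are stored as text and quantitative values as numbers, and it uses the standard deviation as the measure of spread, which is one choice among several. The example data are made up.

```python
from collections import Counter
from statistics import mean, stdev


def summarise(values):
    """Pick a summary to suit the type of variable.

    Assumption: text values are qualitative ("how many?"),
    numeric values are quantitative ("how large? how variable?").
    """
    if all(isinstance(v, str) for v in values):
        n = len(values)
        return {cat: (count, 100 * count / n) for cat, count in Counter(values).items()}
    return {
        "minimum": min(values),
        "maximum": max(values),
        "mean": mean(values),
        "standard deviation": stdev(values),
    }


# Hypothetical data, for illustration only
print(summarise(["Agree", "Agree", "Strongly agree", "Agree"]))
print(summarise([1, 2, 3, 4, 5]))
```

A qualitative variable that has been entered as numeric codes would need to be converted to text (or handled separately) before using a function like this, otherwise it would be averaged as if it were a measurement.
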
Statistics and variation
In the CAST survey - why not just ask one student?
In the climatic data - why not just use one year?
In the agriculture survey - why not just use one household?
Because there is variation between the responses
Remember this definition?
  “Statistics is making decisions when there is uncertainty.”

Variation is everywhere!
In the book “Statistics: A guide to the unknown”:
  “Variation is everywhere.
  Individuals vary.
  Repeated measurements on the same individual vary.
  The science of statistics provides tools for dealing with variation.”
So statistics is concerned with making sense from data, when there is variation

Fighting the curse of variation
To do good statistics you must
  tame variation
  fight the curse of variation
You have 2 main strategies for overcoming variation
1. Take enough observations
  In the Tanzania survey there were 3223 households just from this one region
2. Measure characteristics that explain variation
  Variation itself is not necessarily the problem
  Variation you do not understand is the problem

An example: explaining variation
Take the CAST survey
Add a new record for an imaginary student
  Make it VERY DIFFERENT to the existing records
  So if most students were positive about CAST
  Then make this record very negative, etc
You have added variation
Now what could you (should you) have measured
  to explain this variation?

What you could have measured
This little survey only asked about CAST
It did not ask about you, e.g.
  male/female
  experience
  age
  computer access
  etc
These measurements could help
  to understand the difference with this new student
The Tanzania survey also asked about
  Education
  Possessions, etc
Why? To be able to understand/explain variation

Analysis and variation together
For statistical analysis you have:
  summarised columns of data
  i.e. summarised individual variables
  You did this for qualitative and quantitative variables
To fight the curse of variation
  You take measurements
  So you add to the rows of data
  That helps you to explain the variation
That’s statistics for you!
  You analyse the columns, i.e. the variables
  And you understand variability by looking at the rows

Types of statistics
Wikipedia says roughly:
  Statistical methods can be used to summarize or describe a collection of data; this is called descriptive statistics.
  In addition, patterns in the data may be modelled and then used to draw inferences about the process or population being studied; this is called inferential statistics.
  Both descriptive and inferential statistics comprise applied statistics.

Descriptive and inferential statistics
We have just done descriptive statistics
  We will only do descriptive statistics in this module
The sample in the Tanzania agricultural survey was 3223 households
  That’s just under 1% of the households in the region
  See the column called WT – with values like 137
  So each observation “represents” 137 households
But with such a large sample
  The inferences for the whole region will be quite precise
So most of what we need now is descriptive tools
  In the Higher level modules we add ideas of inferential statistics

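The link between the weight and the "just under 1%" can be worked through as below. This is a back-of-envelope sketch that assumes every household has the same weight of 137; the slide only says the WT column has values like 137, so the real weights probably vary and the true total will differ.

```python
sample_size = 3223  # households in the sample (from the slide)
weight = 137        # a typical WT value (from the slide); assumed constant here

# If each sampled household represents `weight` households in the region:
estimated_households = sample_size * weight
sampling_fraction = 1 / weight

print(f"Estimated households in the region: about {estimated_households}")
print(f"Sampling fraction: 1/{weight} = {100 * sampling_fraction:.2f}%")
```

With real data the estimate for the region would be the sum of the WT column, rather than the sample size multiplied by one typical weight.
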
Glossary of statistical terms
Each subject becomes easier when you understand the terms
A glossary is supplied
  Called the SSC Statistical Glossary
  It explains most of the terms
  For the 3 levels of this course
  So some terms may be new to you now
  An example is on the next slide
You can print the glossary if you wish
  But it is good to look on-line
  Then all the terms in blue are links
  So you can easily move about in the document

Example from the glossary
Descriptive statistics
If you have a large set of data, then descriptive statistics provides graphical (e.g. boxplots) and numerical (e.g. summary tables, means, quartiles) ways to make sense of the data.
The branch of statistics devoted to the exploration, summary and presentation of data is called descriptive statistics.
If you need to do more than descriptive summaries and presentations, the aim is to use the data to make inferences about some larger population.
Inferential statistics is the branch of statistics devoted to making generalizations.

Learning objectives
  Define statistics
  Enter simple datasets once the data entry form is set up
  Recognise the type of each variable in a dataset
  Know some ways to summarise data of each main type
  Explain how statistical investigations deal with variability
  Differentiate between descriptive and inferential statistics

The end
Next we move to the use of Excel
  To produce the tables and graphs
  So you can analyse all 3223 records – not just 18