Presentation on theme: "The W’s of Data. Does data have to be numbers? It can be, but it doesn’t have to be. Without context, it’s useless! Consider 17, 21, 44, and 76. Are." Presentation transcript:
The Five W’s of Data. Answering the Five W’s of Data provides the context of the data: Who, What, When, Where, Why, and, if possible, How.
Who: Rows of data correspond to individual cases about whom (or about which, if they are not people) we record some characteristics. Respondents – individuals who answer a survey. Subjects or participants – people on whom we experiment. Experimental units – inanimate subjects for experiments. Data values may also be called observations, without being clear about the Who.
What: Variables – the characteristics recorded about each individual. Variables are usually recorded in the columns of a data table. Variables identify What has been measured. They may seem simple, but think! For categorical variables, it’s natural to count how many cases belong in each category. Quantitative variables have measurement units; the units tell how each value has been measured (its scale).
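As a rough illustration of the Who and the What, the sketch below stores a tiny data table as a list of rows. The names, variables, and values are made up for illustration and do not come from the presentation.

```python
# Hypothetical data table: each row is one case (the Who),
# and each key is one variable (the What).
rows = [
    {"name": "Ana", "year_in_school": "junior", "height_cm": 170},
    {"name": "Ben", "year_in_school": "senior", "height_cm": 182},
    {"name": "Cam", "year_in_school": "junior", "height_cm": 165},
]

# The Who: how many individual cases were recorded.
n_cases = len(rows)

# The What: the variables recorded about each case.
variables = list(rows[0].keys())

# A "column" is every recorded value of a single variable.
heights = [row["height_cm"] for row in rows]

print(n_cases, variables, heights)
```

Here `height_cm` carries its unit in the variable name, which is one simple way to keep the measurement scale attached to the values.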
Variables: Categorical variables name categories and answer questions about how cases fall into those categories. They are also called qualitative variables. Ex. Gender, year in school, nationality, etc. Quantitative variables answer questions about the quantity of what is measured. Ex. Height, weight, income, etc. Just because the data are numbers does not make them quantitative. Ex. Zip codes.
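The zip code point can be sketched in a few lines. The zip codes and heights below are hypothetical. Counting the zip codes gives a sensible summary, while averaging them produces a number that describes nothing.

```python
from collections import Counter
from statistics import mean

# Hypothetical values for four cases.
zip_codes = ["10001", "10001", "60614", "94110"]  # digits, but really labels
heights_cm = [170, 182, 165, 174]                 # a measured quantity

# Categorical use: count how many cases fall in each category.
zip_counts = Counter(zip_codes)

# Quantitative use: arithmetic on the values is meaningful.
mean_height = mean(heights_cm)

# Arithmetic on labels runs without error but means nothing:
# no one lives in "zip code 43681.5".
meaningless = mean(int(z) for z in zip_codes)

print(zip_counts["10001"], mean_height, meaningless)
```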
Why: It’s the questions we ask of a variable that shape how we think about it. Ex. An end-of-class survey asks, “How valuable do you think this course will be to you?” 1 = worthless, 2 = slightly, 3 = middling, 4 = reasonably, 5 = invaluable. Is the educational value categorical or quantitative?
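One way to see why the answer depends on the question being asked: the same responses can be summarized either way. The responses below are hypothetical, and treating the codes as quantities assumes the steps between ratings are equally spaced, which the survey itself does not guarantee.

```python
from collections import Counter
from statistics import mean

# Rating labels as given on the survey.
labels = {1: "worthless", 2: "slightly", 3: "middling",
          4: "reasonably", 5: "invaluable"}

# Hypothetical responses from eight students.
responses = [4, 5, 3, 4, 2, 5, 4, 5]

# Treated as categorical: how many students chose each rating?
counts_by_label = {labels[code]: n for code, n in Counter(responses).items()}

# Treated as quantitative: what is the average rating?
# (Only sensible if the 1-5 codes are assumed to be evenly spaced.)
average_rating = mean(responses)

print(counts_by_label, average_rating)
```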
From the data sheet: Are the variables qualitative or quantitative? Why?
Counts count: When Amazon considers offering free shipping, they might first analyze how purchases are shipped. Counting summarizes the categorical variable, shipping method. We also use counts to measure quantities, such as the number of classes you are taking or how many songs you own. There are two ways to use counts. The first way is to count the cases in each category of a categorical variable. The category labels are the What and the individuals counted are the Who. The counts themselves are not data, but they are something we summarize about the data.
Example: Back to Amazon’s shipping. What is the categorical variable? What? Who? Why?
Shipping Method    No. of Purchases
Ground             20,345
Second-day         7,890
Overnight          5,432
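A minimal sketch of summarizing this table, using the counts from the slide. The variable names are illustrative choices. Each count becomes a share of all purchases, which is often easier to compare than the raw totals.

```python
# Counts from the slide: the categories of the variable "shipping method"
# (the What) and the number of purchases (the Who) in each.
purchases_by_method = {
    "Ground": 20345,
    "Second-day": 7890,
    "Overnight": 5432,
}

# Total number of purchases counted.
total_purchases = sum(purchases_by_method.values())

# Share of purchases in each category, rounded to three decimals.
shares = {
    method: round(count / total_purchases, 3)
    for method, count in purchases_by_method.items()
}

for method, share in shares.items():
    print(f"{method}: {share:.1%}")
```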
The second way is when the focus is on the number of something, which is measured by counting. Ex. Amazon might track the growth in the number of teenage customers each month to forecast CD sales. What? Who? Units? Why? Is teen a category? Is it a quantitative variable?
Month       No. of Teenage Customers
January     123,456
February    234,567
March       345,678
April       456,789
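Here the count is itself a quantity, with customers as its unit, so arithmetic on it makes sense. The sketch below computes month-to-month growth from the slide's figures. The variable names are illustrative.

```python
# Monthly counts from the slide; the unit is "teenage customers".
teen_customers = [
    ("January", 123456),
    ("February", 234567),
    ("March", 345678),
    ("April", 456789),
]

# Growth = this month's count minus the previous month's count.
growth = [
    (month, count - previous_count)
    for (_, previous_count), (month, count)
    in zip(teen_customers, teen_customers[1:])
]

for month, change in growth:
    print(f"{month}: +{change:,} customers")
```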
Identifiers: Is your student ID number a quantitative variable? Why? Other examples of identifiers include UPS tracking numbers, Social Security numbers, and driver’s license numbers. Identifier variables do not tell us anything useful about their categories because there is exactly one individual in each. They are used to: combine data from different sources, protect confidentiality, and provide unique labels.
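Combining data from different sources might look like the sketch below. The student IDs, courses, and grades are hypothetical. The ID does no arithmetic work; it only matches each case in one source to the same case in the other.

```python
# Two hypothetical sources, both keyed by student ID.
enrollment = {"S001": "Statistics", "S002": "Biology", "S003": "Statistics"}
grades = {"S001": 91, "S003": 78, "S004": 85}

# Keep only the students who appear in both sources.
shared_ids = sorted(enrollment.keys() & grades.keys())

# Build one combined record per student, matched on the identifier.
combined = {
    student_id: {
        "course": enrollment[student_id],
        "grade": grades[student_id],
    }
    for student_id in shared_ids
}

print(combined)
```

Replacing names with IDs like these is also how an identifier can help protect confidentiality, since the combined table no longer needs to carry anything that names the person.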
We must know Who, What, and Why to analyze data, but to understand more we would also like to know When, Where, and How. When can make a difference in the data. Example: the number of women with jobs outside the home in 1900 and the number of women with jobs outside the home in 2000. Where can make a difference in the data. Example: the number of high school students participating in ice hockey in Florida and the number participating in ice hockey in Minnesota. We need more information…
How data are collected matters: surveys, interviews, observation, etc. How could surveys be flawed, especially internet surveys?
Example: Medical researchers at a large city hospital investigating the impact of prenatal care on newborn health collected data from 882 births during 1998-2000. They kept track of the mother’s age, the number of weeks the pregnancy lasted, the type of birth (cesarean, induced, natural), the level of prenatal care the mother had (none, minimal, adequate), the birth weight and sex of the baby, and whether the baby exhibited health problems (none, minor, major). Identify the W’s, name the variables, specify for each variable whether its use indicates it should be treated as categorical or quantitative, and identify the units in which it was measured or note that they were not provided.