Units of Analysis The Basics. Outline An illustration Definitions Elements of the unit of analysis Complexity Data structure.

Units of Analysis The Basics Chuck Humphrey ACCOLEDS/DLI Training December, 2001.
Presentation transcript:

Units of Analysis The Basics

Outline
– An illustration
– Definitions
– Elements of the unit of analysis
– Complexity
– Data structure

An Illustration A group of students in an econometrics class were sent to the Data Library to find some data for an assignment.

An Illustration A typical request was like this one. “I want to look at crime rates and a person’s level of education.”

An Illustration This request raises problems.
– crime rates are usually associated with spatial units or a time series
– a person’s education is an attribute of individuals

An Illustration What are we looking for? Does the student want crime rates and the percentage of the population with certain education levels for specific cities? This would be data aggregated over geography.

An Illustration What are we looking for? Does the student want the crime rate for one city over time, such as the number of homicides in Edmonton over the past 40 years? This would be data aggregated over time.

An Illustration What are we looking for? Does the student want the education level of criminals? This would involve a special subpopulation, individuals convicted of crimes, and would call for a microdata file of criminals.

An Illustration What are we looking for? Does the student want the education level of victims of crimes? This would involve a special subpopulation, individuals who were victimized, and would call for a microdata file of victims.

An Illustration Looking at crime rates and level of education can differ depending upon the unit of analysis.
– individuals
– geographic areas
– changes over time

An Illustration After being walked through these steps, the student chose to build a model predicting income on the basis of highest educational attainment and a few other variables from the Census individual-level public use microdata file. He completely abandoned his interest in crime!

An Illustration Unfortunately, the student’s initial request not only failed to specify a clear unit of analysis, it included a mix of different units, which suggests that the concept was not understood.

The Point of the Illustration The unit of analysis is fundamental to the data and statistical reference interview. Early identification of the unit of analysis will help focus a search on (a) statistics, (b) aggregate data, or (c) microdata.

The Point of the Illustration Furthermore, the unit of analysis is fundamental to secondary data analysis. It may be that knowledge of the unit of analysis is even more crucial in secondary analysis than in primary analysis, where the unit is implicit in the sample design, if not otherwise explicit.

The Point of the Illustration Finally, the unit of analysis is a fundamental characteristic of statistical data structures, which are the formal ways in which data are organized for processing.

Where We’re Headed Let’s look closer at the concepts behind the unit of analysis and then we’ll look at how these concepts end up being converted into data structures.

Definitions The unit of analysis is the basic entity or object
– about which generalizations are to be made based on an analysis, and
– for which data have been collected

Definitions How does the unit of analysis relate to the unit of observation? The unit of observation is the entity in primary research that is observed and about which information is systematically collected.

Definitions The unit of observation and the unit of analysis are the same when the generalizations being made from a statistical analysis are attributed to the unit of observation.

Definitions
– Unit of Observation: in original data collections, the unit of observation is determined by the method by which observations are selected
– Unit of Analysis: in secondary analysis, the unit of analysis is determined by an interest in exploring or explaining a specific phenomenon
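The distinction can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides: the records, the field names (`person_id`, `city`, `has_degree`), and the function name are all invented. People are the unit of observation, and aggregating them makes the city the unit of analysis.

```python
from collections import defaultdict

# Hypothetical person-level records: the unit of observation is the person.
# All values are invented for illustration.
persons = [
    {"person_id": 1, "city": "Edmonton", "has_degree": True},
    {"person_id": 2, "city": "Edmonton", "has_degree": False},
    {"person_id": 3, "city": "Calgary", "has_degree": True},
    {"person_id": 4, "city": "Calgary", "has_degree": True},
]

def share_with_degree_by_city(records):
    """Aggregate person records so that the city becomes the unit of analysis."""
    totals = defaultdict(int)
    with_degree = defaultdict(int)
    for rec in records:
        totals[rec["city"]] += 1
        with_degree[rec["city"]] += int(rec["has_degree"])
    return {city: with_degree[city] / totals[city] for city in totals}

# One value per city: generalizations now apply to cities, not to persons.
city_level = share_with_degree_by_city(persons)
```

Analysing `persons` directly would keep the two units identical; analysing `city_level` separates them.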

Identifying a Unit of Analysis As hinted in the earlier illustration, the unit of analysis is shaped by three attributes:
– social entities
– time
– space

Research Outputs Let’s begin by looking at a finished product to examine these attributes more closely. We’ll use a table from the Health Indicators Database about suicide.

Social Characteristics Emphasized Geography and Time held constant

Ordered by Time Geography and Age held constant

Geography Emphasized Time and Age held constant

Social Entities
– observations of a single social entity, such as a person or an institution
– observations of multiple entities with a defined relationship, such as family, employer-employee

Social Phenomena
– transactional observations that are the result of actions among entities, such as labour strikes or international conflicts, including wars

Time
– observations made at one point in time; commonly referred to as a cross-sectional study

Time
– observations made at multiple points in time
– the data may be organized by time; commonly referred to as a time series
– time may structure some form of repeated measures of content or subjects

Space
– observations made within a specific spatial area
– observations made within a hierarchy of spatial areas

Substituting Units There may be requests for which data for a desired unit of analysis can’t be delivered but for which data are available summarized over one of the other attributes of the unit of analysis.

Substituting Units Example:
– Request for firm-level data for NAICS 312 Beverage and Tobacco Product Manufacturing
– Ideal source: microdata on companies from the Canadian Census of Manufacturers
– No access to enterprise microdata

Substituting Units Example: NAICS 312
– Alternatives: are there aggregate data summarizing the firms within NAICS 312?
– Possibilities: summaries over time (time series) or geography (small-area business statistics)
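To make the substitution concrete, here is a small sketch. The firm records, field names, and the `summarize` helper are hypothetical and the numbers are invented; they do not come from any Statistics Canada file. The same firm-level records that cannot be released are summarized once over time and once over geography.

```python
from collections import Counter

# Hypothetical firm-level microdata (the kind a producer holds but may not
# release). Values are invented for illustration only.
firms = [
    {"firm": "A", "year": 1999, "region": "Alberta", "employees": 40},
    {"firm": "B", "year": 1999, "region": "Ontario", "employees": 120},
    {"firm": "A", "year": 2000, "region": "Alberta", "employees": 45},
    {"firm": "B", "year": 2000, "region": "Ontario", "employees": 110},
]

def summarize(records, by):
    """Sum employees over firms, keeping only the attribute named in `by`."""
    totals = Counter()
    for rec in records:
        totals[rec[by]] += rec["employees"]
    return dict(totals)

time_series = summarize(firms, by="year")      # unit of analysis: year
by_geography = summarize(firms, by="region")   # unit of analysis: region
```

Either summary answers a different question than the firm-level file would, which is why the substitution has to be discussed with the requester.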

Complexity Complexity occurs when multiple entities are introduced within the same study. Examples:
– parent, child, teacher
– person, activities, time
– person, cars, trips

Complexity Complexity can arise within one of the attributes just discussed
– a study of parents, children, and teachers, which are all social units
or between attributes
– a study of people, their daily activities, and the length of time of each activity

Complexity Complexity is often represented in a hierarchy when the units can be grouped or nested within one another. For example, children may be grouped with their parents.

Complexity Children grouped (nested) with Parents. [Diagram: Parent 1 and Parent 2, with Child 1, Child 2, and Child 3 nested under them]

Complexity Parents and their children may be grouped into families and families grouped into households.
Household 1
  Family A
    Person i
    Person ii
Household 2
  Family A
    Person i
    Person ii
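The nesting on this slide can be written down directly as a nested mapping. The sketch below uses the slide's own labels; the `flatten` helper is a hypothetical name, added only to show that each person carries the family and household it sits in.

```python
# Nested units, using the labels from the slide: persons within
# families within households.
households = {
    "Household 1": {"Family A": ["Person i", "Person ii"]},
    "Household 2": {"Family A": ["Person i", "Person ii"]},
}

def flatten(nested):
    """Return one row per person, carrying the household and family it is nested in."""
    rows = []
    for household, families in nested.items():
        for family, persons in families.items():
            for person in persons:
                rows.append((household, family, person))
    return rows

person_rows = flatten(households)
```

Note that "Family A" and "Person i" repeat across households, so a person is only identified uniquely by the full path of household, family, and person.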

Complexity Complexity may also be represented by combinations of entities among units. Those entities that are associated with one another are combined and those that aren’t associated, aren’t combined.

Complexity These combinations are often described as having been crossed. For example, activities may be crossed with people.

Complexity Activities crossed with people. [Diagram: Activities 1 to 6 crossed (X) with Person A and Person B, yielding the combinations below]
– Person A: Activity 3, Activity 6
– Person B: Activity 1, Activity 5
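Crossing can be sketched as taking every person-activity pair and keeping only the pairs that are actually associated. The associations below are read off the slide's diagram; the variable names are illustrative.

```python
from itertools import product

persons = ["Person A", "Person B"]
activities = [f"Activity {n}" for n in range(1, 7)]

# Associations as drawn on the slide.
associated = {
    ("Person A", "Activity 3"),
    ("Person A", "Activity 6"),
    ("Person B", "Activity 1"),
    ("Person B", "Activity 5"),
}

# Crossing: consider every person-activity pair, keep those that are associated.
crossed = [pair for pair in product(persons, activities) if pair in associated]
```

Of the twelve possible pairs, only four survive, and Activities 2 and 4 appear in no combination at all.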

Complexity Up to this point, complexity has been described conceptually. We’ve mentioned how complexity can be created through multiple units of analysis and the ways in which these units are related.

Complexity Complexity also manifests itself structurally through the ways in which data are organized to represent the nesting or crossing of multiple units of analysis.

Thinking about Units of Analysis
– Conceptually: What is the content? This is what we’ve been reviewing up to this point.
– Structurally: How is this complexity organized? This takes us to a discussion about data structure.

Statistical Data Structure Let’s review basic data structure. The unit of analysis defines the underlying structure of a data file.

Statistical Data Structure This structure consists of a series of rows, with each row containing the data for one member of the unit of analysis. This simple structure is known as the flat, rectangular data matrix.

Statistical Data Structure [Diagram: a column of rows labelled Case 1, Case 2, Case 3, ..., Case n-1, Case n]

Statistical Data Structure All of the information collected for each member of the unit of analysis is organized in fixed locations in the file called fields or variables.

Statistical Data Structure [Diagram: a grid with rows Case 1, Case 2, Case 3, ..., Case n-1, Case n and columns Field 1, Field 2, Field 3, ..., Field k-1, Field k]
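A rough sketch of the flat, rectangular structure: every row is one case, and every field sits at the same column positions in every row. The layout, field names, and raw lines below are assumptions made up for illustration and do not describe any real file.

```python
# Hypothetical layout: (field name, start column, end column), 0-based,
# end exclusive. Names and widths are assumptions for illustration.
LAYOUT = [("case_id", 0, 3), ("age", 3, 5), ("sex", 5, 6)]

# Invented raw records, one case per line.
raw_lines = [
    "001341",
    "002272",
    "003581",
]

def parse_record(line, layout=LAYOUT):
    """Slice one row into named fields using fixed column positions."""
    return {name: line[start:end] for name, start, end in layout}

# n cases by k fields: the rectangular data matrix.
cases = [parse_record(line) for line in raw_lines]
```

Because the positions are fixed, the same layout applies to Case 1 and Case n alike, which is what makes the matrix rectangular.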

Statistical Data Structure This structure looks like the grid of a spreadsheet. However, there is one very important difference between a statistical data structure and a spreadsheet.

Statistical Data Structure The spreadsheet is organized around individual cells, while the statistical data structure is organized around the rows.

Statistical Data Structure [Diagram: a spreadsheet grid addressed by individual cells, for example Cell B2, Cell E3, Cell C5, and Cell F7]

Statistical Data Structure [Diagram: a data matrix addressed by rows, for example Row 1, Row 3, and Row k-1]
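One way to picture the difference, as a sketch with invented values: a spreadsheet hands back whatever sits at a cell address, while a statistical structure treats each row as a case and computes across cases.

```python
# Spreadsheet view: each value is addressed by its own cell reference.
# Cell contents are invented for illustration.
sheet = {"A1": "case_id", "B1": "age", "A2": 1, "B2": 34, "A3": 2, "B3": 27}

# Statistical view: each row is one case; fields are looked up within the row.
rows = [
    {"case_id": 1, "age": 34},
    {"case_id": 2, "age": 27},
]

cell_value = sheet["B2"]                             # one cell, no notion of a case
mean_age = sum(r["age"] for r in rows) / len(rows)   # computed across cases
```

Nothing in `sheet` says that B2 and A2 belong to the same case; in `rows` that link is built into the structure.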

Statistical Data Structure The next slide presents the way that this simple statistical data structure appears in SPSS.

[SPSS screenshots: the data view, with Row 1, Row 8, Row 15, and Field 8 labelled]

Person: GSS 10 Main

RECID WGHTFNL PROV DVSEX DVAGECAP

Adding Complexity to Data Structurally, three methods are used:
– hierarchical: different record types for each unit of analysis, each with a different record layout, in the same file
– relational: 1 to n relations identified through keys or linkage variables in multiple files
– compound records: combination of units crossed on a single record

Complex Data Structure Hierarchical Data Structure
Household 1
  Person 1
  Person 2
Household 2
Household 3
  Person 1
  Person 2
  Person 3
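A hierarchical file of this kind can be read by attaching each person record to the household record that precedes it. The sketch below follows the record order on the slide; the one-letter record-type prefix and the function name are assumptions, since real files define their own record-type codes.

```python
# Record sequence from the slide, encoded with a hypothetical one-letter
# record-type prefix: "H" for household records, "P" for person records.
records = ["H1", "P1", "P2", "H2", "H3", "P1", "P2", "P3"]

def group_by_household(lines):
    """Attach each person record to the most recent household record."""
    households = []
    for line in lines:
        record_type, ident = line[0], line[1:]
        if record_type == "H":
            households.append({"household": ident, "persons": []})
        elif record_type == "P":
            if not households:
                raise ValueError("person record before any household record")
            households[-1]["persons"].append(ident)
        else:
            raise ValueError(f"unknown record type: {record_type!r}")
    return households

grouped = group_by_household(records)
```

The nesting is carried only by record order, so sorting or filtering such a file without regard to record type would break the link between persons and households.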

Geography: 1991 Census N9101 Population 15 years and over by age groups (17) and marital status (6a), showing labour force activity (8) and sex (3)
[Screenshot: records labelled RM and T, with geography fields PROV, FED, EA, CD, CSD, CSD Type, CCS, CMA/CA]

Complex Data Structure Relational Data Structure (One to Many)
– records in the first file: R1, R2, R3, R4, R5
– linked records in the second file: R1 C1, R1 C2, R1 C3, R1 C4, R3 C1, R3 C2, R4 C1, R5 C1, R5 C2
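The one-to-many relation on this slide can be sketched with two lists linked by a key. The R and C labels come from the slide; treating R as the key held in both files, and the name `link`, are assumptions made for the illustration.

```python
# First file: one record per R unit (labels from the slide).
parents = ["R1", "R2", "R3", "R4", "R5"]

# Second file: each record carries the linking key plus its own identifier.
children = [
    ("R1", "C1"), ("R1", "C2"), ("R1", "C3"), ("R1", "C4"),
    ("R3", "C1"), ("R3", "C2"),
    ("R4", "C1"),
    ("R5", "C1"), ("R5", "C2"),
]

def link(parent_keys, child_records):
    """Build the one-to-many relation: parent key -> list of child ids."""
    linked = {key: [] for key in parent_keys}
    for parent_key, child_id in child_records:
        linked[parent_key].append(child_id)
    return linked

relation = link(parents, children)
```

R2 ends up with an empty list, which shows the "many" side can also be zero.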

GSS 10 Main, GSS 10 Union

Complex Data Structure Crossed Data Structure
– R1 x T1 x A1
– R1 x T2 x A4
– R1 x T3 x A7
– R1 x T4 x A3
– R1 x T4 x A1
– R2 x T1 x A2
– R2 x T2 x A9
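As a sketch, the compound records on this slide can be held as tuples, one per crossing. Reading R as a respondent, T as a time slot, and A as an activity is an assumption; the slide does not define the letters.

```python
from collections import Counter

# Compound records from the slide, one tuple per R x T x A crossing.
# Interpreting R, T, and A as respondent, time slot, and activity is an
# assumption; the slide does not define them.
episodes = [
    ("R1", "T1", "A1"), ("R1", "T2", "A4"), ("R1", "T3", "A7"),
    ("R1", "T4", "A3"), ("R1", "T4", "A1"),
    ("R2", "T1", "A2"), ("R2", "T2", "A9"),
]

# Each row is one crossing; counting rows per R shifts the unit of
# analysis from the crossing to the R unit.
rows_per_r = Counter(r for r, _t, _a in episodes)
```

R1 appears twice at T4, so no single component identifies a record; only the full combination does.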

GSS 2 Episode

SEQNUM DDAY NO_EPISO ACT_CODE

Complex Data Structure We’ll now use this background to look at the files in the Canadian Travel Survey.