Units of Analysis The Basics.

Units of Analysis The Basics Chuck Humphrey ACCOLEDS/DLI Training December, 2001.
Presentation transcript:

Units of Analysis The Basics

Outline: An illustration; Definitions; Elements of the unit of analysis; Complexity; Data structure

An Illustration A group of students in an econometrics class were sent to the Data Library to find some data for an assignment.

An Illustration A typical request was like this one. “I want to look at crime rates and a person’s level of education.”

An Illustration This request raises problems: crime rates are usually associated with spatial units or a time series, whereas a person’s education is an attribute of individuals.

An Illustration What are we looking for? Does the student want crime rates and the percentage of the population with certain education levels for specific cities? This would be data aggregated over geography.

An Illustration What are we looking for? Does the student want the crime rate for one city over time, such as the number of homicides in Edmonton over the past 40 years? This would be data aggregated over time.

An Illustration What are we looking for? Does the student want the education level of criminals? This would be a special subpopulation of individuals convicted of crimes and would consist of a microdata file of criminals.

An Illustration What are we looking for? Does the student want the education level of victims of crimes? This would be a special subpopulation of individuals who were victimized and would consist of a microdata file of victims.

An Illustration Looking at crime rates and level of education can differ depending upon the unit of analysis: individuals, geographic areas, or changes over time.

An Illustration After being walked through these steps, the student chose to build a model predicting income on the basis of highest educational attainment and a few other variables from the Census individual-level public use microdata file. He completely abandoned his interest in crime!

An Illustration Unfortunately, the student’s initial request not only failed to specify a clear unit of analysis but also mixed different units, which suggests that the concept was not understood.

The Point of the Illustration The unit of analysis is fundamental to the data reference interview. Early identification of the unit of analysis will help focus a search on statistics, aggregate data, or microdata.

The Point of the Illustration Furthermore, the unit of analysis is fundamental to secondary data analysis. It may be that knowledge of the unit of analysis is even more crucial in secondary analysis than in primary analysis, where the unit is implicit in the sample design, if not otherwise explicit.

The Point of the Illustration Finally, the unit of analysis is a fundamental characteristic of statistical data structures, which are the formal ways in which data are organized for processing.

Definitions The unit of analysis is the basic entity or object about which generalizations are to be made based on an analysis, and for which data have been collected.

Definitions How does the unit of analysis relate to the unit of observation? The unit of observation is the entity in primary research that is observed and about which information is systematically collected.

Definitions The unit of observation and the unit of analysis are the same when the generalizations being made from a statistical analysis are attributed to the unit of observation.

Definitions Unit of Observation: in original data collections, the unit of observation is determined by the method by which observations are selected. Unit of Analysis: the unit of analysis is determined by an interest in exploring or explaining a specific phenomenon.
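The distinction can be sketched in Python. This is a minimal illustration under stated assumptions: the person records, city names, and the `has_degree` field are all invented, and the function name is hypothetical. Persons are the unit of observation; aggregating them makes the city the unit of analysis.

```python
from collections import defaultdict

# Hypothetical person-level observations (unit of observation: the person).
# City names and values are invented for illustration only.
persons = [
    {"city": "City A", "has_degree": True},
    {"city": "City A", "has_degree": False},
    {"city": "City B", "has_degree": True},
    {"city": "City B", "has_degree": True},
]

def percent_with_degree_by_city(rows):
    """Aggregate person records so that the city becomes the unit of analysis."""
    totals = defaultdict(int)
    with_degree = defaultdict(int)
    for row in rows:
        totals[row["city"]] += 1
        if row["has_degree"]:
            with_degree[row["city"]] += 1
    return {city: 100.0 * with_degree[city] / totals[city] for city in totals}

city_level = percent_with_degree_by_city(persons)
print(city_level)
```

Any generalization drawn from `city_level` is about cities, not about the individual persons who were observed.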

Identifying a Unit of Analysis As hinted in the earlier illustration, the unit of analysis is shaped by three attributes: social phenomena, time, and space.

Research Outputs Let’s begin by looking at a finished product to display these attributes. We’ll use a table from the Health Indicators Database about suicide.

Social Characteristics Geography and Time held constant

Geography and Age held constant Ordered by Time

Time and Age held constant Geography Emphasized

Social Phenomena: observations of a single social entity, such as a person or an institution; observations of multiple entities with a defined relationship, such as family or employer-employee

Social Phenomena: transactional observations that are the result of actions among entities, such as labour strikes or international conflicts, including wars

Time: observations made at one point in time; commonly referred to as a cross-sectional study

Time: observations made at multiple points in time. The data may be organized by time, commonly referred to as a time series, or time may structure some form of repeated measures of content or subjects.

Space: observations made within a specific spatial area; observations made within a hierarchy of spatial areas

Complexity Complexity occurs when multiple types of entities are introduced within the same study. Examples: parent, child, teacher; person, activity, time; person, car, trips.

Complexity This complexity can arise within one of the attributes just discussed (a study of parents, children, and teachers, which are all social units) or between attributes (a study of people, their daily activities, and the length of time of each activity).

Complexity Complexity is often represented in a hierarchy when the units can be grouped or nested within one another. For example, children may be grouped with their parents.

Complexity Children grouped (nested) with Parents. Parent 1 Parent 2

Complexity Parents and their children may be grouped into families and families grouped into households. Household 1: Family A: Person i, Person ii. Household 2: Family A: Person i, Person ii.
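Nesting of this kind can be sketched as a nested mapping in Python. The household, family, and person labels are taken from the slide; the dictionary layout and the `flatten` helper are illustrative assumptions, not a prescribed format.

```python
# Households contain families, and families contain persons (labels from the slide).
households = {
    "Household 1": {"Family A": ["Person i", "Person ii"]},
    "Household 2": {"Family A": ["Person i", "Person ii"]},
}

def flatten(nested):
    """Return one (household, family, person) tuple per person."""
    return [
        (household, family, person)
        for household, families in nested.items()
        for family, people in families.items()
        for person in people
    ]

person_rows = flatten(households)
for row in person_rows:
    print(row)
```

Flattening turns the hierarchy into person-level rows, each of which still carries the identifiers of the family and household it is nested in.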

Complexity Complexity may also be represented by combinations of entities among units. Those entities that are associated with one another are combined and those that aren’t associated, aren’t combined.

Complexity These combinations are often described as having been crossed. For example, activities may be crossed with people.

Complexity Activities crossed with people. [Figure: Person A and Person B crossed with Activities 1, 2, 3, 5, and 6.]
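Crossing can be sketched in Python as a list of person-activity pairs, where only combinations that are actually associated appear. The specific pairs below are hypothetical and are not read off the slide's figure.

```python
from collections import defaultdict

# Hypothetical associations: each pair says "this person did this activity".
pairs = [
    ("Person A", "Activity 1"),
    ("Person A", "Activity 2"),
    ("Person B", "Activity 1"),
    ("Person B", "Activity 3"),
]

activities_by_person = defaultdict(list)
persons_by_activity = defaultdict(list)
for person, activity in pairs:
    activities_by_person[person].append(activity)
    persons_by_activity[activity].append(person)

print(dict(activities_by_person))
print(dict(persons_by_activity))
```

Unlike nesting, neither unit contains the other: an activity can be shared by several people, and a person can have several activities.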

Complexity Up to this point, complexity has been described conceptually. We’ve mentioned how multiple units of analysis and the ways in which they are related can create complexity.

Complexity Complexity also manifests itself structurally through the ways in which data are organized to represent the nesting or crossing of multiple units of analysis.

Thinking about Units of Analysis Conceptually: What is the content? This is what we’ve been reviewing up to this point. Structurally: How is it organized? This takes us to a discussion about data structure.

Statistical Data Structure Let’s review basic data structure. The unit of analysis defines the underlying structure of a data file.

Statistical Data Structure This structure consists of a series of rows, with each row containing the data for one member of the unit of analysis. This simple structure is known as the flat, rectangular data matrix.

Statistical Data Structure [Figure: the rows of the data matrix, one per case: Case 1, Case 2, Case 3, ..., Case n-1, Case n]

Statistical Data Structure All of the information collected for each member of the unit of analysis is organized in fixed locations in the file, called fields or variables.
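Reading fields from fixed locations can be sketched in Python as follows. The layout, field names, and raw lines are invented for illustration and do not describe any real file.

```python
# Hypothetical layout: field name -> (start, end) column positions,
# 0-based with the end exclusive. These positions are invented.
LAYOUT = {"case_id": (0, 5), "age": (5, 7), "sex": (7, 8)}

def parse_record(line, layout=LAYOUT):
    """Slice one fixed-width line into named fields."""
    return {name: line[start:end] for name, (start, end) in layout.items()}

# Each line holds one case; every case uses the same column positions.
raw_lines = ["00001341", "00002522", "00003191"]
cases = [parse_record(line) for line in raw_lines]

for case in cases:
    print(case)
```

Because every row shares the same layout, the parsed result is rectangular: one row per case and one column per field.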

Statistical Data Structure [Figure: the data matrix, with cases as rows (Case 1, Case 2, Case 3, ..., Case n-1, Case n) and fields as columns (Field 1, Field 2, Field 3, ..., Field k-1, Field k)]

Statistical Data Structure This structure looks like the grid of a spreadsheet. However, there is one very important difference between a statistical data structure and a spreadsheet.

Statistical Data Structure The spreadsheet is organized around individual cells, while the statistical data structure is organized around the rows.
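The contrast can be sketched in Python. Both structures below are hypothetical: the first addresses each cell on its own, spreadsheet-style, and the second treats each row as one case.

```python
# Spreadsheet view: every cell is addressed individually, e.g. "B2".
cells = {
    "A1": "case_id", "B1": "age",
    "A2": "00001",   "B2": 34,
    "A3": "00002",   "B3": 52,
}

# Statistical view: each row is one case, and fields are read within the row.
rows = [
    {"case_id": "00001", "age": 34},
    {"case_id": "00002", "age": 52},
]

# A statistic is computed over rows, using the same field in every case.
mean_age = sum(row["age"] for row in rows) / len(rows)
print(mean_age)
```

In the cell view nothing guarantees that `B2` and `B3` hold the same kind of value; in the row view the field name carries that meaning for every case.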

Statistical Data Structure [Figure: a spreadsheet, addressed by individual cells such as Cell B2, Cell C5, Cell E3, and Cell F7]

Statistical Data Structure [Figure: a statistical data file, addressed by rows such as Row 1, Row 3, and Row k-1]

Statistical Data Structure The next slide presents the way that this simple statistical data structure appears in SPSS.

[Figure: SPSS data view, with Row 1, Row 8, Row 15, and Field 8 labelled]

Person: GSS 10 Main
00001 1698957146206912669121413072202511
00002 2122943624103005230120703022303521
00003 617378410203706337121406032202511
00004 1519625424202804228069797974410620
00005 1695875212202303123521003022403121
00006 1737832824203806338649797971407550
00007 884349547103005230320703022403521
00008 760621824203005230069797971101570
00009 5814763024102604226369797973310620
00010 1234850712204407344949797972212570

Fields labelled on these records: WGHTFNL, DVAGECAP, DVSEX, RECID, PROV

Adding Complexity to Data Structurally
hierarchical: order and different record layouts for different units of analysis
relational: 1 to n relations
compound records: combination of units represented on each record

Complex Data Structure Hierarchical Data Structure Household 1 Person 1 Person 2 Household 2 Household 3 Person 3
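A hierarchical file of this kind can be sketched in Python. The record-type codes ("H" for household, "P" for person), the field contents, and the rule that a person record belongs to the most recent household record above it are all assumptions made for illustration.

```python
# Hypothetical hierarchical file: record order carries the nesting, and each
# record type has its own layout ("H" = household, "P" = person).
lines = ["H001 rural", "P01 34", "P02 36", "H002 urban", "H003 urban", "P03 71"]

def read_hierarchical(file_lines):
    """Attach each person record to the household record that precedes it."""
    households = []
    for line in file_lines:
        record_type, body = line[0], line[1:]
        if record_type == "H":
            household_id, area = body.split()
            households.append({"id": household_id, "area": area, "persons": []})
        elif record_type == "P":
            if not households:
                raise ValueError("person record found before any household record")
            person_id, age = body.split()
            households[-1]["persons"].append({"id": person_id, "age": int(age)})
        else:
            raise ValueError(f"unknown record type: {record_type!r}")
    return households

households = read_hierarchical(lines)
for household in households:
    print(household["id"], len(household["persons"]))
```

Reordering the lines would change which household each person belongs to, which is why order is part of the structure in a hierarchical file.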

Geography: 1991 Census N9101 Population 15 years and over by age groups (17) and marital status (6a), showing labour force activity (8) and sex (3)
4600000000000 000000 00000000
4600100000000 000000 00000000
4600100105024RM 024000 5010 82090000
4600100205024RM 024000 5010 82090000
4600100305024RM 024000 5010 82090000
4600100405027T 024000 5010 82090410
4600100505027T 024000 5010 82090410
4600100605027T 024000 5010 82090410
4600100705031RM 031000 5011 82100000
4600100805031RM 031000 5011 82100000

Fields labelled on these records: CSD Type, CCS, CMA/CA, PROV, FED, EA, CD, CSD

Complex Data Structure Relational Data Structure (One to Many)
Parent records: R1, R2, R3, R4, R5
Related records: R1 C1, R1 C2, R1 C3, R1 C4; R3 C1, R3 C2; R4 C1; R5 C1, R5 C2

Person: GSS 10 Union
0000111169122244421472240699799799779979
0000211130112194420772190699799799779979
0000311137122934421472930699799799779979
0000511123522094421072090699799799779979
0000611338622804410472800199999736019999
0000711130312034420772030699799799779979
0000831330021854421079970723099799780459
0000832330022353310979970726399799720289
0001011344921934420921930128799729350949
0001032344923202220879970736399799720439

Fields labelled on these records: UNIONTYP, UNIONRNK, RECID

GSS 10 Main
00001 16989571462069
00002 21229436241030
00003 6173784102037
00004 15196254242028
00005 16958752122023
00006 17378328242038
00007 8843495471030
00008 7606218242030
00009 58147630241026
00010 12348507122044

GSS 10 Union
00001111691222444214
00002111301121944207
00003111371229344214
00005111235220944210
00006113386228044104
00007111303120344207
00008313300218544210
00008323300223533109
00010113449219344209
00010323449232022208
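The one-to-many link between the two files can be sketched in Python. The record strings are the truncated ones shown on the slide. Treating the first five characters of each union record as RECID is an assumption made for illustration, not a documented layout.

```python
from collections import Counter

# RECID values from the GSS 10 Main listing on the slide (one record per person).
main_recids = ["00001", "00002", "00003", "00004", "00005",
               "00006", "00007", "00008", "00009", "00010"]

# Truncated GSS 10 Union records as shown on the slide (one record per union).
union_records = [
    "00001111691222444214", "00002111301121944207", "00003111371229344214",
    "00005111235220944210", "00006113386228044104", "00007111303120344207",
    "00008313300218544210", "00008323300223533109", "00010113449219344209",
    "00010323449232022208",
]

RECID_WIDTH = 5  # assumption: RECID occupies the first five characters

# Count union records per person, then look up each person in the main file.
unions_per_person = Counter(record[:RECID_WIDTH] for record in union_records)
linked = {recid: unions_per_person.get(recid, 0) for recid in main_recids}

for recid, count in linked.items():
    print(recid, count)
```

Under that assumption, some persons have no union record, most have one, and a few have two, which is the one-to-many pattern a relational structure is meant to represent.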

Complex Data Structure Compound Data Structure
R1 x T1 x A1
R1 x T2 x A4
R1 x T3 x A7
R1 x T4 x A3
R1 x T4 x A1
R2 x T1 x A2
R2 x T2 x A9
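Compound records can be sketched in Python as tuples that combine several units on each record. The labels are those on the slide; reading R as respondent, T as time slot, and A as activity is an assumption, since the slide does not expand the abbreviations.

```python
from collections import defaultdict

# Each record combines three units (labels from the slide; the meaning
# respondent x time slot x activity is assumed for illustration).
compound_records = [
    ("R1", "T1", "A1"), ("R1", "T2", "A4"), ("R1", "T3", "A7"),
    ("R1", "T4", "A3"), ("R1", "T4", "A1"),
    ("R2", "T1", "A2"), ("R2", "T2", "A9"),
]

records_per_respondent = defaultdict(int)
activities_by_slot = defaultdict(list)
for respondent, slot, activity in compound_records:
    records_per_respondent[respondent] += 1
    activities_by_slot[(respondent, slot)].append(activity)

print(dict(records_per_respondent))
print(activities_by_slot[("R1", "T4")])
```

The same file can therefore be analysed with the respondent, the time slot, or the activity as the unit of analysis, depending on how the records are grouped.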

GSS 2 Episode
000041144504000800024010000000012518733
000041144308000900006011222220012518733
000041141709000930003031222220012518733
000041141709301100009031222220012518733
000041141211001330015011222220012518733
000041149113301630018011222220012518733
000041141216301800009011222220012518733
000041143018002000012031222220012518733
000041147920002015001541222220012518733
000041143720152130007531222220012518733

Fields labelled on these records: NO_EPISO, ACT_CODE, SEQNUM, DDAY