Aggregate Data and Statistics

Presentation transcript:

Aggregate Data and Statistics
Wendy Watkins, Carleton University
Chuck Humphrey, University of Alberta
Statistics Canada Data Liberation Initiative, CAPDU/DLI Training, May 29th, 2002
Title page greetings: Good morning everyone, glad to be with you for this workshop. If you have questions during our presentation, please just ask and we can elaborate on them.

Outline
What are aggregate data?
Why aggregate?
How to aggregate?
Computing exercise

What are aggregate data? Let’s start with the relationship between statistics and data.

Statistics and Data
Data: numeric files created and organized for analysis; require processing; not ready for display.
Statistics: numeric facts/figures; created from data, i.e., already processed; presentation-ready.

Statistics and Data

Statistics and Data

Statistics and Data In short, statistics are created from data and represent summaries of the detail observed in the data.

What is aggregation? Building on this previous example, let’s explore aggregation. We see a table with the number of smokers summarized over categories for age, education, sex, geography, and different time points.

[Figure: the example table annotated with the labels "Categories of Periods", "A Statistic", "Categories of Sex", and "Categories of Region"]

What is aggregation? Aggregation involves tabulating a summary statistic across all of the categories or levels of a set of variables.

The summary statistic The summary statistic in this example is the total number of smokers.

Variables and categories The variables and their categories are:
Region (11): Canada and the ten provinces
Age (5): Total, 15-19, 20-44, 45-64, 65+
Sex (3): Total, Female, Male
Education (4): Total, Some secondary or less, Secondary graduate or more, Not stated
Periods (5): 1985, 1989, 1991, 1994-95, 1996-97

Variables and categories The tabulation consists of determining the combinations of all categories across variables and then counting the number of smokers within each of these combinations. 11 x 5 x 3 x 4 x 5 = 3300 category combinations
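The count of category combinations can be checked with a short Python sketch. The category lists come from the slide above; the ten province names are filled in here for illustration, and the variable names are not from the presentation.

```python
from itertools import product
from math import prod

# Category lists as given on the slide; province names are spelled out
# here for illustration ("Canada and the ten provinces").
variables = {
    "region": [
        "Canada", "Newfoundland", "Prince Edward Island", "Nova Scotia",
        "New Brunswick", "Quebec", "Ontario", "Manitoba", "Saskatchewan",
        "Alberta", "British Columbia",
    ],
    "age": ["Total", "15-19", "20-44", "45-64", "65+"],
    "sex": ["Total", "Female", "Male"],
    "education": [
        "Total", "Some secondary or less",
        "Secondary graduate or more", "Not stated",
    ],
    "period": ["1985", "1989", "1991", "1994-95", "1996-97"],
}

# Multiply the number of categories per variable: 11 x 5 x 3 x 4 x 5.
n_by_counts = prod(len(categories) for categories in variables.values())

# Enumerate every combination explicitly and count them.
combinations = list(product(*variables.values()))
n_by_enumeration = len(combinations)

print(n_by_counts, n_by_enumeration)  # 3300 3300
```

Each tuple in `combinations` is one cell of the tabulation, for example `("Canada", "Total", "Total", "Total", "1985")`.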

Tabulating or aggregating One might wonder whether there is a difference between tabulating and aggregating. Usually, they are the same thing.

Tabulating = aggregating In creating tables from data, the variables are arranged in various combinations along the columns and the rows.

Tabulating = aggregating Placing multiple variables along the columns or rows is called nesting. Tables may have variables nested on both the columns and rows.
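As a rough illustration of nesting, the sketch below counts smokers with Education along the rows and Sex nested within Periods along the columns. The respondent records are invented for the example and do not come from the survey shown in the slides.

```python
from collections import Counter

# Invented respondent records: (period, sex, education, is_smoker).
records = [
    ("1985", "Female", "Some secondary or less", True),
    ("1985", "Male", "Some secondary or less", True),
    ("1985", "Male", "Secondary graduate or more", False),
    ("1989", "Female", "Secondary graduate or more", True),
    ("1989", "Female", "Secondary graduate or more", True),
    ("1989", "Male", "Some secondary or less", False),
    ("1985", "Female", "Secondary graduate or more", True),
]

# Count smokers in each cell. The column key is a (period, sex) pair,
# which is what nesting Sex within Periods amounts to.
counts = Counter()
for period, sex, education, is_smoker in records:
    if is_smoker:
        counts[(education, (period, sex))] += 1

rows = sorted({education for _, _, education, _ in records})
cols = sorted({(period, sex) for period, sex, _, _ in records})

# Build the table row by row; cells with no smokers are 0.
table = [[counts[(row, col)] for col in cols] for row in rows]

print(cols)
for row, cells in zip(rows, table):
    print(row, cells)
```

Sorting the `(period, sex)` pairs puts the periods in order with the sexes repeating inside each period, which is the nested column layout described above.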

Categories of Sex nested within Periods

Categories of Education nested within Sex; categories of Sex nested within Region

A quick summary Up to this point, we have noted that:
statistics are created from data
aggregations consist of tabulating statistics within the categories of select variables
variables may be nested within columns and rows to display these tabulations

What are aggregate data? What is the difference between a tabulation or aggregation and aggregate data? The difference lies in the display of the aggregation, that is, the structure of the tabulated output.

What are aggregate data? A statistical data structure is a fixed, two-dimensional matrix with the variables in the columns and cases in the rows.
          V1   V2   V3   V4   V5   V6   V7
Case 1
Case 2
Case 3
Case 4
Case 5
Case 6
Case 7

What are aggregate data? Aggregate data require the same type of statistical data structure. Consequently, aggregate data are a special type of tabulation where variables are nested along the rows but not along the columns.

[Figure: variables nested along the rows, with category counts (11), (5), (3), (4)]

Aggregate Data Structure To create an aggregate data structure for the example tabulation, the combination of categories representing geography (region), three social variables (age, sex, and education), and time (period) must all be nested along the rows, as shown in the previous slide.
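A minimal sketch of this restructuring, assuming a small invented tabulation in which Period is still nested along the columns: moving Period to the rows yields one row per category combination, with the statistic as an ordinary column. The counts and the reduced category lists are made up for illustration.

```python
from itertools import product

regions = ["Alberta", "Ontario"]
sexes = ["Female", "Male"]
periods = ["1985", "1989"]

# Invented smoker counts for a tabulation with Region and Sex on the rows
# and Period along the columns.
crosstab = {
    ("Alberta", "Female"): {"1985": 10, "1989": 8},
    ("Alberta", "Male"): {"1985": 12, "1989": 11},
    ("Ontario", "Female"): {"1985": 30, "1989": 27},
    ("Ontario", "Male"): {"1985": 33, "1989": 29},
}

# Aggregate data structure: every variable is nested along the rows,
# so each category combination becomes one case.
aggregate_rows = [
    {"region": r, "sex": s, "period": p, "smokers": crosstab[(r, s)][p]}
    for r, s, p in product(regions, sexes, periods)
]

for row in aggregate_rows:
    print(row)
```

The result has 2 x 2 x 2 = 8 rows and four columns, which fits the variables-by-cases matrix that statistical software expects.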

Another example This time the table consists of the average length of stay in hospital by sex, age, diagnostic chapter, region, and time period.

Variables and categories
diagnostic chapter: 19 levels
sex: 3 levels
age: 6 levels
region: 13 levels
period: 28 levels

Aggregate data structure The number of category combinations is equal to: 13 x 28 x 3 x 6 x 19 = 124,488 category combinations

Aggregate average length of hospital stays in days The aggregate structure is represented by the 124,488 cells created by the combination of all categories from these five variables. The statistic is the average length of stay in the hospital in days.

What are aggregate data? Definition: Statistical summaries over categorical variables representing social phenomena, geography, and time that are organized in a specific data structure.

Time series aggregate data When the data structure of the summaries is organized around time, these aggregate statistics are called a time series.
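One way to picture a structure organized around time is sketched below: aggregate rows are regrouped so that each region carries its values in period order. The figures and the row layout are invented for the example and are not Statistics Canada data.

```python
from collections import defaultdict

# Invented aggregate rows: (region, year, statistic), in no particular order.
rows = [
    ("Ontario", 1991, 27), ("Alberta", 1985, 10), ("Ontario", 1985, 30),
    ("Alberta", 1991, 8), ("Alberta", 1989, 9), ("Ontario", 1989, 29),
]

# Organize the summaries around time: one series per region,
# with observations ordered by year.
series = defaultdict(list)
for region, year, value in sorted(rows, key=lambda row: (row[0], row[1])):
    series[region].append((year, value))

# With the values in time order, change over the whole span is easy to read.
change = {
    region: points[-1][1] - points[0][1] for region, points in series.items()
}

print(dict(series))
print(change)
```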

Time Series aggregate data structure

Annual Time Series

Geo-spatial aggregate data When the data structure of the summaries is organized around geography, we recognize these aggregations as geo-spatial or geo-referenced statistics.

Geo-spatial aggregate data structure

[Figure: levels of geography: Province, Census Divisions, Census Sub-divisions]

Why aggregate? Statistics Canada creates aggregate statistics from its major surveys, including the Census, as a way of publishing selected findings. The release of aggregate statistics is a partial safeguard against the possible disclosure of respondents' identities.

Why aggregate? Furthermore, the geographic distribution of statistics in Canada is important. As a result, aggregate statistics are released by Statistics Canada for different levels of geography – from the nation to small areas.

Why aggregate? Statistics organized into time series is another way in which Statistics Canada publishes a large amount of statistical information. These time series reflect summaries of data that are repeatedly collected over time and permit studies about trends and change.

Why aggregate?
To publish findings
To safeguard against disclosure
To provide geographic distributions of statistics
To present statistics over time

Why aggregate? Other reasons to aggregate: to modify geo-referenced statistics for GIS applications, for example, finding postal codes within their corresponding enumeration area (EA) and then aggregating data from the postal code level up to the EA level.
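The postal code example can be sketched as a lookup followed by a sum. The postal codes, EA identifiers, and population counts below are all hypothetical; a real application would use an actual postal code conversion file.

```python
from collections import defaultdict

# Hypothetical lookup from postal code to the EA that contains it.
postal_code_to_ea = {
    "A1A 1A1": "EA-001",
    "A1A 1A2": "EA-001",
    "B2B 2B2": "EA-002",
}

# Invented statistic reported at the postal code level.
population_by_postal_code = {
    "A1A 1A1": 120,
    "A1A 1A2": 80,
    "B2B 2B2": 150,
}

# Aggregate from the postal code level up to the EA level.
population_by_ea = defaultdict(int)
for postal_code, population in population_by_postal_code.items():
    ea = postal_code_to_ea[postal_code]
    population_by_ea[ea] += population

print(dict(population_by_ea))  # {'EA-001': 200, 'EA-002': 150}
```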

Why aggregate? Other reasons to aggregate: to change the unit of analysis for the purposes of a specific research question; to create a common, higher-level unit of analysis that can be used in merging files.
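As an illustration of changing the unit of analysis, the sketch below summarizes a person-level file up to the household level and then merges it with a household-level file. Both files, their field names, and their values are invented for the example.

```python
from collections import defaultdict

# Invented person-level file: the unit of analysis is the person.
persons = [
    {"household_id": 1, "age": 34},
    {"household_id": 1, "age": 36},
    {"household_id": 1, "age": 5},
    {"household_id": 2, "age": 70},
]

# Invented household-level file: the unit of analysis is the household.
households = [
    {"household_id": 1, "dwelling": "house"},
    {"household_id": 2, "dwelling": "apartment"},
]

# Aggregate persons up to the household, the common higher-level unit.
ages = defaultdict(list)
for person in persons:
    ages[person["household_id"]].append(person["age"])

household_summary = {
    household_id: {"members": len(a), "mean_age": sum(a) / len(a)}
    for household_id, a in ages.items()
}

# Merge the two files on the shared household identifier.
merged = [
    {**household, **household_summary[household["household_id"]]}
    for household in households
]

for row in merged:
    print(row)
```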

How does one aggregate? Identify the grouping structure that represents all of the variables and their categories over which the aggregation is to be conducted. This group structure defines a new unit of analysis.

How does one aggregate? Establish the sort order for the grouping variables, i.e., decide which variable increments the fastest, the next fastest, until you reach the variable that changes the slowest. Select the summary statistics, such as sums, averages, minimums, maximums, etc.

How does one aggregate? The actual aggregation is performed using statistical software such as SAS or SPSS. SAS offers several procedures, as well as the Data step, that can be used to aggregate data, including Proc Summary, Proc Tabulate, and Proc Means.

How does one aggregate? SPSS has the Aggregate command.
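The steps described above (grouping structure, sort order, summary statistics) are not tied to SAS or SPSS. The following Python sketch walks through them on invented hospital-stay records; it is an illustration under those assumptions, not the procedure used in the workshop exercise.

```python
from itertools import groupby
from operator import itemgetter
from statistics import mean

# Invented records: (region, sex, length_of_stay_in_days).
stays = [
    ("Ontario", "Male", 4), ("Alberta", "Female", 3), ("Ontario", "Female", 6),
    ("Alberta", "Female", 5), ("Ontario", "Male", 8), ("Alberta", "Male", 2),
]

# Step 1: the grouping structure (region, sex) defines the new unit of analysis.
group_key = itemgetter(0, 1)

# Step 2: the sort order. Region changes slowest and sex increments fastest.
# groupby only merges adjacent records, so the sort must come first.
stays_sorted = sorted(stays, key=group_key)

# Step 3: compute the summary statistics within each group.
aggregated = []
for (region, sex), group in groupby(stays_sorted, key=group_key):
    days = [length for _, _, length in group]
    aggregated.append({
        "region": region, "sex": sex, "n": len(days),
        "sum": sum(days), "mean": mean(days),
        "min": min(days), "max": max(days),
    })

for row in aggregated:
    print(row)
```

Each output row is one case in the aggregate file, with the grouping variables nested along the rows and the summary statistics as columns.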

Aggregation: only nesting of the row variables; multiple levels of geography and time

Tabulating = aggregating Furthermore, geography and time may not play a prominent role in the data and consequently, tables from these data will not include variables for geography and time.

Tabulating versus aggregating: geography and time are each a single category; sex and age are nested in the column variables