Patricia Ruggles, Catherine Ruggles. 240-350-6457, 213-324-4234. Managing and Analyzing Longitudinal Data.


Managing and Analyzing Longitudinal Data
Patricia Ruggles, Catherine Ruggles
COPAFS Quarterly Meeting, June 1, 2012

Longitudinal Data are Hard to Use
Longitudinal databases tend to be very complex.
Complex documentation and record linkage issues: searching and understanding variable lists, record structures, and other features requires patience and persistence.
Creating analysis files typically involves major data restructuring.
Files are often hierarchical as well as linked across time periods; variables need to be moved across record types, new variables need to be created involving more than one record type, etc.
Longitudinal analyses involve complex relationships across records and variables and therefore can be conceptually difficult to plan and carry out.

Results: Under-use and Misuse
Analysts shy away from using large longitudinal data sets such as SIPP because understanding and restructuring the data is frustrating, expensive, and time-consuming.
When such datasets are used, it is often for cross-sectional rather than longitudinal analyses (e.g., topical modules in SIPP) or to compare two points in time, rather than to examine patterns of activity over time.
As a result: under-use, funding difficulties, and a low return on our investment in data collection and preparation.

Longitudinal Analysis Steps
Step 1: Understanding the Data. Explore metadata and data and choose appropriate variables.
Step 2: Preparing Data for Analysis. Recode and create variables as necessary.
Step 3: Performing Analyses. Perform cross-sectional and longitudinal analyses as desired.

Step 1: Understanding the Data
Many longitudinal datasets are very large and not necessarily well documented.
For example: the 2008 SIPP has 48 months of data on just under 120,000 unique individuals, and contains more than 1000 variables.
Documentation exists in many places, but it can be hard to link specific variables to the appropriate questions in the questionnaire, and to understand issues such as the universe to which each variable applies.
A key need for longitudinal data users, therefore, is a better way of exploring the available data and linking it to the appropriate metadata.
Orlin has made the ability to search and understand both data and metadata a key feature of our system.
Let's do a quick tour of the data and metadata exploration system.

The Welcome Page

Variable List for SIPP

Exploring SIPP Metadata and Data
To see the available variables, click on the person-month record type in the metadata tab on the Welcome Page.
There are over 1000 variables, which is one of the things that makes SIPP hard to use!
To find a specific variable, type its name or any other identifying information in the search box.
This brings up all variables meeting the search criteria: e.g., typing "employment" will bring up the 39 variables relating to employment, along with their labels and codes.
To select a specific variable, click on it.
This will show its codes, frequencies, and summary statistics.
It also shows hyperlinks to related variables and to all citations for this variable in questionnaires, code books, and the user guide.
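
The keyword search described on this slide can be sketched in a few lines of Python. This is only an illustration of the idea, not the Orlin System's implementation: the catalog entries, their labels, and the function name search_variables are all hypothetical (only the variable name TPEARN appears elsewhere in this presentation).

```python
# Hypothetical variable catalog: variable name -> descriptive label.
# Names and labels are invented for illustration (TPEARN is the only
# name taken from the presentation; its label here is an assumption).
CATALOG = {
    "TPEARN": "Total earned income for the month",
    "EMP_STATUS": "Employment status recode",
    "AGE": "Age as of last birthday",
}


def search_variables(catalog, term):
    """Return sorted (name, label) pairs whose name or label contains term.

    Matching is case-insensitive and substring-based, which is an
    assumption about how such a search box might behave.
    """
    needle = term.lower()
    return sorted(
        (name, label)
        for name, label in catalog.items()
        if needle in name.lower() or needle in label.lower()
    )


matches = search_variables(CATALOG, "employment")
```

A real system would also search codes, questionnaire text, and user-guide citations; here only names and labels are searched.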

Variable Search Results: Employment Status Recode Variable

Viewing the Data
In addition to hyperlinks to other metadata, the metadata are linked directly to the data.
For example, clicking on the number of cases with a specific code value in the frequency table will bring up all the case records with that value.
Users can choose which variables on those records they wish to inspect, using a drop-down check list.
This aids in debugging and in understanding complex variable recodes.

Finding the Information You Need
The search and hyperlinking features of the Orlin System address the first of the difficulties in working with SIPP discussed earlier in our presentation.
Many users give up before they even get to longitudinal analysis, because it can be so hard to find the right variable and its associated documentation.
SIPP documentation is still a bit patchy, but by hyperlinking all existing documentation for every variable the Orlin System makes it much easier to understand exactly what the variable means.
The system also includes a global search function, which allows users to search across all aspects of the system for any specific phrase or term.

Step 2: Preparing Data for Analysis
Longitudinal data require substantial manipulation and recoding before analysis, even after finding the right variables.
Creating usable data extracts that preserve necessary information on relationships between units of analysis and their individual components can be complex even in cross-sectional data.
Adding a time dimension means moving information across both record types and points in time.
Sample attrition, the addition of special supplements, inconsistencies in responses across waves of the survey, and weighting problems pose additional difficulties.
Users need help in understanding and dealing with these issues.

The Longitudinal Unit of Analysis
Longitudinal surveys such as SIPP, the Health and Retirement Study, etc. typically contain data on several potential units of analysis or record types, such as households, persons, welfare units, medical records, etc.
For most types of longitudinal analysis, only units that are unchanging over time can be usefully linked across time.
For example, we can't link households over time because they change too much from period to period.
For most demographic surveys the person-month (or person-year) record is the basic longitudinal unit: simply a string of linked records across time for each person.
Information from associated units or record types must then be linked to the longitudinal unit at the appropriate point in time.

Restructuring Longitudinal Data
Creating the necessary links is very difficult using sequential data processing packages such as SAS.
The process will require several steps, each of which means a new pass through the data set.
For example, to track each person's household income in each month of the survey using SAS:
1. Find the correct household for person 1 this month
2. Create a summary variable for household income that month
3. Attach that variable to the person-record for that month
4. Repeat for next month for person 1
5. After creating household income variables for each month for person 1, repeat for persons 2 through 50,000
This gets old fast, especially because it has to be repeated for many variables (age of head, welfare recipiency) and for many record types (subfamilies, welfare units, etc.).
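
To make the household-income example concrete, here is a minimal Python sketch of the same task done with a keyed lookup rather than repeated sequential passes. It is neither SAS code nor the Orlin System's method; the record layout, the sample values, and the function name attach_household_income are assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical person-month records: (person_id, household_id, month, income).
person_months = [
    (1, "A", 1, 1000.0),
    (2, "A", 1, 500.0),
    (1, "A", 2, 1200.0),
    (2, "B", 2, 500.0),  # person 2 moves to household B in month 2
]


def attach_household_income(records):
    """Attach the month's household income to every person-month record.

    The first loop sums income by (household, month); the list
    comprehension then looks up that total for each person-month, so the
    household is matched at the appropriate point in time.
    """
    totals = defaultdict(float)
    for _person, household, month, income in records:
        totals[(household, month)] += income
    return [
        {
            "person": person,
            "household": household,
            "month": month,
            "income": income,
            "hh_income": totals[(household, month)],
        }
        for person, household, month, income in records
    ]


linked = attach_household_income(person_months)
```

Because the lookup key includes the month, a person who changes households picks up the income of whichever household they belong to in that month.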

The Orlin Approach to Restructuring Data
The Orlin System uses database technology to keep track of variables and their linkages across both record types and time.
This greatly simplifies the process of transforming variables as needed, creating new variables, and making sure that all variables are usable appropriately in longitudinal analyses.
This also simplifies the process of recoding variables and performing other data transformations that are typically needed in both cross-sectional and longitudinal analyses.

SIPP Data Structure in the Orlin System
We will use the 2008 SIPP panel to illustrate how the Orlin restructuring system works.
The basic record type is the person-month record, which is the series of all of the months of data for a specific person.
We have also created records for each unique person, family, or household that ever appears in the panel.
Records are stored in a database system that understands their linkages, which makes it easy to create variables that draw on data from different record types or different points in time.

Preparing Data for Analysis
Finding the right variables is only the first step.
Even in cross-sectional analyses, variables may need to be recoded for a specific analysis: for example, by collapsing the number of codes.
Sometimes new variables need to be created by combining information from two or more existing variables: for example, using income and family size to calculate equivalent income across different families.
Sometimes information on other people must be used in conjunction with variables on the person-month record: for example, to identify workers with pre-school children.
All of these examples require data transformations and the creation of new variables.
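
The three kinds of transformation listed on this slide might look like the following in Python. This is a hedged sketch, not the Orlin System's templates: the function names, the record fields, the square-root equivalence scale, and the under-5 age cutoff for "pre-school" are all assumptions chosen for illustration.

```python
import math


def collapse_codes(code, mapping, default="other"):
    """Recode a detailed code into a coarser category via a lookup table."""
    return mapping.get(code, default)


def equivalent_income(income, family_size):
    """Adjust family income for family size.

    Uses the square-root equivalence scale (income / sqrt(size)); the
    presentation does not say which scale is intended, so this choice is
    an assumption.
    """
    if family_size < 1:
        raise ValueError("family_size must be at least 1")
    return income / math.sqrt(family_size)


def worker_has_preschool_child(person, family, max_age=4):
    """True if the person works and has a child aged max_age or younger.

    Draws on other people's records (the family list), not just the
    person's own record. Field names are hypothetical.
    """
    return person["works"] and any(
        member["parent_id"] == person["id"] and member["age"] <= max_age
        for member in family
    )


family = [
    {"id": 1, "age": 34, "parent_id": None, "works": True},
    {"id": 2, "age": 3, "parent_id": 1, "works": False},
]
```

For instance, equivalent_income(40000, 4) divides by the square root of 4, and worker_has_preschool_child(family[0], family) checks the child's record as well as the worker's.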

Data Transformations
The Orlin System allows intelligent data transformations because records are linked internally in a database, and the system understands those links.
Transformations such as recodes and the calculation of new variables require two steps in the Orlin System:
First, the new variable is defined, using the system's templates.
Second, when a satisfactory definition has been created, it is run on the data to actually create the variable.
New variables can be created using either a small sample of about 35,000 person-month records, or the full sample, which includes about 2.6 million records.
The small sample runs in the foreground and takes up to 5 minutes.
The full sample runs in the background and takes considerably longer, depending on the complexity of the transformation.

Creating a New Variable Definition
The first step in transforming data is to define the new variable you want to create.
Second step: run the new definition on the data to create the new variable.
Orlin automatically tracks every change, every new definition, and all output.

Template for Variable Definition

Example: Run Variable Creation for ANY_WORK

Audit Trail

Complex Transformations
A particular strength of the Orlin System is its ability to handle complex data transformations, such as creating variables that use data from different record types and/or different months.
Example: creating AVERAGE_EARNINGS for an individual across all months of the panel.
Create a new variable definition as before, specifying the new variable name and the source variable (TPEARN).
Select "create a complex variable".
Select "average" under function type.
Select the sample and run.
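
As a rough illustration of what the AVERAGE_EARNINGS transformation computes, the following Python sketch averages TPEARN over each person's months. It is not the Orlin System's template logic; the record layout and sample values are hypothetical, and averaging only over the months in which a person is observed is an assumption.

```python
from collections import defaultdict

# Hypothetical person-month records carrying the source variable TPEARN.
records = [
    {"person": 1, "SEQUENCE": 1, "TPEARN": 2000.0},
    {"person": 1, "SEQUENCE": 2, "TPEARN": 0.0},
    {"person": 1, "SEQUENCE": 3, "TPEARN": 1000.0},
    {"person": 2, "SEQUENCE": 1, "TPEARN": 500.0},
]


def average_earnings(rows):
    """Return {person: mean TPEARN over that person's observed months}."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for row in rows:
        sums[row["person"]] += row["TPEARN"]
        counts[row["person"]] += 1
    return {person: sums[person] / counts[person] for person in sums}


AVERAGE_EARNINGS = average_earnings(records)
```

Months with zero earnings count toward the average here; months in which the person is absent from the panel do not.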

Example: Complex Data Transformations

Step 3: Performing Analyses
After transforming our data as needed, we are ready to analyze them.
To analyze data using the Orlin System, press the Analyze button on the home page button bar.
Specific analyses such as crosstabs, regressions, and duration analyses can be performed by clicking on the appropriate button.
A template will appear asking for the information needed for the requested analysis: for example, for a regression, the type of regression, the dependent variable, and the independent variables.
Analyses use the R statistical system.
Results of data transformations can also be exported for analysis in statistical packages such as SAS, SPSS, and Stata.

Example: Regression Results

Longitudinal Analyses
In addition to standard cross-sectional analyses, the Orlin System allows various types of time-related analyses.
In particular, it can perform two main types of longitudinal analysis:
Analysis of transitions (changes in state, such as moving from employment to unemployment) and the relationship of such changes to other variables or other changes.
Analysis of spells (periods of time over which a changed state persists, such as a spell of unemployment) and the effects of other variables on the duration of such spells.

Defining Transition Variables
Clicking on the Create Transition Variable button in the transform area brings up a template that allows the user to define the specific state change of interest.
Example: STOP_WORK
This variable is defined as a change from the status of working to the status of not working.
It uses the ANY_WORK variable we previously defined.
The user can choose to flag the transition in either the last month worked or the first month not working, by choosing to compare to the following or the previous month, respectively.
The variable uses the time variable SEQUENCE, which is simply the sequence number of the month (e.g., 32 for the 32nd month in the panel).
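
The STOP_WORK definition can be illustrated with a short Python sketch. This is an assumption-laden illustration rather than the Orlin System's template: it takes one person's ANY_WORK values as a list of 0/1 flags already ordered by SEQUENCE, and the function name and the compare_to argument are hypothetical.

```python
def stop_work_flags(any_work, compare_to="previous"):
    """Flag working-to-not-working transitions for one person.

    any_work: list of 0/1 values ordered by SEQUENCE (month number).
    compare_to="previous": each month is compared with the month before,
        so the flag lands on the first month not working.
    compare_to="following": each month is compared with the month after,
        so the flag lands on the last month worked.
    """
    if compare_to not in ("previous", "following"):
        raise ValueError("compare_to must be 'previous' or 'following'")
    flags = [0] * len(any_work)
    for i in range(len(any_work) - 1):
        if any_work[i] == 1 and any_work[i + 1] == 0:
            target = i + 1 if compare_to == "previous" else i
            flags[target] = 1
    return flags


history = [1, 1, 0, 0, 1, 0]
first_month_out = stop_work_flags(history, "previous")
last_month_in = stop_work_flags(history, "following")
```

With the six-month history above, the first option flags months 3 and 6 and the second flags months 2 and 5, which is the same pair of transitions dated differently.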

Create a Transition Variable: STOP_WORK

Defining Spells
A spell is a period of time defined by two transitions: into the state of interest (such as unemployment), and out of the state.
A spell may occur even if only one transition is observed, if for example someone becomes unemployed but the panel ends before the unemployment spell does.
Such a spell would be right-censored: no ending can be observed.
Spells can also be left-censored: an ending is observed, but no beginning.
Statistical techniques exist to analyze spell durations, accounting for censoring.
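
A minimal Python sketch of how spells and their censoring flags could be derived from one person's monthly state indicators follows. It is illustrative only: the function names and the spell record fields are hypothetical, and it assumes the person is observed in every month of the panel with no gaps.

```python
def _make_spell(start, end, last_month):
    """Build one spell record; months are numbered from 1."""
    return {
        "start": start,
        "end": end,
        "duration": end - start + 1,
        # No beginning observed: the spell is already under way in month 1.
        "left_censored": start == 1,
        # No ending observed: the spell is still under way in the last month.
        "right_censored": end == last_month,
    }


def find_spells(in_state):
    """Return spell records for one person.

    in_state: list of booleans, one per month in panel order, True when
    the person is in the state of interest (e.g., unemployed).
    """
    spells = []
    last_month = len(in_state)
    start = None
    for month, flag in enumerate(in_state, start=1):
        if flag and start is None:
            start = month  # transition into the state (or panel start)
        elif not flag and start is not None:
            spells.append(_make_spell(start, month - 1, last_month))
            start = None  # transition out of the state
    if start is not None:
        spells.append(_make_spell(start, last_month, last_month))
    return spells


spells = find_spells([True, False, True, True, False, True])
```

In this six-month example the first spell is left-censored, the second is fully observed with a duration of two months, and the third is right-censored.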

Duration Analysis
Standard duration analyses essentially calculate the proportion of all those observed in a spell at a given point in time who exit the spell at that point: in other words, the hazard of leaving the spell.
Analyses can take into account the effects of various independent variables on predicted durations.
The Orlin System allows a variety of different models to be explored.
All of these duration models operate on the spell record.
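
The hazard described on this slide can be computed directly from spell records as a simple discrete-time life table. The Python sketch below is not one of the Orlin System's duration models (which run in R); the function name and the input layout are hypothetical, and it assumes left-censored spells have already been excluded.

```python
def hazard_table(spells):
    """Discrete-time hazard of leaving a spell, by month of duration.

    spells: list of (duration_in_months, right_censored) pairs.
    For each duration t, the hazard is
        (spells observed to end at t) / (spells lasting at least t),
    so a right-censored spell counts as at risk through its last
    observed month but never counts as an exit.
    """
    if not spells:
        return {}
    longest = max(duration for duration, _ in spells)
    table = {}
    for t in range(1, longest + 1):
        at_risk = sum(1 for duration, _ in spells if duration >= t)
        exits = sum(
            1 for duration, censored in spells
            if duration == t and not censored
        )
        table[t] = exits / at_risk
    return table


example = [(1, False), (2, False), (2, True), (3, False)]
hazards = hazard_table(example)
```

Of the four example spells, one ends in month 1 (hazard 1/4); three remain at risk in month 2, of which one ends and one is censored (hazard 1/3); the single remaining spell ends in month 3 (hazard 1).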

Create a Spell Record

Spell Record Variables

Example: Spell Records

Analyzing Spell Records
The basic spell record includes only basic information relating to the spell itself.
To analyze durations in conjunction with anything else, therefore, the independent variables of interest have to be moved to the spell record.
This can be done using the "create variable definition" screen, choosing the option to move a variable.
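
Moving a variable to the spell record amounts to a lookup from the spell into the person-month data. The following Python sketch shows one way this could work; it is an assumption, not the Orlin System's behavior. In particular, taking the variable's value in the first month of the spell is an illustrative choice, and the field names start, SEQUENCE, and AGE are hypothetical.

```python
def move_to_spell(spell, person_months, variable):
    """Return a copy of the spell record with one variable added.

    The value is taken from the person-month record for the spell's
    first month (an assumed rule; other choices, such as the value in
    the last month, are equally possible).
    """
    by_month = {row["SEQUENCE"]: row for row in person_months}
    enriched = dict(spell)  # leave the original spell record unchanged
    enriched[variable] = by_month[spell["start"]][variable]
    return enriched


rows = [
    {"SEQUENCE": 1, "AGE": 30},
    {"SEQUENCE": 2, "AGE": 30},
    {"SEQUENCE": 3, "AGE": 31},
]
spell = {"start": 3, "end": 3, "duration": 1}
spell_with_age = move_to_spell(spell, rows, "AGE")
```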

Duration Analysis

Duration Analysis: Results

Conclusion
Analyzing longitudinal datasets requires three steps:
Finding the appropriate information
Restructuring it for longitudinal analysis
Performing the analysis and examining the results
All of these are hard to do using analysis packages such as SAS, Stata, or SPSS.
The goal of the Orlin System is to simplify all three steps:
We link and provide search capabilities across data and metadata.
We use database technology to keep track of both data and metadata, cross-sectionally and over time.
We provide easy-to-use templates to guide the analyst through the entire process.
If you are interested in learning more or becoming a beta user, see our website or contact us: www.orlinresearch.com

Thank You!