
1 William Block, Director
Cornell Institute for Social and Economic Research (CISER)
Kathleen Weldon, Director of Data Operations and Communications
Roper Center for Public Opinion Research

2 Lucci, Y., & Rokkan, S. (1957). A library center of survey research data: A report of an inquiry and a proposal. New York: Columbia University, School of Library Service. While the initial donation of Elmo Roper’s Fortune surveys to Williams College was the seed from which the Roper Center grew, in many ways the fundamental groundwork for Roper really came from this report from Lucci and a committee of early survey researchers came to the conclusion that for the field to grow, the work being done by academic and commercial polling organizations needed to be stored together in one place, so that researchers could conduct secondary analysis, merge data from multiple polls, and lay the groundwork for new data collection. In this report, emphasis is place, quite rightly, on the Center’s essential functions of collecting essential information about the methods of data collection and the classification of the study as a whole and the items within it, and on the need for cataloging to make searching, finding and comparing data as easy as possible, . And this remains Roper’s central challenge – one that we are using DDI to address.

3 Transparency
Sample methodological information from a Pew study. It meets transparency requirements: the important information is provided so researchers can understand the data (survey organization, dates, sample sizes, margin of error, response rates, etc.). This is how it comes to Roper. It could just as well be a photograph; it would still provide the information, but not in a form that makes finding that information easy when working with multiple studies. That is why archives, whether internal to an organization or an archive like Roper that preserves data from multiple organizations, create metadata based on this information.

4 Standards: Consistency
Studies submitted to Roper arrive with many kinds of metadata: geographic coverage, sample types, sample sizes, mode of data collection, language, file formats, margin of error, survey dates, sponsors, survey organizations, funding agencies, variables, PIs, topics, and so on. Any one study will have many pieces of metadata attached to it, and post-processing at Roper creates more as we document the steps taken to ensure that the files remain usable, that the confidentiality of respondents is protected, and so forth. Imagine each piece of metadata as a block in a grid; response rate, for example, is just one block among many attached to a study. When studies are sent to Roper, metadata can be found in many locations and many formats. Roper applies standards to create consistency across studies.
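As a sketch of that "grid" idea, one standardized study record might look like the Python dictionary below: every study, however it arrived, reduces to the same named fields. The field names and values are illustrative assumptions, not Roper's actual internal schema.

```python
# Hypothetical standardized metadata record for one archived study.
# Field names and values are illustrative, not Roper's internal schema.
study_metadata = {
    "survey_organization": "Pew Research Center",
    "sponsor": "Pew Research Center",
    "survey_dates": ("2015-02-18", "2015-02-22"),  # placeholder dates
    "geographic_coverage": "United States",
    "sample_type": "National adult",
    "sample_size": 1821,
    "modes_of_data_collection": ["telephone/landline", "telephone/cell"],
    "margin_of_error_pct_points": 2.5,
    "language": "English",
    "file_format": "SPSS",
}
```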

5 DDI Standards: Machine Readability and Interoperability
ICPSR dataset. Australian Data Archive dataset. Roper Center datasets. Machine readability is helpful to us internally and allows us to create tools for users. For these purposes, any standard that makes the structure of the metadata and the identification of its fields consistent would work; we could make up our own. But what about the user who wants to write some Python code to compare information about multiple studies, or even to analyze the total collection? Will she be able to work with our internal metadata structure? Maybe, but it could be difficult, depending on what we choose for a standard. By using DDI, an XML-based standard, it becomes easy for researchers to work across multiple studies in our collection. And because other organizations also employ this standard, she can integrate studies from other locations using the same code. And if she wants to submit her own dataset created from these datasets, with her own DDI metadata, all the better.
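As a minimal sketch of that cross-archive workflow, the Python below pulls response rates out of several DDI-XML codebooks using only the standard library. The file names are hypothetical, and matching elements by local tag name is a deliberate simplification that sidesteps DDI's XML namespace handling.

```python
# Sketch: collect response rates from several DDI-XML codebooks.
# File names are hypothetical; tag matching ignores namespaces for brevity.
import xml.etree.ElementTree as ET

def response_rates(path):
    """Return every response-rate value recorded in one DDI codebook."""
    tree = ET.parse(path)
    # ElementTree tags carry namespaces as a "{uri}" prefix, so compare
    # only the local name after the closing brace.
    return [el.text for el in tree.getroot().iter()
            if isinstance(el.tag, str)
            and el.tag.rsplit("}", 1)[-1] == "SpecificResponseRate"]

# One loop covers studies from any DDI archive, because all the files
# share the same element structure.
for codebook in ["roper_study.xml", "icpsr_study.xml", "ada_study.xml"]:
    print(codebook, response_rates(codebook))
```

The same function serves every archive; that, rather than any single tool, is the payoff of a common standard.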

6 DDI Standards: Machine Readability and Interoperability
But of course DDI doesn't just work in one direction. As more and more survey organizations themselves adopt DDI, their submissions to Roper can become streamlined, allowing us to process more data more quickly and to spend more effort on other aspects of archiving, like building new user tools. And survey organizations can utilize Roper data more effectively themselves. Similarly, individual researchers who do secondary analysis can work with our data and use DDI to document their work, which can then be returned to us efficiently for archiving and further reuse.

7 Metadata
Sample: National adult, including an oversample of 18-33 year olds. Sample Notes: This study contains sampling using landline telephones and cellular phones. Sample Size: 1,821. Response Rate: Landline = AAPOR RR3: 8.7 percent; Cell = AAPOR RR3: 8.6 percent. Estimated Sample Error: +/- 2.5 percentage points at the 95 percent confidence level.
So let's take another look at that report from Pew. This is what Roper's database of metadata for archived studies looks like now. It contains the necessary information in an organized and standardized form, but there is a fair amount of information in each field. To make this conform to DDI standards, we have to introduce greater granularity into the metadata.

8 Granularity
Original fields: Sample: National adult, including an oversample of 18-33 year olds. Sample Notes: This study contains sampling using landline telephones and cellular phones. Sample Size: 1,821. Response Rate: Landline = AAPOR RR3: 8.7 percent; Cell = AAPOR RR3: 8.6 percent. Estimated Sample Error: +/- 2.5 percentage points at the 95 percent confidence level.
Granular fields:
Geographical Location: US
Universe: Adult population
Sample1: All (Type: Adults)
Sample2: Oversample (Type: Age 18-33)
Sample Size1: 1,821 (Sample1, Modes 1 and 2: telephone interviews/landline and telephone interviews/cell phone)
Sample Size2: 481 (Sample1, Mode1: telephone interviews/landline)
Sample Size3: 1,125 (Sample1, Mode2: telephone interviews/cell phone)
Sample Size4: 215 (Sample2, Mode2: screened cell phone, 18-33 oversample)
Mode1: Telephone interviews/landline; Response Rate 1: 8.7%; Response Rate 1 Definition: AAPOR RR3
Mode2: Telephone interviews/cell phone; Response Rate 2: 8.6%; Response Rate 2 Definition: AAPOR RR3
Margin of error: +/- 2.5 (percentage points)
So now you have something that looks more like this: each individual element of the methodology is broken out into its smallest understandable component. This opens up possibilities for research, allowing information to be searched and compiled easily. You can see that in addition to breaking down the metadata into smaller components, we are also including more information, for example the sample sizes for each mode of data collection. This information was always available in the PDF documentation, but now it will be available in the metadata as well: more granular and more accessible. A sketch of this structure as a nested record follows.
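As a sketch of what this granularity means for a machine-readable record, the same study could be represented as one nested structure, with each sample and each mode carrying its own size and response rate. The field names below are illustrative assumptions; the figures mirror the slide above.

```python
# Sketch of the granular record: a nested structure rather than a handful
# of free-text fields. Field names are illustrative assumptions.
granular_metadata = {
    "geographical_location": "US",
    "universe": "Adult population",
    "total_sample_size": 1821,
    "samples": [
        {"description": "All", "type": "Adults",
         "modes": [
             {"mode": "Telephone interviews/landline", "size": 481,
              "response_rate_pct": 8.7, "rate_definition": "AAPOR RR3"},
             {"mode": "Telephone interviews/cell phone", "size": 1125,
              "response_rate_pct": 8.6, "rate_definition": "AAPOR RR3"},
         ]},
        {"description": "Oversample", "type": "Age 18-33",
         "modes": [
             {"mode": "Screened cell phone (18-33 oversample)", "size": 215},
         ]},
    ],
    "margin_of_error_pct_points": 2.5,
}
```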

9 DDI Standards
Granular fields with their DDI element names:
Geographical Location (GeographicLocation): US
Universe (StudyUnitUniverseRef): Adult population
Sample1 (SourceDescription): All (Source Type: Adults)
Sample2 (SourceDescription): Oversample (Source Type: Age 18-33)
Sample Size1 (NumberofResponses): 1,821 (Sample1, Modes 1 and 2: telephone interviews/landline and telephone interviews/cell phone)
Sample Size2 (NumberofResponses): 481 (Sample1, Mode1: telephone interviews/landline)
Sample Size3 (NumberofResponses): 1,125 (Sample1, Mode2: telephone interviews/cell phone)
Sample Size4 (NumberofResponses): 215 (Sample2, Mode2: screened cell phone, 18-33 oversample)
Mode1 (ModeOfCollection): Telephone interviews/landline; Response Rate 1 (SpecificResponseRate): 8.7%; Response Rate 1 Definition (Description): AAPOR RR3
Mode2 (ModeOfCollection): Telephone interviews/cell phone; Response Rate 2 (SpecificResponseRate): 8.6%; Response Rate 2 Definition (Description): AAPOR RR3
Margin of error (SamplingError): +/- 2.5 (percentage points)
DDI does help to drive that push toward greater granularity, as does the AAPOR Transparency Initiative, by encouraging survey research organizations to provide more information about their methods. But again, we could achieve that without adopting DDI. What adopting DDI standards gives us is a way of identifying each piece of metadata we include that is consistent with other organizations and that expedites machine readability. We are currently beta-testing a data processing system that reads and writes DDI. This system will be used internally, but there is an external component as well, which will allow data providers to submit data. I'm going to show you some of what that looks like.
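Before turning to the submission system, here is a rough illustration of where these named elements end up: a short Python sketch that serializes the response-rate metadata using element names from the mapping above. The flat nesting and the omission of XML namespaces are simplifying assumptions, so this is illustrative rather than conformant DDI 3.2.

```python
# Sketch: serialize granular response-rate metadata with element names
# from the mapping above. Nesting is simplified and namespaces omitted,
# so this is illustrative rather than conformant DDI 3.2.
import xml.etree.ElementTree as ET

data_collection = ET.Element("DataCollection")
for mode, rate in [("Telephone interviews/landline", "8.7%"),
                   ("Telephone interviews/cell phone", "8.6%")]:
    event = ET.SubElement(data_collection, "CollectionEvent")
    ET.SubElement(event, "ModeOfCollection").text = mode
    ET.SubElement(event, "SpecificResponseRate").text = rate
    ET.SubElement(event, "Description").text = "AAPOR RR3"  # rate definition

print(ET.tostring(data_collection, encoding="unicode"))
```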

10 Depositor information
Sponsor & survey organization First page of the deposit provides basic information about the depositor and the organizations involved in the survey – which also gives essential provenance information for the archive. Grant funding Submit single survey or a group or series

11 Geographical coverage
Form fields: title; dates; geographical coverage; description. The second page collects more basic information: the dates, the geographical coverage, and so on.

12 Population under study (universe)
Form fields: sample; mode; response rate; margin of sampling error. Now here is where we get into the nitty gritty, with fields for universe, sample, mode, and so on. Let's take a look specifically at Sample.

13 Using that Pew example, we can collect metadata on the different samples in the poll, providing as much information as possible about each.

14 Similarly, we can break out the response rate by the modes of data collection, with information about how the response rate was calculated.

15 The final page is where files can be uploaded
The final page is where files can be uploaded. This is our external site for data providers, but our internal site for processing data is built around a similar model, though of course we include more metadata internally than we ask of data providers.

16 AAPOR Disclosure Elements
AAPOR disclosure element → location in DDI (availability at Roper noted where relevant):
- Name of the organization that conducted the survey → StudyUnit.DataCollection.CollectionEvent.DataCollectorOrganizationReference
- The exact wording of the questions being released → StudyUnit.DataCollection.QuestionScheme.QuestionItem.QuestionText (at Roper, currently in the PDF documentation, i.e., the questionnaire)
- A definition of the population under study → StudyUnit.Universe.Label
- An explanation of how the respondents to the survey were selected → StudyUnit.DataCollection.SamplingProcedure
- The method or mode of data collection → StudyUnit.DataCollection.CollectionEvent.ModeOfCollection.TypeOfModeOfCollection
- The dates and location of data collection → StudyUnit.DataCollection.CollectionEvent.DataCollectionDate and StudyUnit.Coverage.SpatialCoverage.CountryCode
- A description of how the data were weighted (or a statement that they were not weighted), and any estimating procedures used to produce the final results → Weighting (StandardWeight, TypeofWeighting)
- If the survey reports findings based on parts of the sample rather than the total sample, the size of the subgroups reported should be disclosed → StudyUnit.PhysicalInstance.StatisticalSummary.VariableStatistics.UnfilteredCategoryStatistics.VariableCategory.CategoryStatistic (at Roper, currently available in the dataset)
These are AAPOR's disclosure elements, where each lives in DDI, and its availability in metadata at Roper.

17 AAPOR Disclosure Elements
AAPOR disclosure element → location in DDI:
- Name of the survey sponsor → StudyUnit.UserAttribute (key: StudySponsor, value: OrganizationReference)
- The total sample size → StudyUnit.UserAttribute (key: Sampling, value: { sampleSize: 1500, partial: true, coverage: "text", probability: false, additionalInformation: "text" })
- Estimates of sampling error, if appropriate → StudyUnit.UserAttribute (key: SamplingErrorEstimate, value: { samplingErrorEstimate: .05, ConfidenceLevel: .95, AdditionalInformation: "text" })
In DDI 3.2, some items we want to track are not included. For example, the sort of complex sampling descriptions we need is not well supported; DDI is working on a more comprehensive sampling scheme for 3.3. But DDI is flexible: you can extend the options with user-defined attributes. So we have created attributes that allow us to set a survey-sponsor role for organizations, to describe a margin-of-error estimate with a confidence level, and to describe sample sizes for oversamples. Not included here, because it is not an AAPOR disclosure requirement, but we also use this flexibility to indicate the response rate for a portion of a sample and to provide the particular response-rate definition (AAPOR RR3, etc.).
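The sketch below shows one way this key/value pattern could be generated in code. The pairing of an attribute key with a JSON-encoded value follows the slide; the element names and serialization details are assumptions rather than a conformant DDI instance.

```python
# Sketch of the user-defined attribute pattern: a key naming the extension
# and a JSON value carrying the fields DDI 3.2 lacks. Element names and
# serialization details are assumptions, not conformant DDI.
import json
import xml.etree.ElementTree as ET

user_attribute = ET.Element("UserAttribute")
ET.SubElement(user_attribute, "AttributeKey").text = "SamplingErrorEstimate"
# The value carries the extension fields as JSON, mirroring the slide.
ET.SubElement(user_attribute, "AttributeValue").text = json.dumps({
    "samplingErrorEstimate": 0.05,
    "ConfidenceLevel": 0.95,
    "AdditionalInformation": "text",
})

print(ET.tostring(user_attribute, encoding="unicode"))
```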

18 Thank you. (And please archive your data. Researchers of the future will thank you.)

