ODaF Europe 2009
Virtual Research and Collaborative Center
Pascal Heus, Open Data Foundation
Tim Mulcahy, National Opinion Research Center

http://www.opendatafoundation.org
ODaF Europe 2009

Background
Demand for socio-economic data has grown dramatically in the past decade
  – Connectivity / network speed
  – Globalization / economic crisis
Access to microdata has improved
  – Better archiving / preservation
  – Adoption of metadata standards such as DDI and related practices
But many challenges remain:
  – Discovery and access remain an issue (lack of visibility)
  – Usability: documentation is still an issue, and the complexity of datasets is a barrier
  – No community knowledge
  – Datasets are still typically made available through simple / static web-based interfaces
  – There is a lack of researcher tools that leverage metadata

Putting some ideas together…
Internet technologies
  – Community-driven virtual spaces are now very common
  – Social networking is widely accepted
  – User-driven knowledge management works (for large groups)
Social science
  – A large number of public datasets are available
  – Surveys can now easily be documented using the Data Documentation Initiative
  – Metadata-related XML technologies can significantly automate tasks and maintain linkages across the life cycle
Researcher
  – User needs differ from the producers': researchers have a custom view of the data (their project)
  – Outputs should be preserved / captured / shared (not limited to a paper)
  – Need a community space to foster dialog / share knowledge (within and outside research projects)

A Virtual Research and Collaborative Center
Go beyond the static web site to provide dynamic, virtual research within a collaborative environment
Leverage Internet / XML technologies and metadata standards
Provide virtual access to public use data (global)
  – Web-based remote access: for discovery, analysis, publication
  – Enhanced analytical tools: data and documentation customization
  – Advanced collaboration, communication and dissemination tools: community knowledge capture, collaboration, social networking, information sharing/reuse
Approach
  – New tools based on DDI metadata and related standards
  – Leverage Web 2.0 technologies
  – Provide a research-oriented environment
  – Build upon open source solutions

Researcher Services
  – My Datasets: Create custom view of the data for use in project or sharing with community
  – My Projects: Bring together researchers in a virtual environment to share research ideas, data, documentation, and scripts
  – My Publications: Package research outputs (papers, documents, scripts/programs, secondary data) for preservation, dissemination and sharing
  – My Profile: Provide individual background information, research interests, set privacy options and configure notification services
Collaborative Space
  – Wiki: Capture knowledge surrounding the data. Initial content will be seeded with survey metadata.
  – Communication: Events and news, community-driven discussion groups, FAQ/Answers, chat
  – Library: Searchable libraries of papers/references/documentation, scripts/programs, primary and secondary data. Most of the content is extracted automatically from the research space.
  – Services: Researcher Directory, Project Directory, Call for collaboration, Notification, Support, Training
Infrastructure: Primary and researcher data and metadata storage, databases, security (access, backups), web services
Admin Services: System and data usage reports, data/metadata management, user administration, etc.
Home: Welcome, background information, contact, simple access to public data and documentation

General features
Everything is publicly available (read only)
Registered users can manage research projects and contribute to the content
  – Registration will likely be based on OpenID (no need to create a new account)
Users will optionally provide (with privacy control)
  – Demographics: name, nickname, email, social networks
  – Affiliations: institutions, memberships
  – Academic background
  – Research interests

Analytical Tool: My Datasets
Researchers rarely use the full set of variables available in a single survey
Instead, they derive a virtual dataset from one or more data sources
The description of a virtual dataset can be captured using DDI-like metadata
Scripts to generate that particular view can then be created automatically for various statistical packages
Benefits
  – Hides the complexity of merging, filtering, recoding files
  – Independent of statistical package
  – Customized documentation can be produced dynamically
  – Virtual datasets can be versioned, shared with others, refreshed with new data, etc.
  – This also provides valuable usage information to the data provider
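
As a rough illustration of the virtual dataset idea, the sketch below holds one package-neutral description of a researcher's view and renders it as a Stata or an R script. Everything in it is an assumption made for illustration: the `VirtualDataset` fields, the file name `gss2008`, and the generated script layout are hypothetical and are not taken from DDI or from the proposed system. The filter string is passed through verbatim, so it only suits simple conditions whose syntax and missing-value handling are acceptable in both packages.

```python
from dataclasses import dataclass, field


@dataclass
class VirtualDataset:
    """Hypothetical, simplified description of a researcher's custom view."""
    name: str                     # name of the derived dataset
    source_file: str              # file stem of the source data (assumed)
    keep: list[str]               # variables the researcher wants
    row_filter: str = ""          # simple condition, e.g. "age >= 18"
    recodes: dict[str, dict[int, int]] = field(default_factory=dict)


def to_stata(vd: VirtualDataset) -> str:
    """Render the description as a Stata do-file."""
    lines = [f'use "{vd.source_file}.dta", clear']
    if vd.row_filter:
        lines.append(f"keep if {vd.row_filter}")
    for var, mapping in vd.recodes.items():
        rules = " ".join(f"({old}={new})" for old, new in mapping.items())
        lines.append(f"recode {var} {rules}")
    lines.append("keep " + " ".join(vd.keep))
    lines.append(f'save "{vd.name}.dta", replace')
    return "\n".join(lines)


def to_r(vd: VirtualDataset) -> str:
    """Render the same description as an R script."""
    lines = [f'df <- readRDS("{vd.source_file}.rds")']
    if vd.row_filter:
        lines.append(f"df <- subset(df, {vd.row_filter})")
    for var, mapping in vd.recodes.items():
        # Recode against a copy of the original values so that
        # swaps such as 1->2, 2->1 do not overwrite each other.
        lines.append(f"orig_values <- df${var}")
        for old, new in mapping.items():
            lines.append(f"df${var}[which(orig_values == {old})] <- {new}")
    cols = ", ".join(f'"{v}"' for v in vd.keep)
    lines.append(f"df <- df[, c({cols})]")
    lines.append(f'saveRDS(df, "{vd.name}.rds")')
    return "\n".join(lines)


# Hypothetical example view: adults only, sex recoded to 0/1.
example = VirtualDataset(
    name="gss_adults",
    source_file="gss2008",
    keep=["age", "sex", "educ"],
    row_filter="age >= 18",
    recodes={"sex": {1: 0, 2: 1}},
)

if __name__ == "__main__":
    print(to_stata(example))
    print()
    print(to_r(example))
```

Because the description is stored separately from any script, the same record could also feed customized documentation or usage reporting; that part is not shown here.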

Analytical Tools: My Projects
Provide a virtual space for the research team
Brings together virtual datasets, documents, scripts, outputs, collaborative tools
Primary Investigator can bring in collaborators
Knowledge exchange tools: blog, IM, optional wiki
File sharing tools:
  – Documents: referenced, research, outputs
  – Citations: within and outside project
  – Scripts: shared research processes
  – Secondary data: microdata and aggregates
  – Can be marked for preservation / dissemination (see My Publications)
  – Can draw from community libraries
Project description contains topics that provide valuable metadata for usage and collaboration

Dissemination Tools: My Publications
Typically research output is a PDF
  – This is insufficient to meet Gary King's Replication Standard
  – Leads to poor preservation and reuse
Need a tool to package outputs as enhanced publications
  – For preservation: contains everything that needs to be archived (from My Projects)
  – For dissemination: contains all necessary information to reproduce the research process (not just the paper)
Files in projects can be marked for archiving and/or dissemination
  – Extra metadata can be provided for each file (Dublin Core citation, etc.)
  – Archived files will be stored for several years
  – Dissemination package will be made available on the web
Research paper
  – Can be circulated for peer review
  – Will be shared with the community, can be automatically sent to libraries, citation repositories, integrated into printed publications, etc.
Scripts can be automatically tagged with header, author, etc.
Data can be marked as intermediate, final, public, etc.
Public usage, comments, ratings will be reported to the PI
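
The marking of project files for archiving and/or dissemination could be modelled along the following lines. This is only a sketch under stated assumptions: the `ProjectFile` class, the two package names, and the rule that `title` and `creator` must be present are hypothetical choices; only the element names themselves come from Dublin Core.

```python
from dataclasses import dataclass, field


@dataclass
class ProjectFile:
    """Hypothetical record for one file in a research project."""
    path: str
    archive: bool = False         # marked for preservation
    disseminate: bool = False     # marked for public release
    dublin_core: dict[str, str] = field(default_factory=dict)


# Assumption: which Dublin Core elements a packaged file must carry.
REQUIRED_DC = ("title", "creator")


def build_packages(files):
    """Split marked files into packages and report missing metadata."""
    packages = {"preservation": [], "dissemination": []}
    problems = []
    for f in files:
        if not (f.archive or f.disseminate):
            continue  # unmarked files stay in the project only
        missing = [k for k in REQUIRED_DC if k not in f.dublin_core]
        if missing:
            problems.append(f"{f.path}: missing {', '.join(missing)}")
        if f.archive:
            packages["preservation"].append(f.path)
        if f.disseminate:
            packages["dissemination"].append(f.path)
    return packages, problems


# Hypothetical project content.
project_files = [
    ProjectFile("paper.pdf", archive=True, disseminate=True,
                dublin_core={"title": "Example paper", "creator": "A. Researcher"}),
    ProjectFile("analysis.do", archive=True, disseminate=True,
                dublin_core={"title": "Analysis script"}),
    ProjectFile("scratch.log"),
]

if __name__ == "__main__":
    packages, problems = build_packages(project_files)
    print(packages)
    print(problems)
```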

Discovery Tools: My Profile
Looking for data or documents is a significant effort for researchers
A metadata-driven system can greatly alleviate this by bringing the information to the user (rather than the other way around)
Researcher profile will provide various subscription and notification tools based on research interests
Examples:
  – Document becomes available on a specific topic or from a particular author/group
  – New or updated data becomes available on a specific topic
  – New research paper published using a specific dataset
  – Research project looking for collaborators or reviewers
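
One simple way to picture interest-based notification is to match the topics attached to a new item against the topics in each researcher's subscriptions. The sketch below assumes that events and subscriptions both carry plain topic keywords; the class names, event types, and sample data are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Subscription:
    researcher: str
    event_type: str            # e.g. "new_data", "new_paper" (assumed labels)
    topics: frozenset[str]     # interests taken from the researcher profile


@dataclass(frozen=True)
class Event:
    event_type: str
    title: str
    topics: frozenset[str]     # topics taken from the item's metadata


def recipients(event, subscriptions):
    """Return researchers whose subscription matches the event type
    and shares at least one topic with the event."""
    return sorted({
        s.researcher
        for s in subscriptions
        if s.event_type == event.event_type and s.topics & event.topics
    })


# Hypothetical subscriptions and event.
subscriptions = [
    Subscription("alice", "new_data", frozenset({"education", "income"})),
    Subscription("bob", "new_data", frozenset({"health"})),
    Subscription("carol", "new_paper", frozenset({"education"})),
]
event = Event("new_data", "Updated survey file", frozenset({"education"}))

if __name__ == "__main__":
    print(recipients(event, subscriptions))
```

A production system would more likely match on controlled vocabularies or metadata identifiers than on free-text keywords; exact keyword overlap is used here only to keep the example short.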

Collaborative: Catalogs
The center's community space will contain several catalogs, libraries, directories
Content will be derived automatically from research projects or contributed by users/providers
Data catalog: simple and complex search for datasets / variables based on survey, time, geography, topics, etc.
Document library: searchable collections of research papers, survey documentation, references/methodologies, etc.
Script library: statistical programs shared by projects/users, searchable by dataset, language, etc.
Researcher directory: look up other researchers by interest, profile, expertise, etc.
Project directory: completed, ongoing and future research projects. Also a place to advertise research opportunities

Collaborative: Tools
Wiki: classic community-driven knowledge capture
  – Some of the content will be seeded automatically from DDI metadata to create pages per survey, file, variable, etc.
Classic tools: FAQ, news, events/calendar, chat, discussion forums
Collaborative tagging:
  – Folksonomies to capture researcher perspective/feedback at the survey, dataset, variable level
  – Rating/comments on papers, datasets, etc.
And likely more…
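
Seeding wiki pages from survey metadata might look roughly like the sketch below, which reads variable names, labels, and categories from a small XML fragment and produces one page stub per variable. The fragment is a simplified, namespace-free sample loosely modelled on DDI Codebook elements (`var`, `labl`, `catgry`, `catValu`); real DDI instances are namespaced and far richer. The `Variable:` page naming and the MediaWiki-style markup are assumptions.

```python
import xml.etree.ElementTree as ET

# Simplified sample loosely modelled on a DDI Codebook variable description.
SAMPLE = """
<codeBook>
  <dataDscr>
    <var name="age">
      <labl>Age of respondent</labl>
    </var>
    <var name="sex">
      <labl>Respondent's sex</labl>
      <catgry><catValu>1</catValu><labl>Male</labl></catgry>
      <catgry><catValu>2</catValu><labl>Female</labl></catgry>
    </var>
  </dataDscr>
</codeBook>
"""


def wiki_stubs(ddi_xml):
    """Return {page title: page text} with one stub per variable."""
    root = ET.fromstring(ddi_xml)
    pages = {}
    for var in root.iter("var"):
        name = var.get("name")
        label = var.findtext("labl", default="").strip()
        lines = [f"= {name} =", label]
        # List the coded categories, if the variable has any.
        for cat in var.findall("catgry"):
            value = cat.findtext("catValu", default="").strip()
            cat_label = cat.findtext("labl", default="").strip()
            lines.append(f"* {value}: {cat_label}")
        pages[f"Variable:{name}"] = "\n".join(lines)
    return pages


if __name__ == "__main__":
    for title, text in wiki_stubs(SAMPLE).items():
        print(title)
        print(text)
        print()
```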

Administration
Various management tools will be implemented
Reporting
  – User demographics
  – Data usage: most used variables, popular research topics, quality feedback, etc.
  – System usage: hits/visits, number of active projects, new papers, secondary datasets, etc.
Management
  – Data / metadata maintenance
  – User / group management
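
A data usage report such as "most used variables" could be derived from the virtual dataset definitions that researchers create. The sketch below assumes those definitions are available as a mapping from a dataset identifier to its list of variables; the identifiers and variable lists are hypothetical.

```python
from collections import Counter


def most_used_variables(virtual_datasets, top=3):
    """Count in how many virtual datasets each variable appears."""
    counts = Counter()
    for variables in virtual_datasets.values():
        counts.update(set(variables))  # count each variable once per dataset
    return counts.most_common(top)


# Hypothetical virtual dataset definitions.
usage = {
    "vd_alice": ["age", "sex", "educ"],
    "vd_bob": ["age", "educ"],
    "vd_carol": ["age", "income"],
}

if __name__ == "__main__":
    print(most_used_variables(usage, top=2))
```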

Implementation strategy
Based on metadata standards
Built as an open source product (and leveraging OSS)
Web service based architecture
Virtual / cloud server environment to ensure scalability (processing and storage)
Modular system to allow for incremental development
Build upon other ongoing initiatives
Not only a technological challenge: organizational / legal issues also need to be addressed

Status / Next steps
Project at initial stage (concept note)
Partnership between NORC, ODaF and other agencies
Will likely start at NORC using the General Social Survey (GSS) and possibly other public use files
  – In discussion with other producers
Planning for a prototype in 4Q 2009
Other options being considered:
  – Use for non-public datasets
  – Add harmonization/comparability features
  – Extend functionalities to aggregate data (SDMX)
  – Link to geography (ISO 19115 and others)
  – Integrate statistical engine
  – Integrate disclosure control features

Conclusion
Proposal to build innovative tools that provide a dynamic environment for research on survey microdata
Based on metadata and open technology standards to ensure a generic solution
Promotes sharing and reuse
Facilitates preservation and dissemination of research outputs
Fosters collaboration and supports a community-driven knowledge base
Provides a better understanding of how the data are used
For further information, contact
  – Tim Mulcahy, National Opinion Research Center (NORC)
  – Pascal Heus, Open Data Foundation (ODaF)


XML metadata specifications for socio-economic data
Statistical Data and Metadata Exchange (SDMX)
  – Macrodata, time series, indicators, registries
Data Documentation Initiative (DDI)
  – Microdata (surveys, studies)
ISO 11179
  – Semantic modeling, concepts, registries
ISO 19115
  – Geography
Dublin Core
  – Resources (documentation, images, multimedia)

The Data Documentation Initiative (DDI)
International XML-based specification for the documentation of social and behavioral data
  – Started in 1995, now driven by the DDI Alliance (30+ members)
  – Became an XML specification in 2000 (v1.0)
  – Current version is 2.1, with a focus on archiving (survey/codebook)
New Version 3.0 (2008)
  – Focus on the entire survey life cycle
  – Provides comprehensive metadata on the entire survey process and usage
  – Aligned with other metadata standards (DC, MARC, ISO 11179, SDMX, …)
  – Includes machine-actionable elements to facilitate processing, discovery and analysis

