CDI: DATA MANAGEMENT WORKING GROUP Heather Henkel and Viv Hutchison.


1 CDI: DATA MANAGEMENT WORKING GROUP Heather Henkel and Viv Hutchison

2 Outline
• Data Management in USGS: Why?
• Evolution of the Data Management Working Group
• Monthly Presentations
• Sub-Team Accomplishments
• Powell Center Proposal and Alternative Results
• FY12 Proposals

3 Data Management in USGS: Why?
About the Data Rescue Program: “It is both a great and terrible thing that we have such a program at the USGS” (J. Faundeen, August 16, 2011)
The CDI recognizes that good data management is a prerequisite for data integration.

4 Data Management in USGS: Why?
(Figure; credit: DataONE)

5 Data Management Working Group Goals
The Data Management Working Group will:
• seek mechanisms for incorporating data management into USGS science
• develop ways to educate USGS scientists about its value
The group seeks to elevate the practice of data management so that it is seen as a critical partner in the pursuit of science at the USGS.

6 What is Data Management?
“The business function that develops and executes plans, policies, practices and projects that acquire, control, protect, deliver and enhance the value of data and information.”
Source: DAMA Dictionary of Data Management, 1st Ed.

7 Evolution of the Data Management Working Group
• 2010 CDI Meeting: formation of the group
• Monthly telecons with ~50 Working Group participants from across the USGS Mission Areas and partner agencies:

8 Data Management Working Group Wiki
• my.usgs.gov/confluence/display/cdi/Data+Management+Working+Group
• Monthly meeting notes, presentations, sub-teams, membership information

9 DMWG: Presentations
• Basics of Using Mendeley - Natalie Latysh, USGS
• EROS Scientific Records Appraisal Process - John Faundeen, USGS
• Ocean Biodiversity Information System (OBIS) - Phillip Goldstein, University of Colorado-Boulder
• USGS Professional/Profile Pages and SharePoint - James Sayer, USGS

10 DMWG: Presentations
• National Geological and Geophysical Data Preservation Program's Best Practices for Data Preservation Project - Brian Buczkowski, USGS
• Data Dissemination thru Cloud Computing: the Next Generation of Data.gov - Ray Obuch, USGS
• USGS Survey Manual Policy Development and Status on Policy of Interest to the CDI - Carolyn Reid, USGS
• Presentation on Data Basin - Denny Grossman, Jim Strittholt, and Brendan Ward, Data Basin

11 DMWG: Monthly Meetings and Presentations
Topics covered during CDI DM WG calls:
• Charter for the group
• Powell Center Proposal
• Coordination with the Tech Stack Group
• Development of three sub-teams (Policy (RGE/EDGE), Best Practices, Data Management Workshop/Meeting)
• Discussion of abilities and specialties among CDI DM Working Group members
• Encouraging people to create USGS Professional Pages to highlight data management work and experience
• FY12 proposals

12 DMWG Sub-team: Data Policy

13 DMWG Sub-Team: Data Policy Goals
• Work toward formal incorporation of data management into the Survey Manual
• Explore opportunities to relate good data management to the Research Grade Evaluation (RGE) and Equipment Development Grade Evaluation (EDGE)
• Partner with the Office of Science Quality and Integrity to:
  ◦ review existing USGS policies on data management
  ◦ help write new policies
  ◦ provide feedback to RGE/EDGE processes

14 DMWG Sub-Team: Data Policy
Leads: Terry D'Erchia, Sally Holl
Participants: Christina Bartlett, John Faundeen, Robin Fegeas, Heather Henkel, Viv Hutchison, Scott McEwen, Carol Reiss, Elizabeth Sellers

15 DMWG Data Policy Sub-Team: Survey Manual Chapters
New policy chapters in progress, in varying stages of review:
• Survey Manual Chapter 502.x - Fundamental Science Practices: Metadata for Datasets and Information Products
• Survey Manual Chapter 502.x - Fundamental Science Practices: Safeguarding Unpublished U.S. Geological Survey Data and Information
• Survey Manual Chapter XXX - Release of Computer Databases and Computer Programs

16 DMWG Data Policy Sub-Team: RGE Review
Where and how do we enable our scientists to report data management activities in RGE?
Actions:
• Attended peer-review panel (March 2011)
• Reviewed USGS RGE Enhancement Team charter (2007)
• Interviewed RGE scientists
• Reviewed RGE/EDGE guides and evaluation forms
• Presented to the RGE Panel (August 4, 2011)

17 DMWG Data Policy Sub-Team
• Questioned four RGE scientists about data management practices and needs
• Short-term result: an input form was created to discover and track the data management needs of scientists

18 DMWG Data Policy Sub-Team
• “Never quite sure a dataset is the most recent. Some datasets don't have an author/creator listed so tracking down if it is the most recent is a challenge.”
• “Our data are located all over and it is a real effort just to locate data.”
• “It should be obvious to a researcher how to cite a dataset. Peer-reviewed papers that point to how scientists find out about a dataset are critical.”
The sub-team learned about scientists' needs from these interviews and could make recommendations to the RGE Panel based on its findings.

19 DMWG Data Policy Sub-Team: Recommendations to RGE Panel
• Make it easier for USGS scientists to do data management (CDI)
• Incentivize good data management through RGE (RGE)
• Make it easier to document data (CDI)
• Allow easier reporting of data management in RGE by modifying self-evaluation documents (RGE)
• Develop criteria for the RGE panel to recognize and reward good data management (CDI-RGE)

20 DMWG Data Policy Sub-Team: Next Steps
• Continue work with RGE Coordinators on the recommendations and on feedback from their input
• Explore opportunities to include informatics professionals in the RGE/EDGE process
• Assist in bringing the Survey Manual chapters through to publication
• Assist in the ‘refresh’ of the new-scientist Orientation Checklist and Exit Survey
• Review the Survey Manual for relevant data management language

21 DMWG Sub-team: Data Best Practices

22 DMWG Sub-team: Data Best Practices Goals
The Best Practices Sub-team was formed in early 2011 to:
• compile a suite of best practices, lessons learned, and learning opportunities regarding data management
• organize this information and make it available through a website or portal

23 Participants
John Faundeen (Lead), Brian Buczkowski, Tom Burley, Jennifer Carlino, Robin Fegeas, Dave Govoni, Heather Henkel, Sally Holl, Donn Holmes, Richard Huffine, Viv Hutchison, Tim Kern, Tim Mancuso, Elizabeth Martin, Scott McEwen, Ellyn Montgomery, Cassandra Ladino, Daniel Sandhaus, Steve Tessler, Jessica Thompson, Lisa Zolly, Joseph Kalfsbeek

24 Participants

25 Work Approach
• Monthly WebEx sessions: March 9, April 6, May 4, June 1, July 6, August 3
• Workshop: July 26-27, Reston

26 Beginning Steps
Step 1: Data Lifecycle Model
• Develop or adopt a data lifecycle model that accurately reflects how USGS science data does or should travel through its life.
• Foundational for the sub-team
• Goals: simplicity, intuitiveness, and identification of roles
“As the government looks to its plan for open government through the development of tools such as Data.gov, it is important to integrate these tools into the overall federal architecture and project lifecycle.” (Harnessing the Power of Digital Data: Taking the Next Step. Scientific Data Management (SDM) for Government Agencies: Report from the Workshop to Improve SDM, held June 29 – July 1, 2010, Washington, DC.)

27 Work Item: Data Life Cycle Model
• “Literature Search”
• Compilation
• Review
• NSF Workshop
• USGS Workshop

28 Guidance
“The business function that develops and executes plans, policies, practices and projects that acquire, control, protect, deliver and enhance the value of data and information.”
Source: DAMA Dictionary of Data Management, 1st Ed.

29 Draft Data Lifecycle Model
Stages: Plan; Acquire & Process; Analyze; Preserve; Publish/Share
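The five stages of the draft model can be captured as an ordered enumeration, which is one way a website or planning tool could tag material by stage. This is a minimal Python sketch under stated assumptions: the stages are taken in the order listed on the slide, and because the draft graphic's flow is not spelled out here, `next_stage` makes the loop from Publish/Share back to Plan optional. `Stage` and `next_stage` are hypothetical names, not part of any USGS system.

```python
from enum import Enum
from typing import Optional


class Stage(Enum):
    """Stages of the draft data lifecycle model, in the order listed on the slide."""
    PLAN = "Plan"
    ACQUIRE_AND_PROCESS = "Acquire & Process"
    ANALYZE = "Analyze"
    PRESERVE = "Preserve"
    PUBLISH_SHARE = "Publish/Share"


def next_stage(stage: Stage, cyclic: bool = False) -> Optional[Stage]:
    """Return the stage that follows `stage`.

    Assumption: stages run in slide order. With cyclic=True, Publish/Share
    loops back to Plan, as a circular lifecycle graphic would suggest;
    otherwise the last stage has no successor and None is returned.
    """
    order = list(Stage)  # Enum iteration follows definition order
    i = order.index(stage)
    if i + 1 < len(order):
        return order[i + 1]
    return order[0] if cyclic else None
```

For example, `[s.value for s in Stage]` lists the display names in model order, and `next_stage(Stage.PUBLISH_SHARE, cyclic=True)` returns `Stage.PLAN`.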

30 Output: Plan
Business Requirements, Data Management Plan, Metadata Inception, Propose, Documentation

31 Output: Acquire & Process
Ingest, Gather, Collection, Acquire, Discover, Evaluate, Assemble, Get, Generate, Create, Record, Monitor, Observe, Measure, QA/QC, Appraise, Transcribe, Organize, Process, Prepare, Integrate, QA/QC, Normalize, Transform, Evaluate, Transcribe, Format, Resample, Select, Sample, Organize, Package, Combine, Improve, Enhance, Structure

32 Output: Analyze
Analyze, Experiment, Interpret, Model, Test, Visualize, Appraise, Evaluate, Review, Conclude, Deduce, Question, Normalize, Synthesis, Discovery, Knowledge Transfer, Add Value, Understand, Enhance

33 Output: Preserve
Preserve, Transform, Transcribe, Migrate, Save, Protect, Store, Archive, Manage, Replicate, Package, Curate, Transfer, Rights Management, Control, Planning, Embargo, Rescue, Appraise, Select, Repository, Backup, Deposit

34 Output: Publish/Share
Share, Release, Submit, Knowledge Transfer, Prepare, Write, Author, Produce, Disseminate, Create, Distribute, Transfer, Present, Communicate, Upload, Package, Data Deposit, Deliver, Embargo, Web Serve, Select Repository, Access
Produce, Share, Embargo, Release, Disseminate, Distribute, Share, Web Serve, Select Repository, Replicate, Submit, Discover, Access, Publish
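Several activity terms on the output slides appear under more than one stage (Embargo, for instance, is listed under both Preserve and Publish/Share), and such overlaps are worth surfacing before roles are assigned to the model. The minimal Python sketch below uses only a hand-picked subset of the slide terms; `STAGE_TERMS`, `stages_for`, and `shared_terms` are hypothetical names chosen for illustration.

```python
from typing import Dict, List, Set

# A subset of the brainstormed activity terms from the "Output" slides,
# keyed by stage of the draft data lifecycle model (dict order = model order).
STAGE_TERMS: Dict[str, Set[str]] = {
    "Plan": {"Business Requirements", "Data Management Plan",
             "Metadata Inception", "Propose", "Documentation"},
    "Acquire & Process": {"Ingest", "Gather", "Collection", "Discover", "Evaluate",
                          "QA/QC", "Appraise", "Transcribe", "Normalize", "Package"},
    "Analyze": {"Analyze", "Interpret", "Model", "Visualize", "Evaluate",
                "Appraise", "Normalize", "Knowledge Transfer"},
    "Preserve": {"Preserve", "Archive", "Curate", "Embargo", "Appraise",
                 "Transcribe", "Package", "Transfer", "Select", "Repository"},
    "Publish/Share": {"Share", "Release", "Disseminate", "Embargo", "Package",
                      "Transfer", "Knowledge Transfer", "Select Repository",
                      "Discover", "Access", "Publish"},
}


def stages_for(term: str) -> List[str]:
    """Return the stages whose term list includes `term`, in model order."""
    return [stage for stage, terms in STAGE_TERMS.items() if term in terms]


def shared_terms() -> Dict[str, List[str]]:
    """Map each term listed under more than one stage to those stages."""
    all_terms = sorted(set().union(*STAGE_TERMS.values()))
    return {t: stages_for(t) for t in all_terms if len(stages_for(t)) > 1}
```

Running `shared_terms()` on the full term lists, rather than this subset, would give the sub-team a checklist of terms whose stage assignment still needs a decision.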

35 CDI Data Blast Poster
• “Write-On” Poster

36 DMWG Sub-team: Data Best Practices Next Steps
• Digest Data Blast Comments
• Receive CDI Sponsor Feedback
• Assign Roles to Model
• Finalize Graphic
• Establish Science Review
• FY12 Validation
• Beginning of Outreach Effort
• Communicate Final Model to USGS
• Start Aligning Best Practices to Model
• Determine Gaps

37 Funded FY11 Data Management Projects

38
• Group convened in December 2010 to put together a data management proposal to the Powell Center: Heather Henkel, Sally Holl, Viv Hutchison, Steve Tessler, Jessica Thompson, Lisa Zolly
• The proposal was not funded, but instead received support from the Powell Center to be funded at a higher (enterprise) level
• Proposal modified and resubmitted
• June 20: funding received from Core Science Systems (CSS) and CDI
• Work begun in July

39 Funded Data Management Projects
• Creation of a data management website:
  ◦ Provide one place for best practices, tools, education, key points, recommended reading, and checklists
  ◦ Internal initially, with plans to expand to an external site in FY12

40 Funded Data Management Projects
• Categorization of existing data management materials:
  ◦ Creation of a bibliography
  ◦ Content for the website
• Purchase of an enterprise-wide license for:
  ◦ DAMA Dictionary of Data Management
  ◦ DAMA Data Management Body of Knowledge

41 Funded Data Management Projects
• DM training for team:
  ◦ Expose the team to the same DM background
  ◦ Build upon the same core training
  ◦ Intent to provide focused DM training to others, based upon the initial training
• DM education products:
  ◦ Educate and encourage data management practices
  ◦ Repurpose existing materials created through DataONE
  ◦ Make available on the website
• DM planning tool:
  ◦ Template to guide users through the creation of a DM plan
  ◦ Build upon existing work done by DataONE and USGS

42 FY12: Proposals from the Working Group

43 Moving Forward
• Proposals requested from anyone within the CDI-DM working group
• Initial discussion during monthly telecons
• 20 proposals submitted
• Presentation of draft submissions during Tuesday afternoon’s working group session
• Work done on combining similar proposals and tasks
• Identification of cross-cutting tasks
• Creation of slides for this presentation
FY12 Proposals

44 Data Management Website (Phase 2)
Summary: Well-managed data is a critical prerequisite for data integration. Enhancing the Phase 1 data management website will give USGS researchers the information they need to implement data management practices in their work.
Deliverables: An internal (eventually public-facing), usability-tested data management website that underscores the Bureau’s recognition of the importance of data management. USGS researchers will have easy access to the standards, tools, and best practices that support good data management.

45 Data Management Framework
Summary: Well-managed data is a critical prerequisite for data integration. With a framework that guides USGS researchers from planning through preservation of their data, the USGS can offer better access to data that is ready for integration.
Deliverables: An agreed-upon, cross-Mission Area framework for standardizing data management planning, ultimately resulting in improved access to and integration of research data products. Outreach and training materials will accompany the framework to help communicate it to USGS scientists and science managers.

46 USGS Science Center Adaptable Data Management Plan Framework
Summary: Devise an adaptable baseline framework for science center data management plans (DMPs):
• Conduct an analysis of existing USGS and external DMPs
• Address project, data, and business model variations
• Refine and test the proposed framework through implementations at the Alaska (Integrated) Science Center and the Texas Water Science Center
Deliverables: Publish a wiki version of the DMP framework to enhance future participation and development.

47 Validate Data Life Cycle Model
Summary: Because this model is intended to be the conceptual foundation from which our data management best practices, tools, policies, and procedures will emanate, it is vital that it be reviewed extensively…
• Directly engage our scientists and management
Deliverables:
• Formal science review (all Regions and Mission Areas)
• USGS-wide opportunity to comment:
  ◦ Through the Data Management website
  ◦ Town hall sessions (Reston and Denver, which have the largest USGS staff numbers)
• Communicate the final model to the Bureau (outreach element)

48 Data Preservation Mechanisms for USGS Researchers
Summary: Identify the data preservation mechanisms available to USGS researchers and inform them about those mechanisms, including where they can submit their critical data for preservation.
Deliverables:
• Summary reports of potential data preservation mechanisms and of what USGS researchers need to participate in data preservation activities
• A data preservation webpage in the USGS Data Management website
• Identification of the most feasible data preservation mechanism(s) that USGS researchers can use to preserve their critical data, with information on how to participate in those efforts

49 National Vegetation Classification Standard
Summary: In this proposal we have identified three possible sub-tasks related to the implementation of the National Vegetation Classification Standard (FGDC 2008). The content for the NVC is currently being developed through a variety of FGDC and Ecological Society of America Vegetation Panel efforts. Each sub-task is a critical component of the full cyberinfrastructure needed to support the standard. Prototypes for several of these components currently exist, but each was developed independently, and they ultimately need to be linked in a common framework.

50 National Vegetation Classification Standard (cont.)
Deliverables:
• Vegetation classification: interim database design
  ◦ Supports content (community types and descriptions) being developed through grant funding, and links to the NVC website
• Vegetation plot database: migration plan
  ◦ Provide a centralized database of vegetation plots (currently housed at NCEAS in VegBank) and linkages to existing plot databases in partner agencies (distributed network)
• Peer review infrastructure: work plan
  ◦ Prototype software exists; a data management workflow and document management system is needed

51 Data Mgt – CHA CHA “like” Proposal
Summary: A texting-, Internet-, and chat-based service for rapid response and networking on USGS data management questions, activities, and support.
Deliverables:
• Network of <10 USGS Data Managers
• Mechanisms to submit text, e-mail, chat, and web Q&A
• Integration with the USGS Data Management site
• Mobile submission application
• Training for CHA CHA experts
• Promotion/outreach/education materials
• 9-month evaluation of effectiveness, next steps, etc.
Outcomes:
• Network of USGS Data Managers to support data management
• Development of architecture and services for a USGS CHA CHA service
• An easy-to-use, multi-submission method for rapid response to USGS data management questions and issues
• More effective data management practices and awareness, and better leveraging of expertise

52 Data Integration Potential from Linking Monitoring Protocols
Summary: Efforts to identify, collect, and characterize online monitoring protocol libraries will provide a valuable reference resource to USGS scientists and foster coordinated science and integration opportunities.
Deliverables:
• Centralized access, through the Data Management website, to existing tools that collect documented monitoring protocols
• Leverage the resources, expertise, technology, and content of existing efforts such as the Natural Resources Monitoring Partnership, the Pacific Northwest Aquatic Monitoring Partnership, and the National Environmental Methods Index
• Common elements identified that will enable interoperability among the systems
• Use the Data Management website as a mechanism for collecting USGS scientists’ needs for specific protocols and for promoting additional content into the monitoring library resources

53 Quick Response Team to Web-enable Data
Summary: High-level initiatives (OSTP, NSTC, DOI Secretary, USGS Director, USGS Assistant Director) require timely responses from the agency with relevant data and tools. Mobilizing a team of a metadata creation expert and a Web/map service IT expert will help scientists address these data requests in a timely manner and demonstrate USGS relevance and competency in meeting the information needs of the Department and higher.
Deliverables:
• A process for handling data-release activities that could be transferred to other data management activities under development
• An undetermined number of datasets made broadly available for a specific purpose, with ancillary benefits showcased
• Leverage development of the GOS to Data.gov migration to develop a process that will be sustainable for future initiatives
• Leverage thesauri, other metadata standards, and existing tools
• Leverage the document production process of the Records Management Office

54 Develop a Data Standard Process for USGS, Using a TIME Standard as the Pilot
Summary: Data standards are generally lacking across the USGS landscape, and inconsistencies in how we name, describe, and populate common data elements impede effective data integration. There is currently no process for establishing a data standard. How we name TIME fields and characterize our temporal data is critical to fostering data integration across the enterprise. TIME is also not a simple data element: temporal data can represent a full date-time; only a year, month, or day; a time interval (range); or a timestamp in a data system. The concept of ‘valid time’ (the time interval over which a value is valid) also needs to be considered.
Deliverable:
• Establish a formal process for proposing, evaluating, approving, and implementing a data standard within the USGS. TIME, a ubiquitous data element made up of date and time components, can serve as the pilot data standard.
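The point that TIME is not one kind of value can be made concrete by recording each value's precision alongside the span it actually covers. The minimal Python sketch below assumes ISO 8601-style input strings without time zones and normalizes every value to a half-open [start, end) span, so that a year-only value stands for the whole year rather than midnight on January 1. `Precision`, `TemporalValue`, `parse_temporal`, and `contains` are hypothetical names, not a proposed USGS standard; a valid-time interval could reuse the same start/end pair but is not shown.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from enum import Enum


class Precision(Enum):
    YEAR = "year"
    MONTH = "month"
    DAY = "day"
    DATETIME = "datetime"


@dataclass(frozen=True)
class TemporalValue:
    """A time value plus the span it covers: start inclusive, end exclusive.

    A full date-time is treated as an instant, so its start equals its end.
    """
    text: str
    precision: Precision
    start: datetime
    end: datetime


def _first_of_next_month(dt: datetime) -> datetime:
    if dt.month == 12:
        return dt.replace(year=dt.year + 1, month=1)
    return dt.replace(month=dt.month + 1)


def parse_temporal(text: str) -> TemporalValue:
    """Parse 'YYYY', 'YYYY-MM', 'YYYY-MM-DD', or 'YYYY-MM-DDThh:mm:ss'."""
    if "T" in text:
        instant = datetime.fromisoformat(text)
        return TemporalValue(text, Precision.DATETIME, instant, instant)
    parts = [int(p) for p in text.split("-")]
    if len(parts) == 1:
        start = datetime(parts[0], 1, 1)
        return TemporalValue(text, Precision.YEAR, start, start.replace(year=parts[0] + 1))
    if len(parts) == 2:
        start = datetime(parts[0], parts[1], 1)
        return TemporalValue(text, Precision.MONTH, start, _first_of_next_month(start))
    if len(parts) == 3:
        start = datetime(*parts)
        return TemporalValue(text, Precision.DAY, start, start + timedelta(days=1))
    raise ValueError(f"unrecognized temporal value: {text!r}")


def contains(outer: TemporalValue, inner: TemporalValue) -> bool:
    """True if `inner` lies within `outer`'s half-open span."""
    return (outer.start <= inner.start
            and inner.end <= outer.end
            and inner.start < outer.end)  # keeps an instant at outer.end outside
```

With this representation, `contains(parse_temporal("2011"), parse_temporal("2011-08-16"))` is true even though the two values were recorded at different precisions, which is the kind of comparison a TIME standard would need to make well-defined.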

55 Write a ‘How To…’ Publication on How to Identify and Resolve Issues Involving Non-uniform (Mixed) Time Scales When Integrating Data for Research Use
Summary: Facilitate best practices for integrating data in the temporal dimension.
Deliverables/Work:
• Organize subject matter experts to outline and discuss the problems, existing solutions, and use cases at the project and program levels.
• Prepare a publication on how to identify and resolve these issues so that data from various sources and studies can be used for USGS research.
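One common way to resolve a mixed time scale is to aggregate the finer-grained series to the coarser one before joining them. The minimal Python sketch below illustrates that single technique and is not the content of the proposed publication. It assumes a daily series keyed by `date` and a monthly series keyed by `(year, month)`; the function names and the `min_days` completeness threshold are hypothetical.

```python
from collections import defaultdict
from datetime import date
from statistics import mean
from typing import Dict, List, Tuple

Month = Tuple[int, int]  # (year, month)


def to_monthly_means(daily: Dict[date, float], min_days: int = 1) -> Dict[Month, float]:
    """Aggregate a daily series to monthly means.

    Months with fewer than `min_days` observations are dropped, so that a
    sparsely sampled month is not silently treated as representative.
    """
    buckets: Dict[Month, List[float]] = defaultdict(list)
    for day, value in daily.items():
        buckets[(day.year, day.month)].append(value)
    return {
        month: mean(values)
        for month, values in sorted(buckets.items())
        if len(values) >= min_days
    }


def join_on_month(daily: Dict[date, float], monthly: Dict[Month, float],
                  min_days: int = 1) -> Dict[Month, Tuple[float, float]]:
    """Pair the monthly means of a daily series with a monthly series.

    Only months present in both inputs are kept.
    """
    aggregated = to_monthly_means(daily, min_days)
    shared = sorted(aggregated.keys() & monthly.keys())
    return {m: (aggregated[m], monthly[m]) for m in shared}
```

The mean is only one choice of aggregation; totals (for example, precipitation) or maxima may be more appropriate depending on the variable, which is the kind of decision such a publication would need to document.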

56 Write a ‘How To…’ Publication on How to Identify and Resolve Issues Involving Non-uniform (Mixed) Spatial Scales When Integrating Data for Research Use
Summary: Facilitate best practices for integrating data in the spatial dimensions.
Deliverables/Work:
• Organize subject matter experts to outline and discuss the problems, existing solutions, and use cases at the project and program levels.
• Prepare a publication on how to identify and resolve these issues so that data from various sources and studies can be used for USGS research.

57 Survey of USGS Scientists about DM
Summary: To inform our future efforts to help the USGS manage its data, a survey of current practices will identify where the USGS is performing well and where gaps exist that we can work to close.
Deliverables: A USGS survey, adapted from the DataONE survey of scientists, with results compiled and analyzed.

58 Data Exit Survey for USGS Scientists
Summary: To prevent the loss of information about data when employees leave, our administrative processes need an exit interview about the employee's data.
Deliverables: A USGS exit survey/interview, given to exiting employees, that asks questions such as “Has your data been archived?”, “Is the metadata complete?”, and “Where is the data located?”

59 Thank you! Questions? Comments?

60 Titles of Proposals
• Data Management Website (Phase 2)
• Data Management Framework for USGS
• USGS Science Center Adaptable Data Management Plan Framework
• Validate Data Life Cycle Model
• Data Preservation Mechanisms for USGS Researchers
• National Vegetation Classification Standard
• Data Mgt – CHA CHA “like” Proposal
• Data Integration Potential from Linking Monitoring Protocols
• Quick Response Team to Web-enable Data
• Develop a Data Standard Process for USGS, Using a TIME Standard as the Pilot
• Write a ‘How To…’ Publication (Time Scales)
• Write a ‘How To…’ Publication (Spatial Scales)
• Survey of USGS Scientists about DM
• Data Exit Survey for USGS Scientists

