
Data Management: Data Processing
Types of Data Processing at USGS
There are several ways to classify Data Processing activities at USGS, and here are some of them.


1 Data Management: Data Processing

Types of Data Processing at USGS
There are several ways to classify Data Processing activities at USGS, and here are some of them.

A Process Can Exist Anywhere Within the Data Lifecycle
The Process “stage” of the data lifecycle is not limited to data preparation activities after Acquisition and before Analysis, but includes all data handling activities from obtaining data and initial storage, through basic data screening and preparation, iterating with data changes prompted during analysis, and culminating with actions that prepare data for long-term preservation and sharing. Processes may also be created for producing documentation, managing data quality, and data protection.

PROCESS Landing Page Strawman
Data Processing covers any data manipulation activity resulting in the alteration or integration of source data, including the preparation of data for preservation and sharing. Process components can support retrieval, filtering, screening, transformation, translation, classification, transfer, and integration, among others. Data Processing typically produces data ready for use, but can also result in graphs and reports.

Process Documentation, Diagrams, and Workflow Tools
Capturing and communicating information about how data were processed is critical for reproducible science. In addition to descriptive metadata, the use of flow charts, data flow diagrams, and workflow tools can help.

The Importance of Standards to Data Processing
The use of data standards facilitates the creation of automated data processing procedures and scripts. For example, the use of common data models provides a structural consistency for creating and sharing reusable process components and tools to serve maintenance and analytical needs for multiple projects using the same kind of data.

ETL – Extract, Transform, and Load
ETL is a term representing the overall process of moving data from one form or environment to another. ETL integrates and chains together processes that (1) gather data from a source, (2) screen and transform it, and (3) load it into a target data store. ETL processes are usually automated to support data warehouses, online portals, and integrated data environments such as The National Map.
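The three ETL steps can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a USGS procedure: the site identifiers, the column names, the screening rules, and the in-memory SQLite target are all hypothetical stand-ins for a real source and data store.

```python
import csv
import io
import sqlite3

# Hypothetical raw input: site, date, and discharge in cubic feet per second.
RAW_CSV = """site_id,date,discharge_cfs
SITE-A,2020-01-01,1200
SITE-A,2020-01-02,
SITE-B,2020-01-01,-5
SITE-B,2020-01-02,830
"""

CFS_TO_CMS = 0.0283168  # cubic feet per second -> cubic meters per second


def extract(text):
    """Step 1: gather rows from the source (here, an in-memory CSV)."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Step 2: screen out missing or negative values, then convert units."""
    clean = []
    for row in rows:
        value = row["discharge_cfs"].strip()
        if not value:
            continue  # screening: drop missing measurements
        cfs = float(value)
        if cfs < 0:
            continue  # screening: drop physically implausible values
        clean.append((row["site_id"], row["date"], round(cfs * CFS_TO_CMS, 3)))
    return clean


def load(records, conn):
    """Step 3: load the screened records into the target data store."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS discharge "
        "(site_id TEXT, date TEXT, discharge_cms REAL)"
    )
    conn.executemany("INSERT INTO discharge VALUES (?, ?, ?)", records)
    conn.commit()


def run_etl(text, conn):
    """Chain the three steps and return the number of rows loaded."""
    records = transform(extract(text))
    load(records, conn)
    return len(records)


conn = sqlite3.connect(":memory:")
loaded = run_etl(RAW_CSV, conn)
print(loaded, "rows loaded")
```

Keeping each step in its own function means the chain can be rerun on a schedule, which is how an automated ETL process keeps a target store current.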

2 Data Management: Data Processing

PROCESS Landing Page Strawman Cont’d

Examples of Data Processing at USGS
USGS produces extensive datasets and interpretive products using a variety of data processing techniques and methods. This section provides examples of data processing for satellite imagery, sensor networks (earthquakes, real-time stream data), telemetry from ocean-going vessels and wandering animals, and for the production of aggregate datasets in portals and data warehouse access points.

ETL – Extract, Transform, and Load
ETL is a term used to represent a very common chain of integrated process activities. Extraction of data from one or more sources is followed by screening and transformation of the data into a form that is then loaded into a target data store. ETL processes are frequently automated and used to keep data current in online portals, data warehouses, and integrated data environments such as The National Map.

Process Component Library
Current best practices for coding promote the creation of reusable modular components to manipulate datasets and other objects in a consistent and documented way. USGS shares components via GitHub and other venues.

Process Automation and Scripting
Data processing can range from a manual set of actions performed by a single person to meet specific research needs, to a fully automated operation using scripts or programs to ensure repeated production of high-quality datasets in a consistent and documented way. Automating even simple processes helps provide consistency and repeatability, and generates the necessary documentation. [R projects]
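One way to see how automation can generate documentation as a side effect is a small script runner that logs every step it applies. The sketch below is a hypothetical illustration, not a USGS tool: the step functions, field names, and log layout are assumptions chosen for the example.

```python
import json
from datetime import datetime, timezone


def drop_missing(rows, field):
    """Reusable component: remove rows where `field` has no value."""
    return [r for r in rows if r.get(field) is not None]


def scale(rows, field, factor):
    """Reusable component: multiply `field` by `factor` in every row."""
    return [{**r, field: r[field] * factor} for r in rows]


def run_pipeline(rows, steps):
    """Apply each (function, parameters) step in order and log what was done."""
    log = []
    for func, params in steps:
        rows_in = len(rows)
        rows = func(rows, **params)
        log.append({
            "step": func.__name__,
            "parameters": params,
            "rows_in": rows_in,
            "rows_out": len(rows),
            "run_at": datetime.now(timezone.utc).isoformat(),
        })
    return rows, log


# Hypothetical sensor readings used only to exercise the pipeline.
readings = [
    {"station": "A", "temp_c": 10.0},
    {"station": "B", "temp_c": None},
    {"station": "C", "temp_c": 12.5},
]
steps = [
    (drop_missing, {"field": "temp_c"}),
    (scale, {"field": "temp_c", "factor": 2.0}),
]
result, process_log = run_pipeline(readings, steps)
print(json.dumps(process_log, indent=2))
```

Because the log is produced by the same code that transforms the data, it cannot drift out of date the way a hand-written processing note can.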

3 Data Management: Data Processing

PROCESS Landing Page Strawman Cont’d

What the U.S. Geological Survey Manual Says:
Policies that apply to the Process stage largely deal with providing appropriate documentation of the methods and actions used to modify data from its raw form to the form used for research or produced for sharing. Metadata standards (FGDC, ISO) include sections for describing the ‘provenance’ of data, meaning that enough information is provided for the user to determine where data originated and what changes were made to get to the form being described.

The USGS Manual Chapter 502.2 - Fundamental Science Practices: Planning and Conducting Data Collection and Research discusses the requirements for data documentation:

"Documentation: Data collected for publication in databases or information products, regardless of the manner in which they are published (such as USGS reports, journal articles, and Web pages), must be documented to describe the methods or techniques used to collect, process, and analyze data (including computer modeling software and tools produced by USGS); the structure of the output; description of accuracy and precision; standards for metadata; and methods of quality assurance."

Further:

"Standard USGS methods are employed for distinct research activities that are conducted on a frequent or ongoing basis and for types of data that are produced in large quantities. Methods must be documented to describe the processes used and the quality-assurance procedures applied."

The USGS Manual Chapter 502.4 - Fundamental Science Practices: Review, Approval, and Release of Information Products covers the documentation of methodology:

"Methods used to collect data and produce results must be defensible and adequately documented."

Software Release: [put a reference here that describes how scripts and software that perform data processing need to be fully documented, reviewed, and released]
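As an illustration of the provenance sections mentioned on this slide, the sketch below builds a lineage fragment with one process step per processing action. It assumes the FGDC CSDGM short tag names `lineage`, `procstep`, `procdesc`, and `procdate`; the step descriptions and dates are hypothetical, and a complete metadata record requires many more elements than shown here.

```python
import xml.etree.ElementTree as ET


def build_lineage(steps):
    """Return a <lineage> element with one <procstep> per (description, date)."""
    lineage = ET.Element("lineage")
    for description, date in steps:
        procstep = ET.SubElement(lineage, "procstep")
        ET.SubElement(procstep, "procdesc").text = description
        ET.SubElement(procstep, "procdate").text = date  # assumed YYYYMMDD form
    return lineage


# Hypothetical processing history for a discharge dataset.
steps = [
    ("Removed records with missing discharge values.", "20200102"),
    ("Converted discharge from cubic feet per second to cubic meters per second.",
     "20200103"),
]
lineage = build_lineage(steps)
xml_text = ET.tostring(lineage, encoding="unicode")
print(xml_text)
```

Generating these entries from the processing script itself, rather than typing them afterwards, keeps the recorded methods aligned with what was actually run.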

4 Data Management: Data Processing

PROCESS Landing Page Strawman Cont’d

Recommended Reading:

References:

5 Sub Part Definition More Defs Etc Best Practices What the Survey Manual Says References Key Points Bubble Etc Recommended Reading 1st Sublevel Page Special Call-out

