Introduction to Data Science Section 2 Data Matters 2015 Sponsored by the Odum Institute, RENCI, and NCDS Thomas M. Carsey 1.



The Data Lifecycle

Data Science is More than Analysis
Data analysis gets most of the attention in data science. As a result, many people struggle to distinguish data science from applied statistics. Analysis is obviously important, but statistical analysis skills are only useful if the data can be collected and put in a usable form. Data science is much broader than just data analysis.

The Data Lifecycle
Data science considers data at every stage of what is called the data lifecycle. This lifecycle generally refers to everything from collecting data to analyzing it to sharing it so others can re-analyze it.
– In fact, it includes the planning process that should be in place before any other work begins.
New visions of this process focus in particular on integrating every action that creates, analyzes, or otherwise touches data. These same new visions treat the process as dynamic: data archives are not just digital shoe boxes under the bed. There are many representations of this lifecycle.


Lessons from the Lifecycle
Data science is more than just data analysis. Effective data science requires:
– Planning
– Vision
– Storage
– Interoperability of systems
– A team approach
– Adaptability and scalability

What is Missing?
Most definitions of data science underplay or leave out discussions of:
– Substantive theory
– Metadata
– Privacy and ethics
– Greater consideration for missing data, representativeness, and uncertainty
– More thinking about the proper null hypothesis
– Leadership on leveraging data science for the public good

Substantive Theory

The Data Generating Process (DGP)
Most of the time we don’t care about the data itself. Most of the time we are trying to learn something about an underlying process that produces the data: a DGP. Technically trained folks might be good at uncovering patterns in data, but you need substantive expertise to:
– Know where to look in the first place
– Know what to look for
– Know what your findings might actually mean

What is the DGP?
Good analysis starts with a question you want to answer.
– Blind data mining can only get you so far, and really, there is no such thing as completely blind mining.
Answering that question requires laying out expectations of what you will find and explanations for those expectations. Those expectations and explanations rest on assumptions. If your data collection, data management, and data analysis are not compatible with those assumptions, you risk producing meaningless or misleading answers.

The DGP (cont.)
Think of the world you are interested in as governed by dynamic processes. Those processes produce observable bits of information about themselves: data. We can use data science to:
– Collect, catalog, and organize those bits of information
– Discover patterns in data and fit models to that data
– Make predictions outside of our data
– Inform explanations of both those patterns and those predictions
Real discovery is NOT about modeling patterns in observable data. It is about understanding the processes that produced that data.
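To make the idea of a DGP concrete, here is a minimal sketch in Python. It assumes a deliberately simple, made-up process (a straight line plus random noise); the function names, parameter values, and seed are illustrative assumptions, not anything from the slides. The point is only that the data are a by-product of the process, and that fitting a model is an attempt to recover the process.

```python
import random

def simulate_dgp(n, intercept, slope, noise_sd, seed):
    """Hypothetical DGP: y = intercept + slope * x + Gaussian noise."""
    rng = random.Random(seed)
    xs = [rng.uniform(0, 10) for _ in range(n)]
    ys = [intercept + slope * x + rng.gauss(0, noise_sd) for x in xs]
    return xs, ys

def fit_line(xs, ys):
    """Ordinary least squares with one predictor; returns (intercept, slope)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

# The "world": a process we normally cannot observe directly.
xs, ys = simulate_dgp(n=5000, intercept=2.0, slope=0.5, noise_sd=1.0, seed=42)

# The analyst sees only (xs, ys) and tries to learn about the process.
est_intercept, est_slope = fit_line(xs, ys)
print(f"estimated intercept={est_intercept:.3f}, slope={est_slope:.3f}")
```

The estimates land close to the true values only because the fitted model matches the form of the process. In real work the form is unknown, which is where substantive theory comes in.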

Theories and DGPs
Theories provide explanations for the processes we care about. They answer the question, "Why does something work the way it does?" Theories make predictions about what we should see in data. We use data to test the predictions, but we never completely test a theory.

Why do we need theory?
Can’t we just find “truth” in the data if we have enough of it? Especially if we have all of it? No!
– More data does not mean more representative data.
– Every method of analysis makes some assumptions, so we are better off if we make them explicit.
– Patterns without understanding are at best uninformative and at worst deeply misleading.
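The first point above, that more data does not mean more representative data, can be sketched with a small simulation. Everything here is an assumption made for illustration: a synthetic population split into "online" and "offline" groups with different averages, a large sample drawn only from the online group, and a small random sample drawn from everyone.

```python
import random

rng = random.Random(0)

# Synthetic population (illustrative numbers only).
online = [rng.gauss(60, 10) for _ in range(50_000)]
offline = [rng.gauss(40, 10) for _ in range(50_000)]
population = online + offline
true_mean = sum(population) / len(population)

def mean(values):
    return sum(values) / len(values)

# A very large sample that only reaches the online group.
big_biased = rng.sample(online, 40_000)
# A much smaller sample drawn at random from the whole population.
small_random = rng.sample(population, 500)

big_error = abs(mean(big_biased) - true_mean)
small_error = abs(mean(small_random) - true_mean)

print(f"true mean: {true_mean:.2f}")
print(f"error of 40,000 biased observations: {big_error:.2f}")
print(f"error of 500 random observations:    {small_error:.2f}")
```

In this setup the sample that is eighty times larger is also much further from the truth, because its size does nothing to correct for who was left out.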

Robert Matthews, Aston University, “Storks Deliver Babies (P=0.008).” Teaching Statistics. Volume 22, Number 2, Summer
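The storks example is a pattern without understanding: a strong correlation that says nothing about cause. The sketch below does not use the data from the cited article. It assumes a synthetic world in which a third variable (here called land area, a hypothetical confounder) drives both stork counts and birth counts, and then checks what happens to the correlation once that variable is accounted for.

```python
import random

def pearson(a, b):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5

def residuals(y, x):
    """Residuals of y after a least-squares fit on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    slope = (sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
             / sum((xi - mean_x) ** 2 for xi in x))
    return [yi - (mean_y + slope * (xi - mean_x)) for xi, yi in zip(x, y)]

rng = random.Random(1)
# Synthetic data: area influences both outcomes; storks never influence births.
area = [rng.uniform(1, 100) for _ in range(2000)]
storks = [2 * a + rng.gauss(0, 20) for a in area]
births = [3 * a + rng.gauss(0, 30) for a in area]

raw_corr = pearson(storks, births)
adjusted_corr = pearson(residuals(storks, area), residuals(births, area))

print(f"storks vs births:                   r = {raw_corr:.2f}")
print(f"storks vs births, area accounted for: r = {adjusted_corr:.2f}")
```

Knowing to adjust for area is not something the data reveal on their own. It comes from a theory about what produces storks and babies.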

New Behaviors Require New Theories
The Target example illustrated how existing theories about habit formation informed their data mining efforts. However, whole new behaviors exist that are creating a lot of the data that data scientists want to analyze:
– Online shopping
– Cell phone usage
– Crowdsourced recommendation systems
– Facebook, Google searching, etc.
– Online mobilization of social protests
We need new theories for these new behaviors.

Metadata

What is Metadata?
Metadata is data about data. It is frequently ignored or misunderstood. Metadata is required to give data meaning. It includes:
– Variable names and labels, value labels, information on who collected the data, when, by what methods, in what locations, for what purpose, etc.
Metadata is essential to use data effectively, to reuse data, to share data, and to integrate data. Data without metadata is worthless.
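A small sketch of what this means in practice, using an invented survey extract. The column names, labels, value codes, and collection details below are all hypothetical; the point is that the raw rows cannot be interpreted without the codebook that travels with them.

```python
# Raw data as it might arrive: numbers with no meaning attached.
raw_rows = [
    [1, 2, 34],
    [2, 1, 58],
]

# Metadata (a hypothetical codebook): variable names, labels, value labels,
# plus information on who collected the data and how.
codebook = {
    "columns": [
        {"name": "resp_id", "label": "Respondent identifier"},
        {"name": "employ", "label": "Employment status",
         "values": {1: "Employed", 2: "Not employed"}},
        {"name": "age", "label": "Age in years"},
    ],
    "collected_by": "hypothetical survey team",
    "method": "hypothetical telephone survey",
}

def describe(row, codebook):
    """Attach variable labels and value labels to one raw row."""
    described = {}
    for value, column in zip(row, codebook["columns"]):
        value_labels = column.get("values", {})
        described[column["label"]] = value_labels.get(value, value)
    return described

for row in raw_rows:
    print(describe(row, codebook))
```

Without the codebook, the second column is just a 1 or a 2. With it, the same value becomes an answer to a question someone actually asked.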

The Value of Metadata
Data by itself is just a bunch of 0’s and 1’s. Metadata:
– Provides meaning
– Allows for cataloging
– Facilitates search and discovery
– Enables linking data sets

Types of Metadata
NISO defines three types:
– Structural: describes how the components of the data are organized (columns, rows, chapters, etc.)
– Descriptive: provides titles, authors, keywords, subjects, etc. that facilitate attribution and search/discovery.
– Administrative: technical information on how the file was created, software used, formats for storage, etc. Includes rights and preservation metadata.

Metadata Standards
There are emerging standards for metadata:
– The American National Standards Institute
– The International Organization for Standardization
Dublin Core: 15 classic metadata terms.
– Title, Creator, Subject, Description, Publisher, Contributor, Date, Type, Format, Identifier, Source, Language, Relation, Coverage, Rights
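As an illustration of the fifteen Dublin Core terms, the sketch below represents a record as a plain Python dictionary and reports which terms have not been filled in. The dictionary layout and the helper function are assumptions made for this example, not an official Dublin Core serialization, and the subject and language values are illustrative guesses. Dublin Core itself treats every element as optional, so an unfilled term is not an error.

```python
# The fifteen classic Dublin Core elements, in the order listed above.
DUBLIN_CORE_ELEMENTS = (
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
)

def unfilled_elements(record):
    """Return the Dublin Core elements with no value in this record."""
    return [name for name in DUBLIN_CORE_ELEMENTS if not record.get(name)]

# A partial record describing this slide deck (hypothetical encoding).
record = {
    "title": "Introduction to Data Science Section 2",
    "creator": "Thomas M. Carsey",
    "date": "2015",
    "subject": "Data science",   # illustrative keyword
    "language": "en",            # illustrative value
}

print("filled:  ", [name for name in DUBLIN_CORE_ELEMENTS if record.get(name)])
print("unfilled:", unfilled_elements(record))
```

Even a partial record like this one is enough to catalog the item, search for it, and link it to other items by the same creator.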

Privacy and Ethics
We will do this at the end.