
1 11-20-14/Greenberg
Metadata Quality and Capital
Disseminators and Service Providers
November 20, 2014
Jane Greenberg
Professor, College of Computing & Informatics
Director, Metadata Research Center

2 Your data is only as good as your metadata. Metadata is a first-class object.

3 Toothbrush

4 The topic…
• (DRYAD) Good enough is not bad
• (CAPITAL) ROI – return on investment
• (COMMUNITY) RDA – Research Data Alliance… time permitting

8 Pre-populated metadata field

10 Data downloads → reuse → citation
Observations motivating the study of metadata capital:
1. Metadata generation costs money
2. Metadata reuse is a BIG part of Dryad’s workflow
3. Metadata reuse via OAI
4. Metadata reuse via data sharing, reuse, and repurposing
Download 10678 times

11 Journal | Re. Wrkfl | Blackout
AmNtrl | N | N
MBE | N | N
BioRisk | Y | N
BMJ Open | Y | N
…. | Y
Type, Total, 30 days: Data packages 6781198; Data files 20832957; Journals 36172; Authors 241663312; Downloads 63534837611
Journals (80+…PLOS): http://datadryad.org/pages/integratedJournals
X >10GB = $15,$10+

12 Technology
DSpace; DOIs via CDL/DataCite; CC0 ( + data)
Integration with specialized repositories and databases
• Federated searching with TreeBASE and KNB LTER
• TreeBASE submission (OAI-PMH)
• GenBank (currently in development)
Governance: “non-profit status, 12 member Board of Directors”
• Sets policy, goals; science, journals, societies, OCLC, MS
• 2006 Dryad development – NESCent + Stakeholders: journals, publishers and scientific societies, and researchers.
• 2009-2012: Interim Board
$ PAYMENT: Sept. 1, 2014
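
The OAI-PMH harvesting mentioned on this slide can be illustrated with a minimal sketch of how a harvester might build a ListRecords request. The endpoint URL and the token value below are hypothetical placeholders, not Dryad's or TreeBASE's actual addresses; only the `ListRecords` verb, the `oai_dc` metadata prefix, and the `resumptionToken` argument come from the OAI-PMH protocol itself.

```python
from typing import Optional
from urllib.parse import urlencode

# Hypothetical endpoint used only for illustration; not a real repository URL.
BASE_URL = "https://repository.example.org/oai"


def list_records_url(base_url: str,
                     metadata_prefix: str = "oai_dc",
                     resumption_token: Optional[str] = None) -> str:
    """Build an OAI-PMH ListRecords request URL.

    In OAI-PMH, resumptionToken is an exclusive argument: when it is
    present, only the verb and the token are sent.
    """
    if resumption_token is not None:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return f"{base_url}?{urlencode(params)}"


# First request, then a follow-up request using a made-up token value.
first_page = list_records_url(BASE_URL)
next_page = list_records_url(BASE_URL, resumption_token="token-123")
print(first_page)
print(next_page)
```

The sketch only builds URLs and performs no network access; a real harvester would fetch each URL, parse the XML response, and repeat while the response carries a resumption token.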

14 Singapore Framework
Dryad DCAP, ver. 3.0
• bibo (The Bibliographic Ontology)
• dcterms (Dublin Core terms)
• dryad (Dryad)
• DwC (Darwin Core)
Vision
1. Simple: automatic metadata gen; heterogeneous datasets
*Data-package centric
2. Interoperable: harvesting, cross-system searching
3. Semantic Web compatible: sustainable; supporting machine processing
Greenberg, et al, 2009, Metadata Best Practice for a Scientific Data Repository, JLM, DOI:10.1080/19386380903405090.

15 Metadata research & development
1. Curation workflow - cognitive walkthroughs
2. Dryad metadata scheme development - crosswalk analyses (Dube, et al, 2007; Carrier, et al, 2007; White et al., 2008, Greenberg, et al, 2010; Greenberg 2009; 2010)
3. Metadata reuse - content analysis (Greenberg, IDCC Research Summit, 2010)
4. Instantiation - multi-method study (comprehensions assessment) (Greenberg, RDAP, 2010, UNAM 2012)
5. Name-authority control - exploratory study (Haven, 2009, INLS 720)
6. KO/metadata community practices - concurrent triangulation mixed methods (survey + simulation experiment) (White, 2010, ASIST, 2010 JLM)
7. Metadata functions - quantitative categorical analysis (Willis, Greenberg, and White, 2010, CODATA, 2012, JASIST)
8. Vocabulary needs (HIVE) – mapping study (Greenberg, 2009, CCQ; Scherle, 2010, Code4Lib)
9. Metadata theory – deductive analysis (Greenberg, 2009)

16 Interoperability slope: Dublin Core application profile; OAI-PMH; DOI; DataCite; DataONE; TR: Data Citation Index; Elsevier, Science Direct; Semantic ontologies; Researcher names; Agency/institution

19 Package metadata harvested from email Subj. 177 (gr. 97%, rd. 2%, bl. 1%) Contr. 101 (gr. 99%, bl. 1%)

20 The leap - capital to metadata capital
• An economic concept (Weber, 1905; Smith, 1776)
  Business and operations (net gains or losses)
  Finances, goods and services, and public needs
  Intellectual capital, social capital
  A tangible result, value increase
• Metadata as an asset, a product
  Reuse of good quality metadata increases the value of the initial investment
  Poor quality may reduce metadata capital
? Metadata reuse prevalence
  Cooperative cataloging, CIP, ISBD, MARC, FRBR, LCC, VIAF, OAI-PMH, CrossRef, PubMed, Zotero, BibTeX, DataCite, Linked data/Semantic Web, PIDs, etc.

21 Modified Capital-sigma notation
Reuse → Cost / value
$R + \sum_{i=1}^{n} a_i = R + a_1 + a_2 + a_3 + \dots + a_n$
R = value of the metadata record
i = number of usages
a = incremental increase in value
n = maximum number of reuses
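
As a minimal sketch of the notation on this slide, the sum can be evaluated directly once values are assigned to R and to each increment. The numbers below are invented purely for illustration and do not come from the Dryad study; the function and variable names are likewise hypothetical.

```python
from typing import Sequence


def metadata_capital(record_value: float, increments: Sequence[float]) -> float:
    """Return R + a_1 + a_2 + ... + a_n.

    record_value is R, the value of the metadata record.
    increments holds a_1..a_n, one incremental value increase per reuse.
    """
    return record_value + sum(increments)


# Illustrative values only: a record worth 10.0 units that is reused three times.
example_total = metadata_capital(10.0, [1.0, 1.0, 0.5])
print(example_total)  # 12.5
```

With no reuse the list of increments is empty and the result is just R, which matches the reading that capital grows only as the record is reused.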

22 Author/Submitter | Curator
• 100 metadata instantiations
• 8 of 12 metadata properties had reuse at 50% or greater
• 5 of 8 confirmed reuse at 80% or higher
• Basic bib. vs. complex

23 Author; Subject; Dcterms.spatial; DwC.ScientificName

24 Modified Capital-sigma notation for linked data
Cost / value: reuse of linked data concept/URI
P = determined by the number of terms in an ontology, labor hours to generate, integrate, etc.

25 Helping Interdisciplinary Vocabulary Engineering (HIVE)
• CV cost, interoperability, and usability constraints
• Linked Open Vocabulary initiative, to support inter/transdisciplinary….
• SKOS (a little dumb)
• AMG + machine learning approach for integrating discipline terminologies

27 Amy
• Meet Amy Zanne. She is a botanist.
• Like every good scientist, she publishes, and she deposits data in Dryad.
Amy’s data

29 Successive growth rates
$\sum_{i=1}^{n} i^{c} = \Theta(n^{c+1})$
Cycles… What about successive growth rate tied to a concept?
A concept can be in ~ vernacular to canonical; fall by the wayside, less popular; out (deprecated)
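
The growth bound on this slide can be checked numerically. The sketch below is an illustration rather than part of the talk: it divides the power sum by n^(c+1) and shows the ratio settling near the constant 1/(c+1), which is what the Θ(n^(c+1)) statement implies. The function names are hypothetical.

```python
def power_sum(n: int, c: int) -> int:
    """Return 1**c + 2**c + ... + n**c."""
    return sum(i ** c for i in range(1, n + 1))


def normalized(n: int, c: int) -> float:
    """Return the power sum divided by n**(c + 1)."""
    return power_sum(n, c) / n ** (c + 1)


# For c = 2 the ratio approaches 1/3 as n grows.
for n in (10, 100, 1000):
    print(n, normalized(n, 2))
```

Because the ratio stays bounded above and below by positive constants, the sum grows at the same rate as n^(c+1).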

30 Conclusion…other Valuation Approaches
• Market cap of Facebook per user: $40 – $300
• Revenues per record per user: $4 – $7 per year (Facebook, Experian)
• Market prices of personal data:
  $0.50 for street address
  $2.00 for date of birth
  $8 for social security number
  $3 for driver’s license number
  $35 for military record
SOURCE: OECD. Exploring the Economics of Personal Data: A Survey of Methodologies for Measuring Monetary Value. OECD Digital Economy Papers. Organisation for Economic Co-operation and Development Publishing, 2013.

31 Concluding remarks
• Interest….traction
• Limitations: bad data, cost/value
• We should care about cost
• Metadata capital can contextualize
• Generic formula for further research

32 Metadata Standards Directory Working Group…. Jane Greenberg, Alex Ball, Keith Jeffery, Rebecca Koskela

33 “…develop a collaborative, open directory of metadata standards applicable to scientific data”
Stakeholders: Researchers, data managers, data scientists, tool developers, repositories, agencies, societies (RDA’s growing community)
Goals and workplan - DCC Disciplinary Directory: http://www.dcc.ac.uk/resources/metadata-standards

34 Acknowledgments
• Dryad Consortium Board, journal partners, and data authors
• NESCent: Laura Wendell (Executive Director), Hilmar Lapp, Heather Piwowar, Peggy Schaeffer, Ryan Scherle, Todd Vision (PI)
• **Drexel/UNC: Jose R. Pérez-Agüera, Sarah Carrier, Elena Feinstein, Lina Huang, Robert Losee, Hollie White, Craig Willis, Jane Smith, Shea Swuager, Liz Turner, Christine Mayo, Adrian Ogletree, Erin Clary
• U British Columbia: Michael Whitlock
• NCSU Digital Libraries: Kristin Antelman
• HIVE: Library of Congress, USGS, and The Getty Research Institute; and workshop hosts
• Yale/TreeBASE: Youjun Guo, Bill Piel
• DataONE: Rebecca Koskela, Bill Michener, Dave Veiglais, and many others
• British Library: Lee-Ann Coleman, Adam Farquhar, Brian Hole
• Oxford University: David Shotton

35 http://datadryad.org
http://blog.datadryad.org
http://datadryad.org/wiki
http://code.google.com/p/dryad
dryad-users@nescent.org
Facebook: Dryad
Twitter: @datadryad
http://ils.unc.edu/mrc/hive/
http://code.google.com/p/hive-mrc/
Metadata Research Center: http://cci.drexel.edu/mrc

36 Sustainability: Plan Comparison
Payment Plan | Member | Non-member | Minimum purchase
1. Voucher Plan | USD$65 per data package | USD$70 per data package | 25 vouchers
2. Deferred Payment Plan | USD$70 per data package | USD$75 per data package | 1 yr contract
3. Subscription Plan | Annual fee based on USD$25 per published research article | Annual fee based on USD$30 per published research article | 2 yr contract
For individuals: Pay on acceptance | NA | USD$80 per data package, payable by the submitter | 1 data package
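
To compare the plans on this slide for a given journal, the per-unit rates can be multiplied out as in the rough sketch below. The rates are taken from the slide; the journal's volumes (40 data packages, 120 articles) are invented for illustration, and treating the 25-voucher minimum as a floor on the yearly voucher purchase is a simplifying assumption, not a statement of Dryad's billing rules.

```python
# Rates in USD from the plan comparison. Voucher and deferred plans are priced
# per data package; the subscription plan is priced per published article.
RATES = {
    "voucher": {"member": 65, "non_member": 70},
    "deferred": {"member": 70, "non_member": 75},
    "subscription": {"member": 25, "non_member": 30},
}
VOUCHER_MINIMUM = 25  # minimum purchase listed for the voucher plan


def annual_cost(plan: str, is_member: bool, data_packages: int, articles: int) -> int:
    """Estimate one year's cost under a plan (simplified, see assumptions above)."""
    rate = RATES[plan]["member" if is_member else "non_member"]
    if plan == "subscription":
        return rate * articles
    if plan == "voucher":
        # Assumption: at least the minimum number of vouchers is bought each year.
        return rate * max(data_packages, VOUCHER_MINIMUM)
    return rate * data_packages


# Hypothetical member journal: 40 data packages and 120 published articles per year.
costs = {plan: annual_cost(plan, True, 40, 120) for plan in RATES}
print(costs)
```

Under these made-up volumes the voucher plan is cheapest, but the ranking depends on the ratio of deposited data packages to published articles.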

37 More on growth and sustainability
• Membership: http://datadryad.org/pages/membershipOverview
• Pricing and sponsorship of deposits: http://datadryad.org/pages/pricing
• Journal integration: http://datadryad.org/pages/journalIntegration

