Digging by Debating (DbyD):

Presentation transcript:

Digging by Debating (DbyD): From Big Data Text Repositories to Argument Analysis

The InPhO Group, Indiana University: Colin Allen with Robert Rose, Jaimie Murdock, Jun Otsuka, Doori Lee, Indraneel Dey
The Cyberinfrastructure for Network Science Center, Indiana University: Katy Börner with Robert Light
The International Centre for Public Pedagogy (ICPuP), University of East London (UEL): Andrew Ravenscroft with Simon McAlister
ARG:dundee, University of Dundee: Chris Reed with John Lawrence
Centre for Digital Philosophy, University of Western Ontario: David Bourget

From Big Data to Argument Analysis
Linking massive datasets to specific arguments, where ‘text is data’

Project Goals
- Uncover and represent the key argumentative structures of digitized documents from a large philosophy/science corpus
- Allow users to find and interpret detailed arguments in the broad semantic landscape of books and articles, and to support innovative interdisciplinary research and better-informed critical debates

Data Sources
- Stanford Encyclopedia of Philosophy, HathiTrust Collection, PhilPapers

4 Levels of Analysis: Macro (Sci/Phil Maps) to Micro (detailed arguments)
- Visualizing points of contact between philosophy and the sciences
- Topic modeling to identify the volumes/pages ‘rich’ in a chosen topic
- Identify and map key arguments; apply a novel analysis framework for propositions and arguments
- Sentence modeling to get back to HathiTrust materials

Speaker notes: S2* project overview slide [any] -- bulleted outline of project goals as per funding -- list of data sources -- unique aspect of this project: 4 levels of analysis

Mapping Philosophy in the Sciences (analysis level 1)

UCSD Map of Science: generated using more than 12M papers and their references from Elsevier’s Scopus and Thomson Reuters’ Web of Science (25,000 journals), see http://sci.cns.iu.edu/ucsdmap
- Shows 554 subdiscipline nodes aggregated into 13 color-coded disciplines.
- Overlaid are citations made by the Stanford Encyclopedia of Philosophy to visualize the impact of the sciences on philosophy.
- Node sizes scale from 0 (no circle) to 43.
- Highest numbers are in Humanities, Earth Sciences, and Math & Physics.

Speaker notes: In order to visualize the impact of the sciences on philosophy, we overlaid citations made by the Stanford Encyclopedia of Philosophy on the UCSD map of science. In order to normalize against the high number of citations (particularly in the humanities), node size relates to the number of SECTIONS of the Stanford Encyclopedia of Philosophy that reference each subdiscipline. Node sizes scale from 0 (no circle) to 43. Highest numbers are in Humanities, earth sciences, infectious diseases and Math & Physics. Lowest numbers are in Chemical, Mechanical and Civil Engineering and Electrical Engineering and Computer Science. Future work will try to map the influence of philosophy on science, completing the circle and the visualization of information flow to and from philosophy.
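
The node-size normalization described for this slide (counting citing sections rather than raw citations) can be sketched roughly as follows. This is only an illustration: the function name `node_sizes`, the `(section_id, subdiscipline)` input shape, and the sample identifiers and labels are all hypothetical and are not taken from the project's code or data.

```python
from collections import defaultdict


def node_sizes(citations):
    """Map each subdiscipline to the number of DISTINCT sections citing it.

    `citations` is an iterable of (section_id, subdiscipline) pairs, one per
    citation. Repeated citations from the same section count only once.
    """
    sections = defaultdict(set)
    for section_id, subdiscipline in citations:
        sections[subdiscipline].add(section_id)
    return {sub: len(ids) for sub, ids in sections.items()}


# Hypothetical sample data: section ids and subdiscipline labels are made up.
citations = [
    ("entry-a/sec-1", "Philosophy"),
    ("entry-a/sec-1", "Philosophy"),  # second citation from the same section
    ("entry-a/sec-2", "Philosophy"),
    ("entry-b/sec-1", "Geology"),
]
sizes = node_sizes(citations)
```

A subdiscipline that no section references never appears in the result, which corresponds to a node size of 0 (no circle).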

Topic Modeling in HT Books (analysis level 2)

LDA Topic Modeling:
- Bayesian method generates set of "topics" – probability distributions over terms in the corpus
- Every topic contains every term – different probabilities in the different topics
- The number of topics is a user-selected parameter
- Finds the set of topics best able to reproduce term distributions in the documents
- Documents may be whole volumes, chapters, articles, single pages, even individual sentences – modelers' choice

- Start with 1,315 volumes by keyword search -- model at volume/book level
- Select 86 by topic match -- re-model at page level
- Select 3 by topic match -- re-model at sentence level
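
As a rough illustration of the LDA description above, here is a minimal pure-Python sketch that fits LDA by collapsed Gibbs sampling on a toy corpus and then shortlists documents by their share of a chosen topic. The slides do not say which LDA implementation or inference method the project used, so the sampler, the names `fit_lda` and `select_by_topic`, the hyperparameters `alpha` and `beta`, and the toy documents are all assumptions made for illustration.

```python
import random


def fit_lda(docs, num_topics, iterations=200, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for LDA over tokenized documents.

    Returns (vocab, doc_topic, topic_term); every row of doc_topic and
    topic_term is a probability distribution.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    vocab_size = len(vocab)

    doc_topic_counts = [[0] * num_topics for _ in docs]
    topic_term_counts = [[0] * vocab_size for _ in range(num_topics)]
    topic_totals = [0] * num_topics
    assignments = []

    # Random initial topic assignment for every token.
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            z = rng.randrange(num_topics)
            zs.append(z)
            doc_topic_counts[d][z] += 1
            topic_term_counts[z][word_id[w]] += 1
            topic_totals[z] += 1
        assignments.append(zs)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                wid = word_id[w]
                z = assignments[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic_counts[d][z] -= 1
                topic_term_counts[z][wid] -= 1
                topic_totals[z] -= 1
                # Conditional weight of each topic for this token.
                weights = [
                    (doc_topic_counts[d][k] + alpha)
                    * (topic_term_counts[k][wid] + beta)
                    / (topic_totals[k] + vocab_size * beta)
                    for k in range(num_topics)
                ]
                z = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = z
                doc_topic_counts[d][z] += 1
                topic_term_counts[z][wid] += 1
                topic_totals[z] += 1

    # Smoothed estimates: beta > 0 gives every term nonzero mass in every topic.
    doc_topic = [
        [(c + alpha) / (len(doc) + num_topics * alpha) for c in counts]
        for doc, counts in zip(docs, doc_topic_counts)
    ]
    topic_term = [
        [(c + beta) / (topic_totals[k] + vocab_size * beta)
         for c in topic_term_counts[k]]
        for k in range(num_topics)
    ]
    return vocab, doc_topic, topic_term


def select_by_topic(doc_topic, topic, keep):
    """Indices of the `keep` documents with the highest share of `topic`."""
    ranked = sorted(range(len(doc_topic)),
                    key=lambda d: doc_topic[d][topic], reverse=True)
    return ranked[:keep]


# Toy corpus (made up); each "document" could equally be a volume, page or sentence.
docs = [
    "animal mind instinct behavior animal".split(),
    "instinct animal behavior mind habit".split(),
    "planet orbit gravity star orbit".split(),
    "star gravity planet orbit mass".split(),
]
vocab, doc_topic, topic_term = fit_lda(docs, num_topics=2)
shortlist = select_by_topic(doc_topic, topic=0, keep=2)
```

The narrowing from volumes to pages to sentences would correspond to re-running `fit_lda` on the shortlisted material split at a finer granularity. Which topic index matches the topic of interest has to be checked by inspecting `topic_term`, since topic numbering is arbitrary.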

Argument Selection, Mapping and Analysis (analysis level 3)

1. Identifying arguments from rated pages (currently human/manual, but developing algorithms for automation)
2. Mapping of key arguments with OVA mapping tool:
   - Provides a formal framework, or ‘lens’, for investigation and comparative analysis, e.g. role, structure, status
   - More indirectly: meaning, importance and context

Speaker notes: S5* analysis level 3, Andrew, will explain this slide, clarifying what detail means -- argument extraction & analysis

Back to the Sentences (analysis level 4)

Speaker notes: S6* analysis level 4 [Colin, Andrew] -- sentence modeling to get back to HT materials
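
One way the "back to the sentences" step could work is to compare the topic distribution of a mapped proposition with those of sentences in the source volumes and keep the closest ones. The sketch below uses cosine similarity, which is an assumption: the slides do not name the similarity measure used. The names `cosine` and `most_similar`, the sentence ids, and the three-topic vectors are hypothetical.

```python
import math


def cosine(p, q):
    """Cosine similarity between two equal-length topic vectors."""
    dot = sum(a * b for a, b in zip(p, q))
    norm = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in q))
    return dot / norm if norm else 0.0


def most_similar(query, candidates, top_n=3):
    """Ids of the `top_n` candidate sentences closest to `query`."""
    scored = sorted(candidates.items(),
                    key=lambda item: cosine(query, item[1]), reverse=True)
    return [sentence_id for sentence_id, _ in scored[:top_n]]


# Hypothetical topic distributions over three topics.
proposition = [0.8, 0.1, 0.1]
sentences = {
    "s1": [0.7, 0.2, 0.1],
    "s2": [0.1, 0.8, 0.1],
    "s3": [0.1, 0.1, 0.8],
}
nearest = most_similar(proposition, sentences, top_n=1)
```

Here `nearest` contains only `"s1"`, the sentence whose topic mix is dominated by the same topic as the proposition. Measures designed for probability distributions, such as Jensen-Shannon divergence, would be a reasonable alternative to cosine similarity.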

DbyD: Conclusions so far...

Future work
- From loose integration to usable tools, e.g. linked to PhilPapers
- To understand and incrementally construct argument maps (mark-up interface) AND automated extraction and mapping
- Funding from future DiD or similar Big Data initiatives (‘open data’?)

diggingbydebating.org

Project Achievements - proof-of-concept and loose integration of key components:
1. Method for visualizing points of contact between philosophy and sciences
2. Method for text selection from Big Data using multi-scale modeling techniques
3. Identified, represented and mapped key arguments about topics (OVA) and devised a novel framework for investigation and comparative analysis
4. Used sentence-level Topic Modeling to ‘go back’ to the texts to find similar propositions to those mapped (investigating context)

Speaker notes: Think this is self-explanatory, but worth emphasising ‘slipperiness’ of argument and argumentation in texts, and implications for automated extraction and mapping -- checklist of what we've done, what needs further funding -- moving from a virtual pipeline to interactive pipeline