
1 SEASR Analytics for Zotero
Loretta Auvil, lauvil@illinois.edu
Automated Learning Group, Data-Intensive Technologies and Applications, National Center for Supercomputing Applications, University of Illinois at Urbana-Champaign
The SEASR project and its Meandre infrastructure are sponsored by The Andrew W. Mellon Foundation

2 Outline
–Brief Zotero Introduction
–Brief SEASR Introduction
–SEASR Analytics for Zotero Plugin
–Hands-on Learning Exercises
–A Little More Advanced Information

3 The Zotero Picture
[Diagram: the Web and the Zotero store]

4 What is Zotero? (from Zotero Quick Start Guide)
–A citation manager. It is designed to store, manage, and cite bibliographic references, such as books and articles. In Zotero, each of these references constitutes an item.
–An extension for the Firefox web browser by the Center for History and New Media at George Mason University.
–Installed by visiting zotero.org and clicking the download button on the page.

5 Zotero Features (from zotero.org)
–Automatically capture citations
–Remotely back up and sync your library
–Store PDFs, images, and web pages
–Cite from within Word and OpenOffice
–Take rich-text notes in any language
–Wide variety of import/export options
–Free, open source, and extensible
–Collaborate with group libraries
–Organize with collections and tags
–Access your library from anywhere
–Automatically grab metadata for PDFs
–Use thousands of bibliographic styles
–Instantly search your PDFs and notes
–Advanced search and data mining tools
–Interface available in over 30 languages

6 What is SEASR?
This project will focus on developing, integrating, deploying, and sustaining a set of reusable and expandable software components and a supporting framework. SEASR will provide a broad set of data mining applications for scholars in the humanities.
The key goals:
–Support the development of a state-of-the-art software environment for unstructured data management and analysis of digital libraries, repositories, and archives
–Develop user interfaces, a data flow engine, and demonstration flows that provide data management, analysis, and visualization capabilities
–Support education and training through workshops to promote its usage among scholars

7 The SEASR Picture

8 SEASR Enables Scholarly Research
Discovery
–What are the words used in the corpus?
–What named entities (people, locations, dates) can be extracted?
–What hypotheses or rules can be generated from the “features” of the corpus?
–What “features” or language of the corpus best describe the corpus?
–What are the “similarities” among elements, documents, or corpora?
–What patterns can be identified?

9 Enables Scholar to Ask…
Pattern identification using automated learning
–Which patterns are characteristic of the English language?
–Which patterns are characteristic of a particular author, work, topic, or time?
–Which patterns based on words, phrases, sentences, etc. can be extracted from literary bodies?
–Which patterns are identified based on grammar or plot constructs?
–When are correlated patterns meaningful?
–Can they be categorized based on specific criteria?
–Can an author’s intent be identified given an extracted pattern?

10 Meandre: Workbench Existing Flow
[Screenshot of the Workbench with panels labeled Locations, Components, and Flows]
–Web-based UI
–Components and flows are retrieved from server
–Additional locations of components and flows can be added to server
–Create flow using a graphical drag and drop interface
–Change property values
–Execute the flow

11 The Zotero + SEASR Picture
[Diagram: the Web and the Zotero store]

12 SEASR Analytics for Zotero
–An extension for the Firefox web browser by the SEASR Team
–Uses your Zotero Collections
–Performs analysis using SEASR Services

13 SEASR Analytics for Zotero Interface

14 Tag Cloud Viewer
Given: Zotero item(s)
Creates a tag cloud from all submitted items that have a URL. Stop words and common tokens such as punctuation are filtered out, and the top 100 words are displayed in the tag cloud viewer.
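The filtering and counting step behind a tag cloud could be sketched roughly as follows. This is an illustrative Python sketch, not the SEASR flow itself: the function name `top_words`, the tiny stop-word list, and the tokenizing rule are all assumptions made for the example.

```python
import re
from collections import Counter

# Hypothetical, deliberately small stop-word list; a real flow would use a fuller one.
STOP_WORDS = frozenset({
    "a", "an", "and", "are", "as", "at", "be", "by", "for", "from", "in",
    "is", "it", "of", "on", "or", "that", "the", "this", "to", "was", "with",
})


def top_words(text, limit=100, stop_words=STOP_WORDS):
    """Return up to `limit` (word, count) pairs, most frequent first."""
    # Keeping only alphabetic runs drops punctuation and digits.
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter(t for t in tokens if t not in stop_words and len(t) > 1)
    return counts.most_common(limit)
```

The counts returned here would then drive the font size of each word in a viewer; the limit of 100 mirrors the number mentioned on the slide.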

15 Date Entities to Simile Timeline
Extracts date entities (using OpenNLP) from all submitted items that have a URL and plots the dates that it can on the Simile Timeline

16 HITS Summarizer
Finds the top sentences and tokens from all submitted items that have a URL and displays them

17 Flesch-Kincaid Readability Test
Given: Zotero item(s)
Results show scores for each item selected
–Designed to indicate comprehension difficulty when reading a passage of contemporary academic English
–Flesch Reading Ease: higher scores indicate material that is easier to read; lower numbers mark passages that are more difficult to read
–Flesch-Kincaid Grade Level: result is a number that corresponds with a grade level
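For reference, both scores are computed from average sentence length and average syllables per word using the standard published formulas. The Python sketch below applies them; the syllable counter is a rough vowel-group heuristic assumed for illustration, and the function names are hypothetical, so results will differ somewhat from the SEASR service.

```python
import re


def count_syllables(word):
    """Rough heuristic: one syllable per run of vowels, at least one per word."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))


def flesch_scores(text):
    """Return (reading_ease, grade_level) for a passage of English text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    words_per_sentence = len(words) / len(sentences)
    syllables_per_word = syllables / len(words)
    # Standard Flesch Reading Ease and Flesch-Kincaid Grade Level formulas.
    reading_ease = 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word
    grade_level = 0.39 * words_per_sentence + 11.8 * syllables_per_word - 15.59
    return reading_ease, grade_level
```

Short sentences of one-syllable words yield a high Reading Ease and a low grade level; longer sentences and longer words push the two scores in opposite directions.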

18 Authorship Analysis
Given: Zotero Collection (or multiple items selection) with Author/Co-Author Information
Determines the importance of the given authors in this collection
–Each author is a vertex in the graph
–Authors are connected with an edge if they are co-authors of an item
–List of Authors ranked by the Betweenness Centrality Measure
–Betweenness is a centrality measure of a vertex within a graph. Vertices that occur on many shortest paths between other vertices have higher betweenness than those that do not.
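The co-author graph and betweenness ranking described on this slide can be sketched in plain Python. The version below uses Brandes' algorithm for unweighted, undirected graphs; it is an assumption that the SEASR flow computes betweenness this way, and the function names and the input shape (one list of author names per item) are hypothetical.

```python
from collections import defaultdict, deque
from itertools import combinations


def coauthor_graph(items):
    """Build an undirected graph: vertices are authors, edges link co-authors."""
    graph = defaultdict(set)
    for authors in items:
        for author in authors:
            graph[author]  # touch the key so solo authors still become vertices
        for a, b in combinations(sorted(set(authors)), 2):
            graph[a].add(b)
            graph[b].add(a)
    return dict(graph)


def betweenness(graph):
    """Brandes' algorithm for an unweighted, undirected graph."""
    score = dict.fromkeys(graph, 0.0)
    for source in graph:
        order = []                          # vertices in non-decreasing distance
        preds = {v: [] for v in graph}      # predecessors on shortest paths
        paths = dict.fromkeys(graph, 0)     # number of shortest paths from source
        paths[source] = 1
        dist = {source: 0}
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in graph[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    paths[w] += paths[v]
                    preds[w].append(v)
        delta = dict.fromkeys(graph, 0.0)
        for w in reversed(order):           # accumulate from the farthest vertex back
            for v in preds[w]:
                delta[v] += paths[v] / paths[w] * (1 + delta[w])
            if w != source:
                score[w] += delta[w]
    # Each unordered pair was counted from both endpoints, so halve the totals.
    return {v: s / 2 for v, s in score.items()}


def rank_authors(items):
    """Return (author, betweenness) pairs, highest betweenness first."""
    scores = betweenness(coauthor_graph(items))
    return sorted(scores.items(), key=lambda pair: (-pair[1], pair[0]))
```

For two items authored by A and B, and by B and C, only B lies on a shortest path between two other authors, so B ranks first.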

19 The Value Added
Analytical results are saved as Zotero items (View Snapshot)
–Includes metadata
–Item naming strategy identifies the item or collection processed
–Creator indicates the Menu Label of the SEASR Analysis
Related Tab links to the items processed in the Analysis
No need to install the analysis; it runs as a web service

20 Learning Exercises
Add items to your Zotero Collection
Run some of the Zotero-enabled flows on your collection
–Tag Cloud Viewer
–Date Entities to Simile Timeline
–HITS Summarizer
–Flesch-Kincaid Readability Test
–Authorship Analysis

21 How to Set Up Your Machine
Install/Open Firefox
Install Zotero
–https://addons.mozilla.org/en-US/firefox/addon/3504
–http://zotero.org
Install the SEASR Zotero plugin
–https://addons.mozilla.org/en-US/firefox/addon/10020
The plugin points to the default services provided by SEASR (running on our server)

22 Extensible to Analysis that You Create
You can deploy the flows we have on your own server or ask your university to host the analysis
You can modify these flows and redeploy them
You can create new flows
–Perhaps you want to see only nouns or verbs
–Perhaps you want to see a list of extracted entities
You can share these flows back to the community

23 SEASR Plugin Preferences
–Configuration files are managed in a list
–Each configuration file can be enabled or disabled
–Reload will refresh the plugin with the flows in the configuration files

24 Configuration File (XML or JSON)
Each flow entry contains 2 attribute-value pairs
–name: label to use in the Zotero drop-down display
–url: URL to which the POST request is sent
JSON:
{"seasr_flows":[
  {"name":"Author Centrality Analysis",
   "url":"http://services.seasr.org:1718/meandre://seasr.org/components/zotero/service-head-post/instance/shp" },
  {"name":"Flesch-Kincaid Readability Test",
   "url":"http://services.seasr.org:1721/meandre://seasr.org/components/zotero/service-head-post/instance/shp" }
]}
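As an illustration of how a client might read such a JSON configuration, the Python sketch below parses the `seasr_flows` list and returns the name/url pairs. The function name `load_flows` and the validation rule are assumptions for the example; the plugin itself is a Firefox extension and its actual loading code is not shown in the slides.

```python
import json


def load_flows(config_text):
    """Parse a SEASR-style JSON configuration into (name, url) pairs."""
    data = json.loads(config_text)
    flows = []
    for entry in data.get("seasr_flows", []):
        name = entry.get("name")
        url = entry.get("url")
        # Assumed rule: both attributes are required for a usable menu entry.
        if not name or not url:
            raise ValueError(f"flow entry needs both name and url: {entry!r}")
        flows.append((name, url))
    return flows


SAMPLE_CONFIG = """
{"seasr_flows":[
  {"name":"Author Centrality Analysis",
   "url":"http://services.seasr.org:1718/meandre://seasr.org/components/zotero/service-head-post/instance/shp"},
  {"name":"Flesch-Kincaid Readability Test",
   "url":"http://services.seasr.org:1721/meandre://seasr.org/components/zotero/service-head-post/instance/shp"}
]}
"""

if __name__ == "__main__":
    for name, url in load_flows(SAMPLE_CONFIG):
        print(name, url)
```

Each returned name would become a label in the drop-down menu, and the matching url is where the selected Zotero items would be posted.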

25 Zotero Service Flow
Components that read Zotero data from the web service
Zotero Author Extractor
–Extracts the author-coauthor from each item
Zotero URL Extractor
–Extracts the URL from each item

26 Zotero Flows and Fedora Services
Store and share your collections via Fedora
–Works the same way you run an analysis
–Just select, upload, and share

27 Zotero to SEASR: Fedora
[Diagram labels: Zotero, Upload to Repository, Repository, Search & Browse Web Service, Interactive Web Application]

28 Community Hub
Explore existing flows to find others of interest
–Keyword Cloud
–Connections
Find related flows
Execute flow
Comments

29 SEASR Central: Sharing and finding flows and components
[Screenshot of the SEASR Central site. Navigation: feedback, login, search central, Categories, Recently Added, Top 50, Submit, About, RSS. Featured Component: Word Counter by Jane Doe, described as an "Amazing component that given text stream, counts all the different words that appear on the text"; rights: NCSA/UofI open source license. Featured Flow: FPGrowth by Joe Does. Browse by type (Component, Flows), category (Image, JSTOR, Zotero), and name (Author Centrality, Readability, Upload Fedora).]

30 Discussion Questions
–What kinds of data assets would you be creating in Zotero?
–What other analysis would you like to use against this data?

