The SEASR project and its Meandre infrastructure are sponsored by The Andrew W. Mellon Foundation SEASR Overview Loretta Auvil and Bernie Acs National.


1 The SEASR project and its Meandre infrastructure are sponsored by The Andrew W. Mellon Foundation SEASR Overview Loretta Auvil and Bernie Acs National Center for Supercomputing Applications University of Illinois at Urbana-Champaign [lauvil or acs1]@illinois.edu

2 Outline Overview of Workshop SEASR Overview and Motivation Team Presentations –Digital Humanities Observatory, Susan Schreibman –Brown University, Andrew Ashton –JSTOR, Clare Llewellyn, Michael Krot –VUE, Anoop Kumar New SEASR Data Flows and Components Future

3 SEASR Overview

4 SEASR This project will focus on developing, integrating, deploying, and sustaining a set of reusable and expandable software components and a supporting framework, SEASR, that will benefit a broad set of data mining applications for scholars in the humanities. The key goals established for this effort are a set of software-centric directives:
–Support the development of a state-of-the-art software environment for unstructured data management and analysis of digital libraries, repositories and archives, as well as educational platforms that are expected to contribute to many of the humanities breakthroughs of the 21st century.
–Support the continued development, expansion, and maintenance of an end-to-end software system (user interfaces, workflow engines, data management, analysis and visualization tools, collaborative tools, and other software integrated into a complete environment, SEASR) to bring the full power of data analytics to scholars.
–Support education and training in the use of this software environment for analysis, through workshops that promote its usage among scholars.

5 Workshop Objective The objective of the workshop is to: Provide current status of SEASR Indicate where SEASR is headed Learn what you have done or are planning to do with SEASR

6 The SEASR Picture

7 SEASR Architecture

8 Data Driven Models

9 SEASR: Reach + Relevance + Reuse + Repeatability SEASR emphasizes flexibility, scalability, and modularity, and provides a community hub and access to heterogeneous data and computational systems
–Semantic-driven environment for SOA interoperability
–Encourages sharing and participation for building communities
–Modular construction allows flows to be modified and configured to encourage reusability within and across domains
–Enables mashups and integration of tools
–Data-intensive flows can be executed on a simple desktop or on one or more large clusters without modification
–Computation can be created for distributed execution on servers where the content lives
–User access can be controlled for trust and for compliance with the copyright license required by the content
–Relies on the standardized Resource Description Framework (RDF) to define components and flows

10 SEASR Enables Scholarly Research Discovery
–What hypotheses or rules can be generated by the “features” of the corpus?
–What “features” or language of the corpus best describe the corpus?
–What are the “similarities” among elements, documents, or corpora?
–What patterns can be identified?

11 Enables Humanists to Ask… Pattern identification using automated learning
–Which patterns are characteristic of the English language?
–Which patterns are characteristic of a particular author, work, topic, or time?
–Which patterns based on words, phrases, sentences, etc. can be extracted from literary bodies?
–Which patterns are identified based on grammar or plot constructs?
–When are correlated patterns meaningful?
–Can they be categorized based on specific criteria?
–Can an author’s intent be identified given an extracted pattern?

12 SEASR @ Work – Tag Cloud Counts tokens Several different filtering options supported
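
The token counting behind a tag cloud can be sketched in a few lines. This is only an illustration, not SEASR's implementation: the stop-word list, the `min_length` option, and the function name `token_counts` are assumptions made for the example.

```python
import re
from collections import Counter

# Hypothetical stop-word list for illustration; not SEASR's actual filter.
STOP_WORDS = {"the", "a", "an", "of", "and", "it", "was", "to", "in"}


def token_counts(text, stop_words=STOP_WORDS, min_length=1):
    """Count lowercase word tokens, skipping stop words and short tokens."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(
        t for t in tokens if t not in stop_words and len(t) >= min_length
    )


counts = token_counts("It was the best of times, it was the worst of times")
# The counts would drive the font size of each word in the cloud.
print(counts.most_common(3))
```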

13 SEASR @ Work – Dunning Log-likelihood Feature Comparison of Tokens
Specify an analysis document/collection
Specify a reference document/collection
Perform a statistical comparison using Dunning log-likelihood
Example showing over-represented tokens. Analysis Set: The Project Gutenberg EBook of A Tale of Two Cities, by Charles Dickens. Reference Set: The Project Gutenberg EBook of Great Expectations, by Charles Dickens
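
A minimal sketch of the comparison, assuming the two-term form of the log-likelihood statistic (G2) that is common in corpus comparison; the slide does not say which variant SEASR uses. The function names and the toy token lists are made up for the example.

```python
import math
from collections import Counter


def log_likelihood(a, b, c, d):
    """G2 for a token seen a times among c analysis tokens
    and b times among d reference tokens."""
    # Expected counts if the token were equally frequent in both sets.
    e1 = c * (a + b) / (c + d)
    e2 = d * (a + b) / (c + d)
    g2 = 0.0
    if a > 0:
        g2 += a * math.log(a / e1)
    if b > 0:
        g2 += b * math.log(b / e2)
    return 2.0 * g2


def compare(analysis_tokens, reference_tokens):
    """Return (token, G2, over_represented) rows sorted by descending G2."""
    ac, rc = Counter(analysis_tokens), Counter(reference_tokens)
    c, d = sum(ac.values()), sum(rc.values())
    rows = []
    for tok in set(ac) | set(rc):
        a, b = ac[tok], rc[tok]
        over = a / c > b / d  # relatively more frequent in the analysis set
        rows.append((tok, log_likelihood(a, b, c, d), over))
    return sorted(rows, key=lambda r: r[1], reverse=True)


# Toy data, not taken from the Dickens texts.
rows = compare(["guillotine"] * 5 + ["the"] * 5, ["the"] * 5 + ["marsh"] * 5)
for tok, g2, over in rows:
    print(tok, round(g2, 2), "over" if over else "under or equal")
```

A high score only says the frequencies differ; the `over_represented` flag says in which direction.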

14 SEASR @ Work – Date Entities to Simile Timeline Entity Extraction with OpenNLP Dates viewed on Simile Timeline Locations viewed on Google Map

15 SEASR @ Work – Text Clustering Clustering of Text by token counts Filtering options for stop words, Part of Speech Dendrogram Visualization
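
Clustering by token counts can be sketched as agglomerative clustering over count vectors; the sequence of merges is what a dendrogram draws. Cosine distance and average linkage are assumptions chosen for this sketch, since the slide does not name the distance or linkage SEASR uses, and the documents are toy data.

```python
import math
from collections import Counter


def cosine_distance(p, q):
    """1 - cosine similarity between two token-count vectors."""
    dot = sum(p[t] * q[t] for t in p if t in q)
    norm = math.sqrt(sum(v * v for v in p.values())) * math.sqrt(
        sum(v * v for v in q.values())
    )
    return 1.0 - dot / norm if norm else 1.0


def cluster(docs):
    """Average-linkage agglomerative clustering.

    docs maps a document name to its token list. Returns the merges in
    order as (cluster_x, cluster_y, distance); each cluster is a tuple
    of document names.
    """
    vectors = {name: Counter(tokens) for name, tokens in docs.items()}
    clusters = [(name,) for name in docs]
    merges = []

    def linkage(x, y):
        total = sum(cosine_distance(vectors[i], vectors[j]) for i in x for j in y)
        return total / (len(x) * len(y))

    while len(clusters) > 1:
        pairs = [
            (linkage(x, y), x, y)
            for i, x in enumerate(clusters)
            for y in clusters[i + 1:]
        ]
        dist, x, y = min(pairs, key=lambda p: p[0])
        clusters = [c for c in clusters if c not in (x, y)] + [x + y]
        merges.append((x, y, dist))
    return merges


merges = cluster({
    "doc_a": ["whale", "sea", "ship"],
    "doc_b": ["whale", "sea", "sea"],
    "doc_c": ["london", "fog", "court"],
})
for x, y, dist in merges:
    print(x, "+", y, "at distance", round(dist, 3))
```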

16 Meandre: Infrastructure SEASR/Meandre Infrastructure:
–Dataflow execution paradigm
–Semantic-web driven
–Web Oriented
–Supports publishing services
–Modular components
–Encapsulation and execution mechanism
–Promotes reuse, sharing, and collaboration

17 Meandre: Semantic Web Concepts
Relies on the Resource Description Framework (RDF), which uses a simple notation to express graph relations, usually written as XML, and provides a set of conventions and a common means to exchange information
Provides a common framework to share and reuse data across application, enterprise, and community boundaries
Focuses on common formats for integration and combination of data drawn from diverse sources
Pays special attention to the language used for recording how the data relates to real-world objects
Allows navigation to sets of data resources that are semantically connected
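
RDF expresses a graph as subject-predicate-object triples, and navigating semantically connected resources amounts to matching patterns against those triples. The sketch below models that idea with plain Python tuples. Every URI and predicate name in it is hypothetical (under `example.org`) and is not Meandre's actual vocabulary.

```python
# Hypothetical namespace and terms, for illustration only.
EX = "http://example.org/"

# Each triple is (subject, predicate, object): one edge of the graph.
triples = {
    (EX + "flow/tagcloud", EX + "hasComponent", EX + "component/tokenizer"),
    (EX + "flow/tagcloud", EX + "hasComponent", EX + "component/counter"),
    (EX + "component/tokenizer", EX + "implementedIn", "Java"),
}


def match(s=None, p=None, o=None):
    """Return the triples that fit a pattern; None acts as a wildcard."""
    return sorted(
        t
        for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


# Follow the graph: which components does the flow link to?
for _, _, component in match(s=EX + "flow/tagcloud", p=EX + "hasComponent"):
    print(component)
```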

18 Meandre: Dataflow Example
Dataflow Addition Example
–Logical Operation ‘+’
–Requires two inputs
–Produces one output
When two inputs are available
–Logical operation can be performed
–Sum is output
When output is produced
–Reset internal values
–Wait for two new input values to become available
[Figure: ‘+’ component with inputs Value1 and Value2 and output Sum]
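
The firing rule of the addition example can be sketched as a small class: hold each input until both are present, emit the sum, then reset. The class and port names are made up for this sketch and do not reflect Meandre's component API.

```python
class AddComponent:
    """Dataflow '+' component: fires only when both inputs are available."""

    def __init__(self):
        self.value1 = None
        self.value2 = None

    def push(self, port, value):
        """Deliver a value to a port; return the sum if the component fires."""
        if port == "value1":
            self.value1 = value
        elif port == "value2":
            self.value2 = value
        else:
            raise ValueError(f"unknown port: {port}")
        return self._fire_if_ready()

    def _fire_if_ready(self):
        if self.value1 is None or self.value2 is None:
            return None  # still waiting for the other input
        total = self.value1 + self.value2
        # Reset internal values and wait for two new inputs.
        self.value1 = self.value2 = None
        return total


adder = AddComponent()
print(adder.push("value1", 2))  # nothing to output yet
print(adder.push("value2", 3))  # both inputs present, so the sum is output
```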

19 Meandre: Create, Publish, & Share
“Components” and “Flows” have RDF descriptors
–Easily shared, which fosters reuse
–Allow machines to read and interpret
–Independent of the implementations
–Combine different implementations & platforms
–Components: Java, Python, Lisp, Web Services
–Execution: On a Laptop or a High Performance Cluster
A “Location” is an RDF descriptor of one to many components, one to many flows, and their implementations

20 Meandre: Repository & Locations
Each location represents a set of components/flows
Users can
–Combine different locations together
–Create components
–Assemble flows
–Share components and flows
Repositories help
–Administer complex environments
–Organize components and flows

21 Meandre: Programming Paradigm
The programming paradigm creates complex tasks by linking together a set of specialized components. Meandre's publishing mechanism allows components developed by third parties to be assembled in a new flow.
There are two ways to develop flows:
–Meandre’s Workbench visual programming tool
–Meandre’s ZigZag scripting language
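
The idea of assembling a flow from specialized components can be sketched as a directed graph whose nodes are small functions and whose edges carry each output to the next input. This is a toy model, not Meandre's engine or ZigZag syntax: `run_flow` and the component names are assumptions, and `graphlib` needs Python 3.9 or later.

```python
from graphlib import TopologicalSorter


def run_flow(components, connections, source_input):
    """Run a flow described as a directed graph.

    components maps a name to a callable; connections is a list of
    (from_name, to_name) edges. Components with no incoming edge
    receive source_input.
    """
    predecessors = {name: [] for name in components}
    for src, dst in connections:
        predecessors[dst].append(src)

    results = {}
    # static_order() yields each component after all of its predecessors.
    for name in TopologicalSorter(predecessors).static_order():
        inputs = [results[p] for p in predecessors[name]] or [source_input]
        results[name] = components[name](*inputs)
    return results


# Three single-purpose components linked into one flow.
components = {
    "read": lambda text: text,
    "tokenize": lambda text: text.lower().split(),
    "count": lambda tokens: len(tokens),
}
connections = [("read", "tokenize"), ("tokenize", "count")]

results = run_flow(components, connections, "A Tale of Two Cities")
print(results["count"])
```

Because the flow is only data (names and edges), a component can be swapped for a third-party one without touching the others.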

22 Meandre: Workbench
Web-based UI
Components and flows are retrieved from server
Additional locations of components and flows can be added to server
Create flow using a graphical drag and drop interface
Change property values
Execute the flow
[Figure: Workbench screenshot with labeled panels: Locations, Components, Flows, Existing Flow]

23 Meandre: ZigZag Script Language
ZigZag is a simple language for describing data-intensive flows
–Modeled on Python for simplicity.
–ZigZag is a declarative language for expressing the directed graphs that describe flows.
Command-line tools allow ZigZag files to be compiled and executed.
–A compiler is provided to transform a ZigZag program (.zz) into a Meandre archive unit (.mau).
–MAUs can then be executed by a Meandre engine.

24 SEASR @ Work – Zotero Plugin to Firefox Zotero manages the collection Launch SEASR Analytics –Citation Analysis uses the JUNG network importance algorithms to rank the authors in the citation network that is exported as RDF data from Zotero to SEASR –Zotero Export to Fedora through SEASR –Saves results from SEASR Analytics to a Collection
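
Ranking authors in a citation network can be illustrated with a network importance measure. The slide does not say which JUNG algorithm the flow uses, so PageRank is used here purely as a stand-in, written in plain Python rather than with JUNG. The author names and edges are made-up toy data.

```python
def pagerank(edges, damping=0.85, iterations=50):
    """Power-iteration PageRank over (citing, cited) edges."""
    nodes = sorted({n for edge in edges for n in edge})
    out_links = {n: [] for n in nodes}
    for citing, cited in edges:
        out_links[citing].append(cited)

    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        new_rank = {n: (1.0 - damping) / len(nodes) for n in nodes}
        for n in nodes:
            # An author who cites nobody spreads rank evenly over everyone.
            targets = out_links[n] or nodes
            share = damping * rank[n] / len(targets)
            for t in targets:
                new_rank[t] += share
        rank = new_rank
    return rank


# Toy citation network: (citing author, cited author).
edges = [
    ("author_a", "author_b"),
    ("author_c", "author_b"),
    ("author_d", "author_c"),
]
ranks = pagerank(edges)
for author in sorted(ranks, key=ranks.get, reverse=True):
    print(author, round(ranks[author], 3))
```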

25 SEASR @ Work – Fedora Repository Search & Browse Web Service Interactive Web Application Zotero Upload to Repository

26 Community Hub Explore existing flows to find others of interest –Keyword Cloud –Connections Find related flows Execute flow Comments

27 Detail View of Application Detail View with Related Flows

28 DHSI Course Materials Monday Tuesday Wednesday Thursday Friday SEASR Overview Overview of Course SEASR Overview and Motivation SEASR Architecture Introduction of Meandre SEASR Community Hub Example Applications Meandre Workbench Meandre Data Flows Overview of Workbench Overview of Repositories Constructing Flows SEASR Analytics for Zotero Demonstrations of SEASR Analytic Interaction between Zotero and SEASR Installation and Development Tools Installation Community Collaboration Tools Architecture Details Overview of Development Tools Future SEASR Central Future Meandre Features Future Meandre Workbench Features Google Books Attendee Plan Presentations Course Wrap-up Text Analytics Overview of Text Analytics Text Clustering Frequent Patterns Analysis Entity Extraction Meandre Server Interface SEASR Applications Audio Analytics: NEMA: Blinkie Text Analytics: Monk Emotion Tracking Creating Zotero Flows Configuration Mechanism Specific Web Service Components Zotero-enabled Flows Deployment of Flows Overview of ZigZag Parallelization Example ZigZag flows Zotero and Fedora

