ELISQ Systems Demonstration Sagnik Ray Choudhury Doha -- May 2015.



1 ELISQ Systems Demonstration Sagnik Ray Choudhury sagnik@psu.edu Doha -- May 2015

2 SeerQ: SeerSuite for Qatar
SeerSuite: A digital library management system developed at Penn State
Key features:
- Crawls the web to gather scholarly documents
- Extracts metadata from PDFs (title, author name, citation) using machine learning
- Stores extracted metadata in a database and allows metadata and full-text search
Differences from Google Scholar:
- Stores the metadata and exposes it through OAI-PMH
- Stores the citation graph, which can be used later to measure scholarly impact
- Collects and stores the PDFs, which can be used later for advanced processing such as table/figure extraction and understanding the semantics
SeerQ: The instance of SeerSuite running at Qatar University, crawling scholarly content from the Qatari Web
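The slide mentions that a stored citation graph can later be used to measure scholarly impact. As a minimal sketch of that idea (not SeerSuite's actual schema or code), the graph can be held as a mapping from each document to the documents it cites, and incoming citations counted per document. The document IDs and function names below are hypothetical.

```python
from collections import defaultdict

# Hypothetical (citing_id, cited_id) pairs; the IDs are made up for illustration.
citations = [
    ("doc1", "doc3"),
    ("doc2", "doc3"),
    ("doc2", "doc1"),
    ("doc4", "doc3"),
]

def build_citation_graph(pairs):
    """Map each citing document to the set of documents it cites."""
    graph = defaultdict(set)
    for citing, cited in pairs:
        graph[citing].add(cited)
    return graph

def citation_counts(graph):
    """Count incoming citations per document (the in-degree of each node)."""
    counts = defaultdict(int)
    for cited_docs in graph.values():
        for cited in cited_docs:
            counts[cited] += 1
    return dict(counts)

graph = build_citation_graph(citations)
counts = citation_counts(graph)
```

A raw incoming-citation count is only the simplest possible impact measure; it is used here because it needs nothing beyond the graph itself.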

3 SeerQ: Search Results

4 SeerQ: Details from Search Results

5 SeerQ: Components and Statistics
System running at http://10.100.121.41:8080/citeseerx (available from within Qatar University; from outside, use VPN).
Components:
- Heritrix 3 and OAI-based crawler (PSU uses Heritrix 1.2)
- Solr 3.6 (PSU just moved from Solr 1.2)
- MySQL and front end (same as PSU)
Document collections:
- Documents crawled from QScience
- Documents crawled from the Web: seed list provided by QNL
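Since the index is served by Solr, a full-text query can in principle be expressed as a request to Solr's standard select handler. The sketch below only builds such a URL. The base URL, the `title` field name, and the helper name are assumptions for illustration; the slides do not describe how the SeerQ Solr instance is exposed or what its schema looks like.

```python
from urllib.parse import urlencode

def solr_select_url(base_url, query, rows=10):
    """Build a URL for Solr's select handler, asking for a JSON response."""
    params = {"q": query, "rows": rows, "wt": "json"}
    return f"{base_url}/select?{urlencode(params)}"

# Hypothetical base URL and field name, for illustration only.
url = solr_select_url("http://localhost:8983/solr", 'title:"digital library"', rows=5)
print(url)
```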

6 Some Statistics from SeerQ
Total documents in the repository (as of May 2015): 3900
Documents from QScience: 2000
Main sources: QScience, RAND, Doha Institute, Doha Film Institute
What can we do with the system:
- Scholarly analysis: How many authors are from Qatar/Doha/Qatar University?
- Citation analysis: QScience papers have an inter-journal citation rate of only 0.15%.
- Use the stored PDFs to extract valuable information (Research: PSU RA).
- Expose the metadata through OAI-PMH.

7 SeerQ: Exposing Extracted Metadata through OAI-PMH
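OAI-PMH is a standard harvesting protocol in which a client sends a `verb` parameter (for example `ListRecords`) together with a `metadataPrefix` such as `oai_dc`, and receives XML records in return. The following is a minimal sketch of a harvesting client, assuming Dublin Core output. The endpoint URL and the sample response are made up for illustration, since the slides do not give the SeerQ OAI-PMH endpoint.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

OAI_NS = "http://www.openarchives.org/OAI/2.0/"
DC_NS = "http://purl.org/dc/elements/1.1/"

def list_records_url(base_url, metadata_prefix="oai_dc"):
    """Build an OAI-PMH ListRecords request URL."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return f"{base_url}?{urlencode(params)}"

def extract_records(xml_text):
    """Return (identifier, title) pairs from a ListRecords response."""
    root = ET.fromstring(xml_text)
    records = []
    for record in root.iter(f"{{{OAI_NS}}}record"):
        identifier = record.findtext(f"{{{OAI_NS}}}header/{{{OAI_NS}}}identifier")
        title = record.findtext(f".//{{{DC_NS}}}title")
        records.append((identifier, title))
    return records

# Made-up response, shaped like an OAI-PMH ListRecords reply with oai_dc metadata.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>A Sample Paper</dc:title>
          <dc:creator>A. Author</dc:creator>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""

request_url = list_records_url("http://example.org/oai")  # hypothetical endpoint
records = extract_records(SAMPLE)
```

In a real harvest the XML would come from fetching `request_url`, and a `resumptionToken` in the response would need to be followed to page through large result sets; both are left out to keep the sketch self-contained.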

8 Arabic/English Bilingual Handwriting Database
A searchable database for handwritten documents (both in English and Arabic)
Motivation:
- Retrieve handwritten documents matching the search term
- Compare the difference in handwriting for Arabic words (recognize the writer)
- Demonstrate handling of images + text (in both languages)
Arabic handwriting project interface: http://10.100.121.42:8000/

9 Handwriting Project: Search Results

10 Handwriting Project: Image with Metadata

11 Fusion: A Search Eco System
Fusion is a free search ecosystem developed by LucidWorks.
Includes a crawler, Solr for indexing, and tools for query log analysis and error reporting
Advantages over simple Solr: Enhanced Admin UI Security Data Enrichment Machine Learning Advanced Relevancy Tuning Reporting Admin Signal Processing Recommendations API (Configuration, History, Node, System, Usage) Connector Framework

12 Using Fusion to collect Qatari Digital Content
Around 2 million English and Arabic documents related to Qatar have been crawled and are accessible using Fusion.
Specific collections:
- Qatari Newspapers: >1 million documents from Al-Raya, Gulf-Times, Qatar-tribune
- Sports: QA domain sports sites, 5000 documents
- Government: government websites in Qatar, 14500 documents
- Arabic News Articles Templates Summary: 120,000 newspaper articles along with their summaries, generated automatically (Research from VT RA)
- Qatar University
Fusion can help in providing a data curation service: users request a collection, a curator creates it and exposes the curated content to the user through an interface. archive-it provides similar functionality on a broader scope.

13 Fusion: for Curators

14 Fusion: Creating a New Collection

15 Fusion: How to Combine Multiple Datasources

16 Fusion: How to Combine Multiple Datasources: 2

17 Fusion: Two Step Web Crawling: Step 1

18 Fusion: Two Step Web Crawling: Step 2

19 Search Interface for Fusion: End User
Designed by the ELISQ team for demonstrations. http://10.100.121.44:8000

20 Search Result on Newspaper Summary Collection

