Practical Project of the 2006 Joint International Master’s Degree.


2 Agenda
- Introduction
- Technologies in use
- Architecture
- Demonstration
- Remaining issues
- Work packages for semester II
- Questions & comments

3 Introduction
- Practical project as part of the course of studies
- Timeframe: two terms
- Topic: prototype of a semantic search engine using UIMA
- Objectives of the first semester:
  - Study the UIMA framework and the OpenNLP library
  - Search for players, teams, matches and dates
  - Semantic search for goal events
  - Implement an executable prototype

4 Technologies in Use
- UIMA framework
- OpenNLP
- Java / JavaServer Pages
- Tomcat server
- Python (web crawler)

5 Architecture Overview

6 Architecture: Web Crawler
- A web crawler preselects the source texts
- Implemented in Python
- Crawls approx. 2,500 pages in 20 minutes
- Preselection is currently keyword-based
- Results are still transferred to Jimgle manually
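A minimal sketch of the keyword-based preselection step, assuming the crawler has already downloaded each page's text. The keyword list, the `min_hits` threshold and the function names are hypothetical; the actual crawler and its keywords are not described on the slide, and link following and fetching are left out.

```python
import re

# Hypothetical keyword list; the crawler's real keywords are not given.
KEYWORDS = {"goal", "match", "world cup", "penalty", "striker"}

def keyword_score(text: str, keywords=KEYWORDS) -> int:
    """Count whole-word keyword occurrences in the page text (case-insensitive)."""
    lowered = text.lower()
    return sum(len(re.findall(r"\b" + re.escape(k) + r"\b", lowered)) for k in keywords)

def preselect(pages: dict[str, str], min_hits: int = 2) -> list[str]:
    """Return the URLs whose text has at least min_hits keyword hits, best first."""
    scored = [(keyword_score(text), url) for url, text in pages.items()]
    return [url for score, url in sorted(scored, reverse=True) if score >= min_hits]
```

Only the preselected URLs would then be handed on for annotation, which keeps the UIMA pipeline from processing obviously irrelevant pages.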

7 Architecture: NLP Annotator
- Uses the OpenNLP tools and API
- Rule-based approach
- Tags paragraphs, sentences and words
- Part-of-speech tagging
- Implemented in UIMA as a separate annotator
- Its results are used by the downstream annotators
- Internal use only; the results are not shown in the search index
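The following sketch shows how paragraph, sentence and word annotations can be stored as begin/end character offsets, the way UIMA annotations reference the document text. It is a plain-Python stand-in with hand-written regular-expression rules, not OpenNLP (whose sentence detector, tokenizer and part-of-speech tagger use trained models); the `Annotation` class and the rules are assumptions for illustration, and part-of-speech tagging is omitted.

```python
import re
from dataclasses import dataclass

@dataclass(frozen=True)
class Annotation:
    """A typed span over the document text, stored as begin/end character offsets."""
    kind: str
    begin: int
    end: int

    def covered_text(self, text: str) -> str:
        return text[self.begin:self.end]

def annotate(text: str) -> list[Annotation]:
    """Tag paragraphs, sentences and words using simple regular-expression rules."""
    result = []
    # Paragraph: one or more consecutive non-empty lines.
    for p in re.finditer(r"[^\n]+(?:\n[^\n]+)*", text):
        result.append(Annotation("Paragraph", p.start(), p.end()))
        # Sentence: starts at a non-space character and runs up to and including
        # the next '.', '!' or '?' (or to the end of the paragraph).
        for s in re.finditer(r"\S[^.!?]*[.!?]?", p.group()):
            s_begin = p.start() + s.start()
            result.append(Annotation("Sentence", s_begin, p.start() + s.end()))
            # Word: a run of word characters, or a single punctuation mark.
            for w in re.finditer(r"\w+|[^\w\s]", s.group()):
                result.append(Annotation("Word", s_begin + w.start(), s_begin + w.end()))
    return result
```

Because every annotation only points into the original text, downstream annotators can read the `Word` spans instead of re-tokenizing the document.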

8 Architecture: Player Annotator
- Identifies players of the 2006 FIFA World Cup (WM2006)
- Rule-based implementation
- Uses the OpenNLP word annotations
- Matches words against the player database (an XML file)
- Takes last names and nicknames into account
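A minimal sketch of the dictionary lookup, assuming a hypothetical XML layout with one `<player>` element per player carrying `first`, `last` and `nickname` attributes (the real database schema is not shown). The word list is taken as given, standing in for the OpenNLP word annotations, and the two players are example data only.

```python
import xml.etree.ElementTree as ET

# Hypothetical database layout; the prototype's actual XML schema is not shown.
PLAYER_XML = """
<players>
  <player first="Miroslav" last="Klose" nickname="Miro"/>
  <player first="Lukas" last="Podolski" nickname="Poldi"/>
</players>
"""

def load_lexicon(xml_text: str) -> dict[str, str]:
    """Map every last name and nickname (lower-cased) to a canonical player name."""
    lexicon = {}
    for p in ET.fromstring(xml_text.strip()).iter("player"):
        canonical = f"{p.get('first')} {p.get('last')}"
        for key in (p.get("last"), p.get("nickname")):
            if key:
                lexicon[key.lower()] = canonical
    return lexicon

def annotate_players(words: list[str], lexicon: dict[str, str]) -> list[tuple[int, str]]:
    """Return (word index, canonical player name) for every word found in the lexicon."""
    return [(i, lexicon[w.lower()]) for i, w in enumerate(words) if w.lower() in lexicon]
```

Mapping both the last name and the nickname to one canonical name means that "Poldi" and "Podolski" produce the same player annotation.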

9 Architecture: Date & Time Annotator
- Identifies time and date information
- Uses the OpenNLP word annotations
- Currently a custom, rule-based implementation
- Detects time and date information written in standard formats
- Detection of relative or colloquial time expressions is not implemented yet
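A sketch of rule-based detection, assuming that "standard formats" means ISO 8601 dates (`2006-07-09`), German day.month.year dates (`09.07.2006`) and 24-hour times (`20:00`). The patterns and the function name are illustrative, not the prototype's actual rules; relative expressions such as "next Saturday" are deliberately not handled, matching the slide.

```python
import re
from datetime import date, time

# Assumed "standard" formats: ISO 8601 dates, German day.month.year dates, 24-hour times.
DATE_ISO = re.compile(r"\b(\d{4})-(\d{2})-(\d{2})\b")
DATE_DMY = re.compile(r"\b(\d{1,2})\.(\d{1,2})\.(\d{4})\b")
TIME_24H = re.compile(r"\b([01]?\d|2[0-3]):([0-5]\d)\b")

def find_dates_and_times(text):
    """Return (offset, date or time) pairs in text order; impossible values are skipped."""
    found = []
    for pattern, build in (
        (DATE_ISO, lambda g: date(g[0], g[1], g[2])),
        (DATE_DMY, lambda g: date(g[2], g[1], g[0])),
        (TIME_24H, lambda g: time(g[0], g[1])),
    ):
        for m in pattern.finditer(text):
            try:
                found.append((m.start(), build([int(x) for x in m.groups()])))
            except ValueError:  # e.g. 31.02.2006 fits the pattern but is not a real date
                pass
    return sorted(found, key=lambda item: item[0])
```

Converting each match into a `date` or `time` object, rather than keeping the raw string, lets later annotators compare values regardless of the format they were written in.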

10 Architecture: Match Annotator
- Identifies matches
- Based on three components:
  - Detection of the location
  - Detection of the participating teams
  - Detection of the match result
- Uses the upstream annotators:
  - OpenNLP word annotations
  - Player annotations
  - Date and time annotations
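The sketch below combines the three components over a list of words. The team and venue lexicons, the score pattern and the `time_indices` parameter (word positions already tagged by the date & time annotator) are hypothetical. The second test case shows why the upstream time annotation matters: a kick-off time such as `21:00` would otherwise look like a result. Player annotations, which the prototype also uses, are left out here.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical lexicons; the prototype's team and venue lists are not given.
TEAMS = {"Germany", "Ecuador", "Argentina", "Italy"}
VENUES = {"Berlin", "Munich", "Dortmund"}
# A result token such as "3:0" or "3-0".
SCORE = re.compile(r"^(\d{1,2})[:-](\d{1,2})$")

@dataclass
class Match:
    teams: list
    venue: Optional[str]
    result: Optional[tuple]

def annotate_match(words, time_indices=frozenset()):
    """Combine location, teams and result; return a Match only if exactly two teams occur."""
    teams = list(dict.fromkeys(w for w in words if w in TEAMS))  # keep order, drop repeats
    venue = next((w for w in words if w in VENUES), None)
    result = None
    for i, w in enumerate(words):
        m = SCORE.match(w)
        # Skip tokens the upstream date & time annotator has already claimed.
        if m and i not in time_indices:
            result = (int(m.group(1)), int(m.group(2)))
            break
    return Match(teams, venue, result) if len(teams) == 2 else None
```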

11 Architecture: Goal-Event Annotator
- Descriptions of goals are too complex for rule-based detection
- Therefore: machine learning
- Uses the OpenNLP library
- Based on statistical information about sentences
- Requires comprehensive training
- Implemented as an OpenNLP component
- Integrated into UIMA via wrapper classes
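To illustrate what "based on statistical information about sentences" can mean, here is a tiny sentence classifier trained on labeled examples. It uses multinomial naive Bayes with add-one smoothing only because that fits in a few lines of standard-library Python; OpenNLP's own trainers are maximum-entropy based, so this is a swapped-in technique, and the training sentences are made up for illustration.

```python
import math
from collections import Counter, defaultdict

# Made-up training data; a usable model needs far more labeled sentences.
TRAIN = [
    ("klose heads the ball into the net", "GOAL"),
    ("podolski scores from close range", "GOAL"),
    ("the ball ends up in the back of the net", "GOAL"),
    ("the referee shows a yellow card", "OTHER"),
    ("fans celebrate in the streets of berlin", "OTHER"),
    ("the coach names his starting line-up", "OTHER"),
]

class NaiveBayesSentenceClassifier:
    """Multinomial naive Bayes over bag-of-words features with add-one smoothing."""

    def __init__(self):
        self.class_counts = Counter()
        self.word_counts = defaultdict(Counter)
        self.vocab = set()

    def train(self, examples):
        for sentence, label in examples:
            words = sentence.lower().split()
            self.class_counts[label] += 1
            self.word_counts[label].update(words)
            self.vocab.update(words)

    def predict(self, sentence):
        words = sentence.lower().split()
        total = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            # log P(label) + sum of log P(word | label), smoothed so unseen words are not zero.
            score = math.log(count / total)
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            for w in words:
                score += math.log((self.word_counts[label][w] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label
```

With six training sentences the model mostly reacts to words like "net" and "scores"; the slide's point about comprehensive training is exactly that such a model only becomes reliable with many more examples.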

12 Architecture: Persistent Indexing
- Functionality:
  - Imports all files in a specific directory
  - Annotates all available texts
  - Writes an XML file with the CAS data for every source text
  - Subsequently creates a search index
  - Provides the index files to the web server
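A minimal sketch of the indexing run, assuming each source file is plain UTF-8 text. `annotate` is a hypothetical stand-in for the UIMA pipeline that only emits word annotations, the per-document XML is a simplified format rather than UIMA's actual CAS serialization, and `index.tsv` is an assumed file layout for the web server.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict
from pathlib import Path

def annotate(text):
    """Hypothetical stand-in for the UIMA pipeline: one 'Word' annotation per token."""
    tokens = (t.strip(".,;:!?") for t in text.split())
    return [("Word", t) for t in tokens if t]

def index_directory(src_dir, out_dir):
    """Annotate every file in src_dir, write one XML file per text, then build an index."""
    src, out = Path(src_dir), Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    index = defaultdict(set)  # (annotation type, lower-cased text) -> source file names
    for path in sorted(p for p in src.iterdir() if p.is_file()):
        root = ET.Element("document", source=path.name)
        for kind, covered in annotate(path.read_text(encoding="utf-8")):
            ET.SubElement(root, "annotation", type=kind).text = covered
            index[(kind, covered.lower())].add(path.name)
        ET.ElementTree(root).write(str(out / f"{path.stem}.xml"), encoding="utf-8")
    # Write the index in a simple tab-separated layout for the web server to load.
    with open(out / "index.tsv", "w", encoding="utf-8") as f:
        for (kind, term), files in sorted(index.items()):
            f.write(f"{kind}\t{term}\t{','.join(sorted(files))}\n")
    return index
```

Keeping the per-document XML separate from the index means the index can be rebuilt without re-running the (slow) annotators.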

13 Architecture: Graphical User Interface
- Linux server with a Tomcat installation
- Simple operation via a web-based GUI
- Search queries are handled by JavaServer Pages
- Requests are processed by JavaBeans

14 Demonstration: Search Engine

15 Open Issues: Next Steps?
- Search for attributes, e.g. Player AND Germany (currently only possible via OmniFind)
- Automate the processing of search engine results
- Further training of the components
- Usability improvements in the front end and back end
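One way an attribute query such as "Player AND Germany" could be answered without OmniFind is to intersect posting sets keyed by annotation type and by covered text. The sketch below assumes that a query term naming an annotation type (case-sensitive, e.g. `Player`) matches any annotation of that type, and that every other term matches covered text case-insensitively; all names and the example documents are hypothetical.

```python
from collections import defaultdict

def build_index(docs):
    """docs maps a document id to its (annotation type, covered text) pairs."""
    by_type, by_text = defaultdict(set), defaultdict(set)
    for doc_id, annotations in docs.items():
        for kind, covered in annotations:
            by_type[kind].add(doc_id)
            by_text[covered.lower()].add(doc_id)
    return by_type, by_text

def search(query, by_type, by_text):
    """Evaluate a conjunctive query such as 'Player AND Germany'; returns sorted ids."""
    result = None
    for term in (t.strip() for t in query.split(" AND ")):
        # A term naming an annotation type matches any annotation of that type;
        # any other term must match some annotation's covered text.
        postings = by_type[term] if term in by_type else by_text.get(term.lower(), set())
        result = postings if result is None else result & postings
    return sorted(result or set())
```

Set intersection keeps each AND step proportional to the smaller posting set, which is why inverted indexes answer conjunctive queries this way.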

16 New Scenarios… …for the Second Semester
- Automated analysis of e-mails:
  - Search for phone numbers
  - Search for an employee's customer contacts
  - Find employees with specific skills
  - Find links and relations between employees
- Competitive analysis:
  - Compare our own products with those of competitors
  - Find out about customer opinions on internet portals
- Further ideas?

17 Ideas… …for the Second Semester
- Natural-language search queries
- Design templates for customizable annotators
- Machine learning for the web crawler
- Highlight annotations in the search results
- Automated processing of search results
- Implement more annotators via OpenNLP
- Provide annotators as web services
- Further ideas?

18 JIMGLE: JIM Master Project
Questions? Suggestions?

19 Thanks for your attention…

