Information Extraction from Event Announcements. Student: Jianwei Lu (40942937). Supervisor: Robert Dale.


1 Information Extraction from Event Announcements
Student: Jianwei Lu (40942937)
Supervisor: Robert Dale

2 Agenda
- Project Introduction
- Email Event Information Extractor
- Conclusion

3 Background
- What is Information Extraction (IE)?
  - Automated extraction of key information
  - Populate a database
- Why is it significant?
  - Manage and search data efficiently
  - Provide input for other target applications
- For more info: [Cowie J. and Wilks Y., n.d.] http://www.dcs.shef.ac.uk/~yorick/papers/infoext.pdf

4 The Outcomes
- Title
- URL

5 Sample Data
- Corpus 1: 30 documents
- Corpus 2: 100 documents
- Corpus 3: 1,500 documents

6 Agenda
- Project Introduction
- Email Event Information Extractor
- Conclusion

7 My System Architecture

8 Text Zoning

9 URL Finding Rules
- Use pattern to capture URLs
- Approaches for finding an event URL
  1. Search Summary zone
  2. Search the whole document
- Results
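A minimal sketch of the URL rules on this slide, written in current Python rather than the Python 2.6 the project used. The regular expression, the function names, and the reading of the two approaches as a fallback order (summary zone first, then the whole document) are all assumptions for illustration; the slides do not show the actual pattern.

```python
import re

# Deliberately simple URL pattern (an assumption, not the pattern from the slides).
URL_PATTERN = re.compile(r"(?:https?://|www\.)[^\s<>\"')\]]+", re.IGNORECASE)


def find_urls(text):
    """Return all URL-like strings in text, with trailing punctuation stripped."""
    return [m.group(0).rstrip(".,;:") for m in URL_PATTERN.finditer(text)]


def find_event_url(summary_zone, full_document):
    """Prefer a URL from the summary zone; fall back to the whole document."""
    for source in (summary_zone, full_document):
        urls = find_urls(source)
        if urls:
            return urls[0]
    return None
```

Taking the first match in each zone is the simplest possible choice; a real system would likely rank candidates instead.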

10 Dates Finding Rules
- Use pattern to capture Dates
- Use clues to find corresponding date
  1. submission-date < start-date <= end-date
  2. no submission-date in a “Call for Participation” announcement
  3. etc.
- Results
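The date rules could be sketched roughly as follows. The single "day month year" pattern and the helper names are hypothetical; only the ordering constraint submission-date < start-date <= end-date, and the possibility of a missing submission date, come from the slide.

```python
import re
from datetime import date

MONTHS = {name: i for i, name in enumerate(
    ["january", "february", "march", "april", "may", "june", "july",
     "august", "september", "october", "november", "december"], start=1)}

# Matches dates such as "5 July 2010" only; real announcements use many more formats.
DATE_PATTERN = re.compile(
    r"\b(\d{1,2})\s+(" + "|".join(MONTHS) + r")\s+(\d{4})\b", re.IGNORECASE)


def find_dates(text):
    """Return every valid 'day month year' date in text, in order of appearance."""
    found = []
    for day, month, year in DATE_PATTERN.findall(text):
        try:
            found.append(date(int(year), MONTHS[month.lower()], int(day)))
        except ValueError:
            continue  # e.g. "31 February 2010"
    return found


def is_consistent(submission, start, end):
    """Check submission-date < start-date <= end-date; submission may be None."""
    if start > end:
        return False
    return submission is None or submission < start
```

Passing `None` as the submission date covers the "Call for Participation" case, where no submission date is expected.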

11 Locations Finding Rules
- Tokenise lines into words
- Use gazetteer to capture Locations
- Results
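As an illustration of the tokenise-then-look-up idea, the sketch below matches word sequences against a gazetteer, preferring the longest match. The four-entry gazetteer is a stand-in for the real one, and the tokeniser and longest-match strategy are assumptions rather than the project's actual method.

```python
import re

# Tiny stand-in gazetteer; the project used a much larger external one.
GAZETTEER = {"sydney", "australia", "new york", "beijing"}
MAX_NAME_WORDS = max(len(name.split()) for name in GAZETTEER)


def tokenise(line):
    """Split a line into alphabetic words, dropping punctuation and digits."""
    return re.findall(r"[A-Za-z]+", line)


def find_locations(line):
    """Return gazetteer entries found in line, trying the longest names first."""
    words = tokenise(line)
    locations = []
    i = 0
    while i < len(words):
        for n in range(min(MAX_NAME_WORDS, len(words) - i), 0, -1):
            candidate = " ".join(words[i:i + n])
            if candidate.lower() in GAZETTEER:
                locations.append(candidate)
                i += n  # skip past the matched name
                break
        else:
            i += 1  # no name starts at this word
    return locations
```

Because the tokeniser discards punctuation, words separated by a comma can still combine into a multi-word match; a stricter tokeniser would avoid that.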

12 Title Finding Rules

13 Title Finding Rules (cont’d)
- Apply Machine Learning to classify title lines
- Refine title after classification
- Results
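The slides do not list the features used for title classification, so the following is only a sketch under stated assumptions: four hypothetical per-line features, written out in the sparse `<label> <index>:<value>` text format that SVMlight (named on the implementation slide) reads for training examples.

```python
def line_features(line, line_index):
    """Hypothetical features for one line; not the feature set from the project."""
    words = line.split()
    n_words = len(words)
    capitalised = sum(1 for w in words if w[0].isupper())
    return [
        float(line_index),                            # position in the document
        float(n_words),                               # line length in words
        capitalised / n_words if n_words else 0.0,    # share of capitalised words
        1.0 if line.isupper() else 0.0,               # line is all upper case
    ]


def to_svmlight(label, features):
    """Format one example as '<label> <index>:<value> ...' with indices from 1.

    Zero-valued features are omitted, as the sparse format allows.
    """
    pairs = " ".join(
        f"{i}:{v:g}" for i, v in enumerate(features, start=1) if v != 0)
    return f"{label:+d} {pairs}".rstrip()
```

A title line would be labelled `+1` and any other line `-1`; the refinement step that follows classification is not covered here.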

14 Current Performance

15 Agenda
- Project Introduction
- Email Event Information Extractor
- Conclusion

16 What I have Achieved
- Modules for Information Extraction
  - URL
  - Dates
  - Locations
  - Title
- Evaluation Framework

17 Limitations and Future Work
- Extension for refining titles
- Comparison for titles
- Comprehensive study on SVM tool and features used for machine learning

18 Implementation Details
- Python 2.6
- Gazetteer from http://world-gazetteer.com/
- Support Vector Machine http://svmlight.joachims.org/
- Natural Language Toolkit (NLTK) http://www.nltk.org/Home

19 Questions?

