Information Extraction from Event Announcements
  Student: Jianwei Lu (40942937)
  Supervisor: Robert Dale

Agenda
  Project Introduction
  Email Event Information Extractor
  Conclusion

Background
  What is Information Extraction (IE)?
    Automated extraction of key information
    Populate a database
  Why is it significant?
    Manage and search data efficiently
    Support other target applications
  For more info: [Cowie J and Wilks Y, n.d.] http://www.dcs.shef.ac.uk/~yorick/papers/infoext.pdf

The Outcomes
  Title
  URL

Sample Data
  Corpus 1: 30 documents
  Corpus 2: 100 documents
  Corpus 3: 1,500 documents

Agenda
  Project Introduction
  Email Event Information Extractor
  Conclusion

My System Architecture

Text Zoning

URL Finding Rules
  Use a pattern to capture URLs
  Approaches for finding an event URL
    1. Search the Summary zone
    2. Search the whole document
  Results

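The two-step URL search above could be sketched roughly as follows. This is an illustration only, written for Python 3: the regular expression and the names `URL_PATTERN`, `find_urls` and `find_event_url` are assumptions, since the slides do not show the actual pattern or code. It assumes the second approach is used as a fallback when the Summary zone contains no URL.

```python
import re

# Assumed pattern: the slides do not give the expression actually used.
URL_PATTERN = re.compile(r"(?:https?://|www\.)[^\s<>()\"']+", re.IGNORECASE)


def find_urls(text):
    """Return URLs in order of appearance, with trailing punctuation removed."""
    return [m.group(0).rstrip(".,;:!?") for m in URL_PATTERN.finditer(text)]


def find_event_url(summary_zone, document):
    """Prefer a URL from the Summary zone; fall back to the whole document."""
    for source in (summary_zone, document):
        urls = find_urls(source)
        if urls:
            return urls[0]
    return None
```

Taking the first match in each zone is the simplest possible choice; a real system would probably rank the candidates.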
Dates Finding Rules
  Use a pattern to capture dates
  Use clues to find the corresponding date
    1. submission-date < start-date <= end-date
    2. No submission-date in a “Call for Participation” announcement
    3. etc.
  Results

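A minimal sketch of the date pattern and the ordering clues, written for Python 3. It assumes dates appear as "12 July 2010"; the real system presumably handles more formats. The names `DATE_PATTERN`, `find_dates` and `is_consistent` are hypothetical, and reading rule 2 as "reject any submission date in a Call for Participation" is an interpretation of the slide.

```python
import re
from datetime import date

MONTHS = {name: number for number, name in enumerate(
    ["january", "february", "march", "april", "may", "june", "july",
     "august", "september", "october", "november", "december"], start=1)}

# Assumed format: day, month name, four-digit year (e.g. "12 July 2010").
DATE_PATTERN = re.compile(
    r"\b(\d{1,2})\s+(" + "|".join(MONTHS) + r")\s+(\d{4})\b", re.IGNORECASE)


def find_dates(text):
    """Return all well-formed dates in order of appearance."""
    found = []
    for day, month, year in DATE_PATTERN.findall(text):
        try:
            found.append(date(int(year), MONTHS[month.lower()], int(day)))
        except ValueError:
            pass  # skip impossible dates such as "31 February 2010"
    return found


def is_consistent(submission, start, end, call_for_participation=False):
    """Check a candidate role assignment against rules 1 and 2."""
    if call_for_participation and submission is not None:
        return False  # rule 2: no submission date expected
    if start > end:
        return False  # rule 1: start-date <= end-date
    if submission is not None and not submission < start:
        return False  # rule 1: submission-date < start-date
    return True
```

A check like `is_consistent` could be used to discard role assignments that violate the ordering before any other clues are applied.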
Locations Finding Rules
  Tokenise lines into words
  Use a gazetteer to capture locations
  Results

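The tokenise-then-look-up idea can be illustrated as below, in Python 3. The three-entry `GAZETTEER` is a stand-in for the real gazetteer data. The regex tokeniser and the longest-match strategy are assumptions, not necessarily what the project used (it lists NLTK among its tools).

```python
import re

# Tiny stand-in gazetteer; entries are lower-cased token tuples.
GAZETTEER = {("sydney",), ("new", "york"), ("australia",)}
MAX_LEN = max(len(entry) for entry in GAZETTEER)


def tokenise(line):
    """Split a line into alphabetic word tokens (assumed tokenisation)."""
    return re.findall(r"[A-Za-z]+", line)


def find_locations(line):
    """Return gazetteer matches, preferring the longest match at each position."""
    tokens = tokenise(line)
    found = []
    i = 0
    while i < len(tokens):
        for n in range(min(MAX_LEN, len(tokens) - i), 0, -1):
            candidate = tuple(t.lower() for t in tokens[i:i + n])
            if candidate in GAZETTEER:
                found.append(" ".join(tokens[i:i + n]))
                i += n
                break
        else:
            i += 1  # no entry starts at this token
    return found
```

Trying the longest candidate first lets multi-word names such as "New York" match as a single location.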
Title Finding Rules

Title Finding Rules (cont’d)
  Apply machine learning to classify title lines
  Refine the title after classification
  Results

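As an illustration of how lines might be prepared for an SVM classifier, the sketch below (Python 3) turns one line into a feature vector and writes it in the SVMlight input format, `<target> <feature>:<value> ...`. The four features and the names `line_features` and `to_svmlight` are purely hypothetical; the slides do not say which features were used.

```python
def line_features(line, position, total_lines):
    """Map a line to {feature_index: value}; all features are hypothetical."""
    words = line.split()
    n = len(words)
    keywords = ("workshop", "conference", "symposium")
    return {
        1: position / max(total_lines - 1, 1),   # relative position in document
        2: min(n, 20) / 20.0,                     # capped line length in words
        3: sum(w[:1].isupper() for w in words) / n if n else 0.0,  # capitalised ratio
        4: 1.0 if any(k in line.lower() for k in keywords) else 0.0,
    }


def to_svmlight(label, features):
    """Format one example as an SVMlight line: target, then sorted index:value pairs."""
    pairs = " ".join("%d:%g" % (index, value)
                     for index, value in sorted(features.items()) if value != 0)
    return ("%+d %s" % (label, pairs)).rstrip()
```

Zero-valued features are left out because the format is sparse. Labels of +1 and -1 mark title and non-title lines.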
Current Performance

Agenda
  Project Introduction
  Email Event Information Extractor
  Conclusion

What I Have Achieved
  Modules for Information Extraction
    URL
    Dates
    Locations
    Title
  Evaluation Framework

Limitations and Future Work
  Extension for refining titles
  Comparison for titles
  Comprehensive study of the SVM tool and the features used for machine learning

Implementation Details
  Python 2.6
  Gazetteer from http://world-gazetteer.com/
  Support Vector Machine: http://svmlight.joachims.org/
  Natural Language Toolkit (NLTK): http://www.nltk.org/Home

Questions?