DLSI Lexical Analysis Prof Brook Wu and Ph.D. student Xin Chen.


1 DLSI Lexical Analysis Prof Brook Wu and Ph.D. student Xin Chen

2 Lexical Analysis
Focus on processing “text”
Difficulties:
–Word sense ambiguities, e.g.: a regular “mouse” vs. a computer “mouse”
–Irregularities, e.g.: datum, data
–Part-of-speech tag ambiguities, e.g.: an “offer” (noun) vs. “Prof Bieber offers …” (verb)

3 Lexical Analysis in DLSI project
Purpose: generate link anchors for important concepts in returned documents.
Work involved:
–Find glossaries/thesauri on the web or contact DLSI partners for information.
–Organize them into a master file.
–Find glossary/thesaurus terms in text using lexical analysis techniques, including tokenization, part-of-speech tagging, parsing, and matching.
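The tokenization and matching steps named above can be sketched as follows. This is a minimal illustration, not the DLSI implementation: the function names `tokenize` and `find_anchors`, the regular expression, and the sample glossary are all assumptions, and part-of-speech tagging and parsing are left out entirely. It finds the longest glossary term starting at each token and returns character spans that could serve as link anchors.

```python
import re

def tokenize(text):
    """Split text into lowercase word tokens with character offsets.

    Hyphenated words such as "part-of-speech" stay as one token.
    """
    pattern = r"[A-Za-z0-9]+(?:[-'][A-Za-z0-9]+)*"
    return [(m.group().lower(), m.start(), m.end())
            for m in re.finditer(pattern, text)]

def find_anchors(text, glossary):
    """Return (start, end, term) spans for glossary terms found in text.

    Hypothetical helper: prefers the longest term starting at each token.
    """
    # Index each glossary term by its sequence of tokens.
    terms = {tuple(tok for tok, _, _ in tokenize(term)): term
             for term in glossary}
    max_len = max((len(key) for key in terms), default=0)
    tokens = tokenize(text)
    anchors = []
    i = 0
    while i < len(tokens):
        matched = False
        # Try the longest candidate first, then shorter ones.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            key = tuple(tok for tok, _, _ in tokens[i:i + n])
            if key in terms:
                anchors.append((tokens[i][1], tokens[i + n - 1][2], terms[key]))
                i += n
                matched = True
                break
        if not matched:
            i += 1
    return anchors

# Illustrative glossary and sentence (made up for this sketch).
glossary = ["lexical analysis", "digital library",
            "part-of-speech tagging", "library"]
text = "Part-of-speech tagging helps lexical analysis of digital library text."
anchors = find_anchors(text, glossary)
for start, end, term in anchors:
    print(f"{term!r} at characters {start}-{end}: {text[start:end]!r}")
```

Exact token matching like this does not address the difficulties listed on the previous slide: it treats “datum” and “data” as unrelated tokens and cannot tell the noun “offer” from the verb “offers”, which is where tagging and parsing would come in.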

4

5 Qualifications and Supervision
You should participate because text processing and lexical analysis are becoming popular: text holds very rich information, and industry will want people who know how to process documents effectively.
Qualifications:
–Proficiency in Java or C++
Supervision:
–A team of up to 3 students will be supervised by Prof Wu, but will mainly be led by Xin Chen, a Ph.D. candidate in IS.

