JavaConLib GSLT: Java Development for HLT Leif Grönqvist – 11. June 2002 10:30.


1 javaConLib GSLT: Java Development for HLT Leif Grönqvist – leifg@ling.gu.se 11. June 2002 10:30

2 What have I done?
- I have implemented a library for various kinds of context-based word sense disambiguation
- From the beginning I have had a test method that tries to provoke errors in each part of the implementation
- I have written a command-line application that uses the library and implements Yarowsky 1995
- I have tried to write final-quality code from the start

3 What is left to do?
- One very simple test implementation
- Tutorial-based documentation
- Fix the things Lars pointed out in the last iteration
- Write an Ant build script
- The final report

4 Project Background
- There are several methods for word sense disambiguation based on context. For example, Yarowsky's unsupervised algorithm from 1995 is based on two general observations:
  - One sense per collocation: nearby words provide strong and consistent clues to the sense of a target word
  - One sense per discourse: the sense of a target word is highly consistent within any given document
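The one-sense-per-discourse observation can be illustrated with a small sketch. It assumes the occurrences of a target word in one document have already been given tentative sense labels (null where undecided) and simply picks the most frequent label. The class and method names are hypothetical and not part of javaConLib, and a plain majority vote is a simplification, not necessarily how Yarowsky's algorithm applies the constraint.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration class, not part of javaConLib.
class OneSensePerDiscourseSketch {

    // Returns the most frequent non-null label among the tentative sense
    // labels for one target word in one document, or null if none is labelled.
    // On a tie, the label that reached the winning count first is returned.
    static String majoritySense(List<String> labels) {
        Map<String, Integer> counts = new HashMap<>();
        String best = null;
        int bestCount = 0;
        for (String label : labels) {
            if (label == null) {
                continue; // undecided occurrence, carries no vote
            }
            int c = counts.merge(label, 1, Integer::sum);
            if (c > bestCount) {
                bestCount = c;
                best = label;
            }
        }
        return best;
    }
}
```

The returned label could then be assigned to the undecided occurrences in the same document.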


7 A much simpler supervised approach
- Start with a disambiguated set of occurrences
- Count all word types within a ±5-word context for each sense
- To disambiguate a new occurrence, compare its context to the distribution of each possible sense
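The supervised approach above can be sketched in plain Java. This is a minimal illustration and not the javaConLib implementation: the class name and methods are hypothetical, and the comparison step is assumed to be a simple sum of training counts for the context words, since the slide does not say how contexts are compared to the distributions.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration class, not part of javaConLib.
class ContextCountSketch {
    // sense label -> (context word -> count seen in training)
    private final Map<String, Map<String, Integer>> counts = new HashMap<>();
    private final int window;

    ContextCountSketch(int window) {
        this.window = window;
    }

    // Words within +-window positions of index i, excluding the target itself.
    List<String> context(List<String> tokens, int i) {
        List<String> ctx = new ArrayList<>();
        int from = Math.max(0, i - window);
        int to = Math.min(tokens.size() - 1, i + window);
        for (int j = from; j <= to; j++) {
            if (j != i) {
                ctx.add(tokens.get(j));
            }
        }
        return ctx;
    }

    // Add the context of one disambiguated occurrence to the counts of its sense.
    void train(List<String> tokens, int i, String sense) {
        Map<String, Integer> dist = counts.computeIfAbsent(sense, k -> new HashMap<>());
        for (String w : context(tokens, i)) {
            dist.merge(w, 1, Integer::sum);
        }
    }

    // Assumed comparison: score each sense by summing the training counts of
    // the context words and return the highest-scoring sense (null if untrained).
    String disambiguate(List<String> tokens, int i) {
        String best = null;
        int bestScore = -1;
        for (Map.Entry<String, Map<String, Integer>> e : counts.entrySet()) {
            int score = 0;
            for (String w : context(tokens, i)) {
                score += e.getValue().getOrDefault(w, 0);
            }
            if (score > bestScore) {
                bestScore = score;
                best = e.getKey();
            }
        }
        return best;
    }
}
```

A real implementation would probably normalise the counts and handle ties explicitly; both are left out here to keep the sketch short.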

8 javaConLib
- These two algorithms have a lot in common
- There are many more similar algorithms
- javaConLib includes classes that greatly simplify implementation and tuning
- Higher-order, intuitive methods: the main class will look more like an algorithm description

9 Typical parts of a main class
- Yarowsky y = new Yarowsky(5);
- Corpus trainCorp = new Corpus("train.txt");
- SenseSet s1 = new SenseSet("äger|ägde", "Abs", y.posl1);
- DecisionList decList = y.train95(s1, s2, "rum", trainCorp);
- ContextList testCont = y.test95(decList, testCorpus, s1, s2, word);
- print(testCont.toString());

10 The Classes
- Context: An array of words with a specific size and the main word at position 0
- ContextList: A set of Contexts around a certain word type, extracted from a corpus
- Corpus: Basically a vector containing words read from a file
- Decision: Contains a word, a position, and a score saying how good that word and position are for deciding the sense of the main word in a context
- DecisionList: A decision list like the one used in Yarowsky's algorithm from 1995
- FreqList: A frequency list for strings in a corpus
- Positions: Holds a list of positions (integers) relative to the center word when working with words and contexts
- SenseSet: A set of the necessary components for each sense when using the Yarowsky 1995 algorithm for word sense disambiguation
- Yarowsky: A class with some structures and classes useful when implementing Yarowsky's disambiguation algorithm from 1995, and similar algorithms
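To make the Decision and DecisionList descriptions concrete, here is a minimal sketch of how such a pair could work. It is not the javaConLib code: the names, fields, and methods are assumptions for illustration. In particular, each decision carries its own sense label here, and a context is modelled as a map from relative position to word rather than as the library's Context class.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

// Hypothetical illustration class, not part of javaConLib.
class DecisionListSketch {

    // One rule: "if `word` occurs at relative `position`, choose `sense`".
    static final class Decision {
        final String word;
        final int position;   // offset relative to the main word, e.g. -1 or +1
        final String sense;   // assumed field; the slide lists only word, position, score
        final double score;   // higher means more reliable

        Decision(String word, int position, String sense, double score) {
            this.word = word;
            this.position = position;
            this.sense = sense;
            this.score = score;
        }
    }

    private final List<Decision> decisions = new ArrayList<>();

    // Insert a rule and keep the list ordered by descending score.
    void add(String word, int position, String sense, double score) {
        decisions.add(new Decision(word, position, sense, score));
        decisions.sort(Comparator.comparingDouble((Decision d) -> d.score).reversed());
    }

    // Return the sense of the highest-scoring rule that matches the context,
    // or null if no rule matches. The context maps relative position -> word.
    String classify(Map<Integer, String> context) {
        for (Decision d : decisions) {
            if (d.word.equals(context.get(d.position))) {
                return d.sense;
            }
        }
        return null;
    }
}
```

Because only the first matching rule decides, the ordering by score is what gives the strongest clue priority over weaker ones.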

11 We are done
And probably out of time

