28/02/02-01/03/02 4th Meeting Athens
ENERC v.2

Updates
Change in early tokenisation: identification of words is now a two-stage process.
Updated lexical resources based on the new version of LexiconEn.xml.
Current version does not include the statistical classifier or the POS tagger.
Non-GUI version of the NERC-based Demarcator added at the end of the pipeline.

ENERC Pipeline
egrep -v '^<\!DOCTYPE' \
  | $EN/SCRIPTS/entsout.pl \
  | $bin/fsgmatch -q ".*" $EN/GRAM/char/pretok.gr \
  | $EN/SCRIPTS/openangle.pl \
  | $bin/xmlperl2 $EN/SCRIPTS/findels-s.rule \
  | $bin/xmlperl2 $EN/SCRIPTS/nobold.rule \
  | $bin/fsgmatch -q ".*/.[PROC='yes']" $EN/GRAM/xml/tok.gr \
  | $bin/xmlperl $EN/SCRIPTS/dels.rule \
  | $bin/fsgmatch -q ".*/.[PROC='yes']" $EN/GRAM/xml/numbers.gr \
  | $bin/fsgmatch -q ".*/.[PROC='yes']" $EN/GRAM/xml/numex-sf.gr \
  | $bin/fsgmatch -q ".*/.[PROC='yes']" $EN/GRAM/xml/timex.gr \
  | $bin/fsgmatch -q ".*/.[PROC='yes']" $EN/GRAM/xml/prodex-ll.gr \
  | $bin/fsgmatch -q ".*/.[PROC='yes']" $EN/GRAM/xml/prodex-sf.gr \
  | $bin/fsgmatch -q ".*/.[PROC='yes']" $EN/GRAM/xml/attribex.gr \
  | $bin/xmlperl $EN/SCRIPTS/delete-tags.rule \
  | $EN/SCRIPTS/tidyup-dem.pl \
  > $EN/ddiri/current.xhtml

wish8.4 $EN/Demarc/CROSSMARC_Demarcation_Tool.tcl -source $EN/ddiri -destination $EN/ddiro -gui 0 -language english > /dev/null

cat $EN/ddiro/current.xhtml \
  | $bin/xmlperl $EN/SCRIPTS/del-npno.rule

Name Matching
LexiconEn.lex is derived from LexiconEn.xml: each synonym of a concept becomes a lexical entry.
1st stage of name matching performs lexical look-up to find matches. (Case-insensitive; entities such as ® are ignored.)
2nd stage of name matching uses a fuzzy matching program. This uses a list of target strings also derived from the synonyms in LexiconEn.xml.
Name matching operates on entities and encodes the ontology ID as the value of an attribute.
It can be performed after NERC, Demarcation or FE.
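
The two stages above can be sketched roughly as follows. This is only an illustration under stated assumptions: the in-memory `LEXICON` dictionary, the function names and the `cutoff` value are hypothetical, and Python's `difflib` stands in for the project's own fuzzy matching program, whose algorithm is not described here. The sample entries mirror the LexiconEn.lex excerpt in these slides.

```python
import difflib
import re

# Hypothetical in-memory lexicon: synonym -> (entity type, ontology ID).
LEXICON = {
    "microsoft office xp": ("SOFT", "OV-d0e594"),
    "windows xp": ("OS", "OV-d0e522"),
    "win98": ("OS", "OV-d0e521"),
    "win 98": ("OS", "OV-d0e521"),
}

# Named or numeric character entities and the literal (R)/TM symbols.
ENTITY_RE = re.compile(r"&[A-Za-z]+;|&#\d+;|[®™]")


def normalise_name(text):
    """Lower-case, drop entities such as &reg;, collapse whitespace."""
    text = ENTITY_RE.sub(" ", text)
    return " ".join(text.lower().split())


def match_name(text, cutoff=0.8):
    """Return (type, ontology ID, stage) or None if nothing matches."""
    key = normalise_name(text)
    # Stage 1: exact lexical look-up on the normalised string.
    if key in LEXICON:
        return LEXICON[key] + ("lookup",)
    # Stage 2: fuzzy match against the list of target strings
    # (difflib is an assumption, not the project's matcher).
    close = difflib.get_close_matches(key, list(LEXICON), n=1, cutoff=cutoff)
    if close:
        return LEXICON[close[0]] + ("fuzzy",)
    return None
```

For example, `match_name("Microsoft&reg; Office XP")` is resolved by the look-up stage, while a misspelling such as `"Micorsoft Office XP"` falls through to the fuzzy stage. The returned ontology ID is what would be written as an attribute value on the entity.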

Normalisation
We use an xmlperl program to match particular facts containing certain NUMEXes, e.g. 1.7 GHz.
The Perl action in the rule performs normalisation using a list of conversion rates.
The normalised version appears as an attribute value on the NUMEX, which can then be inherited by the fact.
Normalisation could be performed before FE, but the fact type is useful in determining the conversion.
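
A minimal sketch of the conversion step, written in Python rather than as an xmlperl rule. The fact type names, target units and rates in `CONVERSIONS` are assumptions made for illustration, not the project's actual conversion list; the point is only that the look-up is keyed on the fact type as well as the unit.

```python
from decimal import Decimal, InvalidOperation

# Hypothetical conversion rates: (fact type, unit) -> (target unit, multiplier).
CONVERSIONS = {
    ("SPEED", "ghz"): ("MHz", Decimal("1000")),
    ("SPEED", "mhz"): ("MHz", Decimal("1")),
    ("CAPACITY", "gb"): ("MB", Decimal("1024")),
    ("CAPACITY", "mb"): ("MB", Decimal("1")),
}


def normalise_numex(fact_type, numex):
    """Return the normalised string for a '<value> <unit>' NUMEX, or None."""
    parts = numex.split()
    if len(parts) != 2:
        return None
    value, unit = parts
    rule = CONVERSIONS.get((fact_type, unit.lower()))
    if rule is None:
        return None  # no conversion known for this fact type and unit
    target_unit, factor = rule
    try:
        amount = Decimal(value) * factor
    except InvalidOperation:
        return None  # the value part was not a number
    # normalize() drops trailing zeros; the 'f' format avoids exponent notation.
    return f"{amount.normalize():f} {target_unit}"
```

Under these assumed rates, `normalise_numex("SPEED", "1.7 GHz")` yields `"1700 MHz"`, which would then be stored as the attribute value on the NUMEX.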

Evaluation Results: just NERC
Columns: Precision, Recall, F-measure
Rows: MANUF, MODEL, SOFT_OS, PROCESSOR, SPEED, CAPACITY, LENGTH, RESOLUTION, MONEY, PERCENT, WEIGHT, DATE, DURATION, TIME

Evaluation Results: NERC+Demarcator
Columns: Precision, Recall, F-measure
Rows: MANUF, MODEL, SOFT_OS, PROCESSOR, SPEED, CAPACITY, LENGTH, RESOLUTION, MONEY, PERCENT, WEIGHT, DATE, DURATION, TIME (0.?)
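
For reference, the three measures reported per entity type are conventionally computed as in the sketch below. It assumes the standard definitions and a balanced F-measure (`beta = 1`); the slides do not state which weighting was used, and the function name and the counts in the usage note are purely illustrative, not project results.

```python
def precision_recall_f(correct, predicted, gold, beta=1.0):
    """Compute precision, recall and F-measure from entity counts.

    correct   -- system entities that match the gold annotation
    predicted -- all entities produced by the system
    gold      -- all entities in the gold annotation
    """
    precision = correct / predicted if predicted else 0.0
    recall = correct / gold if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    b2 = beta * beta
    f_measure = (1 + b2) * precision * recall / (b2 * precision + recall)
    return precision, recall, f_measure
```

With made-up counts of 8 correct entities out of 10 predicted and 16 in the gold standard, this gives precision 0.8, recall 0.5 and an F-measure of roughly 0.615.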

LexiconEn.lex
Microsoft Office
Microsoft Office XP :: SOFT OV-d0e594
Windows XP :: OS OV-d0e522
W98 OS OV-d0e521
W 98 :: OS OV-d0e521
Win98 OS OV-d0e521
Win 98 :: OS OV-d0e521
Microsoft SOFT OV-d0e594
Office SOFT OV-d0e594
XP SOFT OV-d0e OS OV-d0e521
Win OS OV-d0e521
W OS OV-d0e521
R