

1 ENERC v.2
4th Meeting, Athens, 28/02/02-01/03/02

2 Updates
Change in early tokenisation: identification of words is now a two-stage process.
Lexical resources have been updated for the new version of LexiconEn.xml.
The current version does not include a statistical classifier or POS tagger.
A non-GUI version of the NERC-based Demarcator has been added at the end of the pipeline.

3 ENERC Pipeline

egrep -v '^<\!DOCTYPE' \
  | $EN/SCRIPTS/entsout.pl \
  | $bin/fsgmatch -q ".*" $EN/GRAM/char/pretok.gr \
  | $EN/SCRIPTS/openangle.pl \
  | $bin/xmlperl2 $EN/SCRIPTS/findels-s.rule \
  | $bin/xmlperl2 $EN/SCRIPTS/nobold.rule \
  | $bin/fsgmatch -q ".*/.[PROC='yes']" $EN/GRAM/xml/tok.gr \
  | $bin/xmlperl $EN/SCRIPTS/dels.rule \
  | $bin/fsgmatch -q ".*/.[PROC='yes']" $EN/GRAM/xml/numbers.gr \
  | $bin/fsgmatch -q ".*/.[PROC='yes']" $EN/GRAM/xml/numex-sf.gr \
  | $bin/fsgmatch -q ".*/.[PROC='yes']" $EN/GRAM/xml/timex.gr \
  | $bin/fsgmatch -q ".*/.[PROC='yes']" $EN/GRAM/xml/prodex-ll.gr \
  | $bin/fsgmatch -q ".*/.[PROC='yes']" $EN/GRAM/xml/prodex-sf.gr \
  | $bin/fsgmatch -q ".*/.[PROC='yes']" $EN/GRAM/xml/attribex.gr \
  | $bin/xmlperl $EN/SCRIPTS/delete-tags.rule \
  | $EN/SCRIPTS/tidyup-dem.pl \
  > $EN/ddiri/current.xhtml

wish8.4 $EN/Demarc/CROSSMARC_Demarcation_Tool.tcl -source $EN/ddiri -destination $EN/ddiro -gui 0 -language english > /dev/null

cat $EN/ddiro/current.xhtml \
  | $bin/xmlperl $EN/SCRIPTS/del-npno.rule

4 Name Matching
LexiconEn.lex is derived from LexiconEn.xml: each synonym of a concept becomes a lexical entry.
The 1st stage of name matching performs lexical look-up to find matches (case-insensitive, with entities such as ® ignored).
The 2nd stage of name matching uses a fuzzy matching program, which works from a list of target strings also derived from the synonyms in LexiconEn.xml.
Name matching operates on entities and encodes the ontology ID as the value of an attribute.
It can be performed after NERC, Demarcation or FE.
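The two stages described on this slide can be illustrated with a minimal Python sketch. It is not the ENERC implementation: the slides do not name the fuzzy matching program, so Python's standard difflib stands in for it, and the function names, the 0.8 cutoff and the treatment of ® are assumptions made for illustration. The lexicon entries are copied from the LexiconEn.lex slide.

```python
import difflib
import re

# Entries copied from the LexiconEn.lex slide: synonym -> (type, ontology ID).
LEXICON = {
    "Microsoft Office XP": ("SOFT", "OV-d0e594"),
    "Windows XP": ("OS", "OV-d0e522"),
    "W98": ("OS", "OV-d0e521"),
    "Win98": ("OS", "OV-d0e521"),
    "Win 98": ("OS", "OV-d0e521"),
}


def normalise_key(text):
    """Lower-case the text, drop the (R) symbol/entity, collapse whitespace."""
    text = text.replace("®", "").replace("&reg;", "")
    return re.sub(r"\s+", " ", text).strip().lower()


# Stage 1 table: normalised synonym -> lexicon entry.
LOOKUP = {normalise_key(name): entry for name, entry in LEXICON.items()}


def match_name(entity_text, cutoff=0.8):
    """Return (type, ontology ID) for an entity string, or None.

    Stage 1 is an exact, case-insensitive look-up. Stage 2 falls back to
    difflib, an assumed stand-in for the unnamed fuzzy matching program.
    """
    key = normalise_key(entity_text)
    if key in LOOKUP:
        return LOOKUP[key]
    close = difflib.get_close_matches(key, list(LOOKUP), n=1, cutoff=cutoff)
    return LOOKUP[close[0]] if close else None
```

Under these assumptions, "WINDOWS® XP" resolves in the first stage, "Windows-XP" resolves only through the fuzzy fallback, and an unrelated string such as "Linux" returns no match. In the real system the resulting ontology ID would be written as an attribute value on the entity.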

5 Normalisation
We use an xmlperl program to match particular facts containing certain NUMEXes, e.g. 1.7 GHz.
The Perl action in the rule performs normalisation using a list of conversion rates.
The normalised version appears as an attribute value on the NUMEX, which can then be inherited by the fact.
Normalisation could be performed before FE, but the fact type is useful in determining the conversion.
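As a rough illustration of normalisation with a list of conversion rates, here is a small Python sketch. The real system does this in a Perl action inside an xmlperl rule; the conversion table, the choice of MHz and MB as base units, and the function name are all hypothetical, and the sketch ignores the fact type that the slide says helps select the conversion.

```python
import re

# Hypothetical conversion rates: unit -> (assumed base unit, multiplier).
CONVERSIONS = {
    "mhz": ("MHz", 1.0),
    "ghz": ("MHz", 1000.0),
    "mb": ("MB", 1.0),
    "gb": ("MB", 1024.0),
}

# A number followed by a unit, e.g. "1.7 GHz" or "512MB".
NUMEX_PATTERN = re.compile(r"^\s*(\d+(?:\.\d+)?)\s*([A-Za-z]+)\s*$")


def normalise_numex(text):
    """Return (value, base unit) for a NUMEX string, or None if not covered."""
    match = NUMEX_PATTERN.match(text)
    if match is None:
        return None
    value = float(match.group(1))
    unit = match.group(2).lower()
    if unit not in CONVERSIONS:
        return None
    base_unit, factor = CONVERSIONS[unit]
    # Round to suppress floating-point noise from the multiplication.
    return round(value * factor, 6), base_unit
```

With this assumed table, the slide's example "1.7 GHz" normalises to 1700.0 MHz. In the pipeline, that value would be stored as an attribute on the NUMEX element and inherited by the enclosing fact.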

6 Evaluation Results: just NERC

            Precision  Recall  F-measure
MANUF       0.36       0.95    0.52
MODEL       0.61       0.82    0.70
SOFT_OS     0.74       0.79    0.76
PROCESSOR   0.85       0.98    0.91
SPEED       0.80       0.77    0.78
CAPACITY    0.88       0.93    0.90
LENGTH      0.94       0.77    0.85
RESOLUTION  0.93       1.00    0.96
MONEY       0.46       0.97    0.62
PERCENT     0.63       0.71    0.67
WEIGHT      0.97       0.95    0.96
DATE        0.30       0.88    0.45
DURATION    0.72       0.74    0.73
TIME        0.33       0.80    0.47
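The slides do not state how the F-measure is computed. The sketch below assumes the balanced F-measure (the harmonic mean of precision and recall), which agrees with the rows checked here to within rounding. The function and variable names are illustrative only.

```python
def f_measure(precision, recall):
    """Balanced F-measure: harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F-measure) for a few rows of the NERC table.
ROWS = {
    "MANUF": (0.36, 0.95, 0.52),
    "MODEL": (0.61, 0.82, 0.70),
    "PROCESSOR": (0.85, 0.98, 0.91),
    "DATE": (0.30, 0.88, 0.45),
}

# The reported precision and recall are already rounded to two places,
# so allow a tolerance of 0.01 when comparing with the reported F-measure.
CONSISTENT = all(
    abs(f_measure(p, r) - reported) <= 0.01 for p, r, reported in ROWS.values()
)
```

The MANUF row shows why both figures matter: a recall of 0.95 with a precision of 0.36 still gives an F-measure of only about 0.52.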

7 Evaluation Results: NERC+Demarcator

            Precision  Recall  F-measure
MANUF       0.33       0.72    0.45
MODEL       0.54       0.59    0.56
SOFT_OS     0.76       0.65    0.70
PROCESSOR   0.81       0.59    0.68
SPEED       0.80       0.53    0.64
CAPACITY    0.83       0.57    0.68
LENGTH      0.92       0.53    0.67
RESOLUTION  0.94       0.90    0.92
MONEY       0.32       0.52    0.40
PERCENT     0.80       0.57    0.67
WEIGHT      0.98       0.79    0.87
DATE        0.29       0.50    0.37
DURATION    0.71       0.55    0.62
TIME        0.?


9 Microsoft Office LexiconEn.lex

Microsoft Office XP :: SOFT OV-d0e594
Windows XP :: OS OV-d0e522
W98 OS OV-d0e521
W 98 :: OS OV-d0e521
Win98 OS OV-d0e521
Win 98 :: OS OV-d0e521
Microsoft SOFT OV-d0e594
Office SOFT OV-d0e594
XP SOFT OV-d0e594
98 OS OV-d0e521
Win OS OV-d0e521
W OS OV-d0e521
R

