EVALITA 2007 The Named Entity Recognition Task Manuela Speranza, FBK-irst.

1 EVALITA 2007 The Named Entity Recognition Task Manuela Speranza, FBK-irst

2 Outline
Named Entity Recognition at EVALITA 2007
– Introduction to the task
– Participants
Evaluation
– Dataset
– Metrics
Results
– Ranking
– Discussion
Conclusion
EVALITA 2007 Workshop, Rome, September 10, 2007

3 Introduction to the NER Task
Task: recognize Named Entities in Italian newspaper articles
Four types of Named Entities:
– Geo-Political Entities (GPE), e.g. Italy
– Location Entities (LOC), e.g. Tevere
– Organization Entities (ORG), e.g. FIAT
– Person Entities (PER), e.g. Napolitano
Based on the ACE Entity Recognition and Normalization Task
Adaptations from ACE:
– limit the task to the recognition of Named Entities
– adapt it to Italian

4 Participants
The NER Task had six participants:
– FBK-irst, Trento (FBKirst_Zanoli_NER)
– LDC, University of Pennsylvania (LDC_Walker_NER)
– University of Alicante (UniAli_Kozareva_NER)
– University of Dortmund (UniDort_Jungermann_NER)
– University of Duisburg-Essen (UniDuE_Roessler_NER)
– Yahoo, Barcelona (Yahoo_Ciaramita_NER)
Only one participant was an Italian institution; two came from Spain, two from Germany, and one from the USA

5 Evaluation Dataset: I-CAB (i)
525 news stories from the Italian local newspaper L'Adige
4 days: 7-8 September and [dates missing] October 2004
5 categories:
– News Stories
– Cultural News
– Economic News
– Sports News
– Local News
Two sections:
– training (335 news stories)
– test (190 news stories)
Number of words = [value missing]
Average number of words per file = 348

6 Evaluation Dataset: I-CAB (ii)
                  Training      Test     Total
# News stories         335       190       525
# Sentences          7,227     4,002    11,229
# Words            113,634    68,930   182,564
# Tokens           132,587    79,889   212,476
# GPE                1,740     1,073     2,813
# LOC            [missing] [missing] [missing]
# ORG                2,518     1,140     3,658
# PER                2,936     1,641     4,577
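The figures on the two dataset slides can be cross-checked with a few lines of arithmetic. The sketch below is only an illustration: the counts are copied from the I-CAB table and the 525 news stories from the previous slide, while the names `rows`, `NEWS_STORIES`, and `inconsistent_rows` are made up for this example. It checks that Training + Test equals Total in every row that survives, and that the total word count matches the stated average of 348 words per file.

```python
# Counts copied from the I-CAB table: (training, test, total).
# The LOC row is left out because its values are missing from the transcript.
rows = {
    "sentences": (7_227, 4_002, 11_229),
    "words": (113_634, 68_930, 182_564),
    "tokens": (132_587, 79_889, 212_476),
    "GPE": (1_740, 1_073, 2_813),
    "ORG": (2_518, 1_140, 3_658),
    "PER": (2_936, 1_641, 4_577),
}
NEWS_STORIES = 525  # total number of news stories reported for I-CAB


def inconsistent_rows(table):
    """Return the names of rows where training + test does not equal total."""
    return [
        name
        for name, (train, test, total) in table.items()
        if train + test != total
    ]


# Total words divided by the number of files.
average_words_per_file = rows["words"][2] / NEWS_STORIES

if __name__ == "__main__":
    print(inconsistent_rows(rows))        # []
    print(round(average_words_per_file))  # 348
```

Every surviving row adds up, and 182,564 / 525 is about 347.7, which rounds to the 348 given on the slide.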

7 Evaluation of Results
Scorer: CoNLL Shared Task 2002
Metrics: Precision (Pr.), Recall (Re.), and F-Measure (FB1)
Official ranking is based on FB1
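The three metrics can be illustrated with a minimal sketch of entity-level scoring. This is not the official CoNLL scorer script. It assumes IOB2 tags (B-PER, I-PER, O, ...) and counts a predicted entity as correct only when both its type and its boundaries match the gold annotation. The function names `extract_spans` and `score` and the toy tag sequences are hypothetical.

```python
from typing import List, Set, Tuple

# (entity type, start token index, end token index exclusive)
Span = Tuple[str, int, int]


def extract_spans(tags: List[str]) -> Set[Span]:
    """Collect entity spans from a sequence of IOB2 tags."""
    spans: Set[Span] = set()
    start, etype = None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel closes a trailing entity
        continues = tag.startswith("I-") and etype == tag[2:]
        if continues:
            continue
        if etype is not None:
            spans.add((etype, start, i))
            start, etype = None, None
        if tag.startswith("B-") or tag.startswith("I-"):
            # Assumption: a stray I- tag without a matching B- opens a new entity.
            start, etype = i, tag[2:]
    return spans


def score(gold: List[str], pred: List[str]) -> Tuple[float, float, float]:
    """Return (precision, recall, FB1) over exactly matching entities."""
    gold_spans = extract_spans(gold)
    pred_spans = extract_spans(pred)
    correct = len(gold_spans & pred_spans)
    precision = correct / len(pred_spans) if pred_spans else 0.0
    recall = correct / len(gold_spans) if gold_spans else 0.0
    total = precision + recall
    fb1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, fb1


if __name__ == "__main__":
    gold = ["B-PER", "I-PER", "O", "B-GPE", "O", "B-ORG"]
    pred = ["B-PER", "I-PER", "O", "O", "O", "B-ORG"]
    precision, recall, fb1 = score(gold, pred)
    print(f"Pr. {precision:.2%}  Re. {recall:.2%}  FB1 {fb1:.2%}")
```

In the toy example the system finds two of the three gold entities and makes no wrong predictions, so precision is 1, recall is 2/3, and FB1 (their harmonic mean) is 0.8.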

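8 Official Ranking
Columns on the slide: Rank, Participant, Over. FB1, Over. Prec., Over. Rec., FB1 per type (GPE, LOC, ORG, PER)
Ranks, run numbers, and all but one percentage per row are missing; the surviving value appears to be Over. Rec., the last percentage column. Rows are listed in the order shown on the slide.

Participant             Over. Rec.
FBKirst_Zanoli_r        80.91%
FBKirst_Zanoli_r        79.65%
UniDuE_Roessler_r       72.94%
UniDuE_Roessler_r       70.62%
Yahoo_Ciaramita_r       66.85%
Yahoo_Ciaramita_r       66.00%
UniDort_Jungermann_r    65.12%
UniDort_Jungermann_r    64.91%
UniAli_Kozareva         70.95%
LDC_Walker_r            50.88%
LDC_Walker_r            50.70%
BASELINE                39.86%
BASELINE -u             33.95%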

10-13 Discussion
Slides 10 to 13 each repeat the ranking table from slide 8.

14 Conclusions
Good interest from the community:
– 14 initial registrations
– 6 participants (though only one Italian institution)
Relatively high rate of abandonment (8/14, about 57%)
Good performance:
– best system at CoNLL: 88.8% for English, 72.4% for German
– best system at EVALITA: 82.1%

15 Thanks to all who participated

16 References
ACE.
CoNLL.
L'Adige.
Linguistic Data Consortium (LDC). Automatic Content Extraction English Annotation Guidelines for Entities, version 5.6.1. http://projects.ldc.upenn.edu/ace/docs/English-Entities-Guidelines_v5.6.1.pdf
Magnini, Cappelli, Pianta, Speranza, Bartalesi Lenzi, Sprugnoli, Romano, Girardi, Negri. Annotazione di contenuti concettuali in un corpus italiano: I-CAB. In Proceedings of SILFI 2006, X Congresso Internazionale della Società di Linguistica e Filologia Italiana, Firenze, giugno.
Magnini, Pianta, Speranza, Bartalesi Lenzi, Sprugnoli. Italian Content Annotation Bank (I-CAB): Named Entities. Technical report, ITC-irst, ONTOTEXT.

