EVALITA 2007 The Named Entity Recognition Task Manuela Speranza, FBK-irst.

1 EVALITA 2007 The Named Entity Recognition Task Manuela Speranza, FBK-irst

2 Outline
Named Entity Recognition at EVALITA 2007
– Introduction to the task
– Participants
Evaluation
– Dataset
– Metrics
Results
– Ranking
– Discussion
Conclusion
EVALITA 2007 Workshop, Rome, September 10, 2007

3 Introduction to the NER Task
Task: recognize Named Entities in Italian newspaper articles
Four types of Named Entities:
– Geo-Political Entities (GPE), e.g. Italy
– Location Entities (LOC), e.g. Tevere
– Organization Entities (ORG), e.g. FIAT
– Person Entities (PER), e.g. Napolitano
Based on the ACE Entity Recognition and Normalization Task
Adaptations from ACE:
– limit the task to the recognition of Named Entities
– adapt it to Italian

4 Participants
The NER Task had six participants:
– FBK-irst, Trento (FBKirst_Zanoli_NER)
– LDC, University of Pennsylvania (LDC_Walker_NER)
– University of Alicante (UniAli_Kozareva_NER)
– University of Dortmund (UniDort_Jungermann_NER)
– University of Duisburg-Essen (UniDuE_Roessler_NER)
– Yahoo, Barcelona (Yahoo_Ciaramita_NER)
Only one Italian institution took part, against two from Spain and two from Germany
One participant came from the USA

5 Evaluation Dataset: I-CAB (i)
525 news stories from the Italian local newspaper L'Adige
4 days:
– 7-8 September 2004
– 7-8 October 2004
5 categories:
– News Stories
– Cultural News
– Economic News
– Sports News
– Local News
Two sections:
– training (335 news stories)
– test (190 news stories)
Number of words = 182,500
Average number of words per file = 348

6 Evaluation Dataset: I-CAB (ii)
                  Training     Test    Total
# News stories         335      190      525
# Sentences          7,227    4,002   11,229
# Words            113,634   68,930  182,564
# Tokens           132,587   79,889  212,476
# GPE                1,740    1,073    2,813
# LOC                  240      122      362
# ORG                2,518    1,140    3,658
# PER                2,936    1,641    4,577

7 Evaluation of Results
Scorer: CoNLL Shared Task 2002
Metrics: Precision (Pr.), Recall (Re.), and F-Measure (FB1)
Official ranking is based on FB1
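As an illustration of what entity-level scoring in the CoNLL style involves, here is a minimal Python sketch. It is not the official CoNLL scorer script. It assumes, without confirmation from the slides, that system output and gold annotation are IOB2 tag sequences (B-PER, I-PER, O, ...), and that an entity counts as correct only when both its span and its type match exactly. The function names and the toy sentence are hypothetical.

```python
def extract_entities(tags):
    """Return the set of (start, end, type) spans found in an IOB2 tag sequence."""
    entities = set()
    start, etype = None, None
    # A trailing "O" acts as a sentinel that closes an entity ending the sentence.
    for i, tag in enumerate(list(tags) + ["O"]):
        prefix, _, label = tag.partition("-")
        continues = prefix == "I" and start is not None and label == etype
        if start is not None and not continues:
            entities.add((start, i, etype))
            start, etype = None, None
        # A stray I- tag with no open entity is treated as the start of one.
        if prefix == "B" or (prefix == "I" and start is None):
            start, etype = i, label
    return entities


def score(gold_tags, pred_tags):
    """Entity-level precision, recall and F1 using exact span and type match."""
    gold = extract_entities(gold_tags)
    pred = extract_entities(pred_tags)
    correct = len(gold & pred)
    precision = correct / len(pred) if pred else 0.0
    recall = correct / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical toy sentence: "Giorgio Napolitano ha visitato la FIAT in Italia"
gold = ["B-PER", "I-PER", "O", "O", "O", "B-ORG", "O", "B-GPE"]
pred = ["O", "B-PER", "O", "O", "O", "B-ORG", "O", "B-LOC"]

p, r, f = score(gold, pred)
print(f"precision={p:.2f} recall={r:.2f} FB1={f:.2f}")
```

In this toy case only the ORG entity is matched: the PER prediction covers just part of the gold span and the last entity has the wrong type, so neither earns credit under exact matching.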

8 Official Ranking
Rank  Participant            Overall FB1  Overall Prec.  Overall Rec.  FB1 GPE  FB1 LOC  FB1 ORG  FB1 PER
1     FBKirst_Zanoli_r2            82.14         83.41%        80.91%    85.54    73.04    64.27    92.12
2     FBKirst_Zanoli_r1            81.28         82.97%        79.65%    85.52    73.04    64.06    90.40
3     UniDuE_Roessler_r1           72.27         71.62%        72.94%    78.39    53.92    49.89    84.42
4     UniDuE_Roessler_r2           71.93         73.28%        70.62%    78.75    54.73    49.01    83.64
5     Yahoo_Ciaramita_r1           68.99         71.28%        66.85%    75.38    52.83    49.08    78.89
6     Yahoo_Ciaramita_r2           68.15         70.44%        66.00%    75.08    52.31    46.85    78.36
7     UniDort_Jungermann_r2        67.90         70.93%        65.12%    73.18    46.07    45.85    79.78
8     UniDort_Jungermann_r1        67.79         70.93%        64.91%    73.18    46.07    45.74    79.58
9     UniAli_Kozareva              66.59         62.73%        70.95%    72.60    47.26    47.81    78.66
10    LDC_Walker_r1                63.10         83.05%        50.88%    65.25    52.94    40.70    75.39
11    LDC_Walker_r2                62.70         82.12%        50.70%    65.13    50.56    36.26    76.44
-     BASELINE                     41.11         42.44%        39.86%    69.67    27.63    40.32    25.48
-     BASELINE -u                  36.85         40.29%        33.95%    57.64    26.32    39.43    25.55
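Since FB1 is the harmonic mean of precision and recall, the overall FB1 column can be cross-checked against the other two overall columns. The short Python sketch below does this for four rows copied from the ranking table; the helper name `f_beta1` is hypothetical, and small differences are to be expected because the published precision and recall are themselves rounded to two decimals.

```python
def f_beta1(precision, recall):
    """Harmonic mean of precision and recall (F-measure with beta = 1)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (participant, overall precision %, overall recall %, reported overall FB1)
rows = [
    ("FBKirst_Zanoli_r2", 83.41, 80.91, 82.14),
    ("UniAli_Kozareva", 62.73, 70.95, 66.59),
    ("LDC_Walker_r1", 83.05, 50.88, 63.10),
    ("BASELINE", 42.44, 39.86, 41.11),
]

for name, p, r, reported in rows:
    print(f"{name:20s} recomputed={f_beta1(p, r):.2f} reported={reported:.2f}")
```

The LDC_Walker_r1 row shows why ranking by FB1 matters: its precision is close to that of the top system, but its much lower recall pulls the harmonic mean down to tenth place.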

10-13 Discussion: each of these slides shows the ranking table from slide 8 again.

14 Conclusions
Good interest from the community:
– 14 initial registrations
– 6 participants (though only one Italian institution)
Relatively high rate of abandonment (8 of 14, about 57%)
Good performance:
– best system at CoNLL: 88.8% for English, 72.4% for German
– best system at EVALITA: 82.1%

15 Thanks to all who participated

16 References
ACE. http://www.nist.gov/speech/tests/ace/index.htm
CoNLL. http://www.cnts.ua.ac.be/conll2002/ner/
L'Adige. http://www.ladige.it/
Linguistic Data Consortium (LDC). Automatic Content Extraction English Annotation Guidelines for Entities, version 5.6.1, 2005.05.23. http://projects.ldc.upenn.edu/ace/docs/English-Entities-Guidelines_v5.6.1.pdf
Magnini, Cappelli, Pianta, Speranza, Bartalesi Lenzi, Sprugnoli, Romano, Girardi, Negri. Annotazione di contenuti concettuali in un corpus italiano: I-CAB. In Proceedings of SILFI 2006, X Congresso Internazionale della Società di Linguistica e Filologia Italiana, Firenze, 14-17 giugno 2006.
Magnini, Pianta, Speranza, Bartalesi Lenzi, Sprugnoli. Italian Content Annotation Bank (I-CAB): Named Entities. Technical report, ITC-irst, 2007. http://evalita.itc.it/tasks/I-CAB-Report-Named-Entities.pdf
ONTOTEXT. http://ontotext.itc.it/

