
1 The Third Chinese Language Processing Bakeoff: Word Segmentation and Named Entity Recognition
Gina-Anne Levow
Fifth SIGHAN Workshop, July 22, 2006

2 Roadmap
- Bakeoff task motivation
- Bakeoff structure:
  - Materials and annotations
  - Tasks and conditions
  - Participants and timeline
- Results & discussion:
  - Word segmentation
  - Named entity recognition
- Observations & conclusions
- Thanks

3 Bakeoff Task Motivation
Core enabling technologies for Chinese language processing:
- Word segmentation (WS)
  - Crucial tokenization in the absence of whitespace
  - Supports POS tagging, parsing, reference resolution, etc.
  - Fundamental challenges: "word" is not well or consistently defined, and humans disagree; unknown words impede performance
- Named entity recognition (NER)
  - Essential for reference resolution, IR, etc.
  - A common class of new, unknown words

4 Data Source Characterization
- Five corpora, five providers
- Annotation guidelines available, but varied
- Both simplified and traditional characters
- Range of encodings; all available in Unicode (UTF-8)
- Provided in a common XML format, converted to train/test form (LDC)

5 Tasks and Tracks
Tasks:
- Word segmentation
  - Training and truth: whitespace-delimited text
  - End-of-word tags replaced with a space; no other markup
- Named entity recognition
  - Training and truth: two-column format similar to CoNLL (sketched below)
  - NAMEX only: LOC, PER, ORG (LDC: +GPE)
Tracks:
- Closed: only the provided materials may be used
- Open: any materials may be used, but they must be documented
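A minimal sketch of the two file formats, for concreteness. The sample sentences and the BIO-style tag column are illustrative assumptions; the slide specifies only whitespace-delimited words for WS and a two-column, CoNLL-like layout with LOC/PER/ORG (plus GPE for LDC) for NER.

```python
# Word segmentation data: one sentence per line, words separated by
# spaces (each end-of-word tag in the source XML became one space).
ws_line = "香港 特別 行政區"
print(ws_line.split())            # ['香港', '特別', '行政區']

# NER data: CoNLL-style two columns, one token per line with its tag.
# The BIO prefixes here are an assumption about the exact tag scheme.
ner_block = "香 B-LOC\n港 I-LOC\n好 O"
for row in ner_block.splitlines():
    token, tag = row.split()
    print(token, tag)
```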

6 Structure: Participants & Timeline
Participants:
- 29 sites submitted runs for evaluation (36 registered initially)
- 144 runs submitted: roughly 2/3 WS, 1/3 NER
- Diverse groups: 11 PRC, 7 Taiwan, 5 US, 2 Japan, and 1 each from Singapore, Korea, Hong Kong, and Canada
- A mix of commercial sites (MSRA, Yahoo!, Alias-i, FR Telecom, etc.) and academic sites
Timeline:
- March 15: Registration opened
- April 17: Training data released
- May 15: Test data released
- May 17: Results due

7 Word Segmentation: Results
Contrasts, both using left-to-right maximal match (sketched after the tables below):
- Baseline: uses only the training vocabulary
- Topline: uses only the testing vocabulary

Baseline:
Source  Recall  Prec   F-score  OOV    Roov   Riv
CITYU   0.93    0.882  0.906    0.049  0.009  0.969
CKIP    0.915   0.87   0.892    0.042  0.03   0.954
MSRA    0.949   0.9    0.924    0.034  0.022  0.981
UPUC    0.869   0.79   0.828    0.088  0.011  0.951

Topline:
Source  Recall  Prec   F-score  OOV    Roov   Riv
CITYU   0.982   0.985  0.984    0.04   0.993  0.981
CKIP    0.98    0.987  0.983    0.042  0.997  0.979
MSRA    0.991   0.993  0.992    0.034  0.999  0.991
UPUC    0.961   0.976  0.968    0.088  0.989  0.958
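Since the baseline and topline differ only in where the dictionary comes from, a single greedy segmenter covers both. A minimal sketch of left-to-right maximal match (the vocabulary and max_len value are illustrative, not taken from the bakeoff):

```python
def maximal_match(text, vocab, max_len=10):
    """Greedy left-to-right maximal match: at each position take the
    longest vocabulary entry; fall back to a single character."""
    words, i = [], 0
    while i < len(text):
        for j in range(min(len(text), i + max_len), i, -1):
            if text[i:j] in vocab or j == i + 1:
                words.append(text[i:j])
                i = j
                break
    return words

# The baseline fills vocab from the training data; the topline fills it
# from the test data, so topline errors reflect ambiguity alone.
vocab = {"研究", "生命", "起源", "研究生"}
print(maximal_match("研究生命起源", vocab))  # ['研究生', '命', '起源'] -- a greedy error
```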

8 Word Segmentation: CityU

CityU Closed:
Site  RunID  R      P      F      Roov   Riv
15    D      0.973  0.972  0.972  0.787  0.981
15    B      0.973  0.972  0.972  0.787  0.981
20           0.972  0.971  0.971  0.792  0.979
32           0.969  0.970  0.970  0.773  0.978

CityU Open:
Site  RunID  R      P      F      Roov   Riv
20           0.978  0.977  0.977  0.840  0.984
32           0.979  0.976  0.977  0.813  0.985
34           0.971  0.967  0.969  0.795  0.978
22           0.970  0.965  0.967  0.761  0.979

9 Word Segmentation: CKIP

CKIP Closed:
Site  RunID  R      P      F      Roov   Riv
20           0.961  0.955  0.958  0.702  0.972
15    A      0.961  0.953  0.957  0.658  0.974
15    B      0.961  0.952  0.957  0.656  0.974
32           0.958  0.948  0.953  0.646  0.972

CKIP Open:
Site  RunID  R      P      F      Roov   Riv
20           0.964  0.955  0.959  0.704  0.975
34           0.959  0.949  0.954  0.672  0.972
32           0.958  0.948  0.953  0.647  0.972
2     A      0.953  0.946  0.949  0.679  0.965

10 Word Segmentation: MSRA

MSRA Closed:
Site  RunID  R      P      F      Roov   Riv
32           0.964  0.961  0.963  0.612  0.976
26           0.961  0.953  0.957  0.499  0.977
9            0.959  0.955  0.957  0.494  0.975
1     A      0.955  0.956  0.956  0.650  0.966

MSRA Open:
Site  RunID  R      P      F      Roov   Riv
11    A      0.980  0.978  0.979  0.839  0.985
11    B      0.977  0.976  0.977  0.840  0.982
14           0.975  0.976  0.975  0.811  0.981
32           0.977  0.971  0.974  0.675  0.988

11 Word Segmentation: UPUC

UPUC Closed:
Site  RunID  R      P      F      Roov   Riv
20           0.940  0.926  0.933  0.707  0.963
32           0.936  0.923  0.930  0.683  0.961
1     A      0.940  0.914  0.927  0.634  0.969
26    A      0.936  0.917  0.926  0.617  0.966

UPUC Open:
Site  RunID  R      P      F      Roov   Riv
34           0.949  0.939  0.944  0.768  0.966
2            0.942  0.928  0.935  0.711  0.964
20           0.940  0.927  0.933  0.741  0.959
7            0.944  0.922  0.933  0.680  0.970

12 Word Segmentation: Overview
- F-scores ranged from 0.481 to 0.979
- Best score: MSRA Open track (FR Telecom)
- Best relative to topline: CityU Open, at over 99% of the topline F-score
- Most frequent top rank: MSRA
- Both F-scores and OOV recall were higher in the Open track
- Overall good results: most systems outperformed the baseline (see the scoring sketch below)
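The R, P, and F columns throughout these tables are word-level scores; F is the harmonic mean 2PR/(P+R). A minimal sketch of the scoring for a single sentence, comparing words as character spans (an illustrative simplification, not the official scorer):

```python
def ws_prf(gold, pred):
    """Word-level P/R/F for one sentence: a predicted word counts as
    correct only if the same character span appears in the gold words."""
    def spans(words):
        out, i = set(), 0
        for w in words:
            out.add((i, i + len(w)))
            i += len(w)
        return out
    g, p = spans(gold), spans(pred)
    correct = len(g & p)
    prec = correct / len(p) if p else 0.0
    rec = correct / len(g) if g else 0.0
    f = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
    return prec, rec, f

# The greedy error from the earlier sketch gets 1 of 3 words correct:
print(ws_prf(["研究", "生命", "起源"], ["研究生", "命", "起源"]))
# (0.333..., 0.333..., 0.333...)
```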

13 Word Segmentation: Discussion
- Continuing OOV challenges
- Highest F-scores on MSRA, which also had the highest topline and baseline and the lowest OOV rate
- Lowest F-scores on UPUC, which also had the lowest topline and baseline, the highest OOV rate (more than double that of any other corpus), and the smallest corpus (about 1/3 the size of MSRA)
- Best scores came on the most consistent corpus (in vocabulary and annotation)
- UPUC also varies in genre: training is Chinese Treebank (CTB); test is CTB, newswire, and broadcast news

14 NER Results
Contrast: the baseline labels a token as a named entity if it had a unique tag in training (sketched below).

Baseline:
Source  P      R      F      PER-F  ORG-F  LOC-F  GPE-F
CITYU   0.611  0.467  0.529  0.587  0.516  0.503  N/A
LDC     0.493  0.378  0.428  0.395  0.29   0.259  0.539
MSRA    0.59   0.488  0.534  0.614  0.469  0.531  N/A
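A minimal sketch of one plausible reading of that baseline rule: remember every token that received a single, consistent named-entity tag in training, and tag only those tokens at test time. The tag scheme, helper names, and example tokens are assumptions, not from the bakeoff specification.

```python
from collections import defaultdict

def build_lexicon(training_pairs):
    """Map each training token to the set of tags it appeared with;
    keep only tokens whose tag was unique and never 'O'."""
    seen = defaultdict(set)
    for token, tag in training_pairs:
        seen[token].add(tag)
    return {tok: next(iter(tags)) for tok, tags in seen.items()
            if len(tags) == 1 and "O" not in tags}

def baseline_tag(tokens, lexicon):
    """Tag known, unambiguous tokens; everything else stays 'O'."""
    return [(tok, lexicon.get(tok, "O")) for tok in tokens]

lexicon = build_lexicon([("北", "B-LOC"), ("京", "I-LOC"), ("的", "O")])
print(baseline_tag(["北", "京", "的"], lexicon))
# [('北', 'B-LOC'), ('京', 'I-LOC'), ('的', 'O')]
```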

15 NER Results: CityU

CityU Closed:
Site  P      R      F      ORG-F  LOC-F  PER-F
3     0.914  0.867  0.89   0.805  0.921  0.909
19    0.92   0.854  0.886  0.805  0.925  0.887
21a   0.927  0.847  0.885  0.797  0.92   0.89
21b   0.924  0.849  0.885  0.798  0.924  0.892

CityU Open:
Site  P      R      F      ORG-F  LOC-F  PER-F
6     0.869  0.749  0.805  0.68   0.86   0.81

16 NER Results: LDC

LDC Closed:
Site       P      R      F      ORG-F  LOC-F  PER-F
7          0.761  0.662  0.708  0.521  0.286  0.742
6-gpe-loc  0.672  0.655  0.664  0.455  0.708  0.742
6          0.306  0.298  0.302  0.455  0.037  0.742

LDC Open:
Site  P      R      F      ORG-F  LOC-F  PER-F
3     0.803  0.726  0.763  0.658  0.305  0.788
8     0.814  0.594  0.688  0.585  0.170  0.657

17 NER Results: MSRA

MSRA Closed:
Site  P      R      F      ORG-F  LOC-F  PER-F
14    0.889  0.842  0.865  0.831  0.854  0.901
21a   0.912  0.817  0.862  0.82   0.905  0.826
21b   0.884  0.829  0.856  0.77   0.901  0.849
3     0.881  0.823  0.851  0.815  0.906  0.794

MSRA Open:
Site  P      R      F      ORG-F  LOC-F  PER-F
10    0.922  0.902  0.912  0.859  0.903  0.960
14    0.908  0.892  0.899  0.84   0.91   0.926
11b   0.877  0.875  0.876  0.761  0.897  0.922
11a   0.864  0.84   0.852  0.694  0.874  0.92

18 NER: Overview
Overall results:
- Best F-score: MSRA Open track, 0.912
- Strong overall performance: only two results fell below the baseline
- Direct comparison of Open vs. Closed NER is difficult: only two sites ran both tracks
- Only MSRA drew large numbers of runs; there, Open outperformed Closed, with each of the top three Open runs beating the best Closed run

19 NER Observations
Named entity recognition challenges: tagsets, variation, and corpus size
- Results on MSRA and CityU are much better than on LDC
- The LDC corpus is substantially smaller
- LDC also uses a larger tagset, adding GPE, which is easily confused with ORG or LOC
- NER results are sensitive to corpus size, tagset, and genre

20 Conclusions & Future Challenges
- Strong, diverse participation in WS and NER, with many effective, competitive results
- Cross-task and cross-evaluation comparisons remain difficult:
  - Scores are sensitive to corpus size, annotation consistency, tagset, genre, etc.
  - We need a corpus- and configuration-independent measure of progress
  - Encourage submissions that support comparisons
  - Pursue extrinsic, task-oriented evaluation of WS/NER
- Continuing challenges: OOV words, annotation consistency, encoding combinations and variation, code-switching

21 Thanks
Data providers:
- Chinese Knowledge Information Processing Group, Academia Sinica, Taiwan: Keh-Jiann Chen, Henning Chiu
- City University of Hong Kong: Benjamin K. Tsou, Olivia Oi Yee Kwong
- Linguistic Data Consortium: Stephanie Strassel
- Microsoft Research Asia: Mu Li
- University of Pennsylvania / University of Colorado: Martha Palmer, Nianwen Xue
Workshop co-chairs: Hwee Tou Ng and Olivia Oi Yee Kwong
All participants!

