
1 The Third Chinese Language Processing Bakeoff: Word Segmentation and Named Entity Recognition
Gina-Anne Levow
Fifth SIGHAN Workshop, July 22, 2006

2 Roadmap
- Bakeoff task motivation
- Bakeoff structure:
  - Materials and annotations
  - Tasks and conditions
  - Participants and timeline
- Results & discussion:
  - Word segmentation
  - Named entity recognition
- Observations & conclusions
- Thanks

3 Bakeoff Task Motivation
Core enabling technologies for Chinese language processing:
- Word segmentation (WS)
  - Crucial tokenization in the absence of whitespace
  - Supports POS tagging, parsing, reference resolution, etc.
  - Fundamental challenges: "word" is not well or consistently defined, and humans disagree; unknown words impede performance
- Named entity recognition (NER)
  - Essential for reference resolution, IR, etc.
  - A common class of new, unknown words

4 Data Source Characterization
- Five corpora, five providers
- Annotation guidelines available, but varied
- Both simplified and traditional characters
- Range of encodings; all available in Unicode (UTF-8)
- Provided in a common XML format, converted to train/test form (LDC)

5 Tasks and Tracks
Tasks:
- Word segmentation
  - Training and truth: whitespace-delimited text
  - End-of-word tags replaced with a space; no other markup
- Named entity recognition
  - Training and truth: two-column format similar to CoNLL (sketched below)
  - NAMEX only: LOC, PER, ORG (LDC: +GPE)
Tracks:
- Closed: only the provided materials may be used
- Open: any materials may be used, but they must be documented
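A minimal sketch of the two file formats, for concreteness. The sample sentences and the BIO-style tag column are illustrative assumptions; the slide specifies only whitespace-delimited words for WS and a two-column, CoNLL-like layout with LOC/PER/ORG (plus GPE for LDC) for NER.

```python
# Word segmentation data: one sentence per line, words separated by
# spaces (each end-of-word tag in the source XML became one space).
ws_line = "香港 特別 行政區"
print(ws_line.split())            # ['香港', '特別', '行政區']

# NER data: CoNLL-style two columns, one token per line with its tag.
# The BIO prefixes here are an assumption about the exact tag scheme.
ner_block = "香 B-LOC\n港 I-LOC\n好 O"
for row in ner_block.splitlines():
    token, tag = row.split()
    print(token, tag)
```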

6 Structure: Participants & Timeline
Participants:
- 29 sites submitted runs for evaluation (36 registered initially)
- 144 runs submitted: roughly 2/3 WS, 1/3 NER
- Diverse groups: 11 PRC, 7 Taiwan, 5 US, 2 Japan, and 1 each from Singapore, Korea, Hong Kong, and Canada
- A mix of commercial sites (MSRA, Yahoo!, Alias-i, FR Telecom, etc.) and academic sites
Timeline:
- March 15: Registration opened
- April 17: Training data released
- May 15: Test data released
- May 17: Results due

7 Word Segmentation: Results
Contrasts, both using left-to-right maximal match (sketched after the tables below):
- Baseline: uses only the training vocabulary
- Topline: uses only the testing vocabulary

Baseline:
Source  Recall  Prec   F-score  OOV    Roov   Riv
CITYU   0.93    0.882  0.906    0.049  0.009  0.969
CKIP    0.915   0.87   0.892    0.042  0.03   0.954
MSRA    0.949   0.9    0.924    0.034  0.022  0.981
UPUC    0.869   0.79   0.828    0.088  0.011  0.951

Topline:
Source  Recall  Prec   F-score  OOV    Roov   Riv
CITYU   0.982   0.985  0.984    0.04   0.993  0.981
CKIP    0.98    0.987  0.983    0.042  0.997  0.979
MSRA    0.991   0.993  0.992    0.034  0.999  0.991
UPUC    0.961   0.976  0.968    0.088  0.989  0.958
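Since the baseline and topline differ only in where the dictionary comes from, a single greedy segmenter covers both. A minimal sketch of left-to-right maximal match (the vocabulary and max_len value are illustrative, not taken from the bakeoff):

```python
def maximal_match(text, vocab, max_len=10):
    """Greedy left-to-right maximal match: at each position take the
    longest vocabulary entry; fall back to a single character."""
    words, i = [], 0
    while i < len(text):
        for j in range(min(len(text), i + max_len), i, -1):
            if text[i:j] in vocab or j == i + 1:
                words.append(text[i:j])
                i = j
                break
    return words

# The baseline fills vocab from the training data; the topline fills it
# from the test data, so topline errors reflect ambiguity alone.
vocab = {"研究", "生命", "起源", "研究生"}
print(maximal_match("研究生命起源", vocab))  # ['研究生', '命', '起源'] -- a greedy error
```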

8 Word Segmentation: CityU

CityU Closed:
Site  RunID  R      P      F      Roov   Riv
15    D      0.973  0.972  0.972  0.787  0.981
15    B      0.973  0.972  0.972  0.787  0.981
20           0.972  0.971  0.971  0.792  0.979
32           0.969  0.970  0.970  0.773  0.978

CityU Open:
Site  RunID  R      P      F      Roov   Riv
20           0.978  0.977  0.977  0.840  0.984
32           0.979  0.976  0.977  0.813  0.985
34           0.971  0.967  0.969  0.795  0.978
22           0.970  0.965  0.967  0.761  0.979

9 Word Segmentation: CKIP

CKIP Closed:
Site  RunID  R      P      F      Roov   Riv
20           0.961  0.955  0.958  0.702  0.972
15    A      0.961  0.953  0.957  0.658  0.974
15    B      0.961  0.952  0.957  0.656  0.974
32           0.958  0.948  0.953  0.646  0.972

CKIP Open:
Site  RunID  R      P      F      Roov   Riv
20           0.964  0.955  0.959  0.704  0.975
34           0.959  0.949  0.954  0.672  0.972
32           0.958  0.948  0.953  0.647  0.972
2     A      0.953  0.946  0.949  0.679  0.965

10 Word Segmentation: MSRA

MSRA Closed:
Site  RunID  R      P      F      Roov   Riv
32           0.964  0.961  0.963  0.612  0.976
26           0.961  0.953  0.957  0.499  0.977
9            0.959  0.955  0.957  0.494  0.975
1     A      0.955  0.956  0.956  0.650  0.966

MSRA Open:
Site  RunID  R      P      F      Roov   Riv
11    A      0.980  0.978  0.979  0.839  0.985
11    B      0.977  0.976  0.977  0.840  0.982
14           0.975  0.976  0.975  0.811  0.981
32           0.977  0.971  0.974  0.675  0.988

11 Word Segmentation: UPUC

UPUC Closed:
Site  RunID  R      P      F      Roov   Riv
20           0.940  0.926  0.933  0.707  0.963
32           0.936  0.923  0.930  0.683  0.961
1     A      0.940  0.914  0.927  0.634  0.969
26    A      0.936  0.917  0.926  0.617  0.966

UPUC Open:
Site  RunID  R      P      F      Roov   Riv
34           0.949  0.939  0.944  0.768  0.966
2            0.942  0.928  0.935  0.711  0.964
20           0.940  0.927  0.933  0.741  0.959
7            0.944  0.922  0.933  0.680  0.970

12 Word Segmentation: Overview
- F-scores ranged from 0.481 to 0.979
- Best score: MSRA Open track (FR Telecom)
- Best relative to topline: CityU Open, at over 99% of the topline F-score
- Most frequent top rank: MSRA
- Both F-scores and OOV recall were higher in the Open track
- Overall good results: most systems outperformed the baseline (see the scoring sketch below)
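The R, P, and F columns throughout these tables are word-level scores; F is the harmonic mean 2PR/(P+R). A minimal sketch of the scoring for a single sentence, comparing words as character spans (an illustrative simplification, not the official scorer):

```python
def ws_prf(gold, pred):
    """Word-level P/R/F for one sentence: a predicted word counts as
    correct only if the same character span appears in the gold words."""
    def spans(words):
        out, i = set(), 0
        for w in words:
            out.add((i, i + len(w)))
            i += len(w)
        return out
    g, p = spans(gold), spans(pred)
    correct = len(g & p)
    prec = correct / len(p) if p else 0.0
    rec = correct / len(g) if g else 0.0
    f = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
    return prec, rec, f

# The greedy error from the earlier sketch gets 1 of 3 words correct:
print(ws_prf(["研究", "生命", "起源"], ["研究生", "命", "起源"]))
# (0.333..., 0.333..., 0.333...)
```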

13 Word Segmentation: Discussion
- Continuing OOV challenges
- Highest F-scores on MSRA, which also had the highest topline and baseline and the lowest OOV rate
- Lowest F-scores on UPUC, which also had the lowest topline and baseline, the highest OOV rate (more than double that of any other corpus), and the smallest corpus (about 1/3 the size of MSRA)
- Best scores came on the most consistent corpus (in vocabulary and annotation)
- UPUC also varies in genre: training is Chinese Treebank (CTB); test is CTB, newswire, and broadcast news

14 NER Results
Contrast: the baseline labels a token as a named entity if it had a unique tag in training (sketched below).

Baseline:
Source  P      R      F      PER-F  ORG-F  LOC-F  GPE-F
CITYU   0.611  0.467  0.529  0.587  0.516  0.503  N/A
LDC     0.493  0.378  0.428  0.395  0.29   0.259  0.539
MSRA    0.59   0.488  0.534  0.614  0.469  0.531  N/A
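A minimal sketch of one plausible reading of that baseline rule: remember every token that received a single, consistent named-entity tag in training, and tag only those tokens at test time. The tag scheme, helper names, and example tokens are assumptions, not from the bakeoff specification.

```python
from collections import defaultdict

def build_lexicon(training_pairs):
    """Map each training token to the set of tags it appeared with;
    keep only tokens whose tag was unique and never 'O'."""
    seen = defaultdict(set)
    for token, tag in training_pairs:
        seen[token].add(tag)
    return {tok: next(iter(tags)) for tok, tags in seen.items()
            if len(tags) == 1 and "O" not in tags}

def baseline_tag(tokens, lexicon):
    """Tag known, unambiguous tokens; everything else stays 'O'."""
    return [(tok, lexicon.get(tok, "O")) for tok in tokens]

lexicon = build_lexicon([("北", "B-LOC"), ("京", "I-LOC"), ("的", "O")])
print(baseline_tag(["北", "京", "的"], lexicon))
# [('北', 'B-LOC'), ('京', 'I-LOC'), ('的', 'O')]
```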

15 NER Results: CityU

CityU Closed:
Site  P      R      F      ORG-F  LOC-F  PER-F
3     0.914  0.867  0.89   0.805  0.921  0.909
19    0.92   0.854  0.886  0.805  0.925  0.887
21a   0.927  0.847  0.885  0.797  0.92   0.89
21b   0.924  0.849  0.885  0.798  0.924  0.892

CityU Open:
Site  P      R      F      ORG-F  LOC-F  PER-F
6     0.869  0.749  0.805  0.68   0.86   0.81

16 NER Results: LDC

LDC Closed:
Site       P      R      F      ORG-F  LOC-F  PER-F
7          0.761  0.662  0.708  0.521  0.286  0.742
6-gpe-loc  0.672  0.655  0.664  0.455  0.708  0.742
6          0.306  0.298  0.302  0.455  0.037  0.742

LDC Open:
Site  P      R      F      ORG-F  LOC-F  PER-F
3     0.803  0.726  0.763  0.658  0.305  0.788
8     0.814  0.594  0.688  0.585  0.170  0.657

17 NER Results: MSRA

MSRA Closed:
Site  P      R      F      ORG-F  LOC-F  PER-F
14    0.889  0.842  0.865  0.831  0.854  0.901
21a   0.912  0.817  0.862  0.82   0.905  0.826
21b   0.884  0.829  0.856  0.77   0.901  0.849
3     0.881  0.823  0.851  0.815  0.906  0.794

MSRA Open:
Site  P      R      F      ORG-F  LOC-F  PER-F
10    0.922  0.902  0.912  0.859  0.903  0.960
14    0.908  0.892  0.899  0.84   0.91   0.926
11b   0.877  0.875  0.876  0.761  0.897  0.922
11a   0.864  0.84   0.852  0.694  0.874  0.92

18 NER: Overview
Overall results:
- Best F-score: MSRA Open track, 0.912
- Strong overall performance: only two results fell below the baseline
- Direct comparison of Open vs. Closed NER is difficult: only two sites ran both tracks
- Only MSRA drew large numbers of runs; there, Open outperformed Closed, with each of the top three Open runs beating the best Closed run

19 NER Observations
Named entity recognition challenges: tagsets, variation, and corpus size
- Results on MSRA and CityU are much better than on LDC
- The LDC corpus is substantially smaller
- LDC also uses a larger tagset, adding GPE, which is easily confused with ORG or LOC
- NER results are sensitive to corpus size, tagset, and genre

20 Conclusions & Future Challenges
- Strong, diverse participation in WS and NER, with many effective, competitive results
- Cross-task and cross-evaluation comparisons remain difficult:
  - Scores are sensitive to corpus size, annotation consistency, tagset, genre, etc.
  - We need a corpus- and configuration-independent measure of progress
  - Encourage submissions that support comparisons
  - Pursue extrinsic, task-oriented evaluation of WS/NER
- Continuing challenges: OOV words, annotation consistency, encoding combinations and variation, code-switching

21 Thanks
Data providers:
- Chinese Knowledge Information Processing Group, Academia Sinica, Taiwan: Keh-Jiann Chen, Henning Chiu
- City University of Hong Kong: Benjamin K. Tsou, Olivia Oi Yee Kwong
- Linguistic Data Consortium: Stephanie Strassel
- Microsoft Research Asia: Mu Li
- University of Pennsylvania / University of Colorado: Martha Palmer, Nianwen Xue
Workshop co-chairs: Hwee Tou Ng and Olivia Oi Yee Kwong
All participants!

