The Computational Linguistics Summarization Pilot TAC 2014 Kokil Jaidka †, Muthu Kumar Chandrasekaran* ‡, Min-Yen Kan* ‡, Ankur Khanna ‡ Nanyang.


1 The Computational Linguistics Summarization Pilot Task @ TAC 2014. Kokil Jaidka †, Muthu Kumar Chandrasekaran* ‡, Min-Yen Kan* ‡, Ankur Khanna ‡. † Nanyang Technological University; * Dept. of Computer Science, National University of Singapore; ‡ Web, IR / NLP Group, National University of Singapore

2 Scientific Document Summarization: "I have an abstract. I am done!" Photo credits: Dennis Jarvis @flickr. TAC Biomedsumm Track - The Computational Linguistics Pilot Task 06 October 2015

3 Outline: Citation-based extractive summaries; Facetted summaries; Automatic literature review; CL development corpus; Annotation; TAC 2015: CL-Summ track; Acknowledgements

4 Scientific Document Summarization: Growth in # publications.

5 Scientific Document Summarization. Abstracts – Authors' own summary. Citation summary – Scientific community creates summaries of research papers while they cite a paper but… Facetted summaries – Capture all aspects of a paper.

6 Citation summary & facets. Image credits: Ken Ammi @flickr

7 Facetted summaries. Structured Abstract: common in Medicine, Biomed, Bioinformatics domains.

8 Facets & Argumentative zones

9 Scientific Document Summarization: Citation-based extractive summaries. Scope of Citation: Qazvinian, V., & Radev, D. R. "Identifying non-explicit citing sentences for citation-based summarization" (ACL, 2010); Abu-Jbara, Amjad, and Dragomir Radev. "Reference scope identification in citing sentences" (ACL, 2012). Coherence: Abu-Jbara, Amjad, and Dragomir Radev. "Coherent citation-based summarization of scientific papers" (ACL, 2011).

10 Scientific Document Summarization & Automatic Literature Review

11 Scientific Document Summarization & Automatic Literature Review

12 Free to access at: http://acl-arc.comp.nus.edu.sg/

13 SciSumm Corpus: 10 reference papers (topics) randomly sampled from the ACL ARC corpus. Up to 10 citing papers per reference paper, including those outside ACL ARC.
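The corpus layout on the slide above (one reference paper per topic, with up to 10 citing papers) can be pictured as a small data structure. This is only a sketch: the class names, field names and identifiers below are hypothetical and do not reflect the released file format.

```python
from dataclasses import dataclass, field
from typing import List

MAX_CITING_PAPERS = 10  # "up to 10 citing papers per reference paper"


@dataclass
class Paper:
    paper_id: str          # hypothetical identifier, not a real ACL ARC ID
    sentences: List[str]   # sentence-segmented text of the paper


@dataclass
class Topic:
    reference_paper: Paper
    citing_papers: List[Paper] = field(default_factory=list)

    def add_citing_paper(self, paper: Paper) -> None:
        """Attach a citing paper, enforcing the per-topic cap from the slide."""
        if len(self.citing_papers) >= MAX_CITING_PAPERS:
            raise ValueError("a topic holds at most 10 citing papers")
        self.citing_papers.append(paper)


# Hypothetical toy topic, invented for illustration.
topic = Topic(reference_paper=Paper("RP-demo", ["We describe a tagger."]))
topic.add_citing_paper(Paper("CP-demo-1", ["They describe a tagger (RP-demo)."]))
```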

14 Annotation pipeline. OCR & section parse: ParsCit's SectLabel module. Annotation! Post-processing to Biomedsumm format: 1. Scripts from U. Colorado (Prabha) 2. Sentence-segmented version from U. Mich (Rahul)

15 Annotating the SciSumm corpus: 3 annotators in all. Released data has one gold standard annotation per topic or reference paper. Discourse facet has a minor change from Biomedsumm's categories.

16 Tasks. Task 1A: For each citance, identify the spans of text (cited text spans) in the RP that most accurately reflect the citance. [Figure labels: Reference Paper (RP); Citing papers. Citing text is called a citance.]
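Task 1A can be read as a retrieval problem: given a citance, rank the sentences of the RP by how well they match it. A minimal sketch follows, assuming bag-of-words Jaccard similarity as the matching criterion. This is an illustrative baseline only, not the method of the organizers or of any participating team; the function names and toy sentences are hypothetical.

```python
import re


def tokenize(text):
    """Lowercase a sentence and return its set of word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a, b):
    """Jaccard similarity between two token sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank_cited_spans(citance, rp_sentences, top_k=1):
    """Return the indices of the top_k RP sentences most similar to the citance."""
    cit_tokens = tokenize(citance)
    scored = [
        (jaccard(cit_tokens, tokenize(sent)), idx)
        for idx, sent in enumerate(rp_sentences)
    ]
    # Sort by descending similarity; ties are broken by earlier position.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [idx for _, idx in scored[:top_k]]


# Hypothetical toy data, invented for illustration only.
rp_sentences = [
    "We introduce a new parser for dependency structures.",
    "The parser is trained on a treebank of newswire text.",
    "Results show improved accuracy over the baseline.",
]
citance = "Their dependency parser was trained on newswire treebank data."
best = rank_cited_spans(citance, rp_sentences, top_k=1)
```

Here `best` holds the index of the second RP sentence, which shares the most vocabulary with the citance. A real system would likely need more than lexical overlap, since citances often paraphrase the cited text.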

17 Tasks. Task 1B: For each cited text span, identify what facet of the paper it belongs to, from a predefined set of facets. [Figure labels: Reference Paper (RP): mark the cited text in RP and provide its facet; Citing papers. Citing text is called a citance.]
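Task 1B is a classification step over the spans found in Task 1A. The slides do not list the facet inventory, so the sketch below uses placeholder facet labels and hand-picked cue words; both are assumptions made purely for illustration, and a cue-word vote is only one simple way to approach the problem.

```python
import re

# Placeholder facet inventory and cue words: assumptions for illustration only.
FACET_CUES = {
    "method": {"propose", "algorithm", "model", "approach", "train"},
    "result": {"results", "accuracy", "outperforms", "improves", "score"},
    "aim": {"goal", "aim", "investigate", "address", "problem"},
}


def classify_facet(span_text, default="method"):
    """Pick the facet whose cue words occur most often in the span."""
    tokens = re.findall(r"[a-z]+", span_text.lower())
    counts = {
        facet: sum(1 for tok in tokens if tok in cues)
        for facet, cues in FACET_CUES.items()
    }
    best_facet = max(counts, key=counts.get)
    # Fall back to the default when no cue word matched at all.
    return best_facet if counts[best_facet] > 0 else default


# Hypothetical cited text span.
label = classify_facet("Results show that accuracy improves over the baseline.")
```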

18 Evaluation. Small corpus: 10-fold cross-validated evaluation over the 10 documents. Task 1A scored by overlap with citances. Task 1B scored by overlap with reference text spans.
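The slide says only that scoring is "by overlap" and does not give the exact metric. One plausible reading, sketched below, treats each span as a set of sentence IDs and computes precision, recall and F1 against the gold annotation. With 10 documents, 10-fold cross-validation amounts to holding out one topic per fold. The sentence IDs, topic names and function names are hypothetical.

```python
def overlap_scores(system_ids, gold_ids):
    """Precision, recall and F1 of system sentence IDs against gold sentence IDs."""
    system, gold = set(system_ids), set(gold_ids)
    hits = len(system & gold)
    precision = hits / len(system) if system else 0.0
    recall = hits / len(gold) if gold else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if hits else 0.0
    return precision, recall, f1


def leave_one_out_folds(topics):
    """Yield (train_topics, test_topic) pairs: with 10 topics this gives 10 folds."""
    for i, held_out in enumerate(topics):
        yield topics[:i] + topics[i + 1:], held_out


# Hypothetical sentence IDs for one citance: one of three system sentences is correct.
p, r, f = overlap_scores(system_ids=[12, 13, 20], gold_ids=[13, 14])

# Hypothetical topic identifiers.
topics = [f"topic_{n:02d}" for n in range(1, 11)]
folds = list(leave_one_out_folds(topics))
```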

19 Task & evaluation: highlights. First corpus in the CL domain that incorporates prior research findings on citation-based summaries. 10 teams from 5 different countries participated in the evaluation.

20 Limitations: No gold standard summaries yet. OCR errors: we hope to have corrected them manually. But mainly, we need more annotated data!

21 TAC 2015: CL-Summ shared task. Plans to roll out a full-fledged official shared task for the CL corpus: 20 training topics, 10 test topics, 3 annotations per summary.

22 TAC 2015: We need your help! We seek support from – the summarization community in general and – the CL community in particular to provide manpower for annotating the corpus. Great to have all participating teams contribute!

23 Acknowledgements: Hoa Dang, NIST; Lucy Vanderwende, MSR; all Biomedsumm track participants. This research is partially supported by CSIDM. Questions? Thank you!

