Institute of Computer Science & Technology

1 Institute of Computer Science & Technology
NLP&CC 2012 – Beijing, China
Ontology-Based Event Modeling for Semantic Understanding of Chinese News Story
Wang Wei, Zhao Dongyan
Institute of Computer Science & Technology

2 Outline
Introduction
Related Work
  Event definitions
  Existing event models
News Ontology Event Model
  The Design of NOEM
  Main Concepts and Properties in NOEM
Evaluation
Conclusion
This is the outline of my presentation.
NLP&CC, Beijing, China -2-

3 Introduction “News Information Overload”
Numerous online news service providers
Explosive increase of online news users
[Chart: number of online news users (unit: ten thousand persons) and the time they spend browsing news]
According to surveys by CNNIC (China Internet Network Information Center), the number of online news users has grown explosively: it rose from about 150 million in 2008 to almost 350 million in 2011, and the time users spend browsing news remains high. This growth, together with the rapid increase in online news service providers, has led to serious “news information overload”: although a great amount of online news is available, it is very difficult for people to find what they really want to read.

4 Introduction
Classification & summarization are widely used in the online news domain
  document-oriented techniques based on traditional “BOW” models cannot provide sufficient event semantic information
Users need intelligent event-level semantic news services
  to push events, not documents, to users
  employing entities and relations to provide semantic navigation, e.g., renlifang of Microsoft, soso waltz of Tencent
Web of Document, then Web of Data, then Web of entity and relation
In the online news domain, classification and summarization have been widely used to relieve the news information overload problem. However, these are document-oriented techniques that supply “news articles” rather than “news events”. They are based on the “Bag of Words” model, which cannot provide sufficient event semantic information. Users now need more intelligent news services that offer navigation over entities and relations and push events, not documents, to users. This goal accords with the evolution of the Web from the “Web of Document” to the “Web of Data” and the “Web of entity”. This shift can be achieved with information extraction technology.

5 Introduction How to provide multi-dimensional semantic navigation?
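5W1H: Who, When, Where, What, Why, How
Keyword-based analysis easily causes “semantic” errors
  [Screenshot: Tencent Soso people search] The event took place in Ludao (鹭岛), not Hong Kong
  The Shanghai concert is an event of Wang Fei (王菲) and has nothing to do with Liu Dehua (刘德华)
When we describe a news event, we need to state its time, place, people, cause, course and outcome. These are the 5W1H elements of a news event, i.e., WHEN, WHERE, WHO, WHAT, WHY and HOW, abbreviated by their English initials as “5W1H” and known as the six elements of news. They are very important for describing a single event and the relations between events.
In fact, events can be linked to one another along each of these elements, forming a multi-dimensionally connected event network. Associative navigation can then be offered along any dimension, supporting exploratory browsing by users.
Multi-dimensional association based on 5W1H information has already begun to appear.
By automatically extracting the structured 5W1H semantic information of events and populating NOEM with it, an event knowledge base can be built to support event-level and semantic-level applications in the news domain.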

6 Introduction
Our research aim is
  semantic understanding of Chinese news by extracting the entities and relations involved in a key event of a news story
  building a news events knowledge base as well as a semantic retrieval engine to support event-level semantic applications
We implemented
  a novel framework to address the whole list of 5W1H
    key event identification
    event semantic elements extraction
    ontology-based event knowledge base construction
This paper discusses
  Ontology-Based Event Modeling for Semantic Understanding of Chinese News Story
This paper discusses semantic understanding of Chinese news by extracting the entities and relations involved in the key event of a news story. Specifically, we try to identify the 5W1H, a concept from journalism, i.e., the what, who, when, where, why and how of an event, to represent the semantic elements of news events. We propose a novel framework to address the whole list of 5W1H. This framework comprises three parts: (1) key event identification, (2) event semantic elements extraction and (3) ontology-based event knowledge base construction.

7 Methodology
5W1H elements extraction in key events of Chinese news stories
We try to build a practical Chinese event extraction system by combining
  Natural Language Processing technologies (lexical analysis, NER)
  Machine Learning (SVM, CRF)
  Semantic Web technologies (Ontology, OWL, Rules)
[Diagram labels: Key event identification in one news story; 5W1H event semantic-elements extraction; Chinese Online News; Event knowledge base; Event semantic modeling and ontology population]
To extract the 5W1H elements of key events in Chinese news stories and build an event knowledge base, we carry out research at two levels. The lower level models the semantic information of events and finds an ontology population method. The upper level identifies the key event in a news story and then extracts the 5W1H semantic elements of that key event. Our goal is to build a practical Chinese event extraction system. We try to combine Natural Language Processing technologies (lexical analysis, NER), Machine Learning (SVM, CRF) and Semantic Web technologies (Ontology, OWL, Rules) to achieve this goal.

9 Related Work Event Definitions
  WordNet [25]: “something that happens at a given place and time” (a broad definition).
  Cognitive psychologists [33]: events are “happenings in the outside world”; people observe and understand the world through events.
  Linguists [26] (Chung and Timberlake, 1985): “an event can be defined in terms of three components: a predicate; an interval of time on which the predicate occurs and a situation or set of conditions under which the predicate occurs.”
  TimeML: “a cover term for situations that happen or occur. Events can be punctual or last for a period of time.”
  ACE (Automatic Content Extraction) [13]: “an event involving zero or more ACE entities, values and time expressions”.
  Event-based summarization [4]: atomic events link the major constituent parts of an event (participants, locations, times) through verbs or action nouns labeling the event itself.
The concept of “event” has been applied in many research fields, but its definitions differ. It was first proposed by cognitive scientists, who hold that people experience and understand the world in units of “events”: events fit people's normal cognitive patterns, take the study of verbs as their core, and can explain how conceptual structure is formed.
News event definition: a specific action or occurrence, involving multiple participants, that is the main subject of a news report. It has three components: a predicate; core participants, i.e., the agent, the patient and the dative; and auxiliary participants, i.e., the time and place at which the predicate occurs.
We adopt this definition of a news event in line with the research goals of this paper.

10 Related Work
We define event as
  “an event is a specific occurrence which involves some participants”.
It has three components:
  a predicate;
  core participants, i.e., agents and patients;
  auxiliary participants, i.e., time and location of the event.
These participants are usually named entities which correspond to the what, who, whom, when and where elements of an event.
<S, P, O, T, L>, where S, P, O are core elements and T, L are subordinate elements.
In line with the research goals of this paper, we define a news event as “a specific action or occurrence, involving multiple participants, that is the main subject of a news report. It has three components: a predicate; core participants, i.e., the agent and the patient; and auxiliary participants, i.e., the time and place at which the predicate occurs.”
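The tuple <S, P, O, T, L> can be pictured as a small record type. The following is only a minimal sketch, not the authors' implementation: the class name `NewsEvent`, its field names and the mapping of the predicate to the “what” slot are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class NewsEvent:
    """One key event as the tuple <S, P, O, T, L> (hypothetical type)."""
    predicate: str                   # P: the action, a core element
    subject: Optional[str] = None    # S: the agent, a core element
    obj: Optional[str] = None        # O: the patient, a core element
    time: Optional[str] = None       # T: when, a subordinate element
    location: Optional[str] = None   # L: where, a subordinate element

    def core(self):
        """Return the core triple <S, P, O>."""
        return (self.subject, self.predicate, self.obj)

    def five_w(self):
        """Map the tuple onto what/who/whom/when/where (assumed mapping)."""
        return {
            "what": self.predicate,
            "who": self.subject,
            "whom": self.obj,
            "when": self.time,
            "where": self.location,
        }


# Example values taken from the case study later in the talk.
event = NewsEvent(predicate="抵达", subject="胡锦涛", time="8日", location="渥太华")
print(event.core())
print(event.five_w())
```

Keeping T and L optional reflects their role as subordinate elements: an event is still well formed when only the core triple is known.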

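11 Related Work Existing Event Models
  Script Theory, Event Domain Cognitive Model: cognitive linguistics
  Probabilistic Event Model: TDT
  Atomic Event Model: event-based automatic summarization
  Structural Event Model: MUC & ACE
  Generic Event Model: event-centric multimedia data management
  Ontology Event Models: ABC, PROTON, EO (Event Ontology), Event-Model-F
Limitations of existing models: weak operability; too simple to capture the 5W1H elements of an event; domain-dependent and hard to port; no support for Chinese; information too rich and in need of simplification. Ontology models can describe the types, attributes and relations of events and entities; they are general, easy to extend, and can describe many kinds of objects and their relations.
TDT: a news event can be represented as a set of persons, times, locations and keywords composed according to some probability.
Text summarization: an atomic event is a triple formed by two named entities connected by a verb or an action noun.
Information extraction: templates and predicate-argument structures.
Question answering: the event model is abstracted from the lexical units in a text that express event structure. An appropriate model of the internal structure of an event comes from the frame structures of frame semantics, which can model external concepts. Like frames in frame semantics, event structures are formal representations of propositions.
ABC and PROTON are used in knowledge management, EO (Event Ontology) describes the music production process, and Event-Model-F is used in distributed event-based systems.
Although structural event models can describe event information at different levels of complexity, they generally lack formal semantic description and reasoning capability.
Ontology-based event models such as ABC, PROTON, Event Ontology and Event-Model-F describe event semantics well, but they are usually domain-dependent and do not fully meet the needs of news event extraction research.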

13 News Ontology Event Model
Modeling (1) event information, (2) event relations, (3) event media
This is the News Ontology Event Model (NOEM) that we designed. NOEM defines concepts of entities (time, person, location, organization, etc.), events and relationships to capture the temporal, spatial, informational, experiential, structural and causal aspects of events.

14 News Ontology Event Model
Main concepts
Relations

16 Evaluation
Janez Brank et al. classified ontology evaluation methods into four categories:
  (1) comparing the ontology to a “golden standard”;
  (2) using the ontology in an application and evaluating the results;
  (3) comparing with a source of data about the domain to be covered by the ontology;
  (4) evaluation by humans who assess how well the ontology meets a set of predefined criteria, standards and requirements.
To validate the NOEM model, we used three of these four categories of evaluation methods: (1), (2) and (4), i.e., ……

17 Evaluation Comparison between NOEM and existing event models
To evaluate NOEM, we compare it with existing event models. The result shows that our model has a compact structure and strong expressive power, and that it is suitable for the Chinese news domain. For the evaluation of 5W extraction, please see our previous work.

18 Evaluation
Manual labeling
  4 postgraduates
  6000+ Chinese news stories from Xinhua news agency
  Covers 23 top classes and 2082 subclasses of CNML
  In 85% of them, we found a topic sentence which contains the key event of the news
  4/5Ws in the topic sentence can be described by NOEM appropriately
Category code   Category name                                        Subclasses
1               Politics                                             85
2               Law and judiciary                                    76
3               Foreign relations and international relations        72
4               Military                                             129
5               Society, labor, disasters and accidents              105
11              Economy                                              132
12              Economic theory research
13              Capital construction, building industry, real estate 47
14              Agriculture and rural areas                          99
15              Mining and industry                                  239
16              Energy, water services, water conservancy            69
17              Information industry
18              Transportation, postal services, logistics           65
19              Commerce, foreign trade, customs                     55
21              Services and tourism                                 84
22              Environment and meteorology                          43
31              Education                                            63
33              Science and technology                               70
35              Culture, entertainment and leisure                   98
36              Literature and art                                   130
37              Media industry                                       61
38              Medicine and health                                  88
39              Sports                                               68
CNML stands for “Chinese News Markup Language” (中文新闻信息置标语言), a classification standard for Chinese news information.

19 Evaluation: A Case Study
Chinese President Hu Jintao arrived in Canada for a state visit
Result of 5W1H extraction of key event
  <抵达, isTypeof, Movement/Transport>, <胡锦涛, isTypeof, Person>, <8日, isTypeof, Time>, <渥太华, isTypeof, Place> ……
  (抵达: arrive; 胡锦涛: Hu Jintao; 8日: the 8th; 渥太华: Ottawa)
5W1H Extraction
Here is an example of our method on a Chinese news story titled “Chinese President Hu Jintao arrived in Canada for a state visit”, which we use first to show the expressive power of NOEM. The story contains several events: an arrival, a speech delivery, a meeting attendance, a review and a conversation. The key event is the arrival, which belongs to the ACE Movement/Transport event type. The elements (participants) of the key event are first identified, linked to concepts in NOEM and represented in RDF form, and are finally organized into 5W1H form.
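The step from type triples to 5W1H slots can be sketched as a simple lookup. This is an illustration under stated assumptions, not the authors' code: the table `CONCEPT_TO_SLOT` and the function `to_5w1h` are hypothetical, and the slot assigned to each concept is a guess based on the example triples above.

```python
# Hypothetical mapping from ontology concepts to 5W1H slots (illustration only).
CONCEPT_TO_SLOT = {
    "Movement/Transport": "what",   # the event type of the predicate
    "Person": "who",
    "Time": "when",
    "Place": "where",
}


def to_5w1h(triples):
    """Group the subjects of <x, isTypeof, Concept> triples by 5W1H slot."""
    slots = {}
    for subject, predicate, concept in triples:
        if predicate != "isTypeof":
            continue  # only type assertions are used in this sketch
        slot = CONCEPT_TO_SLOT.get(concept)
        if slot is not None:
            slots.setdefault(slot, []).append(subject)
    return slots


# The four triples shown on the slide.
triples = [
    ("抵达", "isTypeof", "Movement/Transport"),
    ("胡锦涛", "isTypeof", "Person"),
    ("8日", "isTypeof", "Time"),
    ("渥太华", "isTypeof", "Place"),
]
print(to_5w1h(triples))
```

Each slot holds a list because a sentence may mention several persons or places; choosing among them is outside the scope of this sketch.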

20 Evaluation: Population of NOEM
Chinese President Hu Jintao arrived in Canada for a state visit
An automatically generated OWL file
Ontology Population
NOEM is defined in Protégé, so the RDF triples produced from the 5W1H elements can be turned automatically into an OWL file using a predefined template, and the event and its elements can then be imported into the ontology as instances to build the event knowledge base. This simple case study shows the applicability and feasibility of NOEM.
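Template-based generation of such a file might look like the sketch below. It only shows the general idea using the Python standard library: the namespace `http://example.org/noem#`, the individual names and the template layout are all hypothetical, since the slides do not show the actual template or the NOEM IRIs.

```python
from string import Template
from xml.sax.saxutils import escape, quoteattr

# Hypothetical namespace; the real NOEM IRI is not given in the slides.
NOEM_NS = "http://example.org/noem#"

HEADER = (
    '<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"\n'
    '         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"\n'
    '         xmlns:owl="http://www.w3.org/2002/07/owl#">\n'
)

# Predefined template for one instance (individual) of an ontology class.
INDIVIDUAL = Template(
    "  <owl:NamedIndividual rdf:about=$about>\n"
    "    <rdf:type rdf:resource=$type_iri/>\n"
    "    <rdfs:label>$label</rdfs:label>\n"
    "  </owl:NamedIndividual>\n"
)


def triples_to_owl(triples):
    """Render <label, isTypeof, Concept> triples as an RDF/XML string."""
    body = []
    for i, (label, _predicate, concept) in enumerate(triples, start=1):
        body.append(INDIVIDUAL.substitute(
            about=quoteattr(f"{NOEM_NS}individual{i}"),  # made-up instance IRI
            type_iri=quoteattr(NOEM_NS + concept),       # class the instance belongs to
            label=escape(label),                         # original surface string
        ))
    return HEADER + "".join(body) + "</rdf:RDF>\n"


owl_text = triples_to_owl([
    ("胡锦涛", "isTypeof", "Person"),
    ("渥太华", "isTypeof", "Place"),
])
print(owl_text)
```

The resulting text could be saved as a UTF-8 file and opened in an ontology editor; whether it imports cleanly depends on the classes actually declared in the target ontology.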

22 Conclusion
Main contributions
  an extensive investigation of “event” and “event modeling”
  the usage of the concept of 5W1H semantic elements in the Chinese news domain
  the design of an ontology-based event model: NOEM
    defining concepts of entities (time, person, location, organization, etc.), events and relationships to capture the temporal, spatial, informational, experiential, structural and causal aspects, e.g. the 5W1H, of an event
Future work
  building a news events knowledge base and a semantic retrieval engine on NOEM to support event-level semantic applications
The main contribution of this paper is that it proposes a novel event semantic understanding framework to facilitate online news browsing. Our future work includes improving the performance of the proposed methods and the precision of the 5W1H event elements extraction algorithm, and building an event knowledge base to support event-level semantic applications.

23 Thank you for your patience!
The End
Q&A

24 Framework A pipeline of three steps and six sub-tasks
(1) Title classification and (2) topic sentence extraction for key event identification;
(3) Semantic role labeling and (4) 5W1H elements identification for event semantic elements extraction;
(5) NOEM definition and (6) ontology population for event knowledge base construction.
This is the framework of our 5W1H extraction method. It has six sub-tasks, which are grouped into three steps. Step 1, key event identification, is achieved by (1) title classification and (2) topic sentence extraction. Step 2, event semantic elements extraction, is achieved by (3) semantic role labeling and (4) 5W1H elements identification. Step 3, event knowledge base construction, is achieved by (5) NOEM definition and (6) ontology population.

25 Publications Please see our previous work for more details
Key Event Extraction
  Wang, W., Zhao, D., Zhao, W.: Identification of topic sentence about key event in Chinese News. Acta Scientiarum Naturalium Universitatis Pekinensis 47(5), 789–796 (2011)
5Ws Extraction
  Wang, W., Zhao, D., Zou, L., Wang, D., Zheng, W.: Extracting 5W1H Event Semantic Elements from Chinese Online News. In: Chen, L., Tang, C., Yang, J., Gao, Y. (eds.) WAIM. LNCS, vol. 6184, pp. 644–655. Springer, Heidelberg (2010)
  Wang, W., Zhao, D., Wang, D.: Chinese news event 5W1H elements extraction using semantic role labeling. In: 3rd ISIP, pp. 484–489 (2010)
Framework
  Wang, W., Zhao, D.: Chinese News Event 5W1H Semantic Elements Extraction for Event Ontology Population. WWW2012 PhD Symposium, Lyon, France (2012)

27 Title Based Key Event Extraction
Input: News document
Output: Topic sentences
Begin
  NLP-based preprocessing:
    Title classification;  // classify the title as informative or non-informative
    Topic words extraction;  // 1) TFIDF; 2) PageRank in word co-occurrence graph
    Title & topic words co-occurrence analysis;  // (1)
  For each sentence do:
    Term frequency scoring;  // (2)
    Sentence location scoring;  // (3)
    Sentence length scoring;  // (4)
    Named entity scoring;  // (5)
    Sentence and title similarity scoring;  // (6)
    Sentence weighting & ranking;  // (8)
  End do
End
For Task 1 (key event identification), we propose TBKEE, i.e., Title Based Key Event Extraction. The details of the algorithm are listed here. Its main idea is to use surface and semantic characteristics of a news story to identify the most important sentence, the one most likely to describe the key event of the story. The features used include term frequency, sentence location, sentence length, named entities, and the similarity between sentence and title. In particular, by analyzing the co-occurrence of title words and topic words, we stress the importance of an informative title when evaluating the topic sentence.
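The per-sentence scoring loop can be illustrated with a small weighted-sum ranker. This is a rough sketch, not TBKEE itself: the weights in `WEIGHTS`, the Jaccard similarity, the normalizations and the choice to favor earlier and longer sentences are all assumptions, and the input is expected to be already tokenized with named entities already tagged.

```python
from collections import Counter

# Hypothetical feature weights; the slides do not give the actual values.
WEIGHTS = {"tf": 0.3, "location": 0.2, "length": 0.1, "entity": 0.2, "title": 0.2}


def jaccard(a, b):
    """Word-overlap similarity between two token lists."""
    sa, sb = set(a), set(b)
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0


def rank_sentences(title, sentences, entities):
    """Return sentence indices ordered from most to least likely topic sentence.

    title: list of tokens; sentences: non-empty list of non-empty token lists;
    entities: set of tokens already tagged as named entities.
    """
    tf = Counter(tok for sent in sentences for tok in sent)
    max_tf = max(tf.values())
    max_len = max(len(sent) for sent in sentences)
    scores = []
    for i, sent in enumerate(sentences):
        features = {
            # average document-level frequency of the sentence's words
            "tf": sum(tf[t] for t in sent) / (len(sent) * max_tf),
            # earlier sentences score higher (assumed preference)
            "location": 1.0 - i / len(sentences),
            # longer sentences score higher (assumed preference)
            "length": len(sent) / max_len,
            # share of tokens that are named entities
            "entity": sum(t in entities for t in sent) / len(sent),
            # similarity between the sentence and the title
            "title": jaccard(sent, title),
        }
        scores.append(sum(WEIGHTS[k] * v for k, v in features.items()))
    return sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)


# Toy English example; real input would be segmented Chinese text.
title = ["hu", "jintao", "arrives", "in", "canada"]
sentences = [
    ["hu", "jintao", "arrives", "in", "ottawa", "canada", "on", "the", "8th"],
    ["he", "delivered", "a", "speech"],
    ["the", "visit", "continues"],
]
print(rank_sentences(title, sentences, {"hu", "jintao", "ottawa", "canada"}))
```

A linear combination keeps each feature's contribution easy to inspect; a learned ranker could replace the fixed weights without changing the interface.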

28 Chinese News Semantic Elements Extraction
Input: Topic sentences
Output: <Subject, Predicate, Object, Time, Location> & How of news
Begin
  For each topic sentence do
    1) NE recognition;  // HMM-based NER tool
    2) NP recognition;  // CRF-based NP tagger
    3) Event identification and classification by verb-driven & SVM;  // What
    4) Syntactic-semantic rules-based <Subject, Predicate, Object> recognition;  // Who did what to whom
    5) Time expressions identification and normalization;  // When
    6) Location identification;  // Where
    7) Topic sentences as short summarization;  // How
  End do
End
For the second step, we design a Chinese News Semantic Elements Extraction method which comprises a series of algorithms. For example, we use an HMM-based NER tool to recognize named entities, a CRF-based NP tagger to recognize noun phrases, a verb-driven & SVM method to identify events, and a syntactic-semantic rules-based algorithm to recognize the event triple <Subject, Predicate, Object>. In the interest of time, please see our previous work for details.
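Sub-step 5, time expression normalization, can be illustrated for the simplest case of explicit 年/月/日 expressions such as “8日”. The sketch below is an assumption-laden example rather than the authors' method: the function `normalize_time`, the rule that missing year and month are inherited from a reference date (for instance the publication date), and the reference date used in the example are all invented for illustration.

```python
import re
from datetime import date

# Optional year, month and day written with 年 / 月 / 日.
_DATE_RE = re.compile(r"(?:(\d{4})年)?(?:(\d{1,2})月)?(?:(\d{1,2})日)?")


def normalize_time(expr, reference):
    """Normalize an explicit Chinese date expression to an ISO-style string.

    Missing year or month is inherited from `reference` (assumed heuristic).
    Returns None for expressions this sketch does not cover.
    """
    m = _DATE_RE.fullmatch(expr)
    if m is None or not any(m.groups()):
        return None
    year = int(m.group(1)) if m.group(1) else reference.year
    month = int(m.group(2)) if m.group(2) else None
    day = int(m.group(3)) if m.group(3) else None
    if day is not None:
        try:
            return date(year, month or reference.month, day).isoformat()
        except ValueError:
            return None  # e.g. the 31st of a 30-day month
    if month is not None:
        return f"{year:04d}-{month:02d}" if 1 <= month <= 12 else None
    return f"{year:04d}"


# The reference date here is made up for the example.
print(normalize_time("8日", date(2005, 9, 9)))
```

Relative expressions such as “昨天” (yesterday) are deliberately left unhandled and return None.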

