National Institute of Informatics, Kiyoko Uchiyama. A Study for Introductory Terms in Logical Structure of Scientific Papers.


1 A Study for Introductory Terms in Logical Structure of Scientific Papers
Kiyoko Uchiyama, National Institute of Informatics

2 Outline
1. Purpose, background, motivation
2. What are “introductory terms”?
3. Analysis of logical structure
4. Analysis of structural role
5. Apply to MIC theory
6. Future work

3 Author-based login

4 Results: the author’s publications, similar papers, and similar researchers

5 Keyword-based login

6 Result of keyword search by cosine similarity
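
The slide only names cosine similarity as the ranking measure for keyword search. As a rough illustration, here is a minimal sketch that assumes each paper is represented by a plain term-frequency vector; the function names, the weighting, and the sample data are hypothetical and not taken from the presented system.

```python
import math
from collections import Counter


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an empty vector is treated as similar to nothing
    return dot / (norm_a * norm_b)


def rank_papers(query_terms, papers):
    """Return (score, title) pairs, highest score first.

    `papers` maps a title to its list of terms (hypothetical input format).
    """
    query = Counter(query_terms)
    scored = [(cosine_similarity(query, Counter(terms)), title)
              for title, terms in papers.items()]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


# Hypothetical sample data, for illustration only.
sample_papers = {
    "P1": ["morphological", "analysis", "model"],
    "P2": ["machine", "translation", "method"],
}
ranking = rank_papers(["morphological", "analysis"], sample_papers)
```

A real system would more likely use weighted vectors (for example tf-idf) than raw counts; the raw counts here only keep the sketch short.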

7 Select seed paper & several viewpoints

8 Jump to CiNii & REO

9 Purpose
Investigate the occurrence of introductory terms in the logical structure of textbooks, research papers, and encyclopedias
Categorize each sentence that includes introductory terms into structural roles
Analyze how introductory terms behave in the Introduction section

10 Background
Many technical terms exist in a specific domain
It is difficult for novices to identify the most important terms in the target field
Novices should learn the basic, necessary terms of the field first

11 Our motivation
Apply to a method for advanced search
Assume that introductory terms
– play an important role in describing domain knowledge
– help novices understand the content of academic papers

12 What are “introductory terms”?
Essential and basic terms for a target field
Terms that should be learned first in a target field
Without the introductory terms, the more difficult terms in the target field are hard to understand

13 [Figure: terms and papers arranged for a novice by introductory degree, from high to low. Labels: Morphological analysis ▼ Syntactic analysis, Semantic analysis; Hidden Markov Model, Conditional Random Field, Maximum entropy model; ChaSen, MeCab, JUMAN, KAKASI; Tutorial paper, Paper A, Paper B, Paper C]

14 Automatic definition
Define introductory terms as the terms selected in common by many experts
Experts in the specific field wrote or edited the following resources
– Textbooks
– Encyclopedia
– Research papers

15 Priority (Frequency)
Authors arrange the contents of their textbooks in an easy-to-understand order
Authors of academic papers put important keywords in the title and the author-assigned keywords
The table of contents of the Encyclopedia is edited by many experts

16 Compositionality
Introductory terms generate various new compound nouns by concatenating single words or word strings in prefix/suffix form
All terms that contain an introductory term are counted in this study
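
The counting rule on this slide can be sketched as follows. This is a minimal illustration, assuming whitespace-separated English glosses (Japanese terms would first need word segmentation); the function names are hypothetical, and the sample terms are the compounds listed on the Compounding slide.

```python
def contains_term(candidate: str, intro: str) -> bool:
    """True if the words of `intro` occur as a contiguous run inside `candidate`."""
    cand = candidate.lower().split()
    base = intro.lower().split()
    n = len(base)
    if n == 0:
        return False  # an empty term matches nothing
    return any(cand[i:i + n] == base for i in range(len(cand) - n + 1))


def count_compounds(candidates, intro_terms):
    """Count, per introductory term, the candidate terms that contain it."""
    counts = {term: 0 for term in intro_terms}
    for cand in candidates:
        for term in intro_terms:
            if contains_term(cand, term):
                counts[term] += 1
    return counts


compounds = [
    "Japanese morphological analysis",
    "Morphological analysis model",
    "Morphological analysis system",
    "Machine translation method",
    "Statistical machine translation",
]
counts = count_compounds(compounds, ["morphological analysis", "machine translation"])
```

Matching on whole words rather than raw substrings avoids counting a term that merely shares characters with an introductory term.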

17 Logical Structure
Distribution patterns across the IMRD structure of the text (Introduction, Method, Result, and Discussion) might be informative for identifying introductory terms
Assume that introductory terms are used frequently in the Introduction section

18 Data set
Target field: NLP; target language: Japanese
Textbooks: 39 textbooks whose titles include “natural language processing”
Encyclopedia: Natural Language Processing Encyclopedia, written in Japanese
Academic papers: 1421 papers of the NLP research group of the Information Processing Society of Japan, from 1993 to 2007

19 Data collection
Morphological analysis of the Japanese text with MeCab
Extract sequential noun strings as term candidates from
– the textbooks (694 types)
– the table of contents of the Encyclopedia (463 types)
– the titles, abstracts, and author-assigned keywords of the papers (13493 types)
90 terms appeared in all three resources
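
The extraction and intersection steps on this slide can be sketched as below. The sketch assumes the text has already been run through a morphological analyzer and is available as (surface, part-of-speech) pairs; it does not call MeCab itself. The tag name "NOUN" and both function names are placeholders, since the actual tag depends on the dictionary in use.

```python
from itertools import groupby


def noun_sequences(tagged_tokens, noun_tag="NOUN", sep=""):
    """Join each maximal run of consecutive nouns into one term candidate.

    `tagged_tokens` is a list of (surface, pos) pairs (hypothetical format).
    `sep` is empty for Japanese; use " " for space-delimited languages.
    """
    terms = []
    for is_noun, run in groupby(tagged_tokens, key=lambda tok: tok[1] == noun_tag):
        if is_noun:
            terms.append(sep.join(surface for surface, _ in run))
    return terms


def common_terms(*resources):
    """Term types that occur in every one of the given resources."""
    if not resources:
        return set()
    return set.intersection(*(set(r) for r in resources))


tagged = [("形態素", "NOUN"), ("解析", "NOUN"), ("を", "PARTICLE"),
          ("行う", "VERB"), ("システム", "NOUN")]
candidates = noun_sequences(tagged)
```

Calling `common_terms(textbook_terms, encyclopedia_terms, paper_terms)` on the three candidate lists then gives the terms shared by all resources, which is the step that yields the 90 terms reported above.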

20 Analysis of Logical Structure
Use the full text of research papers in the NLP field
Target papers that describe experiments and results
Extract 100 papers that include words such as “experiment”, “evaluation”, “precision”, and “%”
Divide the full texts into 6 sections

21
Section         Number of sentences   Sentences including introductory terms   Rate
Abstract        656                   362                                      0.552
Introduction    2448                  1284                                     0.525
Experiment      8931                  2701                                     0.302
Related works   1222                  542                                      0.444
Conclusion      805                   394                                      0.489
Others          11965                 3439                                     0.287
Total           26027                 8722                                     0.376
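
Each rate in the table above is the number of sentences including introductory terms divided by the number of sentences in that section. The sketch below recomputes the rates from the listed counts; the variable names are illustrative. The per-section rates and the two totals (26027 and 8722) reproduce the table, while the overall rate recomputed from those totals is about 0.335 rather than the 0.376 shown, a difference the slides do not explain.

```python
# (number of sentences, sentences including introductory terms) per section,
# copied from the table on this slide.
counts = {
    "Abstract": (656, 362),
    "Introduction": (2448, 1284),
    "Experiment": (8931, 2701),
    "Related works": (1222, 542),
    "Conclusion": (805, 394),
    "Others": (11965, 3439),
}

# Rate per section, rounded to three decimals as in the table.
rates = {section: round(hits / total, 3)
         for section, (total, hits) in counts.items()}

total_sentences = sum(total for total, _ in counts.values())
total_hits = sum(hits for _, hits in counts.values())
overall_rate = round(total_hits / total_sentences, 3)

for section, rate in rates.items():
    print(f"{section:<14}{rate:.3f}")
print(f"{'Total':<14}{overall_rate:.3f}")
```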

22 Analysis of structural role
Extract the sentences including introductory terms from the Introduction
The “Introduction” section has several kinds of sentences outlining the research
Manually categorize each sentence into a structural role
Analyze the sentences from the viewpoint of various features

23 Structural role
1. Hypothesis
2. Motivation Problem
3. Background
4. Goal
5. Object
6. Method (new-old)
7. Experiment
8. Model
9. Observation
10. Result
11. Conclusion
Based on the CoreSC annotation scheme (Soldatova & Liakata, 2007)

24 Features in structural role
Tense, aspect, modality
Verbs
Syntactic features
Lexical features

25 Tense, Aspect
Background
– Recently, morphological analysis has been transitioning from methods based on heuristic knowledge to methods using probabilistic models. (近年、〜しつつある。)
Related Works
– The authors are proposing/proposed a method for morphological analysis using rule-based paraphrasing. (提案している => 提案した)

26 Modality, Verbs
Modality
– A high level of language processing would be needed for assigning semantic features to words. (必要かもしれない)
Verbs
– Specific verbs in the present tense tend to be used in Object
– Ex.: propose, intend, design, tackle

27 Compounding
Morphological analysis
– Japanese morphological analysis
– Morphological analysis model
– Morphological analysis system
Machine translation
– Machine translation method
– Statistical machine translation

28 Syntactic features
Temporal expression (Background)
– Recently (近年、), so far (これまで、)
– Several studies have been done … (研究が行われてきた)
Fixed expression (Motivation, Related works)
– It is inevitable/necessary (〜必要である)
– The research has not been done … (〜の研究は行われていない)
– [Authors] are proposing … (提案している)

29 Lexical features
Keywords related to structural role
– Problem: One of the main problems is that unknown words and new terms have been increasing day by day. It costs a lot of time …
– Experiment: We conducted/proceeded with the experiment. In order to evaluate our proposed method, …
– Result: We show the result of the experiment … We could obtain better precision …
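
The cue expressions on the last few slides suggest a simple keyword heuristic for guessing a structural role. The sketch below is only an illustration of that idea: the cue lists are an assumed selection drawn from the English glosses on the slides, the names are hypothetical, and the categorization in the study itself was done manually.

```python
# Assumed cue phrases per structural role, taken from the English example
# sentences on the slides; a real system would work on the Japanese text.
CUE_PHRASES = {
    "Background": ("recently", "so far", "have been done"),
    "Motivation": ("necessary", "inevitable", "has not been done"),
    "Problem": ("problem",),
    "Object": ("we propose",),
    "Experiment": ("experiment", "evaluate"),
    "Result": ("result", "precision"),
}


def guess_roles(sentence: str) -> list:
    """Return every role whose cue phrase occurs in the sentence.

    Plain substring matching: a sentence may receive several roles or none.
    """
    text = sentence.lower()
    return [role for role, cues in CUE_PHRASES.items()
            if any(cue in text for cue in cues)]


examples = [
    "Recently, morphological analysis has been transitioning to probabilistic models.",
    "One of the main problems is that unknown words have been increasing.",
    "We show the result of the experiment.",
]
guesses = [guess_roles(s) for s in examples]
```

The third example matches both Experiment and Result, which shows why keyword cues alone are ambiguous and why tense, aspect, and modality are also considered as features.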

30 Discussion
Introductory terms are frequently used in sentences that position the proposed method within a target field
Introductory terms and structural roles introduce the basic domain knowledge needed to understand the main purpose of a paper
It may be possible to classify each sentence into a specific structural role automatically

31 Future work
Automatically categorize sentences including introductory terms into structural roles
Analyze the words that collocate with introductory terms
– Syntactic information (subject, object, modifier, and so on)
– Semantic relation between the introductory terms and other terms (objective, method, target)

32 Information types
[Table; columns: contents, information, components of papers. Cells: Semantic information; Intensive expression; Logical structure; Informative expression; Structural role; Syntactic information; Basic expression; Tense, aspect, modality; Introductory terms, author-assigned keywords]

33 Apply to MIC theory
Logical structure consists of structural roles
Authors plan the discourse of their paper around their proposed model/method
MIC theory could be applied at the sentence level and the discourse level
The ordering strategy of structural roles might relate to meta-information

34 Analysis of Hierarchy
Sentence level
– There is no research on [METHOD]
– Basic expression → informative expression
Discourse level
– Background: Recently, [METHOD] has been used in…
– Motivation: We need to consider [METHOD] for morphological analysis
– Objective-New: We propose [METHOD] ← Focus

35 Conclusion
Analyzing introductory terms and their surrounding syntactic and semantic information from the viewpoint of MIC might be of interest (I’m not sure…)
We hope the results of the analysis will contribute to the understanding of academic papers

