TopicTrend By: Jovian Lin Discover Emerging and Novel Research Topics.


1 TopicTrend By: Jovian Lin Discover Emerging and Novel Research Topics

2 Introduction Formulating a research idea is the first step toward success in academia. A worthy research idea must be original and innovative. To come up with innovative research ideas, researchers have to read many published articles, which is time-consuming.

3 “Is there any shortcut to success?” “No.” “There are efficient ways to achieve success” Search Engines in Digital Libraries:

4 Introduction Search engines support information seeking and retrieval. “Search Query” → Search Engine → List of titles (of articles)

5 Search Results

6 Introduction Search engines support information seeking and retrieval. However, is this enough for the junior researcher? Junior researchers: FYP students, 1st-year PhD students. They need to: define a research topic (from zero knowledge); get help with a survey; identify emerging/new research areas to explore; determine related topics. How useful is this result to the junior researcher?

7 Problem Definition Search engines support information seeking and retrieval. Input: a “search query” (e.g. machine learning, DNA, polymerase). Output: a list of titles plus other information, ranked by semantic closeness to the “search query”. However, search engines: cannot help users understand research trends; cannot help users recognize “hot” topics; cannot help users understand how topics interact and influence research activity.

8 Problem Definition Junior researchers want to: understand research topics and trends; recognize HOT topics; understand how topics interact and influence research activity.

9 Problem Definition Junior researchers want to: understand research topics and trends; recognize HOT topics; understand how topics interact and influence research activity. Current Inefficient Method: enter a search query → view results → select a few articles to read → extract new terms from the selected articles.

10 Search Results

11 Information overload !

12 Problem Definition Junior researchers want to: understand research topics and trends; recognize HOT topics; understand how topics interact and influence research activity. Current Inefficient Method: enter a search query → view results → select a few articles to read → extract new terms from the selected articles.

13 Problem Definition Junior researchers want to: understand research topics and trends; recognize HOT topics; understand how topics interact and influence research activity. Desired Efficient Method: enter a search query → view results, consisting of a visualization of the research topics and a list of HOT research topics (related to the search query). Do it quick! TopicTrend

14 Our Solution Enter a search query → view results, consisting of a visualization of the research topics and a list of HOT research topics (related to the search query).

15 Quick Demo

16

17 Evaluation Recruited 4 participants: Chemistry / PhD; Engineering (Transportation) / PhD; Comp Science (AI) / PhD; Engineering / FYP. Participants: tested TopicTrend using queries from their respective domains; rated TopicTrend’s output (w.r.t. their query) [Quantitative]; filled out a questionnaire [Qualitative].

18 Evaluation Example ratings for the query “machine learning”:
Topic    A  B  C  D  E  F  G  H  I  J
Rating   1  0  1  1  1  1  1  1  1  1
Score: 9/10
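The per-query score on this slide can be reproduced with a few lines of Python. This is a minimal sketch that assumes a rating of 1 means the participant accepted a suggested topic and 0 means they did not; the function name `relevance_score` is made up for illustration.

```python
def relevance_score(ratings):
    """Fraction of suggested topics that received a rating of 1."""
    if not ratings:
        raise ValueError("ratings must not be empty")
    return sum(ratings) / len(ratings)

# Ratings for Topic A .. Topic J from the "machine learning" example on the slide.
machine_learning = [1, 0, 1, 1, 1, 1, 1, 1, 1, 1]
score = relevance_score(machine_learning)
print(f"{sum(machine_learning)}/{len(machine_learning)} = {score:.0%}")
```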

19 Evaluation (Quantitative) Average score = 68.125%

20 Evaluation (Qualitative) Questionnaire using a five-point Likert scale (1 = Disagree, 5 = Agree). Some examples: “The system was easy to use.” “The system gave interesting results.” “I was able to get a better understanding of the topics.” “I was able to discover trends.” “I was able to discover relationships between topics.” “I was able to discover potential, novel topics.” Details in Project Report. Example scores: 4.75 / 5, 4 / 5.

21 Conclusion TopicTrend is a visualization tool that helps junior researchers: understand research topics and trends; recognize HOT topics; understand how topics interact and influence research activity. However, results were mediocre, due to the presence of stop phrases (e.g., “problem set”, “proposed model”, etc.).
Solutions and Future Work:
TF-IDF weight: no need to manually enter stop words. A statistical measure of how important a word is. The importance increases in proportion to the number of times a word appears in the document, but is offset by the frequency of the word in the corpus.
Latent Dirichlet Allocation (LDA) (David Blei): view each abstract as a mixture of topics.
Online LDA: finds topics faster than normal LDA; analyzes documents in a stream.
Dynamic Topic Models (DTM): capture the word evolution of each topic over time.
Search by exemplar (instead of search by keyword): benefits users who have difficulty expressing their query.
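To illustrate why TF-IDF could suppress stop phrases without a hand-maintained list, here is a minimal sketch of one common TF-IDF variant (relative term frequency times the natural log of inverse document frequency). It is not TopicTrend's implementation; the sample abstracts and the function name `tf_idf` are assumptions made for the example.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {term: weight} dict per document (each document is a list of terms)."""
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in tf.items()
        })
    return weights

# Hypothetical abstracts, already reduced to candidate phrases.
abstracts = [
    ["proposed model", "topic model", "topic model"],
    ["proposed model", "gene expression"],
    ["proposed model", "traffic flow"],
]
for weights in tf_idf(abstracts):
    print(weights)
```

In this toy corpus "proposed model" occurs in every abstract, so its inverse document frequency is log(1) = 0 and its weight vanishes, while phrases specific to one abstract keep a positive weight.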


23 Thank You

24 Backup Slides

25 Implementation OpenNLP: a machine-learning-based toolkit for processing natural language text. Used OpenNLP to retrieve a list of noun phrases (NPs). An article → OpenNLP Tools (1. Sentence Detection, 2. Tokenization, 3. Part-of-Speech (POS) Tagging, 4. Chunking and Retrieving NPs) → NP A, NP B, NP C, NP D, NP E, NP F

26 Implementation: Sentence Detection
Input: Pierre Vinken, 61 years old, will join the board as a nonexecutive director Nov. 29. Mr. Vinken is chairman of Elsevier N.V., the Dutch publishing group. Rudolph Agnew, 55 years old and former chairman of Consolidated Gold Fields PLC, was named a director of this British industrial conglomerate. Those contraction-less sentences don't have boundary/odd cases...this one does.
Output (one sentence per line):
Pierre Vinken, 61 years old, will join the board as a nonexecutive director Nov. 29.
Mr. Vinken is chairman of Elsevier N.V., the Dutch publishing group.
Rudolph Agnew, 55 years old and former chairman of Consolidated Gold Fields PLC, was named a director of this British industrial conglomerate.
Those contraction-less sentences don't have boundary/odd cases...this one does.

27 Implementation: Tokenization
Input: Pierre Vinken, 61 years old, will join the board as a nonexecutive director Nov. 29. Mr. Vinken is chairman of Elsevier N.V., the Dutch publishing group.
Output: [Pierre] [Vinken] [,] [61] [years] [old] [,] [will] [join] [the] [board] [as] [a] [nonexecutive] [director] [Nov.] [29] [.] [Mr.] [Vinken] [is] [chairman] [of] [Elsevier] [N.V.] [,] [the] [Dutch] [publishing] [group] [.]
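The tokenization shown on this slide can be approximated with a small rule-based sketch. OpenNLP's tokenizer is a trained model, so the code below is only a stand-in that reproduces this one example; the abbreviation list is hand-picked for the sample text and the function name `tokenize` is an assumption.

```python
# Abbreviations whose trailing period belongs to the token (hand-picked for this example).
ABBREVIATIONS = {"Nov.", "Mr.", "N.V."}
PUNCTUATION = ",.;:!?"

def tokenize(text):
    """Split on whitespace, then peel trailing punctuation into separate tokens."""
    tokens = []
    for word in text.split():
        trailing = []
        # Stop peeling as soon as what remains is a known abbreviation or ends in a non-punctuation character.
        while word and word not in ABBREVIATIONS and word[-1] in PUNCTUATION:
            trailing.append(word[-1])
            word = word[:-1]
        if word:
            tokens.append(word)
        tokens.extend(reversed(trailing))
    return tokens

text = ("Pierre Vinken, 61 years old, will join the board as a nonexecutive "
        "director Nov. 29. Mr. Vinken is chairman of Elsevier N.V., the Dutch "
        "publishing group.")
print(" ".join(f"[{token}]" for token in tokenize(text)))
```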

28 Implementation: Part-of-Speech Tagging
Input: Pierre Vinken, 61 years old, will join the board as a nonexecutive director Nov. 29. Mr. Vinken is chairman of Elsevier N.V., the Dutch publishing group.
Output: [NNP] [NNP] [,] [CD] [NNS] [JJ] [,] [MD] [VB] [DT] [NN] [IN] [DT] [JJ] [NN] [NNP] [CD] [.] [NNP] [NNP] [VBZ] [NN] [IN] [NNP] [NNP] [,] [DT] [JJ] [NN] [NN] [.]

29 Implementation: Text Chunking and Extracting NPs Text chunking divides a text into syntactically correlated groups of words. It uses the tokenization and POS tagging data. For example: He reckons the current account deficit will narrow to only # 1.8 billion in September. Becomes: [NP He ] [VP reckons ] [NP the current account deficit ] [VP will narrow ] [PP to ] [NP only # 1.8 billion ] [PP in ] [NP September ].

30 Implementation: Text Chunking and Extracting NPs Text chunking divides a text into syntactically correlated groups of words. It uses the tokenization and POS tagging data. Note the B-Chunk and I-Chunk tags: B- marks the first token of a chunk and I- marks a token inside it.
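Given B-/I- chunk tags, collecting the noun phrases is a single pass over the tokens. The sketch below assumes the chunker has already produced one tag per token; the tags are written by hand to match the bracketed example from the previous slide, and the function name `extract_chunks` is hypothetical.

```python
def extract_chunks(tokens, tags, chunk_type="NP"):
    """Collect token spans tagged B-<chunk_type> followed by any I-<chunk_type> tags."""
    chunks, current = [], []
    for token, tag in zip(tokens, tags):
        if tag == f"B-{chunk_type}":
            # A new chunk starts; close the one in progress, if any.
            if current:
                chunks.append(" ".join(current))
            current = [token]
        elif tag == f"I-{chunk_type}" and current:
            current.append(token)
        else:
            # Any other tag ends the chunk in progress.
            if current:
                chunks.append(" ".join(current))
            current = []
    if current:
        chunks.append(" ".join(current))
    return chunks

tokens = ("He reckons the current account deficit will narrow to only # 1.8 "
          "billion in September .").split()
# Hand-written tags, chosen to agree with the bracketed chunks on the slide.
tags = ["B-NP", "B-VP", "B-NP", "I-NP", "I-NP", "I-NP", "B-VP", "I-VP",
        "B-PP", "B-NP", "I-NP", "I-NP", "I-NP", "B-PP", "B-NP", "O"]
print(extract_chunks(tokens, tags))
```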

31 Implementation OpenNLP: a machine-learning-based toolkit for processing natural language text. Used OpenNLP to retrieve a list of noun phrases (NPs). An article → OpenNLP Tools (1. Sentence Detection, 2. Tokenization, 3. Part-of-Speech (POS) Tagging, 4. Chunking and Retrieving NPs) → NP A, NP B, NP C, NP D, NP E, NP F

32 Implementation An algorithm to calculate the score of an NP (applied to each of NP A to NP F) from its counts (#) in three age brackets.
Example 1: # (0 ~ 2 years) = 10, # (2 ~ 4 years) = 2, # (4 yrs & beyond) = 1. Score = (10 + 1) / (10 + 2 + 1 + 20) = 11 / 33 = 0.333
Example 2: # (0 ~ 2 years) = 1, # (2 ~ 4 years) = 2, # (4 yrs & beyond) = 10. Score = (1 + 1) / (1 + 2 + 10 + 20) = 2 / 33 = 0.061
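The scoring rule can be sketched in Python as follows. The general form, (recent count + 1) / (total count + 20), is inferred from the way the numerator and denominator are written out on this slide rather than stated by it, so the constants and the names `np_score`, `bonus`, and `damping` are assumptions.

```python
def np_score(recent, middle, old, bonus=1, damping=20):
    """(recent + bonus) / (recent + middle + old + damping), as read off the slide."""
    total = recent + middle + old
    return (recent + bonus) / (total + damping)

# Counts per age bracket: (0 ~ 2 years, 2 ~ 4 years, 4 yrs & beyond).
# Both count triples come from the slide; the NP names are placeholders.
counts = {
    "NP A": (10, 2, 1),
    "NP B": (1, 2, 10),
}
ranked = sorted(counts, key=lambda name: np_score(*counts[name]), reverse=True)
for name in ranked:
    print(name, round(np_score(*counts[name]), 3))
```

Under this reading, an NP that appears mostly in recent articles scores higher than one with the same total count concentrated in older articles, which is what lets the list be re-ranked toward emerging topics.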

33 An algorithm to calculate the score of a NP. Implementation NP A NP B NP C NP D NP E NP F

34 Implementation Re-rank the list of NPs based on the score. Original list: NP A, NP B, NP C, NP D, NP E, NP F. Re-ranked list (New!): NP B, NP D, NP E, NP C, NP A, NP F.

35 Implementation Calculate the relationship strength between NPs by counting the articles (PIIs) that they have in common. The more articles two NPs have in common, the thicker the edge between them.
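One way to compute these edge strengths is to intersect the sets of article identifiers for each pair of NPs. This is a minimal sketch under that assumption; the data, the identifier format, and the function name `edge_weights` are invented for illustration.

```python
from itertools import combinations

def edge_weights(np_to_articles):
    """Map each pair of NPs to the number of article IDs they share (pairs sharing none are dropped)."""
    weights = {}
    for a, b in combinations(sorted(np_to_articles), 2):
        shared = len(np_to_articles[a] & np_to_articles[b])
        if shared:
            weights[(a, b)] = shared
    return weights

# Hypothetical data: each NP mapped to the set of article identifiers (PIIs) it appears in.
np_to_articles = {
    "NP A": {"pii-1", "pii-2", "pii-3"},
    "NP B": {"pii-2", "pii-3"},
    "NP C": {"pii-3", "pii-4"},
}
print(edge_weights(np_to_articles))
```

The resulting counts could then be mapped to line thickness when drawing the graph.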

36 The End

