Wikitology: A Wikipedia Derived Knowledge Base. Zareen Syed. Advisor: Dr. Tim Finin. February 6th, 2009.



2 October 2015, Page 2: Outline
- Introduction and Motivation
- Related Work
- Proposed Work
- Timeline
- Work Progress
- Conclusion

Page 3: Introduction
Wikipedia:
- Encyclopedia
- Developed collaboratively
- Freely available online
- Millions of articles; English Wikipedia has 2,723,767 articles
- Multiple languages (more than 260)
- Structured and unstructured content

Page 4: Introduction
Wikipedia content and organization:
- Article text
- Categories and category hierarchy
- Inter-article links
- Info-boxes
- Disambiguation pages
- Redirection pages
- Talk pages
- History pages
- Meta-data

Page 5: Motivation
Challenges:
- Human-understandable content (not machine readable)
- How to make it more structured and organized to improve machine readability
- How to automatically exploit the knowledge in Wikipedia to solve some real-world problems

Page 6: Thesis Statement
We can exploit Wikipedia and other related knowledge sources to automatically create knowledge about the world supporting a set of common use cases such as:
- Concept Prediction
- Information Retrieval
- Information Extraction

Page 7: Proposed Contributions
- Developing a novel hybrid knowledge base composed of structured, semi-structured and unstructured information extracted from Wikipedia and other related sources
- Developing novel application-specific algorithms for exploiting the hybrid knowledge base
- Task-based evaluation of the system on common use cases such as Concept Prediction, Information Retrieval and Information Extraction


Page 9: Related Work
Information Extraction:
- Relation extraction [35]
- Co-reference resolution [25]
- Named Entity Classification [52]
Natural Language Processing:
- Automatic word sense disambiguation [27]
- Searching synonyms [28]

Page 10: Related Work
Information Retrieval:
- Text categorization [24]
- Computing semantic relatedness [30,31,32]
- Predicting document topics [26]
- Search engine [69]
Semantic Web:
- DBPedia [46]
- Semantic MediaWiki [46]
- Linked Open Data Project [23]
- Freebase [22]


Page 12: Proposed Work
- Refining, enriching and exploiting structured content in Wikipedia
- Integrating other related knowledge sources
- Developing application-specific algorithms
- Developing a dynamic and scalable architecture

Page 13: Issues
- Single document in too many categories: George W. Bush is included in about 30 categories.
- Links between articles belonging to very different categories: John F. Kennedy has a link to "coincidence theory", which belongs to Mathematical Analysis / Topology / Fixed Points.
- Number of articles within a category: some categories are under-represented, whereas others have many articles.
- Administrative categories, e.g. "Clean up from Sep 2006" and "Articles with unsourced statements".
- Links to words in an article: e.g. if the term "United States" appears in the document, it might be linked to the page on "United States".

Page 14: Issues
Category hierarchy:
- Multiple parents (thesaurus)
- Noisy: an "Animals" category is defined in the sub-tree rooted at "People"
- Loose subsumption, for example:
  Geography -> Geography by place -> Regions -> Regions of Asia -> Middle East -> Dances of Middle East
  Events -> Events by year -> Lists of leaders by year

Page 15: Refining, Enriching and Exploiting Structured Content in Wikipedia
Category hierarchy:
- Filtering out administrative categories
- Algorithms for selecting and ranking categories
- Inferring and labeling semantic relations between categories
- Refining subsumption (taxonomy)
- Instance-of relation
- Using information in Wikipedia Lists_of_Topics
- Using specific administrative categories
Status label on the slide: Done (the transcript does not preserve which items it marks).
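The first item, filtering out administrative categories, can be illustrated with a minimal Python sketch. It assumes that administrative categories can be recognized by name patterns; the two patterns below are hypothetical and are derived only from the two examples named on the Issues slide ("Clean up from Sep 2006", "Articles with unsourced statements"). The slides do not say how Wikitology actually performs this filtering.

```python
import re

# Hypothetical name patterns, based only on the two example categories
# mentioned in the slides; a real filter would need a longer, curated list.
ADMIN_PATTERNS = [
    re.compile(r"^clean ?up\b", re.IGNORECASE),
    re.compile(r"^articles (with|needing|lacking)\b", re.IGNORECASE),
]


def is_administrative(category: str) -> bool:
    """Return True if the category name matches an administrative pattern."""
    name = category.replace("_", " ").strip()
    return any(pattern.search(name) for pattern in ADMIN_PATTERNS)


def filter_categories(categories):
    """Keep only the categories that do not look administrative."""
    return [c for c in categories if not is_administrative(c)]


example = [
    "Clean up from Sep 2006",
    "Articles with unsourced statements",
    "Presidents of the United States",
    "Living people",
]
kept = filter_categories(example)
print(kept)
```

A pattern list keeps the rule easy to audit, at the cost of missing administrative categories whose names follow no fixed pattern.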

Page 16: Refining, Enriching and Exploiting Structured Content in Wikipedia
Inter-article links:
- Problem: links do not imply semantic relatedness (links to locations, term definitions, dates, entities)
- Possible solutions: classifying link types; introducing link weights
Status label on the slide: Done.

Page 17: Refining, Enriching and Exploiting Structured Content in Wikipedia
- Redirection pages
- Disambiguation pages

Page 18: Proposed Work
Exploring other structured content:
- Talk pages, user pages, history pages and meta-data
Other structured resources:
- Integrating structured information from other sources like DBpedia and Freebase in Wikitology
- How and when to employ reasoning over the RDF triples

Page 19: Proposed Work
Developing novel application-specific algorithms on top of the hybrid Wikitology knowledge base for applications such as:
- Concept Prediction
- Information Retrieval
- Information Extraction

Page 20: Proposed Work
Evaluation. Main approaches to evaluating ontologies:
- Gold standard evaluation (comparison to an existing ontology)
- Criteria-based evaluation (by humans)
- Task-based evaluation (application based)
- Comparison with source of data (data driven)
- Using a reasoning engine
Our approach to evaluation:
- Task-based evaluation (application based)

Page 21: Wikitology Overview
[Architecture diagram. Components shown: IR Index, Relational Database, Triple Store, RDF Reasoner, Page Links Graph, Category Links Hierarchical Graph, Articles, Wikitology Code, Application Specific Algorithms.]


Page 23: Time Line
No. | Milestone | Expected completion date
1 | Enriching Wikitology by extracting additional information from Wikipedia | May, [year missing in transcript]
2 | Studying other related knowledge sources in detail, such as Freebase, DBPedia, YAGO etc. | May, [year missing in transcript]
3 | Incorporating additional knowledge sources to enrich Wikitology | May, [year missing in transcript]
4 | Working on techniques to improve applications in Information Retrieval and Information Extraction using additional features generated from Wikitology | Dec, [year missing in transcript]
5 | Evaluating the Wikitology knowledge base | May, [year missing in transcript]
6 | Thesis write-up | Aug, 2010


Page 25: Work Done
- Case Study 1: Concept Prediction
- Case Study 2: Document Expansion for Information Retrieval
- Case Study 3: Named Entity Classification
- Case Study 4: Co-reference Resolution
- Case Study 5: Concept Based Features for Information Retrieval (In Progress)

Page 26: Case Study 1. Concept Prediction [2]
- Problem: predict the individual document topics as well as concepts common to a set of documents
- Approach: hybrid knowledge base (Wikitology 1.0); algorithms for selecting and aggregating terms

Page 27: Wikitology 1.0
Wikipedia as an ontology:
- Each article is a concept in the ontology
- Terms are linked via Wikipedia's category system and inter-article links
- It is a consensus ontology created, kept current and maintained by a diverse community
- Overall content quality is high
- Terms have unique IDs (URLs) and are "self describing" for people

Page 28: Wikitology 1.0
Structured data:
- Specialized concepts (article titles)
- Generalized concepts (category titles)
- Inter-category and inter-article links as relations between concepts
- Article-category links as relations between specialized and generalized concepts
Unstructured data:
- Article text (a way to map ontology terms to free text)
Algorithms:
- Algorithms to select, rank and aggregate concepts using the hybrid knowledge base

Pages 29-31: Method 1. Using Wikipedia Article Text and Categories to Predict Concepts
[Diagram, built up over three slides.]
- Input: query doc(s)
- Find similar Wikipedia articles using cosine similarity
- Follow the similar articles into the Wikipedia category graph
- Rank categories by (1) links and (2) cosine similarity
- Output: ranked categories
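The flow of Method 1 can be sketched in Python as follows. This is a toy illustration under stated assumptions, not the Wikitology implementation: it uses raw term-frequency vectors instead of an IR index, the articles and category assignments are made-up data, and the two rankings (number of links from the retrieved articles, and summed cosine score) are one plausible reading of "Rank Categories 1. Links 2. Cosine similarity".

```python
import math
from collections import Counter, defaultdict


def tf_vector(text):
    """Bag-of-words term-frequency vector (assumption: no stemming or IDF)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def predict_categories(query, articles, article_categories, top_n=2):
    """Return categories ranked by link count and by summed cosine score."""
    q = tf_vector(query)
    scored = sorted(
        ((cosine(q, tf_vector(text)), title) for title, text in articles.items()),
        reverse=True,
    )[:top_n]
    links = Counter()
    score_sum = defaultdict(float)
    for score, title in scored:
        for category in article_categories.get(title, []):
            links[category] += 1          # ranking 1: number of linking articles
            score_sum[category] += score  # ranking 2: summed cosine similarity
    by_links = sorted(links, key=lambda c: (-links[c], c))
    by_score = sorted(score_sum, key=lambda c: (-score_sum[c], c))
    return by_links, by_score


# Hypothetical toy data, not taken from Wikipedia.
articles = {
    "Turing_machine": "abstract machine model of computation tape symbols",
    "Halting_problem": "decide whether a machine computation halts",
    "Bombe": "device used to break enigma codes",
}
article_categories = {
    "Turing_machine": ["Theory_of_computation", "Alan_Turing"],
    "Halting_problem": ["Theory_of_computation"],
    "Bombe": ["Cryptography", "Alan_Turing"],
}
by_links, by_score = predict_categories(
    "a machine model of computation", articles, article_categories
)
print(by_links)
print(by_score)
```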

Page 32: Method 2. Using Spreading Activation on Category Links Graph to get Aggregated Concepts
[Diagram.]
- Input: query doc(s)
- Find similar Wikipedia articles using cosine similarity
- Run spreading activation (input function, output function) on the Wikipedia category graph
- Output: ranked concepts based on final activation score

Page 33: Method 3. Using Spreading Activation on Article Links Graph
[Diagram.]
- Input: query doc(s)
- Run spreading activation (node input function, node output function) on the Wikipedia article links graph
- Threshold: ignore spreading activation to articles with less than 0.4 cosine similarity score
- Edge weights: cosine similarity between linked articles
- Output: ranked concepts based on final activation score
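A minimal Python sketch of spreading activation with a threshold and weighted edges is given below. The slides do not define the node input and output functions, so the sketch assumes the simplest forms: a node's input is the weighted sum of activation arriving over its incoming edges, and its output is the activation it received in the previous pulse. The threshold is applied to the edge weight (cosine similarity between linked articles), which is one reading of the slide; the number of pulses and the toy graph are hypothetical.

```python
def spread_activation(graph, seeds, pulses=2, threshold=0.4):
    """Spread activation over a weighted, directed link graph.

    graph: {node: {neighbor: edge_weight}}
    seeds: {node: initial_activation}
    Returns (node, activation) pairs sorted by decreasing activation.
    """
    activation = dict(seeds)
    frontier = dict(seeds)  # nodes that fire in the next pulse
    for _ in range(pulses):
        incoming = {}
        for node, output in frontier.items():
            for neighbor, weight in graph.get(node, {}).items():
                if weight < threshold:
                    continue  # assumed reading of the 0.4 cut-off
                incoming[neighbor] = incoming.get(neighbor, 0.0) + output * weight
        for node, value in incoming.items():
            activation[node] = activation.get(node, 0.0) + value
        frontier = incoming
    return sorted(activation.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical article links graph with made-up cosine similarities.
graph = {
    "Alan_Turing": {"Turing_machine": 0.8, "Bombe": 0.5, "London": 0.1},
    "Turing_machine": {"Halting_problem": 0.7},
    "Bombe": {},
}
ranked = spread_activation(graph, {"Alan_Turing": 1.0})
for node, score in ranked:
    print(f"{node}: {score:.2f}")
```

In this toy run the weakly linked "London" node never receives activation, while "Halting_problem" is reached in the second pulse through "Turing_machine". The same loop applies to the category links graph of Method 2 if the graph argument holds category links instead of article links.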

Page 34: Wikitology 1.0
- The system was evaluated by predicting the categories and article links of existing Wikipedia articles and comparing the predictions with the ground truth.
- Wikitology 1.0 predicted document topics and common concepts with high accuracy when the article concepts were well represented within Wikipedia.

Page 35: Case Study 2. Document Expansion with Wikipedia Derived Ontology Terms [21]*
Preliminary work with TREC documents.
Example document, FT (3/9/92): "... Alan Turing, described as a brilliant mathematician and a key figure in the breaking of the Nazis' Enigma codes. Prof IJ Good says it is as well that British security was unaware of Turing's homosexuality, otherwise he might have been fired 'and we might have lost the war'. In 1950 Turing wrote the seminal paper 'Computing Machinery And Intelligence', but in 1954 killed himself ..."
Wikipedia concepts shown for the document: Turing_machine, Turing_test, Church_Turing_thesis, Halting_problem, Computable_number, Bombe, Alan_Turing, Recursion_theory, Formal_methods, Computational_models, Theory_of_computation, Theoretical_computer_science, Artificial_Intelligence
[Chart: IR Effectiveness Using Wikipedia Concepts. Series: base, base + rf, concepts + rf.]
* In collaboration with Paul McNamee, Johns Hopkins University Applied Physics Laboratory

Page 36: Case Study 3. Named Entity Classification
- Semi-automated generation of training data: persons, locations and events
- Experimenting with different feature sets
- Inter-article link labeling
- Results showing accuracy obtained using different feature sets

Page 37: Case Study 4. Cross Document Entity Co-reference Resolution [21]*
Problem: to determine whether various named people, organizations or relations from different documents refer to the same object in the world. For example, does the "Condoleezza Rice" mentioned in one document refer to the same person as the "Secretary Rice" from another?
* In collaboration with Johns Hopkins University Human Language Technology Center of Excellence

Page 38: Entity Document (EDOC)
Entity documents capture information about entities extracted from documents, including mention strings, type and subtype, and text surrounding the mentions.
Example EDOC:
  ABC LDC2000T44-E2
  Webb Hubbell
  PER
  Individual
  NAM: "Hubbell" "Hubbells" "Webb Hubbell" "Webb_Hubbell"
  NOM: "Mr. " "friend" "income"
  PRO: "he" "him" "his"
  abc's accountant after again ago all alleges alone also and arranged attorney avoid been before being betray but came can cat charges cheating circle clearly close concluded conspiracy cooperate counsel counsel's department did disgrace do dog dollars earned eightynine enough evasion feel financial firm first four friend friends going got grand happening has he help him his hope house hubbell hubbells hundred hush income increase independent indict indicted indictment inner investigating jackie jackie_judd jail jordan judd jury justice kantor ken knew lady late law left lie little make many mickey mid money mr my nineteen nineties ninetyfour not nothing now office other others paying peter_jennings president's pressure pressured probe prosecutors questions reported reveal rock saddened said schemed seen seven since starr statement such tax taxes tell them they thousand time today ultimately vernon washington webb webb_hubbell were what's whether which white whitewater why wife years

Page 39: Wikitology 2.0 Enhancements
Structured data:
- Specialized concepts (article titles)
- Generalized concepts (category titles)
- Inter-category and inter-article links as relations between concepts
- Article-category links as relations between specialized and generalized concepts
- YAGO types (to identify entity type)
- Table with disambiguation set (to identify highly confused entities)
- Aliases using redirect pages
Unstructured data:
- Article text
- Redirect titles (added to article text)

Page 40: Wikitology 2.0 Data Structures
Lucene index:
- Concept title + redirected titles (field)
- Article text + redirected titles (field)
- RDF field with entity type (YAGO type)
Graphs:
- Category links graph
- Article links graph
- Article-category links
Tables:
- Disambiguation set derived from disambiguation pages

Page 41: Wikitology 2.0 Custom Query Front End
[Diagram mapping EDOC parts to Wikitology index fields.]
- The EDOC's name mention strings -> Wikitology's title field, with slightly higher weight given to the longest mention, i.e., "Webb Hubbell"
- The EDOC type -> RDF field: YAGO type
- Name mention strings + contextual text -> text (Wikitology article contents)
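The mapping from an entity document to index fields can be sketched as a small query builder that emits a string in Lucene's classic query syntax (`field:"phrase"`, `^boost`, `field:(terms)`). The field names `title`, `type` and `text`, the boost value, and the direct use of "PER" as the type value are assumptions for illustration; the slides do not give the actual field names, the weight, or the mapping from EDOC types to YAGO types.

```python
def build_query(mentions, entity_type, context_words, longest_boost=2.0):
    """Build a Lucene-style query string from parts of an entity document.

    Assumes mentions contain no double quotes or other characters that
    would need escaping in the query syntax.
    """
    longest = max(mentions, key=len)  # first mention of maximal length
    title_clauses = []
    for mention in mentions:
        clause = f'title:"{mention}"'
        if mention == longest:
            clause += f"^{longest_boost}"  # slightly higher weight
        title_clauses.append(clause)
    type_clause = f'type:"{entity_type}"'
    text_terms = [f'"{m}"' for m in mentions] + list(context_words)
    text_clause = "text:(" + " ".join(text_terms) + ")"
    return " ".join(title_clauses + [type_clause, text_clause])


query = build_query(
    mentions=["Hubbell", "Hubbells", "Webb Hubbell", "Webb_Hubbell"],
    entity_type="PER",
    context_words=["attorney", "whitewater"],
)
print(query)
```

Keeping the three parts as separate field clauses lets the index score title matches, type matches and context matches independently, which is the point of the hybrid query shown on the slide.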

Page 42: Wikitology Features
Each entity document is tagged by Wikitology, producing vectors of article and category tags. Note the clear match with a known person in Wikipedia.
Article vector for ABC LDC2000T44-E:
- Webster_Hubbell
- Hubbell_Trading_Post_National_Historic_Site
- United_States_v._Hubbell
- Hubbell_Center
- Whitewater_controversy
Category vector for ABC LDC2000T44-E:
- Clinton_administration_controversies
- American_political_scandals
- Living_people
- _births
- People_from_Arkansas
- Arkansas_politicians
- American_tax_evaders
- Arkansas_lawyers

Page 43: Features Derived from Wikitology 2.0
Twelve features were computed for each pair of entities using Wikitology, seven aimed at measuring their similarity and five at measuring their dissimilarity.
Name | Range | Type | Description
APL20WAS | {0,1} | sim | 1 if the top article tags for the two entities are identical, 0 otherwise
APL21WCS | {0,1} | sim | 1 if the top category tags for the two entities are identical, 0 otherwise
APL22WAM | [0..1] | sim | The cosine similarity of the medium-length article vectors (N=5) for the two entities
APL23WcM | [0..1] | sim | The cosine similarity of the medium-length category vectors (N=4) for the two entities
APL24WAL | [0..1] | sim | The cosine similarity of the long article vectors (N=8) for the two entities
APL31WAS2 | [0..1] | sim | Match of the entities' top Wikitology article tag, weighted by avg(score1, score2)
APL32WCS2 | [0..1] | sim | Match of the entities' top Wikitology category tag, weighted by avg(score1, score2)
APL26WDP | {0,1} | dissim | 1 if both entities are of type PER and their top article tags are different, 0 otherwise
APL27WDD | {0,1} | dissim | 1 if the two top article tags are members of the same disambiguation set, 0 otherwise
APL28WDO | {0,1} | dissim | 1 if both entities are of type ORG and their top article tags are different, 0 otherwise
APL29WDP2 | [0..1] | dissim | Match if both entities are of type PER and their top article tags are different, weighted by 1-abs(score1-score2), 0 otherwise
APL30WDP2 | [0..1] | dissim | Match if both entities are of type ORG and their top article matches are different organizations, weighted by 1-abs(score1-score2), 0 otherwise
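A few of these pairwise features can be sketched in Python. The sketch assumes each entity is represented by a dictionary from article tag to a relevance score in [0, 1], which is a hypothetical representation; the short feature names, the entity records and the scores are made up for illustration, and only the feature definitions follow the table.

```python
import math


def cosine(a, b):
    """Cosine similarity between two sparse tag-score vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_n(vector, n):
    """Keep only the n highest-scoring tags of a vector."""
    best = sorted(vector.items(), key=lambda kv: (-kv[1], kv[0]))[:n]
    return dict(best)


def pair_features(e1, e2):
    """Compute a subset of the similarity and dissimilarity features."""
    a1, a2 = e1["articles"], e2["articles"]
    top1 = max(a1, key=a1.get)
    top2 = max(a2, key=a2.get)
    same_top = top1 == top2
    both_per = e1["type"] == "PER" and e2["type"] == "PER"
    return {
        "WAS": 1 if same_top else 0,
        "WAM": cosine(top_n(a1, 5), top_n(a2, 5)),
        "WAL": cosine(top_n(a1, 8), top_n(a2, 8)),
        "WAS2": (a1[top1] + a2[top2]) / 2 if same_top else 0.0,
        "WDP": 1 if both_per and not same_top else 0,
    }


# Hypothetical entity records with made-up tag scores.
e1 = {"type": "PER", "articles": {"Webster_Hubbell": 0.9, "Whitewater_controversy": 0.3}}
e2 = {"type": "PER", "articles": {"Webster_Hubbell": 0.7, "United_States_v._Hubbell": 0.2}}
e3 = {"type": "PER", "articles": {"Hubbell_Center": 0.6}}

same = pair_features(e1, e2)
different = pair_features(e1, e3)
print(same)
print(different)
```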

Page 44: Evaluation
Evaluation results for the cross-document entity co-reference task using Wikitology features.
match | TP rate | FP rate | Precision | Recall | F-Measure
yes | [values missing in transcript]
no | [values missing in transcript]
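The column headings follow the standard definitions for a binary classifier, which can be written out in Python as below. The confusion counts in the example are hypothetical and are not the results from the slide, whose values did not survive in the transcript.

```python
def binary_metrics(tp, fp, tn, fn):
    """Standard per-class metrics from confusion-matrix counts."""
    tp_rate = tp / (tp + fn) if tp + fn else 0.0   # identical to recall
    fp_rate = fp / (fp + tn) if fp + tn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp_rate
    if precision + recall:
        f_measure = 2 * precision * recall / (precision + recall)
    else:
        f_measure = 0.0
    return {
        "TP rate": tp_rate,
        "FP rate": fp_rate,
        "Precision": precision,
        "Recall": recall,
        "F-Measure": f_measure,
    }


# Hypothetical counts for the "yes" (co-referent) class.
metrics = binary_metrics(tp=8, fp=2, tn=88, fn=2)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

The row for the "no" class uses the same formulas with the roles of the two classes swapped.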

Page 45: Case Study 5. Feature Generation to Improve Information Retrieval Performance*
- Incorporating generalized concept features in the MORAG [69] search engine
* Work being done during internship at RiverGlass Company

Page 46: Incorporating Wikitology-Based Features in the MORAG Search Engine
MORAG search engine:
- Concept features generated using Wikipedia (ESA)
- Feature selection using pseudo-relevance feedback
- Merged ranking of concept scores and BOW scores


Page 48: Thesis Statement
We can exploit Wikipedia and other related knowledge sources to automatically create knowledge about the world supporting a set of common use cases such as:
- Concept Prediction
- Information Retrieval
- Information Extraction

Page 49: Proposed Contributions
1. Developing a novel hybrid knowledge base composed of structured and unstructured information extracted from Wikipedia and other related sources
- Wikitology 1.0
- Wikitology 2.0

Page 50: Proposed Contributions
2. Developing novel application-specific algorithms for exploiting the hybrid knowledge base. Methods for:
- Concept Prediction: ranking methods and spreading activation
- Co-reference Resolution: novel entity representation and hybrid querying
- Information Retrieval: document expansion, generalized concept features augmentation

Page 51: Proposed Contributions
3. Task-based evaluation of the system on common use cases such as Concept Prediction, Information Retrieval and Information Extraction
- Metrics: precision and recall

Page 52: The End
Thank you. Questions?