Social Knowledge Mining

A systematic review of tools and technologies
By Shayne Weerakoon

Social Knowledge
What is social knowledge? Social knowledge is the collective body of knowledge produced by your immediate community or social circle.

How Is Social Knowledge Shared?
The Web 2.0 era has brought about many changes in how the internet is used, most notably through social media sites such as Facebook and Twitter. As a result, the volume of user-generated content (UGC) available online keeps rising, and the amount of social knowledge shared rises proportionately with it. Question-and-answer platforms are growing in popularity as well: Stack Overflow saw a 600% rise in users over the past few years.

Introduction to Knowledge Extraction
What is knowledge extraction? "The processing of natural language text and to retrieve occurrences of a particular class of objects or events and occurrences of relationships among them" (Russell & Norvig).
The primary types of information extraction are:
1. Extraction from unstructured sources
2. Extraction from structured sources

Extraction from Unstructured Sources
There have been several attempts to extract knowledge from text, leading to many approaches. These approaches can be grouped as follows:
1. Traditional Information Extraction
2. Automatic Content Extraction (ACE)
3. Ontology-Based Information Extraction (OBIE)

Traditional Information Extraction
Traditional IE was born out of the early Message Understanding Conferences (MUC). It comprises five steps:
1. Named Entity Recognition (NER)
2. Coreference Resolution (CO)
3. Template Element Construction
4. Template Relation Construction
5. Scenario Template Production

Named Entity Recognition
Identifies proper nouns. It is the simplest task and achieves more than 90% accuracy, but it is domain dependent.
Coreference Resolution
Relates noun phrases in the text to real-world entities by establishing identity of reference between markables. Markables can be definite noun phrases, demonstrative noun phrases, proper names, appositives, sub-noun phrases that act as modifiers, pronouns, and so on.
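
The alias-matching side of coreference resolution can be sketched in a few lines. This is a minimal illustration that assumes only proper-name markables and two hypothetical heuristics: acronym match and word-prefix match. It does not handle pronouns, definite noun phrases, or the other markable types listed above, which real systems resolve with much richer features.

```python
from typing import Dict, List


def is_acronym_of(short: str, full: str) -> bool:
    """True if `short` equals the initials of the capitalised words in `full`."""
    initials = "".join(w[0] for w in full.split() if w[0].isupper())
    return short.isupper() and short == initials


def resolve_coreference(mentions: List[str]) -> Dict[str, int]:
    """Assign each proper-name mention a cluster id; aliases share an id.

    Hypothetical heuristics (illustrative only): exact match, acronym match,
    or one mention being a word-prefix of the other ("Acme" / "Acme Corp.").
    """
    clusters: Dict[str, int] = {}
    representatives: List[str] = []  # first mention seen for each cluster
    for mention in mentions:
        if mention in clusters:
            continue
        for cid, rep in enumerate(representatives):
            m_words, r_words = mention.split(), rep.split()
            prefix = (m_words == r_words[:len(m_words)]
                      or r_words == m_words[:len(r_words)])
            if prefix or is_acronym_of(mention, rep) or is_acronym_of(rep, mention):
                clusters[mention] = cid
                break
        else:  # no existing cluster matched: start a new one
            clusters[mention] = len(representatives)
            representatives.append(mention)
    return clusters
```

For example, `resolve_coreference(["International Business Machines", "Acme Corp.", "IBM", "Acme"])` puts "IBM" in the same cluster as its expansion and "Acme" in the same cluster as "Acme Corp.".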

Template Element Construction
Extracts information related to person or organization entities, drawing evidence from anywhere in the text. In essence, it adds descriptive information to the NER results using coreference (CO). It is domain dependent.
Template Relation Construction
Identifies relations between the template elements found in the previous task, for example an employee relationship between a person and a company, or a family relationship between two persons. This is a central feature of almost any information extraction task.
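
The employee example can be sketched as template element and template relation construction driven by a single pattern. This sketch assumes one hypothetical appositive pattern ("<Person>, <title> of <Organization>"). The title fills the descriptor slot, standing in for the descriptive information added to the NER output. Real MUC systems combined many patterns with coreference to gather evidence from across the whole text.

```python
import re
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class TemplateElement:
    name: str
    kind: str             # "PERSON" or "ORGANIZATION"
    descriptor: str = ""  # descriptive information, e.g. a job title


@dataclass
class TemplateRelation:
    relation: str
    arg1: TemplateElement
    arg2: TemplateElement


# Hypothetical pattern: "<First Last>, <lower-case title> of <Capitalised Org>"
APPOSITION = re.compile(
    r"(?P<person>[A-Z][a-z]+ [A-Z][a-z]+), (?P<title>[a-z ]+?) of "
    r"(?P<org>[A-Z][\w.]*(?: [A-Z][\w.]*)*)"
)


def build_templates(text: str) -> Tuple[List[TemplateElement], List[TemplateRelation]]:
    """Create PERSON/ORGANIZATION elements and an EMPLOYEE_OF relation per match."""
    elements: List[TemplateElement] = []
    relations: List[TemplateRelation] = []
    for m in APPOSITION.finditer(text):
        person = TemplateElement(m["person"], "PERSON", m["title"])
        org = TemplateElement(m["org"], "ORGANIZATION")
        elements += [person, org]
        relations.append(TemplateRelation("EMPLOYEE_OF", person, org))
    return elements, relations
```

On "Yesterday John Smith, chief executive of Acme Corp., resigned." this yields one EMPLOYEE_OF relation linking "John Smith" (descriptor "chief executive") to "Acme Corp.".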

Scenario Template Production
Uses all the results from the previous tasks to extract pre-specified event information, then relates that event information to the particular entities involved in the event. It is one of the more difficult IE tasks, with only 60% accuracy compared with 80% for humans.
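
Scenario template production can be sketched as attaching an event to relations that earlier steps already produced. This sketch assumes a management-succession scenario (the MUC-6 scenario) and a hypothetical four-word trigger lexicon. `fill_scenario` and `SuccessionEvent` are illustrative names and do not come from any system described here.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical trigger lexicon for a management-succession scenario.
TRIGGERS = {"resigned": "OUT", "retired": "OUT", "appointed": "IN", "named": "IN"}


@dataclass
class SuccessionEvent:
    person: str
    organization: str
    direction: str  # "IN" (takes the post) or "OUT" (leaves it)


def fill_scenario(sentence: str,
                  employee_of: List[Tuple[str, str]]) -> List[SuccessionEvent]:
    """Relate trigger words in `sentence` to (person, organization) pairs
    produced earlier by template relation construction."""
    events: List[SuccessionEvent] = []
    words = {w.strip(".,").lower() for w in sentence.split()}
    for trigger, direction in TRIGGERS.items():
        if trigger not in words:
            continue
        for person, org in employee_of:
            if person in sentence:  # the event involves this entity
                events.append(SuccessionEvent(person, org, direction))
    return events
```

The design choice is that this step reuses the previous steps' output rather than re-reading the text from scratch, which matches the dependency described on the slide.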

Automatic Content Extraction
ACE is the immediate successor of traditional IE. Like traditional IE, it is organized as a sequence of steps, but its data annotation offered a unique approach to cross-domain extraction. It comprises four steps, which occur simultaneously, including:
1. Entity Detection and Tracking (EDT)
2. Relation Detection and Characterization (RDC)

Entity Detection and Tracking
Focuses on the identification of entities, not just names. All mentions of an entity are found and collected.
Relation Detection and Characterization
Detects relations between pairs of previously detected entities. Relations are divided into five general types:
1. Role: the role a person plays in an organization
2. Part: part-whole relationships, subtyped as Subsidiary, Part-Of, or Other
3. At: location relationships
4. Near: relative locations
5. Social: social relations
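
Relation detection over pairs of previously detected entities can be sketched as follows. The five types mirror the list above. The cue-word lexicon and the `detect_relations` helper are hypothetical. The sketch also ignores entity types and syntax, both of which practical ACE relation detectors rely on.

```python
from enum import Enum
from itertools import combinations
from typing import List, Tuple


class RelationType(Enum):
    ROLE = "Role"      # role a person plays in an organization
    PART = "Part"      # part-whole (Subsidiary, Part-Of, Other)
    AT = "At"          # location
    NEAR = "Near"      # relative location
    SOCIAL = "Social"  # social relations


# Hypothetical cue words that may appear between two mentions.
CUES = {
    RelationType.ROLE: {"works", "employee", "director", "president"},
    RelationType.PART: {"subsidiary", "division", "unit"},
    RelationType.AT: {"in", "at", "based"},
    RelationType.NEAR: {"near", "outside", "beside"},
    RelationType.SOCIAL: {"wife", "husband", "brother", "sister", "friend"},
}

# (mention text, start token, end token exclusive)
Entity = Tuple[str, int, int]


def detect_relations(tokens: List[str],
                     entities: List[Entity]) -> List[Tuple[str, RelationType, str]]:
    """Label each ordered pair of entities with the first relation type
    whose cue word occurs in the tokens between them."""
    found = []
    ordered = sorted(entities, key=lambda e: e[1])
    for (a, _, a_end), (b, b_start, _) in combinations(ordered, 2):
        between = {t.lower() for t in tokens[a_end:b_start]}
        for rel, cues in CUES.items():
            if between & cues:
                found.append((a, rel, b))
                break
    return found
```

For instance, in "Acme Bank is a subsidiary of Globex" the pair (Acme Bank, Globex) is labeled Part.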

ACE – Data Annotation
Key features of ACE over traditional IE: it performs the entity linking task, establishing co-reference between entity mentions, and it produces both training and test data for common research and evaluation tasks.
There are three types of data annotation:
1. EDT annotation: tagging all mentions of entities in the document.
2. RDC annotation: identifying all relationships between entities.
3. VDC annotation: identifying the events that the previously identified entities participate in.
The main issue is that annotation is a manual process, which does not scale to the breadth of the web.
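
One possible in-memory layout for the three annotation layers is sketched below. The layout is an assumption for illustration only, because the official ACE corpora are distributed in their own XML annotation format. Co-reference is encoded by grouping all mention spans under a single entity record.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Entity:                    # EDT layer: one entity and all of its mentions
    entity_id: str
    entity_type: str             # e.g. "PER", "ORG"
    mentions: List[Tuple[int, int]] = field(default_factory=list)  # char spans


@dataclass
class Relation:                  # RDC layer: relation between two entities
    rel_type: str
    arg1: str                    # entity_id
    arg2: str                    # entity_id


@dataclass
class Event:                     # VDC layer: an event and its participants
    event_type: str
    participants: Dict[str, str]  # role -> entity_id


@dataclass
class AnnotatedDocument:
    text: str
    entities: List[Entity] = field(default_factory=list)
    relations: List[Relation] = field(default_factory=list)
    events: List[Event] = field(default_factory=list)

    def mention_texts(self, entity_id: str) -> List[str]:
        """All co-referring surface strings annotated for one entity."""
        ent = next(e for e in self.entities if e.entity_id == entity_id)
        return [self.text[start:end] for start, end in ent.mentions]
```

With the text "Mary Smith joined Acme. She leads its lab.", an entity with spans (0, 10) and (24, 27) returns ["Mary Smith", "She"] from `mention_texts`.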

Ontology-Based Information Extraction (OBIE)
OBIE has emerged as another subfield of IE in which ontologies play a crucial role. Ontologies are used in the information extraction process, and the output is generally also an ontology. Ontologies are usually specific to the domain for which they are created.
Features of OBIE systems:
1. They process unstructured or semi-structured natural language text.
2. Their output is in ontology format.
3. Ontologies supplement existing IE processes.
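
The idea that an OBIE system's output is itself ontology content can be sketched as populating a small class hierarchy with extracted instances. The ontology, its class names, and the `populate` helper are hypothetical. `rdf:type` appears only as a label string in the spirit of RDF, and no RDF library is used.

```python
from typing import Dict, List, Optional, Set, Tuple

Triple = Tuple[str, str, str]

# Hypothetical domain ontology: class -> parent class (None for the root).
ONTOLOGY: Dict[str, Optional[str]] = {
    "Agent": None,
    "Person": "Agent",
    "Organization": "Agent",
    "Company": "Organization",
}


def superclasses(cls: str) -> List[str]:
    """Walk up the hierarchy, e.g. Company -> Organization -> Agent."""
    chain = []
    parent = ONTOLOGY.get(cls)
    while parent is not None:
        chain.append(parent)
        parent = ONTOLOGY.get(parent)
    return chain


def populate(extracted: List[Tuple[str, str]]) -> Set[Triple]:
    """Turn (instance, class) pairs from an IE step into ontology triples,
    skipping classes that the ontology does not define."""
    triples: Set[Triple] = set()
    for instance, cls in extracted:
        if cls not in ONTOLOGY:
            continue
        triples.add((instance, "rdf:type", cls))
        for sup in superclasses(cls):
            triples.add((instance, "rdf:type", sup))  # inferred membership
    return triples
```

In this sketch the ontology constrains the extraction, because instances of unknown classes are dropped, and it also shapes the output, which matches the dual role described above.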

Common Methods of IE in OBIE
1. Linguistic rules using regular expressions: these give good results despite their simplicity, but someone has to read documents manually and create the rules.
2. Gazetteer lists: the words to be recognized are provided to the system in the form of a list.
3. Web-based search: the general idea behind this approach is to use the web as a big corpus.
4. Partial parse trees: a semantically annotated parse tree is constructed for the text.
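
The first two methods can be sketched together: a hand-written regular expression serves as a linguistic rule, and a gazetteer lookup handles known names. The monetary-amount pattern and the gazetteer entries are hypothetical examples.

```python
import re
from typing import List, Tuple

# Linguistic rule: a hand-written regular expression (hypothetical example)
# recognising monetary amounts such as "$4.5 million".
MONEY = re.compile(r"\$\d+(?:\.\d+)?(?: (?:million|billion))?")

# Gazetteer list: names supplied to the system up front, with their class.
GAZETTEER = {"Sheffield": "City", "Tarragona": "City", "Google": "Company"}


def extract_entities(text: str) -> List[Tuple[str, str]]:
    """Apply the regex rule, then look up every gazetteer entry."""
    results = [(m.group(), "Money") for m in MONEY.finditer(text)]
    for name, label in GAZETTEER.items():
        # \b word boundaries avoid matching inside a longer word
        if re.search(r"\b" + re.escape(name) + r"\b", text):
            results.append((name, label))
    return results
```

The trade-off mirrors the slide. The regex generalizes to unseen amounts but must be written by hand, while the gazetteer is trivial to apply but only finds names already on the list.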

The Proposed System

Conclusion

Questions?

Thank You!