Social Knowledge Mining


1 Social Knowledge Mining
A systematic review of tools and technologies
By Shayne Weerakoon

2 Social Knowledge
What is social knowledge?
Social knowledge is the collective body of knowledge produced by your immediate community or social circle.

3 How is Social Knowledge Shared?
- The era of Web 2.0 has brought about many changes in how the internet is used, notably the rise of social media sites such as Facebook and Twitter
- The volume of user-generated content (UGC) available on the net has risen as a result
- The amount of social knowledge shared rises in proportion to the volume of UGC
- The popularity of question-and-answer platforms is rising as well
- Stack Overflow had a 600% rise in users over the past few years

4 Introduction to Knowledge Extraction
What is knowledge extraction?
“The processing of natural language text and to retrieve occurrences of a particular class of objects or events and occurrences of relationships among them” – Russell & Norvig

Primary types of information extraction:
- Extraction from unstructured sources
- Extraction from structured sources

5 Extraction From Unstructured Sources
There have been several attempts to extract knowledge from text, leading to many approaches. These approaches can be grouped as follows:
- Traditional Information Extraction
- Automatic Content Extraction (ACE)
- Ontology Based Information Extraction (OBIE)

6 Traditional Information Extraction
Born out of the initial Message Understanding Conferences (MUC). Comprises five steps:
- Named Entity Recognition
- Coreference Resolution
- Template Element Construction
- Template Relation Construction
- Scenario Template Production

7 Named Entity Recognition
- Identifies proper nouns
- The simplest task
- Provides more than 90% accuracy
- Domain dependent

Coreference Resolution
- Relates noun phrases in the text to real-world entities
- Establishes identity of reference between markables
- Markables can be definite noun phrases, demonstrative noun phrases, proper names, appositives, sub-noun phrases that act as modifiers, pronouns, and so on

8 Template Element Construction
- Extracts information related to person or organization entities
- Draws evidence from anywhere in the text
- Essentially adds descriptive information to the NER results using coreference (CO)
- Domain dependent

Template Relation Construction
- Identifies relations between the templates identified in the previous task, for example an employee relationship between a person and a company, or a family relationship between two persons
- A central feature of almost any information extraction task
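
As an illustration of the two template tasks above, here is a minimal Python sketch. It assumes the NER and coreference steps have already produced a list of mentions and a mapping from each mention to an entity; the field names (`id`, `kind`, `text`), the function names, and the sample data are all hypothetical and are not taken from any MUC system.

```python
def build_template_elements(mentions, chains):
    """Group names and descriptive phrases by the entity they refer to.

    mentions: list of dicts with hypothetical keys "id", "kind", "text".
    chains:   dict mapping a mention id to an entity id (coreference output).
    """
    templates = {}
    for mention in mentions:
        entity_id = chains[mention["id"]]
        template = templates.setdefault(entity_id, {"names": [], "descriptors": []})
        if mention["kind"] == "name":
            template["names"].append(mention["text"])
        else:
            template["descriptors"].append(mention["text"])
    return templates


def build_template_relations(templates, relation_facts):
    """Keep only relations whose two arguments both have a template element."""
    return [
        (relation, source, target)
        for relation, source, target in relation_facts
        if source in templates and target in templates
    ]


# Made-up sample data, for illustration only.
mentions = [
    {"id": "m1", "kind": "name", "text": "Acme Corp"},
    {"id": "m2", "kind": "descriptor", "text": "a software company"},
    {"id": "m3", "kind": "name", "text": "Jane Doe"},
]
chains = {"m1": "org1", "m2": "org1", "m3": "per1"}

templates = build_template_elements(mentions, chains)
relations = build_template_relations(
    templates,
    [("employee_of", "per1", "org1"), ("employee_of", "per2", "org1")],
)
print(templates)
print(relations)
```

The second candidate relation is dropped because `per2` has no template element, which mirrors the idea that relations are built on top of the templates found in the previous task.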

9 Scenario Template Construction
- Uses all the results from the previous tasks
- Extracts pre-specified event information
- Then relates the event information to a particular entity involved in the event
- One of the more difficult IE tasks: only 60% accuracy, compared with 80% for humans

10 Automatic Content Extraction
- The immediate successor of traditional IE
- Data annotation offered a unique approach to cross-domain extraction
- Comprises four steps (which occur simultaneously), including:
  - Entity Detection and Tracking
  - Relation Detection and Characterization

11 Entity Detection and Tracking
- Focuses on the identification of entities, not just names
- All mentions of an entity are found and collected

Relation Detection and Characterization
- Detects relations between pairs of previously detected entities
- Relations are divided into five general types:
  - Role: the role a person plays in an organization
  - Part: part-whole relationships, subtyped as Subsidiary, Part-Of, or Other
  - At: location relationships
  - Near: relative locations
  - Social: social relations
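
The entity and relation structures described above could be represented roughly as follows. This is a sketch only: the class names, the `collect_mentions` helper, and the sample mentions are assumptions made for illustration, not part of the ACE specification.

```python
from dataclasses import dataclass, field
from enum import Enum


class RelationType(Enum):
    """The five general relation types listed on the slide."""
    ROLE = "role"
    PART = "part"
    AT = "at"
    NEAR = "near"
    SOCIAL = "social"


@dataclass
class Entity:
    entity_id: str
    mentions: list = field(default_factory=list)  # every mention of this entity


@dataclass(frozen=True)
class Relation:
    relation_type: RelationType
    source_id: str
    target_id: str


def collect_mentions(tagged_mentions):
    """Group (entity_id, mention_text) pairs so each entity holds all its mentions."""
    entities = {}
    for entity_id, text in tagged_mentions:
        entities.setdefault(entity_id, Entity(entity_id)).mentions.append(text)
    return entities


# Made-up sample: two mentions of one person and one mention of an organization.
entities = collect_mentions([
    ("e1", "Alice"),
    ("e1", "she"),
    ("e2", "the university"),
])
role = Relation(RelationType.ROLE, "e1", "e2")
print(entities["e1"].mentions, role.relation_type.name)
```

Keeping pronouns and descriptive phrases alongside names in `mentions` reflects the point that tracking covers all mentions of an entity, not just its names.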

12 ACE – Data Annotation
Key features of ACE over traditional IE:
- This is the entity linking task
- Establishes coreference between entity mentions
- Produces both training and test data for common research and evaluation tasks
- Three types of data annotation:
  - EDT annotation: tagging all mentions of entities in the document
  - RDC annotation: identifying all relationships between entities
  - VDC annotation: identifying the events that the previously identified entities participate in
- The issue is that annotation is a manual process, which does not scale to the breadth of the web

13 ACE – Conclusion
Key features of ACE over traditional IE:
- This is the entity linking task
- Establishes coreference between entity mentions
- Produces both training and test data for common research and evaluation tasks
- Three types of data annotation:
  - EDT annotation: tagging all mentions of entities in the document
  - RDC annotation: identifying all relationships between entities
  - VDC annotation: identifying the events that the previously identified entities participate in

14 Ontology Based Information Extraction (OBIE)
- OBIE has emerged as another subfield of IE
- Ontologies play a crucial role in the IE process
- Ontologies are used in the information extraction process, and the output is generally also an ontology
- An ontology is usually specific to the domain for which it is created

Features of OBIE systems:
- Process unstructured or semi-structured natural language text
- Output should be in ontology format
- Ontologies should supplement existing IE processes

15 Common methods of IE in OBIE
- Linguistic rules using regular expressions
  - Provide good results despite their simplicity
  - Require manually reading documents and creating rules
- Gazetteer lists
  - The words to be recognized are provided to the system in the form of a list
- Web-based search
  - The general idea behind this approach is to use the web as a big corpus
- Partial parse trees
  - Construct a semantically annotated parse tree for the text
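
The first two methods above are simple enough to sketch in a few lines of Python using the standard `re` module. The gazetteer entries, the employment pattern, and the sample sentence below are all made up for illustration; a real system would rely on hand-written rules and lists for its own domain.

```python
import re

# Hypothetical gazetteer: a plain list of names the system should recognize.
ORGANIZATION_GAZETTEER = ["Stack Overflow", "Facebook", "Twitter"]

# Hypothetical linguistic rule: "<Capitalized Name> works for/at <Capitalized Name>".
EMPLOYMENT_RULE = re.compile(
    r"(?P<person>[A-Z][a-z]+(?: [A-Z][a-z]+)*) works (?:for|at) "
    r"(?P<org>[A-Z][a-z]+(?: [A-Z][a-z]+)*)"
)


def find_gazetteer_mentions(text, gazetteer):
    """Return (entry, start offset) for every gazetteer entry found in the text."""
    mentions = []
    for entry in gazetteer:
        for match in re.finditer(re.escape(entry), text):
            mentions.append((entry, match.start()))
    return sorted(mentions, key=lambda mention: mention[1])


def apply_employment_rule(text):
    """Return (person, organization) pairs matched by the hand-written rule."""
    return [(m.group("person"), m.group("org")) for m in EMPLOYMENT_RULE.finditer(text)]


sample = "Jane Doe works at Facebook. Many users ask questions on Stack Overflow."
print(find_gazetteer_mentions(sample, ORGANIZATION_GAZETTEER))
print(apply_employment_rule(sample))
```

The rule is brittle by design: it only fires on one exact phrasing, which shows why such rules have to be written by someone who has read the documents and why they tend to be domain dependent.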

16 The Proposed System

17 Conclusion

18 Questions?

19 Thank You!

