Presented by: Hassan Sayyadi

1 Annotation
Presented by: Hassan Sayyadi

2 Outline
What is annotation?
Why use annotation?
Crawler
Annotation models
Annotation methods
Our implementation

4 What is annotation? People make notes to themselves in order to preserve ideas that arise during a variety of activities. The purpose of these notes is often to summarize, criticize, or emphasize specific phrases or events. One powerful use of annotations is locating items that others have subjectively found to match certain criteria.

6 Why use annotation? Having the world's knowledge at one's fingertips seems possible: the Internet is the platform for information. Unfortunately, most of that information is provided in an unstructured, non-standardized form.

7 Why use annotation? (continued)

9 Crawler A crawler is a program that traverses the Internet by following hyperlinks from one page to the next.
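
The traversal described on this slide can be sketched as a breadth-first walk over a link graph. This is only an illustration, not the presenter's crawler: the class name `CrawlSketch`, the in-memory map from page to outgoing links, and the `example.org` URLs are all assumptions, and a real crawler would download each page and extract its links instead.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Minimal sketch of a crawler's traversal loop (breadth-first).
// Assumption: the link graph is supplied as a map; nothing is fetched.
final class CrawlSketch {

    // Returns the pages in the order they are visited, starting from seed.
    static List<String> crawl(String seed, Map<String, List<String>> links) {
        List<String> visited = new ArrayList<>();
        Set<String> seen = new HashSet<>();
        Deque<String> frontier = new ArrayDeque<>();
        frontier.add(seed);
        seen.add(seed);
        while (!frontier.isEmpty()) {
            String page = frontier.poll();          // oldest page first (FIFO)
            visited.add(page);
            for (String next : links.getOrDefault(page, List.of())) {
                if (seen.add(next)) {               // enqueue each page only once
                    frontier.add(next);
                }
            }
        }
        return visited;
    }

    public static void main(String[] args) {
        // Hypothetical three-page site used only for this demo.
        Map<String, List<String>> links = Map.of(
            "http://example.org/a", List.of("http://example.org/b", "http://example.org/c"),
            "http://example.org/b", List.of("http://example.org/a", "http://example.org/c"));
        System.out.println(crawl("http://example.org/a", links));
    }
}
```

The `seen` set is what keeps the walk from looping when pages link back to each other.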

10 Focused crawler Not all of the Internet's knowledge is required for every query. This assumption seems reasonable because most people work in a restricted domain and do not need the knowledge of the whole Internet. Searching the whole Internet in this case is very inefficient and expensive. Free text on the Internet contains varied information from diverse domains.

11 Focused crawler (continued)
Focus can be achieved by examining keywords. Problems: "understanding" the semantics of a document; focusing too narrowly on one topic. Another way to focus is to use the Internet's connectivity (link) structure.
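
Keyword-based focusing can be illustrated with a simple relevance score. The sketch below is an assumption-laden example rather than the method used in the talk: the class `KeywordFocus`, the scoring rule (share of words that are topic keywords), and the threshold parameter are all hypothetical.

```java
import java.util.Locale;
import java.util.Set;

// Sketch of keyword-based focusing: score a page by the share of its
// words that belong to a topic keyword set (an assumed scoring rule).
final class KeywordFocus {

    // Fraction of the page's words that are topic keywords (0.0 for empty text).
    static double relevance(String pageText, Set<String> keywords) {
        String[] words = pageText.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{Nd}]+");
        int total = 0;
        int hits = 0;
        for (String word : words) {
            if (word.isEmpty()) {
                continue;                 // split can yield an empty leading token
            }
            total++;
            if (keywords.contains(word)) {
                hits++;
            }
        }
        return total == 0 ? 0.0 : (double) hits / total;
    }

    // The crawler would follow a page's links only if it scores high enough.
    static boolean shouldFollow(String pageText, Set<String> keywords, double threshold) {
        return relevance(pageText, keywords) >= threshold;
    }

    public static void main(String[] args) {
        Set<String> topic = Set.of("university", "engineering");
        System.out.println(relevance("The university has an engineering department", topic));
    }
}
```

A score like this counts matching strings only, which is exactly the first problem the slide names: it does not capture what the document means.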

13 Annotation models Mark in web page. Example:
Original: SUT is one of the largest engineering schools in the Islamic Republic of Iran
Annotated: <university>SUT</university> is one of the largest universities in the <country>Islamic Republic of Iran</country>
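
In-page markup of this kind can be sketched as wrapping known entity names in tags. The example below assumes a small hand-made lookup table from entity name to tag; the class `InlineAnnotator` and that table are hypothetical and only show the shape of the output, not how the entities were actually recognized.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch of in-page annotation: wrap each known entity name in its tag.
// Assumption: entities come from a fixed lookup table and do not overlap
// with each other or with the tag names.
final class InlineAnnotator {

    static String annotate(String text, Map<String, String> entityToTag) {
        String result = text;
        for (Map.Entry<String, String> entry : entityToTag.entrySet()) {
            String entity = entry.getKey();
            String tag = entry.getValue();
            String wrapped = "<" + tag + ">" + entity + "</" + tag + ">";
            // Match the entity as a whole word; quote it so it is taken literally.
            result = result.replaceAll(
                "\\b" + Pattern.quote(entity) + "\\b",
                Matcher.quoteReplacement(wrapped));
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> entities = new LinkedHashMap<>();
        entities.put("SUT", "university");
        entities.put("Islamic Republic of Iran", "country");
        System.out.println(annotate(
            "SUT is one of the largest engineering schools in the Islamic Republic of Iran",
            entities));
    }
}
```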

14 Annotation models (continued)
Generate RDF. Example: SUT is one of the largest engineering schools in the Islamic Republic of Iran
<rdf:Description rdf:about="
<rdf:type>university</rdf:type>
<SHARIF:be_in rdf:resource="
</rdf:Description>
<rdf:Description rdf:about="
<rdf:type>Country</rdf:type>

16 Annotation methods Manually; semi-automatically; automatically.
Automatic semantic annotation of the natural-language sentences in these pages is a daunting task, so we are often forced to do it manually, or semi-automatically using handwritten rules.

17 Semi-automatic annotation
Assumptions: the vocabulary set is limited; word usage has patterns; semantic ambiguities are rare; terms and jargon of the domain appear frequently.

18 Semi-automatic annotation (continued)

19 Semi-automatic annotation (continued)
Example: "I go to Shanghai". The link structure resembles an RDF graph.

20 Semi-automatic annotation (continued)
Phases: training, generation. Operations: word-conceptualization, link-folding, relationalization.

21 Semi-automatic annotation (continued)
Example sentence

22 Word-conceptualization
Its function is to annotate the open words in the sentence as concepts, forming the skeleton of the initially empty RDF graph, and to mark the closed words for further operations. Context vector: <polo, NN, Dmu, Mp>

23 Link-folding Closed words, together with their links representing semantic relations, can be seen as word usage patterns. Context vector: <with, IN, Mp, Js, POLO, EDGE>

24 Relationalization A semantic relation can also be implied by a link that directly connects two concepts in the link structure. Context vector: <MVa, REFINE, ENOUGH>

25 Accuracy of concepts and relations for the different algorithms

26 Automatic annotation

27 Source preprocessing: Document Object Model (DOM), Text Model, Layout Model, NLP Model

28 Information identification
Operators perform extraction actions on the document access models: Retrieval, Check, Execute. Strategies build operator sequences according to the user's time and quality requirements. Source description.

29 Ontology population The final stage of the overall process is to decide which hypothesis best represents the extracted information to insert into the ontology. The module simulates the insertions and calculates a cost according to the number of new instance creations, instance modifications, or inconsistencies found.
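
The cost-based choice between hypotheses can be sketched as a weighted count followed by picking the minimum. The slide gives only the three quantities that are counted; the weights, the class `HypothesisSelection`, and the rule that the cheapest hypothesis wins are assumptions made for this illustration.

```java
import java.util.Comparator;
import java.util.List;

// Sketch of hypothesis selection by simulated insertion cost.
// Assumption: the weights below are invented; the slide names only the
// three quantities that contribute to the cost.
final class HypothesisSelection {

    static final class Hypothesis {
        final String label;
        final int creations;        // new instances the insertion would create
        final int modifications;    // existing instances it would modify
        final int inconsistencies;  // inconsistencies it would introduce

        Hypothesis(String label, int creations, int modifications, int inconsistencies) {
            this.label = label;
            this.creations = creations;
            this.modifications = modifications;
            this.inconsistencies = inconsistencies;
        }
    }

    // Hypothetical weights: inconsistencies are penalized most heavily.
    static final int W_CREATE = 1;
    static final int W_MODIFY = 2;
    static final int W_INCONSISTENT = 10;

    static int cost(Hypothesis h) {
        return W_CREATE * h.creations
             + W_MODIFY * h.modifications
             + W_INCONSISTENT * h.inconsistencies;
    }

    // Picks the hypothesis with the lowest simulated cost.
    static Hypothesis cheapest(List<Hypothesis> hypotheses) {
        return hypotheses.stream()
            .min(Comparator.comparingInt(HypothesisSelection::cost))
            .orElseThrow(() -> new IllegalArgumentException("no hypotheses"));
    }

    public static void main(String[] args) {
        Hypothesis a = new Hypothesis("A", 2, 0, 0);
        Hypothesis b = new Hypothesis("B", 0, 1, 1);
        System.out.println(cheapest(List.of(a, b)).label);
    }
}
```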

31 Our implementation Crawler: crawl every link that contains one of:
sharif.ir, sharif.edu, sharif.ac.ir
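
A scope filter for these three domains might look like the sketch below. Note the difference from the slide: the slide says the link "contains" the domain, whereas this example checks the host part of the URL, which is a stricter rule chosen here as an assumption. The class name `SharifLinkFilter` is hypothetical.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.List;
import java.util.Locale;

// Sketch of a crawl-scope filter for the three domains on the slide.
// Assumption: a link is in scope when its host equals one of the domains
// or is a subdomain of one (stricter than a plain substring test).
final class SharifLinkFilter {

    static final List<String> DOMAINS = List.of("sharif.ir", "sharif.edu", "sharif.ac.ir");

    static boolean inScope(String url) {
        try {
            String host = new URI(url).getHost();
            if (host == null) {
                return false;             // e.g. relative links or mailto: addresses
            }
            host = host.toLowerCase(Locale.ROOT);
            for (String domain : DOMAINS) {
                if (host.equals(domain) || host.endsWith("." + domain)) {
                    return true;
                }
            }
            return false;
        } catch (URISyntaxException e) {
            return false;                 // malformed links are simply skipped
        }
    }

    public static void main(String[] args) {
        System.out.println(inScope("http://ce.sharif.edu/index.html"));
        System.out.println(inScope("http://example.org/?q=sharif.edu"));
    }
}
```

Checking the host keeps out pages on other sites that merely mention one of the domains in a path or query string.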

32 Our implementation Source pre-processing
HTML to text:
// Hide newlines so that ".*?" can match across lines
text = text.replaceAll("\n", "*_newline_*");
// Drop scripts, styles, and comments, then all remaining tags
text = text.replaceAll("<script.*?</script>", "");
text = text.replaceAll("<style.*?</style>", "");
text = text.replaceAll("<!--.*?-->", "");
text = text.replaceAll("<.*?>", "");
// Decode common HTML entities
text = text.replaceAll("&nbsp;", " ");
text = text.replaceAll("&lt;", "<");
// Restore the newlines
text = text.replaceAll("\\*_newline_\\*", "\n");
Additional:
// Treat blank lines as sentence boundaries; turn comma lists into "and" lists
text = text.replaceAll("\n(\n| )*\n", ".");
text = text.replaceAll(",", " and ");

33 Our implementation Information extraction: JMontyLingua
Input: SUT is one of the largest engineering schools in the Islamic Republic of Iran
Output: ("be" "SUT" "one" "of largest engineering school" "in Islamic Republic" "of Iran")

34 Our implementation JMontyLingua problem:
SUT has computer, mechanic and electric engineering departments
("have" "SUT" "computer mechanic and electric engineering departments")
("have" "SUT" "computer and mechanic and electric engineering departments")

35 Our implementation
("be" "SUT" "university" "in Islamic Republic" "of Iran")
=> ("be" "SUT" "university" "in Islamic Republic of Iran")
=> SUT,be,university & SUT,be_in,Islamic Republic of Iran
<rdf:Description rdf:about="
<rdf:type>university</rdf:type>
<SHARIF:be_in rdf:resource="
</rdf:Description>
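
The two rewriting steps on this slide (folding an "of ..." phrase into the preceding argument, then splitting the tuple into subject, predicate, object triples) can be sketched as follows. This is a reconstruction from the single example on the slide, not the original code: the class `TripleBuilder`, the short preposition list, and the exact merging rule are assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Sketch of turning an extracted tuple (verb, subject, arguments...) into triples.
// Assumptions: the tuple has at least a verb and a subject, and the
// preposition list below is deliberately small.
final class TripleBuilder {

    static final Set<String> PREPOSITIONS = Set.of("in", "at", "on", "from", "to");

    // Step 1: append every "of ..." argument to the argument before it.
    static List<String> mergeOfPhrases(List<String> tuple) {
        List<String> merged = new ArrayList<>();
        for (String part : tuple) {
            if (part.startsWith("of ") && merged.size() > 2) {
                int last = merged.size() - 1;
                merged.set(last, merged.get(last) + " " + part);
            } else {
                merged.add(part);
            }
        }
        return merged;
    }

    // Step 2: emit one "subject,predicate,object" triple per argument;
    // a leading preposition becomes part of the predicate (be + in -> be_in).
    static List<String> toTriples(List<String> tuple) {
        List<String> parts = mergeOfPhrases(tuple);
        String verb = parts.get(0);
        String subject = parts.get(1);
        List<String> triples = new ArrayList<>();
        for (String arg : parts.subList(2, parts.size())) {
            int space = arg.indexOf(' ');
            String first = space < 0 ? arg : arg.substring(0, space);
            if (space > 0 && PREPOSITIONS.contains(first)) {
                triples.add(subject + "," + verb + "_" + first + "," + arg.substring(space + 1));
            } else {
                triples.add(subject + "," + verb + "," + arg);
            }
        }
        return triples;
    }

    public static void main(String[] args) {
        System.out.println(toTriples(
            List.of("be", "SUT", "university", "in Islamic Republic", "of Iran")));
    }
}
```

Each resulting triple then corresponds to one property inside an `rdf:Description` element for the subject.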

36 Any questions?

