Measuring Complexity of Web Pages Using Gate


1 Measuring Complexity of Web Pages Using Gate
Prepared by: The Who

2 Subject 1: Can more meaningful indicators be extracted from the resources (web pages), e.g. a more interesting measure of complexity or diversity, or even others such as sentiment?

3 Complexity Definition: How to learn the features associated with the difficulty of understanding the resources.

4 Our Vision: To employ entities linked to diverse contexts as a basis for determining the complexity of a web page by:
- Gathering sets of web pages from different domains
- Annotating the complexity of the pages (crowdsourcing)
- Obtaining the set of named entities on each page (GATE)
- Determining a complexity score for each entity based on the pages in which it appears (centrality / text ranking / entity authority metrics: how many times it appears in the page vs. how many entities are in that page, and what the page complexity score is)
- Employing the set of weighted entities to predict a score for new pages
- Correlating the outputs with the commonly employed sentence metrics
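
The entity authority idea in the list above (how often an entity appears in a page relative to all entity mentions in that page, combined with the page complexity score) could be sketched as follows. This is only one possible reading of the slide: the function name, the input layout, and the choice of a weighted average are assumptions, and the sample pages and scores are invented.

```python
from collections import Counter, defaultdict


def weighted_entity_scores(pages):
    """Assumed input: a list of (entity_mentions, page_complexity) pairs.

    Each page contributes its complexity to an entity, weighted by the
    entity's share of all entity mentions in that page.
    """
    weighted_sum = defaultdict(float)
    weight_total = defaultdict(float)
    for mentions, complexity in pages:
        if not mentions:
            continue  # a page without entities contributes nothing
        total = len(mentions)
        for entity, count in Counter(mentions).items():
            weight = count / total  # share of the page's entity mentions
            weighted_sum[entity] += weight * complexity
            weight_total[entity] += weight
    return {e: weighted_sum[e] / weight_total[e] for e in weighted_sum}


# Invented sample data: two pages with crowdsourced complexity in [0, 1].
pages = [
    (["GATE", "GATE", "ontology"], 0.8),
    (["GATE", "football"], 0.2),
]
scores = weighted_entity_scores(pages)
```

An entity that dominates a complex page is pulled toward that page's score, while an entity mentioned once among many others has less influence.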

5 Proposed approach

6 Run Entity and Term Recognition on a sample from the data set.
1. Create a datastore for the sample

7 2. Populate the corpus with the sample and save it to the datastore.

8 3. Run TermRaider (it already contains the ANNIE Gazetteer for entity recognition)

9 4. Search for a specific annotation type

10 5. Export the terms and the annotation set
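
Once the annotations are exported, they need to be grouped by page before any scoring. The sketch below assumes the export has been converted to a CSV file with two columns, `document` and `entity`; this layout is hypothetical and is not the format GATE or TermRaider produces by default, so the column names would need adapting to the actual export.

```python
import csv
import io
from collections import defaultdict


def load_page_entities(csv_file):
    """Group entity mentions by document.

    Assumes a hypothetical CSV layout with 'document' and 'entity' columns.
    """
    page_entities = defaultdict(list)
    for row in csv.DictReader(csv_file):
        page_entities[row["document"]].append(row["entity"])
    return dict(page_entities)


# Invented sample standing in for an exported annotation file.
sample = io.StringIO(
    "document,entity\n"
    "a.html,GATE\n"
    "a.html,RDF\n"
    "b.html,GATE\n"
)
page_entities = load_page_entities(sample)
```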

11 Scoring: Score the complexity of the entities
This score is based on the average complexity score of the documents in which the entity appears.
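
A minimal sketch of this averaging step, assuming each page has an annotated complexity score and a list of recognized entities. The dictionary layout, the function name, and the sample values are illustrative assumptions.

```python
from collections import defaultdict
from statistics import mean


def entity_complexity(page_entities, page_scores):
    """Average the complexity of every page in which an entity appears."""
    scores_per_entity = defaultdict(list)
    for page, entities in page_entities.items():
        # set() so that a page counts once per entity, however often it occurs
        for entity in set(entities):
            scores_per_entity[entity].append(page_scores[page])
    return {e: mean(s) for e, s in scores_per_entity.items()}


# Invented sample data.
page_entities = {"a.html": ["GATE", "RDF"], "b.html": ["GATE"]}
page_scores = {"a.html": 0.75, "b.html": 0.25}
entity_scores = entity_complexity(page_entities, page_scores)
```

Here `GATE` appears on both pages and receives the mean of their scores, while `RDF` inherits the score of the single page it appears on.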

12 Calculate the page score based on the scores of the entities that appear in it
Score the complexity of the entities: this score is based on the average complexity score of the documents in which the entity appears.
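
The slide does not state how the entity scores are combined into a page score. The sketch below assumes a plain mean over the entities that already have a score and ignores unseen entities; both choices are assumptions, as are the names and sample values.

```python
from statistics import mean


def page_complexity(entities, entity_scores):
    """Predict a page score as the mean score of its known entities.

    Returns None when no entity on the page has a score yet.
    """
    known = [entity_scores[e] for e in entities if e in entity_scores]
    return mean(known) if known else None


# Invented entity scores, e.g. produced by an earlier averaging step.
entity_scores = {"GATE": 0.5, "RDF": 0.75}
predicted = page_complexity(["GATE", "RDF", "unseen"], entity_scores)
```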

13 Compare scores by the two methods
Table columns: Site, Vanilla Score, Proposed Score
0.6 .475 0.796 0.568 .75 0.536 .45 0.52 .55 iswc2013_demo_36.html .375 0.504 .775 0.464 .6 0.48 .725 0.528 .5
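
One way to compare the two scoring methods is a correlation coefficient over the per-site scores. The sketch below computes Pearson's r by hand; the choice of Pearson is an assumption, and the sample lists are invented rather than taken from the table, whose row pairing cannot be recovered from this transcript.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / sqrt(var_x * var_y)


# Invented scores for three sites under each method.
vanilla = [0.2, 0.5, 0.8]
proposed = [0.3, 0.45, 0.6]
r = pearson(vanilla, proposed)
```

A value of r near 1 would indicate that the two methods rank the sites similarly; a value near 0 would indicate that they capture different aspects of the pages.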

14 Thank You! Gracias! Ευχαριστώ! Prepared by: The Who

