Lingway Fact Extractor (LFE)


1 Lingway Fact Extractor (LFE)
> Introduction
> Goals Crossmarc / Lingway
> Lingway adaptation of the NHLRT approach
> Rule induction (ongoing work)

2 Introduction
> LR, HLRT and NHLRT approaches
> LR wrapper (left-right)
  > Set {⟨l1, r1⟩, ..., ⟨lK, rK⟩} of 2K delimiters
  > Rigid (each attribute has a single left and right delimiter, and the order between them is fixed)
> HLRT (head-left-right-tail)
  > Two additional elements (a head delimiter and a tail delimiter)
> NHLRT (nested HLRT)
  > Less rigid approach (conditional rules)
Kushmerick, N. Finite-state approaches to Web information extraction. In Proc. 3rd Summer Convention on Information Extraction, Rome, Italy, 2002.
Kushmerick, N. Wrapper induction: Efficiency and expressiveness. Artificial Intelligence J. 118(1-2):15-68, special issue on Intelligent Internet Systems, 2000.
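
To make the difference between LR and HLRT concrete, here is a minimal Python sketch of both wrapper classes as Kushmerick describes them. It is an illustration only, not Lingway's code: the function names, the sample page, and the simplification that an incomplete row is silently discarded are all assumptions.

```python
from typing import List, Tuple


def lr_extract(page: str, delimiters: List[Tuple[str, str]]) -> List[Tuple[str, ...]]:
    """LR wrapper: apply K (left, right) delimiter pairs in a fixed order."""
    tuples = []
    pos = 0
    while True:
        row = []
        for left, right in delimiters:
            start = page.find(left, pos)
            if start == -1:
                return tuples  # no further attribute: stop (partial row dropped)
            start += len(left)
            end = page.find(right, start)
            if end == -1:
                return tuples
            row.append(page[start:end])
            pos = end + len(right)
        tuples.append(tuple(row))


def hlrt_extract(page: str, head: str, tail: str,
                 delimiters: List[Tuple[str, str]]) -> List[Tuple[str, ...]]:
    """HLRT wrapper: ignore text before the head and after the tail, then run LR."""
    h = page.find(head)
    if h == -1:
        return []
    body_start = h + len(head)
    t = page.find(tail, body_start)
    body = page[body_start:] if t == -1 else page[body_start:t]
    return lr_extract(body, delimiters)


# Hypothetical sample page: a navigation header and a footer surround the data.
PAGE = ("<b>Nav</b><p><b>Congo</b> <i>242</i><br>"
        "<b>Egypt</b> <i>20</i><br><hr><b>End</b>")
PAIRS = [("<b>", "</b>"), ("<i>", "</i>")]

lr_rows = lr_extract(PAGE, PAIRS)
hlrt_rows = hlrt_extract(PAGE, "<p>", "<hr>", PAIRS)
```

On this page the plain LR wrapper is fooled by the bold navigation text and pairs "Nav" with "242", while the HLRT wrapper, restricted to the region between "<p>" and "<hr>", returns the two intended rows. That is the rigidity the slide refers to.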

3 Crossmarc architecture constraints and Lingway goals
> Division of the process into NERC and FE
> Multilingualism of the FE
> Semi-automatic approach
> Reuse of XTIRP, whose formalism accepts disjunction, missing and repeated elements, and free order
  => The result is a much more flexible formalism than that of the original NHLRT

4 Lingway adaptation of the NHLRT approach
> Named entities (NE) are already recognised
> The aspects of the problem are:
  > to detect where a fact starts (the Head),
  > to detect where it ends (the Tail),
  > to select the relevant NE (= to drop non-relevant NE),
  > to produce the fact (the tuple) proper.
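
The four aspects listed above can be sketched as a single pass over an already annotated element stream. This is a hedged Python illustration, not the LFE implementation: the `Element` class, the `segment_facts` function, the "HEAD"/"TAIL" kinds, and the choice to keep only the first value per NE type are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set


@dataclass
class Element:
    kind: str  # "HEAD", "TAIL", or an NE type such as "JOB_TITLE"
    text: str


def segment_facts(elements: List[Element], relevant: Set[str]) -> List[Dict[str, str]]:
    """Group relevant NEs found between a Head and a Tail into one tuple per fact."""
    facts: List[Dict[str, str]] = []
    current: Optional[Dict[str, str]] = None  # None while outside any fact
    for el in elements:
        if el.kind == "HEAD":
            if current:  # a new Head closes an unfinished, non-empty fact
                facts.append(current)
            current = {}
        elif el.kind == "TAIL":
            if current:
                facts.append(current)
            current = None
        elif current is not None and el.kind in relevant:
            # Assumption: keep the first value seen for each NE type.
            current.setdefault(el.kind, el.text)
    if current:  # input ended without a Tail
        facts.append(current)
    return facts


stream = [
    Element("DURATION", "6 months"),   # before any Head: ignored
    Element("HEAD", "Poste:"),
    Element("JOB_TITLE", "Developer"),
    Element("SCHEDULE", "full time"),
    Element("TAIL", ""),
    Element("JOB_TITLE", "stray title"),  # after the Tail: ignored
]
facts = segment_facts(stream, relevant={"JOB_TITLE", "SCHEDULE"})
```

Here `facts` holds one tuple with a JOB_TITLE and a SCHEDULE slot; the elements outside the Head/Tail span never reach the output.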

5 LFE - WHISK
> LFE is relatively close to WHISK, which can be seen as an extension of Kushmerick's systems that uses regular expressions including disjunctions
> But LFE differs because it does not use semantic or linguistic categories, and its concrete algorithm and general philosophy are different.

6 Head and Tail
> Recognition of Heads and Tails (technique similar to NERC)
> E.g. "Poste:" ("Position:"), "Intitulé:" ("Title:")
> (Important role of JOB_TITLE)

7 Fact Extraction

8 Selecting / dropping elements
> Relations between elements
  > E.g. the association between SCHEDULE and DURATION
  > In this case, a DURATION without a SCHEDULE could be marked as NONFACT ("dropped"), etc.
> Testing of the context (previous and next NE)
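
The SCHEDULE/DURATION example above can be sketched as a context test on the previous and next NE. This is a minimal Python illustration under a stated assumption: "association" is taken here to mean that a SCHEDULE is immediately adjacent to the DURATION, which the slide does not specify. The function name and the "FACT" label are likewise hypothetical.

```python
from typing import List, Optional, Tuple


def mark_durations(ne_types: List[str]) -> List[Tuple[str, str]]:
    """Mark each NE as FACT or NONFACT; a DURATION needs a neighbouring SCHEDULE."""
    marked = []
    for i, ne_type in enumerate(ne_types):
        prev_type: Optional[str] = ne_types[i - 1] if i > 0 else None
        next_type: Optional[str] = ne_types[i + 1] if i + 1 < len(ne_types) else None
        if ne_type == "DURATION" and "SCHEDULE" not in (prev_type, next_type):
            marked.append((ne_type, "NONFACT"))  # dropped: no associated SCHEDULE
        else:
            marked.append((ne_type, "FACT"))
    return marked


marked = mark_durations(["JOB_TITLE", "SCHEDULE", "DURATION", "JOB_TITLE", "DURATION"])
```

The first DURATION follows a SCHEDULE and is kept; the last one has no SCHEDULE neighbour and is marked NONFACT.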

9 Extension of the XTIRP formalism
> NEXT, PREV
  > These operators allow testing conditions on the previous and next elements (NE, NUMEX and TIMEX), including their types and attributes
> COUNTER
  > Just a trivial counter (from 0 to ...)
> CURRENT_MARK
  > This operator allows testing whether the current element is embedded in a given mark (notably for testing stressed fonts)
> (generalisation) DYNAMIC VARIABLES
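
The slides do not show XTIRP syntax, so the following is only a Python analogue of what the NEXT, PREV, COUNTER and CURRENT_MARK operators are described as doing. Every name in it (`Item`, `RuleContext`, the `marks` list, the "b" mark, the "DATE" type) is an assumption introduced for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Item:
    kind: str      # "NE", "NUMEX" or "TIMEX"
    ne_type: str   # e.g. "JOB_TITLE"
    attrs: Dict[str, str] = field(default_factory=dict)
    marks: List[str] = field(default_factory=list)  # enclosing markup, e.g. ["b"]


class RuleContext:
    """State a rule can test while positioned on one element of the sequence."""

    def __init__(self, items: List[Item]) -> None:
        self.items = items
        self.index = 0
        self.counter = 0  # COUNTER analogue: counts up from 0

    def prev(self) -> Optional[Item]:
        """PREV analogue: the element before the current one, if any."""
        return self.items[self.index - 1] if self.index > 0 else None

    def next(self) -> Optional[Item]:
        """NEXT analogue: the element after the current one, if any."""
        j = self.index + 1
        return self.items[j] if j < len(self.items) else None

    def current_mark(self, mark: str) -> bool:
        """CURRENT_MARK analogue: is the current element inside the given mark?"""
        return mark in self.items[self.index].marks


def count_stressed(ctx: RuleContext, ne_type: str, mark: str = "b") -> int:
    """Count elements of one type that appear inside a stressed-font mark."""
    ctx.counter = 0
    for ctx.index in range(len(ctx.items)):
        if ctx.items[ctx.index].ne_type == ne_type and ctx.current_mark(mark):
            ctx.counter += 1
    return ctx.counter


items = [
    Item("NE", "JOB_TITLE", marks=["b"]),
    Item("TIMEX", "DATE"),
    Item("NE", "JOB_TITLE"),
]
ctx = RuleContext(items)
stressed_titles = count_stressed(ctx, "JOB_TITLE")
```

A rule positioned on the first element could then test, for instance, that `ctx.next()` is a TIMEX and that the element itself is in bold before accepting it as a Head candidate.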

10 Extension implementation
> NEXT, PREV
  > Ongoing development
> COUNTER
  > Implemented
> CURRENT_MARK
  > Ongoing
> (generalisation) DYNAMIC VARIABLES
  > Implemented

11 Production of facts proper
> Once the NE are marked as belonging to a fact (fact#1, etc.) or as being "nonfacts":
  > a simple XSLT program extracts the facts into the corresponding XML output format
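
The slide says this step is done with XSLT. As a rough equivalent, the sketch below performs the same grouping in Python with `xml.etree.ElementTree`. It is not the LFE stylesheet: the input markup (`ne` elements carrying `type` and `fact` attributes) and the output layout (`facts`/`fact`/`slot`) are hypothetical, since the actual XML formats are not given.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List


def extract_facts(marked_xml: str) -> str:
    """Group NEs by their fact mark and serialise one <fact> element per group."""
    root = ET.fromstring(marked_xml)
    groups: Dict[str, List[ET.Element]] = {}
    for ne in root.iter("ne"):
        fact_id = ne.get("fact")
        if fact_id is None or fact_id == "nonfact":
            continue  # unmarked or dropped elements produce no output
        groups.setdefault(fact_id, []).append(ne)

    out = ET.Element("facts")
    for fact_id, nes in groups.items():  # dicts keep first-seen order
        fact = ET.SubElement(out, "fact", id=fact_id)
        for ne in nes:
            slot = ET.SubElement(fact, "slot", type=ne.get("type", ""))
            slot.text = ne.text
    return ET.tostring(out, encoding="unicode")


# Hypothetical marked-up input.
MARKED = (
    '<doc>'
    '<ne type="JOB_TITLE" fact="fact#1">Developer</ne> wanted, '
    '<ne type="DURATION" fact="nonfact">2 days</ne> '
    '<ne type="SCHEDULE" fact="fact#1">full time</ne>'
    '</doc>'
)
result_xml = extract_facts(MARKED)
```

Only the two elements marked fact#1 end up in the result; the DURATION marked as a nonfact is left out.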

12 Calendar
> First complete version: 21st of July
> Evaluation: end of July

