28/02/02-01/03/02 4th Meeting, Athens: ENERC v.2


Updates
- Change in early tokenisation: identification of words is now a two-stage process.
- Updated lexical resources based on the new version of LexiconEn.xml.
- The current version does not include the statistical classifier or the POS tagger.
- A non-GUI version of the NERC-based Demarcator has been added at the end of the pipeline.

ENERC Pipeline

    # 1. Read the page on standard input, drop the DOCTYPE line, then run
    #    pre-tokenisation, tokenisation and the grammar files for numbers,
    #    NUMEX, TIMEX, product and attribute expressions; the result is
    #    written to the Demarcator's input directory.
    egrep -v '^<\!DOCTYPE' \
      | $EN/SCRIPTS/entsout.pl \
      | $bin/fsgmatch -q ".*" $EN/GRAM/char/pretok.gr \
      | $EN/SCRIPTS/openangle.pl \
      | $bin/xmlperl2 $EN/SCRIPTS/findels-s.rule \
      | $bin/xmlperl2 $EN/SCRIPTS/nobold.rule \
      | $bin/fsgmatch -q ".*/.[PROC='yes']" $EN/GRAM/xml/tok.gr \
      | $bin/xmlperl $EN/SCRIPTS/dels.rule \
      | $bin/fsgmatch -q ".*/.[PROC='yes']" $EN/GRAM/xml/numbers.gr \
      | $bin/fsgmatch -q ".*/.[PROC='yes']" $EN/GRAM/xml/numex-sf.gr \
      | $bin/fsgmatch -q ".*/.[PROC='yes']" $EN/GRAM/xml/timex.gr \
      | $bin/fsgmatch -q ".*/.[PROC='yes']" $EN/GRAM/xml/prodex-ll.gr \
      | $bin/fsgmatch -q ".*/.[PROC='yes']" $EN/GRAM/xml/prodex-sf.gr \
      | $bin/fsgmatch -q ".*/.[PROC='yes']" $EN/GRAM/xml/attribex.gr \
      | $bin/xmlperl $EN/SCRIPTS/delete-tags.rule \
      | $EN/SCRIPTS/tidyup-dem.pl \
      > $EN/ddiri/current.xhtml

    # 2. Run the NERC-based Demarcator without its GUI (-gui 0),
    #    reading from ddiri and writing to ddiro.
    wish8.4 $EN/Demarc/CROSSMARC_Demarcation_Tool.tcl -source $EN/ddiri \
      -destination $EN/ddiro -gui 0 -language english > /dev/null

    # 3. Post-process the demarcated page.
    cat $EN/ddiro/current.xhtml \
      | $bin/xmlperl $EN/SCRIPTS/del-npno.rule

Name Matching
- LexiconEn.lex is derived from LexiconEn.xml: each synonym of a concept becomes a lexical entry.
- The 1st stage of name matching performs lexical look-up to find matches (case-insensitive; entities such as ® are ignored).
- The 2nd stage of name matching uses a fuzzy matching program, with a list of target strings also derived from the synonyms in LexiconEn.xml.
- Name matching operates on entities and encodes the ontology ID as the value of an attribute.
- It can be performed after NERC, Demarcation or FE.
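A minimal Python sketch of the two-stage name matching, under stated assumptions: the dictionary stands in for LexiconEn.lex (its sample entries are copied from the LexiconEn.lex slide), `difflib` similarity stands in for the fuzzy matching program, which the slide does not name, and the element name `ENAMEX`, the attribute name `ontoid`, the 0.8 cutoff and all function names are hypothetical.

```python
import difflib
import html
import unicodedata
import xml.etree.ElementTree as ET

# Sample entries from the LexiconEn.lex slide: synonym -> (type, ontology ID).
LEXICON = {
    "Microsoft Office XP": ("SOFT", "OV-d0e594"),
    "Windows XP": ("OS", "OV-d0e522"),
    "W98": ("OS", "OV-d0e521"),
    "W 98": ("OS", "OV-d0e521"),
    "Win98": ("OS", "OV-d0e521"),
    "Win 98": ("OS", "OV-d0e521"),
}


def normalise_key(text):
    """Case-fold, decode entities such as &reg; and drop symbols like the ® sign."""
    text = html.unescape(text)
    # Unicode category "So" (other symbol) covers the registered and trademark signs.
    kept = "".join(ch for ch in text if unicodedata.category(ch) != "So")
    return " ".join(kept.lower().split())


# Exact-match table keyed on the normalised synonym.
EXACT = {normalise_key(k): v for k, v in LEXICON.items()}


def match_name(text, cutoff=0.8):
    """Return (type, ontology ID) for a name, or None if nothing matches."""
    key = normalise_key(text)
    # Stage 1: lexical look-up (case-insensitive, symbols ignored).
    if key in EXACT:
        return EXACT[key]
    # Stage 2 (hypothetical stand-in): fuzzy match against the same synonyms.
    close = difflib.get_close_matches(key, list(EXACT), n=1, cutoff=cutoff)
    return EXACT[close[0]] if close else None


def annotate(entity):
    """Record the ontology ID of a matched entity as an attribute (name assumed)."""
    hit = match_name(entity.text or "")
    if hit is not None:
        entity.set("ontoid", hit[1])
    return entity
```

For example, `match_name("WINDOWS&reg; XP")` is resolved in stage 1, while `match_name("Windows-XP")` only succeeds in stage 2. Because `annotate` works on any entity element, the same step could in principle run after NERC, Demarcation or FE, as the slide notes.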

Normalisation
- We use an xmlperl program to match particular facts containing certain NUMEXes, e.g. 1.7 GHz.
- A Perl action in the rule performs normalisation using a list of conversion rates.
- The normalised version appears as an attribute value on the NUMEX, which can then be inherited by the fact.
- Normalisation could be performed before FE, but the fact type is useful in determining the conversion.
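The conversion step can be sketched in Python as follows. This is not the project's xmlperl rule. It is an illustration under assumptions: the fact-type names `PROCESSOR_SPEED` and `HD_CAPACITY`, their base units, the conversion rates and the attribute name `norm` are all hypothetical. It shows why the fact type helps: it selects which conversion table applies.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical conversion rates per fact type: unit -> factor to a base unit.
RATES = {
    "PROCESSOR_SPEED": {"ghz": 1000.0, "mhz": 1.0},          # base: MHz
    "HD_CAPACITY": {"tb": 1000.0, "gb": 1.0, "mb": 0.001},    # base: GB
}

# A number followed by a unit, e.g. "1.7 GHz".
NUMEX_RE = re.compile(r"\s*(\d+(?:\.\d+)?)\s*([A-Za-z]+)\s*$")


def normalise(numex, fact_type):
    """Add a normalised value to a NUMEX element if its unit is convertible.

    The fact type selects the conversion table, so a unit is only
    normalised where a rate is defined for that kind of fact.
    """
    match = NUMEX_RE.match(numex.text or "")
    rates = RATES.get(fact_type, {})
    if match and match.group(2).lower() in rates:
        value = float(match.group(1)) * rates[match.group(2).lower()]
        numex.set("norm", f"{value:g}")
    return numex
```

With these assumed tables, a NUMEX reading "1.7 GHz" in a processor-speed fact receives `norm="1700"` (MHz), whereas the same NUMEX under `HD_CAPACITY` is left unchanged. A fact element wrapping the NUMEX could then copy the attribute, mirroring the inheritance described on the slide.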

Evaluation Results: NERC only
Precision, Recall and F-measure per entity type: MANUF, MODEL, SOFT_OS, PROCESSOR, SPEED, CAPACITY, LENGTH, RESOLUTION, MONEY, PERCENT, WEIGHT, DATE, DURATION, TIME

Evaluation Results: NERC + Demarcator
Precision, Recall and F-measure per entity type: MANUF, MODEL, SOFT_OS, PROCESSOR, SPEED, CAPACITY, LENGTH, RESOLUTION, MONEY, PERCENT, WEIGHT, DATE, DURATION, TIME
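The evaluation slides report precision, recall and F-measure per entity type but do not state the scoring scheme. The Python sketch below assumes exact span-and-type matching and the balanced F-measure (beta = 1). Entities are represented as hypothetical `(start, end, type)` tuples, and the function names are illustrative only.

```python
from collections import Counter


def prf(correct, n_system, n_gold):
    """Precision, recall and balanced F-measure (beta = 1) from counts."""
    precision = correct / n_system if n_system else 0.0
    recall = correct / n_gold if n_gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


def scores_by_type(gold, system):
    """Per-type scores; entities are (start, end, type) tuples, exact match."""
    gold, system = set(gold), set(system)
    correct = Counter(etype for _, _, etype in gold & system)
    n_gold = Counter(etype for _, _, etype in gold)
    n_system = Counter(etype for _, _, etype in system)
    return {
        etype: prf(correct[etype], n_system[etype], n_gold[etype])
        for etype in sorted(n_gold.keys() | n_system.keys())
    }
```

Under this assumed scheme, a MODEL entity whose span is off by one token counts as both a false positive and a false negative. A partial-match scheme would score it more leniently.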


Microsoft Office
LexiconEn.lex entries:
Microsoft Office XP :: SOFT OV-d0e594
Windows XP :: OS OV-d0e522
W98 OS OV-d0e521
W 98 :: OS OV-d0e521
Win98 OS OV-d0e521
Win 98 :: OS OV-d0e521
Microsoft SOFT OV-d0e594
Office SOFT OV-d0e594
XP SOFT OV-d0e OS OV-d0e521
Win OS OV-d0e521
W OS OV-d0e521