NLP Interchange Format. José M. García. © Copyright 2008 STI INNSBRUCK, www.sti-innsbruck.at


1 NLP Interchange Format
José M. García
© Copyright 2008 STI INNSBRUCK www.sti-innsbruck.at

2 Outline
What is NIF?
Design requirements
URI schemes
NIF ontologies
Use cases
Relationship with ELRA
Roadmap for NIF 2.0
Conclusions

3 What is NIF?
Natural Language Processing Interchange Format
NIF is an RDF/OWL-based format that aims to achieve interoperability between Natural Language Processing (NLP) tools, language resources and annotations.
Building blocks
–URI scheme for identifying elements in texts
–Ontology for describing common NLP terms
Created and maintained by the AKSW group at the University of Leipzig during the LOD2 EU project.
Community project: http://persistence.uni-leipzig.org/nlp2rdf/

4 NIF design requirements
Compatibility with RDF
Coverage
Structural Interoperability
Conceptual Interoperability
Granularity
Provenance and Confidence
Simplicity
Scalability

5 URI schemes
Text needs to be referenceable by URIs
With URI references, text can be used as a resource in RDF statements
NIF distinguishes:
–Documents
–Text of the document
–Substrings of the text
A URI scheme is an algorithm to create IDs for text and substrings
URI elements
–Document URI
–Separator
–Character indices
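
The three URI elements listed above can be combined mechanically. A minimal sketch in Python, assuming a "#char=" separator in the style of RFC 5147; the function name make_substring_uri and the example.org document URI are hypothetical and not part of NIF:

```python
def make_substring_uri(document_uri: str, begin: int, end: int,
                       separator: str = "#char=") -> str:
    """Build an ID for a piece of text from the three URI elements:
    document URI, separator and character indices."""
    if begin < 0 or end < begin:
        raise ValueError("indices must satisfy 0 <= begin <= end")
    return f"{document_uri}{separator}{begin},{end}"


# Hypothetical document and text, for illustration only.
document_uri = "http://example.org/doc.txt"
text = "NIF makes text referenceable."

# One URI for the whole text of the document, one for a substring.
whole_text_uri = make_substring_uri(document_uri, 0, len(text))
first_word_uri = make_substring_uri(document_uri, 0, 3)
print(whole_text_uri)
print(first_word_uri)
```

Because the ID is derived only from the document URI and the offsets, two tools that process the same text independently arrive at the same URI for the same substring.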

6-8 RFC 5147
The canonical URI scheme for NIF is based on RFC 5147
It standardizes fragment identifiers for the text/plain media type
http://www.w3.org/DesignIssues/LinkedData.html
http://www.w3.org/DesignIssues/LinkedData.html#char=0,26610
http://www.w3.org/DesignIssues/LinkedData.html#char=1206,1218
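
To make the fragment syntax concrete, here is a rough Python sketch that splits a "#char=begin,end" URI into its parts and selects the matching substring. It assumes character positions are zero-based and counted between characters, so a range behaves like a Python slice; it ignores the optional integrity-check suffix and the line-based form that RFC 5147 also defines. The helper names parse_char_fragment and select are hypothetical.

```python
import re
from urllib.parse import urldefrag

# begin and end are each optional digits; anything after ';' is ignored here.
_CHAR_FRAGMENT = re.compile(r"^char=(\d*)(?:,(\d*))?(?:;.*)?$")


def parse_char_fragment(uri):
    """Return (document_uri, begin, end); end is None for an open-ended range."""
    document_uri, fragment = urldefrag(uri)
    match = _CHAR_FRAGMENT.match(fragment)
    if match is None or not (match.group(1) or match.group(2)):
        raise ValueError(f"not a character fragment: {fragment!r}")
    begin_text, end_text = match.group(1), match.group(2)
    begin = int(begin_text) if begin_text else 0
    if end_text is None:      # "char=5": a single position, zero-length selection
        end = begin
    elif end_text == "":      # "char=5,": from position 5 to the end of the text
        end = None
    else:
        end = int(end_text)
    return document_uri, begin, end


def select(text, begin, end):
    """Select the substring addressed by the character range."""
    return text[begin:end]


uri = "http://www.w3.org/DesignIssues/LinkedData.html#char=1206,1218"
print(parse_char_fragment(uri))

sample = "Linked Data and the Semantic Web"   # illustrative text, not the real page
start = sample.index("Semantic Web")
print(select(sample, start, start + len("Semantic Web")))
```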

9 NIF Core Ontology
Classes and properties to describe the relations between
–Documents
–Text
–Substrings
–Corresponding URI schemes
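
As an illustration of how text and substrings might be described in RDF, the Python sketch below prints N-Triples-style statements for one text and one substring. The term names (nif:Context, nif:String, nif:isString, nif:referenceContext, nif:beginIndex, nif:endIndex, nif:anchorOf) and the namespace are assumptions based on the published NIF 2.0 Core ontology, not on these slides, and should be checked against the ontology itself; the document URI is hypothetical.

```python
# Assumed namespace and term names; verify against the NIF Core ontology.
NIF = "http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
XSD_NON_NEGATIVE_INTEGER = "http://www.w3.org/2001/XMLSchema#nonNegativeInteger"


def iri(value):
    """Format an IRI in N-Triples syntax."""
    return f"<{value}>"


def literal(value, datatype=None):
    """Format a string literal, escaping backslashes, quotes and newlines."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    suffix = f"^^<{datatype}>" if datatype else ""
    return f'"{escaped}"{suffix}'


def describe_substring(document_uri, text, begin, end):
    """Return statements linking a substring to the text it was taken from."""
    context = f"{document_uri}#char=0,{len(text)}"
    substring = f"{document_uri}#char={begin},{end}"
    triples = [
        (iri(context), iri(RDF_TYPE), iri(NIF + "Context")),
        (iri(context), iri(NIF + "isString"), literal(text)),
        (iri(substring), iri(RDF_TYPE), iri(NIF + "String")),
        (iri(substring), iri(NIF + "referenceContext"), iri(context)),
        (iri(substring), iri(NIF + "beginIndex"),
         literal(str(begin), XSD_NON_NEGATIVE_INTEGER)),
        (iri(substring), iri(NIF + "endIndex"),
         literal(str(end), XSD_NON_NEGATIVE_INTEGER)),
        (iri(substring), iri(NIF + "anchorOf"), literal(text[begin:end])),
    ]
    return [" ".join(triple) + " ." for triple in triples]


text = "My favourite city is Berlin."
begin = text.index("Berlin")
for line in describe_substring("http://example.org/doc.txt", text,
                               begin, begin + len("Berlin")):
    print(line)
```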

10 NIF Core Ontology
Additional classes and properties (unstable/testing)
–More URI schemes
–Text structure (words, sentences, paragraphs…)
–Part of Speech (POS)
–Annotations with Stanbol
–Confidence

11 Workflows, Modularity and Extensibility of NIF
Workflows for NLP integration
–Normalization
–Tokenization
–Merge RDF annotations
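
The three workflow steps can be sketched end to end in a few lines of Python. This is a simplified illustration under stated assumptions, not the NIF reference workflow: normalization is reduced to Unicode NFC, tokenization to a regular expression, and RDF graphs to plain sets of (subject, predicate, object) tuples. The ex: predicates, the example.org document URI and all function names are hypothetical.

```python
import re
import unicodedata


def normalize(text):
    """Bring the text into one canonical form so that offsets agree across tools."""
    return unicodedata.normalize("NFC", text)


def tokenize(text):
    """Return (begin, end) character offsets for words and punctuation marks."""
    return [(m.start(), m.end()) for m in re.finditer(r"\w+|[^\w\s]", text)]


def uri_for(document_uri, begin, end):
    """Offset-based ID for a substring of the document text."""
    return f"{document_uri}#char={begin},{end}"


def merge_graphs(*graphs):
    """Merge annotation graphs by set union of their triples."""
    merged = set()
    for graph in graphs:
        merged |= set(graph)
    return merged


document_uri = "http://example.org/doc.txt"
text = normalize("Berlin is big.")
begin, end = tokenize(text)[0]
subject = uri_for(document_uri, begin, end)

# Two tools annotate the same token independently.
pos_graph = {(subject, "ex:partOfSpeech", "proper noun")}
ner_graph = {(subject, "ex:refersTo", "http://dbpedia.org/resource/Berlin")}

merged = merge_graphs(pos_graph, ner_graph)
for triple in sorted(merged):
    print(triple)
```

Since both tools derive the subject URI from the same normalized text and offsets, the merge needs no alignment step: the statements simply accumulate on one subject.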

12 Workflows, Modularity and Extensibility of NIF
NIF ontology logical modules
–Terminological model
–Inference model
–Validation model
Vocabulary modules
–FISE
–ITS
–OLiA
–NERD
–…

13 Workflows, Modularity and Extensibility of NIF
Granularity profiles

14 ITS Use Case
The Internationalization Tag Set (ITS) 2.0 is a W3C working draft that is becoming a Recommendation.
ITS standardizes HTML and XML attributes which can be used to annotate nodes with processing information for language service providers (i18n, l10n).
The ITS 2.0 RDF ontology was developed using NIF, including a round-trip conversion algorithm from ITS to NIF.
NIF is expected to receive wide adoption by translation and language service providers.
The ITS 2.0 RDF ontology provides properties which can be used to provide best practices for NLP annotations.

15 OLiA Use Case
The Ontologies of Linguistic Annotation (OLiA) provide stable identifiers for morpho-syntactical annotation tag sets, so that NLP tools can use these IDs for better interoperability.
OLiA provides Annotation Models and a Reference Model, comprising more than 110 OWL ontologies for over 34 tag sets in 69 languages.
Features
–Documentation
–Flexible Granularity
–Language Independence
NIF provides two properties
–nif:oliaIndividual (links a nif:String to an OLiA Annotation Model)
–nif:oliaCategory (links to the Reference Model)

16 RDFaCE Use Case
The RDFa Content Editor (RDFaCE) is a rich text editor that supports WYSIWYM authoring, including various views of the semantically enriched textual content.
It combines the results of different NLP APIs for automatic content annotation
–Problem: the APIs are heterogeneous in access, URI generation and output data structure
–Solution: server-side proxy, hard-coded input and connection of each API
NIF simplified the integration by adding an interoperability layer.

17 What is ELRA?
European Language Resources Association, http://www.elra.info
An effort to make Language Resources (LR) available for language engineering and to evaluate language engineering technologies.
LR marketplace
Related organizations
–ELDA (ELRA’s operational body)
–LREC conferences

18 What is ELRA?

19 Relationship with NIF
ELRA and NIF have different objectives
Written LRs (esp. corpora) can be annotated with NIF for further interoperability and integration with NLP tools
ADVANTAGE: Large test data collection to evaluate NLP tools
DISADVANTAGE: Cost of LRs (though there are free ones)

20 Roadmap for NIF 2.0
Release of NIF 1.0
–DONE (Nov 2009)
Release of NIF 2.0 Draft
–CURRENT effort on solving pending issues
–Adoption in ITS 2.0 W3C (soon-to-be) Recommendation
–NIF-Core ontology is becoming stable
–RLOG - an RDF Logging Ontology
–NIF Validator software available
Release of NIF 2.0 Core
Release of NIF 2.0 Extensions
–ITS ontology, PROV ontology, Lemon Ontology, NERD, UIMA, MARL opinion ontology…

21 Conclusions
NIF allows NLP tools to be integrated using Linked Data
Ongoing effort
Many adopters and supporters
–LOD2 EU project
–Several W3C working groups
–Named Entity Recognition and Disambiguation (NERD)
–Ontologies of Linguistic Annotation (OLiA)
–…
27 different implementations and use cases
–Some available at http://persistence.uni-leipzig.org/nlp2rdf/

22 Thanks for your attention
Questions?
© Copyright 2012 STI INNSBRUCK

23 References
1. http://persistence.uni-leipzig.org/nlp2rdf/
2. Sebastian Hellmann, Jens Lehmann, Sören Auer, and Martin Brümmer. Integrating NLP using Linked Data. In 12th International Semantic Web Conference, 21-25 October 2013, Sydney, Australia.

