PANACEA - Y2 After the 2nd Annual Review, 28th February 2012, Barcelona


1 PANACEA - Y2 After the 2nd Annual Review, 28th February 2012, Barcelona

2 Objectives
Join together a number of advanced interoperable tools to build a platform/factory/production line that automates the stages involved in acquiring, processing and producing the Language Resources required by MT and other Language Technologies.

3 Partners
- WP1 – Management (UPF)
- WP2 – Dissemination and Exploitation (ELDA)
- WP3 – The Platform (UPF)
- WP4 – Corpus Acquisition & Annotation (ILSP)
- WP5 – Parallel corpus & derivatives (DCU)
- WP6 – Lexical Acquisition (UCAM)
- WP7 – Integration & resource evaluation (ILC)
- WP8 – Evaluation in industrial environment (LT)

4 Platform
The PANACEA platform is an interoperability space based on tools, guidelines, a Common Interface definition, and a “Travelling Object” specification.
- Tools: Taverna, BioCatalogue, myExperiment, Soaplab
- Common Interface: WS interoperability
- Travelling Object: XCES and GrAF
- Documentation (video tutorials, how-tos, deliverables, etc.) at http://www.panacea-lr.eu

5 Tools
SOAPLAB 2 (SOAP): web services
- Web application for deploying command line tools as WS
- No coding needed! Metadata only
- Services deployed by ILSP at http://nlp.ilsp.gr/ws/
TAVERNA: workflow editor
- Open source desktop application
- Imports Soaplab and other types of WS
- Allows for combination of WS in workflows (http://www.taverna.org.uk/)
BioCatalogue: registry
- Web application for registering and documenting WSs (http://registry.elda.org)
- Search function
- Auto-checks web services status
- Annotations: tags, categories, etc.
myExperiment: social network
- Share workflows, files, data, etc.
- Share opinions and comments, create work groups, etc.
- http://myexperiment.elda.org

6 Interoperability
Three levels of interoperability:
- Communication protocols: SOAP, REST
- Data
- Parameters
[Diagram, data level: Tool A, Tool B and Tool C each work with a different format (N, M, L), so Tool B does not “understand” format N. With a common format, all tools understand the previous format.]
[Diagram, parameter level: Tool A and Tool B shown once with matching labels (ABCD / ABCD) and once with differing labels (YTQZ / ABCD).]
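
The data-level mismatch on this slide can be illustrated with a small sketch. The tool names and format labels below are hypothetical placeholders, not actual PANACEA components; the sketch only shows how a chain breaks at the first tool that cannot read the previous tool's output, and why a shared format avoids that.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Tool:
    name: str
    accepts: str  # label of the format the tool can read (hypothetical)
    emits: str    # label of the format the tool writes (hypothetical)


def first_mismatch(tools: List[Tool], input_fmt: str) -> Optional[str]:
    """Return the name of the first tool that cannot read its input, or None."""
    fmt = input_fmt
    for tool in tools:
        if tool.accepts != fmt:
            return tool.name
        fmt = tool.emits
    return None


# Each tool has its own format: Tool B receives format N but expects M.
mismatched = [Tool("Tool A", "N", "N"), Tool("Tool B", "M", "M"), Tool("Tool C", "L", "L")]

# Every tool reads and writes one shared format (labelled "TO" here).
common = [Tool("Tool A", "TO", "TO"), Tool("Tool B", "TO", "TO"), Tool("Tool C", "TO", "TO")]

print(first_mismatch(mismatched, "N"))
print(first_mismatch(common, "TO"))
```

In the first chain the check stops at Tool B; in the second it runs through all three tools.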

7 Travelling Object
The Travelling Object (TO) is the common data and metadata format used in PANACEA to make components understand each other (syntactic interoperability).
- First TO, for annotations up to tagging and lemmatization
  – Based on XCES (XML files with p, s, and t elements)
  – Tools: formatConverters and stylesheets
- Second TO, for everything else (NER, DepParsing, etc.)
  – Based on GrAF (standoff annotation)
  – One file for primary data
  – One file for each annotation layer
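
The difference between the two Travelling Objects (inline XML versus standoff annotation) can be sketched as follows. This is an illustration only: it does not follow the real XCES or GrAF schemas. The p, s, and t element names come from the slide, while the `word` attribute, the naive whitespace tokenizer, and the offset dictionaries are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

text = "PANACEA builds resources. Tools are chained."


def inline_annotation(text):
    """Inline style: one p element containing s (sentence) and t (token) elements."""
    p = ET.Element("p")
    for sentence in text.split(". "):
        s = ET.SubElement(p, "s")
        for word in sentence.rstrip(".").split():
            t = ET.SubElement(s, "t")
            t.set("word", word)  # hypothetical attribute name
    return p


def standoff_tokens(text):
    """Standoff style: the primary text is untouched; the layer stores offsets into it."""
    layer = []
    pos = 0
    for word in text.split():
        start = text.index(word, pos)
        end = start + len(word)
        layer.append({"start": start, "end": end})
        pos = end
    return layer


p = inline_annotation(text)
print(ET.tostring(p, encoding="unicode"))

layer = standoff_tokens(text)
print([text[a["start"]:a["end"]] for a in layer])
```

In the standoff variant, each further annotation layer could be stored separately and point at the same primary text, which mirrors the "one file for primary data, one file for each annotation layer" layout on the slide.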

8 Common Interface
A Common Interface (CI) defines the mandatory parameters for every type of WS:
- http://panacea-lr.eu/en/info-for-professionals/documents/
- http://registry.elda.org
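
As a rough sketch of what "mandatory parameters for every type of WS" could mean in practice, the check below compares a request against a per-type list of required parameters. The service types and parameter names are hypothetical; the actual Common Interface definitions are in the documents linked above.

```python
# Hypothetical mandatory parameters per service type (not the real CI definitions).
MANDATORY = {
    "tokenizer": {"input", "language"},
    "crawler": {"seed_urls", "topic_definition", "language"},
}


def missing_parameters(service_type, params):
    """Return the mandatory parameters absent from a request, sorted by name."""
    required = MANDATORY[service_type]
    return sorted(required - set(params))


print(missing_parameters("tokenizer", {"input": "some text"}))
print(missing_parameters(
    "crawler",
    {"seed_urls": "seeds.txt", "topic_definition": "topic.txt", "language": "en"},
))
```

A client or workflow editor could run such a check before calling a service, so that any service of a given type can be swapped for another without changing the call.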

9 Soaplab Web Services
28 Corpus Acquisition and Annotation Web Services
- NLP WSs focusing on sentence splitting, tokenization, tagging, lemmatization and parsing, e.g.:
  – EN, FR: Berkeley tagger and parser (DCU)
  – ES: UPF tools, Freeling; IT: ILC’s DESR, Freeling
  – DE and EL: LT’s and ILSP’s in-house tools
- WSs for conversion from and to PANACEA’s Travelling Object (@UPF and ILC)
- WSs for alignment of parallel data (@DCU)

10 Corpus Acquisition WS
Focused Bilingual Crawler (FBC)
- Documentation: http://registry.elda.org/services/127
- Test at http://nlp.ilsp.gr/soaplab2-axis/#ilsp.ilsp_bilingual_crawl_row
- Sample topic definition for crawling EN-FR pages in the Environment domain: http://nlp.ilsp.gr/panacea/testinput/bilingual/ENV_topics/ENV_EN_FR_topic.txt
- Seed URL for crawling EN-FR ENV data: http://nlp.ilsp.gr/panacea/testinput/bilingual/ENV_EN_FR_greenfacts.txt
Focused Monolingual Crawler (FMC)
- Documentation: http://registry.elda.org/services/160
- Test at http://nlp.ilsp.gr/soaplab2-axis/#ilsp.ilsp_fmc_row
- Topic definition for crawling EN ENV data: http://nlp.ilsp.gr/panacea/testinput/monolingual/ENV_topics/ENV_EN_topic.txt
- List of seed URLs for crawling EN ENV data: http://nlp.ilsp.gr/panacea/testinput/monolingual/ENV_seeds/ENV_EN_seeds.txt

11 Taverna Workflow Demo
How can I align crawled data? Search for a DCU-hosted alignment service at http://myexperiment.elda.org/workflows?query=align
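
The search link on this slide is a base URL plus a `query` parameter. A minimal sketch for building such links for other search terms is shown below; it assumes the search page accepts arbitrary terms through the same parameter, which the slide shows only for "align".

```python
from urllib.parse import urlencode

# Base URL taken from the slide.
BASE = "http://myexperiment.elda.org/workflows"


def workflow_search_url(query):
    """Build a workflow search URL with a URL-encoded query term."""
    return f"{BASE}?{urlencode({'query': query})}"


print(workflow_search_url("align"))
```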

12 Corpus Annotation WS
ILSP
- Documentation: http://registry.elda.org/services/180
- Test at http://nlp.ilsp.gr/soaplab2-axis/#ilsp.ilsp_nlp_row
- Sample input: http://nlp.ilsp.gr/nlp/examples/5345.txt
ILC DESR (dependency parser)
- Workflow: http://myexperiment.elda.org/workflows/24

