Generality and Openness in Enabling Methodologies for Morphology and Text Processing
Anssi Yli-Jyrä
Department of General Linguistics, University of Helsinki

Tools to make tools...
– Annotated resources are tools for machine-learning and theory developers, used for building applications.
– Morphological annotation of morphologically complex languages is difficult.
– Computational lexicons are tools for making annotations.
– Finite-state compilers are among the most useful tools for building computational word-form lexicons.
– Open sourcing and collaboration are tools for making methods widely available.
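To make the point about finite-state word-form lexicons concrete, here is a minimal sketch in Python. It is a toy illustration, not the method of any tool named in these slides: it compiles a word-form list into a letter trie (a deterministic acyclic automaton) and then minimizes it by merging equivalent states bottom-up, so that shared inflectional endings are stored only once. The names `State`, `build_trie`, `minimize`, `count_states` and `accepts` are hypothetical, and the sketch only accepts word forms; real compilers also handle transducers (form-to-analysis mappings), cyclic automata and rule formalisms.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)  # compare states by identity, not by field values
class State:
    final: bool = False                        # does a word form end here?
    arcs: dict = field(default_factory=dict)   # input character -> target State


def build_trie(words):
    """Compile a word-form list into a letter trie, i.e. a deterministic
    acyclic finite-state automaton with one state per distinct prefix."""
    root = State()
    for word in words:
        state = root
        for ch in word:
            state = state.arcs.setdefault(ch, State())
        state.final = True
    return root


def minimize(root):
    """Merge equivalent states bottom-up so that shared suffixes
    (e.g. inflectional endings) are stored once. Modifies the trie in place."""
    register = {}  # signature -> canonical representative state

    def visit(state):
        # Canonicalize the children first (post-order traversal).
        for ch, child in list(state.arcs.items()):
            state.arcs[ch] = visit(child)
        # In an acyclic DFA, two states are equivalent iff they agree on
        # finality and lead to the same canonical states on the same symbols.
        signature = (state.final,
                     tuple(sorted((ch, id(t)) for ch, t in state.arcs.items())))
        return register.setdefault(signature, state)

    return visit(root)


def count_states(root):
    """Count the distinct states reachable from root."""
    seen, stack = set(), [root]
    while stack:
        state = stack.pop()
        if id(state) not in seen:
            seen.add(id(state))
            stack.extend(state.arcs.values())
    return len(seen)


def accepts(root, word):
    """Return True if the automaton accepts the given word form."""
    state = root
    for ch in word:
        state = state.arcs.get(ch)
        if state is None:
            return False
    return state.final


words = ["talo", "talon", "talossa", "auto", "auton", "autossa"]
trie = build_trie(words)
print(count_states(trie))   # 17
fsa = minimize(trie)
print(count_states(fsa))    # 10
print(accepts(fsa, "autossa"), accepts(fsa, "talos"))  # True False
```

With the six Finnish forms of talo 'house' and auto 'car' (nominative, genitive and inessive), the trie has 17 states and the minimized automaton 10, because the states for the final stem consonant and for -o, -on and -ossa are shared by both stems.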

Limited availability of finite-state tools
Existing proprietary tools for morphology and shallow processing:
– Finite-state tools are expensive to develop (e.g. many person-years), but very useful.
– Can users get support in the future? Will the tools be available on tomorrow's machines?
– Who may use the compilers, lexicons and corpora?
The open-source alternatives:
– a diversity of alternative tools (Unitex, SFST, ...)
– low interoperability
– much more limited functionality
– few standardized interfaces and formats
– rejection of finite-state technologies (e.g. in Hebrew)

Current Challenges
Less-studied, morphologically rich languages still need new, professional, fully functional tools:
– Descriptions without free compilers and run-time implementations are not free in practice!
– Ad-hoc tools reduce the productivity of basic resource development.
– Users are confused by the variety of tools.
Effects on corpus resource creation in any language:
– Many technologically appropriate but proprietary tools limit the distribution of the linguistic models and applications developed with them.
– Proprietary compiler tools may impose restrictions on the corpora analysed with the descriptions.
– Many proprietary analysers hinder the development of widely available treebanks, even in well-studied languages.
Closed, non-extensible tools hinder the long-term, incremental development of open-source (OS) tools.

Initiative: Interoperable FS tools
Initial surveys:
– Yli-Jyrä et al. (2006), Infrastructures WS, 2006, Genova
– another paper in the Nordic Journal of African Studies
Purpose: to increase collaboration between tool providers and satisfaction among users.
Complementary tools:
– interoperability, user interfaces, standard file formats, converters, etc.
– making more of the existing tools free
– APIs for integration into various end-user applications
– web-based services that apply methods on demand
The evolution of tools enabled by an open-source solution:
– extensibility of finite-state compilers and related formalisms
– finite-state methods for machine learning and active learning
– helping to implement a BLARK (Basic Language Resource Kit) for various languages
– increasing the quality of lexicons and taggers
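The call for standard file formats and converters can be illustrated with a converter for the AT&T tabular text format, a plain-text layout for finite-state automata (one arc per line: source state, target state, input symbol, output symbol, optional weight; one line per final state) that open-source toolkits such as HFST and OpenFst can read and write. The sketch below uses hypothetical function names `to_att` and `from_att` and assumes tab-separated fields and integer state numbers. Real toolkits differ in details such as the epsilon symbol (HFST writes `@0@`), multi-character symbols and symbol tables, so the target tool's documentation should be checked before relying on it.

```python
def to_att(arcs, finals):
    """Serialize an automaton into AT&T-style tabular text.

    arcs:   list of (source, target, input_symbol, output_symbol) tuples
    finals: set of final state numbers (weights are not modelled here)
    """
    lines = ["\t".join(map(str, arc)) for arc in arcs]   # one arc per line
    lines += [str(q) for q in sorted(finals)]            # one final state per line
    return "\n".join(lines) + "\n"


def from_att(text):
    """Parse AT&T-style tabular text back into (arcs, finals).

    Accepts 4/5-field transducer arcs, 3-field acceptor arcs and 1/2-field
    final-state lines; any weight column is ignored; blank lines are skipped.
    """
    arcs, finals = [], set()
    for line in text.splitlines():
        fields = line.split("\t")
        if len(fields) >= 4:                      # src dst in out [weight]
            src, dst, isym, osym = fields[:4]
            arcs.append((int(src), int(dst), isym, osym))
        elif len(fields) == 3:                    # acceptor arc: src dst sym
            src, dst, sym = fields
            arcs.append((int(src), int(dst), sym, sym))
        elif fields[0]:                           # final state [weight]
            finals.add(int(fields[0]))
    return arcs, finals


# Hypothetical toy transducer: surface "talon" -> analysis "talo+Gen"
arcs = [(0, 1, "t", "t"), (1, 2, "a", "a"), (2, 3, "l", "l"),
        (3, 4, "o", "o"), (4, 5, "n", "+Gen")]
finals = {5}
text = to_att(arcs, finals)
print(text, end="")
assert from_att(text) == (arcs, finals)   # lossless round trip
```

A lossless round trip is the property that matters for interoperability: a lexicon compiled with one tool can then be inspected, edited or recompiled with another.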