Generality and Openness in Enabling Methodologies for Morphology and Text Processing Anssi Yli-Jyrä Department of General Linguistics, University of Helsinki.


1 Generality and Openness in Enabling Methodologies for Morphology and Text Processing Anssi Yli-Jyrä Department of General Linguistics, University of Helsinki

2 Tools to make tools...
Annotated resources are tools for machine learning and theory developers, and for making applications.
Morphological annotation of morphologically complex languages is difficult.
Computational lexicons are tools for producing annotation.
Finite-state compilers are among the most useful tools for building computational word-form lexicons.
Open sourcing and collaboration are tools for making methods widely available.

3 Limited availability of finite-state tools
Existing proprietary tools for morphology and shallow processing:
– finite-state tools are expensive to develop (e.g. many man-years), but very useful
– Can the users get support in the future? Can we get the tools on tomorrow's machines?
– Who may use the compilers, lexicons and corpora?
The open source alternatives:
– diversity of alternative tools (Unitex, SFST, ...)
– low interoperability
– much more limited functionality
– few standardized interfaces and formats
– rejection of finite-state technologies (e.g. in Hebrew)

4 Current Challenges
Less-studied, morphologically rich languages still need new professional, fully functional tools
– Descriptions without free compilers and run-time implementations are not free in practice!
– Ad-hoc tools reduce the productivity of basic resource development
– Confusion among the users
Effects on corpus resource creation in any language
– Many technologically appropriate but proprietary tools limit the distribution of the linguistic models and applications developed.
– Proprietary compiler tools may induce restrictions on the corpora analysed with the descriptions.
– Many proprietary analysers hinder the development of widely available treebanks, even in well-studied languages
Closed, non-extendible tools hinder long-term, incremental development of OS tools

5 Initiative: Interoperable FS tools
Initial surveys
– Yli-Jyrä et al. (2006), Infrastructures WS, 2006, Genova.
– Another paper in Nordic Journal of African Studies, 2005.
Purpose: to increase collaboration between tool providers and satisfaction among users
Complementary tools:
– interoperability, user interfaces, standard file formats, converters etc. to get more out of the existing tools
– free APIs for integration into various end-user applications
– web-based services that apply methods on demand
The evolution of tools enabled by the OS solution
– extensibility of finite-state compilers & related formalisms
– finite-state methods for machine learning and active learning
– help to implement BLARK for various languages
– increase the quality of lexicons and taggers

