Download presentation
Presentation is loading. Please wait.
Published byBethany White Modified over 8 years ago
1
Extending LanguageTool, a style and grammar checker Daniel Naber, Marcin Miłkowski 11:15 – 13:00 Friday, 2009-11-06 Workshop Notes v2.0, updated 2009-11-11
2
Extending LanguageTool slide 2 Agenda ● Intro ● Workshop ● Download and use LanguageTool ● understand XML rules ● write new XML rules ● add support for a new language ● the community website ● setup Eclipse and write Java rules ● Resources
3
Extending LanguageTool slide 3 Intro ● LanguageTool = Open Source language checker for English, French, German, Polish + more ● Not (just) grammar
4
Extending LanguageTool slide 4 Intro: How to express Rules what's allowed what's forbidden - LanguageTool knows errors, i.e. what's forbidden
5
Extending LanguageTool slide 5 Intro: Flow Sentence Tokenization Word Tokenization POS Tagging Disambiguation and Chunking Rule Matching Display or Apply Suggestions Rules written by you
6
Extending LanguageTool slide 6 Workshop Notes ● download the current OXT ● install it in OOo and try on English text ● try the GUI: java -jar LanguageToolGUI.jar ● command line and HTTP server
7
Extending LanguageTool slide 7 Workshop Notes ● show that the English rules are in in rules/en/grammar.xml ● explain that using an XML editor is a good idea ● explain the syntax of a simple rule - one of the English redundant phrases, this is simple but useful ● create a rule to match "foo bar"
8
Extending LanguageTool slide 8 Workshop Notes ● restart LT and try it (GUI version, not OOo) ● run a test using testrules.sh/testrules.bat ● it's important to use real examples ● explain a more complex rule
9
Extending LanguageTool slide 9 Workshop Notes ● what is a POS tagger ● mention SENT_START / SENT_END tags available to all languages ● it_is - a couple of cases, one with POS tags, and a couple others (as a rulegroup) ● “both… and” in English, as it requires skipping ● a rule for repeated phrases (can be French or Dutch)
10
Extending LanguageTool slide 10 Workshop Notes ● complex suggestions and testing corrections ● what is a synthesizer
11
Extending LanguageTool slide 11 Workshop Notes ● show how we add a new language by adapting the Java code ● check out code from CVS ● help is at http://sourceforge.net/scm/?type=cvs&group_id=110216 http://sourceforge.net/scm/?type=cvs&group_id=110216 ● module name is “JLanguageTool” ● call “ant” ● see http://www.languagetool.org/development/#newlanguage http://www.languagetool.org/development/#newlanguage ● you can usually start working in a different language
12
Extending LanguageTool slide 12 Workshop Notes ● complex suggestions and testing corrections ● what is a synthesizer
13
Extending LanguageTool slide 13 Workshop Notes ● community.languagetool.org website ● how to use it to find false alarms in Wikipedia ● how to find content for rules (Wikipedia typos page, community feedback…)
14
Extending LanguageTool slide 14 Workshop Notes ● How to setup LanguageTool in Eclipse ● check out from CVS (see above) ● File -> Import… ● General, Existing Projects into Workspace ● select the "JLanguageTool" directory just checked out from CVS ● structure of a rule written in Java ● write your own simple rule
15
Extending LanguageTool slide 15 Finally... Resources ● All resources are at http://www.languagetool.org or are linked from therehttp://www.languagetool.org ● Don't miss the Wiki ● Discussion happens on the mailing list
Similar presentations
© 2025 SlidePlayer.com Inc.
All rights reserved.