Sanchay and other NLP Tools Himanshu Sharma, Sambhav Jain.



2 Sanchay
Sanchay ⇔ संचय (http://sanchay.co.in/)
– A collection of tools and APIs for language processing
– An open-source platform
– Especially for South Asian languages

3 Sanchay - Installation
Platform independent: Windows/Linux
Prerequisite: Sun (now Oracle) JDK 1.6
Download
– binaries
Extract the .zip or .tgz archive
Go to the extracted directory
Ready!
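The extract step can be scripted. Here is a minimal Python sketch. It assumes the downloaded archive ends in .zip or .tgz, and the helper name extract_sanchay is hypothetical, not part of Sanchay. It relies on the standard library's shutil.unpack_archive, which picks the unpacker from the file extension.

```python
import shutil
from pathlib import Path

# Extensions that shutil.unpack_archive maps to its zip and gztar formats.
SUPPORTED = (".zip", ".tgz", ".tar.gz")


def extract_sanchay(archive: str, dest: str) -> Path:
    """Unpack a Sanchay binary archive (.zip or .tgz) into `dest`.

    extract_sanchay is a hypothetical helper, not part of Sanchay itself.
    """
    name = Path(archive).name
    if not name.endswith(SUPPORTED):
        raise ValueError(f"unsupported archive type: {name}")
    target = Path(dest)
    target.mkdir(parents=True, exist_ok=True)
    # The unpack format (zip or gztar) is inferred from the extension.
    shutil.unpack_archive(archive, str(target))
    return target
```

For example, extract_sanchay("sanchay-bin.zip", "sanchay") unpacks into ./sanchay, which is then the directory to change into. The archive name here is hypothetical; use the file you actually downloaded.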

4 Sanchay - Modules
Editors
– text, RTF, HTML
Tree Creator
Syntactic Annotation
Alignment tools
– Sentence
– Word

5 Shallow Parser
Covers 9 Indian languages
– Hindi, Kannada, Malayalam, Marathi, Tamil, Telugu, Bengali, Punjabi, Urdu
Performs tokenization + morphological analysis + POS tagging + chunking
Runs on Linux
http://ltrc.iiit.ac.in/showfile.php?filename=downloads/shallow_parser.php
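The four stages run in sequence, and each stage consumes the previous stage's output. The Python sketch below only illustrates that composition. The following pieces are hypothetical stand-ins, not the LTRC shallow parser's models or its output format:

– the whitespace tokenizer
– the identity "morph" stage
– the toy lexicon
– the chunk labels

```python
from typing import Callable, Dict, List

Token = Dict[str, str]
Stage = Callable[[List[Token]], List[Token]]

# Hypothetical toy lexicon; a real tagger is trained, not a lookup table.
TOY_LEXICON = {"राम": "NNP", "घर": "NN", "गया": "VM"}


def tokenize(text: str) -> List[Token]:
    """Stand-in tokenizer: split on whitespace."""
    return [{"form": w} for w in text.split()]


def morph(tokens: List[Token]) -> List[Token]:
    """Stand-in morphological analyser: reuses the surface form as the root."""
    return [{**t, "root": t["form"]} for t in tokens]


def pos_tag(tokens: List[Token]) -> List[Token]:
    """Stand-in POS tagger: lexicon lookup, 'UNK' for unknown words."""
    return [{**t, "pos": TOY_LEXICON.get(t["form"], "UNK")} for t in tokens]


def chunk(tokens: List[Token]) -> List[Token]:
    """Stand-in chunker: every noun or verb starts its own one-token chunk."""
    out: List[Token] = []
    for t in tokens:
        if t["pos"].startswith("NN"):
            label = "B-NP"
        elif t["pos"].startswith("V"):
            label = "B-VG"
        else:
            label = "O"
        out.append({**t, "chunk": label})
    return out


def shallow_parse(text: str, stages: List[Stage]) -> List[Token]:
    """Tokenize, then run each stage in order over the token list."""
    tokens = tokenize(text)
    for stage in stages:
        tokens = stage(tokens)
    return tokens
```

For example, shallow_parse("राम घर गया", [morph, pos_tag, chunk]) ("Ram went home") returns one dict per token carrying form, root, pos and chunk keys. Keeping each stage as a separate function mirrors how a real pipeline lets you swap one component without touching the others.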

6 Shallow Parser - Installation
Dependencies
– ‘dos2unix’ and ‘unix2dos’ must be installed
Download and extract
Install
If libgdbm.so.2 doesn’t exist in /usr/lib/, then
– sudo cp /usr/lib/libgdbm.so.3 /usr/lib/libgdbm.so.2
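The dependency check and the libgdbm workaround can also be scripted. This minimal Python sketch mirrors the slide's steps:

– It reports which of dos2unix/unix2dos are missing from PATH.
– It copies libgdbm.so.3 to libgdbm.so.2 only when libgdbm.so.2 is absent.

The function names are hypothetical. The library directory is a parameter (defaulting to /usr/lib, as on the slide) so the logic can be tried on a scratch directory first. Writing to /usr/lib needs root, just as the slide's sudo cp does.

```python
import shutil
from pathlib import Path
from typing import Iterable, List


def missing_tools(tools: Iterable[str] = ("dos2unix", "unix2dos")) -> List[str]:
    """Return the names of required tools that are not found on PATH."""
    return [t for t in tools if shutil.which(t) is None]


def ensure_libgdbm_so2(libdir: str = "/usr/lib") -> bool:
    """Copy libgdbm.so.3 to libgdbm.so.2 if libgdbm.so.2 is missing.

    Returns True when a copy was made, False when libgdbm.so.2 already existed.
    """
    d = Path(libdir)
    wanted = d / "libgdbm.so.2"
    available = d / "libgdbm.so.3"
    if wanted.exists():
        return False
    if not available.exists():
        raise FileNotFoundError(f"neither {wanted.name} nor {available.name} in {d}")
    shutil.copy2(available, wanted)  # same effect as the slide's `cp`
    return True
```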

7 TnT POS Tagger
TnT Tagger [http://www.coli.uni-saarland.de/~thorsten/tnt/]
Train
– tnt-para data.txt
– Generates data.123 & data.lex
Tag
– tnt data testfile (data = the model name, testfile = the file to tag)
Evaluate
– tnt-diff goldfile taggedfile
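TnT's training and gold files hold one token per line together with its tag. The Python sketch below does two things:

– It writes tagged sentences in that layout.
– It computes token-level accuracy between a gold file and a tagged file. This is the kind of figure tnt-diff reports, but it is not tnt-diff itself.

The sketch makes these assumptions:

– Token and tag are separated by a tab.
– A blank line separates sentences. Check how your TnT version treats blank lines.
– Both files list the same tokens in the same order.

The function names and the example tags are hypothetical.

```python
from typing import List, Sequence, Tuple

TaggedSentence = Sequence[Tuple[str, str]]  # [(token, tag), ...]


def to_tnt_lines(sentences: Sequence[TaggedSentence]) -> List[str]:
    """One 'token<TAB>tag' line per token, a blank line between sentences."""
    lines: List[str] = []
    for i, sent in enumerate(sentences):
        if i > 0:
            lines.append("")
        lines.extend(f"{tok}\t{tag}" for tok, tag in sent)
    return lines


def token_accuracy(gold: Sequence[str], tagged: Sequence[str]) -> float:
    """Share of tokens whose tag in `tagged` matches the tag in `gold`."""
    g = [line.split() for line in gold if line.strip()]
    t = [line.split() for line in tagged if line.strip()]
    if len(g) != len(t):
        raise ValueError("gold and tagged files contain different numbers of tokens")
    for a, b in zip(g, t):
        if a[0] != b[0]:
            raise ValueError(f"token mismatch: {a[0]!r} vs {b[0]!r}")
    if not g:
        return 0.0
    correct = sum(1 for a, b in zip(g, t) if a[-1] == b[-1])
    return correct / len(g)
```

To produce the training file named on the slide, write "\n".join(to_tnt_lines(sentences)) + "\n" to data.txt before running tnt-para.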

8 CRF++ - Chunker
CRF++ [http://crfpp.googlecode.com/svn/trunk/doc/index.html]
Separate binaries for Linux as well as Windows
Installation
– ./configure
– make
– make install

9 CRF++ - Chunker
Train
– ./crf_learn template train_file model
Tag/Test
– ./crf_test -m model testfile
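crf_learn needs two inputs:

– a feature template;
– a training file with one token per line as whitespace-separated columns. The last column is the answer (here the chunk tag), and an empty line ends each sentence.

The Python sketch below produces both. These choices are illustrative assumptions, not the settings of the LTRC chunker:

– the template: word and POS unigram features in a one-token window on each side, plus the B line, which adds bigram features over output tags;
– the column order (word, POS, chunk tag);
– the file names taken from the slide;
– the chunk labels in the test data.

```python
from typing import List, Sequence, Tuple

# Illustrative CRF++ template: %x[row,col] refers to the token `row`
# positions away from the current one, column `col` (0 = word, 1 = POS).
TEMPLATE = """\
U00:%x[-1,0]
U01:%x[0,0]
U02:%x[1,0]
U03:%x[-1,1]
U04:%x[0,1]
U05:%x[1,1]
B
"""

Row = Tuple[str, str, str]  # (word, POS tag, chunk tag)


def to_crfpp_lines(sentences: Sequence[Sequence[Row]]) -> List[str]:
    """One token per line as tab-separated columns, chunk tag last;
    an empty line after every sentence."""
    lines: List[str] = []
    for sent in sentences:
        lines.extend("\t".join(row) for row in sent)
        lines.append("")
    return lines


def write_crfpp_inputs(sentences: Sequence[Sequence[Row]],
                       template_path: str = "template",
                       train_path: str = "train_file") -> None:
    """Write the template and training data under the names used on the slide."""
    with open(template_path, "w", encoding="utf-8") as f:
        f.write(TEMPLATE)
    with open(train_path, "w", encoding="utf-8") as f:
        f.write("\n".join(to_crfpp_lines(sentences)) + "\n")
```

After write_crfpp_inputs(...), the slide's ./crf_learn template train_file model can be run unchanged. A test file for crf_test uses the same column layout.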

10 MaltParser (dependency parsing)
MaltParser [http://www.maltparser.org/]
Train
– java -jar malt.jar -c model -i trainfile -m learn
Test
– java -jar malt.jar -c model -i testfile -o output -m parse
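Recent MaltParser releases read training and test data in the tab-separated CoNLL-X format by default: one token per line, ten columns, and a blank line after each sentence. Check the documentation for your version. The Python sketch below does two things:

– It converts a sentence given as (form, POS, head, label) tuples into that layout.
– It computes unlabelled and labelled attachment scores between a gold file and the parser's output.

These are illustrative assumptions:

– the function names;
– the placeholder dependency labels;
– copying the POS tag into both the CPOSTAG and POSTAG columns.

```python
from typing import List, Sequence, Tuple

Dep = Tuple[str, str, int, str]  # (form, POS tag, head index, dependency label)


def to_conllx(sentence: Sequence[Dep]) -> List[str]:
    """Render one sentence as CoNLL-X lines.

    Columns: ID FORM LEMMA CPOSTAG POSTAG FEATS HEAD DEPREL PHEAD PDEPREL.
    IDs start at 1, HEAD 0 marks the root, '_' fills unknown fields,
    and a blank line closes the sentence.
    """
    lines: List[str] = []
    for idx, (form, pos, head, rel) in enumerate(sentence, start=1):
        cols = [str(idx), form, "_", pos, pos, "_", str(head), rel, "_", "_"]
        lines.append("\t".join(cols))
    lines.append("")
    return lines


def attachment_scores(gold: Sequence[str], parsed: Sequence[str]) -> Tuple[float, float]:
    """Return (UAS, LAS): the share of tokens with the correct head, and
    the share with both the correct head and the correct label."""
    g = [line.split("\t") for line in gold if line.strip()]
    p = [line.split("\t") for line in parsed if line.strip()]
    if len(g) != len(p):
        raise ValueError("gold and parsed files contain different numbers of tokens")
    if not g:
        return 0.0, 0.0
    head_ok = [a[6] == b[6] for a, b in zip(g, p)]
    label_ok = [h and a[7] == b[7] for h, a, b in zip(head_ok, g, p)]
    return sum(head_ok) / len(g), sum(label_ok) / len(g)
```

Joined with newlines, to_conllx output can serve as the trainfile or testfile above. attachment_scores can then compare the gold file with the file written via -o.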

11 Other NLP Tools
Toolkits
– NLTK (Python) [http://nltk.org/]
– OpenNLP (Java) [http://opennlp.apache.org/]
– LingPipe (Java) [http://alias-i.com/lingpipe/]
Frameworks
– GATE [http://gate.ac.uk/]
– Apache UIMA [http://uima.apache.org/]


