How to Tag a Corpus Using Stanford Tagger. Accuracy All tokens: 97.32% Unknown words: 90.79%


1 How to Tag a Corpus Using Stanford Tagger

2 Accuracy: all tokens 97.32%; unknown words 90.79%

3 What You Need: JRE: http://www.java.com/en/download/ie_manual.jsp?locale=en

4 To make sure that Windows can find the Java compiler and interpreter: Select Start -> Computer -> System Properties -> Advanced system settings -> Environment Variables -> System variables -> PATH. [In Vista, select Start -> My Computer -> Properties -> Advanced -> Environment Variables -> System variables -> PATH.] [In Windows XP, select Start -> Control Panel -> System -> Advanced -> Environment Variables -> System variables -> PATH.] Prepend C:\Program Files\Java\jdk1.6.0_27\bin; to the PATH variable, adjusting the folder name to match the Java version installed on your computer. Click OK three times.
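
What "prepending" does to the PATH value can be illustrated with a small Python sketch. It is not part of the original slides: the helper name prepend_to_path is hypothetical, and the JDK folder is simply the example path from the slide above.

```python
def prepend_to_path(path_value, new_dir, sep=";"):
    """Return path_value with new_dir placed first (Windows separates entries with ';')."""
    # Drop empty entries and any existing copy of new_dir so it is not listed twice.
    entries = [p for p in path_value.split(sep) if p and p.lower() != new_dir.lower()]
    return sep.join([new_dir] + entries)


# Example folder from the slide; adjust it to the Java version on your machine.
java_bin = r"C:\Program Files\Java\jdk1.6.0_27\bin"
print(prepend_to_path(r"C:\Windows\system32;C:\Windows", java_bin))
```

Because Windows searches the PATH entries from left to right, putting the Java folder first means its java.exe is the one that gets found.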

5 Installing Java (JRE) on your computer: To check the installation, click Start, type cmd, and press Enter; this opens the Command Prompt window. Type java -version and press Enter. You should get a message such as: java version "1.7.0" (or possibly an older version). If you do not get this message, Java was not installed correctly. Ask for help.
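
The same check can be scripted. The following is a minimal Python sketch, assuming the version message has the form shown on the slide; the function names are hypothetical. Note that java -version writes its message to standard error rather than standard output, so both streams are read.

```python
import re
import shutil
import subprocess


def parse_java_version(message):
    """Extract the quoted version number from a 'java -version' message."""
    match = re.search(r'version "([^"]+)"', message)
    return match.group(1) if match else None


def installed_java_version():
    """Return the installed Java version, or None if java is not on the PATH."""
    if shutil.which("java") is None:
        return None
    result = subprocess.run(["java", "-version"], capture_output=True, text=True)
    return parse_java_version(result.stderr + result.stdout)
```

Calling installed_java_version() returns None when Windows cannot find java, which corresponds to the "ask for help" case above.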

6 Install the Stanford POS Tagger: Basic English Stanford Tagger Version 3.1.3: http://nlp.stanford.edu/software/stanford-postagger-2012-07-09.tgz

7 Installing Basic English Stanford Tagger Version 3.1.3: Click the link provided above to download the archive (a .tgz file). Extract it to Documents using archive manager software such as WinRAR, 7-Zip, or WinZip. You might want to rename the extracted folder to stanTagger; I do this because the original name, stanford-postagger-2012-07-09, is too long.
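
If no archive manager is at hand, the extraction and renaming can also be done with Python's standard tarfile module. This is a sketch under one assumption: that the archive contains a single top-level folder, as the slide describes. The function name extract_tagger is hypothetical.

```python
import os
import tarfile


def extract_tagger(archive_path, dest_dir, new_name="stanTagger"):
    """Extract a .tgz archive into dest_dir and rename its top-level folder."""
    with tarfile.open(archive_path, "r:gz") as archive:
        # Assumption: everything in the archive sits under one top-level folder.
        top_level = archive.getnames()[0].split("/")[0]
        archive.extractall(dest_dir)
    target = os.path.join(dest_dir, new_name)
    os.rename(os.path.join(dest_dir, top_level), target)
    return target
```

For example, extract_tagger("stanford-postagger-2012-07-09.tgz", "Documents") would leave the tagger in Documents\stanTagger.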

8 Create a Corpus Folder: In the stanTagger folder, create two folders to hold your files. I name them myCorpus and myTaggedCorpus. Now put some text files (or your corpus) in myCorpus. Make sure there are no spaces in your file names: for example, writtenArgument.txt instead of written Argument.txt. Move the stanTagger folder directly under C: so that you can find it easily.
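
For a large corpus, creating the folders and removing spaces from file names by hand is tedious. A minimal Python sketch of this step follows; the function names are hypothetical, and it assumes that removing spaces does not make two file names collide.

```python
import os


def remove_spaces(file_name):
    """Return file_name with all spaces removed."""
    return file_name.replace(" ", "")


def prepare_corpus(base_dir):
    """Create myCorpus and myTaggedCorpus under base_dir and fix file names."""
    corpus = os.path.join(base_dir, "myCorpus")
    tagged = os.path.join(base_dir, "myTaggedCorpus")
    os.makedirs(corpus, exist_ok=True)
    os.makedirs(tagged, exist_ok=True)
    renamed = []
    for name in sorted(os.listdir(corpus)):
        clean = remove_spaces(name)
        if clean != name:
            # Assumption: no other file is already called `clean`.
            os.rename(os.path.join(corpus, name), os.path.join(corpus, clean))
            renamed.append((name, clean))
    return renamed
```

Running prepare_corpus(r"C:\stanTagger") after copying the corpus into myCorpus returns the list of files that were renamed.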

9 Tagging Files: Start the Command Prompt window as described above. Go to C:\ by typing the command cd.. twice (or cd \ once, which works from any folder on the C: drive). Go into stanTagger by typing cd stanTagger.

10 Tagging files: To be able to use the Stanford Tagger on every file automatically, we need to do some programming. We can do this with Perl or other programming languages, such as Java, PHP, Python, and so on. However, I found programming the Command Prompt to be the simplest and will share the code I prepared.

11 Tagging files: Code to be used in Command Prompt:
FOR %a IN (C:\stanTagger\myCorpus\*.txt) DO stanford-postagger models\left3words-wsj-0-18.tagger myCorpus\%~nxa >myTaggedCorpus\%~nxa
You can simply copy the above code and paste it into the Command Prompt.

12 New Code!
FOR %a IN (C:\stanTagger\myCorpus\*.txt) DO stanford-postagger models\wsj-0-18-left3words.tagger myCorpus\%~nxa >myTaggedCorpus\%~nxa

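13 Newest Code!
FOR %a IN (C:\stanTagger\myCorpus\*.txt) DO stanford-postagger models\english-left3words-distsim.tagger myCorpus\%~nxa >myTaggedCorpus\%~nxa
The three versions of the command differ only in the model file name; use the one whose .tagger file is present in your models folder.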
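
Since the slides mention Python as an alternative to the Command Prompt, the same loop can be sketched in Python. Two things here are assumptions rather than part of the slides: the function names, and the java command line, which is what stanford-postagger.bat is assumed to run (check the README in your download before relying on it).

```python
import glob
import os
import subprocess


def build_jobs(tagger_dir, model_name):
    """Return (command, output_path) pairs, one per .txt file in myCorpus."""
    tagger_dir = os.path.abspath(tagger_dir)
    corpus = os.path.join(tagger_dir, "myCorpus")
    tagged = os.path.join(tagger_dir, "myTaggedCorpus")
    jobs = []
    for text_file in sorted(glob.glob(os.path.join(corpus, "*.txt"))):
        # Assumed equivalent of: stanford-postagger models\<model> <file>
        command = [
            "java", "-mx300m",
            "-classpath", "stanford-postagger.jar",
            "edu.stanford.nlp.tagger.maxent.MaxentTagger",
            "-model", os.path.join("models", model_name),
            "-textFile", text_file,
        ]
        jobs.append((command, os.path.join(tagged, os.path.basename(text_file))))
    return jobs


def tag_corpus(tagger_dir, model_name):
    """Run the tagger once per file, saving its standard output in myTaggedCorpus."""
    tagger_dir = os.path.abspath(tagger_dir)
    for command, output_path in build_jobs(tagger_dir, model_name):
        with open(output_path, "wb") as out:
            # cwd makes the relative jar and model paths resolve inside tagger_dir.
            subprocess.run(command, cwd=tagger_dir, stdout=out, check=True)
```

A call such as tag_corpus(r"C:\stanTagger", "english-left3words-distsim.tagger") would then mirror the Command Prompt loop, writing one tagged file per input file under the same name.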

14 Each file may take about 2-3 seconds. At the end, you will see that myTaggedCorpus contains the tagged files.
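
To confirm that every file was actually tagged, the output folder (myTaggedCorpus, as named on slide 8) can be compared with myCorpus. The following Python sketch uses a hypothetical function name. It also treats empty output files as missing, because the > redirection creates the output file even when the tagger itself fails.

```python
import os


def untagged_files(tagger_dir):
    """List corpus files that have no non-empty counterpart in myTaggedCorpus."""
    corpus = os.path.join(tagger_dir, "myCorpus")
    tagged = os.path.join(tagger_dir, "myTaggedCorpus")
    missing = []
    for name in sorted(os.listdir(corpus)):
        if not name.lower().endswith(".txt"):
            continue  # the tagging loop only processes .txt files
        output = os.path.join(tagged, name)
        if not os.path.isfile(output) or os.path.getsize(output) == 0:
            missing.append(name)
    return missing
```

An empty list from untagged_files(r"C:\stanTagger") means every .txt file in the corpus has a tagged counterpart.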

