CIS392 Text Processing, Retrieval, and Mining, Spring 03
Assign#1
Instructor: Dr. Y. F. Brook Wu
BOW toolkit: http://www.cs.cmu.edu/~mccallum/bow
Log in to AFS
On campus: go to a computer lab in GITC 2305.
At home: make sure your internet connection is established. These steps assume you have Windows at home.
  Click Start, then Run.
  Type "telnet afs1.njit.edu" (without the quotes). The first screen shows some useful information.
  Enter your user name and password.
If your account doesn't work: call the help desk at 973.596.2900; they can reset your password for you.
Useful UNIX commands
Note: all filenames and commands on a UNIX system are case sensitive.
General syntax: command [options] [arguments]
Options modify the way a command works, and they are optional. Arguments are usually files; sometimes they are optional too.
Ex: rm -r directory_name
Note
Typing two "-" next to each other in MS PowerPoint makes them look like "—". The BOW and UNIX commands shown in these slides can therefore be misleading. Please refer to the BOW help file and the UNIX documentation for their actual usage.
Useful UNIX commands
  man (manual). Ex: man ls (manual for the ls command)
  cd (change directory)
  ls (list files and attributes)
  dir (list files)
  mkdir (create a directory)
  rm (delete a file)
  rm -fr directory_name (delete the whole directory and the files inside it)
Useful UNIX commands (cont)
  rmdir (remove an empty directory)
  cp (copy)
  pwd (print the current working directory)
  pico (a text editor)
  more filename (read a plain text file one screen at a time; press the space bar to continue and "q" to quit)
  quota (show disk space usage)
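The commands above can be tried safely in a throwaway directory. The following is a minimal practice sketch; the directory and file names (demo, empty, a.txt, b.txt) are placeholders invented for illustration, not part of the assignment.

```shell
#!/bin/sh
# Practice run of the basic commands inside a scratch directory.
# All names below are placeholders chosen for this sketch.
scratch=$(mktemp -d)        # create an empty temporary directory
cd "$scratch" || exit 1

pwd                         # print the current working directory
mkdir demo                  # create a directory
echo "hello" > demo/a.txt   # create a small file to work with
cp demo/a.txt demo/b.txt    # copy a file
ls -l demo                  # list files with their attributes
rm demo/a.txt               # delete one file
mkdir empty
rmdir empty                 # rmdir only removes empty directories
rm -r demo                  # delete a directory and everything inside it
ls -A                       # prints nothing: the scratch directory is empty again
```

Because rm -r deletes without asking, it is worth running pwd first to confirm where you are.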
How to create your home page on the AFS system
Help info: http://www-ec.njit.edu/ec_info/newuser/web/web.html
Execute this command at the UNIX prompt: /usr/ec/bin/home.page.setup
Your URL: http://www-ec.njit.edu/~yourusername
Overview of Retrieval Experiment
  Create a sub-directory for CIS392 assignments under ~your_user_name/public_html.
  Create 3 sub-directories under that directory, one for each of the 3 automatic indexing activities.
  Perform the 3 automatic indexing activities, each with different options.
Overview of Retrieval Experiment (cont)
  Perform 3 retrievals for each of the above 3 automatic indexing activities.
  Analyze how the different indexing options affect retrieval.
  Make an html page to present your results.
Creating sub directories
Change directory to public_html by typing:
  cd public_html
  mkdir cis392 (now you've created a directory for your CIS392 retrieval assignments)
  cd cis392 (go inside the cis392 directory)
Creating three sub-directories
  mkdir model1 (stores results from the default settings: no stemming, stop words removed)
  mkdir model2 (stores results from these settings: no stemming, stop words INCLUDED)
  mkdir model3 (stores results from these settings: stemming, stop words removed)
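The directory steps above can be sketched as one short script. BASE is a scratch stand-in for ~/public_html, introduced here so the sketch is safe to run anywhere; on AFS you would use your real public_html directory instead.

```shell
#!/bin/sh
# Sketch: build the cis392/model1..model3 layout in one go.
# BASE is a hypothetical stand-in for ~/public_html.
BASE=$(mktemp -d)

for m in model1 model2 model3; do
    # -p also creates the parent cis392 directory if it is missing
    mkdir -p "$BASE/cis392/$m"
done

ls "$BASE/cis392"   # lists model1, model2, model3
```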
URL of your retrieval experiment
  http://www-ec.njit.edu/~yourusername/cis392/cis392re.html
See a sample page created by Prof Wu:
  http://www-ec.njit.edu/~wu/cis392/cis392re.html
Getting Access to BOW and Test Collection
There are three directories under ~wu/IR_Tools:
  bow (the BOW system). To execute BOW, change directory to ~wu/IR_Tools/bow/bin
  som (the self-organizing map program. Do NOT use it now!)
  tc (the test collection, Library and Information Science Abstracts). The text is under ~wu/IR_Tools/tc/lisa/text/group0 to group5
Test Collection: LISA
The sample queries are stored in ~wu/IR_Tools/tc/lisa/LISA.QUE
The relevant documents corresponding to the queries are stored in ~wu/IR_Tools/tc/lisa/LISA.REL ("-1" marks the end of each entry.)
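To illustrate how a "-1" terminator can be used to split entries, here is a small sketch that runs on made-up data. The layout assumed here (a query number, then document numbers, then -1, possibly spread over several lines) and all the numbers are hypothetical; inspect the real LISA.REL with more before adapting it.

```shell
#!/bin/sh
# Sketch: split a stream of numbers into entries that each end with -1.
# The sample layout and the numbers are invented for illustration;
# the real LISA.REL may be laid out differently.
sample=$(mktemp)
cat > "$sample" <<'EOF'
1 3392 3396 -1
2 2623 2636
2650 -1
EOF

parsed=$(awk '
{
    for (i = 1; i <= NF; i++) {
        if ($i == "-1") {            # end of the current entry
            print query ": " docs
            query = ""; docs = ""
        } else if (query == "") {    # first number of an entry is the query
            query = $i
        } else {                     # remaining numbers are document ids
            docs = (docs == "" ? $i : docs " " $i)
        }
    }
}' "$sample")

echo "$parsed"
```

On the sample data this prints one line per query, "1: 3392 3396" and "2: 2623 2636 2650".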
Operating Arrow of BOW
  Read the information on BOW's web site (again, the URL is listed in the "Resources" section of the class syllabus).
  Read Arrow's help file (available on the syllabus page; you should print a copy of the help file).
Automatic Indexing
To begin the retrieval tasks, you first need to index the whole document collection. Specify the lexing options (stop word removal and/or stemming) at this time.
  arrow -d ~yourusername/public_html/cis392 --index ~wu/IR_Tools/tc/lisa/text/*
The * sign is a wildcard that matches all files and directories under ~wu/IR_Tools/tc/lisa/text
Automatic Indexing (cont)
The -d parameter specifies where the statistics produced by indexing are stored. (You have to specify this directory both when you index and when you retrieve documents.)
The path after --index specifies the location of the text collection.
The default lexing settings of the above task are: NO stemming performed, and stop words REMOVED.
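As a rough sketch of how the three models might be indexed, the script below only prints the commands (a dry run) rather than running arrow. It assumes each model's statistics go in its own model directory, and it assumes the libbow option spellings --no-stoplist and --use-stemming; neither option appears in these slides, so confirm both against Arrow's help file before using them.

```shell
#!/bin/sh
# Dry run: print (do not execute) one indexing command per model.
# ASSUMPTIONS: --no-stoplist and --use-stemming are the option names as
# I recall them from libbow's help; verify them in Arrow's help file.
BASE='~yourusername/public_html/cis392'
TEXT='~wu/IR_Tools/tc/lisa/text/*'

index_cmd() {
    # $1 = model directory, $2 = extra lexing option (may be empty)
    # tr -s squeezes the double space left when $2 is empty
    echo "arrow -d $BASE/$1 $2 --index $TEXT" | tr -s ' '
}

index_cmd model1 ""                # defaults: no stemming, stop words removed
index_cmd model2 "--no-stoplist"   # no stemming, stop words included
index_cmd model3 "--use-stemming"  # stemming, stop words removed
```

Printing the commands first makes it easy to compare them with the help file before typing them at the prompt.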
Query assigned for retrieval
Please refer to the retrieval experiment section of the online syllabus to see which query you get for the experiment: http://web.njit.edu/~wu/teaching/sp03/CIS392/CIS392-Sp03.htm
Retrieval
First specify where the indexing statistics are stored, and then the query to be performed.
  arrow -d ~yourusername/public_html/cis392/model1 --num-hits-to-show=25 --query > ~yourusername/public_html/cis392/model1/retrieved_docs
The greater-than sign (>) redirects the output to the file named after it, so it specifies the output filename and where it will be stored.
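Once the retrieved documents are saved to a file, one simple way to compare the indexing models is to count how many retrieved documents are also listed as relevant. The sketch below works on made-up data: it assumes you have already extracted one document id per line into two files (retrieved_ids and relevant_ids are hypothetical names), and it does not parse Arrow's actual output format.

```shell
#!/bin/sh
# Sketch: count relevant documents among the retrieved ones.
# File names and document ids are invented for illustration.
work=$(mktemp -d)
printf '%s\n' 3392 1207 3396 4410 > "$work/retrieved_ids"  # hypothetical hits
printf '%s\n' 3392 3396 5012      > "$work/relevant_ids"   # hypothetical relevant set

# Number of retrieved documents (one id per line)
retrieved=$(awk 'END { print NR }' "$work/retrieved_ids")

# Count retrieved ids that match a relevant id exactly:
# -F fixed strings, -x whole-line match, -f patterns from file, -c count
hits=$(grep -c -x -F -f "$work/relevant_ids" "$work/retrieved_ids")

# Precision = relevant retrieved / total retrieved
precision=$(awk -v h="$hits" -v n="$retrieved" 'BEGIN { printf "%.2f", h / n }')

echo "relevant retrieved: $hits of $retrieved"
echo "precision: $precision"
```

With the sample ids, 2 of the 4 retrieved documents are relevant, giving a precision of 0.50. Repeating the count for model1, model2, and model3 gives one number per indexing option to report.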
Presenting your RE
Create a page named cis392re.html under your ~/public_html/cis392 directory.
This page should contain several pieces of information; see: http://web.njit.edu/~wu/cis392/cis392re.html
Presenting your RE (cont)
You can create this html page with the pico editor in UNIX (if you know basic html tags), Microsoft Word (save the file in html format), or Netscape Composer.
If you use an html editor, you might need FTP software: http://www.zdnet.com/downloads/stories/info/0,10615,30994,00.html
Before the due date: please check all items on your html page and make sure all of them are displayed properly.
After the due date: do not make changes. I can check when the files were last updated.
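If you prefer to stay at the UNIX prompt, a page like this can also be written with a here-document. The sketch below is only a skeleton: the headings and links are placeholders, the required contents are whatever the sample page shows, and BASE is a scratch stand-in for ~/public_html.

```shell
#!/bin/sh
# Sketch: write a bare-bones cis392re.html with a here-document.
# BASE is a hypothetical stand-in for ~/public_html; the page contents
# are placeholders, not the required items for the assignment.
BASE=$(mktemp -d)
mkdir -p "$BASE/cis392"
page="$BASE/cis392/cis392re.html"

cat > "$page" <<'EOF'
<html>
<head><title>CIS392 Retrieval Experiment</title></head>
<body>
<h1>CIS392 Retrieval Experiment</h1>
<ul>
<li><a href="model1/retrieved_docs">Model 1: no stemming, stop words removed</a></li>
<li><a href="model2/retrieved_docs">Model 2: no stemming, stop words included</a></li>
<li><a href="model3/retrieved_docs">Model 3: stemming, stop words removed</a></li>
</ul>
</body>
</html>
EOF

chmod 644 "$page"   # make the page readable by the web server
ls -l "$page"
```

The relative links assume the retrieved_docs files sit in the model directories next to the page, as in the retrieval command.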