How to Use LucidWorks Search Sagnik Ray Choudhury

1 How to Use LucidWorks Search Sagnik Ray Choudhury

2 Installation and Search Components. Installation: access control. Crawling: Aperture crawler (web, filesystem, Amazon S3 bucket). Information extraction: Aperture parser. Indexing: Lucene. Ranking: Lucene. Result interface: standard or Flare interface. lucidworks IST 441 PSU

3 Start Page

4 Access Control: Admin Panel. Log in on the admin screen (username admin, password admin).

5 Admin Dashboard. The dashboard provides user control and collections.

6 Adding Users. On a local installation, creating users is optional. On a server installation, create a new user with admin privileges, then delete the default admin account. Do not use PSU/IST credentials. Shown: creating a new user, deleting admin.

7 Crawling: Step 1. Add a new collection with the default template.

8 Crawling: Choosing a Data Source. Click on the new collection and note the index size and number of documents. Then add a new data source (web site).

9 Crawling: Parameter Selection. Set the name, URL, and crawl depth. Constrain to: whether crawling is allowed only within the site or also outside it. Include paths: the particular set of pages you wish to crawl. Exclude paths: file types or pages you do not want to crawl. This is a small-scale, single-threaded crawler; for better performance, Nutch can be integrated.
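To make the interaction of these parameters concrete, here is a minimal Python sketch of how a crawler might combine crawl depth, the site constraint, and include/exclude paths when deciding whether to follow a link. This is not LucidWorks code: the function `should_crawl`, its argument names, the regular-expression matching, and the example URLs are all assumptions made for illustration.

```python
import re
from urllib.parse import urlparse


def should_crawl(url, seed_url, depth, max_depth,
                 include=(), exclude=(), stay_on_site=True):
    """Return True if a discovered URL passes all crawl constraints.

    Hypothetical helper: the patterns are treated as regular expressions,
    which is an assumption, not the documented LucidWorks behaviour.
    """
    # Crawl depth: stop following links beyond the configured depth.
    if depth > max_depth:
        return False
    # Site constraint: optionally reject links that leave the seed's host.
    if stay_on_site and urlparse(url).netloc != urlparse(seed_url).netloc:
        return False
    # Exclude paths: any match rejects the URL.
    if any(re.search(pattern, url) for pattern in exclude):
        return False
    # Include paths: if given, at least one must match.
    if include and not any(re.search(pattern, url) for pattern in include):
        return False
    return True


if __name__ == "__main__":
    seed = "http://example.edu/"
    for link in ("http://example.edu/courses/ist441.html",
                 "http://example.edu/courses/notes.pdf",
                 "http://other.org/courses/"):
        ok = should_crawl(link, seed, depth=1, max_depth=2,
                          include=[r"/courses/"], exclude=[r"\.pdf$"])
        print(link, ok)
```

In this sketch exclusion is checked before inclusion, so a page that matches both lists is skipped. Whether the real crawler resolves such conflicts the same way would need to be checked against its documentation.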

10 Starting the Crawling Process. Click Create to move to the crawl-job screen, then start crawling (you can also add a schedule to crawl periodically). To add another website, go back to the collection page (slide 8).

11 Information Extraction and Indexing. Information is extracted from the crawled web pages. Default parser: Aperture; fallback: Apache Tika. Extracted information includes author, full text, date, etc. (see the field mapping section). Information extraction and indexing run simultaneously with crawling. You need to do a “hard commit” to ensure that the index is up to date. To learn more about the index, go to the Solr page for the collection.
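Because the index is served by Solr, a hard commit can also be requested over HTTP through Solr's update handler with `commit=true`. The Python sketch below builds that request. The base URL `http://localhost:8888/solr` and the collection name `collection1` are assumptions about a local installation, so substitute the values shown on the Solr page for your collection.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def commit_url(solr_base, collection):
    """Build the URL of a Solr update request that performs a hard commit."""
    params = urlencode({"commit": "true", "wt": "json"})
    return f"{solr_base.rstrip('/')}/{collection}/update?{params}"


def hard_commit(solr_base, collection, timeout=10):
    """Send the commit request; returns True on an HTTP 200 response."""
    with urlopen(commit_url(solr_base, collection), timeout=timeout) as resp:
        return resp.status == 200


if __name__ == "__main__":
    # Assumed host, port, and collection name; adjust to your installation.
    print(commit_url("http://localhost:8888/solr", "collection1"))
```

Only `commit_url` is called in the example, so it runs without a server. `hard_commit` needs a running instance and may need credentials if access control is enabled.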

12 Searching. Default interface: click the “tools” link on the top panel.
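Besides the web interface, the underlying Solr index can usually be queried directly through its `select` handler. The following Python sketch only constructs such a query URL. The host, port, collection name, and the field names `id` and `title` are assumptions and may differ from your collection's schema.

```python
from urllib.parse import urlencode


def select_url(solr_base, collection, query, rows=10, fields=("id", "title")):
    """Build a Solr select URL that returns JSON results for a query."""
    params = urlencode({
        "q": query,               # the user's search terms
        "rows": rows,             # number of results to return
        "fl": ",".join(fields),   # fields to include (assumed names)
        "wt": "json",             # response format
    })
    return f"{solr_base.rstrip('/')}/{collection}/select?{params}"


if __name__ == "__main__":
    # Assumed local installation; open the printed URL in a browser.
    print(select_url("http://localhost:8888/solr", "collection1",
                     "information retrieval", rows=5))
```

Opening the printed URL in a browser shows the raw ranked results that the search interface renders.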

13 Searching: Flare Interface. The “Apps” page links to the starting point for the Flare interface. For advanced searching and statistics, click on your collection.

14 Conclusion. This covered basic crawling, indexing, and searching using LucidWorks. It is simple to use but does not offer much flexibility. Things to try: incorporating new crawlers, changing the information extraction process, and changing the indexing schema and ranking functions. Questions?
