How to Use LucidWorks Search

1 How to Use LucidWorks Search
Sagnik Ray Choudhury

2 Installation and Search Components
Access control. Crawling: Aperture crawler (web sites, file systems, Amazon S3 buckets). Information extraction: Aperture parser. Indexing: Lucene. Ranking. Result interface: Standard or Flare interface. lucidworks IST 441 PSU

3 Start Page

4 Access Control: Admin Panel
Admin screen: log in here (default username admin, password admin).

5 Admin Dashboard
User control and collections.

6 Adding Users
Local installation: you may or may not create users. Server installation: create a new user with admin privileges, then delete the admin account. Do not use your PSU/IST credentials. Screens shown: creating a new user, deleting admin.

7 Crawling: Step 1
Add a new collection with the default template.

8 Crawling: Choosing a Data Source
Click the new collection. Note its index size and number of documents. Add a new data source (web site).
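The document count shown on the collection page can also be read from the collection's Solr endpoint. The sketch below is a minimal illustration, assuming the collection's Solr core is reachable at http://localhost:8888/solr/<collection> (host, port and the collection name collection1 are assumptions; check the Solr page for your collection). A query for *:* with rows=0 returns only the total hit count, numFound.

```python
import json
from urllib.parse import urlencode


def doc_count_url(base_url: str, collection: str) -> str:
    """Build a Solr select URL that matches all documents but returns no rows."""
    # Assumption: the collection's Solr core lives under <base_url>/solr/<collection>.
    params = urlencode({"q": "*:*", "rows": 0, "wt": "json"})
    return f"{base_url.rstrip('/')}/solr/{collection}/select?{params}"


def parse_num_found(body: str) -> int:
    """Read the total hit count (numFound) from a Solr JSON response body."""
    return int(json.loads(body)["response"]["numFound"])


if __name__ == "__main__":
    # Hypothetical local install (port 8888) and collection name.
    print(doc_count_url("http://localhost:8888", "collection1"))
```

To use it against a running server, pass urllib.request.urlopen(url).read().decode("utf-8") to parse_num_found.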

9 Crawling: Parameter Selection
Name, URL, crawl depth. Constraint: allow crawling only within the site, or outside it as well. Include paths: the particular set of pages you wish to crawl. Exclude paths: file types or pages you do not want to crawl. The built-in crawler is small-scale and single-threaded; for better performance, Nutch can be integrated.
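To make the effect of these parameters concrete, here is a hypothetical toy crawler. It is not the Aperture crawler LucidWorks uses: it walks an in-memory link graph breadth-first, stops following links at the crawl depth, optionally stays on the start site, and treats include/exclude paths as regular expressions (an assumption; the real pattern syntax may differ).

```python
import re
from collections import deque
from urllib.parse import urlparse


def allowed(url, start_host, stay_on_site, include, exclude):
    """Apply the site constraint and include/exclude path patterns to one URL."""
    if stay_on_site and urlparse(url).netloc != start_host:
        return False
    path = urlparse(url).path
    # If include patterns are given, the path must match at least one of them.
    if include and not any(re.search(p, path) for p in include):
        return False
    # Any matching exclude pattern rejects the URL.
    return not any(re.search(p, path) for p in exclude)


def crawl(links, start, depth, stay_on_site=True, include=(), exclude=()):
    """Breadth-first crawl over `links` (url -> list of outgoing urls).

    The seed URL is always crawled; pages found `depth` hops away are
    crawled, but their links are not followed (assumed depth semantics).
    """
    start_host = urlparse(start).netloc
    seen = {start}
    order = []
    queue = deque([(start, 0)])
    while queue:
        url, d = queue.popleft()
        order.append(url)
        if d == depth:
            continue  # depth limit reached: do not follow this page's links
        for nxt in links.get(url, []):
            if nxt not in seen and allowed(nxt, start_host, stay_on_site, include, exclude):
                seen.add(nxt)
                queue.append((nxt, d + 1))
    return order
```

For example, with exclude=[r"\.pdf$"] the crawler skips PDF links, and with stay_on_site=False it also follows links to other hosts.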

10 Starting the Crawling Process
Click Create to move to the crawl-job screen. Start crawling (you can also add a schedule to crawl periodically). To add another web site, go back to the collection page (slide 8).

11 Information Extraction and Indexing
Information is extracted from crawled web pages. Default: Aperture parser; fallback: Apache Tika. Extracted fields include author, full text, date, etc. (see the field mapping section). Information extraction and indexing run concurrently with crawling. Do a “hard commit” to make sure the index is up to date. To learn more about the index, go to the Solr page for the collection.
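Since the index is a Solr index, a hard commit can also be sent to Solr's update handler using the standard <commit/> XML command. This is a minimal sketch, assuming the collection's Solr core is reachable at http://localhost:8888/solr/<collection> (host, port and collection name are assumptions). It only builds the request; nothing is sent.

```python
from urllib.request import Request


def hard_commit_request(base_url: str, collection: str) -> Request:
    """Build (but do not send) a Solr update request carrying a <commit/> command."""
    # Assumption: the collection's Solr core lives under <base_url>/solr/<collection>.
    url = f"{base_url.rstrip('/')}/solr/{collection}/update"
    return Request(
        url,
        data=b"<commit/>",
        headers={"Content-Type": "text/xml"},
        method="POST",
    )
```

Sending it is then a matter of urllib.request.urlopen(hard_commit_request("http://localhost:8888", "collection1")), where collection1 is a hypothetical collection name.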

12 Searching
Default interface: click the “tools” link on the top panel.

13 Searching: Flare interface
The “Apps” page links to the starting point for the Flare interface. For advanced searching and statistics, click your collection.
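Both interfaces sit on top of Solr, and per-field statistics of this kind correspond to Solr facet counts. The sketch below is a minimal illustration, assuming a select handler at http://localhost:8888/solr/<collection>/select and an indexed author field (both assumptions). It builds a search URL with faceting enabled and converts Solr's flat facet list into a dictionary.

```python
import json
from urllib.parse import urlencode


def search_url(base_url, collection, query, rows=10, facet_fields=()):
    """Build a Solr select URL for `query`, optionally requesting facet counts."""
    # Assumption: the collection's Solr core lives under <base_url>/solr/<collection>.
    params = [("q", query), ("rows", rows), ("wt", "json")]
    if facet_fields:
        params.append(("facet", "true"))
        # facet.field may be repeated, once per field to count.
        params.extend(("facet.field", f) for f in facet_fields)
    return f"{base_url.rstrip('/')}/solr/{collection}/select?{urlencode(params)}"


def facet_counts(body, field):
    """Turn Solr's flat [value, count, value, count, ...] facet list into a dict."""
    flat = json.loads(body)["facet_counts"]["facet_fields"][field]
    return dict(zip(flat[0::2], flat[1::2]))
```

The flat value/count list is Solr's default JSON layout for facet fields, which is why facet_counts pairs alternating elements.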

14 Conclusion
Basic crawling, indexing and searching using LucidWorks. Simple to use, but does not offer much flexibility. Things to try: incorporating new crawlers; changing the information extraction process; changing the indexing schema and ranking functions. Questions?