How to Use LucidWorks Search

Sagnik Ray Choudhury

Installation and Search Components
Access control.
Crawling: Aperture crawler. Sources: web, filesystem, Amazon S3 bucket.
Information extraction: Aperture parser.
Indexing: Lucene.
Ranking.
Result interface: Standard/Flare interface.
lucidworks IST 441 PSU

Start Page

Access Control: Admin Panel
Admin screen: log in here (username admin, password admin).

Admin Dashboard
User control. Collections.

Adding Users
If you use a local installation: you may or may not create users.
If you use a server installation: create a new user with admin privileges, then delete the admin account. Do not use PSU/IST credentials.
Creating a new user. Deleting the admin account.

Crawling: Step 1
Add a new collection with the default template.

Crawling: Choosing a Data Source
Click on the new collection. Note the index size and the number of documents. Add a new data source (web site).

Crawling: Parameter Selection
Name, URL, crawl depth.
Constraint: allow crawling within the site or outside the site.
Include paths: the particular set of pages you wish to crawl.
Exclude paths: file types or pages you do not want to crawl.
This is a small-scale, single-threaded crawler; for better performance, Nutch can be integrated.
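
The interaction between crawl depth, the site constraint, and include/exclude paths can be sketched in a few lines of Python. This is only an illustration of the idea, not LucidWorks' own logic: the function name `should_crawl`, its parameters, and the use of regular expressions for the path patterns are all assumptions made for this example.

```python
import re
from urllib.parse import urlparse


def should_crawl(url, depth, seed_url, max_depth,
                 include=(), exclude=(), stay_on_site=True):
    """Hypothetical filter: decide whether a discovered URL should be fetched."""
    # Crawl depth: stop following links beyond the configured depth.
    if depth > max_depth:
        return False
    # Constraint: optionally restrict the crawl to the seed's host.
    if stay_on_site and urlparse(url).netloc != urlparse(seed_url).netloc:
        return False
    # Exclude paths: any matching pattern rejects the URL.
    if any(re.search(pattern, url) for pattern in exclude):
        return False
    # Include paths: if given, at least one pattern must match.
    if include and not any(re.search(pattern, url) for pattern in include):
        return False
    return True
```

For example, with the placeholder seed `http://example.edu/`, a depth limit of 2, an include pattern of `/courses/`, and an exclude pattern of `\.pdf$`, a course page at depth 1 would be accepted while a PDF under the same path would be skipped.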

Starting the Crawling Process
Click “create” to move to the crawl-job screen. Start crawling (you can also add a schedule to crawl periodically). You can add another website by going back to the collection page (slide 8).

Information Extraction and Indexing
Information extraction from crawled web pages. Default: Aperture parser. Fallback: Apache Tika. Extracted information: author, full text, date, etc. (see the field mapping section). Information extraction and indexing run simultaneously with crawling. A “hard commit” is needed to ensure that the index is up to date. To learn more about the index, go to the Solr page for the collection.
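
Since the index is served by Solr, one way to issue a hard commit outside the admin screens is through Solr's update handler with `commit=true`. The sketch below only illustrates that request; the base URL (`http://localhost:8888/solr`), the collection name `collection1`, and the helper names are assumptions, so check the Solr page of your own collection for the actual address.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def commit_url(solr_base, collection):
    """Build the Solr update URL that requests a hard commit."""
    query = urlencode({"commit": "true"})
    return f"{solr_base.rstrip('/')}/{collection}/update?{query}"


def hard_commit(solr_base, collection, timeout=10):
    """Send the commit request and return the HTTP status code."""
    # Assumes the Solr endpoint is reachable without authentication.
    with urlopen(commit_url(solr_base, collection), timeout=timeout) as resp:
        return resp.status
```

Calling `hard_commit("http://localhost:8888/solr", "collection1")` would send the request to the assumed local server; `commit_url` can be used on its own to inspect the URL first.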

Searching
Default interface: click on the “tools” link on the top panel.
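
The same index can also be queried programmatically through Solr's standard `select` handler. The following is a minimal sketch that only builds the query URL; the base URL, the collection name, and the helper name `select_url` are assumptions for illustration and are not taken from the slides.

```python
from urllib.parse import urlencode


def select_url(solr_base, collection, text, rows=10):
    """Build a Solr select URL for a free-text query returning JSON."""
    # q = query text, rows = number of results, wt = response format.
    params = urlencode({"q": text, "rows": rows, "wt": "json"})
    return f"{solr_base.rstrip('/')}/{collection}/select?{params}"


if __name__ == "__main__":
    # Placeholder host and collection; adjust to your installation.
    print(select_url("http://localhost:8888/solr", "collection1",
                     "information retrieval", rows=5))
```

Opening the printed URL in a browser, or fetching it with `urllib.request.urlopen`, should return the matching documents if the assumed endpoint is correct for your setup.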

Searching: Flare interface
The “Apps” page links to the starting point for the Flare interface. For advanced searching and statistics, click on your collection.

Conclusion
Basic crawling, indexing, and searching using LucidWorks.
Simple to use, but does not offer much flexibility.
Things to try: incorporating new crawlers; changing the information extraction process; changing the indexing schema and ranking functions.
Questions?
