Search Bootstrapping: How / Where to get started


1 Search Bootstrapping How / Where to get started

2 Crawling
Start with Nutch
– http://nutch.apache.org/
Index directly to SOLR
– http://www.lucidimagination.com/blog/2010/09/10/refresh-using-nutch-with-solr/
Create a seed list from DMOZ rdf
– http://www.dmoz.org/rdf.html
– http://wiki.apache.org/nutch/NutchTutorial
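The "create a seed list from DMOZ rdf" step can be sketched in a few lines of Python. This is a rough sketch, not the procedure from the Nutch tutorial: it assumes each listed site appears in the RDF dump as an `<ExternalPage about="URL">` element, and the names `seed_urls`, `write_seed_file`, and `every_nth` are made up for illustration. The output is a plain text file with one URL per line, which is the shape of seed file a Nutch crawl is normally started from.

```python
import re
from html import unescape
from typing import Iterable, Iterator

# Assumption: each listed site appears as <ExternalPage about="URL"> in the dump.
EXTERNAL_PAGE = re.compile(r'<ExternalPage about="([^"]+)"')


def seed_urls(lines: Iterable[str], every_nth: int = 1) -> Iterator[str]:
    """Yield every `every_nth`-th ExternalPage URL found in the given lines."""
    seen = 0
    for line in lines:
        match = EXTERNAL_PAGE.search(line)
        if match is None:
            continue
        seen += 1
        if seen % every_nth == 0:
            # Attribute values are XML-escaped (e.g. &amp;), so unescape them.
            yield unescape(match.group(1))


def write_seed_file(rdf_path: str, seed_path: str, every_nth: int = 1000) -> int:
    """Stream the dump line by line and write one URL per line; return the count."""
    count = 0
    with open(rdf_path, encoding="utf-8", errors="replace") as src, \
            open(seed_path, "w", encoding="utf-8") as dst:
        for url in seed_urls(src, every_nth):
            dst.write(url + "\n")
            count += 1
    return count
```

Sampling with `every_nth` keeps a first crawl small; the full dump lists far more sites than a bootstrap crawl needs. Reading line by line avoids loading the whole file into memory.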

3 Understanding Content
Entity Extraction
– LingPipe http://alias-i.com/lingpipe/
– OpenNLP http://incubator.apache.org/opennlp/
Entity Identification / Taxonomies
– Freebase http://www.freebase.com/
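To illustrate the difference between extracting an entity mention and identifying it against a taxonomy, here is a toy dictionary lookup in Python. It is not LingPipe or OpenNLP (both are Java libraries that use trained models), and the taxonomy entries and IDs below are invented for the example rather than taken from Freebase.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical taxonomy: lowercase surface form -> (entity id, entity type).
# The ids are placeholders, not real Freebase identifiers.
TAXONOMY: Dict[str, Tuple[str, str]] = {
    "apache nutch": ("example:nutch", "software"),
    "nutch": ("example:nutch", "software"),
    "solr": ("example:solr", "software"),
}


def identify_entities(
    text: str, taxonomy: Dict[str, Tuple[str, str]] = TAXONOMY
) -> List[Tuple[str, str, str]]:
    """Return (matched text, entity id, entity type) for each taxonomy hit."""
    if not taxonomy:
        return []
    # Try longer names first so "apache nutch" wins over "nutch".
    names = sorted(taxonomy, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(name) for name in names) + r")\b",
        re.IGNORECASE,
    )
    return [
        (m.group(0),) + taxonomy[m.group(0).lower()]
        for m in pattern.finditer(text)
    ]
```

Here the regular expression plays the role of the extraction step (finding the mention in text) and the dictionary lookup plays the role of identification (resolving the mention to a single id, so that "Nutch" and "Apache Nutch" map to the same entity). A statistical extractor would replace the first step; a real taxonomy would replace the dictionary.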

4 Some Additional Links
Basic Web Page Parser
– https://github.com/pjaol/Webcrawler
Example of OpenNLP usage
– https://github.com/pjaol/entity_extractor

