Building a Vertical Search Site (using lots of Apache software, of course)


1 Building a Vertical Search Site (using lots of Apache software, of course)

2 Just the Facts, Ma’am Ken Krugler - CTO/co-founder of Krugle We use lots of Apache S/W at krugle.org –Httpd, Lucene, Nutch, Solr, Xerces, etc. I’ll describe our architecture And the sometimes painful lessons learned

3 Three Faces of Krugle Free public site - http://www.krugle.org Partner sites –http://sourceforge.krugle.com –http://developerworks.krugle.com –http://aws.krugle.com Enterprise appliance

4 Krugle.org free public site Search code, projects, & technical web pages 150,000 projects 2.5 billion lines of code 40 million web pages

5 Krugle.org Architecture (web) Web tier runs Apache Also mod_perl –“glue” for Javascript to backend RESTful API –Partner APIs “Dirty” side of system

6 Krugle.org Architecture (API) API server uses Resin Webapps provide RESTful API services Filer is big disk array –LightTPD, NFS Searchers run Hadoop, Lucene

7 Krugle.org Architecture (CPI) Page crawl uses Nutch Code crawl uses bits of Nutch, custom stuff Fuzzy parsers created using ANTLR Project data in MySQL, pushed to Solr Code index is Lucene
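
To make the "project data in MySQL, pushed to Solr" step concrete, here is a minimal sketch of turning one project row into a Solr XML update message. The field names (`id`, `name`, `language`) and the `project_to_solr_add` helper are assumptions for illustration only; the slides do not show Krugle's schema or how the push was actually implemented.

```python
import xml.etree.ElementTree as ET


def project_to_solr_add(project: dict) -> str:
    """Render one project row as a Solr XML <add> message (hypothetical helper)."""
    add = ET.Element("add")
    doc = ET.SubElement(add, "doc")
    for name, value in project.items():
        # A list or tuple becomes one <field> per value (a multi-valued field).
        values = value if isinstance(value, (list, tuple)) else [value]
        for v in values:
            field = ET.SubElement(doc, "field", name=name)
            field.text = str(v)
    return ET.tostring(add, encoding="unicode")


# Stand-in for a row read from the project database; the columns are assumed.
row = {"id": 42, "name": "example-project", "language": ["Java", "Perl"]}
message = project_to_solr_add(row)
print(message)
```

In a real pipeline the resulting message would be posted to the Solr update handler and followed by a commit; that part is left out here because it depends on the deployment.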

8 Krugle partner sites IBM developerWorks Sourceforge.net Amazon Web Services Yahoo! Dev Network Collabnet

9 Krugle Architecture (partners) Higher level API Wraps RESTful API Handled in web tier Big chunks of Perl LightTPD cache
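
The partner layer described on this slide is a higher-level API that wraps the RESTful API and sits behind a cache. The sketch below shows that shape in Python, under stated assumptions: the real layer was written in Perl with LightTPD doing the caching, and the `PartnerApi` class, the `fake_backend` function, and the response format are all hypothetical.

```python
from typing import Callable, Dict, Tuple


class PartnerApi:
    """Hypothetical partner-facing wrapper around a lower-level search call."""

    def __init__(self, backend: Callable[[str, str], str]):
        self._backend = backend
        self._cache: Dict[Tuple[str, str], str] = {}
        self.backend_calls = 0  # counts how often the wrapped API is hit

    def search(self, partner: str, query: str) -> str:
        # Cache key: partner name plus the query with surrounding whitespace removed.
        key = (partner, query.strip())
        if key not in self._cache:
            self.backend_calls += 1
            self._cache[key] = self._backend(partner, key[1])
        return self._cache[key]


def fake_backend(partner: str, query: str) -> str:
    """Stand-in for the RESTful API; returns a made-up XML payload."""
    return f"<results partner='{partner}' query='{query}'/>"


api = PartnerApi(fake_backend)
first = api.search("sourceforge", "hash map")
second = api.search("sourceforge", "hash map ")  # same query, served from cache
print(api.backend_calls)
```

An in-process dictionary is only a stand-in here; an HTTP-level cache in front of the web tier, as the slide describes, keeps repeated requests away from the application code entirely.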

10 Krugle enterprise server Krugle inside firewall Talks to major SCMs SCM Comment search Includes public site info

11 Krugle Architecture (enterprise) Collapses web tier, API server, code searchers, filer, and DB server Separate admin system (DB, GUI, code crawler, configuration) as Jetty-hosted webapp

12 RESTful API HTTP requests, XML responses Works well with Perl middleware Some load/memory issues Solr integration challenges Integration test challenges
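
As an illustration of the "HTTP requests, XML responses" pattern, here is a minimal sketch of how a client might parse a search response. The XML layout (`results`, `hit`, and the `project`, `path`, `line` attributes) is invented for this example; the slides do not show Krugle's actual response format.

```python
import xml.etree.ElementTree as ET

# Made-up response body standing in for what a search request might return.
SAMPLE_RESPONSE = """<results total="2">
  <hit project="alpha" path="src/Main.java" line="12"/>
  <hit project="beta" path="lib/util.pl" line="7"/>
</results>"""


def parse_hits(xml_text: str) -> list:
    """Convert the assumed XML response into a list of plain dictionaries."""
    root = ET.fromstring(xml_text)
    return [
        {
            "project": hit.get("project"),
            "path": hit.get("path"),
            "line": int(hit.get("line")),
        }
        for hit in root.iter("hit")
    ]


hits = parse_hits(SAMPLE_RESPONSE)
for hit in hits:
    print(hit["project"], hit["path"], hit["line"])
```

Keeping the parsed result as plain dictionaries makes it easy for middleware to re-serialize the data for a browser front end.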

13 Key Lessons If it isn’t broke, don’t upgrade –There’s always a newer version –That includes the build system Be prepared to pay for free software –Motivating project contributors to do things Moderation in architectural abstraction –There’s always a higher and lower option

