GOAT SEARCH Revorg GOAT Search Solution (Powered by Lucene)

1 GOAT SEARCH Revorg GOAT Search Solution (Powered by Lucene)

2 About Me
• Grover Fields
• Revorg, LLC (Owner)
• M.S. Information Systems (Troy University)
• B.S. Industrial Engineering (Florida A&M University)
• Stanford Project Management Courses

3 About Me
• 10+ years of development, analysis, and implementation
• 10+ years of ColdFusion experience
• 2+ years of Java experience
• Commonspot, Strongmail, ClickFix (Developer)
• Email: grover_fields@yahoo.com
• Web site: http://www.groverfields.com

4 Agenda
• What? What can we do with GOAT?
• Why? Why do we want to use GOAT and not Verity?
• How? How do we do that?
• Conclusion and alternative solutions

5 What
• What is a search engine?
  - Builds an index on text
  - Answers queries using that index, à la Verity
  - Works alongside your existing database
• What does a search engine offer?
  - Scalability
  - Reliability
  - Ranking
  - Tweaking
  - Integration of different sources (email, web pages, files, DATABASES)
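
Ranking is the feature that a plain `LIKE` query lacks. The following is a minimal sketch in Python, since the slides show no GOAT code. It assumes a plain tf-idf sum: term frequency is damped by a square root and multiplied by an inverse-document-frequency weight. This is in the spirit of Lucene's classic scoring but is not its exact formula. `tokenize`, `rank`, and `sample_docs` are hypothetical names.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lower-case the text and split it on anything that is not a letter or digit."""
    return re.findall(r"[a-z0-9]+", text.lower())


def rank(docs, query):
    """Return (doc_id, score) pairs, best match first, using a simplified tf-idf sum.

    Illustration only: this is NOT Lucene's exact scoring formula.
    """
    tokenized = {doc_id: Counter(tokenize(text)) for doc_id, text in docs.items()}
    n_docs = len(docs)
    scores = {}
    for term in tokenize(query):
        # Document frequency: how many documents contain the term at all.
        df = sum(1 for counts in tokenized.values() if term in counts)
        if df == 0:
            continue
        idf = math.log(n_docs / df) + 1.0  # rarer terms weigh more
        for doc_id, counts in tokenized.items():
            tf = counts[term]  # Counter returns 0 for missing terms
            if tf:
                scores[doc_id] = scores.get(doc_id, 0.0) + math.sqrt(tf) * idf
    # Highest score first; ties broken by document id for a stable order.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


sample_docs = {
    1: "ColdFusion and Lucene",
    2: "Lucene Lucene Lucene search",
    3: "Verity search",
}
print(rank(sample_docs, "lucene"))
```

Document 2 ranks above document 1 because "lucene" occurs in it more often. Document 3 is left out entirely because it does not contain the term.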

6 What is a search engine? (cont.)
• Works on words, not on substrings
  - "auto" != "automatic", "automobile"
• Indexing process:
  - Convert the document
  - Extract text and metadata
  - Normalize the text
  - Write the (inverted) index
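
The indexing steps above (extract the text, normalize it, write an inverted index) fit into a few lines as a toy Python illustration. This is not Lucene's implementation. The analyzer here only lower-cases words and drops a small hypothetical stop-word list, and `analyze`, `build_index`, and `search` are made-up names. The example also shows why a search for "auto" does not match "automatic".

```python
import re
from collections import defaultdict

STOP_WORDS = {"a", "an", "and", "the", "of", "to"}  # hypothetical tiny stop list


def analyze(text):
    """Normalize text into index terms: lower-case words, minus stop words."""
    return [t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOP_WORDS]


def build_index(docs):
    """Write an inverted index: term -> sorted list of document ids."""
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        for term in analyze(text):
            postings[term].add(doc_id)
    return {term: sorted(ids) for term, ids in postings.items()}


def search(index, word):
    """Look up one whole word. There is no substring matching."""
    terms = analyze(word)
    if not terms:
        return []
    return index.get(terms[0], [])


docs = {1: "The automatic transmission", 2: "An automobile engine", 3: "Auto parts"}
index = build_index(docs)
print(search(index, "Auto"))       # only document 3
print(search(index, "automatic"))  # only document 1
```

Because lookups go through the index, the cost of a query depends on the size of the posting list, not on the number of documents scanned.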

7 Apache Lucene Overview
• Lucene Java 2.4: "A high-performance, full-featured text search engine library written entirely in Java. It is a technology suitable for nearly any application that requires full-text search, especially cross-platform."
• No GUI
• http://lucene.apache.org

8 Apache Lucene Overview
• Java library for indexing and searching
• No dependencies
• Works with Java 1.4 or later
• Input for indexing: Document objects
  - Each document is a set of Fields (field name, field content)
• Stores its index as files on disk or in memory
• No document converters
• No web crawler

9 Lucene Java users
• HBCU.info
• LinkedIn
• IBM OmniFind Yahoo! Edition
• Technorati.com
• Eclipse
• Monster.com
• …

10 Lucene Java Summary
• Java library for indexing and searching
• Lightweight / no dependencies
• Powerful, fast, and tested!
• No document conversion
• No GUI

11 Why?
• Cost of an enterprise search solution
• Need for search speed
• Java projects to work on (things to do)

12 Verity Limitations
• 10,000 documents for ColdFusion Developer Edition
• 125,000 documents for ColdFusion Standard Edition
• 250,000 documents for ColdFusion Enterprise Edition
• What do developers do in a shared hosting environment? Can the hosting company limit the number of documents per Web site?

13 T-SQL Limitations?
• Search for "Yahoo" on my blog:

    SELECT entry.id
    FROM tbl_mango_entry AS entry
    INNER JOIN tbl_mango_post AS post ON entry.id = post.id
    WHERE entry.blog_id = 'default'
      AND (entry.title LIKE '%yahoo%'
           OR entry.content LIKE '%yahoo%'
           OR entry.excerpt LIKE '%yahoo%')
      AND post.posted_on <= GETDATE()
      AND entry.status = 'published'
    ORDER BY post.posted_on DESC

• Now multiply that time by 10, 100, 500, or 1,000 users/hr.

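14 T-SQL Limitations?
• Full table scan = one thing: PERFORMANCE KILLER!!!
  - A LIKE pattern with a leading wildcard ('%yahoo%') cannot seek an index, so every row has to be read
• No relevance sorting of results
• An RDBMS isn't designed to do this, but it allows it
• Use the right tools!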
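
You can check the "full table scan" claim yourself by asking the database for its query plan. The sketch below uses SQLite from Python's standard library as a stand-in, which is an assumption: SQL Server's plans look different, but it also cannot seek an index with a leading-wildcard pattern. The table and index names are hypothetical and loosely modeled on the blog query above.

```python
import sqlite3


def query_plan(conn, sql):
    """Return SQLite's EXPLAIN QUERY PLAN detail text for one statement."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[3] for row in rows)  # column 3 holds the detail text


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE entry (id INTEGER PRIMARY KEY, title TEXT)")
conn.execute("CREATE INDEX idx_entry_title ON entry (title)")

# Leading wildcard: the index cannot be used to seek, so every row is visited (SCAN).
like_plan = query_plan(conn, "SELECT id FROM entry WHERE title LIKE '%yahoo%'")
# Exact match: the index is searched directly (SEARCH).
eq_plan = query_plan(conn, "SELECT id FROM entry WHERE title = 'yahoo'")

print(like_plan)
print(eq_plan)
```

The exact wording of the plan text varies between SQLite versions (for example "SCAN entry" versus "SCAN TABLE entry"). The SCAN versus SEARCH distinction is the part that matters.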

15 How?
• GOAT Search Solution
  - Lucene 2.4.0
  - ColdFusion 8 (ColdFusion MX is fine, but the GUI needs to be rolled back)
  - Commons IO 1.4
• Simple packaging: .jar files
• Simple Web-based GUI

16 How?
• Macromedia JDBC drivers
  - Same drivers that ColdFusion uses
  - No additional drivers to install
• Supports RDBMS ONLY
  - MSSQL
  - MySQL
  - Oracle
• No file system support (yet)
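
GOAT reads rows over JDBC, but the slides do not show that code. The sketch below illustrates the same idea in Python under these assumptions: SQLite (the built-in `sqlite3` module) replaces the Macromedia JDBC drivers, and the `candidates` table is hypothetical. Each row becomes a document keyed by its primary key, and each text column becomes a field. The index maps (field, term) pairs to sets of primary keys.

```python
import re
import sqlite3
from collections import defaultdict


def analyze(text):
    """Lower-case the text and split it into words (a deliberately simple analyzer)."""
    return re.findall(r"[a-z0-9]+", (text or "").lower())


def index_table(conn, table, key_column, text_columns):
    """Read every row once and build (field, term) -> set of primary keys.

    Table and column names are interpolated into SQL, so they must come from
    trusted configuration, never from user input.
    """
    postings = defaultdict(set)
    columns = ", ".join([key_column] + text_columns)
    for row in conn.execute(f"SELECT {columns} FROM {table}"):
        key, values = row[0], row[1:]
        for field, value in zip(text_columns, values):
            for term in analyze(value):
                postings[(field, term)].add(key)
    return postings


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE candidates (candidate_id INTEGER PRIMARY KEY, name TEXT, resume TEXT)")
conn.executemany(
    "INSERT INTO candidates VALUES (?, ?, ?)",
    [
        (1, "Ann", "ColdFusion and SQL Server"),
        (2, "Bob", "Java and Lucene"),
        (3, "Cy", "ColdFusion and Lucene"),
    ],
)
idx = index_table(conn, "candidates", "candidate_id", ["name", "resume"])
print(sorted(idx[("resume", "coldfusion")]))
```

The database is read once at indexing time. After that, searches consult the postings rather than re-running `LIKE` queries against the tables.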

17 Basics?
• Indexing extracts both meaning and structure from unstructured information, one document at a time
• The index contains a complete list of all the words used in each document, along with metadata about that document
• Lucene creates a collection that normalizes both the structured and the unstructured data
• Search requests then check these collections rather than scanning the actual documents and database fields
• This makes searching faster, regardless of the file type and whether the source is structured or unstructured

18 Basics?
• Collection: a special database created by Lucene that contains metadata describing the documents
• Documents: a sequence of fields
  - Similar to a row in a database table (Row 1, Row 2, etc.)
• Fields: a named sequence of terms
  - Similar to a column in a table (Primary Key, Column 1, etc.)
• Terms: a string, paired with the name of the field it occurs in

19 Knowledge?
• Index: a special database created by Lucene that contains metadata describing the documents
• Query syntax: similar to Google's advanced search, field:value
  - e.g. resume:coldfusion
  - http://lucene.apache.org/java/2_4_0/queryparsersyntax.html
• Results
  - Primary key list of values
  - XML based on the document
  - CFX tag integration
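
The `field:value` syntax and the "primary key list" result fit together roughly as shown in this toy Python sketch. It assumes the index is a dict from (field, term) pairs to primary keys, as a database-backed index might produce. Clauses are OR-ed together, which is the default operator of Lucene's QueryParser. `parse_query`, `search`, and `sample_postings` are hypothetical names. Lucene's real syntax also supports phrases, boolean operators, wildcards, and ranges.

```python
def parse_query(query, default_field="contents"):
    """Split a query such as 'resume:coldfusion java' into (field, term) clauses.

    Only bare words and field:word pairs are handled. Bare words go to
    default_field, which is a hypothetical name here.
    """
    clauses = []
    for token in query.split():
        field, sep, term = token.partition(":")
        if sep and field and term:
            clauses.append((field, term.lower()))
        else:
            clauses.append((default_field, token.lower()))
    return clauses


def search(postings, query, default_field="contents"):
    """OR all clauses together and return the sorted primary-key list."""
    keys = set()
    for clause in parse_query(query, default_field):
        keys |= postings.get(clause, set())
    return sorted(keys)


sample_postings = {
    ("resume", "coldfusion"): {1, 3},
    ("resume", "lucene"): {2, 3},
    ("name", "ann"): {1},
}
print(search(sample_postings, "resume:coldfusion"))
```

Returning only primary keys lets the application fetch the full rows from its own database, which fits the "primary key list of values" result format on the slide.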

20 Alternative Solutions for Search
• Commercial vendors:
  - FAST, $100k
  - Autonomy, $80k
  - Google, $50k
• Commercial search engines based on Lucene
  - IBM OmniFind Yahoo! Edition
• RDBMS with integrated search (Oracle, MySQL, MSSQL): PERFORMANCE KILLERS

21 Roadmap
• Road map: "a set of guidelines, instructions, or explanations", as in "wrote an ethics code as a road map for the behavior of elected officials."
• Overhaul the Java programming (still a novice)
• Integrate with other products
  - Aperture
  - Nutch
  - Solr
• File system integration (.txt, .pdf, .doc, .ppt, etc.)
• Geospatial searches, e.g. all jobs within a 50-mile radius
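
At its core, the geospatial roadmap item ("all jobs within a 50-mile radius") is a distance filter. This minimal Python sketch assumes a haversine great-circle distance with a mean Earth radius of 3,958.8 miles and a hypothetical in-memory dict of job coordinates. It is not GOAT or Lucene code. A real implementation would likely prefilter candidates with a bounding box or a spatial index rather than measuring the distance to every job.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def within_radius(jobs, lat, lon, radius_miles):
    """Return the ids of jobs located within radius_miles of (lat, lon)."""
    return sorted(
        job_id
        for job_id, (job_lat, job_lon) in jobs.items()
        if haversine_miles(lat, lon, job_lat, job_lon) <= radius_miles
    )


# Hypothetical job locations (approximate city coordinates).
jobs = {
    "tlh": (30.44, -84.28),     # Tallahassee, FL
    "quincy": (30.59, -84.58),  # Quincy, FL, roughly 20 miles away
    "jax": (30.33, -81.66),     # Jacksonville, FL, roughly 150+ miles away
}
print(within_radius(jobs, 30.44, -84.28, 50))
```

A search centered on Tallahassee with a 50-mile radius keeps Tallahassee and Quincy and drops Jacksonville.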

22 References
• Apache.org
• Adobe.com
• Ben Forta's Blog
• Slideshare.net (multiple authors)
• Other references

