
1 Search Gotchas Sharon Richardson Joining Dots

2 Indexing Architecture There can be only one… …indexing server

3 [Diagram: Single Server Deployment and Web Farm. Labels: Front-end; Index; Internal Content; Web front-ends; Index Server; Queries; Indexing; Internal & External Content sources; External Content sources]

4 Large Web Farm: Indexing server; Web front-ends; Query servers; Internal & External Content sources

5 Scaling to 50 million docs
Source: Estimate performance and capacity requirements for search environments
–600 GB content created
–100 GB index
–Full crawl took 35 days (approx 15 docs per sec)
–If 2% of content changes, an incremental crawl will take approx 8 to 12 hours
Content breakdown:
–SharePoint sites: 10 million items
–File shares: 15 million items
–Web content: 15 million items
–People profiles: 2.5 million
–Documents (auto-generated): 7.5 million
–Properties (metadata): 1 million
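The crawl figures on this slide can be sanity-checked with some quick arithmetic. The sketch below is only an illustration: the function names are hypothetical, and it assumes an incremental crawl touches only the 2% of items that changed, which the slide does not state.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def crawl_rate(docs: int, days: float) -> float:
    """Average items crawled per second over a crawl lasting `days` days."""
    return docs / (days * SECONDS_PER_DAY)


def crawl_hours(docs: int, docs_per_sec: float) -> float:
    """Hours needed to crawl `docs` items at a steady rate."""
    return docs / docs_per_sec / 3600


total_docs = 50_000_000                    # corpus size from the slide
full_rate = crawl_rate(total_docs, 35)     # 35-day full crawl from the slide

# Assumption: the incremental crawl only processes the 2% that changed.
changed_docs = total_docs * 2 // 100
hours_at_full_rate = crawl_hours(changed_docs, full_rate)

# Rates implied by the slide's 8 to 12 hour incremental estimate.
implied_rate_fast = changed_docs / (8 * 3600)
implied_rate_slow = changed_docs / (12 * 3600)

print(f"Full crawl rate: {full_rate:.1f} docs/sec")
print(f"2% change at that rate: {hours_at_full_rate:.1f} hours")
print(f"Implied incremental rate: {implied_rate_slow:.1f} to {implied_rate_fast:.1f} docs/sec")
```

Under these assumptions the full crawl averages a little over 16 docs per second, in line with the slide's "approx 15", while the 8 to 12 hour incremental estimate implies a somewhat higher per-item rate than the full crawl achieved.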

6 Scaling to 50 million docs
Source: Estimate performance and capacity requirements for search environments
Processors: 4 dual-core Intel Xeon 2.66 GHz (test lab); dual 3 GHz (recommended)
RAM: 32 GB (test lab); 4 GB (recommended, for > 1m docs)
Index server disk space requirements (according to doc):
–Size of data crawled = Y
–Size of index = X = 5% to 12% of Y
–Initial disk space = a minimum of 2.5 * X
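As a worked example of the disk space rule on this slide, the following sketch applies it to the 600 GB of content from the previous slide. The function name and parameter names are made up for illustration; only the 5% to 12% range and the 2.5 multiplier come from the slide.

```python
def index_disk_estimate(crawled_gb: float,
                        low_ratio: float = 0.05,
                        high_ratio: float = 0.12,
                        headroom: float = 2.5):
    """Return ((index_low, index_high), (disk_low, disk_high)) in GB.

    Index size X is estimated as 5% to 12% of the crawled data Y,
    and initial disk space as at least 2.5 * X.
    """
    index_low = crawled_gb * low_ratio
    index_high = crawled_gb * high_ratio
    return (index_low, index_high), (index_low * headroom, index_high * headroom)


index_range, disk_range = index_disk_estimate(600)
print(f"Index size: {index_range[0]:.0f} to {index_range[1]:.0f} GB")
print(f"Initial disk space: at least {disk_range[0]:.0f} to {disk_range[1]:.0f} GB")
```

For 600 GB of crawled data this gives an index of roughly 30 to 72 GB and at least 75 to 180 GB of initial disk space. The test lab on the previous slide reported a 100 GB index for 600 GB of content, which is above the 12% upper bound, so the range is probably best read as a rough guide.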

7 Taxonomy Management Still haven’t found what you’re looking for?

8 Taxonomy Management
Conceptual and related-term searches require classification
–Manual = user tagging
–Automatic = provide training set (Bayesian inference algorithm)
Rev. Thomas Bayes (1702–1761)
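To make the "training set" idea concrete, here is a minimal sketch of a naive Bayes text classifier, one simple form of Bayesian inference. It is not the algorithm any particular search product uses; the training phrases, labels, and function names are all invented for illustration.

```python
import math
from collections import Counter, defaultdict

# Hypothetical training set: (document text, category label).
TRAINING_SET = [
    ("expense claim travel receipt", "finance"),
    ("invoice payment budget", "finance"),
    ("holiday leave request", "hr"),
    ("recruitment interview contract", "hr"),
]


def train(examples):
    """Count words per label, documents per label, and the vocabulary."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    vocab = set()
    for text, label in examples:
        words = text.lower().split()
        word_counts[label].update(words)
        label_counts[label] += 1
        vocab.update(words)
    return word_counts, label_counts, vocab


def classify(model, text):
    """Return the label with the highest log posterior for `text`."""
    word_counts, label_counts, vocab = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in label_counts:
        # Log prior: share of training documents carrying this label.
        score = math.log(label_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for word in text.lower().split():
            if word not in vocab:
                continue  # ignore words never seen in training
            # Log likelihood with add-one (Laplace) smoothing.
            score += math.log((word_counts[label][word] + 1)
                              / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


model = train(TRAINING_SET)
print(classify(model, "travel budget"))
print(classify(model, "interview leave"))
```

The quality of the result depends on the training set: each category needs enough representative documents for its word counts to be meaningful.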

9 Taxonomy Options
–Scopes
–Keywords
–Columns (metadata for internal content)
–Customised results pages
–Third-party add-ons

10 From Gotcha to Oscar

11 Social Searches
–Social network
–User profile
–Diary and contact info
–Organisation hierarchy

12 What vs Who?
Employees get 50%-75% of their relevant information directly from other people
More than 80% of enterprises’ digitized information resides on individual hard drives and in personal files
–Source: “The Knowledge Worker Investment Paradox”, Gartner research, 7/17/2002

13 Define Search …beyond queries
[Diagram labels: SharePoint Platform Services; Find; Use; Share; Web; Desktop; Intranet]

14 References
Estimate performance and capacity requirements for search environments: http://technet2.microsoft.com/Office/en-us/library/5465aa2b-aec3-4b87-bce0-8601ff20615e1033.mspx?mfr=true
Third-party tools and add-ons: http://markharrison.co.uk/blog
My blog: http://www.joiningdots.net/blog

15 Thank you! Sharon Richardson, Joining Dots. Email: sharonr@joiningdots.net

