Presentation on theme: "September 24, 2008 Zuzana Gedeon – Research Labs"— Presentation transcript:
1 Knowledge Base Tuning. September 24, 2008. Zuzana Gedeon – Research Labs
2 Overview
- Key Ideas: KB – what do we/you mean by KB
- Searching: Background on Search Engine Technology
- KB Configuration & Tuning
3 KB in general terms
- Knowledge base: Information available to the user
- Data mining – making KB available: Applications helping user to access this information
4 Who uses the Knowledge Base?
- Users: End users, Managers, CSRs, Marketers, Subject experts
- Knowledge sources: Internal database, External documents, Answer database, Community Forum
- Coming up: How do users search? What do we mean by “search”?
5 Types of search - architecture
- Filter based
  - Direct database query – built into views engine
  - Product/category filtering
  - Date, customer address, …
  - Most runtime selectable filters in Reports
- Text/Index based
  - “Google style” search
  - Documents -> index
  - Boosting and weight calculation
- KB Browse
  - Navigational, exploratory search
- No Search !!
  - Get what you need without need for search
6 Mashup
- Report filters with index based search
- Incident search
- Answer search pages
- Filter > search_thread (search_xxx)
- Sort by match_wt!!!
8 EU Knowledge sources and delivery
- Syndication widget
- Voice
- KB Search
- KB Browse
- Answer database
- External documents
- Community Forum
- Pro Services integration
9 No Search !!
- Fact: A large percentage of user sessions do NOT do a search.
- Users find what they are looking for without any search, just by being shown the right content as soon as they access the page.
10 How do we do that?
- Good content
  - Well-chosen category and product organization
  - Good descriptive titles
  - Concise information (generic vs. specific)
  - Consistency
- Administrator
  - Topic/Add words
  - User specifiable content tags to start/stop indexing for searching
  - Answer as a file attachment or URL versus just Q&A pair
  - SmartGuide to create branching (script-like) Answers
  - Publish-on and review-on dates
  - Place on top (“fix on top” really sparingly)
  - Answer access level conditional sections
- Users
  - Users ranking helpfulness – explicitly
  - Ants leaving pheromone trail – implicit ranking
11 Find information where they search
- Sitemap: exporting KB to search engines
- What are Sitemaps? Sitemaps are an easy way for webmasters to inform search engines such as Google and Yahoo about pages on their sites that are available for crawling.
- Sitemap Feature Description: Facilitates Google’s (and other search engines’) spidering of your public RightNow knowledgebase content.
- Benefits: Allows you to control how search engine spiders visit and consume your knowledgebase content. If you desire, this can help your content go to the front of the line in Google/Yahoo web spiders.
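To make the sitemap idea concrete, here is a minimal sketch of a sitemap file in the public sitemaps.org format. It is not the RightNow sitemap feature itself; the `build_sitemap` helper and the example.com answer URLs are hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the public sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

def build_sitemap(urls):
    """Return a sitemap XML string for a list of (loc, lastmod) pairs."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, lastmod in urls:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc
        ET.SubElement(url, "lastmod").text = lastmod
    return ET.tostring(urlset, encoding="unicode")

# Hypothetical public answer pages; a real KB would supply its own URLs.
sitemap_xml = build_sitemap([
    ("https://example.com/answers/101", "2008-09-01"),
    ("https://example.com/answers/102", "2008-09-15"),
])
print(sitemap_xml)
```

Each `url` entry names one crawlable page, which is how a site tells spiders what to visit.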
12 Information placement
- Knowledge Syndication Widget with Product filter
13 How do we do that? Good content
- Well-chosen category and product organization
- Good descriptive titles
- Concise information (generic vs. specific)
- Consistency
14 How do we do that? Administrator
- Topic/Add words
- User specifiable content tags to start/stop indexing for searching
- Answer as a file attachment or URL versus just Q&A pair
- Publish-on and review-on dates
- Answer access level conditional sections
- Place on top (“fix on top” really sparingly)
15 Topic Words for Search
- Allows the KB administrator to associate either a WWW document or a KB Answer with a specific single search term
- The given document appears first in the list of search results
- Document can be set to always be shown
- Useful for directed information presentation, advertising, notices, announcements, etc.
18 Stop/start index
- Example: This text is being indexed <!--stopindex--> this text is not being indexed <!--startindex--> And this text is again indexed
- The tags are case independent
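The effect of the stop/start tags can be sketched as a small text filter. This is not the product's indexer: the `indexable_text` helper is hypothetical, and treating an unclosed `<!--stopindex-->` as running to the end of the text is an assumption.

```python
import re

# From a stopindex tag through the next startindex tag.
# Assumption: an unclosed stopindex hides the rest of the text.
# IGNORECASE reflects "case independent" from the slide.
_SKIPPED = re.compile(
    r"<!--stopindex-->.*?(?:<!--startindex-->|\Z)",
    re.IGNORECASE | re.DOTALL,
)

def indexable_text(content):
    """Return only the part of the content that would reach the index."""
    return _SKIPPED.sub(" ", content)

sample = ("This text is being indexed <!--stopindex--> this text is not "
          "being indexed<!--STARTINDEX-->And this text is again indexed")
visible = indexable_text(sample)
print(visible)
```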
20 How do we do that?
- Users
  - Users ranking helpfulness – explicitly
  - Ants leaving pheromone trail – implicit ranking
- AI
  - Aging of the information
  - agedatabase
- Administrator
  - Promoting new answers
21 No Search !!
- Users find what they are looking for without any search, just by being shown the right content as soon as they access the page.
22 Users + AI
- Common -> Knowledge base -> Answer search: SA_SOLVED_WEIGH_PREF – long term or short term preference
25 Relationships Between Answers
- Sibling Answers section must be enabled from workspace property
- Can manually relate answers together
26 Use Smart Assistant
- Help in populating KB
  - Respond to customer inquiries
  - Propose new answers
- Set up Smart Assistant Rules
  - Try to answer the question without admin interaction
27 Smart Assistant tuning
- Limit by matching Browse topics
- RNT UI → Support → SA_NL_MATCH_THRESHOLD
- Restricts SmartAssistant suggested answers to answers that have the same or closely matching locations in the browse tree. The accepted values are:
  - 0 – do not restrict
  - 1 – use answers from any closely matching clusters
  - 2 – use only best matching clusters
- If SA_DM_FREQ is set to 0, the value of SA_NL_MATCH_THRESHOLD will be forced to 0 regardless of the value set here. Default is 1.
28 Suggested Searches
- Using the history of end-user searches, we use a data-mining technique to establish relationships between similar search phrases
- EU_SUGGESTED_SEARCHES_ENABLE
- Each search phrase suggested to an end-user must pass these tests:
  - Each word spelled correctly
  - Positive SmartSense value
  - No words in blacklist
  - Be complementary to current search
- SEARCH_SUGGESTIONS_DISPLAY
  - 0 – no recommendations
  - 1 – turn on recommended products
  - 2 – turn on recommended categories
  - 4 – turn on recommended Browse topics
- MAX_SEARCH_SUGGESTIONS
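The values 0, 1, 2 and 4 for SEARCH_SUGGESTIONS_DISPLAY look like bit flags that can be added together (for example 5 = products + Browse topics). The slide does not state this, so the sketch below rests on that assumption; the `SuggestionDisplay` class and helper are hypothetical names.

```python
from enum import IntFlag

class SuggestionDisplay(IntFlag):
    """Hypothetical model of SEARCH_SUGGESTIONS_DISPLAY as additive flags."""
    NONE = 0
    PRODUCTS = 1
    CATEGORIES = 2
    BROWSE_TOPICS = 4

def enabled_recommendations(value):
    """List which recommendation types a numeric setting would turn on."""
    flags = SuggestionDisplay(value)
    candidates = (
        SuggestionDisplay.PRODUCTS,
        SuggestionDisplay.CATEGORIES,
        SuggestionDisplay.BROWSE_TOPICS,
    )
    return [flag.name for flag in candidates if flag in flags]

print(enabled_recommendations(5))
print(enabled_recommendations(0))
```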
29 Web Like Search
- Traditional keyword searching on the internet or within an operating system.
- User’s mental model (Google, Yahoo, MSN)
- Attributes of Search
  - Indexes the ‘entire’ corpus of information.
  - Almost never results in zero matches.
- User testing in Jan 08 showed that Google-like behavior is expected whenever the term ‘Search’ is placed next to a text box on the web.
31 External documents search
- Web pages
- Answers
- extra
32 What’s an Index?
- The index is where all the information about what is searchable is stored
- Indexes speed up finding search results by not requiring each document to be scanned during the search process
- Most search engines (including ours) use an ‘inverted index’, which means that they map words to documents, or words to locations within documents
  - Similar to the index in the back of a book
  - vs. “find a word with your finger”
- Indexes are pre-computed when documents are created/edited
33 Example of an Index
- Documents
  - Four score and seven years ago our fathers brought forth on this continent, conceived in Liberty, and
  - Score: A group of 20 items. Hence, four score is 4x20, or 80.
  - Liberty: The condition of being free from restriction or control.
  - The North American Continent consists of the countries: the United States of America, Canada, Mexico,
- Index (Word -> Location): liberty, mexico, north, restriction, score, seven, states, united, years
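A minimal sketch of an inverted index over text like the example documents, mapping each word to its (document, position) locations. The document ids and the `build_inverted_index` helper are hypothetical, and no stopword removal or stemming is applied here.

```python
import re
from collections import defaultdict

def build_inverted_index(documents):
    """Map each lowercased word to a list of (doc_id, word_position) pairs."""
    index = defaultdict(list)
    for doc_id, text in documents.items():
        words = re.findall(r"[a-z0-9]+", text.lower())
        for position, word in enumerate(words):
            index[word].append((doc_id, position))
    return index

docs = {
    "speech": "Four score and seven years ago our fathers brought forth on this continent",
    "score": "A group of 20 items. Hence, four score is 4x20, or 80.",
    "liberty": "The condition of being free from restriction or control.",
}
index = build_inverted_index(docs)

# Lookup is a dictionary access, not a scan of every document.
print(index["score"])
```

A search for "score" reads one index entry instead of scanning all three documents, which is the speed-up the previous slide describes.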
34 Stopwords and Word Stemming
- Stopwords are human-language connector words that are not generally useful in information retrieval
  - a, an, the, or, on, for, …
  - “To be or not to be”
  - RightNow Feature: multiple editable stop word lists (Incidents, Answers)
- Word Stemming
  - Standard natural language processing technique
  - Unique stemmer for each language
  - CONNECT, CONNECTED, CONNECTING, CONNECTION, CONNECTIONS => CONNECT
  - Generalizes searches (exact matches not considered)
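A toy illustration of both steps. The suffix-stripping `toy_stem` below is a deliberately crude stand-in, not the stemmer the product uses (real stemmers such as Porter's apply many more rules per language), and the stopword set contains only the words listed on the slide.

```python
STOPWORDS = {"a", "an", "the", "or", "on", "for"}

# Longest suffixes first so "connections" loses "ions", not just "s".
SUFFIXES = ("ions", "ion", "ing", "ed", "s")

def toy_stem(word):
    """Strip one common English suffix, keeping at least three letters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def normalize(query):
    """Drop stopwords, then stem what is left."""
    return [toy_stem(w) for w in query.split() if w.lower() not in STOPWORDS]

forms = ["CONNECT", "CONNECTED", "CONNECTING", "CONNECTION", "CONNECTIONS"]
print([toy_stem(f) for f in forms])
print(normalize("the connections for a router"))
```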
35 Query Processing and Result Ranking
How does a search query work?
- Query is processed via word stemming and removal of stopwords
- Aliases are added to the search terms (non stopwords, original form)
- Search terms are looked up in the index
- The total hits are gathered and sorted by document via weighting formula(s)
- The documents’ attributes (title, link, etc.) are fetched and returned to the browser
- A postprocessing algorithm may be used before display
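The steps on this slide can be sketched end to end as follows. Everything here is illustrative: the index contents, alias table and document ids are made up, stemming is left out for brevity, and a plain hit count stands in for the real weighting formula(s).

```python
from collections import Counter

STOPWORDS = {"a", "an", "the", "or", "on", "for"}

# Hypothetical alias table: term -> extra terms to search for.
ALIASES = {"fbi": ["federal", "bureau", "investigation"]}

# Hypothetical pre-computed inverted index: term -> {doc_id: hit count}.
INDEX = {
    "fbi": {"a1": 2},
    "federal": {"a2": 1},
    "bureau": {"a2": 1},
    "investigation": {"a3": 1},
    "report": {"a1": 1, "a3": 1},
}

def search(query):
    """Return doc ids ordered by total hits (ties broken by doc id)."""
    # Step 1: remove stopwords (stemming omitted in this sketch).
    terms = [w for w in query.lower().split() if w not in STOPWORDS]
    # Step 2: add aliases for the surviving terms.
    expanded = list(terms)
    for term in terms:
        expanded.extend(ALIASES.get(term, []))
    # Steps 3-4: look up each term and gather hits per document.
    scores = Counter()
    for term in expanded:
        for doc_id, hits in INDEX.get(term, {}).items():
            scores[doc_id] += hits
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [doc_id for doc_id, _ in ranked]

print(search("the FBI report"))
```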
37 Word Bias Configuration
- Some words are relatively more important than others based upon location
- Words in the Subject & Keywords field are more important than words in the body of a document or the attachments
- RightNow Configuration Options
    SRCH_KEY_WEIGHT      50   Keywords
    SRCH_PROD_WEIGHT     50   Product Words
    SRCH_CAT_WEIGHT      50   Category Words
    SRCH_SUBJ_WEIGHT     45   Subject/Title Words
    SRCH_DESC_WEIGHT     30   Question Words
    SRCH_BODY_WEIGHT      4   Answer Words
    SRCH_ATTACH_WEIGHT    4   File-Attach. Words
- Set these to be the same across interfaces!
- Make sure to point out to go back to best practices – global changes
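To show what location-based weights do to a score, here is a sketch that multiplies each field's weight by the number of times the term appears in that field. The weights are the values from the slide; the additive formula, the field names and the sample answer are assumptions, since the actual weighting formula is not given.

```python
# Weights from the slide, keyed by hypothetical field names.
FIELD_WEIGHTS = {
    "keywords": 50,    # SRCH_KEY_WEIGHT
    "product": 50,     # SRCH_PROD_WEIGHT
    "category": 50,    # SRCH_CAT_WEIGHT
    "subject": 45,     # SRCH_SUBJ_WEIGHT
    "question": 30,    # SRCH_DESC_WEIGHT
    "answer": 4,       # SRCH_BODY_WEIGHT
    "attachment": 4,   # SRCH_ATTACH_WEIGHT
}

def score(document, term):
    """Sum weight * occurrences over all fields (assumed formula)."""
    total = 0
    for field, weight in FIELD_WEIGHTS.items():
        words = document.get(field, "").lower().split()
        total += weight * words.count(term)
    return total

doc = {
    "subject": "Reset password",
    "answer": "To reset your password open settings and reset it",
}
print(score(doc, "reset"))  # one subject hit (45) plus two answer hits (2 * 4)
```

One title hit outweighs many body hits, which is why changing these values shifts rankings globally.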
38 AND vs. OR Query Processing
- Do the search results contain ALL words in the search text or just SOME words?
- All major Internet search engines use AND
- We use OR by default with a heavy multi-word weight bias (“AND-like ordering”)
- Why do we use OR? AND does not work well for small document sets (under 10,000 answers).
- Why does AND perform badly on small document sets? It’s too easy for a user to construct a query with no search results.
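A small sketch of the difference on a tiny document set. The documents are made up, and ranking OR results by the number of distinct query words matched is only a stand-in for the multi-word weight bias, whose actual formula is not given.

```python
# Hypothetical tiny KB: doc id -> set of indexed words.
DOCS = {
    "a1": {"reset", "password", "email"},
    "a2": {"reset", "router"},
    "a3": {"billing", "invoice"},
}

def and_search(terms):
    """Return docs containing ALL query words."""
    wanted = set(terms)
    return sorted(d for d, words in DOCS.items() if wanted <= words)

def or_search(terms):
    """Return docs containing ANY query word, most matched words first."""
    wanted = set(terms)
    matches = {d: len(wanted & words) for d, words in DOCS.items()}
    hits = [(d, n) for d, n in matches.items() if n > 0]
    hits.sort(key=lambda item: (-item[1], item[0]))
    return [d for d, _ in hits]

query = ["reset", "password", "modem"]
print(and_search(query))  # no document has all three words
print(or_search(query))   # documents matching more words come first
```

With only three documents, one unmatched word ("modem") empties the AND result, while OR still returns the closest answers in an AND-like order.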
39 Result Focusing and Truncation
- Dynamic Truncation Bias (Answers)
  - Truncate search results to those scoring best
  - RNT UI: SEARCH_RESULT_LIMITING – natural breaks
  - RNT UI: ANS_SRCH_THRESHOLD – break by weight
  - RNT UI: ANS_SRCH_SUB_THRESHOLD – avoid 0 results
- Concept-biased Search
  - Focus search results based upon matching of query to existing KB learned topics
  - RNT UI: SEARCH_RELEVANCE_FOCUS (Answers)
  - RNT UI: SA_NL_MATCH_THRESHOLD (SmartAssistant)
- Make sure to point out to go back to best practices – global changes
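The slide names the truncation settings but not their algorithms, so the following is only one plausible reading: "natural breaks" as cutting at the largest drop between consecutive scores, and the sub-threshold as a fallback when the main threshold would leave no results. Both functions and their parameters are hypothetical.

```python
def truncate_at_largest_gap(scores):
    """Keep the scores above the biggest drop (scores sorted descending)."""
    if len(scores) < 2:
        return list(scores)
    gaps = [scores[i] - scores[i + 1] for i in range(len(scores) - 1)]
    cut = gaps.index(max(gaps)) + 1
    return scores[:cut]

def apply_threshold(scores, threshold, sub_threshold):
    """Keep scores >= threshold; if none qualify, fall back to sub_threshold."""
    kept = [s for s in scores if s >= threshold]
    if kept:
        return kept
    return [s for s in scores if s >= sub_threshold]

print(truncate_at_largest_gap([95, 90, 88, 40, 35, 10]))
print(apply_threshold([30, 20, 5], threshold=50, sub_threshold=15))
```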
41 External documents and tuning
- Not much content control
  - Spider uses only externally available content
- Not much tuning control
  - Title and body weight
    - SRCH_KEY_WEIGHT: Meta + products, categories
    - SRCH_SUBJ_WEIGHT: Title
    - SRCH_DESC_WEIGHT: Text
  - Core Engine: HtDig with Clucene
  - File Attachment Size: FATTACH_MAX_SIZE
- Search Pulldowns – Kill them
  - ANS_SEARCH_BY_ENABLED
  - ANS_SORT_BY_ENABLED
42 Important Files in the File Manager
    exclude_answers.txt     End-user Stopwords
    exclude_incidents.txt   Incident Stopwords
    aliases.txt             Always-On Search Thesaurus
    thesaurus.txt           Thesaurus for similar search
    smartsense.txt          Emotional Word Ratings
    blacklist.txt           No-Show words for Sugg. Searches
    userdic.tlx             Custom Dictionary for Spellchecker
44 Aliases
- Establishes a link between two words to treat them as synonyms for every search type
  - FBI = Federal Bureau of Investigation
  - Whiskey = Scotch
- Go to demo
45 Analytics
- Keyword Searches report
  - Frequent searches (important content)
  - Searches with no answers (missing content)
  - Searches with too many answers (configuration and tuning needed)
- Gap report
47 Information Gap Report
- Use the Gap Report to identify ‘holes’ in the end-user KB.
- Compares recent incidents to existing Answers.
- Gap Report Config Options: GAP_FREQUENCY & GAP_TIME_PERIOD – default 7 days for both.
49 Other Customization
- EU_BROWSER_SEARCH_PLUGIN – Enables the Answer and External Document search pages to provide an interface for web browsers to query them directly from their built-in search bars, such as those provided by Google or Yahoo!. Default is disabled (No).
- EU_SYNDICATION_ENABLE – widgets
- ANS_SORT_BY_ENABLED – Enables the Sort By drop-down menu on the Find Answers page. This setting overrides any view settings. Default is disabled (No). This is the reason to have a limited results set!
- SEARCH_WITH_OPERATORS – Enables processing of +, - and ~ operators while searching for answers. Default is enabled (Yes).