
HUMAN EXPERTISE AND ARTIFICIAL INTELLIGENCE IN VERTICAL SEARCH Peter Jackson & Khalid Al-Kofahi Corporate Research & Development.


1 HUMAN EXPERTISE AND ARTIFICIAL INTELLIGENCE IN VERTICAL SEARCH Peter Jackson & Khalid Al-Kofahi Corporate Research & Development

2 HORIZONTAL VERSUS VERTICAL SEARCH
HORIZONTAL | VERTICAL
Consumer focus | Professional focus
General interest | Specialist interests
Average user | Expert user
Shallow information need | Deep information need

3 THE PARADOX OF SEARCH
The further you get from keyword indexing and retrieval, the harder it is to explain a search result
–Professional searchers demand transparency
  Tool versus appliance
You need an explanatory model that people can relate to and understand, even if it is actually just a cartoon of the real process
–Examples: Basic PageRank, Collaborative Filtering
Such models don't work so well in vertical domains
–Links aren't always endorsements
–Sparsity of data in smaller communities

4 RECENT TRENDS IN SEARCH
Fragmentation of horizontal search
–Media, location, demographics (Weber & Castillo, 2010)
More sophisticated models of user behavior
–Post-click behaviors (Zhong, Wang, et al., 2010)
Practical semantics versus Semantic Web
–Maps as search results for local, micro-results
Incorporation of domain knowledge into search
–Taxonomies, vocabularies, use cases, work flows

5 THE EXAMPLE OF LEGAL SEARCH
The completeness requirement
–Recall as important as precision
  Less redundancy than on the Web
The authority requirement
–Court superiority, jurisdiction
–Highly cited cases and statutes
  Supersession by statute or regulation
The multi-topical nature of documents
–Case may cover many points of law but only cited for one
–Citations can be negative as well as positive per topic
>These factors also apply to scientific documents

6 POWER LAW AND LEGAL TOPICS

7 POWER LAW AND WESTLAW USERS

8 EXPERT SEARCH
In many verticals, there are at least two sources of expertise available for enhancing search
–Editors and authors, who generate useful metadata
–Users, who generate clickstreams and other data
Editorial value addition improves recall especially
–Helps find both fat neck and long tail documents on a topic
Aggregate user behavior mostly improves precision
–Power users find most relevant and important documents
The model of expert search enables and explains the portfolio of results, rather than individual results

9 SOURCES OF EVIDENCE: AUTHORS & EDITORS
[Diagram: the case Burger King Corp. v. Rudzewicz, with headnote and KN text, linked by citations to sets of other cases. Issue: Long arm jurisdiction. Labels as they appear in the transcript: 12 A (Key cases), 54 B (Highly Relevant); other counts: 17201; 3 (A), 28 (B); 205,310; 5 (A), 19 (B); 35; 4 (A), 5 (B).]

10 SOURCES OF EVIDENCE: AUTHORS & EDITORS
[Diagram: Burger King Corp. v. Rudzewicz. Headnote and KN pairs (HN1 KN1, HN2 KN2, HN3 KN2, ..., HN35 KN14) connect the case to ALR, CJS, AMJUR, and groups of cases, including another set of related cases.]

11 SOURCES OF EVIDENCE: USERS (I)
[Diagram: Burger King Corp. v. Rudzewicz. Query 1 through Query N feed Session 1 through Session N, where user actions (CLICK, PRINT, KEYCITE) touch sets of cases.]
Link query language to document language via click, print, and cite checking behaviors
Identify documents that are co-clicked, co-printed, etc., with the Burger King case across user sessions
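
Note on slide 11: the co-click idea can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the session log layout (a list of (action, document id) pairs per session), the function name co_action_counts, and the document ids other than the Burger King case. It is not the authors' implementation, only one simple way to count documents that share a session with an anchor document.

```python
from collections import Counter


def co_action_counts(sessions, anchor_doc):
    """Count, per document, the sessions it shares with anchor_doc.

    sessions: iterable of sessions, each a list of (action, doc_id) pairs.
    The log layout is hypothetical, chosen only for this sketch.
    """
    counts = Counter()
    for actions in sessions:
        # Any action type (click, print, keycite) counts as touching a document.
        docs = {doc_id for _action, doc_id in actions}
        if anchor_doc in docs:
            counts.update(docs - {anchor_doc})
    return counts


# Toy sessions; "case_a", "case_b", and "case_c" are made-up ids.
sessions = [
    [("click", "burger_king"), ("print", "case_a"), ("click", "case_b")],
    [("click", "burger_king"), ("keycite", "case_a")],
    [("click", "case_c"), ("print", "case_a")],
]

related = co_action_counts(sessions, "burger_king")
print(related.most_common())
```

Counting each document once per session (rather than once per action) keeps a single heavy session from dominating; weighting prints or cite checks more than clicks would be a natural variation.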

12 SOURCES OF EVIDENCE: USERS (II)
[Diagram: Burger King Corp. v. Rudzewicz. Query 1 through Query N, Session 1 through Session N, and user actions (CLICK, PRINT) on sets of cases.]
Original breach of contract and trademark infringement case turned into a civil procedure case about jurisdiction on appeal
IN THE LAST 3 MONTHS
USER ACTIONS: 10417 | TOTAL SESSIONS: 9758
Queries, with counts:
"personal jurisdiction" 176
"minimum contacts" 50
"forum selection clause" 39
personal jurisdiction 39
"forum non conveniens" 32
"choice of law" 29

13 AI & THE RANKING PROBLEM
Supervised Machine Learning (Ranker SVM)
–Iteratively retrieve and rank documents
–Incorporate all available cues: text similarity, classifications, citations, user behavior and query logs
–All of this requires lots of data!
Training & Validation
–Gold data: hand-crafted research reports covering a variety of legal issues
–Report contains an issue statement, multiple queries, all seminal, highly relevant documents, some relevant docs
> 100K documents judged against ~400 legal issues
–System was also tested by an independent 3rd party
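
Note on slide 13: the slide names a ranking SVM but gives no details, so the following is only a sketch of the general pairwise idea, not the system described. It assumes a linear scorer trained by stochastic gradient descent on a pairwise hinge loss with L2 regularization. The three features (text similarity, citation signal, click signal), the relevance grades, and all hyperparameters are invented for illustration.

```python
import random


def train_pairwise_ranker(examples, epochs=200, lr=0.1, reg=0.01, seed=0):
    """Fit weights w so that higher-graded documents score higher.

    examples: list of (feature_vector, relevance_grade) for a single query.
    Minimizes hinge loss max(0, 1 - w.(better - worse)) plus an L2 penalty.
    """
    rng = random.Random(seed)
    dim = len(examples[0][0])
    w = [0.0] * dim
    # Every ordered pair where the first document has the higher grade.
    pairs = [(a, b) for a, ra in examples for b, rb in examples if ra > rb]
    for _ in range(epochs):
        rng.shuffle(pairs)
        for better, worse in pairs:
            diff = [p - q for p, q in zip(better, worse)]
            margin = sum(wi * di for wi, di in zip(w, diff))
            for k in range(dim):
                # The hinge term contributes only when the margin is violated.
                grad = reg * w[k] - (diff[k] if margin < 1.0 else 0.0)
                w[k] -= lr * grad
    return w


def score(w, features):
    return sum(wi * xi for wi, xi in zip(w, features))


# Hypothetical features: [text similarity, citation signal, click signal].
examples = [
    ([0.9, 0.8, 0.7], 2),  # seminal
    ([0.6, 0.4, 0.5], 1),  # relevant
    ([0.7, 0.1, 0.1], 1),  # relevant
    ([0.2, 0.1, 0.0], 0),  # not relevant
]

w = train_pairwise_ranker(examples)
ranked = sorted(examples, key=lambda ex: score(w, ex[0]), reverse=True)
print([grade for _features, grade in ranked])
```

In practice, pairs would be formed only within the same query or legal issue, which is why graded judgments per issue (as in the gold data above) fit this kind of training.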

14 HADOOP FOR BIG DATA PROCESSING
At launch, query logs contained ~2 billion records
–Queries & user actions
Relied on a Hadoop cluster to
–Run Extract, Transform, and Load processes
–Cluster similar queries together
–Extract, normalize, collate citation contexts
Dramatic improvement in processing times
–From tens of hours to tens of minutes
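
Note on slide 14: grouping similar queries fits the map and reduce pattern that Hadoop parallelizes. The sketch below runs in a single process and assumes a deliberately crude notion of similarity (same lowercase tokens, ignoring quotes, punctuation, and word order). That normalization, the function names, and the last log record are assumptions; the first three records reuse query counts shown on slide 12. The slides do not say how queries were actually clustered.

```python
import re
from collections import defaultdict


def map_query(record):
    """Map step: emit (normalized_key, count) for one query-log record."""
    query, count = record
    tokens = re.findall(r"[a-z0-9]+", query.lower())
    # Sorting tokens makes the key independent of word order.
    return " ".join(sorted(tokens)), count


def reduce_counts(pairs):
    """Reduce step: sum the counts of all records sharing a key."""
    totals = defaultdict(int)
    for key, count in pairs:
        totals[key] += count
    return dict(totals)


log = [
    ('"personal jurisdiction"', 176),
    ("personal jurisdiction", 39),
    ('"minimum contacts"', 50),
    ("Jurisdiction, personal", 4),  # invented record, for illustration only
]

clusters = reduce_counts(map(map_query, log))
print(clusters)
```

Because each record is mapped independently and keys are summed independently, both steps can be spread across cores without coordination, which is the property the cluster exploits.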

15 HADOOP: TYPICAL SPEED UPS
COMPUTATION | NORMAL TIME | HADOOP TIME
Building complete Westlaw dictionary | 2.5 days | 1 hour
Clustering similar Westlaw queries | 1.5 days | 3 minutes
Citation extraction from over 10 M documents | 1.25 days | 3 hours
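
Note on slide 15: the speed-up factors implied by the table can be worked out directly. This small calculation assumes that a "day" in the table means 24 hours of elapsed time, which the slide does not state.

```python
MINUTES_PER_DAY = 24 * 60  # assumption: "days" are 24-hour wall-clock days

# (computation, normal time in minutes, Hadoop time in minutes), from the table.
rows = [
    ("Building complete Westlaw dictionary", 2.5 * MINUTES_PER_DAY, 60),
    ("Clustering similar Westlaw queries", 1.5 * MINUTES_PER_DAY, 3),
    ("Citation extraction from over 10 M documents", 1.25 * MINUTES_PER_DAY, 180),
]

speedups = {name: normal / hadoop for name, normal, hadoop in rows}

for name, factor in speedups.items():
    print(f"{name}: about {factor:.0f}x faster")
```

Under that assumption the three rows come out at roughly 60x, 720x, and 10x.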

16 CLUSTER CONFIGURATION: QUERIES
8 machines, each with 16 cores
Only 14 cores/machine were available for processing
–Giving a total of 112 cores
Block size of 64 MB
–Each core processes one block at a time
Cluster can process 7 GB at each step
Latest cluster is twice the size: 224 cores
–Almost 1 TB of memory and over 1 PB of storage
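
Note on slide 16: the 7 GB figure follows from the numbers on the slide: 8 machines x 14 usable cores = 112 cores, and 112 cores x 64 MB per block = 7168 MB = 7 GB per step. The sketch below reproduces that arithmetic. The helper names and the 100 GB input used in the example are hypothetical, and the step count assumes every core stays busy with exactly one block per step.

```python
import math


def step_capacity_mb(machines, usable_cores_per_machine, block_mb):
    """Data processed per step when each usable core handles one block."""
    return machines * usable_cores_per_machine * block_mb


def steps_needed(input_mb, capacity_mb):
    """Number of steps to cover the input, rounding up for a partial last step."""
    return math.ceil(input_mb / capacity_mb)


capacity = step_capacity_mb(machines=8, usable_cores_per_machine=14, block_mb=64)
print(capacity, "MB per step =", capacity / 1024, "GB")

# Hypothetical 100 GB input, chosen only to illustrate the step count.
print(steps_needed(100 * 1024, capacity), "steps")
```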

17 THE POWER OF EXPERT SEARCH
Leverages expertise of community: authors, editors, & users
–We know why documents are linked
–We know exactly who our users are
Metadata, authority & aggregated user data all contribute to relevance, importance & popularity
Can still benefit from Power Law phenomena so common on the Web
Can exploit data parallelism to achieve the same kind of scale as horizontal search

18 LESSONS LEARNED
Vertical search is not just about search
–It's about findability
  Includes navigation, recommendations, clustering, faceted classification, etc.
–It's about satisfying a set of well-understood tasks
  Usually on enhanced content
  Usually for expert customers
Leveraging human value addition is key
–None of the human actors set out to improve search
Difficult to design complete solution upfront
–Need platform for experimentation and validation at scale

19 QUESTIONS?
A relevant paper is downloadable from http://labs.thomsonreuters.com

