
1 Redefining Search Technology Solutions for Better Information Access
ASIDIC Spring Meeting
Eric Bregand, Chief Executive Officer, TEMIS
Tampa (FL) – March 09

2 What Have We Learned So Far?
Copyright © 2009 TEMIS – All rights reserved
- Users are back at the heart of solutions: Engage, Empower, Ease of Use & Trust (E3T)
- Information accuracy is key: the #1 criterion for churn
- Web 2.0/3.0 as the backbone for information delivery: semantic search, digital desktop
- “Give users what they want before they know they want it”

3 Where are we?

4 Agenda
1. Introduction to Text Mining
2. Text Mining for Information Consumers
3. Text Mining for Information Producers
4. Moving forward >> Text Mining Web Services
5. Summary and Q&A

5 What is Text Mining? Example!
[Diagram: the sentence “Trimilax 500 mg makes me feel dizzy after ingestion” analyzed at four levels: Term, Entity, Fact, Knowledge]
- Term: part-of-speech tags (Prop., Num., Abrev., Verb/3rd, Pron., Verb, Adj., Prep., Noun)
- Entity: labels shown include Product, Dosing, Action, Target, State, Event; and Drug, Symptom, Condition
- Fact: Potential Adverse Effect
- Knowledge: Drug = Trimilax; Dosing = 500mg; Symptom = Tiredness; When = After administration
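The Term, Entity and Fact levels on this slide can be sketched in a few lines of Python. This is only an illustration, not the TEMIS implementation: the lexicons `DRUGS` and `SYMPTOMS`, the function names and the "drug plus symptom in one sentence" rule are all assumptions made for the example, and the sentence is read here as “Trimilax 500 mg makes me feel dizzy after ingestion”.

```python
import re

# Hypothetical mini-lexicons; a real system would rely on much larger resources.
DRUGS = {"trimilax"}
SYMPTOMS = {"dizzy", "tired", "nauseous"}

def extract_entities(sentence):
    """Term level -> entity level: label tokens using the toy lexicons."""
    entities = {}
    tokens = re.findall(r"[A-Za-z]+|\d+", sentence)
    for i, tok in enumerate(tokens):
        low = tok.lower()
        if low in DRUGS:
            entities["drug"] = tok
        elif low in SYMPTOMS:
            entities["symptom"] = low
        elif tok.isdigit() and i + 1 < len(tokens) and tokens[i + 1].lower() == "mg":
            entities["dosing"] = f"{tok} mg"
    return entities

def extract_fact(sentence):
    """Entity level -> fact level: a drug and a symptom in the same sentence
    are flagged as a potential adverse effect (a deliberately naive rule)."""
    ents = extract_entities(sentence)
    if "drug" in ents and "symptom" in ents:
        return {"type": "Potential Adverse Effect", **ents}
    return None

fact = extract_fact("Trimilax 500 mg makes me feel dizzy after ingestion")
```

The resulting record plays the role of the Knowledge box on the slide: structured fields that can be stored and queried instead of raw text.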

6 Text Mining? Understand!
[Slide graphic: an article annotated with Metadata, Entities and Facts]
Metadata:
- Title: Google gives drivers a hand at the gas pumps
- Source: InformationWeek
- Author: Antone Gonsalves
- Date: November 7, 2007

7 Text Mining? Understand!
[Slide graphic: entities extracted from the article, grouped by type; the grouping is not recoverable from the transcript]
- Types shown: Locations, Organizations, Persons, Technologies, Companies, Product
- Entities shown: Linux, United States, Open-source, …, Google, T-Mobile, HTC, Qualcomm, Motorola, Atlanta, National Association of Conveni…, Lucy Sackett, Internet, Gilbarco Veeder-Root, InformationWeek, Sackett, Gilbarco, New Service, Google Service

8 Text Mining? Understand!
[Slide graphic: facts extracted from the article, linking the entities above]
- Announcement. Who: Gilbarco; Whom: unknown; What: New Service; When: unknown
- Launch. Who: Gilbarco; What: Google Service; When: early next week
- Function. Who: Sackett; Company: Gilbarco; Function: spokeswoman
- Partnership. Who: Gilbarco; With whom: Google; When: unknown; State: Negative
- Alliance. Who: Google; With whom: T-Mobile, HTC, Qualcomm, Motorola; When: unknown
- Announcement. Who: Sackett; Whom: InformationWeek; When: unknown; What: unknown
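One way to picture the facts on this slide is as typed records whose roles (Who, What, With whom, …) point at entities. The sketch below is an assumption about a convenient data structure, not a description of how Luxid® stores facts; the `Fact` class and `facts_about` helper are hypothetical names.

```python
from dataclasses import dataclass, field

@dataclass
class Fact:
    kind: str                                   # e.g. "Launch", "Alliance"
    roles: dict = field(default_factory=dict)   # role name -> entity or list of entities

# A few of the facts shown on the slide, transcribed as records.
facts = [
    Fact("Announcement", {"who": "Gilbarco", "what": "New Service"}),
    Fact("Launch", {"who": "Gilbarco", "what": "Google Service", "when": "early next week"}),
    Fact("Function", {"who": "Sackett", "company": "Gilbarco", "function": "spokeswoman"}),
    Fact("Alliance", {"who": "Google", "with_whom": ["T-Mobile", "HTC", "Qualcomm", "Motorola"]}),
]

def facts_about(entity, facts):
    """Return the facts in which `entity` fills any role."""
    hits = []
    for fact in facts:
        for value in fact.roles.values():
            values = value if isinstance(value, list) else [value]
            if entity in values:
                hits.append(fact)
                break
    return hits

gilbarco_facts = [f.kind for f in facts_about("Gilbarco", facts)]
```

Querying by entity rather than by keyword is what later slides call searching "in knowledge, not in documents".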

9 Text Mining? Create Knowledge!
[Slide graphic: a cloud of entity types and fact types; labels in extraction order]
- entities / facts: Company Disease Drug Treatment Tissue Person Effect Time Symptom Compound Product Protein Activation Adverse Effect Acquisition Partnership Co-Development Drug Approval Drug Launch Licensing Protein Inhibition Gene Expression Patent Filing Clinical Trial Cell

10 What is Text Mining?
- Text Mining is an information access technology…
- Text Mining generates Knowledge
- Text Mining serves information consumers & producers
[Diagram: Text Mining Back-End, Data Repository, Text Mining Front-End (Text Analytics)]

11 Agenda
1. Introduction to Text Mining
2. Text Mining for Information Consumers
3. Text Mining for Information Producers
4. Moving forward >> Text Mining Web Services
5. Summary and Q&A

12 Search Engines Today
[Diagram: Document Processing, Query, Index]
- Scalable (billions of docs)
- Pervasive (any source)
- Live (any time) → Dynamic?
- Fast (millisecond queries)
- Simple (list of documents) → Relevant? → Informative?

13 Text Analytics Today
[Diagram: Text Mining Platform feeding an Index of Entities & Concepts, Events & Facts, Occurrences & Position; user functions: Search, Discover, Analyze]
- Scalable (100k docs)
- Domain-centric
- Live (any time)
- Pertinent
- Collaborative

14 Enhanced Searches with Text Mining
Enrich the search index with more, and more relevant, extracted information
[Diagram: Document Processing, Query, Text Mining Platform, Index, Business Centric Annotators]
- Pertinent searches
  → Richer indexes
  → More relevant information
- Just better searches
  → No analysis
  → No discovery

15 Beyond Search! Discover & Analyze
[Diagram: Document Processing, Text Mining Platform, Index (Entities & Concepts, Events & Facts, Occurrences & Position), Query, Discover, Analyze]
- Pertinent searches
  → Richer indexes
  → More relevant information
- Informative
  → Easy reading with highlighting
  → Knowledge discovery within info links

16 Pertinence Gains – Beyond Terms…
[Chart: pertinence (Average, Good, Excellent) rising from Term to Entity to Concept]
- Term: Administration, Federal, Drug, Agency, Swiss, Regulatory
- Entity: Federal Drug Administration, Swiss Regulation Agency
- Concept: Regulation Agency
“Search Regulation Agency” better than “Search FDA or Federal…”

17 Pertinence Gains – Beyond Documents
[Chart: pertinence (Average, Good, Excellent) by level of analysis, continuing from Term, Entity, Concept]
- Co-Occurrence (Document): identify entities nearby in the document
- Proximity (Paragraph): identify entities nearby in the paragraph
- Facts (Sentence): identify entities linked by semantic sense (example relation: Buy)
Example text: “It was discovered by San Francisco-based Sugen, a biotechnology company that was purchased by pharmaceutical company Pharmacia Corp. …. Five months later, Pfizer bid for Pharmacia. The maker of the popular arthritis drug Celebrex and hair-loss treatment Rogaine…”
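The three scopes on this slide (document, paragraph, sentence) can be illustrated with a small co-occurrence check. This is a sketch under simplifying assumptions: entities are matched as plain substrings, paragraphs are separated by blank lines, and sentence-level co-occurrence is used only as a rough stand-in for true fact extraction, which would also need to identify the relation (for example "Buy"). The function name `cooccur` and the sample text are made up for the example.

```python
import re

def cooccur(text, a, b, scope):
    """True if entity strings a and b both appear in at least one unit
    of the given scope: 'document', 'paragraph' or 'sentence'."""
    if scope == "document":
        units = [text]
    elif scope == "paragraph":
        units = re.split(r"\n\s*\n", text)
    elif scope == "sentence":
        units = re.split(r"(?<=[.!?])\s+", text)
    else:
        raise ValueError(f"unknown scope: {scope}")
    return any(a in unit and b in unit for unit in units)

text = (
    "Sugen is a biotechnology company that was purchased by Pharmacia. "
    "It is based in San Francisco.\n\n"
    "Five months later, Pfizer bid for Pharmacia."
)

# Sugen and Pfizer only meet at document level; Sugen and Pharmacia share a sentence.
doc_level = cooccur(text, "Sugen", "Pfizer", "document")
par_level = cooccur(text, "Sugen", "Pfizer", "paragraph")
sent_level = cooccur(text, "Sugen", "Pharmacia", "sentence")
```

Narrowing the scope trades recall for precision, which is the pertinence gain the chart describes.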

18 Pertinence Gains – Benchmarks
[Chart: relevance (Average, Good, Excellent) across Term, Entity, Concept, Co-Occurrence, Proximity and Facts, comparing “Text Mining & Search Engine” with “Standalone Search engine”]

19 Key Feature Benefits
- Combined Text Analytics & Search
  - Stay fast & scalable
  - But also become more pertinent & collaborative
- End-user benefits = powerful search & discovery
  1. Enhanced search
  2. Guided navigation
  3. Assisted document reading
  4. Standardized data analysis and reporting
  5. Information discovery
  6. Collaborative platform

20 1. Enhanced Search Experience
From standard keyword search…
- Simple recognition of words…

21 1. Enhanced Search Experience
… to Entity & Fact search!
End-User Benefits:
- Make comprehensive and precise searches
- Get more relevant documents
- Find what you don’t know!

22 2. Faceted Navigation
From “narrow your search”…

23 2. Faceted Navigation
… to multi-dimensional faceted navigation
- Self-adjusting filters to refine the search
- Ability to combine several filters at once (and/or)
- Point & click filtering
End-User Benefits:
- Get a quick view of document content
- Navigate within context-relevant information
- Rapidly focus on targeted documents
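Faceted navigation with combinable and/or filters can be sketched as follows. The document records, facet names and helper functions are hypothetical; the point is only to show filters applied over extracted entities and facet counts recomputed on the current result set, which is one plausible reading of "self-adjusting filters".

```python
from collections import Counter

# Hypothetical documents already annotated with facet values.
docs = [
    {"id": 1, "company": {"Google", "Gilbarco"}, "topic": {"Partnership"}},
    {"id": 2, "company": {"Google", "HTC"}, "topic": {"Alliance"}},
    {"id": 3, "company": {"Pfizer"}, "topic": {"Acquisition"}},
]

def matches(doc, filters, mode="and"):
    """filters is a list of (facet, value) pairs combined with and/or."""
    tests = [value in doc.get(facet, set()) for facet, value in filters]
    return all(tests) if mode == "and" else any(tests)

def search(docs, filters, mode="and"):
    return [d for d in docs if matches(d, filters, mode)]

def facet_counts(docs, facet):
    """Counts of facet values over the given (already filtered) documents."""
    return Counter(v for d in docs for v in d.get(facet, set()))

both = [d["id"] for d in search(docs, [("company", "Google"), ("topic", "Alliance")], "and")]
either = [d["id"] for d in search(docs, [("company", "Google"), ("topic", "Alliance")], "or")]
remaining = facet_counts(search(docs, [("company", "Google")]), "company")
```

After filtering on Google, `remaining` only offers companies that still occur in the result set, so the next click can never lead to an empty page.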

24 3. Assisted Document Reading
From raw data display…

25 3. Assisted Document Reading
… to targeted information viewing
- Instant access to relevant information
- Text highlighting
End-User Benefits:
- Instant spotting of relevant information
- Guided reading
- Get additional context (“Smart Link”)

26 4. Data Analysis and Reporting
From bug’s-eye view…

27 4. Data Analysis and Reporting
… to bird’s-eye view!
End-User Benefits:
- Visualize key Entities & Facts (pie/bar charts)
- Detect dependencies between Entities & Facts (matrix charts)
- Zoom in & out by drilling anywhere

28 5. Information Discovery
From flat list of documents…

29 5. Information Discovery
… to information network
[Screenshot labels: Entities, Facts, Search Panel, Discovery Tools, Proofs]
End-User Benefits:
- Search in knowledge, not in documents
- Get a graphical representation of knowledge
- Discover information by navigating within Facts

30 6. Collaborative Platform
From information consumer… to information producer!
- User Enriched Content
  - Join 2 entities. Ex: BASF = BASF Plant Sciences
  - Re-assign entity. Ex: Carl Zeiss = Company (instead of person)
  - Remove entity. Ex: BUT is not a company (although a French company has that name)
  - Add entity. Ex: XyyyZ is a protein
End-User Benefits:
- Increase information sharing
- Capitalize on knowledge
- Improve indexing quality
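The four user corrections on this slide (join, re-assign, remove, add) map naturally onto operations over an entity registry. The class below is a hypothetical sketch written for this transcript, not the product's API; it simply shows how such corrections could feed back into indexing.

```python
class EntityRegistry:
    """Toy registry of known entities and user corrections (illustrative only)."""

    def __init__(self):
        self.types = {}    # canonical name -> entity type
        self.aliases = {}  # alias -> canonical name

    def add(self, name, etype):
        self.types[name] = etype

    def remove(self, name):
        self.types.pop(name, None)
        self.aliases = {a: c for a, c in self.aliases.items() if c != name and a != name}

    def reassign(self, name, etype):
        if name not in self.types:
            raise KeyError(name)
        self.types[name] = etype

    def join(self, alias, canonical):
        """Treat `alias` as another name for `canonical`."""
        if canonical not in self.types:
            raise KeyError(canonical)
        self.types.pop(alias, None)
        self.aliases[alias] = canonical

    def resolve(self, name):
        canonical = self.aliases.get(name, name)
        etype = self.types.get(canonical)
        return (canonical, etype) if etype else None

# Replaying the slide's examples.
reg = EntityRegistry()
reg.add("BASF", "Company")
reg.add("BASF Plant Sciences", "Company")
reg.join("BASF Plant Sciences", "BASF")    # join 2 entities
reg.add("Carl Zeiss", "Person")
reg.reassign("Carl Zeiss", "Company")      # re-assign entity
reg.add("BUT", "Company")
reg.remove("BUT")                          # remove entity
reg.add("XyyyZ", "Protein")                # add entity
```

Because every correction is shared through the registry, one user's fix improves indexing quality for everyone, which is the collaborative benefit the slide lists.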

31 Best of 2 Worlds? → Top-12 Features
[Table: star ratings per feature for three columns: Text Mining, Search, Combined. Column alignment was lost in extraction; stars are given as extracted.]
- Keyword-based search: ****** ****
- Concept-based search: **** ****
- “Did you mean?”: **** ****
- Similar terms?: **** ****
- “Narrow your search…”: ****** ****
- Document highlighting: ****** ****
- Categorization & Clustering: ****** ****
- Charts & reports on results: **** ****
- Faceted navigation: **** ****
- Proximity analysis: ** **** ****
- Knowledge browsing: **** ****

32 Agenda
1. Introduction to Text Mining
2. Text Mining for Information Consumers
3. Text Mining for Information Producers
4. Moving forward >> Text Mining Web Services
5. Summary and Q&A

33 Text Mining as Core Component
[Diagram: Text Mining at the core of Editorial & Content Management, Web Content Management and Product Management]
- Original Content: Journal Scans, Expert Interviews, Event Reports
- Content Annotation: Metadata Extraction, Taxonomy Management, Automatic Categorization, Entity & Facts Extraction
- Content Enrichment: Related Topics Extraction, Smart Linking, Sentiment Analysis, Trends Analysis & Charting, Similarity Detection
- Users: Content Editors, Visitors & customers

34 Text Mining Value Proposition
1. Enhance editorial productivity
   - Reduce cost of creating information products
   - Increase product quality and consistency
   - Improve editorial team satisfaction & productivity
2. Enrich content for agile publishing
   - Increase revenue & maximize content monetization
   - Improve customer experience & loyalty
   - Provide agility in creating faster, smarter products
Text Mining reduces the production costs and accelerates the delivery of information products

35 1. Enhancing Editorial Productivity
- Content categorization & alerts
  Content is automatically categorized according to editors’ preferences and expertise
  → Reduce time in integrating content
- Extraction & normalization
  References, citations and metadata are automatically extracted and normalized
  → Ensure information consistency
- Semantic and topical tagging
  Semantic tags and topics are suggested for editors’ review and approval
  → Speed up the editorial process

36 2. Enrich Content for Agile Publishing
- Semantic content linking – navigate!
  Provide more relevant content in context by suggesting similar documents
  → Create more engaging, longer lasting user visits
- Richer content tagging – find!
  Leverage the powerful content enrichment to better describe the content and then power accurate searches
  → Richer user experience through accurate answers & facets
- Information Analytics – understand!
  Powerful analytics to slice & dice your content
  → Quickly assess the feasibility of new product ideas
  → Reach out to new audiences with smarter products

37 Benefits to Information Producers
- Create more engaging, longer lasting user visits
  - Richer user experience with context-sensitive information
  - More page views per visit
  - Exposing the “long tail” through suggestions and linking
  - Integrate more content at a fraction of the cost
- Establish your web properties as a community gateway
  - “70% of all searches do NOT start on Google/MSN/Yahoo” says Sue Feldman at IDC Research
  - Smart search and navigation are critical to the user’s experience
  - Increase stickiness of website to maximize ad revenue or subscription utilization!

38 Better Search Capabilities! Example
[Mock-up of a “News Today” article page enriched by text mining, with a SEARCH box]
- Automatic Categorization (navigation menu): Politics, Local, Washington, International, Business, News, Product Launch, Finance, M&A, Stock, Dow/Nasdaq, Deals, People, On the move, Interviews
- Relevant Topic Extraction: Peshawar, President Bush, Islamic union, Boycott, Benazir Bhutto, Pervez Musharraf, John McCain, Islamabad
- Similarity (related documents): Musharraf to hold early election; Talibans positions move; Administration reiterates support; Benazir calls for resignation; (more)
- Entity Extraction (in this document): People: President Bush, Benazir Bhutto; Organizations: White House; Locations: Peshawar, Washington; (more)
- Smart Linking: Pervez Musharraf → Google, Wikipedia, LinkedIn

Article shown in the mock-up:
Pakistan polls boycott would help Musharraf: Bhutto
2 days ago
PESHAWAR, Pakistan (AFP) — Former Pakistan prime minister Benazir Bhutto said Sunday an opposition boycott of upcoming polls would only help President Pervez Musharraf legitimise his imposition of emergency rule. Bhutto said she would meet early next week with former rival Nawaz Sharif, who has called for a boycott of the January 8 election, to discuss the issue. "If we all boycott elections, then it will give Musharraf a two-thirds majority in the parliament to validate his provisional constitutional order," she told a press conference in northwestern city of Peshawar, an Islamic political stronghold. "That is why we are saying that we will take part in elections under protest, but we will also leave the door open (to talks on a boycott)." "I am getting conflicting signals from Nawaz Sharif and Qazi Hussain Ahmad about (an) election boycott as they have filed nomination papers and if someone does that it means he is taking part in election," Bhutto told reporters.

39 Selected Business Cases
- Editorial
- Agile Publishing

40 Editorial – Current BIODATA
- Objectives
  - Automate primary content acquisition (scientific literature, patents, business wires, sites, …)
  - Automate primary content indexing (proteins, genes, diseases, companies, people, etc.)
- Solution
  - Web harvesting with QL2
  - Information extraction, categorization and alerting with Luxid® and packaged Annotators (BER, MER, CI)
- Benefits
  - Significant cost savings on data gathering and analysis
  - Highly scalable framework covering multiple topics and thousands of sources

41 Editorial – LexisNexis
- Objectives
  - Automatic categorization & indexation using legal controlled vocabulary
  - Centralized knowledge
  - Easier access to content
- Solution
  - Mondeca as Legal Ontology
  - Luxid® with legal Annotator (custom made)
- Benefits
  - More efficient asset management and update
  - Improved content quality and consistency
  - More efficient search/navigation based on semantics

42 Editorial – Nature Publishing Group
- Objective
  - Speed up the development of new online products
  - Create high added value by automating tagging of scientific information (chemistry, biology & medicine)
- Solution
  - Luxid® 5.0 with Life Sciences Annotators [Chemistry, Medicine and Biology]
- Benefits
  - Currently launching a new service using chemical entity recognition
  - Fostering new product ideas to increase web properties value and create new microsites

43 Editorial – French Law Dep’t
- Objective
  - Streamline the editorial process for legislation consolidation
  - Automatically extract the consolidation instructions from the French legislative wire (“Journal Officiel”)
- Solution
  - Luxid® with a custom Annotator extracting legal entities and consolidation instructions (substitution/replacement, deletion, addition)
  - Integration with a Documentum CMS
- Benefits
  - Time savings in scanning / screening the legislative wire
  - Consistent quality through semi-automatic consolidation
  - Editors are more focused on added value

44 Editorial – Search enhancement
- Objective
  - Increase search and retrieval quality with better part-of-speech tagging in German
- Solution
  - TEMIS XeLDA® to improve the indexing process
  - Integration with Verity K2
- Benefits
  - Increase customer satisfaction by providing more accurate and comprehensive search results

45 Editorial – AFP
- Objective
  - Build the new AFP cross-media platform for information access (B2B « Image Forum » platform)
- Solution
  - Luxid® with People, Location, Organization, Company and IPTC codes annotators
  - Integration with an ontology management tool and a search engine
- Benefits
  - Uniform access to any AFP content (text, audio, video…)
  - Make information access easier on 10M+ articles in 6 different languages, 10M+ images and between 2 and 3 million news articles per year

46 Agile Publishing – Elsevier
- Objective
  - Develop a revolutionary database indexing the last 28 years of chemistry patents
  - Provide an exceptional user experience by using “smart content”
- Results
  - ~20 million chemistry patent documents
  - Searchable by chemical reactions, solvents, reactants directly extracted from the documents
  - Released by Elsevier-MDL in Nov
- Currently
  - TEMIS distributes the Chemical Entities Relationships Annotator in partnership with Elsevier

47 Agile Publishing – Thomson
- Objective = Rescue lost data
  - 49 bound volumes of Biological Abstracts® for 1926 to 1968 digitized using offshore resources
  - Required to make the data searchable with the BIOSIS
- Approach
  - Use Luxid® entity extraction to obtain candidate terms from the titles and abstracts
  - Map the extracted entities to the BIOSIS vocabulary
  - Output the resulting indexing as XML for loading to the Content Management System
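The three-step approach on this slide (extract candidate terms, map them to a controlled vocabulary, emit XML) can be sketched as below. Everything specific here is an assumption for illustration: the vocabulary entries, the `record` and `index-term` element names and the record identifier are invented, and the entity-extraction step is represented only by a ready-made list of candidate terms.

```python
import xml.etree.ElementTree as ET

# Hypothetical controlled vocabulary: lower-cased variant -> preferred term.
VOCABULARY = {
    "e. coli": "Escherichia coli",
    "escherichia coli": "Escherichia coli",
    "fruit fly": "Drosophila melanogaster",
}

def map_terms(candidates):
    """Keep candidates that map to the vocabulary, de-duplicated, order preserved."""
    mapped = []
    for term in candidates:
        preferred = VOCABULARY.get(term.strip().lower())
        if preferred and preferred not in mapped:
            mapped.append(preferred)
    return mapped

def to_xml(record_id, terms):
    """Serialize the indexing for one record as a small XML fragment."""
    record = ET.Element("record", id=record_id)
    for term in terms:
        ET.SubElement(record, "index-term").text = term
    return ET.tostring(record, encoding="unicode")

# Candidate terms as an extraction step might return them for one abstract.
candidates = ["E. coli", "Escherichia coli", "unknown organism", "Fruit fly"]
terms = map_terms(candidates)
xml_out = to_xml("BA-0001", terms)
```

Terms with no vocabulary match are dropped here; a production workflow would more likely route them to an editor for review.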

48 Agile Publishing – Springer
- Objective
  - Mapping of meaningful words and phrases in journal articles to encyclopedia entries
  - Identification of related documents in a pool of over three million journal articles
- Solution
  - Indexing of incoming journal articles to link them with the related encyclopedia entries
  - Creation of a semantic fingerprint for each journal article to allow the search engine to calculate the degree of relationship
  - Integration with Springer’s search engine
- Benefits
  - Increased product sales by improving content linking
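The slide does not say how the semantic fingerprint or the "degree of relationship" is computed. As a stand-in, the sketch below uses a plain bag-of-words fingerprint and cosine similarity, a common baseline; a real fingerprint would presumably be built from extracted concepts rather than raw words. The function names are hypothetical.

```python
import math
import re
from collections import Counter

def fingerprint(text):
    """Toy fingerprint: lower-cased word counts (no stop-word removal, no concepts)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def similarity(fp_a, fp_b):
    """Cosine similarity between two fingerprints, between 0 and 1."""
    dot = sum(fp_a[t] * fp_b[t] for t in fp_a.keys() & fp_b.keys())
    norm_a = math.sqrt(sum(v * v for v in fp_a.values()))
    norm_b = math.sqrt(sum(v * v for v in fp_b.values()))
    norm = norm_a * norm_b
    return dot / norm if norm else 0.0

article = fingerprint("protein kinase inhibition")
related = fingerprint("protein kinase activation")
unrelated = fingerprint("patent law")

score_related = similarity(article, related)      # shares two of three terms
score_unrelated = similarity(article, unrelated)  # shares none
```

Ranking candidate articles by this score gives a "related documents" list; the fingerprints can be computed once at indexing time and stored alongside each article.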

49 Agile Publishing – EFL
- Objectives
  - Extract numerical data from case law to enhance information access for lawyers
- Solution
  - Luxid® with custom annotators (address, activity, compensation, age, turnover…)
  - Export numerical data as metadata to a search engine
- Benefits
  - Productivity gains in extracting and validating metadata
  - Ability to process huge amounts of case law

50 Agenda
1. Introduction to Text Mining
2. Text Mining for Information Consumers
3. Text Mining for Information Producers
4. Moving forward >> Text Mining Web Services
5. Summary and Q&A

51 Market Expectations
- On-the-fly annotation services
  - Federated platform (web 2.0/3.0)
  - Serving all user/IT tools (browser, office, search, content management, …)
- Text Mining Anywhere
  - Highly scalable
  - Anytime (24/7)
  - Any documents
  - Any languages (US, European, Asian, Arabic, …)

52 Market Expectations
- On-the-fly annotation services
- High quality & accuracy
  - Generic entities (people, company, …)
  - Market-specific entities (drug, patient, court cases, …)
  - Generic facts (acquisition, announcements, events, …)
  - Market-specific facts (binding, activation, law suit, …)
  - Disambiguation (Orange! Telco company? Location? Fruit?)
  - Normalization (IBM Corp = IBM = I.B.M)
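The normalization and disambiguation examples on this slide can be made concrete with a small sketch. Both functions are illustrative assumptions: the suffix list and the cue words are invented for the example, and real systems typically use far richer context models than keyword overlap.

```python
import re

LEGAL_SUFFIXES = {"corp", "corporation", "inc", "ltd", "co"}  # illustrative, not exhaustive

def normalize_company(name):
    """Collapse surface variants such as 'IBM Corp', 'IBM' and 'I.B.M' to one key."""
    words = name.replace(".", "").split()   # "I.B.M" -> ["IBM"]; "IBM Corp." -> ["IBM", "Corp"]
    while words and words[-1].lower() in LEGAL_SUFFIXES:
        words.pop()
    return " ".join(words).upper()

# Hypothetical context cues for the slide's "Orange" example.
ORANGE_CUES = {
    "Company": {"telco", "mobile", "subscribers", "network"},
    "Location": {"county", "california", "city"},
    "Fruit": {"juice", "peel", "fruit"},
}

def disambiguate_orange(sentence):
    """Pick the sense whose cue words overlap most with the sentence."""
    words = set(re.findall(r"[a-z]+", sentence.lower()))
    scores = {sense: len(words & cues) for sense, cues in ORANGE_CUES.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "Unknown"

variants = {normalize_company(n) for n in ["IBM Corp", "IBM", "I.B.M"]}
sense = disambiguate_orange("Orange added mobile subscribers in France")
```

All three spellings collapse to a single key, so facts about the company are counted once rather than three times.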

53 Market Expectations
- On-the-fly annotation services
- High quality & accuracy
- More than just annotations
  - Content enrichment with additional data: GPS coordinates for locations, chemical structure for drugs, …
  - Information linking: content is about hyperlinking
  - Semantic mash-up: Wikipedia for named entities (people, location, events, …), Google Maps for geolocation, patents database for scientific literature, …

54 Text Mining Web Services
[Diagram: documents are sent to three web services, each backed by Text Mining Services]
- Send documents
- Content Annotation Web Services → receive annotated documents
- Content Enrichment Web Services → receive annotated, enriched documents
- Content Hyper-linking Web Services → receive annotated, enriched & linked documents
- Data sources shown: Customer Data, Public Data, Persistent Content Repositories
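The three services on this slide form a pipeline in which each stage adds to what the previous one returned. The sketch below models that with local functions; it does not call any TEMIS service, and the gazetteer, the coordinates (approximate) and the stage functions are stand-ins invented for the example. Only the Wikipedia URL pattern is a real convention.

```python
# Hypothetical lookup tables standing in for remote services and public data.
GAZETTEER = {"Peshawar": "Location", "Benazir Bhutto": "Person"}
COORDINATES = {"Peshawar": (34.01, 71.52)}  # approximate, for illustration only

def annotate(doc):
    """Annotation stage: find known entities in the text."""
    found = [{"text": name, "type": etype}
             for name, etype in GAZETTEER.items() if name in doc["text"]]
    return {**doc, "annotations": found}

def enrich(doc):
    """Enrichment stage: attach additional data, here GPS coordinates for locations."""
    enriched = [{**a, "gps": COORDINATES[a["text"]]} if a["text"] in COORDINATES else a
                for a in doc["annotations"]]
    return {**doc, "annotations": enriched}

def link(doc):
    """Hyper-linking stage: point each entity at an external page."""
    linked = [{**a, "link": "https://en.wikipedia.org/wiki/" + a["text"].replace(" ", "_")}
              for a in doc["annotations"]]
    return {**doc, "annotations": linked}

def process(doc, stages=(annotate, enrich, link)):
    for stage in stages:
        doc = stage(doc)
    return doc

result = process({"text": "Benazir Bhutto spoke in Peshawar."})
annotated_only = process({"text": "Benazir Bhutto spoke in Peshawar."}, stages=(annotate,))
```

Passing a shorter `stages` tuple mirrors the slide's three delivery options: annotated, annotated and enriched, or annotated, enriched and linked.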

55 TEMIS – Luxid® Web Services
[Architecture diagram hosted by TEMIS, Inc.; the connections between elements are not recoverable from the transcript]
- Environments: Production Environment, Back-up Environment
- Tools: Luxid® Knowledge Studio, Luxid® Administration Console
- Configuration tasks: Create/Update Annotation Plans; Create/Update Annotation Workflows; Create/Update Skill Cartridges™; Create/Update Classification Plans; Install/Upload Skill Cartridges™
- Annotation modes: On-Demand Annotation (triggered by manual intervention); On-the-Fly Annotation (triggered by automatic call)
- Access channels: HTTPS web services, HTTPS browser, secured FTP
- Remote Administration: Monitoring & Administration

56 TEMIS – Company Background
- TEMIS = TExt MIning Solutions
  - Software company created in 2000
  - Dual headquarters in Philadelphia & Paris
  - Acquisition of Xerox Linguistics (20 years of R&D)
- Leader in Publishing and Life Sciences Text Mining
  - Over 200 clients in Pharma and B-to-B publishing
  - Founding member of UIMA’s OASIS committee
- Flagship software product
  - Top-20 most innovative products across Europe
  - Enable organizations to better interact with their environment by extracting knowledge and making sense of content

57 Agenda
1. Introduction to Text Mining
2. Text Mining Empowers Search Engines
3. Text Mining for Publishers
4. Moving forward – Text Mining Web Services
5. Summary and Q&A

58 Summary
- Content Enrichment is critical
  - For End-Users
  - For Publishers
  - For any information consumers and producers

59 Summary
- Content Enrichment is critical
- More than Content Enrichment is expected
  - Content is about linking (hyper-linking)
  - Semantic mash-up

60 Summary
- Content Enrichment is critical
- More than content enrichment is expected
- Text-Mining plays an important role
  - Proven technology
  - Key component in information access technology stack
  - Wide range of services (from basic tagging to semantic linking)

61 Summary
- Content Enrichment is critical
- More than content enrichment is expected
- Text-Mining plays an important role
- Key business benefits
  - Reduce cost of creating information products
  - Increase revenue & maximize content monetization

62 Summary
- Content Enrichment is critical
- More than content enrichment is expected
- Text-Mining plays an important role
- Key business benefits
- Immediate impacts
  - Improve editorial team satisfaction & productivity
  - Enhance product quality and consistency
  - Improve customer experience & loyalty

63 Questions? Thank you!
ASIDIC Spring Meeting
Eric Bregand, Chief Executive Officer, TEMIS
Tampa (FL) – March 09

