Published by Jordan Roff
Text Mining: Tools, Techniques, and Applications
Nathan Treloar, President, AvaQuest, Inc.
© 2002, AvaQuest Inc.
Outline
- Text Mining Defined
- Foundations of Text Mining
- Example Applications
- User Interface Challenges
- The Future
Mining Medical Literature
- Medical research: find causal links between symptoms or diseases and drugs or chemicals.
A Real Example
- Research objective: follow chains of causal implication to discover a relationship between migraines and biochemical levels.
- Data: medical research papers and medical news (unstructured text information).
- Key concept types: symptoms, drugs, diseases, chemicals, …
Example Application: Medical Research
- Stress is associated with migraines.
- Stress can lead to loss of magnesium.
- Calcium channel blockers prevent some migraines.
- Magnesium is a natural calcium channel blocker.
- Spreading cortical depression (SCD) is implicated in some migraines.
- High levels of magnesium inhibit SCD.
- Migraine patients have high platelet aggregability.
- Magnesium can suppress platelet aggregability.
(Source: Swanson and Smalheiser, 1994)
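
The chain of implication on this slide can be sketched as a small graph search. This is an illustrative sketch only, not the method used in the cited study: the associations are hand-coded from the slide rather than extracted from text, and the names `ASSERTIONS`, `build_graph`, and `linking_concepts` are hypothetical.

```python
from collections import defaultdict

# Associations hand-coded from the slide; a real system would extract them from text.
ASSERTIONS = [
    ("stress", "migraine"),
    ("stress", "magnesium"),
    ("calcium channel blocker", "migraine"),
    ("magnesium", "calcium channel blocker"),
    ("spreading cortical depression", "migraine"),
    ("magnesium", "spreading cortical depression"),
    ("platelet aggregability", "migraine"),
    ("magnesium", "platelet aggregability"),
]


def build_graph(pairs):
    """Build an undirected association graph from (concept, concept) pairs."""
    graph = defaultdict(set)
    for a, b in pairs:
        graph[a].add(b)
        graph[b].add(a)
    return graph


def linking_concepts(graph, start, end):
    """Return concepts B for which both start-B and B-end are asserted."""
    return sorted(graph[start] & graph[end])


graph = build_graph(ASSERTIONS)
bridges = linking_concepts(graph, "migraine", "magnesium")
directly_linked = "magnesium" in graph["migraine"]

print("Direct link asserted:", directly_linked)
print("Linking concepts:", bridges)
```

No single statement links migraine to magnesium directly, yet four intermediate concepts connect them. That indirect connection is the kind of relationship the slide describes.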
Text Mining Defined
- Discover useful and previously unknown "gems" of information in large text collections.
Search versus Discover
                           Search (goal-oriented)    Discover (opportunistic)
Structured Data            Data Retrieval            Data Mining
Unstructured Data (Text)   Information Retrieval     Text Mining
Data Retrieval
- Find records within a structured database.
- Database type: Structured
- Search mode: Goal-driven
- Atomic entity: Data record
- Example information need: Find a Japanese restaurant in Boston that serves vegetarian food.
- Example query: SELECT * FROM restaurants WHERE city = 'Boston' AND type = 'Japanese' AND has_veg = TRUE
Information Retrieval
- Find relevant information in an unstructured information source (usually text).
- Database type: Unstructured
- Search mode: Goal-driven
- Atomic entity: Document
- Example information need: Find a Japanese restaurant in Boston that serves vegetarian food.
- Example query: "Japanese restaurant Boston", or the directory path Boston->Restaurants->Japanese
Data Mining
- Discover new knowledge through analysis of data.
- Database type: Structured
- Search mode: Opportunistic
- Atomic entity: Numbers and dimensions
- Example information need: Show the trend over time in the number of visits to Japanese restaurants in Boston.
- Example query: SELECT date, SUM(visits) FROM restaurants WHERE city = 'Boston' AND type = 'Japanese' GROUP BY date ORDER BY date
Text Mining
- Discover new knowledge through analysis of text.
- Database type: Unstructured
- Search mode: Opportunistic
- Atomic entity: Language feature or concept
- Example information need: Find the types of food poisoning most often associated with Japanese restaurants.
- Example query: Rank diseases found associated with Japanese restaurants
Motivation for Text Mining
- Approximately 90% of the world's data is held in unstructured formats (source: Oracle Corporation).
- Information-intensive business processes demand that we move beyond simple document retrieval to knowledge discovery.
- Chart: 10% structured numerical or coded information; 90% unstructured or semi-structured information.
Challenges of Text Mining
- Very high number of possible dimensions
  - All possible word and phrase types in the language!
- Unlike data mining:
  - records (= docs) are not structurally identical
  - records are not statistically independent
- Complex and subtle relationships between concepts in text
  - "AOL merges with Time-Warner"
  - "Time-Warner is bought by AOL"
- Ambiguity and context sensitivity
  - automobile = car = vehicle = Toyota
  - Apple (the company) or apple (the fruit)
The Emergence of Text Mining
- Advances in text processing technology
  - Natural Language Processing (NLP)
  - Computational Linguistics
- Cheap hardware!
  - CPU
  - Disk
  - Network
Text Processing
- Statistical Analysis
  - Quantify text data
- Language or Content Analysis
  - Identifying structural elements
  - Extracting and codifying meaning
  - Reducing the dimensions of text data
Statistical Analysis
- Use statistics to add a numerical dimension to unstructured text:
  - Term frequency
  - Document length
  - Document frequency
  - Term proximity
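
The four statistics listed above can be computed with a few lines of code. This is a minimal sketch, assuming whitespace tokenization and a three-document toy corpus whose sentences are borrowed from the medical example. The helper `term_proximity` is a hypothetical name, and it measures only the smallest token distance between two terms.

```python
from collections import Counter

# Toy corpus; sentences reused from the medical research slide.
docs = {
    "d1": "magnesium can suppress platelet aggregability",
    "d2": "stress can lead to loss of magnesium",
    "d3": "stress is associated with migraines",
}

# Naive tokenization: lowercase and split on whitespace.
tokens = {doc_id: text.lower().split() for doc_id, text in docs.items()}

# Term frequency: how often each term occurs within each document.
term_frequency = {doc_id: Counter(words) for doc_id, words in tokens.items()}

# Document length: number of tokens in each document.
document_length = {doc_id: len(words) for doc_id, words in tokens.items()}

# Document frequency: number of documents containing each term.
document_frequency = Counter(
    term for words in tokens.values() for term in set(words)
)


def term_proximity(words, a, b):
    """Smallest distance in tokens between any occurrence of a and b, or None."""
    pos_a = [i for i, w in enumerate(words) if w == a]
    pos_b = [i for i, w in enumerate(words) if w == b]
    if not pos_a or not pos_b:
        return None
    return min(abs(i - j) for i in pos_a for j in pos_b)


print(document_frequency["magnesium"])
print(term_proximity(tokens["d2"], "stress", "magnesium"))
```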
Content Analysis
- Lexical and Syntactic Processing
  - Recognizing tokens (terms)
  - Normalizing words
  - Language constructs (parts of speech, sentences, paragraphs)
- Semantic Processing
  - Extracting meaning
  - Named Entity Extraction (people names, company names, locations, etc.)
- Extra-semantic features
  - Identify feelings or sentiment in text
- Goal = Dimension Reduction
Syntactic Processing
- Lexical analysis
  - Recognizing word boundaries
  - Relatively simple process in English
- Syntactic analysis
  - Recognizing larger constructs
  - Sentence and paragraph recognition
  - Part-of-speech tagging
  - Phrase recognition
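
Word-boundary and sentence recognition can be approximated with regular expressions. This is a deliberately naive sketch, assuming English text. The function names are hypothetical, and the sentence splitter would wrongly break after abbreviations such as "Inc.", which is one reason production systems use trained models or richer rules.

```python
import re


def split_sentences(text):
    """Split after '.', '!' or '?' when followed by whitespace (naive)."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def tokenize(sentence):
    """Return word tokens, keeping internal hyphens and apostrophes."""
    return re.findall(r"[A-Za-z0-9]+(?:['-][A-Za-z0-9]+)*", sentence)


text = "AOL merges with Time-Warner. Time-Warner is bought by AOL."
sentences = split_sentences(text)
tokens = [tokenize(s) for s in sentences]

for sentence, words in zip(sentences, tokens):
    print(sentence, "->", words)
```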
Named Entity Extraction
- Identify and type language features
- Examples:
  - People names
  - Company names
  - Geographic location names
  - Dates
  - Monetary amounts
  - Others… (domain specific)
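
A rule-based extractor gives a feel for what "identify and type" means. This sketch assumes two regular-expression patterns (month-year dates and dollar amounts) and a tiny hand-made company list. The patterns, the `COMPANIES` gazetteer, and the sample sentence are all made up for illustration and are not taken from any commercial tool.

```python
import re

MONTHS = (
    "January|February|March|April|May|June|July|"
    "August|September|October|November|December"
)

# Pattern-based entity types (illustrative, far from complete).
PATTERNS = {
    "DATE": re.compile(r"\b(?:" + MONTHS + r")\s+\d{4}\b"),
    "MONEY": re.compile(r"\$\d+(?:,\d{3})*(?:\.\d+)?(?:\s(?:million|billion))?"),
}

# Hypothetical gazetteer of known company names.
COMPANIES = {"SAS", "Inxight", "Meta Group"}


def extract_entities(text):
    """Return (entity text, entity type) pairs in order of appearance."""
    found = []
    for label, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            found.append((match.start(), match.group(), label))
    for name in COMPANIES:
        for match in re.finditer(r"\b" + re.escape(name) + r"\b", text):
            found.append((match.start(), match.group(), "COMPANY"))
    return [(entity, label) for _, entity, label in sorted(found)]


# Invented sentence, used only to exercise the patterns.
sample = "In March 2002, SAS partnered with Inxight on a $2.5 million project."
entities = extract_entities(sample)
print(entities)
```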
Simple Entity Extraction
- Sentence: "The quick brown fox jumps over the lazy dog"
- Labels in the slide diagram: Noun phrase; Mammal, Canidae; Mammal, Canidae
Entity Extraction in Use
- Categorization
  - Assign structure to unstructured content to facilitate retrieval
- Summarization
  - Get the gist of a document or document collection
- Query expansion
  - Expand query terms with related typed concepts
- Text Mining
  - Find patterns, trends, and relationships between concepts in text
Extra-semantic Information
- Extracting hidden meaning or sentiment based on use of language.
  - Example: "Customer is unhappy with their service!" Sentiment = discontent
- Sentiment is:
  - Emotions: fear, love, hate, sorrow
  - Feelings: warmth, excitement
  - Mood, disposition, temperament, …
- Or even (someday)…
  - Lies, sarcasm
Text Mining: General Applications
- Relationship analysis
  - If A is related to B, and B is related to C, there is potentially a relationship between A and C.
- Trend analysis
  - Occurrences of A peak in October.
- Mixed applications
  - Co-occurrences of A together with B peak in November.
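
Trend and co-occurrence analysis both reduce to counting concept mentions per time period. This is a minimal sketch, assuming each document has already been reduced to a month and a set of extracted concepts. The six records are invented so that the peaks match the slide's October and November examples.

```python
from collections import Counter

# (month, concepts found in one document); invented data for illustration.
documents = [
    ("2001-10", {"A"}),
    ("2001-10", {"A", "C"}),
    ("2001-10", {"A"}),
    ("2001-11", {"A", "B"}),
    ("2001-11", {"A", "B"}),
    ("2001-12", {"B"}),
]


def monthly_counts(documents, required):
    """Count documents per month that mention every concept in `required`."""
    counts = Counter()
    for month, concepts in documents:
        if required <= concepts:  # subset test: all required concepts present
            counts[month] += 1
    return counts


def peak_month(counts):
    """Return the month with the highest count."""
    return max(counts, key=counts.get)


a_counts = monthly_counts(documents, {"A"})        # trend analysis
ab_counts = monthly_counts(documents, {"A", "B"})  # co-occurrence trend

print("A peaks in", peak_month(a_counts))
print("A together with B peaks in", peak_month(ab_counts))
```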
Text Mining: Business Applications
- Ex 1: Decision Support in CRM
  - What are customers' typical complaints?
  - What is the trend in the number of satisfied customers in Cleveland?
- Ex 2: Knowledge Management
  - People Finder
- Ex 3: Personalization in eCommerce
  - Suggest products that fit a user's interest profile (even based on personality information).
Example 1: Decision Support using Bank Call Center Data
- The Needs:
  - Analysis of call records as input into the decision-making process of the bank's management
  - Quick answers to important questions:
    - Which offices receive the most angry calls?
    - What products have the fewest satisfied customers?
    - (Angry and Satisfied are recognizable sentiments)
  - User-friendly interface and visualization tools
Example 1: Decision Support using Bank Call Center Data
- The Information Source:
  - Call center records
  - Example: AC2G31, 01, 0101, PCC, 021, , NEW YORK, NY, H-SUPRVR8, STMT, mr stark has been with the company for about 20 yrs. He hates his stmt format and wishes that we would show a daily balance to help him know when he falls below the required balance on the account.
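
The sample record combines coded fields with a free-text note, so a first pass can split the two and tag the note with a sentiment. This sketch rests on assumptions: the slides do not define the record layout, so only the city and state positions (evident from "NEW YORK, NY") and the trailing note are used, and the two word lists are a hypothetical stand-in for a real sentiment lexicon.

```python
from collections import Counter

RECORD = (
    "AC2G31, 01, 0101, PCC, 021, , NEW YORK, NY, H-SUPRVR8, STMT, "
    "mr stark has been with the company for about 20 yrs. He hates his stmt "
    "format and wishes that we would show a daily balance to help him know "
    "when he falls below the required balance on the account."
)

# Hypothetical keyword lexicons; a real system would use a far richer model.
ANGRY_WORDS = {"hates", "angry", "upset", "furious"}
SATISFIED_WORDS = {"thanks", "happy", "pleased", "satisfied"}


def parse_record(line):
    """Split the ten coded fields from the free-text note (assumed layout)."""
    parts = [p.strip() for p in line.split(",", 10)]
    return {"city": parts[6], "state": parts[7], "note": parts[10]}


def classify_sentiment(note):
    """Label a note 'angry', 'satisfied', or 'neutral' by keyword lookup."""
    words = {w.strip(".,!?").lower() for w in note.split()}
    if words & ANGRY_WORDS:
        return "angry"
    if words & SATISFIED_WORDS:
        return "satisfied"
    return "neutral"


record = parse_record(RECORD)
sentiment = classify_sentiment(record["note"])

# Tally calls by (city, sentiment); with many records this answers
# "Which offices receive the most angry calls?"
calls = Counter([(record["city"], sentiment)])
print(calls)
```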
Example 1: Call Volume by Sentiment
Example 2: KM People Finder
- The Needs:
  - Find people as well as documents that can address my information need.
  - Promote collaboration and knowledge sharing
  - Leverage existing information access system
- The Information Sources:
  - […], groupware, online reports, …
Example 2: Simple KM People Finder
- Diagram: Query -> Search or Navigation System -> Relevant Docs -> Name Extractor (using an Authority List) -> Ranked People Names
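
The pipeline in the diagram can be sketched end to end. This is a minimal sketch, assuming a keyword search stands in for the search or navigation system and that people are ranked by the number of relevant documents that mention them. The documents and names are invented, and every function name is hypothetical.

```python
from collections import Counter

# Invented documents and names, for illustration only.
DOCUMENTS = {
    "memo1": "Alice Chen reviewed the text mining pilot with Bob Ortiz.",
    "memo2": "Text mining results were presented by Alice Chen.",
    "memo3": "Bob Ortiz filed the quarterly travel report.",
}

# Authority list: the known people names the extractor may return.
AUTHORITY_LIST = ["Alice Chen", "Bob Ortiz"]


def search(query, documents):
    """Return texts containing every query term (stand-in for a search system)."""
    terms = query.lower().split()
    return [
        text for text in documents.values()
        if all(term in text.lower() for term in terms)
    ]


def extract_names(text, authority_list):
    """Return the authority-list names that appear in the text."""
    return [name for name in authority_list if name in text]


def find_people(query, documents, authority_list):
    """Rank people by how many relevant documents mention them."""
    counts = Counter()
    for text in search(query, documents):
        counts.update(extract_names(text, authority_list))
    return [name for name, _ in counts.most_common()]


ranked = find_people("text mining", DOCUMENTS, AUTHORITY_LIST)
print(ranked)
```

Restricting the extractor to an authority list keeps results clean, at the cost of missing people who are not yet on the list.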
Example 2: KM People Finder
Example 3: Personalized Movie Matcher
- The Need:
  - Match movies to individuals based on a preference profile
- The Information:
  - Written reviews of movies
  - Users' lists of favorite movies
- Diagram: Movie Reviews -> Sentiment Analysis -> Typed and Tagged Reviews
Sentiment Analysis of Movies: Visualization (after Evans)
- Chart: scores from 0 to 1 for two genres, Action and Romance
- Categories: absurdity, destruction, fear, horror, immorality, inferiority, injustice, insecurity, deception, death, crime, conflict
Commercial Tools
- IBM Intelligent Miner for Text
- Semio Map
- InXight LinguistX / ThingFinder
- LexiQuest
- ClearForest
- Teragram
- SRA NetOwl Extractor
- Autonomy
User Interfaces for Text Mining
- Need some way to present the results of text mining in an intuitive, easy-to-manage form.
- Options:
  - Conventional text lists (1D)
  - Charts and graphs (2D)
  - Advanced visualization tools (3D+): network maps, landscapes, 3D spaces
UI Challenges
- Simple lists, charts, and graphs are either not obviously applicable or difficult to work with because of the high dimensionality of text.
- Advanced visualization tools can be intimidating for the general community and are not readily accepted.
Charts and Graphs
Visualization: Network Maps
Visualization: Network Maps
Visualization: Landscapes
Visualization: 3D Spaces
The Future
- Different tools and data, but common dimensions
- Example:
  - Find sales trends by product and correlate with occurrences of company name in business news articles
  - Dimensions: time, company names (or stock symbols), product names, regions
Recent Events
- February 2002: Meta Group posts a report arguing for the need to integrate business intelligence applications with knowledge management portals.
- March 2002: SAS, a leading provider of business intelligence software solutions, partners with Inxight to introduce a true text mining product.