From Words to Meaning to Insight Julia Cretchley & Mike Neal.

From Words to Meaning to Insight Copyright Leximancer 2011.

Outline
- Content Analysis
- What is Leximancer?
- Steps to your first analysis
- In-depth Leximancer

What Is Leximancer?
- Leximancer is a software tool designed for analyzing natural language text data
- Uses statistics-based algorithms
  - Initial analysis in minutes
- Automatically analyzes a text collection
  - User can direct the search and add, remove, or merge terms
- Extracts semantic (meaning) and relational information (more later)
- Outputs include a concept map, network cloud, quantitative data, and concept thesaurus

Leximancer Overview
(diagram; the only surviving label is "Text")

"We use the Laser 500 printer here at the office. We are pretty happy with it. Once there was a leak and all the toner spilled out of the machine, but a technician came out and fixed the problem for us. We still have to top the toner up often. The printer goes through ink quickly and the cartridges are expensive, but we put up with this because it delivers good results reliably. We are pleased with the quality of rinting we get. The Laser 500 can batch process, and collate the pages to save us time. Sometimes paper gets jammed in the Laser 500. Then we have to open it up to remove the crumpled paper. We have tried other machines in the past, but have not found an alternative that works better for us. ” What is this text about? (one main topic) Let’s Look at Some Text

Concept Extraction
- Terms around a word indicate its meaning
- Word associations discover concepts; the approach is language independent
- Leximancer concept: a group of related words that travel together in the text
  - Evidence words include synonyms and adjectives
- Concepts begin as seed words for coding and evolve into a thesaurus
  - Word-like, name-like (proper nouns), and compounds (United States)
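To make "words that travel together" concrete, here is a minimal sketch that counts which words appear in the same text block as a seed word. It is only an illustration of word association by co-occurrence counting, not Leximancer's actual algorithm; the function name `associated_words`, the stop list, and the three sample blocks are all hypothetical.

```python
import re
from collections import Counter

# Hypothetical stop list; a real one would be much longer.
STOP_WORDS = {"the", "and", "a", "for", "is", "of", "to", "in"}

def associated_words(blocks, seed):
    """Count words that share a block with the seed word."""
    counts = Counter()
    for block in blocks:
        tokens = set(re.findall(r"[a-z0-9]+", block.lower())) - STOP_WORDS
        if seed in tokens:
            # Every other word in this block is a candidate evidence word.
            counts.update(tokens - {seed})
    return counts

# Hypothetical mini-corpus, one string per text block.
blocks = [
    "The printer needs toner.",
    "Toner for the printer is expensive.",
    "Paper gets jammed.",
]

candidates = associated_words(blocks, "printer")
print(candidates.most_common(1))
```

Words with high counts, such as "toner" here, are candidates for the seed's evidence words; words that never share a block with the seed are not counted at all.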

Concept Extraction (cont.)
- A few things to note...
  - Several concepts may be in a single sentence
  - A concept may span multiple sentences
  - Adjustable resolution (default: 2 sentences)
  - Stop lists remove common words (the, and)
- Algorithms
  - A threshold of evidence words for a concept must be present for the concept to be coded in a block of text
  - A concept can be coded from its evidence words, even if the actual seed word (printer) is not present
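The threshold rule can be sketched as follows. This is an assumption-laden illustration, not Leximancer's implementation: it uses a plain count of matching evidence words where a real system would likely use weights, and the thesaurus, stop list, `threshold=2`, and the name `code_block` are all hypothetical. "Laser 500" is simplified to the single token "laser".

```python
import re

# Hypothetical stop list and thesaurus (concept -> evidence words).
STOP_WORDS = {"the", "and", "a", "of", "to", "in", "we", "it", "is"}
THESAURUS = {
    "printer": {"printer", "laser", "toner", "machine", "printing"},
    "paper": {"paper", "pages", "crumpled", "jammed"},
}

def tokenize(block):
    """Lowercase the block, split into words, and drop stop words."""
    words = re.findall(r"[a-z0-9]+", block.lower())
    return [w for w in words if w not in STOP_WORDS]

def code_block(block, thesaurus=THESAURUS, threshold=2):
    """Return the concepts with at least `threshold` distinct evidence words in the block."""
    tokens = set(tokenize(block))
    return {
        concept
        for concept, evidence in thesaurus.items()
        if len(tokens & evidence) >= threshold
    }

# The seed word "printer" never appears here, yet the concept is still coded.
block_a = (
    "Once there was a leak and all the toner spilled out of the machine, "
    "but a technician came out and fixed the problem for us. "
    "We still have to top the toner up often."
)
block_b = (
    "Sometimes paper gets jammed in the Laser 500. "
    "Then we have to open it up to remove the crumpled paper."
)

print(code_block(block_a))
print(code_block(block_b))
```

In the second block only one printer evidence word ("laser") is present, so with a threshold of 2 the block is coded for paper but not for printer.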

"We use the laser 500 printer here at the office. We are pretty happy with it. Once there was a leak and all the toner spilled out of the machine, but a technician came out and fixed the problem for us. We still have to top the toner up often. The printer goes through ink quickly and the cartridges are expensive, but we put up with this because it delivers good results reliably. We are pleased with the quality of rinting we get. The laser 500 can batch process, and collate the pages to save us time. Sometimes paper gets jammed in the laser 500. Then we have to open it up to remove the crumpled pages. We have tried other machines in the past, but have not found an alternative that works better for us. ” Leximancer divides into two sentence units (configurable) Concept Extraction Units of Resolution

"We use the Laser 500 printer here at the office. We are pretty happy with it. Once there was a leak and all the toner spilled out of the machine, but a technician came out and fixed the problem for us. We still have to top the toner up often. The printer goes through ink quickly and the cartridges are expensive, but we put up with this because it delivers good results reliably. We are pleased with the quality of rinting we get. The Laser 500 can batch process, and collate the pages to save us time. Sometimes paper gets jammed in the Laser 500. Then we have to open it up to remove the crumpled paper. We have tried other machines in the past, but have not found an alternative that works better for us.” printer concept: paper concept: laser 500, toner, machine, rinting pages, crumpled, jammed printer Laser 500 toner machine machines rinting paper jammed crumpled paper pages Concept Extraction Units of Resolution

Semantic and Relational Analysis
- Semantic meaning is created through conceptual analysis
  - Presence and frequency of words and phrases
  - Co-occurrence of words makes a concept
  - Explicit and implicit concepts are identified (tsunami and earthquake imply Japan)
- Relationships are created through concept co-occurrence

Themes and Concept Map
- Themes
  - A collection of related concepts in close proximity on the map
  - The theme name is its most prominent concept
- Concept map display
  - Size of a dot indicates frequency of occurrence
  - Lines between concepts show relationships
  - Map proximity is determined by shared "friends" links (as on LinkedIn)
- The concept map becomes an interface to explore the underlying text
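The theme-naming rule can be sketched in a few lines. The concept counts and theme groupings below are hypothetical inputs; the slides do not say how concepts are clustered into themes, so only the naming step is illustrated, with "most prominent" read as "most frequent".

```python
# Hypothetical concept frequencies (number of blocks coded with each concept).
concept_counts = {"printer": 5, "toner": 2, "paper": 2, "pages": 1}

# Hypothetical themes: groups of concepts that sit close together on the map.
themes = [{"printer", "toner"}, {"paper", "pages"}]

def name_theme(members, counts):
    """Name a theme after its most frequent concept (ties broken alphabetically)."""
    return max(sorted(members), key=lambda concept: counts[concept])

theme_names = [name_theme(members, concept_counts) for members in themes]
print(theme_names)
```

The same counts could drive dot size on the map, since larger dots indicate more frequent concepts.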

Concept and Theme Creation
(diagram)
Evidence words (thesaurus) for the concept "printer": Laser 500, machine, toner, printing
Evidence words (thesaurus) for the concept "paper": pages, crumpled, jammed
Relationship: 2 co-occurrences of printer and paper
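Counting concept co-occurrences can be sketched as below. The per-block codings are hypothetical, chosen so that printer and paper share two blocks as in the diagram; the function name `cooccurrence_counts` is also an assumption, and this is not presented as Leximancer's own method.

```python
from collections import Counter
from itertools import combinations

# Hypothetical result of coding five two-sentence blocks.
coded_blocks = [
    {"printer"},
    {"printer"},
    {"printer"},
    {"printer", "paper"},
    {"printer", "paper"},
]

def cooccurrence_counts(coded_blocks):
    """Count how many blocks each pair of concepts shares."""
    counts = Counter()
    for concepts in coded_blocks:
        # Sorting gives each pair one canonical order, e.g. ("paper", "printer").
        for pair in combinations(sorted(concepts), 2):
            counts[pair] += 1
    return counts

pairs = cooccurrence_counts(coded_blocks)
print(pairs[("paper", "printer")])
```

Pair counts like this are the kind of relational information a concept map can draw as lines between concepts.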

Additional Features
- Thesaurus (coding dictionary) automatically generated
  - No manual coding required
  - Profiling and directed coding supported
- Analyst can seed their own terms
- Sentiment lens feature for affective analysis
- Discourse analysis of speakers supported
- Survey data analysis supported

Key Points Summary
- Automated, statistical approach
  - How would you do this manually?
  - No data management, dictionary creation, or dictionary updates
- User does not have to formulate a coding scheme
  - This saves time, and
  - Avoids introducing researcher bias (grounded theory)
- Nuances, subtleties, and distinctions in expression
  - The word association approach is the most likely to identify these
- Evidence words with links from Leximancer allow deeper exploration and documentation of findings

Questions?