Interfaces for Intense Information Analysis

Interfaces for Intense Information Analysis Marti Hearst UC Berkeley This research funded by ARDA

Outline
- A contrast: Search vs. Analysis
- Goals for three user groups
  - Intelligence Analysts
  - Biomedical Researchers
  - Investigative Reporters
- Our current interface design

Search vs. Analysis
- Search: Finding hay in a haystack
- Analysis: Creating new hay

UIs for Search vs. Analysis
- Search:
  - A necessary but undesirable step in a larger task
  - UI should not draw attention to itself
  - UI should be very easy to use for everyone
- Analysis:
  - The larger task
  - UI can be more of a “science project”
  - But UI should have “flow”

General Goals
- Support hypothesis formation / refutation
- Flow
- Easy creation, destruction, and cataloging of connections and coverage
- Easy movement between multiple views
- Represent:
  - Multiple supporting clues
  - Conflicting evidence
  - Uncertainty
  - Timeliness
  - Non-monotonicity

Intelligence Analysts

Intelligence Analysts
- I have recently interviewed several active counter-terrorist analysts
- Great diversity in
  - Goals
  - Computing environments
- Biggest problems are social/systemic
- Many mundane IT problems as well

Mundane IT Problems
- System incompatibilities
- Data reformatting
- Data cleaning
- Documenting sources
- Archiving materials

Intelligence Analysts: Problem 1
- Look at a series of reports, images, communication patterns
- Try to build a model of what is going on
  - Follow leads
  - Compare to previous situations
- Recent problem: Groups are changing their behavior patterns quickly
- Very little use of sophisticated software tools

Intelligence Analysts: Problem 2
- Given a large collection
  - “Roll around” in the data
  - See what has been “touched”
    - Tools should indicate which parts of the collection have been examined and which have yet to be looked at, and by whom
  - View data in several different ways
- Data reduction methods such as MDS, SVD, and clustering often hide important trends.
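
The "touched" requirement above could be sketched roughly as follows. This is only an illustration, not the interface described in the talk: the class name `CoverageTracker`, its methods, and the document and analyst names are all hypothetical.

```python
from collections import defaultdict


class CoverageTracker:
    """Records which documents in a collection each analyst has examined."""

    def __init__(self, doc_ids):
        self.doc_ids = set(doc_ids)
        # Hypothetical structure: document id -> set of analyst names.
        self.viewed_by = defaultdict(set)

    def mark_viewed(self, doc_id, analyst):
        if doc_id not in self.doc_ids:
            raise KeyError(f"unknown document: {doc_id}")
        self.viewed_by[doc_id].add(analyst)

    def untouched(self):
        """Documents nobody has looked at yet."""
        return sorted(d for d in self.doc_ids if not self.viewed_by.get(d))

    def touched(self):
        """Documents that have been examined, and by whom."""
        return {d: sorted(a) for d, a in self.viewed_by.items() if a}


tracker = CoverageTracker(["r1", "r2", "r3"])
tracker.mark_viewed("r1", "alice")
tracker.mark_viewed("r1", "bob")
tracker.mark_viewed("r3", "alice")
print(tracker.untouched())  # ['r2']
print(tracker.touched())    # {'r1': ['alice', 'bob'], 'r3': ['alice']}
```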

Intelligence Analysts: Problem 2
- Don’t show the obvious
  - e.g., Cheney is president
- Don’t show what you’ve already shown
  - Only show the most recent version
- Show which info is not present
  - Changes in the usual pattern
  - Something stops happening
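
One possible reading of "something stops happening" is a pattern that used to recur at a fairly regular interval and has now been silent for much longer than usual. The sketch below assumes that reading; the function name, the thresholds `min_count` and `quiet_factor`, and the event data are invented for illustration.

```python
def stopped_patterns(events, now, min_count=3, quiet_factor=3.0):
    """Flag patterns whose current silence is much longer than their usual gap.

    events maps a pattern name to a sorted list of numeric timestamps.
    """
    flagged = []
    for name, times in events.items():
        if len(times) < min_count:
            continue  # too few observations to call anything "usual"
        gaps = [later - earlier for earlier, later in zip(times, times[1:])]
        typical_gap = sum(gaps) / len(gaps)
        silence = now - times[-1]
        if typical_gap > 0 and silence > quiet_factor * typical_gap:
            flagged.append(name)
    return sorted(flagged)


# Toy data: timestamps are day numbers.
events = {
    "weekly_call": [1, 8, 15, 22],       # regular, then silent since day 22
    "daily_post": [56, 57, 58, 59, 60],  # still active
    "one_off": [30],                     # never a pattern
}
print(stopped_patterns(events, now=60))  # ['weekly_call']
```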

Intelligence Analysts: Problem 3
- Prepare a very short executive summary for the purposes of policy making
  - Really the culmination of a cascade of summaries
  - Reps from different agencies meet and “pow-wow” to form a view of the situation
- Rarely, but crucially, must be able to refer back to original sources and reasoning process for purposes of accountability

BioInformatics Researchers

BioInformatics Example 1
- How to discover new information …
- … as opposed to discovering which statistical patterns characterize occurrence of known information.
- Method: Use large text collections to gather evidence to support (or refute) hypotheses
  - Make connections
  - Gather evidence

Etiology Example
- Don Swanson example, 1991
- Goal: find cause of disease
  - Magnesium-migraine connection
- Given
  - medical titles and abstracts
  - a problem (incurable rare disease)
  - some medical expertise
- find causal links among titles
  - symptoms
  - drugs
  - results

Gathering Evidence
[Figure: diagram relating migraine and magnesium through the intermediate terms stress, CCB, PA, and SCD]

Swanson’s Linking Approach
- Two of his hypotheses have received some experimental verification.
- His technique
  - Only partially automated
  - Required medical expertise
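
The linking idea, finding intermediate terms that co-occur with the disease in some titles and with the candidate cause in others, might be sketched as below. This is a simplified illustration and not Swanson's actual procedure, which the slide notes was only partially automated and needed medical expertise. The function name, the toy titles, and the hand-picked vocabulary are all assumptions.

```python
def linking_terms(titles, a_term, c_term, vocabulary):
    """Return vocabulary terms that co-occur with a_term in at least one
    title and with c_term in at least one title."""
    a_links, c_links = set(), set()
    for title in titles:
        words = set(title.lower().split())
        present = words & vocabulary
        if a_term in words:
            a_links |= present
        if c_term in words:
            c_links |= present
    return sorted((a_links & c_links) - {a_term, c_term})


# Toy titles invented for illustration; they are not real article titles.
titles = [
    "stress triggers migraine attacks",
    "stress depletes magnesium",
    "ccb therapy prevents migraine",
    "magnesium acts as a natural ccb",
    "aspirin relieves migraine",
]
vocabulary = {"stress", "ccb", "aspirin", "pa", "scd"}
print(linking_terms(titles, "migraine", "magnesium", vocabulary))
# ['ccb', 'stress']
```

A term such as "aspirin" is dropped because it links to only one side; deciding whether a surviving link is medically meaningful is the part that still needs an expert.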

BioInformatics Example 2: How to find functions of genes?
- Have the genetic sequence
- Don’t know what it does
- But …
  - Know which genes it co-expresses with
  - Some of these have known function
- So … infer function based on function of co-expressed genes
- This problem was suggested by Michael Walker and others at Incyte Pharmaceuticals
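
As a rough sketch of "infer function based on function of co-expressed genes", one could let each co-expressed gene with a known function vote for that function. The gene names and function labels below are placeholders, not real annotations, and simple voting is an assumption made for illustration rather than the method used in the project.

```python
from collections import Counter


def infer_function(unknown, coexpressed, known_functions):
    """Rank candidate functions for `unknown` by how many of its
    co-expressed genes are annotated with each function."""
    votes = Counter()
    for gene in coexpressed.get(unknown, []):
        for function in known_functions.get(gene, []):
            votes[function] += 1
    return votes.most_common()


# Placeholder data: gene names and function labels are hypothetical.
coexpressed = {"mystery": ["geneA", "geneB", "geneC"]}
known_functions = {
    "geneA": ["F1"],
    "geneB": ["F1", "F2"],
    # geneC has no known function, so it casts no vote
}
print(infer_function("mystery", coexpressed, known_functions))
# [('F1', 2), ('F2', 1)]
```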

Gene Co-expression: Role in the genetic pathway
[Figure: candidate pathway diagrams positioning the unknown genes g? and h? relative to Kall., PSA, and PAP]
- Other possibilities as well

Make use of the literature
- Look up what is known about the other genes.
- Different articles in different collections
- Look for commonalities
  - Similar topics indicated by Subject Descriptors
  - Similar words in titles and abstracts
- adenocarcinoma, neoplasm, prostate, prostatic neoplasms, tumor markers, antibodies ...
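
Looking for commonalities could be approximated by counting how many of the known genes have at least one article carrying a given subject descriptor. The sketch below assumes descriptors are already available as lists of strings per article; the gene names and the assignment of descriptors to genes are invented, although the descriptor strings themselves come from the slide.

```python
from collections import Counter


def common_descriptors(articles_by_gene, min_genes=2):
    """Return descriptors that appear in articles for at least
    `min_genes` different genes."""
    counts = Counter()
    for articles in articles_by_gene.values():
        seen = set()
        for descriptors in articles:
            seen.update(d.lower() for d in descriptors)
        counts.update(seen)  # each gene counts a descriptor at most once
    return sorted(term for term, n in counts.items() if n >= min_genes)


# Hypothetical mapping: gene -> list of articles, each a list of descriptors.
articles_by_gene = {
    "geneA": [["prostate", "tumor markers"], ["adenocarcinoma"]],
    "geneB": [["Prostate", "antibodies"]],
    "geneC": [["tumor markers", "prostate"]],
}
print(common_descriptors(articles_by_gene))
# ['prostate', 'tumor markers']
```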

Formulate a Hypothesis
- Hypothesis: mystery gene has to do with regulation of expression of genes leading to prostate cancer
- New tack: do some lab tests
  - See if mystery gene is similar in molecular structure to the others
  - If so, it might do some of the same things they do

Investigative Reporter Example
- Looking for trends in online literature
- Create, support, refute hypotheses

Investigative Reporter Example
- Clustering
- Corpus-level statistics, co-occurrence statistics
- Contrasting collection statistics
- What are the current main topics?
- What are the new popular terms?
- How do they track with the news?
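
A question like "what are the new popular terms?" can be approached by contrasting word statistics between an older and a newer collection. The following is a minimal sketch that assumes whitespace tokenization and a smoothed relative-frequency ratio; the function name and the toy documents are made up for illustration.

```python
from collections import Counter


def rising_terms(old_docs, new_docs, top_n=3, smoothing=1.0):
    """Rank words by how much more frequent they are in new_docs
    than in old_docs, using add-one style smoothing."""
    old = Counter(w for doc in old_docs for w in doc.lower().split())
    new = Counter(w for doc in new_docs for w in doc.lower().split())
    vocab = set(old) | set(new)
    old_total = sum(old.values()) + smoothing * len(vocab)
    new_total = sum(new.values()) + smoothing * len(vocab)

    def ratio(word):
        new_rate = (new[word] + smoothing) / new_total
        old_rate = (old[word] + smoothing) / old_total
        return new_rate / old_rate

    # Highest ratio first; ties broken alphabetically for stable output.
    ranked = sorted(vocab, key=lambda w: (-ratio(w), w))
    return ranked[:top_n]


old_docs = ["the captain and the crew", "the ship and the crew"]
new_docs = ["the hologram and the crew", "the hologram and the hologram"]
print(rising_terms(old_docs, new_docs, top_n=1))  # ['hologram']
```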

Investigative Reporter Example
- Named-entity recognition
- Creating a list of terms
- Apply the list to a subcollection
- Create regex rules with POS information
- How long after a new Star Trek series comes on the air before characters from the series appear in stories?
- How often do Klingons initiate attacks against Vulcans, vs. the converse?
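
The Klingon/Vulcan question could be approximated with a regex rule over sentences, as sketched below. This version uses plain word order only and no part-of-speech information, so it is an assumption-laden simplification of "regex rules with POS information"; the function name and the example sentences are invented.

```python
import re


def count_attacks(sentences, group_a, group_b):
    """Count sentences where group_a is mentioned, then a form of
    'attack', then group_b, in that order."""
    pattern = re.compile(
        rf"\b{re.escape(group_a)}s?\b[^.]*?"
        rf"\battack(?:s|ed|ing)?\b[^.]*?"
        rf"\b{re.escape(group_b)}s?\b",
        re.IGNORECASE,
    )
    return sum(1 for sentence in sentences if pattern.search(sentence))


# Toy sentences written for this example.
sentences = [
    "The Klingons attacked the Vulcan outpost.",
    "A Vulcan ship attacked two Klingons.",
    "Klingons and Vulcans signed a treaty.",
    "Klingon raiders attack Vulcans again.",
]
print(count_attacks(sentences, "Klingon", "Vulcan"))  # 2
print(count_attacks(sentences, "Vulcan", "Klingon"))  # 1
```

A word-order rule like this would miscount passives such as "Vulcans were attacked by Klingons", which is one reason part-of-speech or syntactic information would be useful in the rules.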

LINDI Summary
[Screenshot: LINDI interface with File and Help menus and panels for Term Set, Query, Analysis, and Document Set; example entries include "Diseases: emphysema cancer hypertension …" and "WHO: organization = world health organization"]

Thank you! For more information: bailando.sims.berkeley.edu/lindi.html