Interfaces for Intense Information Analysis


Presentation on theme: "Interfaces for Intense Information Analysis"— Presentation transcript:

1 Interfaces for Intense Information Analysis
Marti Hearst, UC Berkeley. This research funded by ARDA.

2 Outline
A contrast: Search vs. Analysis. Goals for three user groups: Intelligence Analysts, Biomedical Researchers, Investigative Reporters. Our current interface design.

3 Search vs. Analysis
Search: finding hay in a haystack. Analysis: creating new hay.

4 UIs for Search vs. Analysis
Search: a necessary but undesirable step in a larger task; UI should not draw attention to itself; UI should be very easy to use for everyone. Analysis: the larger task; UI can be more of a “science project”, but UI should have “flow”.

5 General Goals
Support hypothesis formation / refutation. Flow: easy creation, destruction, and cataloging of connections and coverage; easy movement between multiple views. Represent: multiple supporting clues, conflicting evidence, uncertainty, timeliness, non-monotonicity.
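One way to picture the "Represent" list is as a small data structure. The following is only a sketch under assumptions: the class names, the `confidence` field, and the signed-sum `balance` score are all hypothetical and are not taken from the talk. It shows supporting and conflicting evidence, an uncertainty value, a date for timeliness, and retraction for non-monotonic revision.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Evidence:
    source: str            # where the clue came from
    supports: bool         # True = supporting, False = conflicting
    confidence: float      # uncertainty, 0.0 to 1.0 (hypothetical scale)
    observed: date         # timeliness
    retracted: bool = False  # non-monotonicity: evidence can be withdrawn


@dataclass
class Hypothesis:
    statement: str
    evidence: list = field(default_factory=list)

    def add(self, item: Evidence) -> None:
        self.evidence.append(item)

    def retract(self, source: str) -> None:
        # Mark every clue from this source as withdrawn, but keep it on record.
        for item in self.evidence:
            if item.source == source:
                item.retracted = True

    def balance(self) -> float:
        # Illustrative score only: supporting confidence minus conflicting
        # confidence, ignoring retracted evidence.
        return sum(
            item.confidence if item.supports else -item.confidence
            for item in self.evidence
            if not item.retracted
        )
```

Retracted items stay in the list rather than being deleted, so the reasoning history remains available for later review.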

6 Intelligence Analysts

7 Intelligence Analysts
I have recently interviewed several active counter-terrorist analysts. Great diversity in goals and computing environments. Biggest problems are social/systemic. Many mundane IT problems as well.

8 Mundane IT Problems
System incompatibilities; data reformatting; data cleaning; documenting sources; archiving materials.

9 Intelligence Analysts: Problem 1
Look at a series of reports, images, communication patterns. Try to build a model of what is going on: follow leads; compare to previous situations. Recent problem: groups are changing their behavior patterns quickly. Very little use of sophisticated software tools.

10 Intelligence Analysts: Problem 2
Given a large collection: “roll around” in the data; see what has been “touched”. Tools should indicate which parts of the collection have been examined, which have yet to be looked at, and by whom. View data in several different ways. Data reduction methods such as MDS, SVD, and clustering often hide important trends.
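The "touched" requirement can be sketched as a simple coverage record. This is an illustrative sketch, not the interface described in the talk; the `CoverageTracker` class and its method names are hypothetical.

```python
from collections import defaultdict


class CoverageTracker:
    """Records which items in a collection have been examined, and by whom."""

    def __init__(self, item_ids):
        self.item_ids = set(item_ids)
        self.viewers = defaultdict(set)  # item id -> set of analyst names

    def touch(self, item_id, analyst):
        if item_id not in self.item_ids:
            raise KeyError(f"unknown item: {item_id}")
        self.viewers[item_id].add(analyst)

    def examined(self):
        # Items that someone has looked at, with the analysts who did.
        return {item: sorted(names) for item, names in self.viewers.items()}

    def untouched(self):
        # Items nobody has looked at yet.
        return sorted(self.item_ids - set(self.viewers))
```

For example, after `touch("r1", "alice")` on a tracker built from `["r1", "r2"]`, `untouched()` reports only `"r2"`.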

11 Intelligence Analysts: Problem 2
Don’t show the obvious (e.g., Cheney is president). Don’t show what you’ve already shown. Only show the most recent version. Show which info is not present: changes in the usual pattern; something stops happening.
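"Something stops happening" can be approximated by comparing the current silence from a source against its usual interval between events. The sketch below is one possible heuristic, not a method from the talk; the function name, the median-gap baseline, and the `factor` threshold are assumptions.

```python
from statistics import median


def stopped_sources(events, now, factor=3.0):
    """Flag sources whose current silence is much longer than their usual gap.

    events: dict mapping a source name to a list of numeric timestamps.
    now:    the current time, in the same units.
    factor: hypothetical threshold; silence > factor * median gap is flagged.
    """
    flagged = []
    for source, times in events.items():
        times = sorted(times)
        if len(times) < 3:
            continue  # too little history to define a usual pattern
        gaps = [later - earlier for earlier, later in zip(times, times[1:])]
        usual = median(gaps)
        silence = now - times[-1]
        if usual > 0 and silence > factor * usual:
            flagged.append((source, silence, usual))
    return sorted(flagged)
```

A source that reported at times 1, 2, 3, 4 and then nothing until time 20 would be flagged, while one that reported at time 20 would not.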

12 Intelligence Analysts: Problem 3
Prepare a very short executive summary for the purposes of policy making. Really the culmination of a cascade of summaries. Reps from different agencies meet and “pow-wow” to form a view of the situation. Rarely, but crucially, must be able to refer back to original sources and reasoning process for purposes of accountability.

13 BioInformatics Researchers

14 BioInformatics Example 1
How to discover new information, as opposed to discovering which statistical patterns characterize occurrence of known information. Method: use large text collections to gather evidence to support (or refute) hypotheses. Make connections. Gather evidence.

15 Etiology Example
Don Swanson example, 1991. Goal: find cause of disease (magnesium-migraine connection). Given: medical titles and abstracts; a problem (incurable rare disease); some medical expertise. Find causal links among titles: symptoms, drugs, results.

16 Gathering Evidence
[Diagram: migraine, magnesium, and the linking terms stress, CCB, PA, and SCD]

17 Gathering Evidence
[Diagram: stress, CCB, PA, and SCD placed between migraine and magnesium]

18 Swanson’s Linking Approach
Two of his hypotheses have received some experimental verification. His technique: only partially automated; required medical expertise.
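The linking idea can be illustrated as a search for intermediate terms that co-occur with both endpoints in titles, while the endpoints themselves never co-occur. This is a toy sketch, not Swanson's actual procedure (which the slide notes was only partially automated and required medical expertise); the function name, the whitespace tokenization, and the titles are invented for illustration.

```python
from collections import defaultdict


def linking_terms(titles, a_term, c_term, vocabulary):
    """Return (directly_linked, bridge_terms) for two terms over a title list."""
    cooccurs = defaultdict(set)  # term -> terms seen in the same title
    for title in titles:
        present = set(title.lower().split()) & vocabulary
        for term in present:
            cooccurs[term] |= present - {term}
    directly_linked = c_term in cooccurs[a_term]
    bridges = sorted((cooccurs[a_term] & cooccurs[c_term]) - {a_term, c_term})
    return directly_linked, bridges


# Made-up titles; they are placeholders, not real article titles.
toy_titles = [
    "stress triggers migraine",
    "stress depletes magnesium",
    "ccb prevents migraine",
    "magnesium acts as ccb",
    "aspirin and headache",
]
toy_vocabulary = {"stress", "migraine", "magnesium", "ccb", "aspirin", "headache"}
```

With these toy titles, `linking_terms(toy_titles, "migraine", "magnesium", toy_vocabulary)` finds no direct link and proposes `ccb` and `stress` as bridges. Judging whether a bridge is medically meaningful is the part that still needs an expert.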

19 BioInformatics Example 2:
How to find functions of genes? Have the genetic sequence. Don’t know what it does. But know which genes it coexpresses with, and some of these have known function. So infer function based on function of co-expressed genes. This problem was suggested by Michael Walker and others at Incyte Pharmaceuticals.
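The inference step can be sketched as a vote over the known functions of co-expressed genes. This is a minimal illustration under assumptions, not the method used at Incyte; the function name, the gene identifiers, and the function labels in the usage note are hypothetical.

```python
from collections import Counter


def infer_function(mystery, coexpressed, known_function):
    """Rank candidate functions for a gene by votes from co-expressed genes.

    coexpressed:    dict mapping a gene to the genes it co-expresses with.
    known_function: dict mapping a gene to its known function label.
    Genes without a known function cast no vote.
    """
    votes = Counter(
        known_function[gene]
        for gene in coexpressed.get(mystery, ())
        if gene in known_function
    )
    return votes.most_common()
```

For instance, if a mystery gene co-expresses with four genes, two labeled `f_x`, one labeled `f_y`, and one unlabeled, the ranking is `f_x` with two votes and then `f_y` with one. The result is a hypothesis to test, not a conclusion.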

20 Gene Co-expression: Role in the genetic pathway
[Diagram: candidate placements of the unknown genes (g?, h?) in a pathway with Kall., PSA, and PAP] Other possibilities as well.

21 Make use of the literature
Look up what is known about the other genes. Different articles in different collections. Look for commonalities: similar topics indicated by Subject Descriptors; similar words in titles and abstracts (adenocarcinoma, neoplasm, prostate, prostatic neoplasms, tumor markers, antibodies ...).
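Looking for commonalities can be sketched as counting how many genes' article sets share each subject descriptor. The sketch below is illustrative only: the function name and the `min_genes` threshold are assumptions, and the assignment of descriptors to genes in the toy data is invented, using terms from the slide.

```python
from collections import Counter


def shared_descriptors(descriptors_by_gene, min_genes=2):
    """Return descriptors attached to articles about at least min_genes genes."""
    counts = Counter()
    for descriptors in descriptors_by_gene.values():
        counts.update(set(descriptors))  # count each gene once per descriptor
    shared = [d for d, n in counts.items() if n >= min_genes]
    # Most widely shared first, ties broken alphabetically.
    return sorted(shared, key=lambda d: (-counts[d], d))


# Hypothetical descriptor sets; which gene has which descriptor is made up.
toy_descriptors = {
    "Kall.": {"prostate", "tumor markers"},
    "PSA": {"prostate", "prostatic neoplasms", "tumor markers"},
    "PAP": {"prostate", "antibodies"},
}
```

On the toy data, `shared_descriptors(toy_descriptors)` returns `prostate` first, then `tumor markers`.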

22

23 Formulate a Hypothesis
Hypothesis: mystery gene has to do with regulation of expression of genes leading to prostate cancer. New tack: do some lab tests. See if mystery gene is similar in molecular structure to the others. If so, it might do some of the same things they do.

24 Investigative Reporter Example
Looking for trends in online literature. Create, support, refute hypotheses.

25 Investigative Reporter Example
Clustering. Corpus-level statistics, co-occurrence statistics. Contrasting collection statistics. What are the current main topics? What are the new popular terms? How do they track with the news?
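"What are the new popular terms?" can be approached by contrasting term frequencies in two collections. The following is a sketch under assumptions, not the analysis used in the talk: the function name, the add-one style smoothing, and the whitespace tokenization are choices made for illustration.

```python
from collections import Counter


def rising_terms(old_docs, new_docs, top_n=5, smoothing=1.0):
    """Rank terms by smoothed relative frequency in new_docs versus old_docs."""
    old = Counter(word for doc in old_docs for word in doc.lower().split())
    new = Counter(word for doc in new_docs for word in doc.lower().split())
    vocab = set(old) | set(new)
    old_total = sum(old.values()) + smoothing * len(vocab)
    new_total = sum(new.values()) + smoothing * len(vocab)

    def ratio(word):
        # Smoothing keeps terms absent from the old collection finite.
        return ((new[word] + smoothing) / new_total) / (
            (old[word] + smoothing) / old_total
        )

    return sorted(vocab, key=lambda word: (-ratio(word), word))[:top_n]
```

Terms common to both collections score near 1 and drop down the ranking, so the top of the list is dominated by terms that are frequent only in the newer collection.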

26 Investigative Reporter Example
Named-entity recognition. Create a list of terms. Apply the list to a subcollection. Create regex rules with POS information. How long after a new Star Trek series comes on the air before characters from the series appear in stories? How often do Klingons initiate attacks against Vulcans, vs. the converse?
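Applying a term list to a dated subcollection can be sketched with a compiled regular expression. This is a simplified illustration: it uses plain word-boundary matching with no POS information, and the function names, character names, and dates in the usage note are hypothetical.

```python
import re
from datetime import date


def compile_term_list(terms):
    """Build one case-insensitive pattern that matches any term as a whole word."""
    # Longest terms first so multi-word names win over their prefixes.
    ordered = sorted(terms, key=len, reverse=True)
    alternatives = "|".join(re.escape(term) for term in ordered)
    return re.compile(r"\b(?:" + alternatives + r")\b", re.IGNORECASE)


def first_mentions(stories, pattern):
    """Map each matched term (lowercased) to the earliest story date it appears.

    stories: iterable of (date, text) pairs.
    """
    first = {}
    for when, text in sorted(stories):
        for match in pattern.finditer(text):
            first.setdefault(match.group(0).lower(), when)
    return first
```

Given made-up stories dated 2001-01-10 and 2001-02-01 that both mention a character named "Zed", `first_mentions` maps `"zed"` to 2001-01-10. Subtracting a series premiere date from that value gives the lag the slide asks about.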

27 LINDI Summary
[Screenshot of the LINDI interface. Menu bar: File, Help. Panels: Term Set, Query, Analysis, Document Set. Term Set panel: New, Merge; All terms: *; Diseases: emphysema, cancer, hypertension, …; WHO: organization = world health organization. Document Set panel: All documents: *]

28 Thank you! For more information: bailando.sims.berkeley.edu/lindi.html

