UCB HCC Retreat Search Text Mining Web Site Usability Marti Hearst SIMS.

Search Interfaces: Past Projects
TileBars, Scatter/Gather, DynaCat, Cat-a-Cone

BAILANDO Projects
Better Access to Information using Language Analysis and Novel Dynamic Organizations

Current BAILANDO Projects
- CHA-CHA: Web search results in context
- LINDI: UI support for search and text data mining
- TANGO: Automated web site usability

Search UIs
- Combine browsing and search
- Place search results in context
- Large category hierarchies

Cha-Cha
Students: Mike Chen, Jamie Laflen, Jason Hong, Jimmy Lin, Shiang Chen

Medical Category Hierarchy

DynaCat (Pratt, Hearst & Fagan, 1999)

DynaCat Study
Design:
- Three queries
- 24 cancer patients
- Compared three interfaces: ranked list, clusters, categories
Results:
- Participants strongly preferred categories
- Participants found more answers using categories
- Participants took the same amount of time with all three interfaces
- Similar results were verified in another study by Chen and Dumais (CHI 2000)

Cat-a-Cone Interface (Hearst & Karadi, 1997)

Improving Search via Large Category Hierarchies
- How to show intersections across category types?
- How to preview related categories in a user-tailored, dynamic manner?

Information Retrieval vs. Text Data Mining

Information Retrieval
Selection or rejection of existing documents based on a function of word match.
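The "function of word match" can be illustrated with a minimal sketch. It assumes the simplest possible function (a count of query-term occurrences in each document) and a hypothetical `threshold` for selecting versus rejecting; practical IR systems use weighted scores such as tf-idf, which this sketch does not attempt. Note that it only ever returns documents that already exist, which is the contrast the next slide draws with text data mining.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def word_match_score(query, doc):
    """Count occurrences of the distinct query terms in the document."""
    doc_counts = Counter(tokenize(doc))
    return sum(doc_counts[term] for term in set(tokenize(query)))


def select_documents(query, docs, threshold=1):
    """Select documents scoring at least `threshold`; reject the rest.

    Returns the indices of the selected documents, best match first.
    """
    scored = [(word_match_score(query, doc), i) for i, doc in enumerate(docs)]
    selected = [(score, i) for score, i in scored if score >= threshold]
    selected.sort(key=lambda pair: (-pair[0], pair[1]))
    return [i for _, i in selected]
```

For example, with the documents "Spinal inflammation in adults", "Numbness in fingers and spinal pain" and "Weather report", the query "spinal numbness" selects the second document first, then the first, and rejects the third.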

Text Data Mining
Relationships between pieces of information in documents can reveal new facts that were not previously known.

Imagine
You are a medical researcher. Your patient has:
- spinal inflammation
- numbness in fingers
- low TC levels
- negative results for all tests
How can you help her?

Idea
A new way of searching text: link pieces of information together to formulate hypotheses …

LINDI: Linking Information for New DIscoveries
Students: Barbara Rosario, David Blei
Three main parts:
- Search UI for building and reusing hypothesis-seeking strategies
- Statistical language analysis techniques for interpreting the text
- Backend for interfacing with various databases and translating between different formats

Gathering Evidence
Spinal inflammation, numbness in fingers, low TC levels
- Find diseases associated with each
- Find unanticipated commonalities

Supporting Cascaded Search Operations
Spinal inflammation, numbness in fingers, low TC levels
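The evidence-gathering steps (one search per finding, then look for commonalities, then feed results into further searches) can be sketched as set operations. This is a minimal sketch assuming a hypothetical in-memory index from a term to associated items; the condition names are placeholders, not medical data, and in a real system the associations would presumably come from text searches rather than a hand-built dictionary.

```python
from collections import Counter

# Hypothetical toy index (placeholder names, not medical data):
# finding -> conditions that co-occur with it in some document collection.
ASSOCIATIONS = {
    "spinal inflammation": {"condition_A", "condition_B"},
    "numbness in fingers": {"condition_B", "condition_C"},
    "low TC levels": {"condition_B", "condition_D"},
}


def search(term, index):
    """One search operation: return the items associated with a term."""
    return set(index.get(term, set()))


def commonalities(findings, index, min_support=None):
    """Run one search per finding, then keep the results shared by at
    least `min_support` of those searches (default: all of them)."""
    counts = Counter()
    for finding in findings:
        counts.update(search(finding, index))
    if min_support is None:
        min_support = len(findings)
    return sorted(item for item, n in counts.items() if n >= min_support)


def cascade(seeds, stage_indexes):
    """Cascaded search: the union of one stage's results becomes the
    set of query terms for the next stage."""
    current = set(seeds)
    for index in stage_indexes:
        current = set().union(*(search(item, index) for item in current))
    return sorted(current)
```

With the toy index, `commonalities(list(ASSOCIATIONS), ASSOCIATIONS)` returns `["condition_B"]`, the only item shared by all three findings. Lowering `min_support` is one way to surface partial overlaps that might still be unanticipated.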

New Language Analysis
- First use category labels to retrieve candidate documents
- Then use language analysis to detect causal relationships between concepts
  Title: "Magnesium deficiency implicated in increased stress levels."
  Interpretation: related-to
- Use these relationships to formulate hypotheses
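The title-to-interpretation example can be mimicked with a very small cue-phrase matcher. This is a swapped-in, rule-based stand-in for illustration only, not the statistical language analysis LINDI describes; the cue list and every relation label other than "related-to" are hypothetical.

```python
import re

# Hypothetical cue phrases -> coarse relation labels. Only "related-to"
# comes from the slide; the other entries are illustrative assumptions.
CUE_PHRASES = [
    ("implicated in", "related-to"),
    ("associated with", "related-to"),
    ("causes", "causes"),
]


def interpret_title(title):
    """Return (concept, relation, concept) for the first cue phrase found
    in the title, or None if no cue phrase occurs."""
    text = title.strip().rstrip(".").lower()
    for cue, relation in CUE_PHRASES:
        match = re.match(r"(.+?)\s+" + re.escape(cue) + r"\s+(.+)", text)
        if match:
            return match.group(1), relation, match.group(2)
    return None
```

Applied to the slide's example title, this yields `("magnesium deficiency", "related-to", "increased stress levels")`. A fixed cue list like this misses most phrasings, which is one motivation for the statistical approach on the next slide.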

Statistical Semantic Parsing
- Modern statistical techniques, so far mainly applied to syntactic structure
- Probabilistic knowledge representation: represent hypotheses with different degrees of certainty

Automating Assessment of Web Site Usability

Why Worry?
- Problem: IBM's extranet
  - Heavy use of help and search
  - Unhappy users
- Solution
  - Massive web site redesign
  - Focus on information organization, not the purchasing process
  - Cost: "in the millions"
- Results
  - Not announced or trumped up
  - Use of "help" decreased 84%
  - Sales increased 400%

Web TANGO: Tool for Assessing NaviGation & Organization
Student: Melody Ivory
Goal: automated support for comparing design alternatives
How:
- Assess the usability of the information architecture
- Approximate people's information-seeking behavior (Monte Carlo simulation)
- Output quantitative usability metrics

Anatomy of Web Site Design (courtesy of Mark Newman)
- Information architecture
- Navigation design
- Information design
- Graphic design

Usability Evaluation: Standard Techniques
- User studies
  - Have people use the interface to complete some tasks
  - Requires an implemented interface
  - "Discount" vs. scientific results
- Heuristic evaluation
  - An expert assesses a design or implementation according to certain guidelines

Automated Usability Evaluation
- Logging/capture
  - Pro: easy
  - Con: requires an implemented system
  - Con: doesn't know the user's task (web)
  - Con: doesn't present alternatives
  - Con: doesn't distinguish error from success
- Analytical modeling
  - Pro: doable at the design phase
  - Con: models an expert
  - Con: academic exercise
- Simulation

Existing Metrics
- Web metric analysis tools report on what is easy to measure, e.g.:
  - Predicted download time
  - Depth/breadth of site
- We want to worry about:
  - Content
  - User goals/tasks
  - (Not available from logs)
- We also want to compare alternative designs.
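Depth and breadth are examples of the easy-to-measure metrics above: they depend only on link structure, not on content or user goals. A minimal sketch, assuming the site is given as a hypothetical adjacency dictionary (page name to list of linked pages), computes both with a breadth-first search from the home page. Depth is the largest number of clicks needed to reach any reachable page; breadth is taken here as the largest number of pages at a single click level, which is one of several possible definitions.

```python
from collections import Counter, deque


def depth_and_breadth(site, home):
    """Breadth-first search from `home` over a page -> links mapping.

    depth:   largest number of clicks needed to reach any reachable page
    breadth: largest number of pages found at a single click level
    """
    clicks = {home: 0}
    queue = deque([home])
    while queue:
        page = queue.popleft()
        for linked in site.get(page, []):
            if linked not in clicks:
                clicks[linked] = clicks[page] + 1
                queue.append(linked)
    pages_per_level = Counter(clicks.values())
    return max(clicks.values()), max(pages_per_level.values())
```

For a hypothetical site where the home page links to three pages and one of those links to a fourth, the result is a depth of 2 and a breadth of 3.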

Monte Carlo Simulation
- Have a model of information structure
- Have a set of user goals
- Want to assess navigation structure
- Compare alternatives/tradeoffs
- Identify bottlenecks
- Identify critically important pages/links
- Check all pairs of start/end points
- Check overall reachability before and after a change
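Checking all start/end pairs and comparing overall reachability before and after a change does not need random sampling; a deterministic graph search is enough and makes a useful baseline for the simulation. The sketch assumes the same kind of hypothetical adjacency dictionary as above and lists every (start, end) pair where `end` cannot be reached from `start`.

```python
from collections import deque


def reachable_from(site, start):
    """All pages reachable from `start` by following links (including start)."""
    seen = {start}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for linked in site.get(page, []):
            if linked not in seen:
                seen.add(linked)
                queue.append(linked)
    return seen


def unreachable_pairs(site):
    """Every (start, end) pair of pages where `end` is unreachable from `start`."""
    pages = set(site) | {p for links in site.values() for p in links}
    pairs = []
    for start in sorted(pages):
        seen = reachable_from(site, start)
        pairs.extend((start, end) for end in sorted(pages) if end not in seen)
    return pairs
```

To compare a design change, run it on both versions, e.g. `before = {"home": ["rentals", "about"], "rentals": ["renter_support"], "about": ["home"], "renter_support": ["home"]}` and a hypothetical `after = dict(before, rentals=[])` that drops one link. The first yields no unreachable pairs; the second shows that Renter Support can no longer be reached from any other page.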

Monte Carlo Simulation
- At each step in the simulation, assume a probability distribution over a set of next choices.
- The next choice is a function of:
  - the current goal
  - the understandability of the choice
  - the overall complexity of the set of choices
  - prior interaction history
- These can use models of "scent"
- Varying the distribution corresponds to varying properties of the links
- Spot-check important choices
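A minimal sketch of the simulation loop, assuming a caller-supplied `scent(link, goal)` function that returns a positive weight for following a link toward a goal. Each step draws the next page with probability proportional to those weights, and many runs are averaged. Only the current goal enters the weights here; understandability, choice-set complexity, and interaction history are left out, and the function names are hypothetical rather than Web TANGO's.

```python
import random


def simulate_walk(site, start, goal, scent, rng, max_steps=50):
    """One simulated visit: return the number of clicks taken to reach
    `goal`, or None if the walk dead-ends or runs out of steps."""
    page, steps = start, 0
    while page != goal and steps < max_steps:
        links = site.get(page, [])
        if not links:
            return None  # dead end: no outgoing links
        # Probability of each next choice is proportional to its scent
        # (floored so that an all-zero scent still gives a valid distribution).
        weights = [max(scent(link, goal), 1e-9) for link in links]
        page = rng.choices(links, weights=weights, k=1)[0]
        steps += 1
    return steps if page == goal else None


def estimate(site, start, goal, scent, runs=1000, seed=0):
    """Monte Carlo estimate: (mean clicks over successful runs, success rate)."""
    rng = random.Random(seed)
    outcomes = [simulate_walk(site, start, goal, scent, rng) for _ in range(runs)]
    successes = [n for n in outcomes if n is not None]
    mean = sum(successes) / len(successes) if successes else float("inf")
    return mean, len(successes) / runs
```

Varying `scent` plays the role of varying link properties; running `estimate` from every page as `start` gives per-start averages of the kind reported on the results slide.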

[Figure: One Monte Carlo simulation step for Design 1, Task 1. The simulation starts from the home page, and the target information is at Renter Support.]

[Figure: Monte Carlo simulation results for Design 1, Task 1. Simulation runs start from all pages in the site. Average navigation times are shown for Tasks 2 & 3.]

Using Simulator Results
Design decisions:
- Use Design 1
- Improve Tasks 1 & 2
Next steps:
- Analyze results for Tasks 1 & 2
- Create a revised Design 1
- Repeat the simulation to compare old & new designs
- Iterate if necessary

Research Issues: Navigation Predictions
- Develop an IR model for predicting link selection
- Requirements:
  - Information need (task metadata)
  - Representation of pages (page metadata)
  - Method for selecting links (relevance ranking)
  - Maintaining the user's conceptual model during site traversal (scent [Fur97, LC98, Pir97])
- One possible approach: Information Foraging Theory [PC95, Pir97, PPR96]
  - Functional categorization of pages based on features
  - Prediction of relevance to current page
  - Consider link connectivity, text similarity & usage
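One simple reading of "method for selecting links (relevance ranking)" based on text similarity is to rank a page's links by the cosine similarity between the task description and each link's text. This sketch assumes raw term counts and a hypothetical mapping from link label to a short description of the target page; it is not the weighting used in the cited Information Foraging work, and it ignores link connectivity and usage.

```python
import math
import re
from collections import Counter


def term_counts(text):
    """Bag of lowercase alphabetic words."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def rank_links(task, links):
    """Rank links by similarity to the task description.

    `links` maps each link label to text describing its target page.
    Returns (label, score) pairs, most relevant first.
    """
    need = term_counts(task)
    scored = [(cosine(need, term_counts(label + " " + text)), label) for label, text in links.items()]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [(label, score) for score, label in scored]
```

Scores like these could serve as the `scent` weights in a navigation simulation, which is how a link-selection model and the Monte Carlo approach would fit together.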

Other HCC-Related Projects
- Using a large digital desk in design: Ame Elliott
- Using visualization for lighting design: Dan Glaser
- User interfaces and computer security: Prof. Doug Tygar, Rachna Dhamija