An Evaluation of Microarray Visualization Tools for Biological Insight Presented by Tugrul Ince and Nir Peer University of Maryland Purvi Saraiya Chris.


1 An Evaluation of Microarray Visualization Tools for Biological Insight
Presented by Tugrul Ince and Nir Peer, University of Maryland
Purvi Saraiya and Chris North, Dept. of Computer Science, Virginia Polytechnic Institute and State University
Karen Duca, Virginia Bioinformatics Institute, Virginia Polytechnic Institute and State University

2 Goals
- Evaluate five popular visualization tools
  - Cluster/TreeView
  - TimeSearcher
  - Hierarchical Clustering Explorer (HCE)
  - Spotfire
  - GeneSpring
- Do so in the context of bioinformatics data exploration

3 Goals
- Research Questions
  - How successful are these tools in stimulating insight?
  - How do various visualization techniques affect the users’ perception of data?
  - How does users’ background affect the tool usage?
  - How do these tools support hypothesis generation?
  - Can insight be measured in a controlled experiment?

4 Visualization Evaluations
- Typically evaluations consist of
  - controlled measurements of user performance and accuracy on predetermined tasks
- We are looking for an evaluation that better simulates a bioinformatics data analysis scenario
- We use a protocol that focuses on
  - recognition and quantification of insights gained from actual exploratory use of visualizations

5 Insights
- Hard to define what an “insight” is
- We need this term to be quantifiable and reproducible
- Solution
  - Encourage users to think aloud and report any findings they have about the dataset
  - Videotape a session to capture and characterize individual insights as they occur
    - Generally provides more information than subjective measures from post-experiment surveys

6 Insights
- Define insight as
  - an individual observation about the data by the participant
  - a unit of discovery
- Essentially, any data observation made during the think-aloud protocol
- Now we can quantify some characteristics of each insight

7 Insight Characteristics
- Observation
  - The actual finding about the data
- Time
  - The amount of time taken to reach the insight
- Domain Value
  - The significance of the insight. Coded by a domain expert.
- Hypotheses
  - Hypothesis and direction of research
- Directed vs. Unexpected
  - Recall: participants are asked to identify questions they want to explore
- Correctness
- Breadth vs. Depth

8 Insight Characteristics
- Category
  - Overview – overall distributions of gene expression
  - Patterns – identification or comparison across data attributes
  - Groups – identification or comparison of groups of genes
  - Details – focused information about specific genes
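The characteristics on the two slides above could be captured as one record per coded insight. The following is only a sketch: the field names, the integer type for domain value, and the helper `total_domain_value` are illustrative assumptions, not the coding scheme actually used in the study.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class Category(Enum):
    """The four insight categories named on the slide."""
    OVERVIEW = "overview"   # overall distributions of gene expression
    PATTERNS = "patterns"   # identification or comparison across data attributes
    GROUPS = "groups"       # identification or comparison of groups of genes
    DETAILS = "details"     # focused information about specific genes


@dataclass
class Insight:
    """One coded insight. Field names and types are assumptions for illustration."""
    observation: str                  # the actual finding about the data
    minutes_into_session: float       # time taken to reach the insight
    domain_value: int                 # significance, coded by a domain expert (scale assumed)
    category: Category
    hypothesis: Optional[str] = None  # set when the insight led to a hypothesis
    directed: bool = True             # False means the insight was unexpected
    correct: bool = True
    is_breadth: bool = True           # False means a depth insight


def total_domain_value(insights: List[Insight]) -> int:
    """Sum the expert-coded domain value over a list of insights."""
    return sum(i.domain_value for i in insights)
```

With records like these, per-tool totals such as "number of insights" or "total domain value" become simple aggregations over a list.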

9 Experiment Design
- A 3 × 5 between-subjects design
  - between-subjects: different subjects for each pair
  - Dataset: 3 treatments
  - Visualization tool: 5 treatments

10 Experiment Design
- Participants
  - 2 participants per dataset per tool
  - Have at least a Bachelor’s degree in a biological field
  - Assigned to tools they had never worked with before
    - to prevent advantage
    - to measure learning time
- Categories
  - 10 Domain Experts
    - Senior researchers with extensive experience in microarray experiments and microarray data analysis
  - 11 Domain Novices
    - Lab technicians or graduate student research assistants
  - 9 Software Developers
    - Professionals who implement microarray software tools
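As a quick check of the design arithmetic, the cells of the 3 × 5 design can be enumerated and multiplied by two participants per cell. In this sketch the dataset labels are placeholders, since the slides do not list the three datasets by name, and the constant and function names are illustrative.

```python
from itertools import product

# Tool names come from the slides; dataset labels are placeholders (assumption).
TOOLS = ["Cluster/TreeView", "TimeSearcher", "HCE", "Spotfire", "GeneSpring"]
DATASETS = ["dataset_1", "dataset_2", "dataset_3"]
PARTICIPANTS_PER_CELL = 2


def design_cells(datasets, tools):
    """Return every (dataset, tool) pair of the between-subjects design."""
    return list(product(datasets, tools))


cells = design_cells(DATASETS, TOOLS)
total_participants = len(cells) * PARTICIPANTS_PER_CELL

# Participant categories as reported on the slide.
reported = {"domain_experts": 10, "domain_novices": 11, "software_developers": 9}

print(len(cells), total_participants, sum(reported.values()))
```

The 15 cells with two participants each give 30 participants, which agrees with the 10 + 11 + 9 split across the three participant categories.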

11 Protocol and Measures
- Chose new users with only minimal tool training
  - Success in the initial usage period is critical for the tool’s adoption by biologists
- Participants received an initial training
  - Background description about the dataset
  - 15-minute tool tutorial
- Participants listed some analysis questions
- Instructed to examine the data with the tool as long as needed
- They were allowed to ask for help about the tool
  - Simulates training by colleagues

12 Protocol and Measures
- Every 15 minutes, participants estimated the percentage of total potential insight they had obtained so far
- Finally, they assessed their overall experience with the tool during the session
- The entire session was videotaped for later analysis
- Later, all individual occurrences of insights were identified and codified
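Once insights have been identified and coded from the video, per-tool measures such as insight count, total domain value, and time to first insight can be derived by grouping the records. The sketch below is a hypothetical illustration: the tuple layout and the function name `summarize_by_tool` are assumptions, the sample records are made up, and it pools all records for a tool, whereas the slides report averages across participants.

```python
from collections import defaultdict


def summarize_by_tool(records):
    """Aggregate coded insights per tool.

    `records` is an iterable of (tool, minutes_into_session, domain_value)
    tuples; this layout is assumed for illustration only.
    """
    grouped = defaultdict(list)
    for tool, minutes, value in records:
        grouped[tool].append((minutes, value))

    summary = {}
    for tool, rows in grouped.items():
        summary[tool] = {
            "insight_count": len(rows),
            "total_domain_value": sum(value for _, value in rows),
            "time_to_first_insight": min(minutes for minutes, _ in rows),
        }
    return summary


# Made-up records, not data from the study.
example = [
    ("Spotfire", 6.0, 2),
    ("Spotfire", 14.0, 3),
    ("HCE", 20.0, 1),
]
result = summarize_by_tool(example)
```

Grouping by (tool, dataset) instead of tool alone would give the tools-versus-datasets breakdown.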

13 Show me pictures
Here are the tools!!!

14 Cluster/TreeView = ClusterView
- Cluster
  - to cluster data
- TreeView
  - Visualize the clusters
  - Uses heat-maps

15 TimeSearcher 1
- Parallel Coordinate Visualization
- Interactive Filtering
- Line Graphs for each data entity

16 HCE
- Clusters data
- Several Visualizations
  - Heat-Maps
  - Parallel Coordinates
  - Scatter Plots
  - Histograms
- Brushing and Linking

17 Spotfire
- General Purpose Visualization Tool
- Several Displays
  - Scatter Plots
  - Bar Graphs
  - Histograms
  - Pie/Line Charts
  - Others…
- Dynamic Query Sliders
- Brushing and Linking

18 GeneSpring
- Suitable for Microarray data analysis
  - Shows physical positions on genomes
  - Array layouts
  - Pathways
  - Gene-to-gene comparison
- Brushing and Linking
- Clustering capability

19 Enough about Tools, Tell me the Results!!!

20 Number of Insights
[Chart comparing ClusterView, TimeSearcher 1, HCE, Spotfire, GeneSpring]
- Spotfire: highest number of insights
- HCE: poorest

21 Total Domain Value
[Chart comparing ClusterView, TimeSearcher 1, HCE, Spotfire, GeneSpring]
- Spotfire: highest insight value
- HCE, GeneSpring: poorer

22 Avg. Final Amount Learned
[Chart comparing ClusterView, TimeSearcher 1, HCE, Spotfire, GeneSpring]
- Spotfire: high value in learning
- ClusterView and HCE are poor

23 Avg. Time to First Insight
[Chart comparing ClusterView, TimeSearcher 1, HCE, Spotfire, GeneSpring]
- ClusterView: very short time to first insight
- TimeSearcher 1 and Spotfire are also quick

24 Avg. Total Time
[Chart comparing ClusterView, TimeSearcher 1, HCE, Spotfire, GeneSpring]
- Total time users spent using the tool
- Low values: efficient, or not useful for insight

25 Unexpected Insights
- HCE revealed several unexpected results
- ClusterView provided a few
- TimeSearcher 1 for time series data
- Spotfire contributed to 2 unexpected insights
Hypotheses
- A few insights led to hypotheses
  - Spotfire: 3
  - ClusterView: 2
  - TimeSearcher 1: 1
  - HCE: 1

26 Tools vs. Datasets

27 Insight Categories
- Overall Gene Expression
  - Overview of genes in general
- Expression Patterns
  - Searching patterns is critical
  - Clustering is useful
- Grouping
  - Some users wanted to group genes
  - GeneSpring enables grouping
- Detail Information
  - Users want detailed information about genes that are familiar to them

28 Visual Representations and Interactions
- Although some tools have many visualization techniques, users tend to use only a few
  - Spotfire users preferred heat-maps
  - GeneSpring users preferred parallel coordinates
- Lupus dataset: visualized best with heat-maps
- Most users preferred outputs of clustering algorithms
- HCE not useful when a particular column arrangement is useful

29 Running out of time, so wrap up
- Use a visualization tool (that’s why we’re here!)
  - Spotfire: best general performance
  - GeneSpring: hard to use
- Dataset dictates best tool!
  - Time series data: TimeSearcher
  - Others: Spotfire, GeneSpring?
- Interaction is the key
- Grouping and clustering are necessary features

30 Critique
- In all fairness, measuring insights is really hard! Here are some possible issues
- Subjectivity
  - Experiment relies on users always thinking aloud
  - Also depends on a domain expert to evaluate insights
  - Results may vary widely based on participants’ expertise (only two per tool-dataset pair)
- Some insight characteristics are inherently subjective
  - Domain Value
  - Breadth vs. Depth

31 Critique
- How does one count insights?
  - Assumes honest reporting by participants
  - Some insights may be of no great value
  - What if a discovery just reaffirms a known fact? Is that an insight?
- Measuring time taken to reach an insight
  - Maybe instead of measuring from the beginning of the session we should measure from the last insight
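The last point contrasts two ways of timing an insight: from the start of the session, or from the previous insight. A minimal sketch of the difference, assuming each insight carries a timestamp in minutes since the session began (the function names are illustrative):

```python
def times_from_start(timestamps):
    """Time to each insight measured from the beginning of the session."""
    return sorted(timestamps)


def times_since_previous(timestamps):
    """Time to each insight measured from the previous insight.

    The first insight is measured from the session start (minute 0).
    """
    ordered = sorted(timestamps)
    previous = [0.0] + ordered[:-1]
    return [t - p for p, t in zip(previous, ordered)]


# Made-up timestamps (minutes into a session), for illustration only.
session = [5.0, 12.0, 30.0]
from_start = times_from_start(session)
since_previous = times_since_previous(session)
```

For these made-up timestamps the second measure gives 5, 7, and 18 minutes, so a late insight is no longer charged for all the time spent before the earlier ones.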

